pcuenq
posted an update about 12 hours ago
👉 What happened in AI in 2025? 👈

We prepared the 2025 version of the HF AI Timeline Grid, highlighting open vs API-based model releases, and allowing you to browse and filter by access, modality, and release type!

Play with it here:
2025-ai-timeline/2025-ai-timeline

Here's my personal quarterly TL;DR:

1️⃣ Q1: Learning to Reason
DeepSeek not only releases a top-notch reasoning model, but also shows how to train one and compete with closed frontier models. OpenAI debuts Deep Research.

Significant milestones: DeepSeek R1 & R1-Zero, Qwen 2.5 VL, OpenAI Deep Research, Gemini 2.5 Pro (experimental)

2️⃣ Q2: Multimodality and Coding
More LLMs embrace multimodality by default, and there's a surge in coding agents. Strong vision, audio, and generative models emerge.

Significant milestones: Llama 4, Qwen 3, Imagen 4, OpenAI Codex, Google Jules, Claude 4

3️⃣ Q3: "Gold" rush, OpenAI opens up, the community goes bananas
Flagship models get gold in math olympiads and hard benchmarks. OpenAI releases strong open-weight models and Google releases the much-anticipated nano-banana for image generation and editing. Agentic workflows become commonplace.

Significant milestones: Gemini and OpenAI IMO Gold, gpt-oss, Gemini 2.5 Flash Image, Grok 4, Claude Sonnet 4.5

4️⃣ Q4: Mistral returns, leaderboard hill-climbing
Mistral is back with updated model families. All labs release impressive models to wrap up the year!

Significant milestones: Claude Opus 4.5, DeepSeek Math V2, FLUX 2, GPT 5.1, Kimi K2 Thinking, Nano Banana Pro, GLM 4.7, Gemini 3, Mistral 3, MiniMax M2.1 🤯

Credits
🙏 NHLOCAL for the source data https://github.com/NHLOCAL/AiTimeline

🫡 @reach-vb for the original idea, design and recipe

🙌 @ariG23498 and yours truly for compiling and verifying the 2025 edition

🥳 Here's to 2026, wishing it becomes the best year ever for open releases and on-device-first use-cases! 🥂
Molbap
posted an update 3 months ago
🚀 New blog: Maintain the unmaintainable – 1M+ Python LOC, 400+ models

How do you stop a million-line library built by thousands of contributors from collapsing under its own weight?
At 🤗 Transformers, we do it with explicit software-engineering tenets: principles that make the codebase hackable at scale.

🔍 Inside the post:
– One Model, One File: readability first. You can still open a modeling file and see the full logic, top to bottom.
– Modular Transformers: visible inheritance that cuts maintenance cost by ~15× while keeping models readable.
– Config-Driven Performance: FlashAttention, tensor parallelism, and attention scheduling are config-level features, not rewrites.
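
To picture what "config-level features, not rewrites" can mean, here is a minimal sketch of selecting an attention implementation from a config value. It is not code from the Transformers codebase: `ModelConfig`, `ATTENTION_IMPLS`, `attend`, and both attention functions are hypothetical names made up for illustration, and the "streaming" variant is only a toy single-pass softmax, not FlashAttention itself.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]


def eager_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference attention for one query: softmax(q.k / sqrt(d)) applied to the values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def streaming_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Same result in a single pass, keeping a running max and rescaling partial sums."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        score = scale * sum(a * b for a, b in zip(q, k))
        new_max = max(running_max, score)
        correction = math.exp(running_max - new_max)  # 0.0 on the first step
        weight = math.exp(score - new_max)
        denom = denom * correction + weight
        acc = [a * correction + weight * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]


# Hypothetical registry: the model code looks up an implementation by name.
ATTENTION_IMPLS: Dict[str, Callable[[Vector, List[Vector], List[Vector]], Vector]] = {
    "eager": eager_attention,
    "streaming": streaming_attention,
}


@dataclass
class ModelConfig:
    attn_implementation: str = "eager"  # hypothetical config field


def attend(config: ModelConfig, q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """The calling code stays the same; only the config value changes."""
    return ATTENTION_IMPLS[config.attn_implementation](q, keys, values)
```

Switching `ModelConfig(attn_implementation="eager")` to `"streaming"` changes how the result is computed without touching `attend` or its callers, which is the property the bullet describes.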

Written with @lysandre, @pcuenq, and @yonigozlan, this is a deep dive into how Transformers stays fast, open, and maintainable.

Read it here → transformers-community/Transformers-tenets
ngxson
posted an update 10 months ago
A comprehensive matrix of which format you should use.

Read more on my blog post: https://huggingface.co/blog/ngxson/common-ai-model-formats

| Hardware      | GGUF      | PyTorch             | Safetensors            | ONNX |
|---------------|-----------|---------------------|------------------------|------|
| CPU           | ✅ (best) | 🟡                  | 🟡                     | ✅   |
| GPU           | ✅        | ✅                  | ✅                     | ✅   |
| Mobile        | ✅        | 🟡 (via ExecuTorch) | ❌                     | ✅   |
| Apple silicon | ✅        | 🟡                  | ✅ (via MLX framework) | ✅   |
ngxson
posted an update 12 months ago
Check out my collection of pre-made GGUF LoRA adapters!

This allows you to use both the normal and abliterated versions of popular models such as Llama and Qwen without doubling VRAM usage.
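
As a rough back-of-the-envelope sketch of why an adapter is so much cheaper than a second copy of the model: a rank-`r` LoRA adapter for a `d_out x d_in` weight matrix stores `r * (d_in + d_out)` values instead of `d_in * d_out`. The dimensions and rank below are hypothetical and are not measured from the adapters in the collection.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameter count of a rank-`rank` LoRA pair (A: rank x d_in, B: d_out x rank)."""
    return rank * (d_in + d_out)


def adapter_to_base_ratio(d_model: int, rank: int) -> float:
    """Adapter size relative to one square d_model x d_model weight matrix."""
    return lora_params(d_model, d_model, rank) / (d_model * d_model)


# Hypothetical sizes chosen only to make the arithmetic easy to follow.
ratio = adapter_to_base_ratio(d_model=4096, rank=32)
print(f"adapter is {ratio:.2%} of the adapted weights")  # 32 * 8192 / 4096**2 = 1/64
```

Under these assumed sizes, keeping the base model plus one adapter costs about 1.6% extra for the adapted matrices, compared with 100% extra for loading a second full model.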

ngxson/gguf_lora_collection
pcuenq
posted an update over 1 year ago
OpenELM in Core ML

Apple recently released a set of efficient LLMs in sizes varying between 270M and 3B parameters. Their quality, according to benchmarks, is similar to OLMo models of comparable size, but they required half the pre-training tokens because they use layer-wise scaling, where the number of attention heads increases in deeper layers.
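
To make layer-wise scaling concrete, here is a minimal sketch that assumes the head count follows a width multiplier interpolated linearly from the first layer to the last. The function name and every number in the toy configuration are hypothetical; they are not read from any OpenELM checkpoint.

```python
from typing import List


def heads_per_layer(num_layers: int, d_model: int, head_dim: int,
                    alpha_min: float, alpha_max: float) -> List[int]:
    """Interpolate a width multiplier across depth and convert it to a head count."""
    heads = []
    for i in range(num_layers):
        # Fraction of the way through the stack: 0.0 at the first layer, 1.0 at the last.
        depth = i / (num_layers - 1) if num_layers > 1 else 0.0
        alpha = alpha_min + (alpha_max - alpha_min) * depth
        heads.append(max(1, round(alpha * d_model / head_dim)))
    return heads


# Hypothetical toy configuration, for illustration only.
toy_heads = heads_per_layer(num_layers=4, d_model=1024, head_dim=64,
                            alpha_min=0.5, alpha_max=1.0)
print(toy_heads)  # [8, 11, 13, 16]
```

Early layers get fewer heads and later layers get more, so the parameter budget is spent unevenly across depth rather than uniformly.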

I converted these models to Core ML, for use on Apple Silicon, using this script: https://gist.github.com/pcuenca/23cd08443460bc90854e2a6f0f575084. The converted models are uploaded to this community on the Hub for anyone who wants to integrate them into their apps: corenet-community/openelm-core-ml-6630c6b19268a5d878cfd194

The conversion was done with the following parameters:
- Precision: float32.
- Sequence length: fixed to 128.

With swift-transformers (https://github.com/huggingface/swift-transformers), I'm getting about 56 tok/s with the 270M model on my M1 Max, and 6.5 tok/s with the largest 3B model. These speeds could be improved by converting to float16. However, there's some precision loss somewhere and generation doesn't work in float16 mode yet. I'm looking into this and will keep you posted! Or take a look at this issue if you'd like to help: https://github.com/huggingface/swift-transformers/issues/95

I'm also looking at optimizing inference using an experimental KV cache in swift-transformers. It's a bit tricky because the layers have varying numbers of attention heads, but I'm curious to see how much this feature can accelerate performance in this model family :)
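
One way to see why varying head counts complicate a KV cache: the key and value buffers can no longer share a single shape, so each layer needs its own allocation. The sketch below is in Python for illustration only (it is not swift-transformers code) and sizes each layer separately. The sequence length of 128 and 4-byte float32 values match the conversion settings above, while the per-layer KV head counts and head dimension are hypothetical.

```python
from typing import List


def kv_cache_bytes_per_layer(kv_heads_per_layer: List[int], head_dim: int,
                             seq_len: int, bytes_per_value: int) -> List[int]:
    """Bytes needed per layer: keys plus values, for every cached position."""
    return [2 * heads * head_dim * seq_len * bytes_per_value
            for heads in kv_heads_per_layer]


# Hypothetical per-layer KV head counts; real models will differ.
toy_kv_heads = [2, 4, 4, 8]
layer_bytes = kv_cache_bytes_per_layer(toy_kv_heads, head_dim=64,
                                       seq_len=128, bytes_per_value=4)
total_bytes = sum(layer_bytes)
print(layer_bytes, total_bytes)
```

With a uniform architecture a single `[layers, 2, heads, seq_len, head_dim]` buffer would do; here the per-layer sizes differ, so the cache has to be a list of differently shaped buffers or be padded to the largest layer.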

Regarding the instruct fine-tuned models, I don't know the chat template that was used. The models use the Llama 2 tokenizer, but neither the Llama 2 chat template nor the default Alignment Handbook one that was used to train is recognized. Any ideas on this are welcome!
Molbap
posted an update almost 2 years ago
🚀🚀 Exciting times for the document AI community!

We're thrilled to announce the release of some of the largest OCR datasets available to the public.
🔥 With over 26 million pages, 18 billion text tokens, and 6 TB of data, these resources are a significant leap forward for document AI research.

Here's how to access these datasets quickly:

from datasets import load_dataset

# streaming=True iterates over the data lazily instead of downloading it all first
pdfa_dataset = load_dataset('pixparse/pdfa-eng-wds', streaming=True)
idl_dataset = load_dataset('pixparse/idl-wds', streaming=True)

This enables you to stream them directly, integrating seamlessly with your projects using the Hugging Face datasets library. On the hub, you can find them here:

pixparse/pdfa-eng-wds
pixparse/idl-wds

For lean data loading, the new [chug](https://github.com/huggingface/chug) library offers a solution with pdf decoding:


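import chug

# Task config: read documents, sampling all pages
task_cfg = chug.DataTaskDocReadCfg(
    page_sampling='all',
)
# Data config: read the train split of the Hub dataset
data_cfg = chug.DataCfg(
    source='pixparse/pdfa-eng-wds',
    split='train',
    batch_size=None,
    format='hfids',
    num_workers=0,
)
# Build the loader and fetch one sample
data_loader = chug.create_loader(
    data_cfg,
    task_cfg,
)
sample = next(iter(data_loader))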



We owe a huge thank you to Peter Wyatt, Kate Tasker, Rachel Taketa, Ali Furkan Biten, Ruben Tito, and their colleagues for their contributions. Their work putting these datasets together has been invaluable. 🤗

Looking Ahead:

We're on a mission to enhance document AI capabilities, and these datasets are just the beginning. With your engagement and innovation, we're confident in the community's ability to develop robust OCR solutions. We encourage you to explore these datasets, experiment with the code, and contribute to the collective progress in document AI.

For detailed information on usage and licensing, please refer to the dataset cards on the Hugging Face hub.