Post
Small models, BIG impact: SmolLM is here!
We're launching a series of small but mighty language models:
- Super fast: runs on laptops, phones, you name it!
- 3 sizes: 135M, 360M, and 1.7B parameters
- Outperforms same-size models from Meta, Microsoft, and Qwen
- Fully open-source: datasets, training code, models
Key features
- Trained on FineWeb-Edu and Cosmopedia v2 (the largest synthetic pre-training dataset)
- No cloud needed: run locally for privacy and energy efficiency
- Everything is public, from data curation to training steps
Potential use cases
- On-device autocomplete
- Local request parsing
- Custom fine-tuning for specific needs without expensive GPUs
Go deeper
- Check it out: https://huggingface.co/collections/HuggingFaceTB/smollm-models-6695016cad7167254ce15966
- Run the 360M model in your browser, 100% private: HuggingFaceTB/SmolLM-360M-Instruct-WebGPU
- Read the blog explaining everything in detail: huggingface.co/blog/smollm
Kudos to the stellar team who worked on this project: @loubnabnl @anton-l @eliebak @lvwerra
Post
What an eventful day in Open Source LLMs today:
Mistral released Codestral Mamba
> Beats DeepSeek and QwenCode; best model under 10B, competitive with Codestral 22B
> Mamba 2 architecture - supports up to 256K context
> Apache 2.0 licensed, perfect for a local code assistant
> Transformers & llama.cpp integration upcoming!
Model checkpoint: mistralai/mamba-codestral-7B-v0.1
Hugging Face dropped SmolLM 🤗
> Beats MobileLLM, Qwen 0.5B, Phi 1.5B and more!
> 135M, 360M, and 1.7B param model checkpoints
> Trained on 600B high-quality synthetic + FineWeb Edu tokens
> Architecture: Llama + GQA + 2048 ctx length
> Ripe for fine-tuning and on-device deployments.
> Works out of the box with Transformers!
Model checkpoints: HuggingFaceTB/smollm-6695016cad7167254ce15966
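Since SmolLM "works out of the box with Transformers", loading it takes one call. A minimal sketch, assuming a local `transformers` install and the smallest base checkpoint name `HuggingFaceTB/SmolLM-135M` (swap in the 360M or 1.7B repo as needed):

```python
from transformers import pipeline

# Greedy text generation with the smallest SmolLM base checkpoint.
generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM-135M")

out = generator("The capital of France is", max_new_tokens=16, do_sample=False)
print(out[0]["generated_text"])
```

At 135M parameters this runs comfortably on a CPU, which is the whole point of the release.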
Mistral released Mathstral 7B
> 56.6% on MATH and 63.47% on MMLU
> Same architecture as Mistral 7B
> Works out of the box with Transformers & llama.cpp
> Released under Apache 2.0 license
Model checkpoint: mistralai/mathstral-7B-v0.1
Pretty dope day for open-source ML. Can't wait to see what the community builds with these models, and to support them further! 🤗
What's your favourite from the release today?
Post
Excited to announce WizardLM's new paper: Auto Evol-Instruct!
Twitter: https://x.com/WizardLM_AI/status/1812857977122202087
Paper: https://arxiv.org/pdf/2406.00770
1. Fully AI-powered pipeline
Auto Evol-Instruct automatically and iteratively optimizes an initial Evol-Instruct V1 method into an optimal one. The pipeline consists of two critical stages: Evol Trajectory Analysis, where the optimizer LLM analyzes the issues and failures exposed in instruction evolution performed by the evol LLM, and Evolving Method Optimization, where the optimizer LLM addresses these issues to progressively develop an effective evolving method. The optimal evolving method is then used to convert the entire instruction dataset into more diverse and complex forms, facilitating improved instruction tuning.
2. Scaling Evol-Instruct with Arena Learning
With Auto Evol-Instruct, the evolutionary synthesis data of WizardLM-2 has scaled up from WizardLM-1's coverage to dozens of domains, covering tasks in all aspects of large language models. This allows Arena Learning to train on and learn from an almost infinite pool of high-difficulty instruction data, fully unlocking the potential of Arena Learning.
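The two stages above form a simple analyze-then-optimize loop. A minimal Python sketch, with stub functions standing in for the evol LLM and the optimizer LLM (all names, prompts, and the critique logic here are illustrative, not the paper's implementation):

```python
# Illustrative sketch of the Auto Evol-Instruct loop.
# evol_llm and the critique in analyze_trajectory are stubs for real LLM calls.

def evol_llm(method: str, instruction: str) -> str:
    """Stub: apply the current evolving method to one instruction."""
    return f"{instruction} [evolved via: {method}]"

def analyze_trajectory(method, instructions):
    """Stage 1, Evol Trajectory Analysis: collect issues exposed when
    the evol LLM applies the current method to a dev set."""
    evolved = [evol_llm(method, ins) for ins in instructions]
    # Stub critique: flag every evolved instruction as having one issue.
    return [f"issue with: {e}" for e in evolved]

def optimize_method(method, issues):
    """Stage 2, Evolving Method Optimization: the optimizer LLM rewrites
    the method to address the observed issues."""
    return method + f" (patched for {len(issues)} issues)"

def auto_evol_instruct(initial_method, dev_instructions, steps=3):
    method = initial_method
    for _ in range(steps):
        issues = analyze_trajectory(method, dev_instructions)
        method = optimize_method(method, issues)
    # The resulting "optimal" method is then applied to the full dataset.
    return method

final = auto_evol_instruct("Evol-Instruct V1", ["Sort a list in Python."])
print(final)
```

The real system replaces both stubs with LLM calls, but the control flow is exactly this: analyze the evolution trajectory, patch the evolving method, repeat.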
gokaygokay posted an update (1 day ago)
Post
I've made a creative version of Tile Upscaler
- gokaygokay/TileUpscalerV2
- https://github.com/gokayfem/Tile-Upscaler
- New tiling strategy
- Now it's closer to Clarity Upscaler
- It has more parameters to play with, which also gives it more room to fail
- You should try different resolutions, strength, and ControlNet strength values
Original Tile Upscaler
- gokaygokay/Tile-Upscaler
Post
Exciting news for audio AI enthusiasts!
The Emilia dataset dropped last week, and it's a cool one:
- 101k+ hours of high-quality audio
- 6 languages: Chinese, English, Japanese, Korean, German, French
- Diverse content: talk shows, interviews, debates, sports commentary, audiobooks
This dataset could improve multilingual speech generation and recognition. Opens up many possibilities for global media, language learning, and accessibility!
Explore it: amphion/Emilia
#AIAudio
Post
Ghost 8B Beta emerges as a clear leader, surpassing even proprietary models like xAI Grok 1, OpenAI GPT-3.5, and Mistral Mixtral 8x7B. It also reaches parity with Mistral Medium, further solidifying its position as a top-tier language model. Furthermore, Ghost 8B Beta stands out as one of only three models evaluated with the zero-shot method, alongside Claude 2 and Claude 3, showcasing its unique capabilities and potential for groundbreaking applications.
---
Chat with the model here:
- Playground with Ghost 8B Beta (β, 8k): lamhieu/ghost-8b-beta-8k
- Playground with Ghost 8B Beta (β, 128k): lamhieu/ghost-8b-beta-128k
- Official website: https://ghost-x.org/docs/models/ghost-8b-beta/
Post
It's time to switch from bge to Memex! We introduce Memex, an OCR-free visual document embedding model that serves as your personal librarian.
The model takes only images as document-side inputs and produces vectors representing document pages. Memex is trained on over 200k query-visual document pairs, covering textual documents, visual documents, arXiv figures, plots, charts, industry documents, textbooks, ebooks, openly available PDFs, and more. Its performance is on par with our ablation text-embedding model on text-oriented documents, with an advantage on visually intensive documents.
Our model can:
- Help you read a long visually intensive or text-oriented PDF document and find the pages that answer your question.
- Help you build a personal library and retrieve book pages from a large collection of books.
- Potentially run on your PC: it has only 2.8B parameters.
- Work like a human: it reads and comprehends with vision, and remembers multimodal information as if in a hippocampus.
The model is open-sourced at RhapsodyAI/minicpm-visual-embedding-v0
Everyone is welcome to try our online demo at bokesyo/minicpm-visual-embeeding-v0-demo
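Once pages are embedded, "finding the pages that answer your question" is just nearest-neighbor search in the shared embedding space. A minimal cosine-similarity sketch, with toy 3-dimensional vectors standing in for the embeddings the model would actually produce:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy stand-ins for embeddings: one vector per document page, one for the query.
page_vectors = {
    "page_1": [0.9, 0.1, 0.0],
    "page_2": [0.1, 0.8, 0.3],
    "page_3": [0.0, 0.2, 0.9],
}
query_vector = [0.1, 0.9, 0.2]  # e.g. the embedding of a user question

# Rank pages by similarity to the query; the top hit is the answer page.
ranked = sorted(page_vectors,
                key=lambda p: cosine(query_vector, page_vectors[p]),
                reverse=True)
print(ranked[0])  # page_2
```

In practice you would replace the toy vectors with the model's page and query embeddings and use an approximate nearest-neighbor index for large libraries, but the retrieval logic is the same.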
Post
Micrograd in pure C
Port of Karpathy's micrograd in pure C.
Yo, C does not negotiate with memory.
Code: https://github.com/Jaykef/micrograd.c
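For context on what is being ported: micrograd's core is a scalar value that records its inputs and backpropagates gradients via the chain rule. A minimal Python sketch of that idea (the repo linked above implements the same thing in C; this is not its actual code):

```python
class Value:
    """Minimal micrograd-style scalar with reverse-mode autodiff."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():  # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():  # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

a, b = Value(2.0), Value(3.0)
c = a * b + a          # c = a*b + a
c.backward()
print(a.grad, b.grad)  # dc/da = b + 1 = 4.0, dc/db = a = 2.0
```

The C port has to do all of this with manual memory management for the graph nodes, which is exactly where "C does not negotiate with memory" bites.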
Post
Cool things this week from @huggingface!
- AI math olympiad winner NuminaMath is here!
- Announcing the new Hugging Face and Keras NLP integration
- UI overhaul for HF tokens!
- Embed our dataset viewer on any webpage!
https://huggingface.co/blog/winning-aimo-progress-prize
https://huggingface.co/blog/keras-nlp-integration
https://huggingface.co/settings/tokens
https://x.com/julien_c/status/1812099420726456457
Check out the full list on our Discord!
https://discord.com/invite/JfAtkvEtRb