All HF Hub posts

not-lain posted an update 2 days ago
I am now a Hugging Face fellow 🥳
fdaudens posted an update 1 day ago
Small models, BIG impact: SmolLM is here! 🚀🔬

We're launching a series of small but mighty language models:
๐ŸŽ๏ธ Super fast - runs on laptops, phones, you name it!
๐Ÿ“ 3 sizes: 130M, 350M, and 1.5B parameters
๐Ÿฅ‡ Outperforms same size models from Meta, Microsoft, and Qwen
๐Ÿ”“ Fully open-source: datasets, training code, models

๐Š๐ž๐ฒ ๐Ÿ๐ž๐š๐ญ๐ฎ๐ซ๐ž๐ฌ
- Trained on FineWeb-Edu and Cosmopedia v2 (largest synthetic pre-training dataset)
- No cloud needed - run locally for privacy and energy efficiency
- Everything is public, from data curation to training steps

๐๐จ๐ญ๐ž๐ง๐ญ๐ข๐š๐ฅ ๐ฎ๐ฌ๐ž ๐œ๐š๐ฌ๐ž๐ฌ
- On-device autocomplete
- Local request parsing
- Custom fine-tuning for specific needs without expensive GPUs

Go deeper
👉 Check it out: https://huggingface.co/collections/HuggingFaceTB/smollm-models-6695016cad7167254ce15966
👉 Run the 360M model in your browser, 100% private: HuggingFaceTB/SmolLM-360M-Instruct-WebGPU
👉 Read the blog explaining everything in detail: huggingface.co/blog/smollm
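
To try it locally, here is a minimal sketch using the standard transformers chat API (the prompt and generation settings are illustrative, not from the release):

```python
# Minimal sketch: run SmolLM-360M-Instruct locally with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-360M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Build a chat prompt and generate a short reply on CPU or GPU.
messages = [{"role": "user", "content": "What is gravity?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```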

Kudos to the stellar team who worked on this project: @loubnabnl @anton-l @eliebak @lvwerra
reach-vb posted an update 1 day ago
What an eventful day in Open Source LLMs today:

Mistral released Codestral Mamba 🐍
> Beats DeepSeek and QwenCode; best model < 10B, competitive with Codestral 22B
> Mamba 2 architecture - supports up to 256K context
> Apache 2.0 licensed, perfect for a local code assistant
> Transformers & llama.cpp integration upcoming!

Model checkpoint: mistralai/mamba-codestral-7B-v0.1

Hugging Face dropped SmolLM 🤗
> Beats MobileLLM, Qwen 0.5B, Phi 1.5B and more!
> 135M, 360M, and 1.7B param model checkpoints
> Trained on 600B high-quality synthetic + FineWeb Edu tokens
> Architecture: Llama + GQA + 2048 ctx length
> Ripe for fine-tuning and on-device deployments.
> Works out of the box with Transformers!

Model checkpoints: HuggingFaceTB/smollm-6695016cad7167254ce15966

Mistral released Mathstral 7B ∑
> 56.6% on MATH and 63.47% on MMLU
> Same architecture as Mistral 7B
> Works out of the box with Transformers & llama.cpp
> Released under Apache 2.0 license

Model checkpoint: mistralai/mathstral-7B-v0.1
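
Since the post notes it works out of the box with Transformers, a minimal sketch (the prompt is illustrative):

```python
# Minimal sketch: generate with Mathstral via the transformers pipeline.
# Needs enough memory for a 7B model; device_map="auto" requires accelerate.
from transformers import pipeline

pipe = pipeline("text-generation", model="mistralai/mathstral-7B-v0.1", device_map="auto")
result = pipe("Prove that the square root of 2 is irrational.", max_new_tokens=256)
print(result[0]["generated_text"])
```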

Pretty dope day for open-source ML. Can't wait to see what the community builds with these models, and to support them further! 🤗

What's your favourite from the release today?
WizardLM posted an update 3 days ago
🔥🔥🔥
Excited to announce WizardLM's new paper: Auto Evol-Instruct!

๐Ÿฆ Twitter: https://x.com/WizardLM_AI/status/1812857977122202087

📃 Paper: https://arxiv.org/pdf/2406.00770

🤖 1. Fully AI-Powered Pipeline

Auto Evol-Instruct automates the iterative optimization of an initial Evol-Instruct V1 method into an optimal one. The pipeline consists of two critical stages: Evol Trajectory Analysis, where the optimizer LLM analyzes the issues and failures exposed by the instruction evolution performed by the evol LLM, and Evolving Method Optimization, where the optimizer LLM addresses these issues to progressively develop an effective evolving method. The optimal evolving method is then used to convert the entire instruction dataset into more diverse and complex forms, facilitating improved instruction tuning.
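
In code, the loop looks roughly like this (a sketch only: the `optimizer_llm` and `evol_llm` callables are hypothetical stand-ins for the actual prompts and stopping criteria described in the paper):

```python
# Rough sketch of the Auto Evol-Instruct loop. optimizer_llm and evol_llm are
# hypothetical stand-ins for LLM calls; the real prompts are in the paper.
def auto_evol_instruct(seed_instructions, evolving_method, optimizer_llm, evol_llm, rounds=5):
    for _ in range(rounds):
        # Stage 1: Evol Trajectory Analysis - evolve a batch of instructions,
        # then have the optimizer LLM inspect the trajectories for failures.
        trajectories = [evol_llm(evolving_method, inst) for inst in seed_instructions]
        issues = optimizer_llm(
            f"Analyze the failures in these evolution trajectories:\n{trajectories}"
        )
        # Stage 2: Evolving Method Optimization - rewrite the evolving method
        # to address the issues found in stage 1.
        evolving_method = optimizer_llm(
            f"Improve this evolving method to fix the issues.\n"
            f"Method: {evolving_method}\nIssues: {issues}"
        )
    return evolving_method  # then used to evolve the full instruction dataset
```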

📈 2. Scaling Evol-Instruct with Arena Learning

With Auto Evol-Instruct, the evolutionary synthesis data of WizardLM-2 has scaled up from WizardLM-1's coverage to dozens of domains, covering tasks in all aspects of large language models. This allows Arena Learning to train on and learn from an almost infinite pool of high-difficulty instruction data, fully unlocking its potential.
fdaudens posted an update 3 days ago
Exciting news for audio AI enthusiasts! 🎙️🌍

The Emilia dataset dropped last week, and it's a cool one:
- 101k+ hours of high-quality audio
- 6 languages: 🇨🇳 🇺🇸 🇯🇵 🇰🇷 🇩🇪 🇫🇷 (Chinese, English, Japanese, Korean, German, French)
- Diverse content: talk shows, interviews, debates, sports commentary, audiobooks

This dataset could improve multilingual speech generation and recognition. Opens up many possibilities for global media, language learning, and accessibility!

Explore it: amphion/Emilia
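
A minimal sketch for sampling it with the datasets library (assuming the repo id from the post loads via the standard API; it may require accepting terms on its Hub page first):

```python
# Minimal sketch: stream a few Emilia samples without downloading 101k+ hours.
# Assumes the dataset loads with the standard datasets API.
from datasets import load_dataset

ds = load_dataset("amphion/Emilia", split="train", streaming=True)
for sample in ds.take(3):
    print(sample)
```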

#AIAudio
lamhieu posted an update about 21 hours ago
🤯 Ghost 8B Beta emerges as a clear leader, surpassing even proprietary models like xAI Grok 1, OpenAI GPT 3.5, and Mistral Mixtral 8x7B. It also reaches parity with Mistral Medium, further solidifying its position as a top-tier language model. Furthermore, Ghost 8B Beta stands out as one of only three models evaluated with the zero-shot method, alongside Claude 2 and Claude 3, showcasing its unique capabilities and potential for groundbreaking applications.
---
💬 Chat with the model here:
- Playground with Ghost 8B Beta (β, 8k): lamhieu/ghost-8b-beta-8k
- Playground with Ghost 8B Beta (β, 128k): lamhieu/ghost-8b-beta-128k
- Official website: https://ghost-x.org/docs/models/ghost-8b-beta/
bokesyo posted an update 3 days ago
It's time to switch from bge to Memex! We introduce Memex: an OCR-free visual document embedding model as your personal librarian.

The model takes only images as document-side inputs and produces vectors representing document pages. Memex is trained with over 200k query-visual document pairs, including textual documents, visual documents, arXiv figures, plots, charts, industry documents, textbooks, ebooks, and openly available PDFs. Its performance is on a par with our ablation text-embedding model on text-oriented documents, and it has an advantage on visually-intensive documents.

Our model:

😋 Helps you read a long visually-intensive or text-oriented PDF document and find the pages that answer your question.

🤗 Helps you build a personal library and retrieve book pages from a large collection of books.

🤩 Has only 2.8B parameters and has the potential to run on your PC.

🐵 Works like a human: it reads and comprehends with vision and remembers multimodal information in its hippocampus.

The model is open-sourced at RhapsodyAI/minicpm-visual-embedding-v0

Everyone is welcome to try our online demo at bokesyo/minicpm-visual-embeeding-v0-demo
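
Conceptually, retrieval over a page collection is just nearest-neighbor search over the page vectors; a sketch (the dummy arrays below are hypothetical stand-ins for the model's outputs, see the model card for the actual embedding API):

```python
# Conceptual sketch of page retrieval over visual document embeddings.
# Dummy arrays stand in for outputs of the embedding model; see the
# RhapsodyAI/minicpm-visual-embedding-v0 model card for the real API.
import numpy as np

def rank_pages(query_vec, page_vecs, top_k=5):
    """Return indices of the top_k pages by cosine similarity to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    p = page_vecs / np.linalg.norm(page_vecs, axis=1, keepdims=True)
    return np.argsort(-(p @ q))[:top_k]

query_vec = np.random.randn(768)       # stand-in for an embedded query
page_vecs = np.random.randn(100, 768)  # stand-in for 100 embedded pages
print("Best-matching pages:", rank_pages(query_vec, page_vecs))
```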
lunarflu posted an update 1 day ago
Cool things this week from @huggingface!

🌎 AI math olympiad winner NuminaMath is here!
🤗 Announcing the new Hugging Face and Keras NLP integration
✨ UI overhaul for HF tokens!
🧊 Embed our dataset viewer on any webpage!

https://huggingface.co/blog/winning-aimo-progress-prize
https://huggingface.co/blog/keras-nlp-integration
https://huggingface.co/settings/tokens
https://x.com/julien_c/status/1812099420726456457

Check out the full list on our Discord! 👇
https://discord.com/invite/JfAtkvEtRb