Post
Small models, BIG impact: SmolLM is here!
We're launching a series of small but mighty language models:
- Super fast: runs on laptops, phones, you name it!
- 3 sizes: 135M, 360M, and 1.7B parameters
- Outperforms same-size models from Meta, Microsoft, and Qwen
- Fully open-source: datasets, training code, models
Key features
- Trained on FineWeb-Edu and Cosmopedia v2 (the largest synthetic pre-training dataset)
- No cloud needed: run locally for privacy and energy efficiency
- Everything is public, from data curation to training steps
Potential use cases
- On-device autocomplete
- Local request parsing
- Custom fine-tuning for specific needs without expensive GPUs
Go deeper
- Check it out: https://huggingface.co/collections/HuggingFaceTB/smollm-models-6695016cad7167254ce15966
- Run the 360M model in your browser, 100% private: HuggingFaceTB/SmolLM-360M-Instruct-WebGPU
- Read the blog explaining everything in detail: huggingface.co/blog/smollm
Kudos to the stellar team who worked on this project: @loubnabnl @anton-l @eliebak @lvwerra
Post
What an eventful day in Open Source LLMs today:
Mistral released Codestral Mamba
> Beats DeepSeek and QwenCode; best model under 10B, competitive with Codestral 22B
> Mamba 2 architecture - supports up to 256K context
> Apache 2.0 licensed, perfect for a local code assistant
> Transformers & llama.cpp integration upcoming!
Model checkpoint: mistralai/mamba-codestral-7B-v0.1
Hugging Face dropped SmolLM 🤗
> Beats MobileLLM, Qwen 0.5B, Phi 1.5B and more!
> 135M, 360M, and 1.7B param model checkpoints
> Trained on 600B high-quality synthetic + FineWeb Edu tokens
> Architecture: Llama + GQA + 2048 ctx length
> Ripe for fine-tuning and on-device deployments.
> Works out of the box with Transformers!
Model checkpoints: HuggingFaceTB/smollm-6695016cad7167254ce15966
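Since SmolLM "works out of the box with Transformers", loading it takes one call. A minimal sketch, assuming a local `transformers` install and the smallest base checkpoint name `HuggingFaceTB/SmolLM-135M` (swap in the 360M or 1.7B repo as needed):

```python
from transformers import pipeline

# Greedy text generation with the smallest SmolLM base checkpoint.
generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM-135M")

out = generator("The capital of France is", max_new_tokens=16, do_sample=False)
print(out[0]["generated_text"])
```

At 135M parameters this runs comfortably on a CPU, which is the whole point of the release.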
Mistral released Mathstral 7B
> 56.6% on MATH and 63.47% on MMLU
> Same architecture as Mistral 7B
> Works out of the box with Transformers & llama.cpp
> Released under Apache 2.0 license
Model checkpoint: mistralai/mathstral-7B-v0.1
Pretty dope day for open-source ML. Can't wait to see what the community builds with these models, and to support them further! 🤗
What's your favourite from the release today?
Post
Excited to announce WizardLM's new paper: Auto Evol-Instruct!
Twitter: https://x.com/WizardLM_AI/status/1812857977122202087
Paper: https://arxiv.org/pdf/2406.00770
1. Fully AI-powered pipeline
Auto Evol-Instruct automatically and iteratively optimizes an initial Evol-Instruct V1 method into an optimal one. The pipeline consists of two critical stages: Evol Trajectory Analysis, where the optimizer LLM analyzes the issues and failures exposed in instruction evolution performed by the evol LLM, and Evolving Method Optimization, where the optimizer LLM addresses these issues to progressively develop an effective evolving method. The optimal evolving method is then used to convert the entire instruction dataset into more diverse and complex forms, facilitating improved instruction tuning.
2. Scaling Evol-Instruct with Arena Learning
With Auto Evol-Instruct, the evolutionary synthesis data of WizardLM-2 has scaled up from WizardLM-1's coverage to dozens of domains, covering tasks in all aspects of large language models. This allows Arena Learning to train on and learn from an almost infinite pool of high-difficulty instruction data, fully unlocking the potential of Arena Learning.
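The two stages above form a simple analyze-then-optimize loop. A minimal Python sketch, with stub functions standing in for the evol LLM and the optimizer LLM (all names, prompts, and the critique logic here are illustrative, not the paper's implementation):

```python
# Illustrative sketch of the Auto Evol-Instruct loop.
# evol_llm and the critique in analyze_trajectory are stubs for real LLM calls.

def evol_llm(method: str, instruction: str) -> str:
    """Stub: apply the current evolving method to one instruction."""
    return f"{instruction} [evolved via: {method}]"

def analyze_trajectory(method, instructions):
    """Stage 1, Evol Trajectory Analysis: collect issues exposed when
    the evol LLM applies the current method to a dev set."""
    evolved = [evol_llm(method, ins) for ins in instructions]
    # Stub critique: flag every evolved instruction as having one issue.
    return [f"issue with: {e}" for e in evolved]

def optimize_method(method, issues):
    """Stage 2, Evolving Method Optimization: the optimizer LLM rewrites
    the method to address the observed issues."""
    return method + f" (patched for {len(issues)} issues)"

def auto_evol_instruct(initial_method, dev_instructions, steps=3):
    method = initial_method
    for _ in range(steps):
        issues = analyze_trajectory(method, dev_instructions)
        method = optimize_method(method, issues)
    # The resulting "optimal" method is then applied to the full dataset.
    return method

final = auto_evol_instruct("Evol-Instruct V1", ["Sort a list in Python."])
print(final)
```

The real system replaces both stubs with LLM calls, but the control flow is exactly this: analyze the evolution trajectory, patch the evolving method, repeat.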
gokaygokay posted an update (1 day ago)
Post
I've made a creative version of Tile Upscaler
- gokaygokay/TileUpscalerV2
- https://github.com/gokayfem/Tile-Upscaler
- New tiling strategy
- Now it's closer to Clarity Upscaler
- It has more parameters to play with, which also gives it more room to fail
- You should try different resolutions, strength, and ControlNet strength values
Original Tile Upscaler
- gokaygokay/Tile-Upscaler
Post
Exciting news for audio AI enthusiasts!
The Emilia dataset dropped last week, and it's a cool one:
- 101k+ hours of high-quality audio
- 6 languages: Chinese, English, Japanese, Korean, German, French
- Diverse content: talk shows, interviews, debates, sports commentary, audiobooks
This dataset could improve multilingual speech generation and recognition. Opens up many possibilities for global media, language learning, and accessibility!
Explore it: amphion/Emilia
#AIAudio
Post
Ghost 8B Beta emerges as a clear leader, surpassing even proprietary models like xAI Grok 1, OpenAI GPT-3.5, and Mistral Mixtral 8x7B. It also reaches parity with Mistral Medium, further solidifying its position as a top-tier language model. Furthermore, Ghost 8B Beta stands out as one of only three models evaluated with the zero-shot method, alongside Claude 2 and Claude 3, showcasing its unique capabilities and potential for groundbreaking applications.
---
Chat with the model here:
- Playground with Ghost 8B Beta (β, 8k): lamhieu/ghost-8b-beta-8k
- Playground with Ghost 8B Beta (β, 128k): lamhieu/ghost-8b-beta-128k
- Official website: https://ghost-x.org/docs/models/ghost-8b-beta/
Post
It's time to switch from bge to Memex! We introduce Memex, an OCR-free visual document embedding model that serves as your personal librarian.
The model takes only images as document-side inputs and produces vectors representing document pages. Memex is trained on over 200k query-visual document pairs, covering textual documents, visual documents, arXiv figures, plots, charts, industry documents, textbooks, ebooks, openly available PDFs, and more. Its performance is on par with our ablation text-embedding model on text-oriented documents, with an advantage on visually intensive documents.
Our model can:
- Help you read a long visually intensive or text-oriented PDF document and find the pages that answer your question.
- Help you build a personal library and retrieve book pages from a large collection of books.
- Potentially run on your PC: it has only 2.8B parameters.
- Work like a human: it reads and comprehends with vision, and remembers multimodal information as if in a hippocampus.
The model is open-sourced at RhapsodyAI/minicpm-visual-embedding-v0
Everyone is welcome to try our online demo at bokesyo/minicpm-visual-embeeding-v0-demo
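Once pages are embedded, "finding the pages that answer your question" is just nearest-neighbor search in the shared embedding space. A minimal cosine-similarity sketch, with toy 3-dimensional vectors standing in for the embeddings the model would actually produce:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy stand-ins for embeddings: one vector per document page, one for the query.
page_vectors = {
    "page_1": [0.9, 0.1, 0.0],
    "page_2": [0.1, 0.8, 0.3],
    "page_3": [0.0, 0.2, 0.9],
}
query_vector = [0.1, 0.9, 0.2]  # e.g. the embedding of a user question

# Rank pages by similarity to the query; the top hit is the answer page.
ranked = sorted(page_vectors,
                key=lambda p: cosine(query_vector, page_vectors[p]),
                reverse=True)
print(ranked[0])  # page_2
```

In practice you would replace the toy vectors with the model's page and query embeddings and use an approximate nearest-neighbor index for large libraries, but the retrieval logic is the same.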
Post
Micrograd in pure C
Port of Karpathy's micrograd in pure C.
Yo, C does not negotiate with memory.
Code: https://github.com/Jaykef/micrograd.c
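For context on what is being ported: micrograd's core is a scalar value that records its inputs and backpropagates gradients via the chain rule. A minimal Python sketch of that idea (the repo linked above implements the same thing in C; this is not its actual code):

```python
class Value:
    """Minimal micrograd-style scalar with reverse-mode autodiff."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():  # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():  # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

a, b = Value(2.0), Value(3.0)
c = a * b + a          # c = a*b + a
c.backward()
print(a.grad, b.grad)  # dc/da = b + 1 = 4.0, dc/db = a = 2.0
```

The C port has to do all of this with manual memory management for the graph nodes, which is exactly where "C does not negotiate with memory" bites.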
Post
Cool things this week from @huggingface!
- AI math olympiad winner NuminaMath is here!
- Announcing the new Hugging Face and Keras NLP integration
- UI overhaul for HF tokens!
- Embed our dataset viewer on any webpage!
https://huggingface.co/blog/winning-aimo-progress-prize
https://huggingface.co/blog/keras-nlp-integration
https://huggingface.co/settings/tokens
https://x.com/julien_c/status/1812099420726456457
Check out the full list on our Discord!
https://discord.com/invite/JfAtkvEtRb