Xuan Son NGUYEN’s Post

Xuan Son NGUYEN

Network and Security Engineer

This is huge. We will soon be able to "just talk" to a computer like a homie, without any internet connection.

Thomas Wolf

Co-founder and CSO at 🤗 Hugging Face

Today's demo of Kyutai's fully end-to-end audio model is a huge deal that many people in the room missed.

Mostly irrelevant:
- it comes a few weeks after OpenAI's GPT-4o
- the demo was less polished than the 4o one (voice quality, voice timing…)

Relevant:
- the training pipeline and model architecture are simple and hugely scalable; a tiny team of 8+ people at Kyutai built it in 4 months. Synthetic data is a huge enabler here
- laser focus on local devices: Moshi will soon be everywhere. Frontier model builders have little incentive to let you run smaller models locally (price per token…), but non-profits like Kyutai have very different incentives. The Moshi demo is already online while OpenAI's 4o voice mode is still in limbo
- going under 300 ms of latency while keeping Llama-8B-or-better answer quality is a key enabler for interactivity; it's game-changing. That feeling when the model answers your question before you've even finished asking it, or when you interrupt the model mid-sentence and it reacts, is quite crazy. Predictive coding in a model: an instantly updated model of what you're about to say...

Basically, they nailed the fundamentals. It's here. This interactive voice tech will be everywhere. It will soon be an obvious commodity.
