Sunny Madra’s Post

Sunny Madra

GM of Cloud @ Groq

40k tok/s input on Llama3 8B. AI workloads are generally very input-heavy. This, combined with our output speed, makes Groq the only way to build performant AI applications.

Jonathan Ross

CEO & Founder, Groq®

Last Week: Groq exceeded 30,000 tokens/second input rate on Llama3 8B!
This Week: Llama3 70B at 40,792 tokens/s input rate!!
- FP16 multiply, FP32 accumulate
- 7,989 tokens in (full Llama context length)
Next Week: ...? 😲
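The figures in the post imply very short prefill times. A back-of-envelope sketch, using only the numbers quoted above (the output rate and any end-to-end numbers are not given here):

```python
# Prefill latency implied by the post's throughput figures.
input_tokens = 7989    # prompt length cited in the post
input_rate = 40792     # tokens/s input rate (Llama3 70B, per the post)

prefill_seconds = input_tokens / input_rate
print(f"prefill: {prefill_seconds:.3f} s")  # ≈ 0.196 s
```

At roughly 0.2 seconds to ingest a full-context prompt, prompt processing stops being the bottleneck for input-heavy workloads, which is the point Sunny's post is making.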

Micah Berkley

Tech: AI Solutions Architect | Ex Googler - Google Cloud | BMW Technology AI/ML SRE | AI Startups Incubator Executive

1mo

At this point you're just being disrespectful. LMAO

Julian Alvarez 🚀

Co-Founder & CEO at Wisdolia (AI x Learning) | Transforming students into super learners

1mo

Insane, you guys are crushing the nonexistent speed limit

Bob S.

CEO @ Goodcall | Applied AI: NLP & speech tech | Ex-Google, Area120, Speech AI | MIT

1mo

Groq: now with more ludicrous speed! What is the upper limit? Jonathan Ross

Kevin Cho

Enterprise Account Executive @ Snowflake ❄️

1mo

Amazing! It just keeps getting better and faster

