40k tok/s input on Llama3 8B. AI workloads are generally very input-heavy. Combined with our output speed, this makes Groq the only way to build performant AI applications.
Last Week: Groq exceeded 30,000 tokens/second input rate on Llama3 8B!
This Week: Llama3 70B at 40,792 tokens/s input rate!!
- FP16 multiply, FP32 accumulate
- 7,989 tokens in (full Llama context length)
Next Week: ...? 😲
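For a sense of what that input rate means in practice, here is a back-of-the-envelope sketch using the figures quoted in the post (a 7,989-token prompt at 40,792 tokens/s); the helper function name is ours, for illustration only:

```python
# Estimate prompt-ingestion (prefill) time from the announced figures.
# 40,792 tokens/s and 7,989 tokens come from the post above;
# prefill_seconds is a hypothetical helper, not a Groq API.

def prefill_seconds(prompt_tokens: int, input_tokens_per_second: float) -> float:
    """Time to ingest a prompt at a given input (prefill) rate."""
    return prompt_tokens / input_tokens_per_second

t = prefill_seconds(7989, 40792)
print(f"{t * 1000:.0f} ms to ingest the full prompt")  # ~196 ms
```

At that rate, a full-context Llama prompt is consumed in well under a quarter of a second.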
insane, you guys are crushing the non-existent speed limit
Groq: now with more ludicrous speed! What is the upper limit?
— Jonathan Ross
Amazing! It just keeps getting better and faster
Tech: AI Solutions Architect | Ex-Googler (Google Cloud) | BMW Technology AI/ML SRE | AI Startups Incubator Executive
At this point you're just being disrespectful. LMAO