40k tok/s input on Llama3 8B. AI workloads are generally very input-heavy. Combined with our output speed, this makes Groq the only way to build performant AI applications.
Last Week: Groq exceeded 30,000 tokens/second input rate on Llama3 8B!
This Week: Llama3 70B at 40,792 tokens/s input rate!!
- FP16 multiply, FP32 accumulate
- 7,989 tokens in (full Llama context length)
Next Week: ...? 😲
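For a sense of what that input rate means in practice, here is a back-of-the-envelope sketch using the figures quoted in the post (a 7,989-token prompt at 40,792 tokens/s); the helper function name is ours, for illustration only:

```python
# Estimate prompt-ingestion (prefill) time from the announced figures.
# 40,792 tokens/s and 7,989 tokens come from the post above;
# prefill_seconds is a hypothetical helper, not a Groq API.

def prefill_seconds(prompt_tokens: int, input_tokens_per_second: float) -> float:
    """Time to ingest a prompt at a given input (prefill) rate."""
    return prompt_tokens / input_tokens_per_second

t = prefill_seconds(7989, 40792)
print(f"{t * 1000:.0f} ms to ingest the full prompt")  # ~196 ms
```

At that rate, a full-context Llama prompt is consumed in well under a quarter of a second.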
insane, you guys are crushing the non-existent speed limit
Groq: now with more ludicrous speed! What is the upper limit?
— Jonathan Ross
Amazing! It just keeps getting better and faster
Tech: AI Solutions Architect | Ex-Googler (Google Cloud) | BMW Technology AI/ML SRE | AI Startups Incubator Executive
At this point you're just being disrespectful. LMAO