Joseph Tighe’s Post

Really cool launch coming out of our embodied AI group at FAIR!

AI at Meta

Today we’re releasing OpenEQA – the Open-Vocabulary Embodied Question Answering Benchmark. It measures an AI agent’s understanding of physical environments by probing it with open-vocabulary questions like “Where did I leave my badge?”

Details ➡️ https://go.fb.me/ni32ze
Benchmark ➡️ https://go.fb.me/zy6l30
Paper ➡️ https://go.fb.me/7g8nqb

We benchmarked state-of-the-art vision+language models (VLMs) on OpenEQA and found a significant gap between human-level performance and even today’s best models. In fact, for questions that require spatial understanding, today’s VLMs are nearly “blind” – access to visual content provides only minor improvements over language-only models.

We hope that by releasing OpenEQA, we can help motivate additional research in this space. At FAIR, we’re working to build world models capable of performing well on OpenEQA, and we welcome others to join us in that effort.
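For anyone curious what that VLM-vs-language-only comparison looks like in practice, here is a minimal Python sketch of an OpenEQA-style evaluation loop: each question (plus frames from the agent’s episode history) goes to a VLM, a text-only baseline answers the same question without any images, and both are scored against a reference answer. The sample questions, the `ask_vlm` / `ask_text_only` stubs, and the word-overlap scorer are hypothetical placeholders for illustration, not the released OpenEQA data format or its LLM-judge scoring protocol.

```python
# Illustrative stand-in for the benchmark data; the released OpenEQA files
# use their own schema and include episode histories captured in real environments.
QUESTIONS = [
    {"question": "Where did I leave my badge?", "answer": "on the kitchen counter", "frames": []},
    {"question": "What color is the sofa?", "answer": "gray", "frames": []},
]

def ask_vlm(question: str, frames: list) -> str:
    """Placeholder for a vision+language model call that sees the episode frames."""
    return "on the kitchen counter"  # stubbed answer for illustration

def ask_text_only(question: str) -> str:
    """Placeholder for a language-only baseline that never sees any visual input."""
    return "in the living room"  # stubbed answer for illustration

def score(predicted: str, reference: str) -> float:
    """Crude word-overlap score; OpenEQA itself uses an LLM judge for open-vocabulary answers."""
    pred, ref = set(predicted.lower().split()), set(reference.lower().split())
    return len(pred & ref) / max(len(ref), 1)

def evaluate() -> None:
    vlm_scores, blind_scores = [], []
    for item in QUESTIONS:
        vlm_scores.append(score(ask_vlm(item["question"], item["frames"]), item["answer"]))
        blind_scores.append(score(ask_text_only(item["question"]), item["answer"]))
    # A small gap between these two averages is the "nearly blind" result described above.
    print(f"VLM:       {sum(vlm_scores) / len(vlm_scores):.2f}")
    print(f"Text-only: {sum(blind_scores) / len(blind_scores):.2f}")

if __name__ == "__main__":
    evaluate()
```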

