Joseph Tighe’s Post

Really cool launch coming out of our embodied AI group at FAIR!

AI at Meta

Today we’re releasing OpenEQA – the Open-Vocabulary Embodied Question Answering Benchmark. It measures an AI agent’s understanding of physical environments by probing it with open-vocabulary questions like “Where did I leave my badge?”

Details ➡️ https://go.fb.me/ni32ze
Benchmark ➡️ https://go.fb.me/zy6l30
Paper ➡️ https://go.fb.me/7g8nqb

We benchmarked state-of-the-art vision+language models (VLMs) on OpenEQA and found a significant gap between human-level performance and even today’s best models. In fact, for questions that require spatial understanding, today’s VLMs are nearly “blind” – access to visual content provides only minor improvements over language-only models.

We hope that by releasing OpenEQA, we can help motivate additional research in this space. At FAIR, we’re working to build world models capable of performing well on OpenEQA, and we welcome others to join us in that effort.
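For anyone curious what that VLM-vs-language-only comparison looks like in practice, here is a minimal Python sketch of an OpenEQA-style evaluation loop: each question (plus frames from the agent’s episode history) goes to a VLM, a text-only baseline answers the same question without any images, and both are scored against a reference answer. The sample questions, the `ask_vlm` / `ask_text_only` stubs, and the word-overlap scorer are hypothetical placeholders for illustration, not the released OpenEQA data format or its LLM-judge scoring protocol.

```python
# Illustrative stand-in for the benchmark data; the released OpenEQA files
# use their own schema and include episode histories captured in real environments.
QUESTIONS = [
    {"question": "Where did I leave my badge?", "answer": "on the kitchen counter", "frames": []},
    {"question": "What color is the sofa?", "answer": "gray", "frames": []},
]

def ask_vlm(question: str, frames: list) -> str:
    """Placeholder for a vision+language model call that sees the episode frames."""
    return "on the kitchen counter"  # stubbed answer for illustration

def ask_text_only(question: str) -> str:
    """Placeholder for a language-only baseline that never sees any visual input."""
    return "in the living room"  # stubbed answer for illustration

def score(predicted: str, reference: str) -> float:
    """Crude word-overlap score; OpenEQA itself uses an LLM judge for open-vocabulary answers."""
    pred, ref = set(predicted.lower().split()), set(reference.lower().split())
    return len(pred & ref) / max(len(ref), 1)

def evaluate() -> None:
    vlm_scores, blind_scores = [], []
    for item in QUESTIONS:
        vlm_scores.append(score(ask_vlm(item["question"], item["frames"]), item["answer"]))
        blind_scores.append(score(ask_text_only(item["question"]), item["answer"]))
    # A small gap between these two averages is the "nearly blind" result described above.
    print(f"VLM:       {sum(vlm_scores) / len(vlm_scores):.2f}")
    print(f"Text-only: {sum(blind_scores) / len(blind_scores):.2f}")

if __name__ == "__main__":
    evaluate()
```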

