FAR AI

Research Services

Berkeley, California · 1,613 followers

Ensuring the safe development and deployment of frontier AI systems

About us

FAR AI is a technical AI research and education non-profit, dedicated to ensuring the safe development and deployment of frontier AI systems.

FAR Research: Explores a portfolio of promising technical AI safety research directions.
FAR Labs: Supports the San Francisco Bay Area AI safety research community through a coworking space, events and programs.
FAR Futures: Delivers events and initiatives bringing together global leaders in AI academia, industry and policy.

Website
https://far.ai/
Industry
Research Services
Company size
11-50 employees
Headquarters
Berkeley, California
Type
Nonprofit
Founded
2022
Specialties
Artificial Intelligence and AI Alignment Research

Updates

  • FAR AI

    🛡 Is AI robustness possible, or are adversarial attacks unavoidable? We investigate this in Go, testing three defenses to make superhuman Go AIs robust. Our defenses manage to protect against known threats, but unfortunately new adversaries bypass them, sometimes using qualitatively new attacks! 😈

    Last year we found that superhuman Go AIs are vulnerable to “cyclic attacks”. This adversarial strategy was discovered by AI, but can be replicated by human players. See our previous update: https://buff.ly/4cqdVYW

    We were curious whether it was possible to defend against the cyclic attack. Over the course of a year, we tested three different ways of patching the cyclic vulnerability in KataGo, the leading open-source Go AI:

    📚 Defense #1: Positional Adversarial Training. The KataGo developers added manually curated adversarial examples to KataGo’s training data. While this successfully defends KataGo against our original versions of the cyclic attack, we find new variants of the cyclic attack that still get through. We also find brand new attacks that defeat this system, such as the “gift attack” shown at https://buff.ly/3xkqKoJ 🎁

    🔄 Defense #2: Iterated Adversarial Training. This approach alternates between defense & offense, mirroring a cybersecurity arms race. Each iteration improves KataGo's defense against known adversaries, but after 9 cycles, the most defended model can still be beaten 81% of the time by a novel variant of the cyclic attack we call the “atari attack”: https://buff.ly/3RxW9uI 🎋 (A minimal illustrative sketch of this defense/offense loop appears after this post.)

    🖼️ Defense #3: Vision Transformer (ViT). In this defense, we replaced KataGo’s convolutional neural network (CNN) backbone, which focuses on local patterns, with a ViT backbone, which can attend to the entire board at once. Unfortunately, our ViT bot remained vulnerable to the original cyclic attack.

    Three diverse defenses all being overcome by new attacks is further evidence that AI robustness issues like jailbreaks are likely to remain a problem for many years to come. 💡 However, we did notice one positive sign: defending against any fixed static attack was quick and easy. We think it might be possible to leverage this property to build a working defense in both Go and other settings. In particular, one could a) grow the adversarial training dataset by scaling up attack generation, b) improve the sample efficiency and generalization of adversarial training, and/or c) apply adversarial training online to defend against adversaries as they are learning to attack.

    For more information:
    🔗 Visit our website: https://goattack.far.ai/
    📝 Check out the blog post: https://lnkd.in/eCFYYupX
    📄 Read the full paper: https://lnkd.in/eZrSpCc6
    👥 Research by Tom Tseng, Euan McLean, Kellin Pelrine, Tony Wang and Adam Gleave.
    🚀 If you're interested in making AI systems more robust, we're hiring! Check out our roles at https://far.ai/jobs

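    The sketch below is a minimal, self-contained illustration of the iterated adversarial training loop described under Defense #2, shown on a toy classification task with an FGSM attacker rather than on Go. Everything in it (the model, data, attacker, and training schedule) is an illustrative assumption, not FAR AI's or KataGo's actual pipeline; it only demonstrates the alternating defense/offense structure.

```python
# Toy iterated adversarial training loop (illustrative only, not the KataGo setup).
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy "clean" data standing in for ordinary training games.
x_clean = torch.randn(512, 20)
y_clean = (x_clean.sum(dim=1) > 0).long()

def fgsm_attack(x, y, eps=0.5):
    """Offense step: craft adversarial inputs against the current model (FGSM)."""
    x_adv = x.clone().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + eps * grad.sign()).detach()

for cycle in range(9):  # mirrors the 9 defense/offense cycles mentioned in the post
    # Offense: generate adversarial examples that exploit the current model.
    x_adv = fgsm_attack(x_clean, y_clean)

    # Defense: fine-tune on a mix of clean and adversarial data.
    for _ in range(50):
        opt.zero_grad()
        loss = loss_fn(model(x_clean), y_clean) + loss_fn(model(x_adv), y_clean)
        loss.backward()
        opt.step()

    # Evaluate against a *fresh* attack on the newly hardened model.
    x_fresh = fgsm_attack(x_clean, y_clean)
    with torch.no_grad():
        acc = (model(x_fresh).argmax(dim=1) == y_clean).float().mean().item()
    print(f"cycle {cycle + 1}: accuracy under a fresh attack = {acc:.2f}")
```

    In the research described above, the attacker is itself a learned Go agent and the victim is KataGo; this toy loop only shows the alternating structure in which each cycle hardens the model against the last known attack while a freshly trained attack may still succeed.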
  • FAR AI

    ⚪⚫ Can AI be truly robust, or are adversarial attacks inevitable? We tested 3 defenses on top Go AIs. Known threats were blocked, but new adversaries still broke through. Our paper was featured in Nature. Learn more at our NextGen AI Safety talk at #ICML2024 on July 26!
    👥 Research by Tom Tseng, Euan McLean, Kellin Pelrine, Tony Wang, and Adam Gleave
    🔗 Visit our website: http://goattack.far.ai
    📝 Check out the blog post: https://lnkd.in/eCFYYupX
    📄 Read the full paper: https://lnkd.in/eZrSpCc6
    🌱 Explore the Nature article: https://lnkd.in/gXuHDiA2
    If you’re excited by this research, our team is hiring! See our job descriptions at https://far.ai/jobs/ or email hello@far.ai to explore collaboration opportunities.

  • FAR AI

    💗🗣 How does translating the Korean word "jeong" (정) illustrate the challenge of AI alignment? 🤖🎯 Been Kim discusses alignment and interpretability as part of the New Orleans Alignment Workshop hosted by FAR AI. Watch the full video at https://lnkd.in/gDWMR-v2
    🎯 What is Value Alignment?
    1️⃣ Technical Question: How to encode values in AI so it reliably does what it ought to do?
    2️⃣ Normative Question: What goals or human values should AI be aligned with? Who decides?
    🇰🇷 Cultural Metaphors in AI: Using the Korean concept of 'jeong,' Kim beautifully illustrates the complexities of aligning human and machine understanding.
    ♟️ Teaching with AI: Been's research using AlphaZero to teach new chess strategies to Grandmasters demonstrates AI's potential to enrich and expand human expertise in established fields.
    🔮 Future Perspectives: Been concludes that we must expand our knowledge to bridge the human-AI understanding gap, harnessing AI's unique capabilities to augment human potential.
    Follow us for updates about upcoming content and workshops!

  • FAR AI

    🔍 "Understanding AI through Computational Mechanics" by Adam Shai and Paul Riechers presented at FAR Labs Seminar 🌟 Key Highlights: 🧠 Insights on how computational mechanics predicts AI behavior 📊 Frameworks for robust AI safety benchmarks 📺 Watch the full recording: https://lnkd.in/gWRhSjsd -- and subscribe to our YouTube channel for future research presentations!

  • FAR AI

    Attending #ICML2024? Join us for the Vienna Alignment Workshop: Open Social event! 🤖💬 🌟
    📍 Where? Austria Center Vienna (ACV)
    ⏰ When? Sunday, July 21st, 19:00-22:00
    RSVP optional, but a quick sign-up helps us plan: https://lnkd.in/gaZ7wqVx
    All ICML attendees interested in alignment are welcome – spread the word and invite your colleagues!

  • FAR AI reposted this

    Founders Pledge

    How can we develop safer, aligned, beneficial AI? Dig into our recommendations and strategy in our new research on advanced AI: https://lnkd.in/gsrTqRNG We recommend four organizations tackling this critical work: The Centre for Long-Term Resilience, Effective Institutions Project, FAR AI, and the Institute for Law & AI. Additionally, our Global Catastrophic Risks Fund works to minimize the biggest threats we face (including advanced AI).

  • FAR AI

    💯🦺 Could we have “provably safe AI”, and what would this imply for tech policy? 🧑⚖️📚 Max Tegmark discusses the possibility of quantified safety bounds at the New Orleans Alignment Workshop hosted by FAR AI. Watch the full video at https://lnkd.in/gxemP3sJ
    Key Highlights:
    🧠 Revolutionizing formal verification and program synthesis
    🔒 Ensuring AI cannot cause harm under the known laws of physics
    🔍 Distilling learned knowledge into verifiable code
    Follow us for updates about upcoming content and workshops!

  • FAR AI

    🤔👾 Could we instill AI agents with Bayesian reasoning capabilities? 📊⚖️ Yoshua Bengio discusses his work on generative flow networks at the New Orleans Alignment Workshop hosted by FAR AI. Watch the full video at https://lnkd.in/gQP599yG
    Bengio delivered a powerful message on the need for global, coordinated governance in AI. 🌍🤝
    📊 Quantitative Safety Guarantees: Bengio dives into how Bayesian approaches to structure learning, such as GFlowNets, could achieve rigorous safety guarantees.
    🛡️ Shaping a Secure AI Tomorrow: To address the complex challenges posed by AI advances, Bengio calls for a network of democratically governed AI labs. ✨🤖
    Follow us for updates about upcoming content and workshops!

  • FAR AI

    🎲 Can Go AIs be adversarially robust? Tom Tseng explores the robustness of superhuman Go bots against adversarial attacks at the FAR Labs Seminar.
    Key Highlights:
    🔍 Adversarial training defends against the original attack – but new adaptive attacks can beat it.
    🔄 Switching architecture to vision transformers doesn’t help – they’re vulnerable to the same attack as convolutional neural networks!
    ⚠️ Tom discovers qualitatively new attacks that beat superhuman Go AIs and can also be replicated by a human.
    🔗 Visit our website: http://goattack.far.ai
    📝 Check out the blog post: https://lnkd.in/eCFYYupX
    📄 Read the published paper: https://lnkd.in/eZrSpCc6
    📢 See the announcement: https://lnkd.in/gBCdus3X
    📺 Watch the full recording on YouTube: https://lnkd.in/gQxSdrZM (subscribe for future research presentations!)
