FAR AI

Research Services

Berkeley, California · 1,613 followers

Ensuring the safe development and deployment of frontier AI systems

About us

FAR AI is a technical AI research and education non-profit, dedicated to ensuring the safe development and deployment of frontier AI systems.

FAR Research: Explores a portfolio of promising technical AI safety research directions.
FAR Labs: Supports the San Francisco Bay Area AI safety research community through a coworking space, events and programs.
FAR Futures: Delivers events and initiatives bringing together global leaders in AI academia, industry and policy.

Website
https://far.ai/
Industry
Research Services
Company size
11-50 employees
Headquarters
Berkeley, California
Type
Nonprofit
Founded
2022
Specialties
Artificial Intelligence and AI Alignment Research

Updates

  • FAR AI

    🛡 Is AI robustness possible, or are adversarial attacks unavoidable? We investigate this in Go, testing three defenses to make superhuman Go AIs robust. Our defenses manage to protect against known threats, but unfortunately new adversaries bypass them, sometimes using qualitatively new attacks! 😈

    Last year we found that superhuman Go AIs are vulnerable to “cyclic attacks”. This adversarial strategy was discovered by AI, but can be replicated by human players. See our previous update: https://buff.ly/4cqdVYW

    We were curious whether it was possible to defend against the cyclic attack. Over the course of a year, we tested three different ways of patching the cyclic vulnerability in KataGo, the leading open-source Go AI:

    📚 Defense #1: Positional Adversarial Training. The KataGo developers added manually curated adversarial examples to KataGo’s training data. While this successfully defends KataGo against our original versions of the cyclic attack, we find new variants of the cyclic attack that still get through. We also find brand new attacks that defeat this system, such as the “gift attack” shown at https://buff.ly/3xkqKoJ 🎁

    🔄 Defense #2: Iterated Adversarial Training. This approach alternates between defense & offense, mirroring a cybersecurity arms race. Each iteration improves KataGo's defense against known adversaries, but after 9 cycles, the most defended model can still be beaten 81% of the time by a novel variant of the cyclic attack we call the “atari attack”: https://buff.ly/3RxW9uI 🎋 (A minimal illustrative sketch of this defense/offense loop appears after this post.)

    🖼️ Defense #3: Vision Transformer (ViT). In this defense, we replaced KataGo’s convolutional neural network (CNN) backbone, which focuses on local patterns, with a ViT backbone, which can attend to the entire board at once. Unfortunately, our ViT bot remained vulnerable to the original cyclic attack.

    Three diverse defenses all being overcome by new attacks is further evidence that AI robustness issues like jailbreaks are likely to remain a problem for many years to come. 💡 However, we did notice one positive sign: defending against any fixed static attack was quick and easy. We think it might be possible to leverage this property to build a working defense in both Go and other settings. In particular, one could a) grow the adversarial training dataset by scaling up attack generation, b) improve the sample efficiency and generalization of adversarial training, and/or c) apply adversarial training online to defend against adversaries as they are learning to attack.

    For more information:
    🔗 Visit our website: https://goattack.far.ai/
    📝 Check out the blog post: https://lnkd.in/eCFYYupX
    📄 Read the full paper: https://lnkd.in/eZrSpCc6
    👥 Research by Tom Tseng, Euan McLean, Kellin Pelrine, Tony Wang and Adam Gleave.
    🚀 If you're interested in making AI systems more robust, we're hiring! Check out our roles at https://far.ai/jobs

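    The sketch below is a minimal, self-contained illustration of the iterated adversarial training loop described under Defense #2, shown on a toy classification task with an FGSM attacker rather than on Go. Everything in it (the model, data, attacker, and training schedule) is an illustrative assumption, not FAR AI's or KataGo's actual pipeline; it only demonstrates the alternating defense/offense structure.

```python
# Toy iterated adversarial training loop (illustrative only, not the KataGo setup).
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy "clean" data standing in for ordinary training games.
x_clean = torch.randn(512, 20)
y_clean = (x_clean.sum(dim=1) > 0).long()

def fgsm_attack(x, y, eps=0.5):
    """Offense step: craft adversarial inputs against the current model (FGSM)."""
    x_adv = x.clone().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + eps * grad.sign()).detach()

for cycle in range(9):  # mirrors the 9 defense/offense cycles mentioned in the post
    # Offense: generate adversarial examples that exploit the current model.
    x_adv = fgsm_attack(x_clean, y_clean)

    # Defense: fine-tune on a mix of clean and adversarial data.
    for _ in range(50):
        opt.zero_grad()
        loss = loss_fn(model(x_clean), y_clean) + loss_fn(model(x_adv), y_clean)
        loss.backward()
        opt.step()

    # Evaluate against a *fresh* attack on the newly hardened model.
    x_fresh = fgsm_attack(x_clean, y_clean)
    with torch.no_grad():
        acc = (model(x_fresh).argmax(dim=1) == y_clean).float().mean().item()
    print(f"cycle {cycle + 1}: accuracy under a fresh attack = {acc:.2f}")
```

    In the research described above, the attacker is itself a learned Go agent and the victim is KataGo; this toy loop only shows the alternating structure in which each cycle hardens the model against the last known attack while a freshly trained attack may still succeed.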
  • FAR AI

    ⚪⚫ Can AI be truly robust, or are adversarial attacks inevitable? We tested 3 defenses on top Go AIs. Known threats were blocked, but new adversaries still broke through. Our paper was featured in Nature. Learn more at our NextGen AI Safety talk at #ICML2024 on July 26!
    👥 Research by Tom Tseng, Euan McLean, Kellin Pelrine, Tony Wang, and Adam Gleave
    🔗 Visit our website: http://goattack.far.ai
    📝 Check out the blog post: https://lnkd.in/eCFYYupX
    📄 Read the full paper: https://lnkd.in/eZrSpCc6
    🌱 Explore the Nature article: https://lnkd.in/gXuHDiA2
    If you’re excited by this research, our team is hiring! See our job descriptions at https://far.ai/jobs/ or email hello@far.ai to explore collaboration opportunities.

  • FAR AI

    💗🗣 How does translating the Korean word "jeong" (정) illustrate the challenge of AI alignment? 🤖🎯 Been Kim discusses alignment and interpretability as part of the New Orleans Alignment Workshop hosted by FAR AI. Watch the full video at https://lnkd.in/gDWMR-v2
    🎯 What is Value Alignment?
    1️⃣ Technical Question: How to encode values in AI so it reliably does what it ought to do?
    2️⃣ Normative Question: What goals or human values should AI be aligned with? Who decides?
    🇰🇷 Cultural Metaphors in AI: Using the Korean concept of 'jeong,' Kim beautifully illustrates the complexities of aligning human and machine understanding.
    ♟️ Teaching with AI: Been's research using AlphaZero to teach new chess strategies to Grandmasters demonstrates AI's potential to enrich and expand human expertise in established fields.
    🔮 Future Perspectives: Been concludes that we must expand our knowledge to bridge the human-AI understanding gap, harnessing AI's unique capabilities to augment human potential.
    Follow us for updates about upcoming content and workshops!

  • FAR AI

    🔍 "Understanding AI through Computational Mechanics" by Adam Shai and Paul Riechers presented at FAR Labs Seminar 🌟 Key Highlights: 🧠 Insights on how computational mechanics predicts AI behavior 📊 Frameworks for robust AI safety benchmarks 📺 Watch the full recording: https://lnkd.in/gWRhSjsd -- and subscribe to our YouTube channel for future research presentations!

  • FAR AI

    Attending #ICML2024? Join us for the Vienna Alignment Workshop: Open Social event! 🤖💬 🌟
    📍 Where? Austria Center Vienna (ACV)
    ⏰ When? Sunday, July 21st, 19:00-22:00
    RSVP optional, but a quick sign-up helps us plan: https://lnkd.in/gaZ7wqVx
    All ICML attendees interested in alignment are welcome – spread the word and invite your colleagues!

  • FAR AI reposted this

    Founders Pledge

    How can we develop safer, aligned, beneficial AI? Dig into our recommendations and strategy in our new research on advanced AI: https://lnkd.in/gsrTqRNG We recommend four organizations tackling this critical work: The Centre for Long-Term Resilience, Effective Institutions Project, FAR AI, and the Institute for Law & AI. Additionally, our Global Catastrophic Risks Fund works to minimize the biggest threats we face (including advanced AI).

  • FAR AI

    💯🦺 Could we have “provably safe AI”, and what would this imply for tech policy? 🧑⚖️📚 Max Tegmark discusses the possibility of quantified safety bounds at the New Orleans Alignment Workshop hosted by FAR AI. Watch the full video at https://lnkd.in/gxemP3sJ
    Key Highlights:
    🧠 Revolutionizing formal verification and program synthesis
    🔒 Ensuring AI cannot cause harm under the known laws of physics
    🔍 Distilling learned knowledge into verifiable code
    Follow us for updates about upcoming content and workshops!

  • FAR AI

    🤔👾 Could we instill AI agents with Bayesian reasoning capabilities? 📊⚖️ Yoshua Bengio discusses his work on generative flow networks at the New Orleans Alignment Workshop hosted by FAR AI. Watch the full video at https://lnkd.in/gQP599yG
    Bengio delivered a powerful message on the need for global, coordinated governance in AI. 🌍🤝
    📊 Quantitative Safety Guarantees: Bengio dives into how Bayesian approaches to structure learning, such as GFlowNets, could achieve rigorous safety guarantees.
    🛡️ Shaping a Secure AI Tomorrow: To address the complex challenges posed by AI advances, Bengio calls for a network of democratically governed AI labs. ✨🤖
    Follow us for updates about upcoming content and workshops!

  • FAR AI

    🎲 Can Go AIs be adversarially robust? Tom Tseng explores the robustness of superhuman Go bots against adversarial attacks at the FAR Labs Seminar.
    Key Highlights:
    🔍 Adversarial training defends against the original attack – but new adaptive attacks can beat it.
    🔄 Switching architecture to vision transformers doesn’t help – they’re vulnerable to the same attack as convolutional neural networks!
    ⚠️ Tom discovers qualitatively new attacks that beat superhuman Go AIs and can also be replicated by a human.
    🔗 Visit our website: http://goattack.far.ai
    📝 Check out the blog post: https://lnkd.in/eCFYYupX
    📄 Read the published paper: https://lnkd.in/eZrSpCc6
    📢 See the announcement: https://lnkd.in/gBCdus3X
    📺 Watch the full recording on YouTube: https://lnkd.in/gQxSdrZM (subscribe for future research presentations!)
