CO/AI’s Post


"The work has really just begun." New Research From Anthropic (The Maker of Claude) - They have the first detailed look inside a modern large language model. -A subfield of AI research: Mechanistic Interpretability Aims to understand how these models work by examining their internal mechanisms Or “Reverse engineer neural networks” - Anthropic -For the first time, Anthropic made significant strides in interpreting AI models, specifically Claude 3 Sonnet, using a technique called "dictionary learning." -Finding Patterns: They identified approximately 10 million patterns, or "features," that represent different concepts within the model. -When these features are triggered they change model output. -This is the first step in understanding models and tracing LLMs from training data to final output.

Mapping AI Models 🗺

CO/AI on LinkedIn

Anthony Batt

Digital Product Designer, Entrepreneur

1mo

I've been fascinated by the work of the Anthropic team, specifically their focus on introspection. I avoid the term "mechanistic interpretability," as it tends to confuse people. Instead, I explain that the creators of LLMs largely don't understand how their neural networks function, though they do have some insights, and that they are developing tools to observe how an LLM connects information and generates a response, similar to an MRI machine for an LLM. People often find this intriguing and ask more complex follow-up questions, but I always try to keep my answers simple. It's exciting to see Anthropic making progress in this area.
