Mahaveer Dharmchand’s Post

Visioning, Architecting and Building Human-centric Gen AI | Dreamer | Entrepreneur

1mo

If you're interested in understanding the inner layers of LLMs, this #Anthropic blog post is amazing. They took their Claude 3 Sonnet model apart and peeked into its internal representations. It's a long paper, but an amazing read! By extracting millions of features from the middle layer of Claude 3 Sonnet, they uncovered a conceptual map of the model's internal representations. It shows how the model encodes diverse concepts like cities and scientific fields, and even abstract, safety-relevant notions such as security vulnerabilities, bias, and power-seeking behavior. A good read for a long weekend. https://lnkd.in/gNm8qA3W
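What does "extracting features" mean in practice? The linked paper uses dictionary learning: it trains a sparse autoencoder on the model's middle-layer activations. The pure-Python sketch below shows only the forward pass and loss, loosely following that formulation. It uses toy sizes and random weights, and the class name, dimensions, and `l1_coeff` value are hypothetical. The training loop is not shown; that loop is what actually makes most features zero for any given input.

```python
import math
import random


class SparseAutoencoder:
    """Toy sparse autoencoder (dictionary learner) over activation vectors.

    Hypothetical sizes: the paper works with millions of features; this uses a handful.
    """

    def __init__(self, d_model, n_features, seed=0):
        rng = random.Random(seed)
        # W_enc[i]: encoder weights for feature i (length d_model)
        self.W_enc = [[rng.gauss(0.0, 0.1) for _ in range(d_model)] for _ in range(n_features)]
        self.b_enc = [0.0] * n_features
        # W_dec[i]: direction that feature i writes back into activation space
        self.W_dec = [[rng.gauss(0.0, 0.1) for _ in range(d_model)] for _ in range(n_features)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        # f_i(x) = ReLU(W_enc[i] . x + b_enc[i]); training pushes most of these to zero
        return [max(0.0, sum(w * xj for w, xj in zip(row, x)) + b)
                for row, b in zip(self.W_enc, self.b_enc)]

    def decode(self, f):
        # x_hat = b_dec + sum_i f_i * W_dec[i]
        x_hat = list(self.b_dec)
        for fi, row in zip(f, self.W_dec):
            for j, w in enumerate(row):
                x_hat[j] += fi * w
        return x_hat

    def loss(self, x, l1_coeff=5.0):
        """Reconstruction error plus an L1 penalty on features, weighted by decoder norms."""
        f = self.encode(x)
        x_hat = self.decode(f)
        recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        sparsity = sum(fi * math.sqrt(sum(w * w for w in row)) for fi, row in zip(f, self.W_dec))
        return recon + l1_coeff * sparsity, f


# Stand-in for one middle-layer activation vector (made-up numbers)
sae = SparseAutoencoder(d_model=8, n_features=32)
x = [0.5, -1.0, 0.25, 0.0, 1.5, -0.5, 0.75, 0.1]
total, features = sae.loss(x)
active = [i for i, v in enumerate(features) if v > 0.0]
print(f"loss={total:.4f}, active features: {active}")
```

After training, each decoder row is a candidate "feature" direction. Interpreting it means looking at which inputs make that feature fire.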

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

transformer-circuits.pub

2 Comments

Ilya Ostrovsky

Ensuring Strategic Superiority with AI by solving Defence Data Bottleneck 🇺🇦🇪🇺

1mo

Impressive insights into the inner workings of the Claude 3.0 Sonnet model – it's a deep dive into its conceptual map. Thanks for sharing this, Mahaveer Dharmchand.

Armando Fandango

Generative AI Product Engineering Leader | PhD in AI | ex-AWS, ex-Nike, ex-Accenture, ex-IBM

1mo

Cool blog. Thanks for sharing.


More Relevant Posts

Francy Lisboa

AI Agribusiness Consultant & Founder | Generative AI, Prompt Engineering
8mo
Learning from AI Failures: The Limitations of Retrieval Augmentation
In a recent video, AI researcher code_your_own_AI demonstrated an intriguing failure case of using direct retrieval with GPT-4 and the new OpenAI Assistants API. He was excited to create a personal GPT model backed by additional vector databases, but ran into issues when the retrieved information was not actually available to the model.
On reflection, this failure highlights some fundamental limitations in today's popular retrieval augmentation techniques like RAG, as code_your_own_AI explains:
- Models like GPT build their own tokenized vector embeddings from their specific training data. New terms and sequences may not connect well to this existing knowledge space.
- External knowledge sources are separate vector representations, not integrated into the model's learned weight tensors. There is no guarantee of semantic similarity!
- Simply adding more vector databases does not solve this underlying issue. The interfaces between the model and external knowledge need to be more tightly integrated.
The proposed solution is to move to models with multiple "learned knowledge planes": trained weight tensors for specific tasks that can be switched in as needed. This is better than just bolting on a separate vector lookup. Cutting-edge AI research from Stanford and Berkeley is already exploring this direction with techniques like tensor parallelism, as code_your_own_AI explains.
To wrap up, this excellent failure analysis shows that limitations and failures can reveal opportunities for learning and improvement. Understanding the root causes is key to progress.
What do you think? Have you encountered situations where AI systems failed due to a lack of true integration and understanding? I'm curious to hear others' perspectives and insights. Please share your thoughts!
#rag #openai #stanford #nlp #llms #vectorsearch #gpts #assistants #vectordatabase
Link to the original video: https://lnkd.in/dbjBTyig
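The toy retrieval step below makes the "no guarantee of semantic similarity" point concrete. It sketches the general RAG mechanism and is not how the OpenAI Assistants API retrieves. The `embed` function is a hypothetical bag-of-words stand-in for a learned embedding model, and the documents and `min_score` threshold are made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    # Hypothetical toy embedding: lowercase bag-of-words counts.
    # Real RAG stacks use a learned embedding model that is separate from the LLM.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, store, k=1, min_score=0.1):
    """Return up to k (score, doc) pairs whose similarity clears min_score."""
    q = embed(query)
    scored = sorted(((cosine(q, vec), doc) for doc, vec in store), reverse=True)
    return [(s, doc) for s, doc in scored[:k] if s >= min_score]


def build_prompt(query, hits):
    # The LLM only "sees" external knowledge through its prompt; its weights never change.
    context = "\n".join(doc for _, doc in hits) or "(no context retrieved)"
    return f"Context:\n{context}\n\nQuestion: {query}"


docs = [
    "the invoice limit for contractors is 5000 euros",
    "vacation requests go through the hr portal",
]
store = [(d, embed(d)) for d in docs]

for question in ["what is the invoice limit for contractors", "how much can a freelancer bill"]:
    print(build_prompt(question, retrieve(question, store)))
    print("---")
```

The second query asks for the same fact in different words. It shares no vocabulary with the stored chunk, so it retrieves nothing and the model never sees the answer. A learned embedding model would probably match "freelancer" to "contractors". Even so, retrieval still happens in a similarity space separate from the LLM's weights.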

RAG's Collapse: Uncovering Deep Flaws in LLM External Knowledge Retrieval

https://www.youtube.com/
Kevin Nguyen

🦄 Working on something exciting
4mo
We should know more about the Vietnamese movement in AI, especially in NLP. Low-resource languages are one of the biggest barriers to democratizing AI, and Ontocord.ai is trying to solve many of those problems. Big shout-out to https://www.ontocord.ai/

Ontocord.AI

ontocord.ai
Joseph Curtin

Sr. Software Engineer & Machine Learning Engineer
4mo
Recently I've started implementing an exploration algorithm of sorts over arXiv. This side project came from reading a large number of research papers on the BERT language model without knowing all the terminology of the field. It didn't take long for me to dream up a complete search feature set, from word embeddings to the user interface. The plan is to start with a basic feature set and build on it: accept an n-length text string and return results to the user through a WebUI. I originally gave myself a week for this project, but to my dismay, downloading the archive took most of that time. The next step is to collect some metrics about the dataset. I'm starting simple: recording the file length of each LaTeX source file and counting how many files are in the archive. Finally, I'll estimate how long it might take to access the entire dataset.
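A first pass at those metrics might look like the Python sketch below. It assumes the archive has already been extracted to a local directory of `.tex` files. That directory layout is hypothetical, and the raw archive may need unpacking first. The sketch measures file length in bytes and estimates full-scan time by timing a sample of reads, then extrapolating by total size.

```python
import statistics
import time
from pathlib import Path


def tex_paths(root):
    """All .tex files under root (assumes the archive is already extracted)."""
    return [p for p in Path(root).rglob("*.tex") if p.is_file()]


def tex_file_stats(root):
    """Return (file_count, sizes_in_bytes) for every .tex file under root."""
    sizes = [p.stat().st_size for p in tex_paths(root)]
    return len(sizes), sizes


def summarize(sizes):
    """Basic file-length statistics, in bytes."""
    if not sizes:
        return {"files": 0}
    return {
        "files": len(sizes),
        "total_bytes": sum(sizes),
        "min": min(sizes),
        "median": statistics.median(sizes),
        "max": max(sizes),
    }


def estimate_full_read_seconds(root, sample=100):
    """Time reading up to `sample` files, then extrapolate to the full set by byte count."""
    paths = tex_paths(root)
    total_bytes = sum(p.stat().st_size for p in paths)
    sample_paths = paths[:sample]
    sample_bytes = sum(p.stat().st_size for p in sample_paths)
    if sample_bytes == 0:
        return 0.0
    start = time.perf_counter()
    for p in sample_paths:
        p.read_bytes()
    elapsed = time.perf_counter() - start
    return elapsed * total_bytes / sample_bytes


# Usage (hypothetical directory name):
#   count, sizes = tex_file_stats("arxiv_src")
#   print(summarize(sizes), estimate_full_read_seconds("arxiv_src"))
```

The estimate is rough. OS file caching can skew the timing, and so can the arbitrary order in which `rglob` returns files.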

Hugging Face – The AI community building the future.

huggingface.co
Dr Elakkiya R

Assistant Professor | Convenor & Vice Chair Dubai ACM Professional Chapter | Keynote Speaker | AI Researcher | Young Scientist - SRG(SERB)&DST(SYST) | Bilateral -Russia(DST&RFBR) & UK(RS) | Member - IEEE, IEEE YP, WIE
7mo
🔍 Excited to share my latest review on #Deepfakes, exploring the intricate landscape of AI-generated synthetic media. 🚀
👉 Key Highlights:
🌐 Evolution of AI Technology: From basic generative models to advanced multimodal capabilities, the journey is fascinating.
🤯 Societal Impact: Deepfakes' role in eroding trust, spreading misinformation, and influencing political and social dynamics.
🌐 Government Advisory: Insights into the Union Government of India's recent advisory to combat deepfake-related challenges.
#DeepfakeReview #AI #SyntheticMedia #DigitalEthics #TechInnovation
🔗 Check out the full review 👇

Review of: "Exploding AI-Generated Deepfakes and Misinformation: A Threat to Global Concern in the 21st Century"

qeios.com
Towards Data Science

630,846 followers
6mo Edited
For a beginner-friendly introduction to GANs and their mathematical underpinnings, don't miss the new explainer by Michio Suginoo, CFA.

Mini-Max Optimization Design of Generative Adversarial Networks (GAN)

towardsdatascience.com
Abhishek Patnia

Bridging the gap between cutting-edge AI research and practical application
7mo
There is a rich connection between cross-entropy loss, perplexity, and bits-per-byte. These concepts are also central to evaluating language models. This older blog post from Christopher Olah may be the best place to understand them cleanly. https://lnkd.in/g7tkRGxN
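Here is the connection in a few lines of Python. Start from the probability a model assigned to each actual next token. Cross-entropy is the mean negative log-probability, and perplexity is its exponential. Bits-per-byte divides the total bits by the text's UTF-8 byte count, which makes it comparable across tokenizers. The probabilities and the four-token split of the example text are made up for illustration.

```python
import math


def cross_entropy_nats(token_probs):
    """Mean negative log-likelihood per token, in nats."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def perplexity(token_probs):
    """exp(cross-entropy in nats), which equals 2 ** (cross-entropy in bits)."""
    return math.exp(cross_entropy_nats(token_probs))


def bits_per_byte(token_probs, text):
    """Total information in bits divided by the UTF-8 byte length of the text.

    Equivalent to (tokens / bytes) * cross-entropy in bits per token, so it does not
    depend on how a particular tokenizer splits the text.
    """
    total_bits = -sum(math.log2(p) for p in token_probs)
    return total_bits / len(text.encode("utf-8"))


# Hypothetical probabilities a model assigned to 4 tokens of "the cat sat"
probs = [0.5, 0.25, 0.5, 0.125]
text = "the cat sat"

print(f"{cross_entropy_nats(probs):.4f} nats/token")   # 1.2130 (= 1.75 bits/token)
print(f"{perplexity(probs):.4f} perplexity")            # 3.3636 (= 2 ** 1.75)
print(f"{bits_per_byte(probs, text):.4f} bits/byte")   # 0.6364 (= 7 bits / 11 bytes)
```

A model that always assigns probability 1/4 to the right token has perplexity exactly 4. It is as uncertain as a fair choice among four options.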

Visual Information Theory

colah.github.io
Alex Gault

President, Small World Ventures. Experienced entrepreneur building technology startups and media companies
8mo Edited
Collective Superintelligence (CSi): Amplifying human intellect by connecting large groups of people into superintelligent systems that can solve problems no individual could solve on their own, while also ensuring that human values, morals and interests are inherent at every level. Biologists call the phenomenon Swarm Intelligence and it enables schools of fish, swarms of bees and flocks of birds to skillfully navigate their world without any individual being in charge. They don’t do this by taking votes or polls the way human groups make decisions. Instead, they form real-time interactive systems (that is, swarms) that push and pull on the decision-space and converge on optimized solutions.

The promise of collective superintelligence

https://venturebeat.com
Demetrius Lawson

Senior Cyber Security Engineer at AT&T
1mo
Just finished the course “Artificial Intelligence for Students” by Madecraft and Jim Sterne! Check it out: https://lnkd.in/eyCfDxag. It's a really great course, presented in an active voice, so it's never dull or boring. #artificialintelligence

Certificate of Completion

linkedin.com
Richard Shoemake

AI Architect / Engineer / AI Author / Patented Inventor
8mo
Here is a nice reference, and an indication of just how much RAG helps reduce hallucinations. #ai #ml #machinelearning #artificialintelligence #llm #llms #nlp

Cobus Greyling

LLMs, NLP, NLU, Chatbots, Voicebots, CCAI, Ambient Orchestration, Ubiquitous User Interfaces
8mo

Galileo released an LLM Hallucination Index, which makes for very interesting reading. The shared charts cover a Q&A use case, with and without RAG, as well as long-form text generation. Hallucination has become a catch-all phrase for when a model generates responses that are incorrect or fabricated. Being able to measure hallucination is the first step in managing it. As the article shows, RAG is quite an equaliser: the disparity in performance between models is much smaller when RAG is introduced than without it. Read more here...⬇️ #LargeLanguageModels #RAG #LLMs https://lnkd.in/dG3nMYi2

LLM Hallucination Index

cobusgreyling.medium.com
Jose Crespo

Mathematician lurking in the Tech Underworld
5mo
IMO, active inference is rubbish. The model of the world should, by definition, be a priori, as Immanuel Kant and Arthur Schopenhauer convincingly argued. That is how the intuitions of space and time can work in our brains in contact with the outside world. The question here is not more inference but more deductive reasoning based on an a priori model of the world. The proof: no mammal, humans included, needs to learn everything in order to know it. In fact, we need very little data compared with a machine to understand things.

“Deep Learning is Rubbish” — Karl Friston & Yann LeCun Face Off at Davos 2024 World Economic Forum

medium.com

4 Comments

2,968 followers

More from this author

Harnessing the power of Edge computing in retail

Edge Computing opens up new scope for automation
