Super Data Science: ML & AI Podcast with Jon Krohn

Jon Krohn

The latest machine learning, A.I., and data career topics from across both academia and industry are brought to you by host Dr. Jon Krohn on the Super Data Science Podcast. As the quantity of data on our planet doubles every couple of years and with this trend set to continue for decades to come, there's an unprecedented opportunity for you to make a meaningful impact in your lifetime. In conversation with the biggest names in the data science industry, Jon cuts through hype to fuel that professional impact. Whether you're curious about getting started in a data career or you're a deep technical expert, whether you'd like to understand what A.I. is or you'd like to integrate more data-driven processes into your business, we have inspiring guests and lighthearted conversation for you to enjoy. We cover tools, techniques, and implementation tricks across data collection, databases, analytics, predictive modeling, visualization, software engineering, real-world applications, commercialization, and entrepreneurship: everything you need to crush it with data science.

All Episodes

801: Merged LLMs Are Smaller And More Capable, with Arcee AI's Mark McQuade and Charles Goddard

Merged LLMs are the future, and we’re exploring how with Mark McQuade and Charles Goddard from Arcee AI on this episode with Jon Krohn. Learn how to combine multiple LLMs without adding bulk, train more efficiently, and dive into different expert approaches. Discover how smaller models can outperform larger ones and leverage open-source projects for big enterprise wins. This episode is packed with must-know insights for data scientists and ML engineers. Don’t miss out!

Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.

In this episode you will learn:
• Explanation of Charles' job title: Chief of Frontier Research [03:31]
• Model Merging Technology combining multiple LLMs without increasing size [04:43]
• Using MergeKit for model merging [14:49]
• Evolutionary Model Merging using evolutionary algorithms [22:55]
• Commercial applications and success stories [28:10]
• Comparison of Mixture of Experts (MoE) vs. Mixture of Agents [37:57]
• Spectrum Project for efficient training by targeting specific modules [54:28]
• Future of Small Language Models (SLMs) and their advantages [01:01:22]

Additional materials: www.superdatascience.com/801

Jul 16

1 hr 17 min

800: A Transformative Century of Technological Progress, with Annie P.

The SuperDataScience Podcast is celebrating its 800th episode! Host Jon Krohn speaks to his grandmother, Annie, about growing up at a time when so many technologies we take for granted today were yet to be developed. Listen in to hear Annie’s experience of the changes in technology across 94 years and how she and her family fared in 1940s Ukraine with no electricity or running water.

Additional materials: www.superdatascience.com/800

Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.

Jul 12

43 min 37 sec

799: AGI Could Be Near: Dystopian and Utopian Implications, with Dr. Andrey Kurenkov

No-code games with GenAI, the creative possibilities of LLMs, and our proximity to AGI: In this episode, Jon Krohn talks to Andrey Kurenkov about what turned him from an AGI skeptic to a positivist. You’ll also hear about his wildly popular podcast “Last Week in AI” and how the NVIDIA-backed startup Astrocade is helping videogame enthusiasts to create their own games through generative AI. A must-listen!

This episode is brought to you by AWS Inferentia and AWS Trainium. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.

In this episode you will learn:
• All about The Gradient and Last Week in AI [10:42]
• All about Astrocade and Andrey’s role at the startup [24:35]
• Balancing UX and creative control at Astrocade [42:00]
• The creative possibilities of LLMs [1:04:15]
• The rapid emergence of AGI [1:10:31]

Additional materials: www.superdatascience.com/799

Jul 9

1 hr 45 min

798: Claude 3.5 Sonnet: Frontier Capabilities & Slick New "Artifacts" UI

Claude 3.5 Sonnet, Anthropic’s newest model, is making waves in the AI community. This mid-size model outshines the larger Claude 3 Opus in tasks like code generation, content creation, and document summarization, and it’s twice as fast. In this episode of The Super Data Science Podcast, Jon Krohn discusses its top-notch performance across benchmarks like MMLU, GPQA, and HumanEval, along with its improved machine vision capabilities. Plus, learn about the new Artifacts UI feature, which makes managing generated content easier by displaying outputs side-by-side with inputs. Tune in to find out why Claude 3.5 Sonnet is setting new standards in AI.

Additional materials: www.superdatascience.com/798

Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.

Jul 5

15 min 10 sec

797: Deep Learning Classics and Trends, with Dr. Rosanne Liu

Dr. Rosanne Liu, Research Scientist at Google DeepMind and co-founder of the ML Collective, shares her journey and the mission to democratize AI research. She explains her pioneering work on intrinsic dimensions in deep learning and the advantages of curiosity-driven research. Jon and Dr. Liu also explore the complexities of understanding powerful AI models, the specifics of character-aware text encoding, and the significant impact of diversity, equity, and inclusion in the ML community. With publications in NeurIPS, ICLR, ICML, and Science, Dr. Liu offers her expertise and vision for the future of machine learning.

Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.

In this episode you will learn:
• How the ML Collective came about [03:31]
• The concept of a failure CV [16:12]
• ML Collective research topics [19:03]
• How Dr. Liu's work on the “intrinsic dimension” of deep learning models inspired the now-standard LoRA approach to fine-tuning LLMs [21:28]
• The pros and cons of curiosity-driven vs. goal-driven ML research [29:08]
• Discussion on Dr. Liu's research and papers [33:17]
• Character-aware vs. character-blind text encoding [54:59]
• The positive impacts of diversity, equity, and inclusion in the ML community [57:51]

Additional materials: www.superdatascience.com/797

Jul 2

1 hr 9 min

796: Earth's Coming Population Collapse and How AI Can Help, with Simon Kuestenmacher

Want to feel optimistic about your day? In this Friday episode, Simon Kuestenmacher talks to Jon Krohn about demography: What it is, why it’s so important, and why its forecasts should give us reason to hope for a better future. In an increasingly globalized world, and with an aging population in countries with the biggest GDPs, demography is more valuable than ever.

Additional materials: www.superdatascience.com/796

Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.

Jun 28

42 min 45 sec

795: Fast-Evolving Data and AI Regulatory Frameworks, with Dr. Gina Guillaume-Joseph

Gina Guillaume-Joseph talks to Jon Krohn about the data and regulatory frameworks set to transform the AI industry and why that’s important to anyone working with data. This episode offers a solid path to understanding AI regulation’s past, present and future. Gina walks listeners through the AI Bill of Rights, the NIST AI Risk Management Framework and the MITRE ATLAS threat model.

This episode is brought to you by AWS Inferentia and AWS Trainium, by Crawlbase, the ultimate data crawling platform, and by Babbel, the science-backed language-learning platform. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.

In this episode you will learn:
• What “responsible AI” means [08:14]
• Why the federal government should be behind AI regulation [12:22]
• The US vs EU on AI regulation [18:46]
• About the AI Bill of Rights [26:14]
• About MITRE and the MITRE ATLAS [37:19]
• What a systems engineer does [54:11]

Additional materials: www.superdatascience.com/795

Jun 25

1 hr 6 min

794: Exciting (and Frightening!) Trends in Open-Source AI

Trends in open-source AI: Join Jon Krohn and a panel of data science icons as they discuss the most exciting and concerning developments in open-source AI. Hear insights from Drew Conway, Jared Lander, Emily Zabor, and JD Long on the transformative potential of AI and its future impact.

Additional materials: www.superdatascience.com/794

Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.

Jun 21

11 min 2 sec

793: Bayesian Methods and Applications, with Alexandre Andorra

Bayesian methods take the spotlight in this episode with Alex Andorra, co-founder of PyMC Labs, and Jon Krohn. Learn how Bayesian techniques handle tough problems, make the most of prior knowledge, and work wonders with limited data. Alex and Jon break down essentials like PyMC, PyStan, and NumPyro libraries, show how to boost model efficiency with PyTensor, and talk about using ArviZ for top-notch diagnostics and visualizations. Plus, get into advanced modeling with Gaussian Processes.

This episode is brought to you by Crawlbase, the ultimate data crawling platform. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.

In this episode you will learn:
• Practical introduction to Bayesian statistics [04:54]
• Definition and significance of epistemology [17:52]
• Explanation of PyMC and Monte Carlo methods [27:57]
• How to get started with Bayesian modeling and PyMC [34:26]
• PyMC Labs and its consulting services [50:50]
• ArviZ for post-modeling diagnostics and visualization [01:02:23]
• Gaussian processes and their applications [01:09:02]

Additional materials: www.superdatascience.com/793

Jun 18

1 hr 33 min

792: In Case You Missed It in May 2024

Jon Krohn shares his favorite clips from May. Hear how Navdeep Martin is spearheading a company to tackle the climate crisis, why Sol Rashidi and Demetrios Brinkmann find nailing job titles so necessary in the fast-paced industries of tech and AI, and get the latest on embeddings with Luis Serrano.

Additional materials: www.superdatascience.com/792

Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.

Jun 14

22 min 42 sec

791: Reinforcement Learning from Human Feedback (RLHF), with Dr. Nathan Lambert

Reinforcement learning from human feedback (RLHF) has come a long way. In this episode, research scientist Nathan Lambert talks to Jon Krohn about the technique’s origins. He also walks through other ways to fine-tune LLMs, and how he believes generative AI might democratize education.

This episode is brought to you by AWS Inferentia and AWS Trainium, and by Crawlbase, the ultimate data crawling platform. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.

In this episode you will learn:
• Why it is important that AI is open [03:13]
• The efficacy and scalability of direct preference optimization [07:32]
• Robotics and LLMs [14:32]
• The challenges to aligning reward models with human preferences [23:00]
• How to make sure AI’s decision making on preferences reflects desirable behavior [28:52]
• Why Nathan believes AI is closer to alchemy than science [37:38]

Additional materials: www.superdatascience.com/791

Jun 11

57 min 10 sec

790: Open-Source Libraries for Data Science at the New York R Conference

The experts share their top open-source R libraries with us live from the New York R Conference! This Super Data Science Podcast episode features an exclusive panel with data science trailblazers Drew Conway, Jared Lander, Emily Zabor, and JD Long. They share their favorite R libraries and valuable insights to enhance your data science practice.

Additional materials: www.superdatascience.com/790

Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information.

Jun 7

7 min 25 sec