The MLOps Podcast

By Dean Pleban @ DagsHub

A podcast from DagsHub about bringing machine learning into the real world. Each episode features a conversation with top data science and machine learning practitioners, who'll share their thoughts, best practices, and tips for promoting machine learning to production.

🍪 Machine Learning in the cookie-less era with Uri Goren

In this episode, I chatted with Uri Goren, founder and CEO of Argmax, about machine learning and the future of digital advertising in a world moving away from cookies due to privacy laws like GDPR and CCPA. We chat about the challenges of maintaining personalized ads while respecting user privacy, and new methods like probabilistic models and contextual features that cover some of the gap left by removing cookies.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:

00:00 Introduction
00:35 The Rise of Privacy Regulations
1:40 The Impact of Losing Cookies
2:48 Understanding Cookies
4:33 Reasons for the Decline of Cookies
8:47 ML Leveraging Cookies in Advertising
10:32 The Shift to Contextual Features
12:53 The Future of ML without Cookies
15:23 New and Old Ways of Generating Contextual Features
20:33 Regulatory Conspiracies
22:33 Unsolved Problems in ML and AI
24:39 Predictions for the Next Year in AI and ML
26:17 Controversial Take: Overuse of LLMs
28:03 Recommendations

➡️ Uri Goren on LinkedIn – https://www.linkedin.com/in/ugoren/

🌐 Check Out Our Website! https://dagshub.com

Social Links:

➡️ LinkedIn: https://www.linkedin.com/company/dagshub

➡️ Twitter: https://twitter.com/TheRealDAGsHub

➡️ Dean Pleban: https://twitter.com/DeanPlbn

Apr 18, 2024 · 32:48

🛰️ Modern & Realistic MLOps with Han-chung Lee

In this episode, I speak with Han-Chung Lee, a machine learning engineer with a lot of interesting takes on ML and AI. We dive into the buzz around natural language processing and the big waves in generative AI. We chat about how newcomers are racing through NLP's history, mixing old-school and new tech, and the shift towards smarter databases. Han-Chung breaks it down with his straightforward takes, making complex AI trends feel like coffee chat topics. It's a perfect listen for anyone keen on where AI's headed, minus the jargon.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:

00:00 Intro
0:41 State of NLP and LLMs
1:33 Repeating the past in NLP
3:29 Vector databases vs. classical databases
8:49 Choosing the right LLM for an application
12:13 Advantages and disadvantages of LLMs
16:10 Where LLMs are most useful
21:13 The dark side of LLMs: can we detect it?
25:19 Thoughts on LLM leaderboard metrics
31:19 Using LLMs in regulated industries
36:40 Creating a moat in the LLM world
40:20 Evaluating LLMs
44:20 Impact of LLMs on non-English languages
48:35 Thoughts on MLOps and getting ML into production
56:48 The Hardest Unsolved Problem in ML and AI
59:09 Predictions for the Future of ML and AI
1:03:25 Recommendations and Conclusion

➡️ Han Lee on Twitter – https://twitter.com/HanchungLee

➡️ Han Lee on LinkedIn – https://www.linkedin.com/in/hanchunglee/

🌐 Check Out Our Website! https://dagshub.com

Social Links:

➡️ LinkedIn: https://www.linkedin.com/company/dagshub

➡️ Twitter: https://twitter.com/TheRealDAGsHub

➡️ Dean Pleban: https://twitter.com/DeanPlbn

Mar 18, 2024 · 01:05:43

🩻 AI in Medical Devices & Medicine with Mila Orlovsky

In this episode, I had the pleasure of speaking with Mila Orlovsky, a pioneer in medical AI. We delve into practical applications, overcoming data challenges, and the intricacies of developing AI tools that meet regulatory standards. Mila discusses her experiences with predictive analytics in patient care, offering tips on navigating the complexities of AI implementation in medical environments. This episode is packed with actionable advice and forward-thinking strategies, making it essential listening for professionals looking to impact healthcare through AI.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:

00:00 Introduction and Background
4:03 Early Days of Machine Learning in Medicine
5:19 Challenges in Building Medical AI Systems
6:54 Differences Between Medical ML and Other ML Domains
15:36 Unique Challenges of Medical Data in ML
24:01 Counterintuitive Learnings on the Business Side
28:07 Impact and Value of ML Models in Medicine
29:41 The Role of Doctors in the Age of AI
38:44 Explainability in Medical ML
44:31 The FDA and Compliance in Medical ML
48:56 Feedback and Iteration in Medical ML
52:25 Predictions for the Future of ML and AI
53:59 Controversial Predictions in the Field of ML
56:02 Recommendations
57:58 Conclusion

➡️ Mila Orlovsky on LinkedIn – https://www.linkedin.com/in/milaorlovsky/

🩺 MeDS – Medical Data Science Israel Community – https://www.facebook.com/groups/452832939966464/

🌐 Check Out Our Website! https://dagshub.com

Social Links:

➡️ LinkedIn: https://www.linkedin.com/company/dagshub

➡️ Twitter: https://twitter.com/TheRealDAGsHub

➡️ Dean Pleban: https://twitter.com/DeanPlbn

Feb 15, 2024 · 58:48

⏪ Making LLMs Backwards Compatible with Jason Liu

In this episode, I had the pleasure of speaking with Jason Liu, an applied AI consultant and the creator of Instructor, an open-source tool for extracting structured data from LLM outputs. We chat about LLM applications, their challenges, and how to overcome them. We also dive into Instructor, making LLMs interact with existing systems, and a bunch of other cool things.

Join our Discord community: https://discord.gg/tEYvqxwhah

➡️ Jason Liu on Twitter – https://twitter.com/jxnlco

🤖 Instructor Blog – https://jxnl.github.io/instructor/

🌐 Check Out Our Website! https://dagshub.com

Social Links:

➡️ LinkedIn: https://www.linkedin.com/company/dagshub

➡️ Twitter: https://twitter.com/TheRealDAGsHub

➡️ Dean Pleban: https://twitter.com/DeanPlbn

Timestamps:

00:00 Introduction
02:18 Excitement about Machine Learning and AI
03:28 Using LLMs as Backend Developers
04:22 Building Applications with LLMs
07:07 Building Instructor
09:30 Thinking in Logic and Design
10:33 Validating Data and Building Systems with Instructor
11:49 Thoughts About Product and UX in LLMs
17:51 Future of Instructor
20:25 Misconceptions and Unsolved Problems in LLMs
24:57 Improving LLM Applications
26:14 RAG as Recommendation Systems
29:32 Fine-tuning Embedding Models
32:32 Beyond Vector Similarity in RAG
39:32 Predictions for the Next Year in AI and ML
45:26 Measuring Impact on Business Outcomes
47:06 The Continuous Cycle of Machine Learning
48:38 Unlocking Economic Value through Structured Data Extraction
50:52 Questioning the Status Quo and Making an Impact

Jan 15, 2024 · 53:42

🔴 Live MLOps Podcast – Building, Deploying and Monitoring Large Language Models with Jinen Setpal

In this live episode, I'm speaking with Jinen Setpal, ML Engineer at DagsHub, about actually building, deploying, and monitoring large language model applications. We discuss DPT, a chatbot that is live in production on the DagsHub Discord server and helps answer support questions, as well as the process and challenges involved in building it. We dive into evaluation methods, ways to reduce hallucinations, and much more. We also answer the audience's great questions.

Sep 06, 2023 · 01:11:38

Live MLOps Podcast Episode!

Join now to take part in our first live MLOps Podcast episode.

I'll be chatting with Jinen Setpal, ML Engineer at DagsHub, about his work building LLM applications and getting LLMs into production.

https://www.linkedin.com/events/7098968036782596096/comments/

Aug 28, 2023 · 00:29

⛹️‍♂️ Large Scale Video ML at WSC Sports with Yuval Gabay

In this episode, I had the pleasure of speaking with Yuval Gabay, MLOps Engineer at WSC Sports. Yuval builds better infrastructure and automation for developing, training, and deploying machine learning models at scale, with a focus on video data. We talk about MLOps methodologies, standardizing deployment in the organization, and closing the loop back from production into training.

Watch the video: https://youtu.be/3m__nRuifsQ

Join our Discord community: https://discord.gg/tEYvqxwhah

➡️ Yuval Gabay on LinkedIn – https://www.linkedin.com/in/yuval-gabay-68963253/

➡️ WSC Sports – https://wsc-sports.com/

🌐 Check Out Our Website! https://dagshub.com

Social Links:

➡️ LinkedIn: https://www.linkedin.com/company/dagshub

➡️ Twitter: https://twitter.com/TheRealDAGsHub

➡️ Dean Pleban: https://twitter.com/DeanPlbn

Aug 07, 2023 · 01:02:09

🤖 GPTs & Large Language Models in production with Hamel Husain

In this episode, I had the pleasure of speaking with Hamel Husain. Hamel is a machine learning and MLOps extraordinaire: he was one of the core maintainers of fast.ai and has worked on ML and MLOps at places like DataRobot, Airbnb, and GitHub. We talk about large language models, the future role of data scientists in the world of LLMs, and Hamel's approach to solving MLOps problems.

Watch the video: https://www.youtube.com/watch?v=3oElMXPkaVs

Relevant Links:

🐦 Hamel's Twitter – https://twitter.com/HamelHusain

🟦 Hamel's LinkedIn – https://www.linkedin.com/in/hamelhusain/

✍️ Hamel's amazing blog: https://hamel.dev/blog/posts/nbdev/

🌐 Check Out Our Website! https://dagshub.com

Social Links:

➡️ LinkedIn: https://www.linkedin.com/company/dagshub

➡️ Twitter: https://twitter.com/TheRealDAGsHub

➡️ Dean Pleban: https://twitter.com/DeanPlbn

Jun 20, 2023 · 01:05:43

🫣 Is Data Science a dying job? with Almog Baku

In this episode, I had the pleasure of speaking with Almog Baku, a serial entrepreneur and consultant in cloud, AI infrastructure, and foundation models. We talk about Kubernetes, large language models (LLMs), how to get them into production, and how data is becoming a more central piece of the ML landscape. We also discuss Almog's newest project, Raptor ML, which helps ML teams productionize ML pipelines.

Watch the video: https://www.youtube.com/watch?v=DCApRXhXD_w&feature=youtu.be

Join our Discord community: https://discord.gg/tEYvqxwhah

Relevant Links:

🦅 Check out Raptor for ML productionization – https://github.com/raptor-ml/raptor

👫 Join the Open AI & Gen AI TLV meetup group – https://www.meetup.com/openai-genai-tlv/

📄 Read a very cool LLM paper – https://react-lm.github.io/

➡️ Almog Baku on LinkedIn – https://www.linkedin.com/in/almogbaku/

➡️ Almog Baku on Twitter – https://twitter.com/almogbaku

Recommendation Links:

Watch "Foundation" – https://tv.apple.com/us/show/foundation/umc.cmc.5983fipzqbicvrve6jdfep4x3

🌐 Check Out Our Website! https://dagshub.com

Social Links:

➡️ LinkedIn: https://www.linkedin.com/company/dagshub

➡️ Twitter: https://twitter.com/TheRealDAGsHub

➡️ Dean Pleban: https://twitter.com/DeanPlbn

May 23, 2023 · 56:08

🏃‍♀️Moving Fast and Breaking Data with Shreya Shankar

In this episode, I had the pleasure of speaking with Shreya Shankar, a Ph.D. student at Berkeley's RISELab. We chat about automated data validation and MLOps. Shreya shares her insights on several interesting topics, including the challenges of automating the data validation process and how to overcome them. We also discuss what enables organizations to iterate faster in machine learning, and some predictions about the future of machine learning and MLOps.

Watch the video: https://youtu.be/_hi6--H2Hug

Join our Discord community: https://discord.gg/tEYvqxwhah

Relevant Links:

📃 Moving Fast with Broken Data – https://arxiv.org/abs/2303.06094v1

🤖 Operationalizing machine learning – https://arxiv.org/pdf/2209.09125.pdf

📒 Operationalizing notebooks – https://smacke.net/papers/nbslicer.pdf

➡️ Shreya Shankar on LinkedIn – https://www.linkedin.com/in/shrshnk/

➡️ Shreya Shankar on Twitter – https://twitter.com/sh_reya

Recommendation Links:

📺 The Glory – A Korean Revenge Drama – https://www.netflix.com/title/81519223

⛷️ Go Skiing!

🌐 Check Out Our Website! https://dagshub.com

Social Links:

➡️ LinkedIn: https://www.linkedin.com/company/dagshub

➡️ Twitter: https://twitter.com/TheRealDAGsHub

➡️ Dean Pleban: https://twitter.com/DeanPlbn

Mar 30, 2023 · 56:56

🚴‍♀️ Quick & Dirty Machine Learning with Noa Weiss

In this episode, I speak with Noa Weiss, the wonderful AI & ML consultant. We dive into deep learning research for marine mammal sounds, abstractions for machine learning projects, and some of the unspoken challenges she's seen in the ML development process. We also talk about prediction markets and Harry Potter.

Watch the video: https://www.youtube.com/watch?v=uQrR0KPq3RQ

Join our Discord community: https://discord.gg/tEYvqxwhah

Relevant Links:

Noa's talks:

🎥 The Quick & Dirty AI Startup: https://www.youtube.com/watch?v=HZ_LRxP3ep0

🎥 Choosing the Right Machine Learning Abstraction for your Business Needs – https://www.youtube.com/watch?v=C0gN47H91HM

➡️ Noa Weiss on LinkedIn – https://www.linkedin.com/in/noa-weiss/

➡️ Noa Weiss on Twitter – https://twitter.com/NWeiss

🌐 Noa's website: https://www.weissnoa.com

Recommendation Links:

- Unsong – https://unsongbook.com/

- Harry Potter & The Methods of Rationality – https://www.hpmor.com/

🌐 Check Out Our Website! https://dagshub.com

Social Links:

➡️ LinkedIn: https://www.linkedin.com/company/dagshub

➡️ Twitter: https://twitter.com/TheRealDAGsHub

➡️ Dean Pleban: https://twitter.com/DeanPlbn

Feb 21, 2023 · 50:13

✍️ Building ML Teams and Platforms with Assaf Pinhasi

In this episode, I speak with Assaf Pinhasi, ML engineering and MLOps consultant extraordinaire! Assaf was the VP R&D at Zebra Medical Vision, and built the PayPal Risk organization's Big Data Platform. We dive into building ML infrastructure from scratch 10 years ago vs. today, best practices involved in building teams to support machine learning models in production, and the future of generative models.

Watch the video: https://youtu.be/tSbuDA5tMxQ

Join our Discord community: https://discord.gg/tEYvqxwhah

➡️ Assaf Pinhasi on LinkedIn – https://www.linkedin.com/in/assafpinhasi/

🌐 Check Out Our Website! https://dagshub.com

Social Links:

➡️ LinkedIn: https://www.linkedin.com/company/dagshub

➡️ Twitter: https://twitter.com/TheRealDAGsHub

➡️ Dean Pleban: https://twitter.com/DeanPlbn

Jan 23, 2023 · 01:18:10

🎨 Stable Diffusion and generative models with David Marx

In this episode, I speak with David Marx, Distinguished Engineer at Stability AI. This talk dives into how David got into machine learning, open-source software, and Stability AI. We discuss following your curiosity, and what it takes to deploy a model like Stable Diffusion to production.

Watch the video: https://youtu.be/49dsoDK1KCA

Join our Discord community: https://discord.gg/tEYvqxwhah

Relevant Links:

🛌 Big Sleep by Ryan Murdock – https://colab.research.google.com/drive/1NCceX2mbiKOSlAd_o7IU7nA9UskKN5WR?usp=sharing (Author – https://sigmoid.social/@Adverb)

➡️ David Marx on LinkedIn – https://www.linkedin.com/in/david-marx-b0a5bb14/

➡️ David Marx on Twitter – https://twitter.com/DigThatData

Recommendation Links:

📚 Guerrilla Analytics – https://guerrilla-analytics.net/

💿 Contribute to open source!

🧱 Build lego! It's awesome

🌐 Check Out Our Website! https://dagshub.com

Social Links:

➡️ LinkedIn: https://www.linkedin.com/company/dagshub

➡️ Twitter: https://twitter.com/TheRealDAGsHub

➡️ Dean Pleban: https://twitter.com/DeanPlbn

Jan 19, 2023 · 01:14:16

🔴🟢🟣Julia Language in Production with Logan Kilpatrick

In this episode, I speak with Logan Kilpatrick, Julia Language Developer Community Advocate. We talk about machine learning at NASA and how he discovered Julia as a student, the age-old Julia vs. Python debate, and how to get into a new scientific and technical field. It was absolutely awesome! Check it out.

Watch the video: https://www.youtube.com/watch?v=3kgRN8hJIro

Join our Discord community: https://discord.gg/tEYvqxwhah

Relevant Links:

➡️ Logan Kilpatrick on LinkedIn – https://www.linkedin.com/in/logankilpatrick/

➡️ Logan Kilpatrick on Twitter – https://twitter.com/OfficialLoganK

➡️ Julia Language on Twitter – https://twitter.com/JuliaLanguage

Recommendation Links:

📚 Three-Body Problem Series: https://www.amazon.com/Three-Body-Problem-Cixin-Liu/dp/0765382032

📹 13 Lives on IMDB: https://www.imdb.com/title/tt12262116/

🌐 Check Out Our Website! https://dagshub.com

Social Links:

➡️ LinkedIn: https://www.linkedin.com/company/dagshub

➡️ Twitter: https://twitter.com/TheRealDAGsHub

➡️ Dean Pleban: https://twitter.com/DeanPlbn

Nov 21, 2022 · 01:01:03

🛠 Building tools for MLOps with Guy Smoilovsky

In this episode, I speak with Guy Smoilovsky, my friend, Co-Founder, and the CTO of DagsHub. We talk about quantum computing and AGI, concrete approaches for automating ML deployment, and how DagsHub came to be.

Watch the video: https://www.youtube.com/watch?v=67dByhXPT5g

Join our Discord community: https://discord.gg/tEYvqxwhah

Relevant Links:

➡️ Guy Smoilovsky on LinkedIn – https://www.linkedin.com/in/guy-smoilovsky/

➡️ Guy Smoilovsky on Twitter – https://twitter.com/Guy_T_Sky/

TDD in machine learning – https://towardsdatascience.com/tdd-datascience-689c98492fcc

Recommendation Links:

Astral Codex Ten – https://astralcodexten.substack.com/

Don't Worry About the Vase – https://thezvi.wordpress.com/

The Sandman – https://www.imdb.com/title/tt1751634/

Lady Silver – https://www.ladysilverband.com/

🌐 Check Out Our Website! https://dagshub.com

Social Links:

➡️ LinkedIn: https://www.linkedin.com/company/dagshub

➡️ Twitter: https://twitter.com/TheRealDAGsHub

➡️ Dean Pleban: https://twitter.com/DeanPlbn

Oct 18, 2022 · 01:20:55

📈 You Have Too Much Data with Dean Langsam

In this episode, I speak with Dean Langsam, Data Scientist at SentinelOne and one of the organizers of PyData in Israel. We chat about imposter syndrome, the best field in machine learning, why XGBoost is the best model, and the fact that most organizations have too much data. It was fascinating for me, so I hope you enjoy it too.

🎬 Watch the video: https://www.youtube.com/watch?v=Akz_PpDdLlQ

Join our Discord community: https://discord.gg/tEYvqxwhah

Relevant Links:

⭐️ Join PyData Tel Aviv: https://pydata.org/telaviv2022/ ⭐️

➡️ Dean Langsam on LinkedIn – https://www.linkedin.com/in/deanla/

➡️ Dean Langsam on Twitter – https://twitter.com/dean_la

Recommendation Links:

🌐 Check Out Our Website! https://dagshub.com

Social Links:

➡️ LinkedIn: https://www.linkedin.com/company/dagshub

➡️ Twitter: https://twitter.com/TheRealDAGsHub

➡️ Dean Pleban: https://twitter.com/DeanPlbn

Sep 16, 2022 · 01:11:11

🏗 Reasonable Scale MLOps with Jacopo Tagliabue

In this episode, I had the pleasure of speaking with Jacopo Tagliabue, Director of AI at Coveo. We talk about Reasonable Scale MLOps, how to approach building your ML platform, and how quickly you might hit the limits of model deployment (hint: it's pretty surprising).

Join our Discord community: https://discord.gg/tEYvqxwhah

Relevant Links:

➡️ Jacopo on LinkedIn – https://www.linkedin.com/in/jacopotagliabue/

➡️ Jacopo on Twitter – https://twitter.com/jacopotagliabue

Recommendation Links:

📺The Boys – https://www.imdb.com/title/tt1190634/

📚Gödel, Escher, Bach – https://www.goodreads.com/book/show/24113.G_del_Escher_Bach

📚The Three-Body Problem – https://www.goodreads.com/book/show/20518872-the-three-body-problem

🌐 Check Out Our Website! https://dagshub.com

Social Links:

➡️ LinkedIn: https://www.linkedin.com/company/dagshub

➡️ Twitter: https://twitter.com/TheRealDAGsHub

➡️ Dean Pleban: https://twitter.com/DeanPlbn

Aug 22, 2022 · 01:20:48

🦾 Made With ML - Learning How to Apply MLOps with Goku Mohandas

In this episode, I had the pleasure of speaking with Goku Mohandas, founder of Made With ML. Goku has an incredible amount of experience building machine learning and MLOps systems and teaching the community about them. We dive into systems thinking and solving for ML workflows, his journey in the machine learning world, and how he chooses what to learn next. We discuss the most common mistakes he's seen in productionizing ML models and why building models no one will use is not necessarily bad.

Join our Discord community: https://discord.gg/tEYvqxwhah

Relevant Links:

🤩 Check out Made With ML and thank us later – https://madewithml.com/

➡️ Goku on LinkedIn – https://www.linkedin.com/in/goku/

➡️ Goku on Twitter – https://twitter.com/gokumohandas

🌐 Check Out Our Website! https://dagshub.com

Social Links:

➡️ LinkedIn: https://www.linkedin.com/company/dagshub

➡️ Twitter: https://twitter.com/TheRealDAGsHub

➡️ Dean Pleban: https://twitter.com/DeanPlbn

Jul 18, 2022 · 01:28:20

🤹‍♀️ Building models that actually perform with Kyle Gallatin

In this episode, I had the pleasure of speaking with Kyle Gallatin, a Machine Learning Software Engineer at Etsy. We talk about how he built the machine learning platform at Etsy, experimentation in production (yes, you heard right), and how to optimize model performance at very large scales. It was awesome, and I'm sure many of you can learn a ton from this one!

Join our Discord community: https://discord.gg/tEYvqxwhah

Relevant Links:
➡️ Kyle on LinkedIn – https://www.linkedin.com/in/kylegallatin/

🌐Check Out Our Website! https://dagshub.com

Social Links:
LinkedIn: https://www.linkedin.com/company/dagshub
Twitter: https://twitter.com/TheRealDAGsHub
Dean Pleban: https://twitter.com/DeanPlbn

Jun 20, 2022 · 58:20

💬 MLOps for NLP Systems with Charlene Chambliss

In this episode, I'm speaking with Charlene Chambliss, Software Engineer at Aquarium. Charlene has vast experience getting NLP models to production. We dive into the intricacies of these models and how they differ from other ML subfields, the challenges in productionizing them, and how to get excited about data quality issues.

Join our Discord community: https://discord.gg/tEYvqxwhah

Relevant Links:

➡️Charlene on LinkedIn – https://www.linkedin.com/in/charlenechambliss/
➡️Charlene on Twitter – https://twitter.com/blissfulchar

Recommendations:

🎬3blue1brown – Awesome YouTube channel about math & science: https://www.youtube.com/c/3blue1brown
🎙NLP Highlights – Allen Institute for AI podcast about NLP research: https://soundcloud.com/nlp-highlights
🎙Software Engineering Daily: https://softwareengineeringdaily.com/
🎙TWiML – Another great podcast about machine learning and AI: https://twimlai.com/
📰Sebastian Ruder's blog and newsletter about NLP and ML: https://ruder.io/
📰Taming the Tail: Adventures in Improving AI Economics: https://a16z.com/2020/08/12/taming-the-tail-adventures-in-improving-ai-economics/
📰State of AI report (2021): https://www.stateof.ai/
📕Learn to learn – Ultralearning by Scott Young: https://www.scotthyoung.com/

🌐Check Out Our Website! https://dagshub.com

Social Links:

🟦LinkedIn: https://www.linkedin.com/company/dagshub
🐦Twitter: https://twitter.com/TheRealDAGsHub
🐦Dean Pleban: https://twitter.com/DeanPlbn

May 16, 2022 · 01:01:06

🧩 Simplifying Complex Ideas with Yannic Kilcher

In this episode, I'm speaking with the one and only Yannic Kilcher! We talk about sunglasses 😎, and the value of, and methodology behind, taking complex machine learning research and making it accessible and digestible. We also discuss reproducibility in machine learning and moving between research and entrepreneurship.

If you haven't seen his videos you should definitely check them out on his YouTube channel (https://www.youtube.com/c/YannicKilcher).

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Relevant Links:

➡️Yannic's amazing YouTube channel – https://www.youtube.com/c/YannicKilcher
➡️Yannic on LinkedIn – https://www.linkedin.com/in/ykilcher/
➡️Yannic on Twitter – https://twitter.com/ykilcher

Recommendations:

🎬Veritasium – YouTube channel about science with really good explanations about complex topics: https://www.youtube.com/c/veritasium
📖The Fifth Season – Good sci-fi fantasy book: https://www.amazon.com/Fifth-Season-Broken-Earth/dp/0316229296

🌐Check Out Our Website! https://dagshub.com

Social Links:

➡️LinkedIn: https://www.linkedin.com/company/dagshub
➡️Twitter: https://twitter.com/TheRealDAGsHub
➡️Dean Pleban: https://twitter.com/DeanPlbn

Apr 18, 2022 · 01:03:43

🔥 Getting Data Scientists to Write Better Code with Laszlo Sragner

In this episode, we dive into the challenging but very important topic of getting data scientists to write better code: how to approach complex machine learning projects and break them down, and why growing unicorns 🦄 is better than hunting them. Check out this awesome conversation with Laszlo Sragner, Founder at 🔥 Hypergolic.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:

00:00 Podcast intro
01:00 Guest introduction
02:34 Why is writing better code important for data scientists?
03:40 How to improve your code
08:17 Don't be afraid of your code.
10:42 Breaking experiments into manageable pieces
12:35 How did your past experiences teach you to strive for better code?
15:21 Proving better code is worth it
18:07 What could be adopted from software development
23:06 What's the most interesting/challenging part of taking models to production?
27:12 What is the hardest part about building a machine learning model?
29:30 How it looks when it works well – a detailed example
36:23 The difference in writing better code in smaller startups compared to larger organizations
39:18 Laszlo's process for the first iteration in a machine learning project
44:33 Breaking data problems down into vertical slices
47:55 End-To-End Platforms vs. Best-of-breed tools
50:30 Obligatory job title discussion...
53:30 Hunting for data science unicorns
56:33 Traits to look for when building a data science team
58:30 Build vs. Buy? What's better?
59:56 What is the most exciting trend in ML and MLOps?
1:00:47 How do you stay up to date?
1:01:40 Recommendations for the audience

---

Relevant Links:

➡️Laszlo's awesome substack – https://laszlo.substack.com/
➡️Laszlo's LinkedIn – https://www.linkedin.com/in/laszlosragner/
➡️Laszlo's Twitter – https://twitter.com/xLaszlo

Recommendations:

👀Explore/Expand/Extract by Kent Beck: https://www.youtube.com/watch?v=FlJN6_4yI2A
👩‍💻Code Quality – Refactoring by Martin Fowler: https://martinfowler.com/books/refactoring.html
📐Geometric Deep Learning by Bronstein/Velickovic: https://www.youtube.com/watch?v=5h6MbQ_65-o
✍️Online Writing by Nicolas Cole: https://www.youtube.com/watch?v=Od5J2V-Lmlg
📕The Last Shadow by Orson Scott Card: https://www.goodreads.com/en/book/show/7108926-the-last-shadow
🎬7 minutes, 26 seconds, and the Fundamental Theorem of Agile Software Development: https://www.youtube.com/watch?v=WSes_PexXcA

🌐Check Out Our Website! https://dagshub.com

Social Links:

➡️LinkedIn: https://www.linkedin.com/company/dagshub
➡️Twitter: https://twitter.com/TheRealDAGsHub
➡️Dean Pleban: https://twitter.com/DeanPlbn

Feb 14, 2022 · 01:05:29

🎓 MLOps lessons learned helping companies build their ML systems with Lee Harper, Lead DS at Catapult

In this episode, I'm speaking with Lee Harper, Principal Data Scientist at Catapult Systems. Lee holds a Ph.D. in Physical and Theoretical Chemistry and is a teacher-turned-data scientist. We cover the various entry paths into the world of data science, the value of background diversity, security in ML production, and even AI fairness.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:

00:00 Podcast intro
01:00 Guest introduction
01:39 How did you get into the fields of data science and machine learning?
05:04 Coding boot camps vs. academia & diversity of backgrounds in ML
09:37 How has the process of bringing your work into production changed over the years?
13:02 How has the change in the languages used for data science affected production processes?
16:01 How do you accelerate the timeframes for getting from POC to production in ML?
18:19 Do data scientists reinvent the wheel more often than software developers, and why?
22:14 The value of learning how to Google
23:00 Recurring themes, challenges, and common issues in data science
27:50 Solving for security in ML in production
31:57 ML security considerations for startups
34:30 Data security considerations in ML
35:18 What is the most interesting topic in machine learning right now?
38:05 ML fairness, bias, and responsible AI
41:44 What does it mean to build a fair or unbiased model?
47:15 If you had to choose one challenge in bringing models to production, what would it be?
51:00 What are the tools and processes that you use to make the transition to production easier?
55:35 About "vendor lock-in"
58:00 Your favorite tool recommendations
1:03:35 Recommendations for the audience

---

Relevant Links:

Linux Command Line and Shell Scripting Bible – https://www.amazon.com/Linux-Command-Shell-Scripting-Bible/dp/1119700914
Project Hail Mary – https://www.amazon.com/Project-Hail-Mary-Andy-Weir/dp/0593135202

Social Links:

https://www.linkedin.com/company/dagshub/
https://www.linkedin.com/company/catapult-systems/
https://www.linkedin.com/in/leeharper2425/
https://twitter.com/DeanPlbn
https://twitter.com/TheRealDAGsHub

Nov 04, 2021 · 01:08:50

🧠 Algorithmic challenges in bringing ML models into production with Roey Mechrez, CTO at BeyondMinds

In this episode, I'm speaking with Roey Mechrez from BeyondMinds. Roey holds a Ph.D. in Electrical Engineering and has vast experience in computer vision and deep learning research. We discuss the challenges of gluing together infrastructure solutions for an end-to-end ML platform, as well as generating monitoring insights for non-technical stakeholders and combating catastrophic forgetting.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:

00:00 Podcast intro
01:00 Guest intro
01:49 What does BeyondMinds do?
06:24 Audience for an end-to-end ML platform
12:14 Communicating with non-technical stakeholders/users
15:03 The future of "AI-powered tools", and human-machine collaboration
20:04 On complex system orchestration, generating insights from monitoring, and catastrophic forgetting – Biggest challenges in production ML
25:23 Why is catastrophic forgetting a hard problem and how do you deal with it?
30:02 "Secret" tips on how to get started with automating the retraining process
33:30 Generating monitoring insights and observations in a user-friendly format
38:12 Making data labeling issues explainable (automatically)
45:07 Customizing complex systems per user – Orchestrating an ML platform
52:58 API design in ML platform components
55:45 Measuring success for researchers, ML engineers, and software developers – can ML work fit into the Agile workflow?
1:02:22 Is "time to production" a good metric? Gains in time to production in the real world
1:06:02 How do you divide the work between ML researchers and engineers?
1:08:39 Recommendations for the audience

---

Relevant Links:

Social Links:

Sep 20, 2021 · 01:13:44

🐤 Feature stores and CI/CD for machine learning with Qwak.ai VP Engineering, Ran Romano

In this episode, I'm speaking with Ran Romano from Qwak.ai. Ran built the ML platform at Wix, and we discuss the various data roles, when organizations should focus on ML infrastructure, solving the hard problems of feature stores, and one approach to building an end-to-end ML platform.

Join our Discord community: https://discord.gg/tEYvqxwhah

---
Timestamps:
00:00 Podcast intro
01:00 Guest intro
01:30 Getting into the world of ML and ML Engineering
02:25 The line between Data Engineer, ML Engineer, and Data Scientist
03:50 The future of data roles – what are the trends?
07:21 The most exciting part about taking ML models into production
09:45 Jupyter notebooks in production (again??)
10:41 Signs that notebook productionization might not work
11:42 Building ML-focused CI/CD systems
15:32 Early days of building out the Wix ML platform
16:22 Signs that you might need to focus on ML infrastructure in your organization, and how to convince other stakeholders.
19:21 What part of the platform that you built are you most proud of?
23:51 Defining a feature store and the training/serving skew
27:24 Onboarding data scientists to using a feature store
33:49 When is it too early to build an ML platform?
35:33 Open source components – What parts of your platform did you choose not to build yourself?
40:16 Qwak.ai – What are you working on currently?
41:07 How do you define an "end-to-end" platform in the case of Qwak?
44:25 End-to-end vs. Integrated – Advantages and disadvantages

---
Relevant Links:
- Qwak.ai: https://www.qwak.ai
- Wix ML Platform presentation by Ran: https://www.youtube.com/watch?v=E8839ENL-WY

- https://www.linkedin.com/company/dagshub
- https://www.linkedin.com/company/qwak-ai/

- https://twitter.com/TheRealDAGsHub
- https://twitter.com/DeanPlbn
- https://twitter.com/ranvromano

Aug 11, 2021 · 45:34

🤗 Large ML models in production with HuggingFace CTO Julien Chaumond

In this episode, I'm speaking with Julien Chaumond from 🤗 HuggingFace about how they got started, getting large language models into production with millisecond inference times, and the CERN for machine learning.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:

01:00 - Guest intro
02:14 - Origin of HuggingFace
05:37 - Why the focus on NLP?
07:45 - The success of the HuggingFace community
13:14 - Reproducing models and scaling for the community
18:14 - Enabling large models in production
23:14 - How HuggingFace scales so many models
27:34 - The biggest challenge HuggingFace solved in MLOps
32:02 - How HuggingFace transitions from research to production
34:44 - Using notebooks vs. Python modules
38:27 - The most interesting topic in ML production
40:10 - Fascinating ML research
45:24 - Learning new things
51:14 - Something that is true but most people disagree with
56:54 - Tips to organize research teams
1:00:05 - New features for accelerated inference
1:01:35 - Most common use case of HuggingFace
1:04:17 - Integrating search algorithms into the Transformers library
1:05:09 - Integrating vision models
1:06:06 - Long term business model
1:10:55 - Automation and simplification of the process of building models
1:13:02 - Support for real-time inference
1:14:40 - Recommendations for the audience

Jul 04, 2021 · 01:19:12

🛣 Finding your path in ML with NLP Engineer Urszula Czerwinska

In this episode, I'm speaking with Urszula Czerwinska about her path as a data scientist, the projects she's worked on and what she learned from them, and the challenges she's overcome in bringing machine learning (ML) projects into production.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:
0:00 - Podcast intro
1:15 - Guest intro and how you got into data science
3:48 - Finding your fit – research or industry and when to transition
7:23 - What types of ML projects do you specialize in?
10:41 - ML explainability and interpretability
15:26 - ML explainability with non-technical stakeholders
17:13 - What problems does your team solve within the organization
20:56 - ML in production – how to bring your ML projects from research to production
25:17 - The tools you can't live without
28:11 - Do you have a set process for productizing ML projects?
30:08 - Team structures and communication for data science teams
33:42 - Who's in charge of setting up infrastructure for a project and job title discussion
36:29 - Interesting tools and repositories you work with
39:30 - How do you stay up to date?
42:00 - Biggest challenges for you in ML
45:12 - Favorite and least favorite thing about being a data scientist
49:52 - Handling a workplace that doesn't understand what a data scientist is
53:07 - Data scientists are 🦄
53:30 - Good papers you read recently
58:12 - Tips to improve the data science workflow

Relevant Links:
- flair: https://github.com/flairNLP/flair
- AllenNLP: https://github.com/allenai/allennlp
- Papers with Code: https://paperswithcode.com/
- Dair.ai newsletter: https://dair.ai/newsletter/
- HuggingFace: https://huggingface.co/blog

Apr 27, 2021 · 01:01:18