The Data Scientist Show

By Daliana Liu

A deep dive into data scientists' day-to-day work, tools and models they use, how they tackle problems, and their career journeys. This podcast helps you grow a successful career in data science. Listening to an episode is like having lunch with an experienced mentor. Guests are data science practitioners from various industries, AI researchers, economists, and CTOs of AI companies. Host: Daliana Liu, an ex-Amazon senior data scientist with 180k followers on LinkedIn.
Join 20k subscribers at www.dalianaliu.com to learn more about data science, career, and this show. Twitter @DalianaLiu.

Available on Apple Podcasts, Google Podcasts, Pocket Casts, RadioPublic, and Spotify.
Weather forecasting with AI, Kaggle tips and tricks, dealing with missing data, deep learning with Jesper Dramsch, The Data Scientist Show #040
Jun 16, 2022 (01:58:12)
Why data scientists are tired, six real data scientists' frustrations - The Data Scientist Show #089
Apr 17, 2024 (42:22)
Why 80% of A/B tests fail, how to 10X your experimentation velocity - Kristi Angel - The Data Scientist Show #088

Most experiments fail. Kristi Angel shares her expertise on scaling experimentation and avoiding common A/B testing pitfalls. Learn five things that can help boost test velocity, how to design impactful experiments, and how to leverage knowledge repos. (Chapters below)

Kristi Angel’s LinkedIn: https://www.linkedin.com/in/kristiangel/

Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/


(00:00:00) Intro

(00:01:26) Why do most experimentations fail?

(00:07:05) Mistakes in choosing metrics

(00:10:05) Is revenue a good metric?

(00:13:18) Split metrics in three ways

(00:15:10) Daliana's story with too many category breakdowns

(00:16:59) What makes the best data science team?

(00:19:24) Data scientist work in silo vs in a data science team

(00:21:15) Building a knowledge center

(00:23:40) Example of knowledge center; nuance of experimentations

(00:26:09) How many metrics and variants?

(00:30:56) How to reduce noise - CUPED

(00:33:01) Future of A/B testing

(00:38:33) Q&A: Low statistical power

Apr 08, 2024 (43:46)
From physics PhD to data science leader, unexpected challenges in survey data, Python vs R, EDA best practices, building MLOps toolkit - Julia Silge - The Data Scientist Show #087

Julia Silge is an engineering manager at Posit PBC, formerly known as RStudio, where she leads a team of developers building open source software for MLOps. Before Posit, she finished a PhD in astrophysics, worked for several years in the nonprofit space, and was a data scientist at Stack Overflow where some of her most public work involved the annual developer survey. We talked about MLOps tools, challenges in survey data, text analysis, and balancing her interests in data science and engineering.

Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/


(00:00:00) Introduction

(00:00:56) Getting into data science

(00:04:50) Transition from data centers to engineering manager

(00:14:04) Common challenges in tool development

(00:17:38) Challenges with survey data

(00:26:47) Engineering skills for data scientists

(00:28:59) Balancing roles

(00:34:49) Developing skills in Exploratory Data Analysis (EDA)

(00:39:19) Python vs. R for data analysis

(00:44:40) Exciting aspects in career and personal life

Mar 30, 2024 (46:18)
Why he created Pandas, the future of data systems, why he left his CTO role to become a chief architect - Wes McKinney - The Data Scientist Show #086

Wes McKinney is the co-creator of the pandas library and a cofounder of Voltron Data. He is currently a Principal Architect at Posit and an investor in data systems.

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/

Wes' LinkedIn: https://www.linkedin.com/in/wesmckinn/

(00:00:00) Introduction

(00:00:44) How Pandas Started

(00:06:40) Voltron Data

(00:10:03) Benefits of Easy-to-Use Data Tools

(00:13:20) The Rise of New Data Tools

(00:18:07) Choosing Tools: Vertical or Flexible?

(00:23:01) Big Models and Data Tools

(00:29:29) Challenges in Building a Product

(00:31:28) Becoming a Top Architect

(00:34:55) Missed Aspects of Previous Roles

(00:39:04) A Busy Week: Advising, Designing, Investing

(00:43:42) Improving Open Source

(00:45:24) How to Decide What to Work On

(00:46:28) What he’s learning now

(00:47:56) Excitement in Career and Life

(00:48:29) Using ChatGPT for Learning

(00:50:27) Future Impact Goals

Mar 22, 2024 (52:24)
From financial analyst to director of analytics, how to get promoted quickly, 7 elements of influence - Christopher Fricker - The Data Scientist Show #085

Christopher Fricker is a senior director in analytics and BI at Renaissance Learning. He started his career in finance and later became a data science consultant working with Meta, Netflix, and pre-IPO tech companies doing analytics. We talked about the mental models that helped him grow from a finance analyst to an analytics leader.

Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.

Chris’ LinkedIn: https://www.linkedin.com/in/christopherfricker/

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/


(00:00:00) Introduction
(00:01:46) How to get promoted quickly
(00:08:40) Power vs authority
(00:11:21) First-principles thinking
(00:32:34) ROI of a data team
(00:40:29) How to be persuasive
(00:54:52) All data is wrong
(00:56:22) How he audits the data
(01:00:52) How to make someone help you at work

Mar 15, 2024 (01:13:51)
Adapters: the game changer for fine-tuning - Geoffrey Angus - The Data Scientist Show #084

I interviewed Geoffrey Angus, ML team lead at Predibase, to talk about why adapter-based training is a game changer. We started with an overview of fine-tuning and then discussed five reasons why adapters are the future of LLMs. Later we also shared a demo and answered questions from the live audience.

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/

Geoffrey’s LinkedIn: https://www.linkedin.com/in/geoffreyangus

Try fine-tuning for free: https://pbase.ai/GetStarted


(00:00:00) Intro

(00:01:19) What is Fine-tuning?

(00:08:18) Utilizing Adapters for Fine-Tuning Enhancement

(00:09:50) 5 reasons why adapters are the future of LLMs

(00:26:34) Common Mistakes in Adapters Usage

(00:28:34) Training Your Own Adapter

(00:32:23) Behind the Scenes of the Adapter Training Process

(00:37:51) Config File Guidance for Fine-Tuning

(00:39:41) Debugging Strategies for Suboptimal Fine-Tuning Results

(00:42:23) User Queries: Creating a LoRA Adapter and Future Support

(00:51:06) Key Takeaways and Recap

Mar 08, 2024 (52:45)
Landing a job by analyzing Seattle's crime data, from data scientist to founder of Interview Query, building a lifestyle business - Jay Feng - The Data Scientist Show #083

Jay Feng created a viral project using Seattle crime data and later got into data science. He then founded Interview Query, which helps data scientists get jobs. We'll talk about how he landed his data science job through his blog, and his journey from data scientist to founder. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/

Jay Feng's LinkedIn: https://www.linkedin.com/in/jay-feng-ab66b049/

Jay Feng's YouTube: https://www.youtube.com/c/DataScienceJay


(00:00:00) Introduction

(00:01:11) From engineer to data scientist

(00:03:10) Got a job through a project

(00:05:35) Daliana's portfolio project with Zillow

(00:09:13) From data scientist to entrepreneur

(00:13:19) "Tinder" for job

(00:15:01) How he chose companies to work for

(00:15:56) Why he became an entrepreneur

(00:17:37) How many hours does he work

(00:18:54) Challenges when building Interview Query

(00:20:18) Speed vs scale

(00:22:11) Growth hacks he used

(00:24:22) YouTube vs newsletter

(00:27:21) Lessons he learned as a CEO

(00:29:16) How to grow from tech employee to founder

(00:31:59) How he defines success

(00:34:38) If you have a business idea for Jay

Feb 29, 2024 (35:41)
Case studies from the GenAI frontier, scaling ML teams, from biologist to machine learning consultant - Erik Gafni - The Data Scientist Show #082

Erik Gafni builds AI systems and teams. He founded Eventum AI (https://bit.ly/eventum-ai), an ML consulting company working with high-growth startups. We talked about GenAI projects he worked on, how he built production ML systems, how to scale ML teams, and his journey from biologist to ML researcher.

  • Interested in working with Erik: https://bit.ly/erik-consulting
  • Erik's LinkedIn: https://bit.ly/erik-gafni-LI

(00:00:00) Introduction

(00:01:59) Is GenAI overhyped?

(00:04:28) Accent translation with AI

(00:11:58) Social media app with AI

(00:14:00) Stable diffusion model evaluation

(00:15:57) "Consult-to-hire" model

(00:17:35) AI in biotech

(00:22:46) Self-supervised learning

(00:31:22) How he hires people

(00:33:19) Research vs production

(00:35:57) Is AGI coming?

(00:37:30) New trends in GenAI

(00:41:45) Data quality in GenAI

(00:42:58) Philosophy in LLMs

(00:49:48) OpenAI vs Open Source

(00:53:58) Mistakes he made

(00:57:41) How did he get into ML

Feb 24, 2024 (01:03:41)
Data science job market in 2024, soft skills for interviews, AI engineering - Jay Feng - The Data Scientist Show #081

Jay Feng is the CEO of Interview Query, a service that helps data scientists get jobs. Previously he worked as a data scientist at Nextdoor and Monster. We talked about the data science job market, the rise of AI engineering, and the soft skills people overlook during interviews. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.


Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/

Jay Feng's LinkedIn: https://www.linkedin.com/in/jay-feng-ab66b049/

Jay Feng's YouTube: https://www.youtube.com/c/DataScienceJay


(00:00:00) Introduction

(00:01:11) Data science job market in 2024

(00:09:13) Build projects with AI

(00:16:19) Soft skills in interviews

(00:23:18) Daliana's story on "socializing ideas"

(00:28:38) Common mistakes in interviews

(00:35:30) Product DS vs ML interviews

(00:36:27) Product analytics interview questions

(00:39:18) Career transition in DS

(00:43:04) Jay's career journey

(00:45:38) Is there a principal data analyst?

(00:51:52) AI engineer

(00:54:28) New roles vs obsolete roles in DS

(01:04:46) Is data science dead?

Feb 16, 2024 (01:07:14)
How to handle being laid off (as data scientists), severance negotiation, full-time employment vs independent consultant - The Data Scientist Show #080
Feb 09, 2024 (01:06:34)
From data analyst to sales engineer, personality-based career design, sales skills for data people - Jenny Wu - The Data Scientist Show #079

Jenny Wu is a data analyst turned sales engineer for data products at Hex. We talked about sales engineer vs data analyst, how to design a career based on your personality, and how to transition into a customer-facing role.

Jenny’s LinkedIn: https://www.linkedin.com/in/jenny-wu-...

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/


(00:00:00) Introduction
(00:01:34) What is a Sales Engineer?
(00:09:35) Sales Engineering Day-to-Day
(00:13:09) Challenge in sales
(00:21:37) Traits of Successful Salespeople
(00:30:32) Stakeholder Engagement
(00:36:24) Getting into customer-facing roles
(00:43:55) Quitting her job to travel the world
(00:48:05) Advice on Career Breaks
(00:50:39) Embedding Career and Personal Goals
(00:51:57) How do you achieve happiness?

Feb 01, 2024 (57:27)
The future of data science teams, integrating AI into data science workflows, building data apps for stakeholders - Barry McCardel - The Data Scientist Show #078

Barry McCardel is the cofounder and CEO of Hex (free trial: hex.tech/dsshow), a collaborative data workspace. Their customers include Fivetran, Notion, and Anthropic. We talked about what the future of data teams looks like, how to tackle the challenges of data team collaboration, and how to leverage AI in data science workflows.

60-day free trial: hex.tech/dsshow

Barry’s LinkedIn: https://www.linkedin.com/in/barrymccardel

(00:00:00) Introduction
(00:01:25) Is AI replacing data scientists?
(00:06:08) Are data science teams getting smaller?
(00:09:54) What is Hex?
(00:11:24) How to communicate with stakeholders
(00:24:29) Should data scientists be full stack?
(00:31:23) How data teams measure ROI
(00:33:35) Quantitative vs qualitative analysis
(00:35:33) When you shouldn't use data? Data vs product intuition
(00:41:39) How to hire your first data team?
(00:48:59) Is the modern data stack dead?
(00:53:55) GenAI in data science workflows
(00:59:03) Future of data scientist
(01:02:30) New features in Hex

Jan 21, 2024 (01:04:56)
Product data science for Microsoft AI, data scientist's role of GenAI, how to deal with burn out - Sid Sharan - The Data Scientist Show #077

Siddhartha Sharan is a Senior Data and Applied Scientist at Microsoft, helping product teams make data-driven decisions. Currently he is working on an AI product built with OpenAI APIs for sentiment analysis. We talked about how he evaluates AI products built with large language models at Microsoft, product data science, and how he went from a business background to data science. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.


Sid’s LinkedIn: https://www.linkedin.com/in/siddharthasharan/

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/


(00:00:00) Introduction
(00:05:20) How Microsoft evaluates AI products
(00:16:17) Using OpenAI API for sentiment analysis
(00:25:29) Microsoft data science team culture
(00:26:52) DS, PM collaboration
(00:28:29) Three steps to build trust in data science
(00:30:13) How he got into Microsoft
(00:34:09) Level up in Genetech
(00:36:09) ML engineer vs Product DS
(00:37:43) Core skills in product DS
(00:40:20) Hiring
(00:42:47) How to deal with burnout
(00:45:03) Should you overwork to earn trust?
(00:45:44) Daliana's story about first day at Amazon
(00:49:54) Will AI replace data scientists?
(00:51:32) Data scientist's role of GenAI
(00:54:32) How to keep up with GenAI

Jan 15, 2024 (58:58)
How she doubled her salary in a year as a data analyst, SQL in the real world, is job hopping bad? - Jess Ramos - The Data Scientist Show #076

Jess Ramos is a Senior Data Analyst at Crunchbase, a LinkedIn Learning Instructor, and a content creator in the data space. She has a bachelor's degree in Math, Spanish, and Business from Berry University and a master's in Business Analytics from University of Georgia. Today we’ll talk about SQL in the real world, data analysts vs data scientists, whether job hopping is bad, and how she negotiated her salary. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.

Jess’ LinkedIn: https://www.linkedin.com/in/jessramosmsba/

Preparing to Get a Job in Data Analytics: shorturl.at/sCNPT

Solve Real-World Data Problems with SQL: https://bit.ly/3Zq6wnd

Big Data Energy Newsletter: https://bit.ly/46x4rIR


Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/

(00:00:00) Introduction
(00:01:24) Why Jess left her job at Freddie Mac
(00:03:25) Is job hopping bad
(00:04:42) How to explain short job stints when interviewing
(00:06:49) Jess's day-to-day work and tech stack
(00:09:15) SQL in the real world
(00:12:10) How to talk data to stakeholders
(00:18:33) How Jess prepares for SQL interviews
(00:28:11) Data analysts vs data scientists
(00:32:11) Choosing a career path
(00:47:19) How to ask recruiter questions
(00:50:15) Jess's LinkedIn content creation journey
(00:59:03) The future of Jess's career
(01:03:42) Jess's favorite books

Jan 05, 2024 (01:07:49)
How he got into machine learning and Gen AI at Amazon, how we went from "enemies" to allies - Mehdi Noori - The Data Scientist Show #075

Mehdi Noori is an applied science manager at the Generative AI Innovation Center at Amazon. I used to work with Mehdi at the Machine Learning Solutions Lab at AWS. Before Amazon, Mehdi was a data scientist working on marketing intelligence. He has a PhD in civil engineering and sustainability from the University of Central Florida. Subscribe to Daliana's newsletter for more on data science and career: www.dalianaliu.com


Mehdi Noori: https://www.linkedin.com/in/mehdi-noori/

Predicting Soccer Goals: https://aws.amazon.com/blogs/machine-learning/predicting-soccer-goals-in-near-real-time-using-computer-vision/


Dec 06, 2023 (01:32:23)
Why she quit her finance job to become a farmer, exploring a different path from the modern life - Misty Arnold - The Data Scientist Show #074

My friend Misty moved to a farm in Portugal after a 20-year career in finance. We talked about her experience moving from busy corporate life to farm life, where she does a lot of manual work. We covered whether it was challenging, how her finances work, and her advice for others who want to explore a different path outside modern city life. I hope this episode gives you a different perspective on your career.

Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/


(00:00:00) Introduction

(00:11:41) Life on the farm

(00:15:46) Her finance plans

(00:22:55) Her career journey

(00:27:14) What do accountants do

(00:32:29) I thought I would be happy

(00:41:25) Daliana's personal view about finance; when it's enough for you

(00:44:41) Does she feel lonely on a farm?

(00:48:39) What if she didn't leave the corporate world?

(00:54:07) Does she regret her decision

Nov 29, 2023 (01:10:29)
Why he left his MLE job for product data science at Meta, data science at Uber, LinkedIn, and TrueCar - Pan Wu - The Data Scientist Show #073

Pan Wu is a senior manager of data science at Meta. We talked about why he moved from machine learning to product data science, projects he worked on at Uber, LinkedIn, and Meta, and how he transitioned from IC to manager. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.


Pan’s LinkedIn: https://www.linkedin.com/in/panwu/

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/


(00:00:00) Introduction

(00:01:30) Why he transitioned from MLE to product DS

(00:07:38) Meta data scientists skill sets

(00:15:49) When his interest shifted from MLE to product DS

(00:18:04) Is MLE more respected?

(00:25:46) A/B testing deep dives in 3 steps

(00:28:21) Built a tool at LinkedIn

(00:35:52) How to sell your project

(00:41:07) Junior vs senior data scientist

(00:43:24) From staff data scientist to manager

(00:45:18) Explore being a manager

(00:46:24) Cultures in Uber, LinkedIn, TrueCar

(00:52:09) Data science over the past 10 years

(00:55:06) MLE vs DS fun and frustration

(00:57:26) Product DS reality

(00:59:10) Learning new skills

(01:01:39) Mistakes he made

(01:06:34) Future of data science

(01:08:04) Will data scientists be replaced by AI

(01:09:42) Three skills he looks for when hiring

Nov 19, 2023 (01:13:02)
Machine learning in cybersecurity, computer vision in sports, from business analyst to ML engineer - Betty Zhang - The Data Scientist Show #072

Betty Zhang is a data scientist at a cloud security company; previously she was a data scientist at Amazon Web Services. Today we’ll talk about her computer vision projects in sports, data science use cases in cybersecurity, her path from business major to data scientist, and her experience working at startups vs big tech companies. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.


Betty’s LinkedIn: https://www.linkedin.com/in/betty-zhang-0bb63731/

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/


(00:00:00) Introduction

(00:01:21) Computer Vision Project in Sports at AWS

(00:12:28) Challenges in computer vision

(00:14:02) Time allocation for ML projects

(00:15:22) 3 key skills for computer vision

(00:17:20) From business analyst to ML engineer

(00:18:14) How she got her data scientist job through LinkedIn

(00:21:32) How she got into Amazon

(00:22:17) Three tech skills needed during Amazon interviews

(00:26:11) Why she joined a cybersecurity startup

(00:27:22) Three cybersecurity use cases

(00:29:47) Anomaly detection

(00:30:40) ML for cybersecurity

(00:34:43) Tech stacks Amazon vs Startups

(00:39:35) Startups vs big tech

(00:45:56) Balance learning and impact

(00:48:35) Advice for new data scientists

Nov 12, 2023 (55:13)
Stop abusing A/B testing, toxic experimentation culture, how to run A/B tests with rigor - Che Sharma - The Data Scientist Show #071

Che Sharma came back to discuss toxic behaviors in experimentation culture and provide actionable advice on how to handle those situations and how to bring rigor and integrity to designing and analyzing A/B tests.

Che was the 4th data scientist at Airbnb; later he joined Webflow as an early employee. In 2021 he founded Eppo, a next-gen A/B experimentation platform designed for modern data and product teams to run more trustworthy and advanced experiments. We talked about A/B testing best practices, A/B testing for ML models, and Che’s career journey.

Reach out to Che: https://www.linkedin.com/in/chetanvsharma/

Nov 04, 2023 (01:03:42)
Academia vs. Industry for Machine Learning, Research at Uber AI Labs, ML for Wind Farms - Jason Yosinski - The Data Scientist Show #070
Oct 23, 2023 (01:16:10)
Ads forecasting at Netflix and Spotify, how to build your personal moat - Jeff Li - The Data Scientist Show #069
Sep 14, 2023 (01:26:29)
A/B testing at Airbnb, building next-gen experimentation platform at Eppo - Che Sharma - The Data Scientist Show #068

Che Sharma was the 4th data scientist at Airbnb; later he joined Webflow as an early employee. In 2021 he founded Eppo, a next-gen A/B experimentation platform designed for modern data and product teams to run more trustworthy and advanced experiments. We talked about A/B testing best practices, A/B testing for ML models, and Che’s career journey. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.

Che’s LinkedIn: www.linkedin.com/in/chetanvsharma/
Try Eppo for A/B testing: www.geteppo.com/
Daliana's Twitter: twitter.com/DalianaLiu
Daliana's LinkedIn: www.linkedin.com/in/dalianaliu

(00:00:00) Introduction
(00:01:26) Getting started in data science at Airbnb
(00:03:08) Keys to successful A/B testing
(00:06:53) Interpreting and communicating A/B test results
(00:15:00) A/B testing best practices for machine learning models
(00:41:39) Centralizing experiment analysis
(00:53:46) Preparing data scientists for the future
(00:59:33) Developing communication skills as a data scientist
(01:08:43) Transitioning from individual contributor to manager
(01:12:28) The future of experimentation
Aug 25, 2023 (01:14:16)
From data scientist@Meta to full-time YouTuber (500k+ sub), AI engineering, future of work - Tina Huang - The Data Scientist Show #067

We talked about self-learning, productivity, how Tina navigates her career change and how she thinks AI could change the future of work.

Tina's YouTube: www.youtube.com/@TinaHuang1

Lonely Octopus: www.lonelyoctopus.com

Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.

Tina Huang is a data scientist turned YouTube creator with 500k subscribers. She is the founder of Lonely Octopus, an online program helping people gain data science, AI, and freelancing skills. She originally studied pharmacology before transitioning into tech, completing a master's degree in computer science at UPenn.

(00:02:38) Transitioning from Data Science to Content Creation

(00:06:29) Preparing for Data Science Interviews

(00:10:59) Starting a YouTube Channel

(00:14:18) Building Multiple Income Streams

(00:17:35) Getting Started with AI Skills

(00:29:29) Advice for Starting YouTube

(00:34:47) Improving Storytelling Skills

(00:36:58) Overcoming Procrastination

(00:42:33) The Future of Work

(01:26:49) Income Breakdown

(01:47:08) Looking to the Future

Aug 10, 2023 (01:54:53)
Making LLMs hallucinate less, how to diagnose ML models, from PM in Google AI to CEO of Galileo - Vikram Chatterji - The Data Scientist Show #066

Vikram is the co-founder of Galileo, an AI diagnostics and explainability platform used by data science teams building NLP, LLM, and computer vision models across the Fortune 500 and high-growth startups. Prior to Galileo, Vikram led Product Management at Google AI, where his team built models for the Fortune 2000 across retail, financial services, healthcare, and contact centers. He has a master's degree from the School of Computer Science at Carnegie Mellon University. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.

Resources:

LLM Studio: https://www.rungalileo.io/blog/announcing-llm-studio

Galileo: https://www.rungalileo.io/

Blog on LLM Hallucination: https://thesequence.substack.com/p/guest-post-stop-hallucinations-from

Vikram Chatterji’s LinkedIn: https://www.linkedin.com/in/vikram-chatterji/

"The Mom Test": https://www.amazon.com/The-Mom-Test-Rob-Fitzpatrick-audiobook/dp/B07RJZKZ7F

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu


(00:00:00) Introduction

(00:04:24) How he got into machine learning

(00:06:53) Diagnosing large language models

(00:09:56) Addressing model hallucination

(00:12:46) Metrics for measuring hallucination

(00:17:30) From Google AI to starting Galileo

(00:24:08) Developing LLMs and putting them into production

(00:32:51) Galileo's diagnostics and explainability platform

(00:43:16) Advice for data scientists when joining a startup

Aug 01, 2023 (01:26:50)
Data Science "Mixed Martial Arts", applied reinforcement learning, scaling AI workloads using Ray - Max Pumperla - The Data Scientist Show #065

Max Pumperla designed his own career path in data science. He is a freelance software engineer at Anyscale and also a data science professor. We talked about reinforcement learning, open source contributions, Ray for data scientists, and his view on the data scientist's role. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.


Max’s LinkedIn: https://www.linkedin.com/in/max-pumperla-a8099354/

Max's GitHub: https://github.com/maxpumperla

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu


(00:00:00) Introduction

(00:09:19) How he got a remote job through Twitter

(00:14:06) Introduction to Ray

(00:18:52) Reinforcement learning

(00:23:56) Key lessons on integrating customer feedback

(00:35:12) Flaws in data science job titles

(00:45:51) How to be irreplaceable as a data scientist

(00:48:55) An unconventional career path as a data scientist

(01:12:24) Productivity and work-life balance

(01:28:10) Advice for building a personal brand

Jul 28, 2023 (01:53:28)
Uber's ML Systems (Uber Eats, Customer Support), Declarative Machine Learning - Piero Molino - The Data Scientist Show #064

Piero Molino was one of the founding members of Uber AI Labs. He worked on several deployed ML systems, including an NLP model for Customer Support and the Uber Eats Recommender System. He is the author of Ludwig, an open source declarative deep learning framework. In 2021 he co-founded Predibase, the low-code declarative machine learning platform built on top of Ludwig.

Piero's LinkedIn: https://www.linkedin.com/in/pieromolino

Predibase free access: bit.ly/3PCeqqw

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu

(00:00:00) Introduction

(00:01:54) Journey to machine learning

(00:03:51) Recommender system at Uber Eats

(00:04:13) Projects at Uber AI 

(00:09:34) Uber's customer obsession ticket system

(00:16:01) How to evaluate online-offline business and model performance metrics

(00:17:16) Customer Satisfaction

(00:28:38) When do you know whether a project is good enough

(00:41:50) Declarative machine learning and Ludwig

(00:45:32) Ludwig vs AutoML

(00:54:44) Working with Professor Chris Re

(00:58:32) Why he started Predibase

(01:07:56) LLM and GenAI

(01:10:17) Challenges for LLMs

(01:22:36) Advice for data scientists

(01:34:29) Career advice to his younger self

Jul 04, 2023 01:50:05
Data science in transportation, the intersection of operations research and ML - Holger Teichgraeber - The Data Scientist Show #063


Holger Teichgraeber is a Data Science Manager at Archer Aviation. Previously, he worked at Convoy as a Research Scientist on their trucking marketplace, and at various companies in the energy space. Holger has a Bachelor's degree in Mechanical Engineering from Aachen, Germany, and a Master's and a Ph.D. from Stanford University, with a research focus on machine learning and optimization applied to energy systems. He regularly writes on LinkedIn, with the goal of showing how to build valuable products at the intersection of machine learning and optimization in production. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.


Holger's LinkedIn: https://www.linkedin.com/in/holgerteichgraeber/

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu


(00:00:00) Introduction

(00:01:28) How he got into operations research

(00:02:39) Operations research vs data science

(00:04:37) Trucking optimization at Convoy

(00:08:42) Optimization problem

(00:10:18) Strategic planning on air mobility at Archer

(00:13:50) Using simulation and solving a problem

(00:16:45) Big data science work vs smaller data science work

(00:21:23) Stakeholder management

(00:29:28) IC vs Manager

(00:32:04) Advice on promotion

(00:39:12) Work cultures in Germany and the US

(00:41:16) How to handle tight deadlines

(00:43:21) Important feedback from his work

(00:44:14) How to plan projects

(00:44:45) Next big challenge for data science teams

(00:45:40) Career growth in the next few years

(00:46:01) Connect with Holger

Jun 26, 2023 46:53
Tackling data quality issues, 5 pillars of data observability, from management consultant to CEO of Monte Carlo - Barr Moses - The Data Scientist Show #062


Barr Moses is a consultant turned CEO & Co-Founder of Monte Carlo, a data reliability company. She started her career as a management consultant at Bain & Company and a research assistant at the Statistics Department at Stanford University. Later, she became VP of Customer Operations at customer success company Gainsight, where she built the data and analytics team. She also served in the Israeli Air Force as a commander of an intelligence data analyst unit. Barr graduated from Stanford with a B.Sc. in Mathematical and Computational Science. Today, we’ll talk about Barr’s career journey, data reliability and observability, and what it means for data teams. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science.

Barr's LinkedIn: https://www.linkedin.com/in/barrmoses/

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu


(00:00:00) Introduction

(00:01:24) How she got into data science

(00:08:26) Frameworks for data-driven decisions

(00:11:20) Are customer support tickets always bad?

(00:15:20) How to quickly find out what is true

(00:20:17) Struggles in the data team

(00:23:37) Daliana’s story about lineage

(00:28:00) People stressed about data

(00:28:09) Netflix was down because of wrong data

(00:30:40) Common issues with data quality

(00:33:14) 5 pillars of data observability

(00:39:14) How does Monte Carlo help data scientists

(00:43:08) Build in-house vs adopt tools

(00:45:48) How Daliana fixed a data quality issue

(01:02:44) How to measure the impact of the data team

(01:09:09) Mistakes she made

(01:15:28) Beat the odds

May 18, 2023 01:21:31
Is search dead? Google vs ChatGPT, from Google Search to enterprise search at Glean, machine learning in search, tech layoffs - Deedy Das - The Data Scientist Show #061


Deedy Das is a founding engineer at Glean, an enterprise search startup. Previously, he was a Tech Lead at Google Search working on query understanding and the sports product in New York, Tel Aviv, and Bangalore. Before that, he was an engineer at Facebook New York and graduated from Cornell University. Outside of work, Deedy writes on his blog. He published a viral resume template and his work on exposing grading flaws in the Indian education system. He also enjoys running marathons, road cycling, and playing cricket. Today we’ll talk about the search projects he worked on at Google, why he left Google, his current work at Glean, and his thoughts on whether Google is doomed because of ChatGPT. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science.


Deedy's Twitter: https://twitter.com/debarghya_das?s=20

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu 


(00:00:00) Introduction 

(00:01:52) What is search 

(00:04:33) Query understanding 

(00:12:46) Google vs ChatGPT 

(00:18:24) Fixing bug for Sundar Pichai 

(00:27:33) Why he left Google

(00:30:32) How to get into search 

(00:34:38) Enterprise search at Glean 

(00:46:55) Advice for people who got laid off 

(00:48:41) What do search engineers do 

(00:51:37) How he evaluates candidates 

(00:53:58) Future of search 

(00:57:16) Why the web is declining 

(00:59:25) Copilot and AI-powered developer tools 

(01:03:46) Indian startup ecosystem 

(01:07:45) India vs Silicon Valley 

(01:09:48) How he grew 30k followers on Twitter 

(01:13:28) Daliana and Deedy’s challenge with social media 

(01:19:31) Career mistakes he made

Feb 21, 2023 01:27:07
The 100-hour work week of a self-taught machine learning researcher, how he got into Google Brain, why he started Omni - Jeremy Nixon - The Data Scientist Show #060


Jeremy Nixon is a machine learning researcher, software engineer, and startup founder. Previously he was a software engineer at Google Brain working on deep learning. Now, he is the co-founder and CEO of Omni, building an immersive information retrieval system for you and your team. He studied applied math at Harvard University. Today we’ll talk about how he got into Google Brain, his 3-month self-learning plan to learn machine learning, his startup, and how he has executed his goal relentlessly since 2016. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science.


Jeremy's Twitter: https://twitter.com/JvNixon

Jeremy's Blog: https://jeremynixon.github.io/

Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu

Jeremy's LinkedIn: https://www.linkedin.com/in/jeremyvnixon


(00:00:00) Introduction 

(00:01:50) Research in Google Brain 

(00:03:37) How he got into Google Brain 

(00:07:56) His 3-month plan to learn ML 

(00:17:55) The 100-hour workweek 

(00:33:26) What if he is tired 

(00:39:59) Why he founded Omni

(00:44:24) Data science problems in Omni 

(00:54:42) Future of machine learning 

(00:57:51) Silicon Valley is very accessible 

(00:59:47) The golden handcuffs 

(01:06:58) From data scientist to full-stack engineer 

(01:09:06) Close-minded data scientists 

(01:24:10) Advice to ML learners 

(01:29:41) Something he wished that he did when he was younger 

(01:37:25) The future of his career 

(01:42:17) Connect with Jeremy

Feb 20, 2023 01:42:52
The power of error analysis, tree models for search relevancy, what ChatGPT means for data scientists - Sergey Feldman - The Data Scientist Show #059


Sergey Feldman is the head of AI at Alongside, providing mental health support for students. He is also a Lead Applied Research Scientist at Allen Institute for AI, where he built an ML model that improved search relevancy for scientific literature. Sergey has a PhD in Electrical and Electronics Engineering from the University of Washington. Today we’ll talk about machine learning for search, his consulting project for the Gates Foundation, AI for mental health, and career lessons. Make sure you listen till the end. If you like the show, subscribe, leave a comment, and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.


Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/  

Sergey's LinkedIn: https://www.linkedin.com/in/sergey-feldman-6b45074b/ 

Data Cowboys: http://www.data-cowboys.com/

Sergey Feldman: You Should Probably Be Doing Nested Cross-Validation | PyData Miami 2019: https://www.youtube.com/watch?v=DuDtXtKNpZs

December 4th, 2018 - Breakfast with WACh with Dr. Sergey Feldman, PhD: https://www.youtube.com/watch?v=vA_czRcCpvQ


(00:00:00) Introduction 

(00:01:24) Machine learning skeptic 

(00:03:02) Tree-based models for search relevance 

(00:14:34) How to do error analysis 

(00:19:20) Nested cross-validation 

(00:21:34) Model evaluation 

(00:30:43) Error analysis common mistakes 

(00:33:37) How to avoid overfitting 

(00:35:56) Consulting project with Gates Foundation 

(00:41:16) Tree-based models vs linear models 

(00:45:19) Working with non-tech stakeholders 

(00:50:20) Chatbot for teens’ mental health

(00:54:32) Can ChatGPT provide therapy?  

(00:58:12) How he got into machine learning 

(01:02:12) How to not have a boss 

(01:03:46) Feelings vs Facts 

(01:09:02) Future of machine learning 

(01:11:30) How to prepare for the future 

(01:13:39) AutoML 

(01:17:12) His passion for large language models

Jan 24, 2023 01:19:44
How to build data science muscle memory, DeepChecks -- an open source ML testing suite - Philip Tannor - The Data Scientist Show #058


Philip Tannor is the Co-Founder and CEO of Deepchecks, a Python package to run checks for machine learning models. Previously, he was the head of a data science group at the Israel Defense Forces. He has a master's degree in engineering from Tel Aviv University; his thesis was about a new algorithm that combines neural networks with gradient-boosting decision trees. Today we’ll talk about his career journey, how to build your data science muscle memory, the algorithm he worked on, and how to check ML models. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science and career.


Daliana's Twitter: https://twitter.com/DalianaLiu

Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/

Philip’s LinkedIn: https://www.linkedin.com/in/philip-tannor-a6a910b7/?originalSubdomain=il

Augboost: https://medium.com/@ptannor/augboost-like-xgboost-but-with-few-twists-e4df4017a5c4


(00:00:00) Introduction 

(00:01:17) How did he get into ML 

(00:02:52) Data science in the military 

(00:08:15) How to take feedback 

(00:13:24) Handling criticism 

(00:15:12) What he worked on 

(00:18:18) Testing deployment

(00:21:28) How to build the data science muscle memory 

(00:27:09) Improving the skills of data scientists 

(00:30:42) His thesis in grad school 

(00:36:59) Combine NN and gradient boosting 

(00:40:05) AugBoost

(00:41:15) Tools he uses

(00:45:58) Deepchecks 

(00:50:46) Most challenging part of building Deepchecks 

(00:52:05) How can people contribute 

(00:53:40) Behind the scenes 

(00:56:09) Deciding how to fix or improve the model 

(01:00:49) Advice for those who want to create open-source projects

(01:04:07) Features to add for the enterprise product 

(01:06:57) About his life and career right now 

(01:08:27) Connect with Philip

Dec 07, 2022 01:08:51
The Daliana Special: how I got into data science, 5 things only experienced data scientists know, and why I started "The Data Scientist Show" - Daliana Liu #057


Who is Daliana? This is a conversation I had in 2021 with Harpreet Sahota. I talked about my unexpected journey to data science all the way back in high school, things I wish I had known earlier in my career, the projects I worked on, what it is like to be a quote-unquote influencer on LinkedIn, and more. If you want more content from me, I write about data science, career, and nerdy jokes on my LinkedIn, and you can subscribe to my very infrequent newsletter at dalianaliu.com. I’m curious what you think about this episode, so leave a comment on YouTube or send a DM on LinkedIn. Hope you enjoy the Daliana special!


Daliana's Newsletter: https://dalianaliu.com 

Daliana's Twitter: https://twitter.com/DalianaLiu 

Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/ 

Harpreet's LinkedIn: https://www.linkedin.com/in/harpreetsahota204/ 

The Artists of Data Science podcast: https://theartistsofdatascience.fireside.fm/


(00:00:00) Introduction 

(00:02:52) Where did Daliana grow up 

(00:05:19) Daliana in high school

(00:07:11) How she got into data science

(00:11:36) Why writing is important for data scientists

(00:15:51) How to write better 

(00:20:56) Career lessons you didn't learn in school 

(00:27:40) Imposter syndrome 

(00:31:29) Day-to-day work as a data scientist 

(00:36:16) Most common mistakes data scientists make 

(00:39:41) Data Analyst vs. Data Scientist 

(00:42:30) What is the science in data science? 

(00:44:51) Can everyone be a data scientist 

(00:49:21) Linkedin profile tips for job search 

(00:52:59) How she creates content 

(00:54:11) Being a data scientist "influencer" 

(00:56:04) Why she started "the data scientist show" 

(01:01:16) Women in data science 

(01:06:39) What's her legacy 

(01:09:43) What is she reading 

(01:14:21) Connect with Daliana

Nov 24, 2022 01:15:20
How he carved his own path at Airbnb, from data engineer to CEO of Mage - Tommy Dang - the data scientist show #056


Tommy Dang is the Co-founder and CEO of Mage, a data ingestion and transformation pipeline for data engineers (https://github.com/mage-ai/mage-ai). Previously, he worked on data engineering and machine learning engineering at Airbnb. He has a Bachelor of Science degree from UC Berkeley, where he studied economics, history, and sociology. Today we’ll talk about how he learned engineering and machine learning after college, the data tools and ML tools he built at Airbnb, performance reviews, and how he navigates his career. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science and career.


Tommy’s LinkedIn: https://www.linkedin.com/in/dangtommy/

Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/

Daliana's Twitter: https://twitter.com/DalianaLiu


(00:00:00) Introduction 

(00:01:28) Get into computer science from non-tech background 

(00:03:08) How he started his first project 

(00:04:07) Projects at Airbnb 

(00:06:09) Speed vs Quality when building data pipelines 

(00:16:34) How to deal with ad hoc requests

(00:21:00) How did he learn machine learning 

(00:24:04) How he convinced data scientists to teach him ML 

(00:25:15) Performance review 

(00:27:11) Don’t let your job title limit your career 

(00:28:29) Why he started his company 

(00:31:38) Build your own tool vs use open source solutions 

(00:33:12) Transitioning from an engineer to a CEO 

(00:34:50) Earn trust from internal stakeholders 

(00:36:27) Career advice 

(00:41:31) How he carved his own path at Airbnb 

(00:46:00) How did he learn to be a good engineer 

(00:47:10) Best advice for data scientists or engineers 

(00:48:41) Most important quality of data scientists or engineers 

(00:51:51) Design principles 

(00:58:51) Future of tools 

(01:01:00) What does he think about his future career 

(01:05:05) Inspiration of Tommy

Nov 08, 2022 01:08:02
How to effectively test and debug machine learning models, from ML engineer@Apple to startup founder - Gabriel Bayomi - the data scientist show #055


Gabriel Bayomi is the Co-Founder at OpenLayer, a tool that tests & debugs machine learning models. OpenLayer was in Y Combinator’s 2021 batch, building tools for machine learning model testing. Previously he was a machine learning engineer at Apple working on Siri. He has a master's degree in computer science from Carnegie Mellon. He is passionate about Natural Language Processing, Machine Learning, and Computational Social Science. We talked about how to test and debug machine learning models, his experience at Apple, and career lessons. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science and career.


Gabriel’s LinkedIn: https://www.linkedin.com/in/gbayomi

Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/

Daliana's Twitter: https://twitter.com/DalianaLiu


(0:00) Intro

(00:01:39) How he got into machine learning

(00:06:43) His experience at Apple, Siri

(00:15:55) How to validate the solution

(00:19:39) Benefits of using an external error analysis framework

(00:21:30) How to build a model evaluation pipeline

(00:28:26) Don’t overfit the subset of data

(00:33:19) Your validation set shouldn’t be fixed

(00:41:03) Become one with data

(00:44:05) Three model interpretability libraries you should use

(00:50:47) Common mistakes people make in model validation

(00:53:33) How to create an adversarial test

(00:55:43) How to check data quality

(01:06:46) Transition from engineer to executive

(01:10:04) Things he learnt from his favorite coworker

(01:17:57) How job roles will evolve

Oct 24, 2022 01:24:02
From Amazon research scientist to head of data product at Vestiaire Collective, why data science projects fail, how to be a good communicator - Alisa Kim - the data scientist show #054


Alisa Kim is the head of data product at Vestiaire Collective. Previously, she was a research scientist at Amazon Web Services. We used to work on the same team at the Machine Learning Solutions Lab at Amazon Web Services and collaborated on projects. She has also been a consultant and worked in analytics and investment banking. She has a Ph.D. in Econ AI and has worked in various industries and on multiple continents. She's someone I really enjoyed working with. We talked about her journey, the projects she worked on, and the lessons she learnt. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.


Alisa's LinkedIn: https://de.linkedin.com/in/alisakolesnikova

Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/

Daliana's twitter: https://twitter.com/DalianaLiu


(0:00) Intro

(00:01:38) how she got into data science

(00:04:38) day-to-day at AWS ML Solutions Lab

(00:08:00) AWS leadership principles

(00:16:34) challenges the consultant faces when working with external customers

(00:23:36) from AWS to Vestiaire Collective

(00:37:54) how to build a better data product

(00:44:17) how data scientist can align with business stakeholders 

(00:57:52) from tech to business

(01:01:33) how to develop communication skills

(01:09:17) increase visibility of the data science team

(01:17:22) being proactive vs being passive in chasing opportunities

(01:24:06) get feedback from your "nearest neighbors"

(01:25:37) how to set boundaries at work

(01:38:48) mistakes she made in her career

(01:48:25) how to manage disagreement

(01:57:53) future of data science

Oct 19, 2022 02:12:17
The lessons from almost losing a million dollars for his company, how to build good data assets and get buy-in from the leadership - Mark Freeman - the data scientist show #053


Mark Freeman is a community health advocate turned data scientist. His mission is to improve the well-being of people, especially those who are marginalized. He is currently a senior data scientist at Humu, where he builds data tools that drive behavior change to make work better. He has a master's degree from the Stanford School of Medicine in clinical research, experimental design, and statistics. He also has a certificate in entrepreneurship from the Business School of Stanford. In his free time, he volunteers with a Bay Area Community Health Advisory Council. He also plays Men's Division III Rugby. We talked about building data tools, data engineering skills for data scientists, how to pitch a project, and his career journey. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.


Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/

Daliana's Twitter: https://twitter.com/DalianaLiu

Mark's LinkedIn: https://www.linkedin.com/in/mafreeman2/


Chapters:

(0:00) Intro

(00:03:05) Our experience using R - 1000 lines of code

(00:09:22) Entrepreneurship within a company

(00:16:25) DBT and modern data stack

(00:20:15) Tools don’t matter (in interviews)

(00:21:09) Things DE enjoys but DS doesn’t

(00:24:55) How to work with different stakeholders

(00:30:32) Common SQL mistakes

(00:33:34) SQL vs Python vs R

(00:35:26) T.R.I.B.E framework for projects

(00:40:43) Meet the stakeholders where they are

(00:42:40) Use feedback to get buy-in from collaborators

(00:46:36) How to pitch a new idea

(00:49:45) Don’t lead with solution, lead with the problem

(00:51:03) How to get buy-in from the leadership

(00:57:56) Present an idea as if the audience came up with it

(00:58:41) How to iterate a project

(01:00:27) How he almost lost 1 million dollars for his company

(01:02:07) Things he learned from his manager

(01:04:19) Things that help people make changes effectively

(01:06:05) Things he learned from mentoring

(01:12:19) Mental Health and anxiety

(01:17:12) Web3

(01:20:14) Why he cares about community health

(01:25:40) "Soul - searching" on his future

(01:28:36) Why he writes on LinkedIn

(01:30:04) Future of data science


Oct 15, 2022 01:32:32
From deep learning architect at AWS to PM in AI product - Abhi Sharma - the data scientist show #052


Abhi Sharma started his career as a software engineer at Amazon Lab126, building cloud services for Alexa. Later he transferred to Amazon Web Services as a deep learning architect. We used to work on the same team at the Machine Learning Solutions Lab at AWS. Currently, he is a product manager, responsible for machine learning products such as the chatbot at Chime. We talked about how he transitioned from software engineer to deep learning architect to product manager, cool projects he worked on, and our shared experiences at Amazon. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.


Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/

Daliana's Twitter: https://twitter.com/DalianaLiu

Abhi's LinkedIn: https://www.linkedin.com/in/abhivs/


Highlights:

(0:00) Intro

(00:01:48) from SWE to deep learning architect to product manager

(00:12:44) day-to-day as a product manager at Chime

(00:19:46) how he collaborates with different data personas

(00:27:21) how to negotiate for more time for projects with leaders

(00:33:59) some timelines are negotiable

(00:38:00) most impactful project he worked on

(00:44:22) how to evaluate KPI, and not game the system

(00:48:02) think about development in the beginning

(00:50:29) data scientists need to educate the business and demystify the buzz words

(00:54:19) Amazon’s Think Big Challenge

(00:57:09) Never solve the problem twice

(01:00:25) How to transition to a product manager

(01:07:48) why he wanted to become a PM

(01:25:35) How data scientists can learn from PMs

Oct 04, 2022 01:30:45
What data scientists need to know about MLOps principles, from GPA 2.6 to Sr. MLOps Engineer@Intuit - Mikiko Bazeley - the data scientist show #051


Mikiko Bazeley is a senior software engineer working on MLOps at Intuit. Previously, she worked as a growth hacker and a data analyst in finance, then became a data scientist, and later transitioned into machine learning. She has a bachelor's degree in economics and biological anthropology and completed a data science bootcamp at Springboard. She is a tech writer for NVIDIA and she’s working on a course on MLOps. Her goal is to demystify MLOps & show how to develop high-quality ML products from scratch. You can find her content on LinkedIn and YouTube. Today, we’ll talk about useful engineering principles for data scientists, MLOps, and her career journey. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science and career.


Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/

Daliana's Twitter: https://twitter.com/DalianaLiu

Mikiko's Linkedin: https://www.linkedin.com/in/mikikobazeley/


Highlights:

(0:00) Intro 

(00:02:00) from GPA 2.6 to data scientist

(00:05:27) her experience at Mailchimp

(00:11:44) her frustrations on Cookiecutter project

(00:14:09) the pain point of a data scientist working with engineering

(00:21:01) 2 MLOps patterns

(00:25:52) challenges about her work

(00:29:49) the basic engineering skills a data scientist should have

(00:32:46) the tests a data scientist should write

(00:37:42) how an MLOps engineer collaborates with a data scientist

(00:45:28) what makes a good MLOps engineer

(00:52:33) AWS vs GCP vs Azure

(00:58:59) how a data scientist collaborates with an MLOps engineer 

(01:05:19) suggestions for building a model on a large scale

(01:09:11) how she learnt MLOps on her own within 6 months

(01:17:32) learn from code review

(01:19:17) MLOps books and resources she recommended

(01:24:13) mistakes she made earlier in her career

(01:31:29) common mistakes people make during career change

(01:38:22) "Start with the end in mind"

(01:41:16) the future of MLOps

(01:46:23) how she sees her career growth

(01:56:40) how she continues learning new skills

(02:00:09) what she is excited about her career and life

Sep 27, 2022 02:04:50
Bayesian thinking in work and life, ad attribution models and A/B testing, machine learning@Foursquare - Max Sklar - the data scientist show #050


Max Sklar is an independent engineer and researcher. Previously, he was an Engineering and Innovation Labs Advisor at Foursquare, after 7 years at the company as a machine learning engineer, where he worked on ad attribution, recommendation engines, and ratings. He is the host of The Local Maximum podcast. Max studied CS at Yale and holds a master's degree in information systems from New York University. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.


Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/

Daliana's Twitter: https://twitter.com/DalianaLiu

Max's Linkedin: https://www.linkedin.com/in/max-sklar-b638464/

Max’s website: localmaxradio.com/about


Interviews he mentioned during the podcast:

Andrew Gelman, Statistics at Columbia University

Shirin Mojarad on Causality

Johnny Nelson on Free Speech and Moderation online

Stephanie Yang talking about Foursquare's Venue Rating System

Dennis Crowley: on Labs, on Innovation

Sophie Carr (Bayesian Mathematician)

Will Kurt (Bayesian)

Marsbot for Airpods

Other Episodes Mentioned

Bayesian Thinking

P-Hacking

Interview on Learn Bayesian Statistics


Highlights:

(0:00) Intro

(00:01:23) from computer science to machine learning

(00:05:35) Bayesian methods in rating systems

(00:14:53) how to choose a Bayesian prior

(00:20:10) how to deal with p-hacking

(00:26:57) causality model in ad attribution

(00:35:20) Bias-correction methods

(00:45:43) negative lift in advertising

(00:51:05) unexpected consumer behaviors

(00:52:08) why he decided not to climb the "engineer ladder"

(00:56:46) the challenges of having 5 managers in a year

(01:01:38) using the 3rd-party software vs building his own

(01:04:18) how he approaches ML problems

(01:07:51) his tech stack

(01:09:25) his advice on learning machine learning

(01:12:40) projects he is working on

(01:17:10) Bayesian for his life decisions

(01:22:00) how writing helps him

(01:23:48) the confusion, stress and excitement in his career

Sep 13, 2022 01:30:26
Why he quit a $500k+ machine learning job at Meta (Facebook): a candid review of his experience, mistakes, and ML best practices - Damien Benveniste - the data scientist show #049


Damien Benveniste is a data scientist and software engineer. Previously, he was a machine learning tech leader and mentor. He has worked for almost ten years in machine learning roles across industries such as AdTech, market research, e-commerce, and health care. He has a Ph.D. in physics from Johns Hopkins University and is now working towards co-founding his own startup in the employee engagement space. We talked about his career journey, how he solved challenging problems, and his advice for new data scientists and engineers. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.


Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/

Daliana's Twitter: https://twitter.com/DalianaLiu

Damien's Linkedin: https://www.linkedin.com/in/damienbenveniste/


(00:00) Intro 

(00:01:17) from quantitative trading to machine learning 

(00:07:52) his experience at Meta 

(00:21:16) automated machine learning 

(00:28:52) model paradigm 

(00:32:47) the productivity-oriented culture at Meta 

(00:41:42) short-term gain vs long-term goal 

(00:44:38) things he liked at Meta 

(00:51:54) the project that shaped his career 

(01:03:56) the importance of having a baseline for ML models 

(01:09:12) why he time-boxed everything 

(01:16:25) test the model in production 

(01:20:05) experimental design for ML

(01:23:25) the most challenging project he worked on 

(01:37:07) best practices for machine learning 

(01:48:44) how he sees himself 

(02:00:52) lessons he learnt from being laid off

(02:06:45) frustration he had in his previous job 

(02:16:14) what he is working on 

(02:29:18) the future of machine learning 

(02:39:52) things he is excited about

Sep 06, 2022 02:44:27
Time series modeling in supply chain, how to master business communication, save the environment with data science - Sunishchal Dev - the data scientist show048

Sunishchal Dev is a lead data scientist at Booster. He's helping to decarbonize the transportation industry by optimizing last-mile delivery of renewable fuels. Previously, he was a management consultant. On the side, he volunteers with Project Drawdown to model the most effective solutions to climate change. He also mentors future data scientists at Springboard, guiding them through real-world projects. We talked about his career journey, supply chain optimization, and how data science can help the environment. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.


Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/

Daliana's Twitter: https://twitter.com/DalianaLiu


(0:00) Intro

(00:01:24) from business to data science

(00:06:36) the big impact of a small improvement

(00:08:50) data engineering vs predictive modeling

(00:11:48) routing optimization

(00:16:27) time series model

(00:21:32) using upsampling to simulate intermittent time series problems

(00:26:20) his modern data stack

(00:28:29) collaborate with engineers

(00:30:06) common mistakes people make in building time series models

(00:37:02) collaborate with truck drivers

(00:40:17) how to become a good communicator

(00:46:30) his experience in mentoring data scientists

(00:51:14) things people cannot learn at school

(00:53:16) the mistakes he made and the things he learnt from his mentor

(00:56:07) how data science can help the environment


Books recommended: 

The Pyramid Principle: Logic in Writing and Thinking

The Book of Why: The New Science of Cause and Effect

Influence, New and Expanded: The Psychology of Persuasion

Aug 31, 2022 01:03:10
Product data science@Spotify, from management consultant to data scientist, salary negotiation, managing ADHD - Felicia Rutberg - the data scientist show047

Felicia Rutberg is a product strategy and analytics manager at Snap. Previously, she was a product data scientist at Spotify. She started her career as a management consultant at Accenture. She studied mathematics and cognitive psychology at Vanderbilt University. Felicia reached out to me on LinkedIn because she wanted to share how she became a data scientist while having ADHD. Today we’ll talk about product analytics at Spotify and Snap, her career journey, and ADHD. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.


Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/

Daliana's Twitter: https://twitter.com/DalianaLiu

Felicia's Linkedin: https://www.linkedin.com/in/feliciarutberg/ 


Highlights: 

(00:01:29) from management consulting to data science 

(00:12:20) financial data analyst at Spotify 

(00:20:06) how to make an internal job transition

(00:25:57) product data scientist at Spotify in the econometrics team 

(00:29:33) how she became more vocal on the creative process

(00:33:48) how to get the last 1% of the work done 

(00:38:53) how to ensure the quality of the analysis 

(00:50:19) propensity score matching at Spotify 

(00:57:09) how to validate causal inference outcomes 

(01:00:51) lessons from working with economists 

(01:19:16) from Spotify to Snap 

(01:27:35) salary negotiation 

(01:34:02) day-to-day at Snap 

(01:38:33) Spotify vs Snap 

(01:44:35) lessons from management consulting that helped her data science journey 

(01:47:37) ADHD and self-compassion 

(02:02:52) the books she recommended 

(02:08:26) her future career

Aug 18, 2022 02:12:58
Data science interviews trends, from being laid off to landing a data scientist job at Airbnb - Emma Ding - the data scientist show #046

Emma Ding is a data scientist turned career coach. Previously, she was a data scientist and software engineer at Airbnb. I first discovered her through a viral Medium blog post called "How I got 4 data science offers and doubled my income 2 months after being laid off". Today, her mission is to help data scientists land their dream offers by being strategic and efficient in their interview preparation at https://www.datainterviewpro.com/. Among the 80 clients she has worked with, 90% received data scientist job offers from top tech companies such as Meta, LinkedIn, DoorDash, and Robinhood. In the first half of the podcast, we talked about how she doubled her salary and got into Airbnb after she was laid off, and about her experience at Airbnb. Then we dove into new trends in data science interviews and her best strategy to get a data scientist job. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.


Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/

Daliana's Twitter: https://twitter.com/DalianaLiu

Emma's YouTube: https://www.youtube.com/c/

DataInterviewPro Free product case class: https://www.datainterviewpro.com/product-case-masterclass-registration 

Books on causal inference: Mostly Harmless Econometrics and Mastering 'Metrics: The Path from Cause to Effect.

Emma's Linkedin: https://www.linkedin.com/in/emmading001/ 


(00:00) Intro  

(00:04:24) her strategy to get the data scientist offer after the layoff  

(00:07:00) advice for preparing for interviews

(00:14:04) her day-to-day at Airbnb  

(00:16:46) things she learnt from her mentor  

(00:18:07) from a data scientist to an SDE to a data interview pro

(00:22:12) trends in data science interviews

(00:26:48) data scientist tracks: analytics-driven vs algorithms-driven  

(00:32:56) SQL interviews: readability and proficiency    

(00:35:06) make a study plan, execute it and keep the confidence  

(00:41:29) what she teaches at DataInterviewPro

(00:43:45) how to tackle take-home challenges  

(00:45:41) how to negotiate salaries  

(00:46:56) how to build confidence in the job search process  

(00:50:23) how to study different subjects efficiently

(00:54:26) how to transition to data science  

(01:00:05) how to remedy mistakes during the interview  

(01:03:37) are data scientists still in demand?

(01:08:43) advice for getting ready for the new career

Aug 02, 2022 01:19:35
Using ML to tackle disruptive behaviors in gaming@Activision, data science in the metaverse, cyber security - Carly Taylor - the data scientist show #045

Carly Taylor is a senior manager at Activision, leading a team of machine learning engineers to tackle disruptive behaviors in the game ‘Call of Duty’. Previously, she held various roles, including machine learning engineer, data scientist, product analyst, and analytical chemist. She has a master's degree in computational chemistry from the University of Colorado. She’s passionate about video games and cyber security. She shares her insights on machine learning, gaming, and career with her 33k LinkedIn followers. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.


Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/

Daliana's Twitter: https://twitter.com/DalianaLiu

Carly's Linkedin: https://www.linkedin.com/in/carly-taylor0017/


Highlights:

(00:00) Intro 

(00:01:14) from chemistry major to data scientist in gaming 

(00:05:46) how she tackles disruptive behavior using machine learning 

(00:11:38) feature engineering and model drift in fraud detection 

(00:16:49) the challenge of dealing with the large scale of data 

(00:27:10) data science in the Metaverse 

(00:36:08) signal processing and anomaly detection 

(00:40:31) dealing with the outliers 

(00:45:49) getting buy-in from leadership

(00:49:56) from an IC to a manager 

(00:53:36) mentorship, mistakes, and other things she learnt from work 

(00:58:48) Python or R? 

(01:05:30) how she sees herself grow and how she deals with struggles 

(01:07:56) the future of data science in gaming

Jul 29, 2022 01:15:42
From lawyer to senior data scientist at Amazon, data science in devices, HR, and real estate, how to 're-invent' yourself - Pauline Chow - the data scientist show #044

Pauline Chow is a data scientist, former attorney, and active transportation advocate. She has worked in banking, fashion and education start-ups, and Amazon. Currently, she is the data engineering lead for Thrackle, a blockchain research and modeling company. She has a master's degree in computer science (machine learning) from the Georgia Institute of Technology and a law degree (JD) from the University of Wisconsin. She is also a certified yoga teacher and published writer.

We talked about her projects in three different teams at Amazon (devices, HR, and real estate), how her law degree helped her become a better data scientist, and how she 're-invented' herself. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.


Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/

Daliana's Twitter: https://twitter.com/DalianaLiu


Jul 13, 2022 01:31:11
From chemical engineer to data scientist@ExxonMobil, why he left to do data science freelancing, data career jumpstart, Avery Smith - the data scientist show#043
Jul 06, 2022 01:31:60
Applied machine learning research methods, human-machine team, AI strategies, trends in machine learning, how to earn trust - Vin Vashishta - The data scientist show #042

Vin Vashishta is a chief data officer and AI strategist at V Squared, a company he founded in 2012 that provides AI strategy, transformation, and data organizational build-out services.

He teaches data professionals about strategy, communications, business acumen, and applied machine learning research methods. Vin has 130k+ followers on LinkedIn, where he talks about AI, analytics, and strategy. His website: https://www.datascience.vin/ If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.


Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/

Daliana's Twitter: https://twitter.com/DalianaLiu


Highlights:

(0:00) Intro 

(00:03:37) "ML strategy" with 'pricing' as an example 

(00:09:45) what is a good metric for ML 

(00:13:16) how to translate a business problem into a data problem 

(00:23:42) leverage users in the "Human Machine Teaming" 

(00:48:22) how he earned the trust 

(01:17:31) data science evolution from 2012 to 2022 

(01:31:06) how he learns new domain knowledge

(01:36:25) the mistakes he made 

(01:42:15) what he learnt from his mentor

Jun 29, 2022 01:50:01
Retail store forecasting with video and audio, ML in high frequency trading, from tech to politics, ML in Web3 - Greg Tanaka, the data scientist show #041

Greg Tanaka is a computer scientist turned CEO of an AI company. He started coding when he was 6, studied computer science at UC Berkeley, and has built many machine learning applications. He is the founder and CEO of Percolata, which is developing "Forecast as a Service". He is also a council member of Palo Alto, California, and just finished his campaign for Congress. Today we’ll talk about his career journey, forecasting, machine learning in blockchain, and political campaigns. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.


Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/

Daliana's Twitter: https://twitter.com/DalianaLiu

Greg's Linkedin: https://www.linkedin.com/in/gltanaka/, Twitter: https://twitter.com/GregTanaka

Greg's DAO: https://www.gregtanaka.org/dao


Highlights:

(00:02:10) use computer vision, audio, and Wi-Fi fingerprints to forecast the retail store traffic 

(00:21:55) why time series forecast is hard 

(00:26:39) how he made the forecasting more stable 

(00:28:46) how he troubleshot the spikes and drops in data 

(00:36:04) human trading vs algorithmic trading 

(00:47:36) his vision of machine learning in blockchain 

(00:54:57) why he got into politics 

(01:05:57) advice for people who are interested in Web3

(01:11:04) AutoML and the future of machine learning

(01:15:36) things he wishes he had learned earlier

Jun 23, 2022 01:30:42
Weather forecasting with AI, Kaggle tips and tricks, dealing with missing data, deep learning with Jesper Dramsch, The Data Scientist Show #040

Jesper Dramsch is a scientist for machine learning at the European Centre for Medium-Range Weather Forecasts. They have a PhD in machine learning applied to geoscience from the Technical University of Denmark. They are a Kaggle Kernels Expert and TPU star, ranking 81 out of 100k worldwide. We talked about weather forecasting, things they learned from Kaggle, how to deal with missing data and outliers, deep learning, Keras vs PyTorch, XGBoost, their struggles as a PhD student, and working in the EU vs the US. Follow @DalianaLiu for more updates on data science and this show.

(00:01:27) how they got into ML

(00:09:10) how they handled missing data

(00:28:34) Transformers are eating the world

(00:49:36) Huber Loss is a fantastic metric to deal with extreme values

(00:54:48) their experience with Kaggle competitions

(01:02:59) Kaggle tricks that helped their models perform better

(01:08:18) PyTorch vs Keras 

(01:30:30) working in different countries and cultures 

Resources shared by Jesper:

The newsletter about missing data:

https://buttondown.email/jesper/archive/towels-have-quite-a-dry-sense-of-humor/

The paper by Gael about missing data:

https://academic.oup.com/gigascience/article/doi/10.1093/gigascience/giac013/6568998

The Huber Loss:

https://en.wikipedia.org/wiki/Huber_loss

Skill Scores:

https://en.wikipedia.org/wiki/Forecast_skill

Brier Skill in Weather:

https://www.dwd.de/EN/ourservices/seasonals_forecasts/forecast_reliability.html

CRPS (Continuous Ranked Probability Score):

https://datascience.stackexchange.com/questions/63919/what-is-continuous-ranked-probability-score-crps

ConvNeXt, a ConvNet for the 2020s:

https://arxiv.org/abs/2201.03545

Transformers for ensemble forecasts:

https://arxiv.org/abs/2106.13924

Books I recommend:

https://www.amazon.com/shop/jesperdramsch/list/2DYS5KVR5TX0E

Blog posts I wrote about these books:

https://dramsch.net/tags/books/

Short I made about Test-Time Augmentation:

https://www.youtube.com/shorts/w4sAh9lKyls

Their links: https://dramsch.net/links

Their open PhD thesis: https://dramsch.net/phd

Newsletter: https://dramsch.net/newsletter

Twitter: https://dramsch.net/twitter

Youtube: https://dramsch.net/youtube

Linkedin: https://dramsch.net/linkedin

Kaggle: https://dramsch.net/

Jun 16, 2022 01:58:12