The Data Scientist Show
By Daliana Liu
Join 20k subscribers at www.dalianaliu.com to learn more about data science, career, and this show. Twitter @DalianaLiu.
The Data Scientist Show, Jun 16, 2022
Why data scientists are tired, six real data scientists' frustrations - The Data Scientist Show #089
Daliana interviewed 6 data scientists from her meetup in New York City. It's a unique episode where you get to hear the real frustrations of data scientists. We talked about struggles working in healthcare, finance, data quality and AI, how to advocate for yourself, and align with your managers.
Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/
Why 80% of A/B tests fail, how to 10X your experimentation velocity - Kristi Angel - The Data Scientist Show #088
Most experiments fail. Kristi Angel shares her expertise on scaling experimentation and avoiding common A/B testing pitfalls. Learn five things that can help boost test velocity, design impactful experiments, and leverage knowledge repos. (Chapters below)
Kristi Angel’s LinkedIn: https://www.linkedin.com/in/kristiangel/
Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/
(00:00:00) Intro
(00:01:26) Why do most experimentations fail?
(00:07:05) Mistakes in choosing metrics
(00:10:05) Is revenue a good metric?
(00:13:18) Split metrics in three ways
(00:15:10) Daliana's story with too many category breakdowns
(00:16:59) What makes the best data science team?
(00:19:24) Data scientist work in silo vs in a data science team
(00:21:15) Building a knowledge center
(00:23:40) Example of knowledge center; nuance of experimentations
(00:26:09) How many metrics and variants?
(00:30:56) How to reduce noise - CUPED
(00:33:01) Future of A/B testing
(00:38:33) Q&A: Low statistical power
From physics PhD to data science leader, unexpected challenges in survey data, Python vs R, EDA best practices, building MLOps toolkit - Julia Silge - The Data Scientist Show #087
Julia Silge is an engineering manager at Posit PBC, formerly known as RStudio, where she leads a team of developers building open source software for MLOps. Before Posit, she finished a PhD in astrophysics, worked for several years in the nonprofit space, and was a data scientist at Stack Overflow, where some of her most public work involved the annual developer survey. We talked about MLOps tools, challenges in survey data, text analysis, and balancing her interests in data science and engineering.
Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/
(00:00:00) Introduction
(00:00:56) Getting into data science
(00:04:50) Transition from data centers to engineering manager
(00:14:04) Common challenges in tool development
(00:17:38) Challenges with survey data
(00:26:47) Engineering skills for data scientists
(00:28:59) Balancing roles
(00:34:49) Developing skills in Exploratory Data Analysis (EDA)
(00:39:19) Python vs. R for data analysis
(00:44:40) Exciting aspects in career and personal life
Why he created Pandas, the future of data systems, why he left his CTO role to become a chief architect - Wes McKinney - The Data Scientist Show #086
Wes McKinney is the co-creator of the pandas library and the cofounder of Voltron Data. Currently he is a Principal Architect at Posit and an investor in data systems.
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/
Wes' LinkedIn: https://www.linkedin.com/in/wesmckinn/
(00:00:00) Introduction
(00:00:44) How Pandas Started
(00:06:40) Voltron Data
(00:10:03) Benefits of Easy-to-Use Data Tools
(00:13:20) The Rise of New Data Tools
(00:18:07) Choosing Tools: Vertical or Flexible?
(00:23:01) Big Models and Data Tools
(00:29:29) Challenges in Building a Product
(00:31:28) Becoming a Top Architect
(00:34:55) Missed Aspects of Previous Roles
(00:39:04) A Busy Week: Advising, Designing, Investing
(00:43:42) Improving Open Source
(00:45:24) How to Decide What to Work On
(00:46:28) What he’s learning now
(00:47:56) Excitement in Career and Life
(00:48:29) Using ChatGPT for Learning
(00:50:27) Future Impact Goals
From financial analyst to director of analytics, how to get promoted quickly, 7 elements of influence - Christopher Fricker - The Data Scientist Show #085
Christopher Fricker is a senior director in analytics and BI at Renaissance Learning. He started his career in finance and later became a data science consultant working with Meta, Netflix, and pre-IPO tech companies doing analytics. We talked about the mental models that helped him grow from a finance analyst to an analytics leader.
Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Chris’ LinkedIn: https://www.linkedin.com/in/christopherfricker/
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/
(00:00:00) Introduction
(00:01:46) How to get promoted quickly
(00:08:40) Power vs authority
(00:11:21) First principles thinking
(00:32:34) ROI of a data team
(00:40:29) How to be persuasive
(00:54:52) All data is wrong
(00:56:22) How he audits the data
(01:00:52) How to make someone help you at work
Adapters: the game changer for fine-tuning - Geoffrey Angus - The Data Scientist Show #084
I interviewed Geoffrey Angus, ML team lead at Predibase, to talk about why adapter-based training is a game changer. We started with an overview of fine-tuning and then discussed five reasons why adapters are the future of LLMs. Later we also shared a demo and answered questions from the live audience.
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/
Geoffrey’s LinkedIn: https://www.linkedin.com/in/geoffreyangus
Try finetuning for free: https://pbase.ai/GetStarted
(00:00:00) Intro
(00:01:19) What is Fine-tuning?
(00:08:18) Utilizing Adapters for Finetuning Enhancement
(00:09:50) 5 reasons why adapters are the future of LLMs
(00:26:34) Common Mistakes in Adapters Usage
(00:28:34) Training Your Own Adapter
(00:32:23) Behind the Scenes of the Adapter Training Process
(00:37:51) Config File Guidance for Fine-Tuning
(00:39:41) Debugging Strategies for Suboptimal Fine-Tuning Results
(00:42:23) User Queries: Creating a LoRA Adapter and Future Support
(00:51:06) Key Takeaways and Recap
Landing a job by analyzing Seattle's crime data, from data scientist to founder of interview query, building a lifestyle business - Jay Feng - The Data Scientist Show #083
Jay Feng created a viral project using Seattle crime data and later got into data science. He later founded "Interview Query" helping data scientists get jobs. We'll talk about how he landed his data science job through his blog, and his journey from data scientist to founder. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/
Jay Feng's LinkedIn: https://www.linkedin.com/in/jay-feng-ab66b049/
Jay Feng's YouTube: https://www.youtube.com/c/DataScienceJay
(00:00:00) Introduction
(00:01:11) From engineer to data scientist
(00:03:10) Got a job through a project
(00:05:35) Daliana's portfolio project with Zillow
(00:09:13) From data scientist to entrepreneur
(00:13:19) "Tinder" for job
(00:15:01) How he chose companies to work for
(00:15:56) Why he became an entrepreneur
(00:17:37) How many hours does he work
(00:18:54) Challenges when building "interview query"
(00:20:18) Speed vs scale
(00:22:11) Growth hacks he used
(00:24:22) YouTube vs newsletter
(00:27:21) Lessons he learned as a CEO
(00:29:16) How to grow from tech employee to founder
(00:31:59) How he defines success
(00:34:38) If you have a business idea for Jay
Case studies from the GenAI frontier, scaling ML teams, from biologist to machine learning consultant - Erik Gafni - The Data Scientist Show #082
Erik Gafni builds AI systems and teams. He founded Eventum AI (https://bit.ly/eventum-ai), an ML consulting company working with high-growth startups. We talked about GenAI projects he worked on, how he built production ML systems, how to scale ML teams, and his journey from biologist to ML researcher.
- Interested in working with Erik: https://bit.ly/erik-consulting
- Erik's LinkedIn: https://bit.ly/erik-gafni-LI
(00:00:00) Introduction
(00:01:59) Is GenAI overhyped?
(00:04:28) Accent translation with AI
(00:11:58) Social media app with AI
(00:14:00) Stable diffusion model evaluation
(00:15:57) "Consult-to-hire" model
(00:17:35) AI in biotech
(00:22:46) Self-supervised learning
(00:31:22) How he hires people
(00:33:19) Research vs production
(00:35:57) Is AGI coming?
(00:37:30) New trends in GenAI
(00:41:45) Data quality in GenAI
(00:42:58) Philosophy in LLMs
(00:49:48) OpenAI vs Open Source
(00:53:58) Mistakes he made
(00:57:41) How did he get into ML
Data science job market in 2024, softskills for interviews, AI engineering - Jay Feng - The Data Scientist Show #081
Jay Feng is the CEO of Interview Query, a service that helps data scientists get jobs. Previously he worked as a data scientist at Nextdoor and Monster. We talked about the data science job market, the rise of AI engineering, and the soft skills people overlook during interviews. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/
Jay Feng's LinkedIn: https://www.linkedin.com/in/jay-feng-ab66b049/
Jay Feng's YouTube: https://www.youtube.com/c/DataScienceJay
(00:00:00) Introduction
(00:01:11) Data science job market in 2024
(00:09:13) Build projects with AI
(00:16:19) Softskills in interviews
(00:23:18) Daliana's story on "socializing ideas"
(00:28:38) Common mistakes in interviews
(00:35:30) Product DS vs ML interviews
(00:36:27) Product analytics interview questions
(00:39:18) Career transition in DS
(00:43:04) Jay's career journey
(00:45:38) Is there a principal data analyst?
(00:51:52) AI engineer
(00:54:28) New roles vs obsolete roles in DS
(01:04:46) Is data science dead?
How to handle being laid off (as data scientists), severance negotiation, full-time employment vs independent consultant - The Data Scientist Show #080
We are joined by two data scientists who have firsthand experience with layoffs. We’ll talk about how to negotiate severance packages, how to handle stress, strategies for job hunting post-layoff, and how to reduce risks in full-time employment.
Working with Daliana on personal branding: https://forms.gle/heNuZzaHjaAMQwLu6
Her email: daliana@dalianaliu.com
Guests:
Susan Shu Chang:
Linkedin: https://www.linkedin.com/in/susan-shu-chang/
Newsletter: susanshu.substack.com
Sundar Swaminathan
Linkedin: https://www.linkedin.com/in/sswamina3/
Website: https://www.sundarswaminathan.com/
(00:00:00) Introduction
(00:06:13) Severance Negotiation
(00:20:29) Identity crisis
(00:26:22) Job search after layoff
(00:30:21) Networking
(00:35:23) Risk at pre-seed startups
(00:37:03) How should data scientists pick companies
(00:40:43) What to ask hiring managers
(00:45:01) Does GenAI change interview processes?
(00:47:17) Are data science teams getting leaner?
(00:48:56) Future of data science roles
(00:50:37) Full time employment and job security
(00:53:46) Benefits of full time jobs
(00:58:14) Reduce risk of being laid off
(01:00:43) How to sell yourself
(01:02:43) How to plan your finances
(01:05:09) How to become an independent consultant
From data analyst to sales engineer, personality-based career design, sales skills for data people - Jenny Wu - The Data Scientist Show #079
Jenny Wu is a data analyst turned sales engineer for data products at Hex. We talked about sales engineer vs data analyst, how to design a career based on your personality, and how to transition into a customer-facing role.
Jenny’s LinkedIn: https://www.linkedin.com/in/jenny-wu-...
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/
(00:00:00) Introduction
(00:01:34) What is a Sales Engineer?
(00:09:35) Sales Engineering Day-to-Day
(00:13:09) Challenge in sales
(00:21:37) Traits of Successful Salespeople
(00:30:32) Stakeholder Engagement
(00:36:24) Getting into customer-facing roles
(00:43:55) Quitting her job to travel the world
(00:48:05) Advice on Career Breaks
(00:50:39) Embedding Career and Personal Goals
(00:51:57) How do you achieve happiness?
The future of data science teams, integrating AI into data science workflows, building data apps for stakeholders - Barry McCardel - The Data Scientist Show #078
Barry McCardel is the cofounder and CEO of Hex, a collaborative data workspace. Their customers include Fivetran, Notion, and Anthropic. We talked about what the future of data teams looks like, how to tackle the challenges of data team collaboration, and how to leverage AI in data science workflows.
60-day Free Trial: hex.tech/dsshow
Barry’s LinkedIn: https://www.linkedin.com/in/barrymccardel
(00:00:00) Introduction
(00:01:25) Is AI replacing data scientists?
(00:06:08) Are data science teams getting smaller?
(00:09:54) What is Hex?
(00:11:24) How to communicate with stakeholders
(00:24:29) Should data scientists be full stack?
(00:31:23) How data teams measure ROI
(00:33:35) Quantitative vs qualitative analysis
(00:35:33) When you shouldn't use data? Data vs product intuition
(00:41:39) How to hire your first data team?
(00:48:59) Is the modern data stack dead?
(00:53:55) GenAI in data science workflows
(00:59:03) Future of data scientist
(01:02:30) New features in Hex
Product data science for Microsoft AI, data scientist's role of GenAI, how to deal with burn out - Sid Sharan - The Data Scientist Show #077
Siddhartha Sharan is a Senior Data and Applied Scientist at Microsoft, helping product teams make data-driven decisions. Currently he is working on an AI product built with OpenAI APIs for sentiment analysis. We talked about how he evaluates AI products built with large language models at Microsoft, product data science, and how he went from a business background to data science. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Sid’s LinkedIn: https://www.linkedin.com/in/siddharthasharan/
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/
(00:00:00) Introduction
(00:05:20) How does Microsoft evaluate AI product
(00:16:17) Using OpenAI API for sentiment analysis
(00:25:29) Microsoft data science team culture
(00:26:52) DS, PM collaboration
(00:28:29) Three steps to build trust in data science
(00:30:13) How he got into Microsoft
(00:34:09) Level up in Genetech
(00:36:09) ML engineer vs Product DS
(00:37:43) Core skills in product DS
(00:40:20) Hiring
(00:42:47) How to deal with burnout
(00:45:03) Should you over work to earn trust?
(00:45:44) Daliana's story about first day at Amazon
(00:49:54) Will AI replace data scientists?
(00:51:32) Data scientist's role of GenAI
(00:54:32) How to keep up with GenAI
How she doubled her salary in a year as a data analyst, SQL in the real world, is job hopping bad? - Jess Ramos - The Data Scientist Show #076
Jess Ramos is a Senior Data Analyst at Crunchbase, a LinkedIn Learning Instructor, and a content creator in the data space. She has a bachelor's degree in Math, Spanish, and Business from Berry University and a master's in Business Analytics from the University of Georgia. Today we’ll talk about SQL in the real world, data analyst vs data scientist roles, whether job hopping is bad, and how she negotiated her salary. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Jess’ LinkedIn: https://www.linkedin.com/in/jessramosmsba/
Preparing to Get a Job in Data Analytics: shorturl.at/sCNPT
Solve Real-World Data Problems with SQL: https://bit.ly/3Zq6wnd
Big Data Energy Newsletter: https://bit.ly/46x4rIR
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/
(00:00:00) Introduction
(00:01:24) Why Jess left her job at Freddie Mac
(00:03:25) Is job hopping bad
(00:04:42) How to explain short job stints when interviewing
(00:06:49) Jess's day-to-day work and tech stack
(00:09:15) SQL in the real world
(00:12:10) How to talk data to stakeholders
(00:18:33) How Jess prepares for SQL interviews
(00:28:11) Data analysts vs data scientists
(00:32:11) Choosing a career path
(00:47:19) How to ask recruiter questions
(00:50:15) Jess's LinkedIn content creation journey
(00:59:03) The future of Jess's career
(01:03:42) Jess's favorite books
How he got into machine learning and Gen AI at Amazon, how we went from "enemies" to allies - Mehdi Noori - The Data Scientist Show #075
Mehdi Noori is an applied science manager at the Generative AI Innovation Center at Amazon. I used to work with Mehdi while we were at the Machine Learning Solutions Lab at AWS. Before Amazon, Mehdi was a data scientist working on marketing intelligence. He has a PhD in civil engineering and sustainability from the University of Central Florida. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Mehdi Noori: https://www.linkedin.com/in/mehdi-noori/
Predicting Soccer Goals: https://aws.amazon.com/blogs/machine-learning/predicting-soccer-goals-in-near-real-time-using-computer-vision/
Why she quit her finance job to become a farmer, exploring a different path from the modern life - Misty Arnold - The Data Scientist Show #074
My friend Misty moved to a farm in Portugal after a 20-year career in finance. We talked about her experience moving from busy corporate life to farm life, where she does a lot of manual work: whether it was challenging, how her finances work, and her advice for other people who want to explore a different path outside of modern city life. I hope this episode will give you a different perspective on your career.
Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career. Daliana's Twitter: https://twitter.com/DalianaLiu Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/
(00:00:00) Introduction
(00:11:41) Life on the farm
(00:15:46) Her finance plans
(00:22:55) Her career journey
(00:27:14) What do accountants do
(00:32:29) I thought I would be happy
(00:41:25) Daliana's personal view about finance; when it's enough for you
(00:44:41) Does she feel lonely on a farm?
(00:48:39) What if she didn't leave the corporate world?
(00:54:07) Does she regret her decision
Why he left his MLE job for product data science at Meta, data science at Uber, Linkedin, and Truecar - Pan Wu - The Data Scientist Show #073
Pan Wu is a senior manager of data science at Meta. We talked about why he moved from machine learning to product data science, projects he worked on at Uber, Linkedin, and Meta, and how he transitioned from IC to manager. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Pan’s LinkedIn: https://www.linkedin.com/in/panwu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/
(00:00:00) Introduction
(00:01:30) Why he transitioned from MLE to product DS
(00:07:38) Meta data scientists skill sets
(00:15:49) When his interest shifted from MLE to product DS
(00:18:04) Is MLE more respected?
(00:25:46) A/B testing deep dives in 3 steps
(00:28:21) Built a tool at Linkedin
(00:35:52) How to sell your project
(00:41:07) Junior vs senior data scientist
(00:43:24) From staff data scientist to manager
(00:45:18) Explore being a manager
(00:46:24) Cultures in Uber, Linkedin, TrueCar
(00:52:09) Data science over the past 10 years
(00:55:06) MLE vs DS fun and frustration
(00:57:26) Product DS reality
(00:59:10) Learning new skills
(01:01:39) Mistakes he made
(01:06:34) Future of data science
(01:08:04) Will data scientists be replaced by AI
(01:09:42) Three skills he looks for when hiring
Machine learning in cybersecurity, computer vision in sports, from business analyst to ML engineer - Betty Zhang - The Data Scientist Show #072
Betty Zhang is a data scientist currently working at a cloud security company. Previously she was a data scientist at Amazon Web Services. Today we’ll talk about her computer vision projects in sports, data science use cases in cybersecurity, her path from business major to data scientist, and her experience working in startups vs big tech companies. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Betty’s Linkedin: https://www.linkedin.com/in/betty-zhang-0bb63731/
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/
(00:00:00) Introduction
(00:01:21) Computer Vision Project in Sports at AWS
(00:12:28) Challenges in computer vision
(00:14:02) Time allocation for ML projects
(00:15:22) 3 key skills for computer vision
(00:17:20) From business analyst to ML engineer
(00:18:14) How she got her data scientist job through Linkedin
(00:21:32) How she got into Amazon
(00:22:17) Three tech skills needed during Amazon interviews
(00:26:11) Why she joined a Cyber Security startup
(00:27:22) Three cybersecurity use cases
(00:29:47) Anomaly detection
(00:30:40) ML for cybersecurity
(00:34:43) Tech stacks Amazon vs Startups
(00:39:35) Startups vs big tech
(00:45:56) Balance learning and impact
(00:48:35) Advice for new data scientists
Stop abusing A/B testing, toxic experimentation culture, how to run A/B tests with rigor - Che Sharma - The Data Scientist Show #071
Che Sharma came back to discuss toxic behaviors in experimentation culture and provide actionable advice on how to handle those situations and how to maintain rigor and integrity when designing and analyzing A/B tests.
Che was the 4th data scientist at Airbnb, later he joined Webflow as an early employee. In 2021 he founded Eppo, a next-gen A/B experimentation platform designed for modern data and product teams to run more trustworthy and advanced experiments. We talked about A/B testing best practices, A/B testing for ML models, and Che’s career journey. Reach out to Che: https://www.linkedin.com/in/chetanvsharma/
Academia vs. Industry for Machine Learning, Research at Uber AI Labs, ML for Wind Farms - Jason Yosinski - The Data Scientist Show #070
Jason Yosinski was a founding member of Uber AI Labs. He is also a co-founder of WindscapeAI, a company dedicated to using custom sensor networks and machine learning to increase the efficiency and sustainability of wind farms. Jason holds a PhD in computer science from Cornell University. We talked about his experience at Uber AI, his research in deep learning, and ML for wind farms. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career. Jason’s Website: https://yosinski.com/ Jason’s LinkedIn: https://www.linkedin.com/in/jasonyosinski/ Daliana's Twitter: https://twitter.com/DalianaLiu Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu
(00:00:00) Introduction
(00:06:06) His advice for Uber ML teams
(00:16:03) From research to industry
(00:20:24) ML for wind farms
(00:25:40) Metrics for wind energy prediction
(00:29:23) Start with a small dataset
(00:32:00) ML in academia vs. the industry
(00:33:24) Do you need a PhD for ML?
(00:38:14) Daliana's story about grad school
(00:41:37) The value of a PhD
(00:43:13) ML Collective
(00:48:36) Technical communication
(00:57:21) ML Skillsets
(00:59:45) Future of machine learning
(01:05:23) Personal development: Hoffman process
(01:15:13) Do things that excite you
Ads forecasting at Netflix and Spotify, how to build your personal moat - Jeff Li - The Data Scientist Show #069
Jeff Li is a senior data scientist at Netflix, focusing on ads forecasting. Previously he was a data science manager at Spotify, where he worked on supply forecasting, demand forecasting, and data infrastructure. He studied business at the University of Southern California. We talked about ads forecasting, the manager vs IC career path, and culture at Spotify vs Netflix vs DoorDash. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career. Jeff Li’s LinkedIn: https://www.linkedin.com/in/lijeffrey/ Daliana's Twitter: https://twitter.com/DalianaLiu Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu
(00:00:00) Introduction
(00:00:45) Got into data science from poker and consulting
(00:07:54) Ads forecasting at Netflix and Spotify
(00:13:30) From IC to manager to IC
(00:14:53) How to measure forecasting models
(00:21:58) Collaborating with stakeholders in sales
(00:29:44) How he became an expert in ads forecasting
(00:49:57) Impact sizing at DoorDash
(00:57:34) Company culture differences (DoorDash, Spotify, Netflix)
(01:12:47) How he wants to grow his career
A/B testing at Airbnb, building next-gen experimentation platform at Eppo - Che Sharma - The Data Scientist Show #068
Che’s LinkedIn: www.linkedin.com/in/chetanvsharma/
Try Eppo for A/B testing: www.geteppo.com/
Daliana's Twitter: twitter.com/DalianaLiu
Daliana's LinkedIn: www.linkedin.com/in/dalianaliu
(00:00:00) Introduction
(00:01:26) Getting started in data science at Airbnb
(00:03:08) Keys to successful A/B testing
(00:06:53) Interpreting and communicating A/B test results
(00:15:00) A/B testing best practices; testing machine learning models
(00:41:39) Centralizing experiment analysis
(00:53:46) Preparing data scientists for the future
(00:59:33) Developing communication skills as a data scientist
(01:08:43) Transitioning from individual contributor to manager
(01:12:28) The future of experimentation
From data scientist@Meta to full-time YouTuber (500k+ sub), AI engineering, future of work - Tina Huang - The Data Scientist Show #067
We talked about self-learning, productivity, how Tina navigates her career change and how she thinks AI could change the future of work.
Tina's YouTube: www.youtube.com/@TinaHuang1
Lonely Octopus: www.lonelyoctopus.com
Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Tina Huang is a data scientist turned YouTube creator with 500k subscribers. She is the founder of Lonely Octopus, an online program helping people gain data science, AI, and freelancing skills. She originally studied pharmacology before transitioning into tech, completing a master's degree in computer science at UPenn.
(00:02:38) Transitioning from Data Science to Content Creation
(00:06:29) Preparing for Data Science Interviews
(00:10:59) Starting a YouTube Channel
(00:14:18) Building Multiple Income Streams
(00:17:35) Getting Started with AI Skills
(00:29:29) Advice for Starting YouTube
(00:34:47) Improving Storytelling Skills
(00:36:58) Overcoming Procrastination
(00:42:33) The Future of Work
(01:47:08) Looking to the Future
(01:26:49) Income Breakdown
Making LLMs hallucinate less, how to diagnose ML models, from PM in Google AI to CEO of Galileo - Vikram Chatterji - The Data Scientist Show #066
Vikram is the co-founder of Galileo, an AI diagnostics and explainability platform used by data science teams building NLP, LLM, and computer vision models across the Fortune 500 and high-growth startups. Prior to Galileo, Vikram led Product Management at Google AI, where his team built models for the Fortune 2000 across retail, financial services, healthcare, and contact centers. He has a master's degree from the School of Computer Science at Carnegie Mellon University. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Resources:
LLM Studio: https://www.rungalileo.io/blog/announcing-llm-studio
Galileo: https://www.rungalileo.io/
Blog on LLM Hallucination: https://thesequence.substack.com/p/guest-post-stop-hallucinations-from
Vikram Chatterji’s LinkedIn: https://www.linkedin.com/in/vikram-chatterji/
"The Mom Test": https://www.amazon.com/The-Mom-Test-Rob-Fitzpatrick-audiobook/dp/B07RJZKZ7F
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu
(00:00:00) Introduction
(00:04:24) How he got into machine learning
(00:06:53) Diagnosing large language models
(00:09:56) Addressing model hallucination
(00:12:46) Metrics for measuring hallucination
(00:17:30) From Google AI to starting Galileo
(00:24:08) Developing LLMs and putting them into production
(00:32:51) Galileo's diagnostics and explainability platform
(00:43:16) Advice for data scientists when joining a startup
Data Science "Mixed Martial Arts", applied reinforcement learning, scaling AI workloads using Ray - Max Pumperla - The Data Scientist Show #065
Max Pumperla designed his own career path in data science. He is a freelance software engineer at Anyscale and also a data science professor. We talked about reinforcement learning, open source contributions, Ray for data scientists, and his view on the data scientist's role. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Max’s LinkedIn: https://www.linkedin.com/in/max-pumperla-a8099354/
Max's GitHub: https://github.com/maxpumperla
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu
(00:00:00) Introduction
(00:09:19) How he got a remote job through Twitter
(00:14:06) Introduction to Ray
(00:18:52) Reinforcement learning
(00:23:56) Key lessons on integrating customer feedback
(00:35:12) Flaws in data science job titles
(00:45:51) How to be irreplaceable as a data scientist
(00:48:55) An unconventional career path as a data scientist
(01:12:24) Productivity and work-life balance
(01:28:10) Advice for building a personal brand
Uber's ML Systems (Uber Eats, Customer Support), Declarative Machine Learning - Piero Molino - The Data Scientist Show #064
Piero Molino was one of the founding members of Uber AI Labs. He worked on several deployed ML systems, including an NLP model for Customer Support, and the Uber Eats Recommender System. He is the author of Ludwig, an open source declarative deep learning framework. In 2021 he co-founded Predibase, the low-code declarative machine learning platform built on top of Ludwig.
Piero's LinkedIn: https://www.linkedin.com/in/pieromolino
Predibase free access: bit.ly/3PCeqqw
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu
(00:00:00) Introduction
(00:01:54) Journey to machine learning
(00:03:51) Recommender system at Uber Eats
(00:04:13) Projects at Uber AI
(00:09:34) Uber's customer obsession ticket system
(00:16:01) How to evaluate online-offline business and model performance metrics
(00:17:16) Customer Satisfaction
(00:28:38) When do you know whether a project is good enough
(00:41:50) Declarative machine learning and Ludwig
(00:45:32) Ludwig vs AutoML
(00:54:44) Working with Professor Chris Re
(00:58:32) Why he started Predibase
(01:07:56) LLM and GenAI
(01:10:17) Challenges for LLMs
(01:22:36) Advice for data scientists
(01:34:29) Career advice to his younger self
Data science in transportation, the intersection of operations research and ML - Holger Teichgraeber - The Data Scientist Show #063
Holger Teichgraeber is a Data Science Manager at Archer Aviation. Previously, he worked at Convoy as a Research Scientist on their trucking marketplace, and at various companies in the energy space. Holger has a Bachelor's degree in Mechanical Engineering from Aachen, Germany, and a Master's and Ph.D. from Stanford University, with a research focus on machine learning and optimization applied to energy systems. He regularly writes on LinkedIn, with the goal of showing how to build valuable products at the intersection of machine learning and optimization in production. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Holger's LinkedIn: https://www.linkedin.com/in/holgerteichgraeber/
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu
(00:00:00) Introduction
(00:01:28) How he got into operations research
(00:02:39) Operations research vs data science
(00:04:37) Trucking optimization at Convoy
(00:08:42) Optimization problem
(00:10:18) Strategic planning on air mobility at Archer
(00:13:50) Using simulation and solving a problem
(00:16:45) Big data science work vs smaller data science work
(00:21:23) Stakeholder management
(00:29:28) IC vs Manager
(00:32:04) Advice on promotion
(00:39:12) Work cultures in Germany and the US
(00:41:16) How to handle tight deadlines
(00:43:21) Important feedback from his work
(00:44:14) How to plan projects
(00:44:45) Next big challenge for data science teams
(00:45:40) Career growth in the next few years
(00:46:01) Connect with Holger
Tackling data quality issues, 5 pillars of data observability, from management consultant to CEO of Monte Carlo - Barr Moses - The Data Scientist Show #062
Barr Moses is a consultant turned CEO & Co-Founder of Monte Carlo, a data reliability company. She started her career as a management consultant at Bain & Company and a research assistant at the Statistics Department at Stanford University. Later, she became VP of Customer Operations at customer success company Gainsight, where she built the data and analytics team. She also served in the Israeli Air Force as a commander of an intelligence data analyst unit. Barr graduated from Stanford with a B.Sc. in Mathematical and Computational Science. Today, we’ll talk about Barr’s career journey, data reliability and observability, and what it means for data teams. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science.
Barr's LinkedIn: https://www.linkedin.com/in/barrmoses/
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu
(00:00:00) Introduction
(00:01:24) How she got into data science
(00:08:26) Frameworks for data-driven decisions
(00:11:20) Is a customer support ticket always bad?
(00:15:20) How to quickly find out what is true
(00:20:17) Struggles in the data team
(00:23:37) Daliana’s story about lineage
(00:28:00) People stressed about data
(00:28:09) Netflix was down because of wrong data
(00:30:40) Common issues with data quality
(00:33:14) 5 pillars of data observability
(00:39:14) How does Monte Carlo help data scientists
(00:43:08) Build in-house vs adopt tools
(00:45:48) How Daliana fixed a data quality issue
(01:02:44) How to measure the impact of the data team
(01:09:09) Mistakes she made
(01:15:28) Beat the odds
Is search dead? Google vs ChatGPT, from Google Search to enterprise search at Glean, machine learning in search, tech layoffs - Deedy Das - The Data Scientist Show #061
Deedy Das is a founding engineer at Glean, an enterprise search startup. Previously, he was a Tech Lead at Google Search working on query understanding and the sports product in New York, Tel Aviv, and Bangalore. Before that, he was an engineer at Facebook New York and graduated from Cornell University. Outside of work, Deedy writes on his blog. He published a viral resume template and his work on exposing grading flaws in the Indian education system. He also enjoys running marathons, road cycling, and playing cricket. Today we’ll talk about the search projects he worked on at Google, why he left Google, his current work at Glean, and his thoughts on whether Google is doomed because of ChatGPT. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science.
Deedy's Twitter: https://twitter.com/debarghya_das?s=20
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu
(00:00:00) Introduction
(00:01:52) What is search
(00:04:33) Query understanding
(00:12:46) Google vs ChatGPT
(00:18:24) Fixing a bug for Sundar Pichai
(00:27:33) Why he left Google
(00:30:32) How to get into search
(00:34:38) Enterprise search at Glean
(00:46:55) Advice for people who got laid off
(00:48:41) What do search engineers do
(00:51:37) How he evaluates candidates
(00:53:58) Future of search
(00:57:16) Why the web is declining
(00:59:25) Copilot and AI-powered developer tools
(01:03:46) Indian startup ecosystem
(01:07:45) India vs Silicon Valley
(01:09:48) How he grew 30k followers on Twitter
(01:13:28) Daliana and Deedy’s challenge with social media
(01:19:31) Career mistakes he made
The 100-hour work week of a self-taught machine learning researcher, how he got into Google Brain, why he started Omni - Jeremy Nixon - The Data Scientist Show #060
Jeremy Nixon is a machine learning researcher, software engineer, and startup founder. Previously he was a software engineer at Google Brain working on deep learning. Now, he is the co-founder and CEO of Omni, building an immersive information retrieval system for you and your team. He studied applied math at Harvard University. Today we’ll talk about how he got into Google Brain, his 3-month self-learning plan to learn machine learning, his startup, and how he has executed his goal relentlessly since 2016. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science.
Jeremy's Twitter: https://twitter.com/JvNixon
Jeremy's Blog: https://jeremynixon.github.io/
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu
Jeremy's LinkedIn: https://www.linkedin.com/in/jeremyvnixon
(00:00:00) Introduction
(00:01:50) Research in Google Brain
(00:03:37) How he got into Google Brain
(00:07:56) His 3-month plan to learn ML
(00:17:55) The 100-hour workweek
(00:33:26) What if he is tired
(00:39:59) Why he founded Omni
(00:44:24) Data science problems in Omni
(00:54:42) Future of machine learning
(00:57:51) Silicon Valley is very accessible
(00:59:47) The golden handcuffs
(01:06:58) From data scientist to full-stack engineer
(01:09:06) Close-minded data scientists
(01:24:10) Advice to ML learners
(01:29:41) Something he wished that he did when he was younger
(01:37:25) The future of his career
(01:42:17) Connect with Jeremy
The power of error analysis, tree models for search relevancy, what ChatGPT means for data scientists - Sergey Feldman - The Data Scientist Show #059
Sergey Feldman is the head of AI at Alongside, providing mental health support for students. He is also a Lead Applied Research Scientist at Allen Institute for AI, where he built an ML model that improved search relevancy for scientific literature. Sergey has a PhD in Electrical and Electronics Engineering from the University of Washington. Today we’ll talk about machine learning for search, his consulting project for the Gates Foundation, AI for mental health, and career lessons. Make sure you listen till the end. If you like the show, subscribe, leave a comment, and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Sergey's LinkedIn: https://www.linkedin.com/in/sergey-feldman-6b45074b/
Data Cowboys: http://www.data-cowboys.com/
Sergey Feldman: You Should Probably Be Doing Nested Cross-Validation | PyData Miami 2019: https://www.youtube.com/watch?v=DuDtXtKNpZs
December 4th, 2018 - Breakfast with WACh with Dr. Sergey Feldman, PhD: https://www.youtube.com/watch?v=vA_czRcCpvQ
(00:00:00) Introduction
(00:01:24) Machine learning skeptic
(00:03:02) Tree-based models for search relevance
(00:14:34) How to do error analysis
(00:19:20) Nested cross-validation
(00:21:34) Model evaluation
(00:30:43) Error analysis common mistakes
(00:33:37) How to avoid overfitting
(00:35:56) Consulting project with Gates Foundation
(00:41:16) Tree-based models vs linear models
(00:45:19) Working with non-tech stakeholders
(00:50:20) Chatbot for teen’s mental health
(00:54:32) Can ChatGPT provide therapy?
(00:58:12) How he got into machine learning
(01:02:12) How to not have a boss
(01:03:46) Feelings vs Facts
(01:09:02) Future of machine learning
(01:11:30) How to prepare for the future
(01:13:39) AutoML
(01:17:12) His passion for large language models
How to build data science muscle memory, DeepChecks -- an open source ML testing suite - Philip Tannor - The Data Scientist Show #058
Philip Tannor is the Co-Founder and CEO of Deepchecks, a Python package to run checks for machine learning models. Previously, he was the head of the data science group at the Israel Defense Forces. He has a master's degree in engineering from Tel Aviv University; his thesis was about a new algorithm that combines neural networks with gradient-boosting decision trees. Today we’ll talk about his career journey, how to build your data science muscle memory, the algorithm he worked on, and how to check ML models. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science and career.
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Philip’s LinkedIn: https://www.linkedin.com/in/philip-tannor-a6a910b7/?originalSubdomain=il
Augboost: https://medium.com/@ptannor/augboost-like-xgboost-but-with-few-twists-e4df4017a5c4
(00:00:00) Introduction
(00:01:17) How did he get into ML
(00:02:52) Data science in the military
(00:08:15) How to take feedback
(00:13:24) Handling criticism
(00:15:12) What he worked on
(00:18:18) Testing deployment
(00:21:28) How to build the data science muscle memory
(00:27:09) Improving the skills of data scientists
(00:30:42) His thesis in grad school
(00:36:59) Combine NN and gradient boosting
(00:40:05) AugBoost
(00:41:15) Tools he uses
(00:45:58) Deepchecks
(00:50:46) Most challenging part of building Deepchecks
(00:52:05) How can people contribute
(00:53:40) Behind the scenes
(00:56:09) Deciding how to fix or improve the model
(01:00:49) Advice for those who want to create open-source projects
(01:04:07) Features to add for the enterprise product
(01:06:57) About his life and career right now
(01:08:27) Connect with Philip
The Daliana Special: how I got into data science, 5 things only experienced data scientists know, and why I started "The Data Scientist Show" - Daliana Liu #057
Who is Daliana? This is a conversation I had in 2021 with Harpreet Sahota. I talked about my unexpected journey to data science all the way back in high school, things I wish I had known earlier about my career, the projects I worked on, what it is like to be a quote-unquote influencer on LinkedIn, and more. If you want more content from me, I write about data science, career, and nerdy jokes on my LinkedIn, and you can subscribe to my very infrequent newsletter at dalianaliu.com. I’m curious what you think about this episode; leave a comment on YouTube or send a DM on LinkedIn. Hope you enjoy the Daliana special!
Daliana's Newsletter: https://dalianaliu.com
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Harpreet's LinkedIn: https://www.linkedin.com/in/harpreetsahota204/
The Artists of Data Science podcast: https://theartistsofdatascience.fireside.fm/
(00:00:00) Introduction
(00:02:52) Where did Daliana grow up
(00:05:19) Daliana in high school
(00:07:11) How she got into data science
(00:11:36) Why writing is important for data scientists
(00:15:51) How to write better
(00:20:56) Career lessons you didn't learn in school
(00:27:40) Imposter syndrome
(00:31:29) Day-to-day work as a data scientist
(00:36:16) Most common mistakes data scientists make
(00:39:41) Data Analyst vs. Data Scientist
(00:42:30) What is the science in data science?
(00:44:51) Can everyone be a data scientist
(00:49:21) LinkedIn profile tips for job search
(00:52:59) How she creates content
(00:54:11) Being a data scientist "influencer"
(00:56:04) Why she started "the data scientist show"
(01:01:16) Women in data science
(01:06:39) What's her legacy
(01:09:43) What is she reading
(01:14:21) Connect with Daliana
How he carved his own path at Airbnb, from data engineer to CEO of Mage - Tommy Dang - The Data Scientist Show #056
Tommy Dang is the Co-founder and CEO of Mage, a data ingestion and transformation pipeline for data engineers (https://github.com/mage-ai/mage-ai). Previously, he worked on data engineering and machine learning engineering at Airbnb. He has a Bachelor of Science degree from UC Berkeley, where he studied economics, history, and sociology. Today we’ll talk about how he learned engineering and machine learning after college, the data tools and ML tools he built at Airbnb, performance reviews, and how he navigates his career. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science and career.
Tommy’s LinkedIn: https://www.linkedin.com/in/dangtommy/
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
(00:00:00) Introduction
(00:01:28) Get into computer science from non-tech background
(00:03:08) How he started his first project
(00:04:07) Projects at Airbnb
(00:06:09) Speed vs Quality when building data pipelines
(00:16:34) How to deal with ad hoc requests
(00:21:00) How did he learn machine learning
(00:24:04) How he convinced data scientists to teach him ML
(00:25:15) Performance review
(00:27:11) Don’t let your job title limit your career
(00:28:29) Why he started his company
(00:31:38) Build your own tool vs use open source solutions
(00:33:12) Transitioning from an engineer to a CEO
(00:34:50) Earn trust from internal stakeholders
(00:36:27) Career advice
(00:41:31) How he carved his own path at Airbnb
(00:46:00) How did he learn to be a good engineer
(00:47:10) Best advice for data scientists or engineers
(00:48:41) Most important quality of data scientists or engineers
(00:51:51) Design principles
(00:58:51) Future of tools
(01:01:00) What does he think about his future career
(01:05:05) Inspiration of Tommy
How to effectively test and debug machine learning models, from ML engineer@Apple to startup founder - Gabriel Bayomi - The Data Scientist Show #055
Gabriel Bayomi is the Co-Founder of OpenLayer, a tool that tests and debugs machine learning models. OpenLayer was in Y Combinator’s 2021 batch. Previously he was a machine learning engineer at Apple working on Siri. He has a master's degree in computer science from Carnegie Mellon. He is passionate about Natural Language Processing, Machine Learning, and Computational Social Science. We talked about how to test and debug machine learning models, his experience at Apple, and career lessons. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science and career.
Gabriel’s LinkedIn: https://www.linkedin.com/in/gbayomi
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
(0:00) Intro
(00:01:39) How he got into machine learning
(00:06:43) His experience at Apple, Siri
(00:15:55) How to validate the solution
(00:19:39) Benefits of using external error analysis framework
(00:21:30) How to build a model evaluation pipeline
(00:28:26) Don’t overfit the subset of data
(00:33:19) Your validation set shouldn’t be fixed
(00:41:03) Become one with data
(00:44:05) Three model interpretability libraries you should use
(00:50:47) Common mistakes people make in model validation
(00:53:33) How to create an adversarial test
(00:55:43) How to check data quality
(01:06:46) Transition from engineer to executive
(01:10:04) Things he learnt from his favorite coworker
(01:17:57) How job roles would evolve
From Amazon research scientist to head of data product at Vestiaire Collective, why data science projects fail, how to be a good communicator - Alisa Kim - The Data Scientist Show #054
Alisa Kim is the head of data product at Vestiaire Collective. Previously, she was a research scientist at Amazon Web Services, where we worked on the same team in the Machine Learning Solutions Lab and collaborated on projects. Before that, she was a consultant and worked in analytics and investment banking. She has a Ph.D. in Econ AI and has worked in various industries on multiple continents. She's someone I really enjoyed working with. We talked about her journey, the projects she worked on, and the lessons she learnt. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Alisa's LinkedIn: https://de.linkedin.com/in/alisakolesnikova
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
(0:00) Intro
(00:01:38) how she got into data science
(00:04:38) day-to-day at AWS ML Solutions Lab
(00:08:00) AWS leadership principles
(00:16:34) challenges the consultant faces when working with external customers
(00:23:36) from AWS to Vestiaire Collective
(00:37:54) how to build a better data product
(00:44:17) how data scientist can align with business stakeholders
(00:57:52) from tech to business
(01:01:33) how to develop communication skills
(01:09:17) increase visibility of the data science team
(01:17:22) being proactive vs being passive in chasing opportunities
(01:24:06) get feedback from your "nearest neighbors"
(01:25:37) how to set boundary at work
(01:38:48) mistakes she made in her career
(01:48:25) how to manage disagreement
(01:57:53) future of data science
The lessons from almost losing a million dollars for his company, how to build good data assets and get buy-in from the leadership - Mark Freeman - The Data Scientist Show #053
Mark Freeman is a community health advocate turned data scientist. His mission is to improve the well-being of people, especially among those marginalized. He is currently a senior data scientist at Humu, where he builds data tools that drive behavior change to make work better. He has a master's degree from the Stanford School of Medicine in clinical research, experimental design, and statistics. He also has a certificate in entrepreneurship from the Business School of Stanford. In his free time, he volunteers with a Bay Area Community Health Advisory Council. He also plays Men's Division III Rugby. We talked about building data tools, data engineering skills for data scientists, how to pitch a project, and his career journey. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Mark's LinkedIn: https://www.linkedin.com/in/mafreeman2/
Chapters:
(0:00) Intro
(00:03:05) Our experience using R - 1000 lines of code
(00:09:22) Entrepreneurship within a company
(00:16:25) DBT and modern data stack
(00:20:15) Tools don’t matter (in interviews)
(00:21:09) Things DE enjoys but DS doesn’t
(00:24:55) How to work with different stakeholders
(00:30:32) Common SQL mistakes
(00:33:34) SQL vs Python vs R
(00:35:26) T.R.I.B.E framework for projects
(00:40:43) Meet the stakeholders where they are
(00:42:40) Use feedback to get buy-in from collaborator
(00:46:36) How to pitch a new idea
(00:49:45) Don’t lead with solution, lead with the problem
(00:51:03) How to get buy-in from the leadership
(00:57:56) Present an idea as if the audience came up with it
(00:58:41) How to iterate a project
(01:00:27) How he almost lost 1 million dollars for his company
(01:02:07) Things he learned from his manager
(01:04:19) Things that help people make changes effectively
(01:06:05) Things he learned from mentoring
(01:12:19) Mental Health and anxiety
(01:17:12) Web3
(01:20:14) Why he cares about community health
(01:25:40) "Soul-searching" on his future
(01:28:36) Why he writes on LinkedIn
(01:30:04) Future of data science
From deep learning architect at AWS to PM in AI product - Abhi Sharma - The Data Scientist Show #052
Abhi Sharma started his career as a software engineer at Amazon Lab126, building cloud services for Alexa. Later he transferred to Amazon Web Services as a deep learning architect. We used to work on the same team at the Machine Learning Solutions Lab in AWS. Currently, he is a product manager responsible for machine learning products, such as the chatbot, at Chime. We talked about how he transitioned from software engineer to deep learning architect to product manager, cool projects he worked on, and our shared experiences at Amazon. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Abhi's LinkedIn: https://www.linkedin.com/in/abhivs/
Highlights:
(0:00) Intro
(00:01:48) from SWE to deep learning architect to product manager
(00:12:44) day-to-day as a product manager at Chime
(00:19:46) how he collaborates with different data personas
(00:27:21) how to negotiate for more time for projects with leaders
(00:33:59) some timelines are negotiable
(00:38:00) most impactful project he worked on
(00:44:22) how to evaluate KPI, and not game the system
(00:48:02) think about development in the beginning
(00:50:29) data scientists need to educate the business and demystify the buzz words
(00:54:19) Amazon’s Think Big Challenge
(00:57:09) Never solve the problem twice
(01:00:25) How to transition to a product manager
(01:07:48) why he wanted to become a PM
(01:25:35) How data scientists can learn from PMs
What data scientists need to know about MLOps principles, from GPA 2.6 to Sr. MLOps Engineer@Intuit - Mikiko Bazeley - The Data Scientist Show #051
Mikiko Bazeley is a senior software engineer working on MLOps at Intuit. Previously, she worked as a growth hacker and a data analyst in finance, then became a data scientist, and later transitioned into machine learning. She has a bachelor's degree in economics and biological anthropology, and completed a data science bootcamp at Springboard. She is a tech writer for NVIDIA and she’s working on a course on MLOps. Her goal is to demystify MLOps and show how to develop high-quality ML products from scratch. You can find her content on LinkedIn and YouTube. Today, we’ll talk about useful engineering principles for data scientists, MLOps, and her career journey. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science and career.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Mikiko's LinkedIn: https://www.linkedin.com/in/mikikobazeley/
Highlights:
(0:00) Intro
(00:02:00) from GPA 2.6 to data scientist
(00:05:27) her experience at Mailchimp
(00:11:44) her frustrations on Cookiecutter project
(00:14:09) the pain point of a data scientist working with engineering
(00:21:01) 2 MLOps patterns
(00:25:52) challenges about her work
(00:29:49) the basic engineering skills a data scientist should have
(00:32:46) the tests a data scientist should write
(00:37:42) how an MLOps engineer collaborates with a data scientist
(00:45:28) what makes a good MLOps engineer
(00:52:33) AWS vs GCP vs Azure
(00:58:59) how a data scientist collaborates with an MLOps engineer
(01:05:19) suggestions for building a model on a large scale
(01:09:11) how she learnt MLOps on her own within 6 months
(01:17:32) learn from code review
(01:19:17) MLOps books and resources she recommended
(01:24:13) mistakes she made earlier in her career
(01:31:29) common mistakes people make during career change
(01:38:22) "Start with the end in mind"
(01:41:16) the future of MLOps
(01:46:23) how she sees her career growth
(01:56:40) how she continues learning new skills
(02:00:09) what she is excited about her career and life
Bayesian thinking in work and life, ad attribution models and A/B testing, machine learning@Foursquare - Max Sklar - The Data Scientist Show #050
Max Sklar is an independent engineer and researcher. Previously, he was an Engineering and Innovation Labs Advisor at Foursquare, after 7 years at the company as a machine learning engineer, where he worked on ad attribution, the recommendation engine, and ratings. He is the host of The Local Maximum podcast. Max studied CS at Yale and holds a master's degree in information systems from New York University. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Max's LinkedIn: https://www.linkedin.com/in/max-sklar-b638464/
Max’s website: localmaxradio.com/about
Interviews he mentioned during the podcast:
Andrew Gelman, Statistics at Columbia University
Johnny Nelson on Free Speech and Moderation online
Stephanie Yang talking about Foursquare's Venue Rating System
Dennis Crowley: on Labs, on Innovation
Sophie Carr (Bayesian Mathematician)
Other Episodes Mentioned
Interview on Learn Bayesian Statistics
Highlights:
(0:00) Intro
(00:01:23) from computer science to machine learning
(00:05:35) Bayesian methods in rating systems
(00:14:53) how to choose a Bayesian prior
(00:20:10) how to deal with p-hacking
(00:26:57) causality model in ad attribution
(00:35:20) Bias-correction methods
(00:45:43) negative lift in advertising
(00:51:05) unexpected consumer behaviors
(00:52:08) why he decided not to climb the "engineer ladder"
(00:56:46) the challenges of having 5 managers in a year
(01:01:38) using the 3rd-party software vs building his own
(01:04:18) how he approaches ML problems
(01:07:51) his tech stack
(01:09:25) his advice on learning machine learning
(01:12:40) projects he is working on
(01:17:10) Bayesian for his life decisions
(01:22:00) how writing helps him
(01:23:48) the confusion, stress and excitement in his career
Why he quit a $500k+ machine learning job at Meta (Facebook): a candid review of his experience, mistakes, and ML best practices - Damien Benveniste - The Data Scientist Show #049
Damien Benveniste is a data scientist and software engineer. Previously, he was a machine learning tech leader and mentor. He has worked for almost ten years in different machine learning roles in industries such as AdTech, market research, e-commerce, and health care. He has a Ph.D. in physics from Johns Hopkins University and is now working towards co-founding his own startup in the employee engagement space. We talked about his career journey, how he solved challenging problems, and his advice for new data scientists and engineers. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Damien's LinkedIn: https://www.linkedin.com/in/damienbenveniste/
(00:00) Intro
(00:01:17) from quantitative trading to machine learning
(00:07:52) his experience at Meta
(00:21:16) automated machine learning
(00:28:52) model paradigm
(00:32:47) the productivity-oriented culture at Meta
(00:41:42) short-term gain vs long-term goal
(00:44:38) things he liked at Meta
(00:51:54) the project that shaped his career
(01:03:56) the importance of having a baseline for ML models
(01:09:12) why he time-boxed everything
(01:16:25) test the model in production
(01:20:05) experimental design for ML
(01:23:25) the most challenging project he worked on
(01:37:07) best practices for machine learning
(01:48:44) how he sees himself
(02:00:52) lessons he learnt from being laid off
(02:06:45) frustration he had in his previous job
(02:16:14) what he is working on
(02:29:18) the future of machine learning
(02:39:52) things he is excited about
Time series modeling in supply chain, how to master business communication, save the environment with data science - Sunishchal Dev - The Data Scientist Show #048
Sunishchal Dev is a lead data scientist at Booster. He's helping to decarbonize the transportation industry by optimizing last-mile delivery of renewable fuels. Previously, he was a management consultant. On the side, he volunteers with Project Drawdown to model the most effective solutions to climate change. He also mentors future data scientists at Springboard by guiding them through real-world projects. We talked about his career journey, supply chain optimization, and how data science can help the environment. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
(0:00) Intro
(00:01:24) from business to data science
(00:06:36) the big impact of a small improvement
(00:08:50) data engineering vs predictive modeling
(00:11:48) routing optimization
(00:16:27) time series model
(00:21:32) use upsampling to simulate intermittent time series problem
(00:26:20) his modern data stack
(00:28:29) collaborate with engineers
(00:30:06) common mistakes people make in building time series models
(00:37:02) collaborate with truck drivers
(00:40:17) how to become a good communicator
(00:46:30) his experience in mentoring data scientists
(00:51:14) things people cannot learn at school
(00:53:16) the mistakes he made and the things he learnt from his mentor
(00:56:07) how data science can help the environment
Books recommended:
The Pyramid Principle: Logic in Writing and Thinking
The Book of Why: The New Science of Cause and Effect
Influence, New and Expanded: The Psychology of Persuasion
Product data science@Spotify, from management consultant to data scientist, salary negotiation, managing ADHD - Felicia Rutberg - The Data Scientist Show #047
Felicia Rutberg is a product strategy and analytics manager at Snap. Previously, she was a product data scientist at Spotify. She started her career as a management consultant at Accenture. She studied mathematics and cognitive psychology at Vanderbilt University. Felicia reached out to me on LinkedIn because she wanted to share how she became a data scientist while having ADHD. Today we’ll talk about product analytics at Spotify and Snap, her career journey, and ADHD. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Felicia's LinkedIn: https://www.linkedin.com/in/feliciarutberg/
Highlights:
(00:01:29) from management consulting to data science
(00:12:20) financial data analyst at Spotify
(00:20:06) how to do internal job transition
(00:25:57) product data scientist at Spotify in the econometrics team
(00:29:33) how she became more vocal on the creative process
(00:33:48) how to get the last 1% of the work done
(00:38:53) how to ensure the quality of the analysis
(00:50:19) propensity score matching at Spotify
(00:57:09) how to validate causal inference outcomes
(01:00:51) lessons from working with economists
(01:19:16) from Spotify to Snap
(01:27:35) salary negotiation
(01:34:02) day-to-day at Snap
(01:38:33) Spotify vs Snap
(01:44:35) lessons from management consulting that helped her data science journey
(01:47:37) ADHD and self-compassion
(02:02:52) the books she recommended
(02:08:26) her future career
Data science interview trends, from being laid off to landing a data scientist job at Airbnb - Emma Ding - the data scientist show #046
Emma Ding is a data scientist turned career coach. Previously, she was a data scientist and software engineer at Airbnb. I first discovered her through a viral Medium blog post called "How I got 4 data science offers and doubled my income 2 months after being laid off". Today, her mission is to help data scientists land their dream offers by being strategic and efficient in their interview preparation at https://www.datainterviewpro.com/. Among the 80 clients she has worked with, 90% received data scientist job offers from top tech companies such as Meta, LinkedIn, DoorDash, and Robinhood. In the first half of the podcast, we talked about how she doubled her salary and got into Airbnb after she was laid off, and her experience at Airbnb. Then we dive into new trends in data science interviews and her best strategy for getting a data scientist job. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Emma's YouTube: https://www.youtube.com/c/
DataInterviewPro Free product case class: https://www.datainterviewpro.com/product-case-masterclass-registration
Books on causal inference: Mostly Harmless Econometrics and Mastering 'Metrics: The Path from Cause to Effect.
Emma's LinkedIn: https://www.linkedin.com/in/emmading001/
(00:00) Intro
(00:04:24) her strategy to get the data scientist offer after the layoff
(00:07:00) advice for preparing for interviews
(00:14:04) her day-to-day at Airbnb
(00:16:46) things she learnt from her mentor
(00:18:07) from a data scientist to an SDE to a data interview pro
(00:22:12) trends of data science interview
(00:26:48) data scientist tracks: analytics-driven vs algorithms-driven
(00:32:56) SQL interviews: readability and proficiency
(00:35:06) make a study plan, execute it and keep the confidence
(00:41:29) what she teaches at DataInterviewPro
(00:43:45) how to tackle take-home challenges
(00:45:41) how to negotiate salaries
(00:46:56) how to build confidence in the job search process
(00:50:23) how to study different subjects efficiently
(00:54:26) how to transition to data science
(01:00:05) how to remedy mistakes during the interview
(01:03:37) is data scientist still in demand?
(01:08:43) advice for getting ready for the new career
Using ML to tackle disruptive behaviors in gaming@Activision, data science in the metaverse, cyber security - Carly Taylor - the data scientist show #045
Carly Taylor is a senior manager at Activision, leading a team of machine learning engineers to tackle disruptive behaviors in the game ‘Call of Duty’. Previously, she held various roles including machine learning engineer, data scientist, product analyst, and analytical chemist. She has a master's degree in computational chemistry from the University of Colorado. She’s passionate about video games and cyber security. She shares her insights on machine learning, gaming, and career with 33k LinkedIn followers. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Carly's LinkedIn: https://www.linkedin.com/in/carly-taylor0017/
Highlights:
(00:00) Intro
(00:01:14) from chemistry major to data scientist in gaming
(00:05:46) how she tackles disruptive behavior using machine learning
(00:11:38) feature engineering and model drift in fraud detection
(00:16:49) the challenge of dealing with the large scale of data
(00:27:10) data science in the Metaverse
(00:36:08) signal processing and anomaly detection
(00:40:31) dealing with the outliers
(00:45:49) getting buy-in from leadership
(00:49:56) from an IC to a manager
(00:53:36) mentorship, mistakes, and other things she learnt from work
(00:58:48) Python or R?
(01:05:30) how she sees herself grow and how she deals with struggles
(01:07:56) the future of data science in gaming
From lawyer to senior data scientist at Amazon, data science in devices, HR, and real estate, how to 're-invent' yourself - Pauline Chow - the data scientist show #044
Pauline Chow is a data scientist, former attorney, and active transportation advocate. She has worked at banking, fashion, and education start-ups, and at Amazon. Currently, she is the data engineering lead for Thrackle, a blockchain research and modeling company. She has a master's degree in computer science (machine learning) from the Georgia Institute of Technology and a law degree (JD) from the University of Wisconsin. She is also a certified yoga teacher and published writer.
We talked about her projects on three different teams at Amazon: devices, HR, and real estate; how her law degree helped her become a better data scientist; and how she 're-invented' herself. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
- Her author website: www.paulinechowstories.com, or connect with her on Twitter @itspaulinechow. Pauline's LinkedIn: https://www.linkedin.com/in/paulinec/
- A More Beautiful Question: The Power of Inquiry to Spark Breakthrough Ideas -- examples of the purpose of questioning.
- The Four Tendencies by Gretchen Rubin (quiz, book). An interesting framework for considering how different people respond to internal and external expectations and pressures.
- Why only rewarding high-performers can be detrimental to an organization? Wharton People Analytics Conference. Case Studies: Network Analysis. (2015, December 13). https://www.youtube.com/watch?v=0fM6JYC2zfQ
From chemical engineer to data scientist@ExxonMobil, why he left to do data science freelancing, data career jumpstart, Avery Smith - the data scientist show #043
Avery Smith is a data science consultant and career coach at Data Career Jumpstart, and a TA at MIT Professional Education. Previously, he worked on optimization and predictive analytics at ExxonMobil. We talked about his journey from chemical engineering to data analytics, optimization problems in the energy sector, why he left ExxonMobil, and his best advice for people getting into data science. Follow Daliana on Twitter (https://twitter.com/DalianaLiu) for more on data science and this podcast. If you like the show, subscribe and give me a 5-star review :)
Topics:
His first data science projects
His experience with ExxonMobil
Why he left ExxonMobil
Data science consulting
Challenges when working with clients
Why he built his own career coaching program
How LinkedIn helped his career
TA at MIT, MIT's data engineering curriculum
How to build a data science portfolio
Avery's LinkedIn: https://www.linkedin.com/in/averyjsmith/
Applied machine learning research methods, human-machine team, AI strategies, trends in machine learning, how to earn trust - Vin Vashishta - The data scientist show #042
Vin Vashishta is a chief data officer and AI strategist at V Squared, a company he founded in 2012 that provides AI strategy, transformation, and data organizational build-out services.
He teaches data professionals about strategy, communications, business acumen, and applied machine learning research methods. Vin has 130k+ followers on LinkedIn, where he talks about AI, analytics, and strategy. His website: https://www.datascience.vin/ If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Highlights:
(0:00) Intro
(00:03:37) "ML strategy" with 'pricing' as an example
(00:09:45) what is a good metric for ML
(00:13:16) how to translate a business problem into a data problem
(00:23:42) leverage users in the "Human Machine Teaming"
(00:48:22) how he earned the trust
(01:17:31) data science evolution from 2012 to 2022
(01:31:06) how he learns new domain knowledge
(01:36:25) the mistakes he made
(01:42:15) what he learnt from his mentor
Retail store forecasting with video and audio, ML in high frequency trading, from tech to politics, ML in Web3 - Greg Tanaka, the data scientist show #041
Greg Tanaka is a computer scientist turned CEO of an AI company. He started coding when he was 6, studied computer science at UC Berkeley, and has built many machine learning applications. He is the founder and CEO of Percolata, which is developing "Forecast as a Service". He is also a council member in Palo Alto, California, and just finished his campaign for Congress. Today we’ll talk about his career journey, forecasting, and machine learning in blockchain and political campaigns. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Greg's LinkedIn: https://www.linkedin.com/in/gltanaka/, Twitter: https://twitter.com/GregTanaka
Greg's DAO: https://www.gregtanaka.org/dao
Highlights:
(00:02:10) use computer vision, audio, and Wi-Fi fingerprints to forecast the retail store traffic
(00:21:55) why time series forecast is hard
(00:26:39) how he made the forecasting more stable
(00:28:46) how he troubleshot the spikes and drops in data
(00:36:04) human trading vs algorithmic trading
(00:47:36) his vision of machine learning in blockchain
(00:54:57) why he got into politics
(01:05:57) advice for people who are interested in Web3
(01:11:04) AutoML and the future of machine learning
(01:15:36) things he wished he could learn earlier
Weather forecasting with AI, Kaggle tips and tricks, dealing with missing data, deep learning with Jesper Dramsch, The Data Scientist Show #040
Jesper Dramsch is a scientist for machine learning at the European Centre for Medium-Range Weather Forecasts. They have a PhD in machine learning applied to geoscience from the Technical University of Denmark. They are a Kaggle Kernels Expert and TPU star, ranking in the top 81 of 100k worldwide. We talked about weather forecasting, things they learned from Kaggle, how to deal with missing data and outliers, deep learning, Keras vs PyTorch, XGBoost, their struggles as a PhD student, and working in the EU vs the US. Follow @DalianaLiu for more updates on data science and this show.
(00:01:27) how they got into ML
(00:09:10) how they handled missing data
(00:28:34) Transformers are eating the world
(00:49:36) Huber loss is a fantastic metric to deal with extreme values
(00:54:48) their experience with Kaggle competitions
(01:02:59) Kaggle tricks that helped their models perform better
(01:08:18) PyTorch vs Keras
(01:30:30) working in different countries and cultures
Resources shared by Jesper:
The newsletter about missing data:
https://buttondown.email/jesper/archive/towels-have-quite-a-dry-sense-of-humor/
The paper by Gael about missing data:
https://academic.oup.com/gigascience/article/doi/10.1093/gigascience/giac013/6568998
The Huber Loss:
https://en.wikipedia.org/wiki/Huber_loss
Skill Scores:
https://en.wikipedia.org/wiki/Forecast_skill
Brier Skill in Weather:
https://www.dwd.de/EN/ourservices/seasonals_forecasts/forecast_reliability.html
CRPS (Continuous Ranked Probability Score)
ConvNeXt, A ConvNet for the 2020s:
https://arxiv.org/abs/2201.03545
Transformers for ensemble forecasts:
https://arxiv.org/abs/2106.13924
Books I recommend:
https://www.amazon.com/shop/jesperdramsch/list/2DYS5KVR5TX0E
Blog posts I wrote about these books:
https://dramsch.net/tags/books/
Short I made about Test-Time Augmentation:
https://www.youtube.com/shorts/w4sAh9lKyls
Their links: https://dramsch.net/links
Their open PhD thesis: https://dramsch.net/phd
Newsletter: https://dramsch.net/newsletter
Twitter: https://dramsch.net/twitter
Youtube: https://dramsch.net/youtube
Linkedin: https://dramsch.net/linkedin
Kaggle: https://dramsch.net/