The Data Life Podcast

By Sanket Gupta

This is a podcast where we talk all about real-life experiences of dealing with data and machine learning tools, techniques and personalities. We cover not just the technical aspects but also the "life" aspects of working in the field.

Note: Opinions expressed are my own and do not express the views or opinions of my employer.

27: Building Open Source Data Startup with Airbyte CEO, Michel Tricot

We talk with Michel Tricot, who is the Founder and CEO of Airbyte, which is an open source data integration Y Combinator startup. It has raised over $30M in capital and has been growing quite fast. It was a great conversation and I think you will also enjoy it. 🎉

We cover lots of things in the podcast including:

1. Technical aspects of what Airbyte does, how it sits in the ETL/ELT landscape, and how it differs from other tools such as Fivetran, Stitch etc.

2. Data Warehouses being a canonical source of data and how Airbyte helps with bringing the data into the warehouse.

3. How Airbyte works as an open source data tool.

4. Life aspects of running a fast growing start-up including raising capital, hiring etc.

Links to the tools/ services mentioned:

1. Airbyte: airbyte.io

2. Airbyte Slack where you can talk with the team: slack.airbyte.io

3. Dbt for transformation in ELT: getdbt.com

4. Airflow which is a data orchestration tool: https://airflow.apache.org/

5. Astronomer which can host Airflow: https://astronomer.io/

Pay as you use data warehouses:

6. Snowflake Data Warehouse: https://www.snowflake.com/

7. BigQuery Data Warehouse: https://cloud.google.com/bigquery

Set up your own infrastructure:

8. Redshift Data Warehouse: https://aws.amazon.com/redshift/

Oct 11, 2021 (44:56)

26: Building Data Engineering Pipelines at Scale (with Data Warehouse, Spark and Airflow)

Imagine you are at a beach and you are hanging out and seeing all the waves come and go and all the shells on the beach. And you get an idea. How about you collect these shells and make necklaces to sell? Well how would you go about doing this? Maybe you’d collect a few shells and make a small necklace and try to show to your friend. This is where we begin our journey on learning about data engineering pipelines.

Using an example of running a necklace business from shells - we learn about the following data engineering concepts:

1. ETL - Extract Transform Load vs ELT - Extract Load Transform concepts. Why Data Warehouses are great for analytics.

2. Spark for large data processing and hosting / running

3. Data orchestration using Airflow
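
To make the ETL vs ELT distinction concrete, here is a minimal sketch using an in-memory SQLite database as a stand-in for a data warehouse. The shell records, table names and function names are made up for illustration; a real pipeline would use a warehouse and a tool such as dbt for the SQL step.

```python
import sqlite3

# Hypothetical raw "shell" records collected on the beach: (shell_id, kind, grams).
RAW_SHELLS = [
    (1, "conch", 120.0),
    (2, "cowrie", 8.5),
    (3, "broken", 2.0),
    (4, "cowrie", 9.0),
]


def run_etl(conn):
    """ETL: filter in application code first, then load only the clean rows."""
    usable = [(sid, kind, grams) for sid, kind, grams in RAW_SHELLS if kind != "broken"]
    conn.execute("CREATE TABLE etl_shells (shell_id INTEGER, kind TEXT, grams REAL)")
    conn.executemany("INSERT INTO etl_shells VALUES (?, ?, ?)", usable)
    return conn.execute("SELECT COUNT(*) FROM etl_shells").fetchone()[0]


def run_elt(conn):
    """ELT: load everything as-is, then transform with SQL inside the warehouse."""
    conn.execute("CREATE TABLE raw_shells (shell_id INTEGER, kind TEXT, grams REAL)")
    conn.executemany("INSERT INTO raw_shells VALUES (?, ?, ?)", RAW_SHELLS)
    conn.execute(
        "CREATE TABLE elt_shells AS "
        "SELECT shell_id, kind, grams FROM raw_shells WHERE kind != 'broken'"
    )
    return conn.execute("SELECT COUNT(*) FROM elt_shells").fetchone()[0]


conn = sqlite3.connect(":memory:")
print(run_etl(conn), run_elt(conn))
```

Both paths end with the same clean table, but only the ELT path keeps the raw rows around in the warehouse, so the transformation can be changed and re-run later without collecting the data again.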

My blog on Towards Data Science about moving from Pandas to Spark: https://towardsdatascience.com/moving-from-pandas-to-spark-7b0b7d956adb

Great book to learn about Spark: https://www.amazon.com/dp/1492050040/?tag=omnilence-20

Tools covered in the episode:

dbt: https://www.getdbt.com/

Databricks: https://databricks.com/

EMR: https://aws.amazon.com/emr/

AWS Redshift: https://aws.amazon.com/redshift/

Snowflake: https://www.snowflake.com/

Delta Lake: https://databricks.com/product/delta-lake-on-databricks

Aug 18, 2021 (39:30)

25: Talking Data Privacy with Jeff Bermant

In this episode, I'm excited to be talking with Jeff Bermant, who is the founder and CEO of Cocoon Mydata Rewards browser. It is a browser based off Chrome and it pays people to use it! ✨

In this episode we talk about data ethics and privacy, and how Jeff believes that users should be paid for their data. We talk about GDPR and similar laws in US, future of data privacy and more!

Go to https://getcocoon.com to download and use Cocoon Rewards Browser.

~Thanks for listening~

Aug 04, 2021 (28:11)

24: Promoting Women in Tech - With Rupal Gupta

In this episode, we are talking about women in tech with Rupal Gupta. Rupal, a recent graduate of the Online MS in CS program at Georgia Tech, is a data engineer in the industry and is passionate about promoting women in tech. She also has some great tips and resources for anyone trying to break into data science and tech!

In this episode we talk about things that can help promote women in tech, women in tech conferences such as Grace Hopper, looking for jobs, resources to prepare for the interviews etc.

If you want to reach out to Rupal for any help or to collaborate with her project womenmentors.co, here is her LinkedIn: https://www.linkedin.com/in/rupalgupta15/

FREE Women in Tech Conference by Manning Publications on Oct 13th at 12pm ET on Twitch: https://freecontent.manning.com/livemanning-conferences-women-in-tech/ 🎉 There will be women in tech speakers from Dropbox, Microsoft, Warby Parker and more.

🌟 Programs and conferences covered in the episode:
OMSCS program at Georgia Tech: https://omscs.gatech.edu/
Grace Hopper conference: https://ghc.anitab.org/
Anita Borg Institute: https://anitab.org/

🌟 Interviewing resources:
1. Pramp: https://www.pramp.com/#/
2. Interviewing.io: https://interviewing.io/
3. Educative "Grokking the System Design Interview": https://www.educative.io/courses/grokking-the-system-design-interview
4. AWS Certifications: https://aws.amazon.com/certification/

Disclaimer: All opinions on this podcast are our own and not the views of our employers or organizations.

~Thanks for listening~

Oct 08, 2020 (15:06)

23: Let’s Talk AWS SageMaker for ML Model Deployment

In this episode, we talk about Amazon SageMaker and how it can help with ML model development including model building, training and deployment. We cover 3 advantages in each of these 3 areas.
We cover points such as:
1. Host ML endpoints for deploying models to thousands or millions of users.
2. Save costs on model training using SageMaker.
3. Use CloudWatch logs with SageMaker endpoints to debug ML models.
4. Use preconfigured environments or models provided by AWS.
5. Automatically save model artifacts in AWS S3 as you train in SageMaker.
6. Use version control for SageMaker notebooks with GitHub.
and more…
Please rate, subscribe and share this episode with anyone who might find SageMaker useful in their work. I feel that SageMaker is a great tool and want to share about it with data scientists.
For comments/feedback/questions or if you think I have missed something in the episode, please reach out to me at LinkedIn: www.linkedin.com/in/sanketgupta107/

Jun 17, 2020 (19:46)

22: Transfer Learning for NLP - With Paul Azunre

In this episode, we are talking with Paul Azunre. Paul is one of the world’s experts in the area of Transfer Learning for NLP and is also an author of the upcoming book Transfer Learning for NLP published by Manning Publications. In this episode we talk about things such as:

1) Paul’s background and how his background in maths and optimization as well as fake news detection got him started in transfer learning in NLP.
2) How Paul got started with the book, book writing process as well as tips to the listeners for writing a technical book.
3) High level summary of transfer learning in both computer vision and NLP and why this is the ImageNet moment of NLP.
4) Why ML and NLP practitioners today should be excited about transfer learning (such as how students in Ghana are able to build their own Google Translate using transfer learning)
5) How BERT, ELMo and ALBERT work at the high level and how they differ from traditional techniques like Word2Vec or FastText.
6) Differences between BERT, ELMo and ALBERT.
7) What makes Paul’s new book a must-read for anyone interested in this field.

✨Paul's Info👇

Paul’s Website: azunre.com (with all social media handles)
Please reach out to Paul if you have any questions about transfer learning in NLP or the book.

✨Chance for one of 2 free copies of Transfer Learning for NLP 🎉

Get a chance to win a free copy of Paul's book! Share this episode on Twitter and add my Twitter handle "sanket107" to it for a chance to win one of 2 free books. My Twitter: https://twitter.com/sanket107

✨Discount Code for all Manning Publications books! 🎊🤩

Special Link to get extra discount for Paul’s book:
https://www.manning.com/books/transfer-learning-for-natural-language-processing?a_aid=Omnilence&a_bid=d53fed17
As The Data Life Podcast listeners, you can also go to this link http://www.manning.com/?a_aid=Omnilence to get any Manning book with 40% discount with the code: poddlife20

This will help support this show as well and is much appreciated.

Thank you to Manning Publications and Paul, as well as the sponsors, for making this show a reality.

~Thanks for listening~

Apr 13, 2020 (46:47)

21: Why Scikit-Learn and Keras are Awesome for ML

In this episode, we talk about why the two libraries Scikit-Learn and Keras are great for machine learning. These two libraries combined with Pandas form the 3 core libraries in Python for a data scientist today.

We cover things like:

1) Data Exploration and data cleaning - how Pandas and Jupyter notebooks provide a good way to get started here.
2) Data Transformation - how Scikit-Learn provides many useful functions like train_test_split, Scalers, PCA etc.
3) Data Fitting - how Scikit-Learn provides good shallow models and Keras provides great support to quickly get started with neural networks.
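
To show what the data transformation step does under the hood, here is a plain-Python sketch of a train/test split followed by standardization. It is not Scikit-Learn code: the helper names and the toy data are made up, and in practice Scikit-Learn's `train_test_split` and scalers handle this for you. The point it illustrates is that the scaler is fitted on the training data only and then reused on the test data.

```python
import random
import statistics


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Learn the mean and (population) standard deviation from training values."""
    return statistics.fmean(values), statistics.pstdev(values)


def apply_scaler(values, mean, std):
    """Standardize values using statistics learned on the training set."""
    return [(v - mean) / std for v in values]


data = [float(x) for x in range(1, 21)]  # made-up feature values
train, test = split_train_test(data)
mean, std = fit_scaler(train)            # fit on train only, to avoid leakage
train_scaled = apply_scaler(train, mean, std)
test_scaled = apply_scaler(test, mean, std)
```

Fitting the scaler on the full dataset before splitting would leak information from the test set into training, which is one of the things to watch for when building ML pipelines.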

We also cover various tidbits on things to take note in building ML pipelines and preparing models to be deployed in production, so tune into the episode to find out!

Fantastic Resources:
1) Book by head of YouTube DS team Aurelien Geron: https://www.amazon.com/dp/1492032646/?tag=omnilence-20
This is one of the best books I have read on this topic as it covers practical tips incl. the Scikit-Learn API etc.
2) Developing Scikit-Learn estimators: https://scikit-learn.org/stable/developers/develop.html
3) Guide to Keras Sequential API: https://keras.io/getting-started/sequential-model-guide/
4) Guide to Keras Functional API: https://keras.io/getting-started/functional-api-guide/
5) My previous episode on Pandas: https://podcasts.apple.com/us/podcast/17-why-pandas-is-the-new-excel/id1453716761?i=1000454831790

Thanks for listening! Please consider supporting this podcast from the link in the end.

Jan 26, 2020 (19:55)

20: Yogi's Guide to Analytics - An Interview with Akshay Kanade

In this episode, we talk with Akshay Kanade. He is a business analyst working in New York City who likes taking a big view of data and has very interesting spiritual views on data analytics and life in general. He is also a handwriting expert: he can read people’s handwriting and recognize a lot about their personalities.

In this interview we will cover several things such as:
- How has being an analyst influenced Akshay's life?
- Introspection about data and analytics
- Taking high level view of data - connecting deep learning with deep thinking
- People who don’t have background in analytics- how they can use their unique backgrounds for decisions
- Power of consciousness and spirituality at work
- Hand-writing analysis and whether it is a science or an art

It was a fascinating conversation, and I took a lot away from hearing Akshay's viewpoints. This interview is a must-listen if you deal with data and analytics in your work.
Akshay's hand-writing analysis and mentorship website: www.pradnyatantra.com (will be live soon)
Reach Akshay on LinkedIn at https://www.linkedin.com/in/akshaykanade06/

Some of Akshay's favorite books:
1. Autobiography of a Yogi https://www.amazon.com/dp/8120725247/?tag=omnilence-20
2. The Monk Who Sold His Ferrari https://www.amazon.com/dp/0062515675/?tag=omnilence-20
3. Mastery https://www.amazon.com/dp/B00A6G9CGG/?tag=omnilence-20

To add to this list, one of my favorite books is:
The Power of Now https://www.amazon.com/dp/B00A6G9CGG/?tag=omnilence-20

If you have any feedback drop me a note at thedatalifepodcast@gmail.com or reach me on LinkedIn at https://www.linkedin.com/in/sanketgupta107/

~ Thanks for listening~

Dec 01, 2019 (35:44)

19: Statistics and Data Science- An Interview with Patrick McClory

In this podcast episode, we do an interview! We talk with Patrick McClory, who is the founder and CEO of IntrospectData. He is an expert working in areas of data science consulting, large machine learning projects, math, statistics and more.

In this episode we cover several interesting topics such as:
1) What makes a good data scientist?
2) The different roles in the industry such as data engineer, machine learning engineer, data analyst etc.
3) The first mile problem: Data ownership and ethics of data collection.

Patrick can be reached at patrick@introspectdata.com and you can read more about IntrospectData's projects at introspectdata.com/
Some books discussed in the episode:
1. The Field Guide to Understanding Human Error
2. Information Theory: A Tutorial Introduction
If you enjoyed this episode or have any feedback drop me a note at thedatalifepodcast@gmail.com ~ Thanks for listening ~

Nov 22, 2019 (56:21)

18: 5 Things to Consider for Master of Science (MS) in US

What should you consider for pursuing MS in US? There might be several questions in your mind as you explore this question. In this episode we cover some of the main things to consider before you make the decision. I also go into details about things which I wish I knew before coming to US for MS.

The things to consider for an MS in the US that I cover in the podcast are:

1) Location matters more than rankings.
2) Talk to professors before applying.
3) Culture of hard work, and advantage of having prior work experience.
4) Cost is high, and there are low-cost alternatives.
5) Visa situation is uncertain.

Hope you enjoy this episode. This is an episode I wish I had listened to before flying to the US.
Reach out with your questions/feedback at thedatalifepodcast@gmail.com

Resources:
Although I did not cover GRE or TOEFL topics in detail, I am linking to some great resources for their preparation.
1) Essential Words for the GRE https://www.amazon.com/dp/1438007493/?tag=omnilence-20
2) GRE Prep Guide by Kaplan https://www.amazon.com/dp/150624890X/?tag=omnilence-20
3) GRE Guide by Barrons https://www.amazon.com/dp/1438009151/?tag=omnilence-20
4) TOEFL Guide by Barrons https://www.amazon.com/dp/1438076258/?tag=omnilence-20
5) Blog version of this podcast episode https://medium.com/the-data-life/ms-in-us-for-data-science-57079509ded9

Thanks for listening. Please support us via the link in the end for Anchor Payments. It would allow us to build more of this content!

Nov 15, 2019 (18:60)

17: Why Pandas is the new Excel

The Data Life Podcast is a podcast where we talk all about real-life experiences with data and data science tools, techniques, models and personalities.

In this episode, we will talk about how Pandas is becoming a tool of choice for many data scientists for doing their data analysis work. We will explore how Pandas wins over Excel in several key areas that are important for businesses today:

1) Large dataset sizes
2) Different kinds of input formats such as JSON, CSV, HTML, SQL etc
3) Complex business logic
4) Linking data analysis work to websites and databases
5) Cost

Pandas has lots of helpful functions such as read_csv, read_json, read_sql that allow easy input of data into dataframes. DataFrames have several useful methods like "describe", "value_counts", "groupby", "loc" and more that allow easy understanding of your dataset. It also supports plotting out of the box with "plot" method.
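
Pandas does each of these in a single call. To make visible what methods like "value_counts" and "groupby" actually compute, here is a rough equivalent written with only the Python standard library. The CSV contents and column names are made up for illustration, and this is a sketch of the idea rather than how Pandas implements it.

```python
import csv
import io
from collections import Counter, defaultdict

# Made-up sales data standing in for a CSV file on disk.
CSV_TEXT = """region,product,revenue
east,necklace,30
west,bracelet,12
east,bracelet,15
east,necklace,25
"""

# Roughly what reading a CSV into a table of rows gives you.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Roughly what value_counts on the "product" column reports.
product_counts = Counter(row["product"] for row in rows)

# Roughly what a groupby on "region" followed by a sum of "revenue" reports.
revenue_by_region = defaultdict(float)
for row in rows:
    revenue_by_region[row["region"]] += float(row["revenue"])

print(dict(product_counts), dict(revenue_by_region))
```

Writing this by hand for every question you ask of a dataset gets tedious quickly, which is a large part of why a DataFrame library is attractive.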

We also cover how Pandas differs from SQL in things like ease of handling time series data, visualizations and more.
Tune in to the episode to learn more about how Pandas might be the tool for your data analysis needs to take your business to next level!

Fantastic Resources:
1) Book by Pandas creator Wes McKinney: https://www.amazon.com/dp/1491957662/?tag=omnilence-20
2) Great workshop video by Kevin Markham in PyCon: https://www.youtube.com/watch?v=0hsKLYfyQZc
3) Input output methods for Pandas: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html
4) Comparison of some operations of Pandas with SQL https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html

Thanks for listening! Please consider supporting this podcast from the link in the end.

Oct 25, 2019 (16:37)

16: Getting Started with Natural Language Processing

So many tweets, news articles and other unstructured text surround us. How do we make sense of it all? Natural language processing or NLP can help. NLP refers to algorithms that process, understand and generate aspects of natural language either in text or in spoken voice. In this episode we will cover some of the common techniques in NLP to help get started in this exciting field!

We cover several tasks in an NLP pipeline:
1. Tokenization and punctuation removal
2. Stemming and Lemmatization
3. One hot vectors
4. Word embeddings including Word2Vec and GloVe
5. Recurrent Neural Networks and LSTMs
6. tf and tf-idf approaches - when to use word embeddings, when to use tf / tf-idf approaches?
7. Generating text using encoder-decoder or sequence to sequence models
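
The first and sixth steps above can be sketched in a few lines of plain Python. This is a minimal illustration with made-up documents and function names. It uses one common tf-idf variant (term count divided by document length, times log(N / document frequency)); libraries typically add smoothing, which is left out here.

```python
import math
import string
from collections import Counter


def tokenize(text):
    """Lowercase, strip punctuation, and split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(documents)
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["The cat sat.", "The dog barked!", "The cat purred."]
scores = tf_idf(docs)
```

A word like "the" that appears in every document gets a weight of zero, while a word that appears in only one document gets the highest weight. That is the intuition for when tf-idf works well: when rare words carry the signal.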

Some resources:
1. Sequence Models - course by Andrew Ng on Coursera - one of the best courses I have seen on this topic! https://www.coursera.org/learn/nlp-sequence-models
2. Awesome collection of resources for NLP for Python, C++, Scala etc. and popular resource: https://github.com/keon/awesome-nlp
3. Overview of Text Similarity Metrics (a blog written by me on Medium): https://towardsdatascience.com/overview-of-text-similarity-metrics-3397c4601f50
4. How to train custom word embeddings on a GPU https://towardsdatascience.com/how-to-train-custom-word-embeddings-using-gpu-on-aws-f62727a1e3f6

Thanks for listening, please support this podcast by following the link in the end.

Oct 05, 2019 (19:31)

15: Using Flask, REST API and Vue.js to build a Single Page Web Application

As a data scientist, you will work on machine learning models that are deployed on websites, usually wrapped in a REST API; these days this approach is also called a “micro-service”. For this reason it is important to know how backends and frontends work and how to build them. In this episode, we talk about building a note app which is a Single Page Application or SPA using Python's Flask library for the backend and Vue.js for the frontend. We use a REST API to communicate between them.

We cover following topics in Q and A format:

1. Why should data scientists care about building a frontend, a backend and a REST API?

2. What is a single page application?

3. Why Vue.js?

4. Why do we need server side code?

5. What is a REST API?

6. How does Flask help with building a REST API?

Then we go into the exact mechanics of building the SPA:

Step 1: Database setup

Step 2: Write REST API in Flask

Step 3: Postman setup and testing of the API

Step 4: Build frontend and write forms to get information

Step 5: Build routing and login pages

Step 6: Front end design and UI/UX

Finally you can deploy both the server and client separately on AWS or Heroku so that other users can see it and use it.
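
To give a feel for what the REST API in Step 2 has to do, here is a framework-free sketch of the request-to-response mapping for a notes resource. It is not Flask code: the `handle` function, the `/notes` paths and the in-memory `NOTES` store are hypothetical stand-ins, and in the real app Flask routes and a database would take their place.

```python
import itertools
import json

# In-memory stand-in for the database table of notes.
NOTES = {}
_ids = itertools.count(1)


def handle(method, path, body=None):
    """Map an HTTP-style request to a (status_code, json_text) response."""
    parts = [p for p in path.split("/") if p]

    # Collection resource: /notes
    if parts == ["notes"]:
        if method == "GET":
            return 200, json.dumps(list(NOTES.values()))
        if method == "POST":
            note = {"id": next(_ids), "text": json.loads(body)["text"]}
            NOTES[note["id"]] = note
            return 201, json.dumps(note)
        return 405, json.dumps({"error": "method not allowed"})

    # Single resource: /notes/<id>
    if len(parts) == 2 and parts[0] == "notes" and parts[1].isdigit():
        note_id = int(parts[1])
        if note_id not in NOTES:
            return 404, json.dumps({"error": "note not found"})
        if method == "GET":
            return 200, json.dumps(NOTES[note_id])
        if method == "DELETE":
            del NOTES[note_id]
            return 204, ""
        return 405, json.dumps({"error": "method not allowed"})

    return 404, json.dumps({"error": "unknown path"})
```

The frontend only ever sees the method, the path, the status code and the JSON body, which is why the client and server can be built and deployed separately.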

Dependencies:

1) Flask to build server side REST APIs

2) SQLAlchemy, which is an ORM to access the database

3) Bcrypt for hashing user passwords to store in your database

4) Vue for building frontend

5) Bootstrap-Vue for using bootstrap with Vue.js

6) Axios to communicate via AJAX between client and server

7) Vue CLI 3 to manage the tooling of the client

Really awesome resources:

1) Learn Vue.JS from scratch by the awesome teacher Net Ninja - YouTube https://www.youtube.com/watch?v=5LYrN_cAJoA&list=PL4cUxeGkcC9gQcYgjhBoeQH7wiAyZNrYa&index=1

2) Building book recording app using Vue and Flask https://testdriven.io/blog/developing-a-single-page-app-with-flask-and-vuejs/#bootstrap-vue

3) Managing state in Vue.js including Vuex and simple global store: https://medium.com/fullstackio/managing-state-in-vue-js-23a0352b1c87

4) Authenticating a Flask API Using JSON Web Tokens - YouTube https://www.youtube.com/watch?v=J5bIPtEbS0Q

5) Really nice tutorial for using databases with Flask by Corey Schafer - YouTube https://www.youtube.com/watch?v=cYWiDiIUxQc&list=PL-osiE80TeTs4UjLw5MM6OjgkjFeUxCYH&index=4

If this has been of value please consider supporting me by buying me a coffee at the Anchor link at the end. If you support, I will provide extra bonus content for you. Thanks for listening!

Sep 16, 2019 (20:39)

14: Building a Character-Based Text Classifier

Ever wonder how to automatically detect language from a script? How does Google do it?

Ever wonder how Amazon knows whether you are searching for a product or a SKU on its search bar?

We look into character-based text classifiers in this episode. We cover 2 types of models. First is the bag-of-words models such as Naive Bayes, logistic regression and vanilla neural network. Second we cover sequence models such as LSTMs and how to prepare your characters for the LSTMs including things like one-hot encoding, padding, creating character embeddings and then feeding these into LSTMs. We also cover how to set up and compile these sequence models.
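
As a rough illustration of the character preparation described above, here is a plain-Python sketch that builds a character vocabulary, pads sequences to a fixed length and expands them into one-hot vectors. The function names and example queries are made up; a deep learning library would normally provide padding and embedding layers for this.

```python
def build_vocab(texts):
    """Assign each character an integer index; 0 is reserved for padding."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode_and_pad(text, vocab, max_len):
    """Turn a string into a fixed-length list of indices, padded with 0 on the right.

    Characters missing from the vocabulary are dropped; long inputs are truncated.
    """
    indices = [vocab[ch] for ch in text if ch in vocab][:max_len]
    return indices + [0] * (max_len - len(indices))


def one_hot(indices, vocab_size):
    """Expand indices into one-hot rows of width vocab_size + 1 (slot 0 is padding)."""
    return [[1 if j == idx else 0 for j in range(vocab_size + 1)] for idx in indices]


queries = ["ab12", "cab"]  # made-up search queries
vocab = build_vocab(queries)
padded = [encode_and_pad(q, vocab, max_len=5) for q in queries]
matrix = one_hot(padded[1], len(vocab))
```

Every query now has the same length, so a batch of them can be stacked into one array and fed to a sequence model. With a character embedding layer, the integer indices in `padded` would be used directly instead of the one-hot rows.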

Thanks for listening, and if you find this content useful, please leave a review and consider supporting this podcast from the link below.

Aug 07, 2019 (23:20)

13: Statistics of A/B Testing

You and your team might spend a lot of time building a new feature. But how do you know if this feature will be liked by the users? One of the ways to statistically prove this is by using A/B testing. Listen to this episode to get tips, tricks and intuition behind hypothesis testing, alpha, beta, p-values, two-sample t-tests and more.
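
Here is a minimal sketch of the two-sample test mentioned above, using only the Python standard library. The session-length numbers are made up. It computes Welch's t statistic and then, as a labeled simplification, approximates the p-value with a normal distribution instead of the t distribution. That approximation is only reasonable for large samples, and the toy samples here are kept tiny just for readability.

```python
import math
import statistics


def welch_t_statistic(a, b):
    """Welch's two-sample t statistic for samples with possibly unequal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.fmean(b) - statistics.fmean(a)) / standard_error


def approx_two_sided_p(t_stat):
    """Two-sided p-value from a normal approximation (an assumption, see above)."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(t_stat)))


# Made-up session lengths in minutes for control (A) and the new feature (B).
control = [10.1, 9.8, 10.4, 10.0, 9.9, 10.2, 10.3, 9.7]
variant = [10.9, 11.2, 10.7, 11.0, 11.3, 10.8, 11.1, 10.6]

t_stat = welch_t_statistic(control, variant)
p_value = approx_two_sided_p(t_stat)

alpha = 0.05  # accepted false-positive rate, chosen before running the test
reject_null = p_value < alpha
```

The null hypothesis is that the feature makes no difference. A p-value below alpha means a gap this large would be unlikely if that were true, so the null is rejected.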

These understandings have been learnt from experiences deploying A/B tests in the field, and talking to experts.

These ideas are typically not covered in traditional A/B testing texts which tend to focus a lot on math without the intuition, and that's why I really wanted to cover it in this podcast episode. Thanks for listening! I'd really appreciate your support for this podcast. Follow the link below.

Jul 17, 2019 (21:23)

12: Your Users Don't Care How Smart You Are

In this episode, we will talk about the importance of business impact in data science. "Your users don't care how smart you are" was a quote I read that got me started in thinking about this.

The right way to do data science is to think of users, revenue impact and business value, and go for the simplest solution possible. The wrong way to do data science is to pick up a hammer first and then go looking for a nail to hit, rather than the other way around.

We will cover all this and more!

Amazon link of Inspired by Marty Cagan (a great read to get better at product thinking): https://www.amazon.com/dp/1119387507/?tag=omnilence-20

Please consider buying me a coffee if you find this content useful. Refer to link at the bottom.

Jun 25, 2019 (05:05)

11: The Ten Essential Machine Learning Questions

This episode covers the ten essential machine learning questions. Disclaimer: Baseline answers have been provided in the episode for guidance. For complete accuracy, please refer to textbooks or to courses by Andrew Ng on Coursera.

If this content is useful, please consider buying me a coffee via the link https://anchor.fm/the-data-life-podcast/support

Resources:
1. Machine Learning Course by Andrew Ng: https://www.coursera.org/learn/machine-learning
2. Deep Learning Course by Andrew Ng: https://www.coursera.org/specializations/deep-learning

Questions:
1. What is underfitting and overfitting? How to avoid it?
2. What is the difference between batch, SGD and mini-batch gradient descents? When will you use each?
3. How to choose a machine learning model?
4. How to improve the latency of a machine learning model in production?
5. If your training and cross validation accuracies are high, but testing accuracy is lower - how would you debug this?
6. Name 3 hyper-parameters. Why can’t we train them as parameters? Why should only humans set them?
7. Which metric should be used to evaluate a classifier? How do you connect it to business value?
8. What prevents someone from selecting a deep learning model for everything?
9. Say you have to classify a lot of data, but you don’t have labelled training examples. How would you begin to solve the problem? How many training data points are needed?
10. Say you have a perfectly working machine learning model. How do you deploy this in production? How do you check if users will actually like it?
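
For question 2, the difference between batch, stochastic and mini-batch gradient descent comes down to how many examples are used for each update. Here is a minimal sketch that fits a straight line with a single `batch_size` parameter covering all three cases. The function name, learning rate, epoch count and toy data are assumptions chosen for illustration.

```python
import random


def fit_line(xs, ys, batch_size, learning_rate=0.05, epochs=500, seed=0):
    """Fit y = w * x + b by minimizing mean squared error with gradient descent.

    batch_size == len(xs) gives batch gradient descent, batch_size == 1 gives
    stochastic gradient descent, and anything in between is mini-batch.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradients of the mean squared error over this batch only.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]  # made-up inputs
ys = [2 * x + 1 for x in xs]                    # noiseless targets: w = 2, b = 1

for size in (len(xs), 1, 4):
    print(size, fit_line(xs, ys, batch_size=size))
```

On this tiny noiseless dataset all three variants recover roughly the same line. The trade-off shows up at scale: full-batch updates are stable but need a pass over all the data per step, single-example updates are cheap but noisy, and mini-batches sit in between.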

Please leave a review on Apple Podcasts or wherever you listen to this.
Thanks for listening!

Jun 21, 2019 (19:09)

Mining Twitter Data for Sentiment Analysis of Events

Twitter is a rich source of live information. Is it possible to run sentiment analysis on what the world is thinking as an event unfolds over time? Could we track Twitter data and see if it correlates to news that affects stock market movements? These are some of the questions that we will answer in this podcast episode.

There are 6 steps for mining Twitter data for sentiment analysis of events that we will cover:

1) Get Twitter API Credentials
2) Setup API Credentials in Python
3) Get Tweet Data via Streaming API using Tweepy
4) Use out-of-the-box sentiment analysis libraries to get sentiment information
5) Plot sentiment information to see trends for events
6) Set this up on AWS or Google Cloud Platform

This episode covers information about saving the tweets in a database, and using them to plot sentiment information.
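
Steps 4 and 5 can be sketched without any Twitter credentials. The example below scores made-up tweets with a tiny hypothetical word lexicon and averages the scores per minute, which is the series you would then plot. The lexicon, function names and tweets are all invented for illustration; in the episode's setup, a library such as TextBlob or VADER would supply the sentiment score and the tweets would come from the database.

```python
from collections import defaultdict
from datetime import datetime

# Tiny hypothetical lexicon; real sentiment libraries ship far richer ones.
LEXICON = {"great": 1.0, "love": 1.0, "good": 0.5, "bad": -0.5, "awful": -1.0, "hate": -1.0}


def sentiment_score(text):
    """Average the lexicon scores of known words; 0.0 when no word is known."""
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def average_by_minute(tweets):
    """Group (timestamp, text) pairs by minute and average their sentiment."""
    buckets = defaultdict(list)
    for timestamp, text in tweets:
        minute = timestamp.replace(second=0, microsecond=0)
        buckets[minute].append(sentiment_score(text))
    return {minute: sum(vals) / len(vals) for minute, vals in sorted(buckets.items())}


# Made-up tweets standing in for rows saved from the streaming API.
tweets = [
    (datetime(2019, 6, 1, 12, 0, 5), "Love this launch, great job!"),
    (datetime(2019, 6, 1, 12, 0, 40), "Not bad."),
    (datetime(2019, 6, 1, 12, 1, 10), "Awful delays, I hate waiting"),
]
trend = average_by_minute(tweets)
```

Note that a plain word lexicon scores "Not bad." as negative because it ignores negation. Handling cases like that is one reason to reach for an out-of-the-box sentiment library instead.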

Corresponding Blog Post With Code: https://towardsdatascience.com/mining-live-twitter-data-for-sentiment-analysis-of-events-d69aa2d136a1?source=friends_link&sk=e06ae49f4ce6fb52157ea0eaee72f4c4
Tweepy: https://github.com/tweepy/tweepy
TextBlob: https://textblob.readthedocs.io/en/dev/
Vader Sentiment: https://github.com/cjhutto/vaderSentiment
Set up AWS instance: https://aws.amazon.com/ec2/getting-started/
Set up GCP instance: https://cloud.google.com/compute/docs/quickstart-linux

My Twitter Profile: https://twitter.com/sanket107
Thanks for listening!

Jun 01, 2019 (18:44)

Don't Be Shy To Pursue Your Interest

In this episode, we will talk about things like Maslow's Hierarchy of Needs, and focussing on higher level needs such as satisfaction and achieving full potential. In the area of tech, data science and software development, admitting your interest could involve "shyness" as the next shiny cool thing is pursued by everyone. But if your interest is in a niche, don't let others stop you from putting in an effort to become great at it.

Thanks for listening, and please show your support to keep this podcast going!

May 19, 2019 (05:00)

Review of Udacity Nanodegrees - are they worth it?

Udacity has become a popular platform for learning about various things in data science, machine learning and programming in general. In this episode, we will discuss the good, bad and ugly of the Udacity nanodegrees. I will also cover my experiences with Deep Learning and NLP Nanodegrees.

We will cover things like how Udacity has great production quality and nice intro courses, but due to their lack of depth and low community engagement, the high costs might not be justified (most of their nanodegrees are around $1,000 currently). If cost is not a concern, then Udacity could be a good way to get into a new area. If you prefer a structured approach with timelines, they could be good too. But if you don't mind doing your own research, reading blogs and watching free videos online, then Udacity nanodegrees may not be worth the cost.

Resources:

1) Deep Learning Nanodegree: https://www.udacity.com/course/deep-learning-nanodegree--nd101

2) NLP Nanodegree: https://www.udacity.com/course/natural-language-processing-nanodegree--nd892

3) DeepLearning.AI by Andrew Ng: https://www.coursera.org/deeplearning-ai

Please support the podcast by rating it in Apple Podcasts, and also leaving a review :) Thanks for listening!

May 03, 2019 (13:03)

6 Steps to Transition to Data Science from non-CS background

In this episode we will talk all about the various steps to transition to data science from non computer science backgrounds.
One of the main difficulties people from non-CS backgrounds face is how overwhelming the transition to the data science field can be. I talk about my own journey, and share the 6 steps which can help you in your own data science career!

00:00 to 02:10: Introduction

02:11 to 06:00: My Background of moving to data science from electrical engineering

06:01 to 10:56: Steps 1 to 3 covering things like using external APIs, already processed datasets and performing full stack data science work

10:57 to 11:55: Break sponsored by Anchor

11:56 to End: Steps 4 to 6 covering things like math and statistics, machine learning pipelines and data structures & algorithms

Some useful links:

1) Andrew Ng Deep Learning Specialization Coursera https://www.coursera.org/specializations/deep-learning

2) Intro to Statistics by Sebastian Thrun https://www.udacity.com/course/intro-to-statistics--st101

3) Aurélien Géron's book on machine learning https://www.amazon.com/dp/1491962291/?tag=omnilence-20

4) Pramp for mock algorithm sessions on video https://www.pramp.com/

5) Leetcode for algorithm question datasets https://leetcode.com/

Some great datasets to get started in machine learning:

6) MNIST for hand written digits https://www.kaggle.com/c/digit-recognizer

7) Iris dataset for flower classification http://archive.ics.uci.edu/ml/datasets/iris

8) IMDB movie reviews https://ai.stanford.edu/~amaas/data/sentiment/

Thanks for listening!

Apr 20, 2019 (15:59)

The Top 5 Data Science Podcasts

Welcome! In this episode, we will cover some of the top data science podcasts, that have helped me a lot in my own journey, and hopefully will be helpful to you as well.
The top 5 podcasts are (linked to my favorite episodes):
1) AI in Industry with Daniel Faggella
2) This week in Machine Learning and AI (TWiML)
3) DataFramed
4) Data Skeptic
5) Talk Python to Me

Listen to the episode for the sixth bonus podcast!
If you think I should mention another podcast here, let me know and I will add it in the show notes!
Thanks for listening!

Apr 10, 2019 (08:21)

What I learnt building a data science course

Have you ever thought about building a video course? Have you wanted to share your expertise with other people via a video course on different platforms like Udemy? Have you wondered what the economics and revenue details of building a course are? This podcast episode is for you!

In this episode, I talk about my experience in building my first data science video course, lessons learnt and how you can use these in your own video course.

00:00 to 09:30- I talk about my experience with Packt Publishing in developing the video course.
09:30 onwards- I talk about the 3 lessons learnt and how you can leverage these to fully maximize the potential of your video course.

Links:
1) My First Video course: www.packtpub.com/big-data-and-business-intelligence/hands-fundamentals-data-science-go-video
2) Link to my previous podcast on recommendation engines
3) Github link to the starter code of recommendation engines on movie reviews: github.com/sanketg10/the-data-life-podcast
4) Link to my new course on "Overview of Query Understanding Techniques": sanketgupta.teachable.com/p/query-understanding-techniques
5) Google Ads Keyword Planner: ads.google.com/home/tools/keyword-planner/
#video-course #course #teachable #udemy #packt #data-science

Mar 30, 2019 (18:17)

Overview of Netflix and Spotify like recommendation engines

In this episode, we cover the two main types of recommendation engines used at companies like Netflix and Spotify.

1) Content based recommendation systems use the genres or tags of each product to find other similar products to recommend to users.
2) Collaborative filtering based recommendation systems use user activity and user ratings on the website to recommend products.
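
The two approaches above can be contrasted in a small sketch. The catalog, tags, users and ratings below are all made up, and the similarity measures (Jaccard overlap for tags, cosine similarity for ratings) are common choices picked for illustration, not necessarily what Netflix or Spotify use.

```python
import math

# Made-up catalog: content-based filtering looks only at item tags.
ITEM_TAGS = {
    "Alien": {"sci-fi", "horror"},
    "Blade Runner": {"sci-fi", "noir"},
    "Notting Hill": {"romance", "comedy"},
}

# Made-up ratings: collaborative filtering looks only at user behaviour.
RATINGS = {
    "ana": {"Alien": 5, "Blade Runner": 4},
    "ben": {"Alien": 4, "Blade Runner": 5, "Notting Hill": 1},
    "cy": {"Notting Hill": 5, "Alien": 1},
}


def jaccard(a, b):
    """Tag overlap between two items: |intersection| / |union|."""
    return len(a & b) / len(a | b)


def most_similar_by_content(item):
    """Content-based: the other item whose tags overlap most with this one."""
    others = [name for name in ITEM_TAGS if name != item]
    return max(others, key=lambda name: jaccard(ITEM_TAGS[item], ITEM_TAGS[name]))


def cosine_between_items(item_a, item_b):
    """Collaborative: cosine similarity of two items' ratings over shared users."""
    shared = [u for u, r in RATINGS.items() if item_a in r and item_b in r]
    if not shared:
        return 0.0
    dot = sum(RATINGS[u][item_a] * RATINGS[u][item_b] for u in shared)
    norm_a = math.sqrt(sum(RATINGS[u][item_a] ** 2 for u in shared))
    norm_b = math.sqrt(sum(RATINGS[u][item_b] ** 2 for u in shared))
    return dot / (norm_a * norm_b)
```

The content-based function works for a brand new item as long as it has tags, while the collaborative function needs users who have rated both items. That is the cold-start trade-off between the two approaches.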

We go through the pros and cons of each, the challenges, how companies like Netflix and Spotify scale their recommendation engines for millions of users and more!

My code in the GitHub repo implements these concepts from scratch using the MovieLens dataset.

Links:
1) Youtube talk by Xavier Amatriain from Netflix
2) Youtube talk on "Machine Learning & Big Data for Music Discovery presented by Spotify"
3) Youtube tutorial by Luis Serrano on how Netflix recommends movies

#netflix #spotify #movielens #recommendations #recommendation-engines

Mar 22, 2019 (13:28)

3 Mistakes to Avoid in a Machine Learning Project

You and your team might spend weeks or even months building a model. These are the 3 mistakes to avoid in your next machine learning project! This can save you a lot of time and effort in your next project.

These tips have been learnt from experiences deploying ML models in production as well as hearing from experts in the field.

These tips and mistakes are typically not covered in traditional machine learning texts and courses, and that's why I really wanted to cover it in this podcast episode. I'd really appreciate your support for this podcast. Please visit the podcast webpage and support, so that I can continue to develop podcast episodes. Thanks for listening!

Mar 15, 2019 (10:18)

Flask is a Great Tool for Full Stack Data Science

In this episode, we will talk all about what makes Flask such a great tool for both beginner and experienced data scientists to know. It was one of the first tools I learnt in my data science journey, and it has been so useful along the way.

Flask is a micro-framework in Python that lets you build websites in a simple way. Flask will help you as a data scientist work better with front end engineers. It is also a great way to build something like a recommender system, where users can input a product they have liked, and a machine learning model in Python reads this and recommends another product.

Resources:
1) Miguel Grinberg's Flask Mega Tutorial
2) Vue.JS Tutorials by Net Ninja

Thanks for listening to The Data Life Podcast!

Mar 05, 2019 (10:44)

Hello, World!

To kick things off, I talk about the kind of topics you can expect to hear in this podcast. Welcome to The Data Life!

Feb 19, 2019 (01:59)