Your Data Teacher Podcast

By Your Data Teacher
A podcast about data science, machine learning, artificial intelligence, statistics and everything related to data.

Episode 5 - Tuning the threshold in binary classification tasks
In this episode, I talk about tuning the threshold in binary classification tasks. The default value is 0.5, but optimizing it can make the model better fit our needs. I cover two approaches: choosing the threshold from the ROC curve and maximizing the balanced accuracy.
June 14, 2021
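The threshold-tuning idea from this episode can be sketched as a simple sweep: score every candidate threshold on a validation set and keep the one that maximizes balanced accuracy. This is a minimal illustration with a synthetic imbalanced dataset and a logistic regression, not the episode's exact workflow; all dataset and model choices here are assumptions for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset (90% negatives, 10% positives).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = model.predict_proba(X_val)[:, 1]

# Sweep candidate thresholds and keep the one with the best balanced accuracy.
thresholds = np.linspace(0.01, 0.99, 99)
scores = [balanced_accuracy_score(y_val, (probs >= t).astype(int)) for t in thresholds]
best_threshold = thresholds[int(np.argmax(scores))]
print(f"best threshold: {best_threshold:.2f}")
```

On imbalanced data the optimal threshold is usually well below 0.5, because balanced accuracy rewards catching the minority class.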
Episode 4 - Ensemble models. Bagging and boosting
In this episode, I'm going to talk about ensemble models, particularly bagging and boosting. Bagging is mainly used to reduce variance, while boosting is used to reduce bias. The most common bagging algorithm is Random Forest; the most common boosting algorithm is Gradient Boosting, whose best-known implementations are XGBoost, LightGBM and CatBoost.
June 10, 2021
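The bagging-vs-boosting comparison from this episode can be tried in a few lines with scikit-learn's built-in implementations of the two algorithms mentioned. The synthetic dataset and hyperparameters below are assumptions chosen only to keep the sketch self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical dataset for comparing the two ensemble families.
X, y = make_classification(n_samples=500, n_informative=10, random_state=0)

# Bagging: many deep trees trained on bootstrap samples (reduces variance).
bagging = RandomForestClassifier(n_estimators=100, random_state=0)
# Boosting: many shallow trees trained sequentially on residuals (reduces bias).
boosting = GradientBoostingClassifier(n_estimators=100, random_state=0)

results = {}
for name, model in [("Random Forest", bagging), ("Gradient Boosting", boosting)]:
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {results[name]:.3f}")
```

The same pattern applies to XGBoost, LightGBM and CatBoost, which are drop-in replacements for `GradientBoostingClassifier` with faster, more tunable implementations.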
Episode 3 - Precision, recall, accuracy. How to choose?
In this episode, I talk about accuracy, precision and recall: what they are and when to use each of them in machine learning projects.
June 8, 2021
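The three metrics from this episode can be computed directly with scikit-learn. The toy labels below are made up for illustration; the comments state the standard definitions in terms of true/false positives and negatives.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Toy example: 10 ground-truth labels vs. model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# accuracy  = (TP + TN) / all predictions
# precision = TP / (TP + FP)  -- of the predicted positives, how many were right
# recall    = TP / (TP + FN)  -- of the actual positives, how many were found
print("accuracy :", accuracy_score(y_true, y_pred))   # 0.8
print("precision:", precision_score(y_true, y_pred))  # 0.8
print("recall   :", recall_score(y_true, y_pred))     # 0.8
```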
Episode 2 - How to explain neural networks using SHAP
Today we're going to talk about how to explain neural networks. Neural networks are black boxes that hide the way they model and represent data, which makes explaining them very difficult. A very powerful approach is SHAP: with this method, we can calculate the impact of each feature on a prediction independently of the type of model we're using, which makes it especially useful for black boxes like neural networks.
June 4, 2021
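The core idea behind SHAP is the Shapley value: a feature's contribution is its average marginal effect over all possible coalitions of the other features. The toy sketch below computes exact Shapley values for a made-up three-feature model, replacing "absent" features with a baseline value; it illustrates the principle only, not the optimized explainers in the `shap` library that the episode refers to.

```python
from itertools import combinations
from math import factorial

def model(features):
    # Hypothetical black-box model: a weighted sum plus an interaction term.
    x1, x2, x3 = features
    return 2 * x1 + x2 + 0.5 * x1 * x3

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at point x, relative to a baseline input."""
    n = len(x)

    def eval_coalition(S):
        # Features in coalition S keep their actual value; the rest fall
        # back to the baseline (a simple way to "remove" a feature).
        return f([x[i] if i in S else baseline[i] for i in range(n)])

    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):
            for S in combinations(others, k):
                # Shapley weight: |S|! * (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi += weight * (eval_coalition(set(S) | {i}) - eval_coalition(set(S)))
        values.append(phi)
    return values

x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)  # the contributions sum to model(x) - model(baseline)
```

Note the key property: the values sum exactly to the difference between the prediction and the baseline prediction, and the interaction term's credit is split evenly between the two features involved.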
Episode 1 - How accurate is your accuracy?
Today we're going to talk about the standard error on proportions. In data science, it's very important to calculate the standard error of every estimate, both to see whether finite-sample effects are lowering the precision too much and to compare two different measurement results with each other.
May 31, 2021
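The standard error of a proportion discussed in this episode has a closed form: for an estimated proportion p measured on n samples, se = sqrt(p * (1 - p) / n). The accuracy figure and sample size below are made-up numbers for illustration.

```python
from math import sqrt

def proportion_standard_error(p, n):
    """Standard error of a proportion p estimated from n samples."""
    return sqrt(p * (1 - p) / n)

# Hypothetical example: 85% accuracy measured on a 200-sample test set.
p, n = 0.85, 200
se = proportion_standard_error(p, n)
print(f"accuracy = {p:.2f} \u00b1 {se:.3f}")  # roughly 0.85 +/- 0.025
```

An error bar of about 2.5 percentage points means two models scoring 0.85 and 0.87 on this test set may not be meaningfully different, which is exactly the kind of comparison the episode warns about.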