Open Source Sports

By Ron Yurko
Ron Yurko and Kostas Pelechrinis host the 'Open Source Sports' podcast to serve as a public reading group for discussing the latest research in sports analytics. Each episode focuses on a single paper featuring authors as guests, with discussions about the statistical methodology, relevance and future directions of the research.
Where to listen: Apple Podcasts, Breaker, Google Podcasts, Overcast, Pocket Casts, RadioPublic, Spotify
Grinding the Mocks with Benjamin Robinson
We discuss Grinding the Bayes: A Hierarchical Modeling Approach to Predicting the NFL Draft with Benjamin Robinson (@benj_robinson). This paper was a finalist in the Carnegie Mellon Sports Analytics Conference Reproducible Research Competition in October 2020. You can submit an abstract to enter the 2021 Reproducible Research Competition now! Benjamin Robinson is a data scientist living in Washington, D.C. and the creator of Grinding the Mocks, where since 2018 he has used mock drafts, the wisdom of crowds, and data science to predict the NFL Draft. He is a 2012 graduate of the University of Pittsburgh with degrees in Economics and Urban Studies and earned a Master of Public Policy degree from the University of Southern California in 2014. You can follow him on Twitter @benj_robinson and find the Grinding the Mocks project at grindingthemocks.com and @GrindingMocks. For additional references mentioned in the show: Ben's bitbucket repository of data: https://bitbucket.org/benjamin_robinson/grindingthebayes/ Bayesian modeling in R with the brms package: https://paul-buerkner.github.io/brms/ CMSAC Reproducible Research Competition abstract submission: http://stat.cmu.edu/cmsac/conference/2021/#mu-research Saiem Gilani's (@SaiemGilani) collection of software: https://sportsdataverse.org/
01:05:53
August 24, 2021
Expected Hypothetical Completion Probability with Sameer Deshpande and Katherine Evans
We discuss a previous Big Data Bowl finalist paper, 'Expected Hypothetical Completion Probability' (https://arxiv.org/abs/1910.12337), with authors Sameer Deshpande (@skdeshpande91) and Kathy Evans (@CausalKathy). Sameer is a postdoctoral associate at MIT. Prior to that, he completed his Ph.D. at the Wharton School of the University of Pennsylvania. He is broadly interested in Bayesian methods and causal inference. He is a long-suffering but unapologetic fan of America's Team. He's also a fan of the Dallas Mavericks. Kathy is the Director of Strategic Research for the Toronto Raptors. She completed her Ph.D. in Biostatistics at Harvard University. She doesn't have an opinion on Frequentist vs Bayesian or R vs Python, but will get very upset if Rise of Skywalker is your favorite Star Wars movie. For additional references mentioned in the show: Big Data Bowl notebooks: https://www.kaggle.com/c/nfl-big-data-bowl-2021/notebooks BART: https://arxiv.org/abs/0806.3286 XBART: Accelerated Bayesian Additive Regression Trees https://jingyuhe.com/xbart.html Matthew Reyers' (@Stats_By_Matt) thesis: https://twitter.com/Stats_By_Matt/status/1296570171687989249?s=20
01:16:19
January 10, 2021
Bang the can slowly with Ryan Elmore and Gregory J. Matthews
We discuss Bang the Can Slowly: An Investigation into the 2017 Houston Astros with Ryan Elmore (@rtelmore) and Gregory J. Matthews (@StatsInTheWild). This paper was the winner of the Carnegie Mellon Sports Analytics Conference Reproducible Research Competition in October 2020. Ryan Elmore is an Assistant Professor in the Department of Business Information and Analytics in the Daniels College of Business at the University of Denver (DU). He earned his Ph.D. in statistics at Penn State University and worked as a Senior Scientist at the National Renewable Energy Laboratory prior to DU. He has over 20 peer-reviewed publications in outlets such as Journal of the American Statistical Association, Biometrika, The American Statistician, Big Data, Journal of Applied Statistics, and Journal of Sports Economics, among others. He is currently an Associate Editor for the Journal of Quantitative Analysis in Sports and recently organized the conference “Rocky Mountain Symposium on Analytics in Sports” hosted at DU. Gregory Matthews completed his Ph.D. in statistics at the University of Connecticut in 2011. From 2011 to 2014, he was a post-doc in the School of Public Health at the University of Massachusetts-Amherst. Since 2014, he has been a professor of statistics at Loyola University Chicago. He was promoted to Associate Professor with tenure in March 2020. For additional references mentioned in the show: Tony Adams' (@adams_at) Houston Astros trash can banging data website: http://signstealingscandal.com/ Ryan and Greg's GitHub repository with code and data: https://github.com/gjm112/Astros_sign_stealing The causal effect of a timeout at stopping an opposing run in the NBA by Connor Gibbs (@cgibbs_10), Ryan Elmore, and Bailey Fosdick (@baileyfosdick)
01:16:38
December 12, 2020
How often does the best team win with Michael Lopez
We discuss 'How often does the best team win? A unified approach to understanding randomness in North American sport' with Michael Lopez.  Michael Lopez (@StatsbyLopez) is the Director of Football Data and Analytics at the National Football League and a Lecturer of Statistics and Research Associate at Skidmore College. At the National Football League, his work centers on how to use data to enhance and better understand the game of football.  For additional references mentioned in the show: NESSIS 2017 talk: https://www.youtube.com/watch?v=obb_wpn4IvE CMSAC 2017 talk: https://www.youtube.com/watch?v=owOpU_diCVI 'teamcolors' package by Ben Baumer (@BaumerBen) and Gregory J. Matthews (@StatsInTheWild) Mike's tutorial posts on the paper's modeling framework: https://statsbylopez.netlify.app/post/a-state-space-model-to-evaluate-sports-teams/ Follow Tom Bliss (@DataWithBliss) and check out his presentation at UCSAS20 Dan Cervone's archived 'Win Probability Probabilities' post: http://web.archive.org/web/20200808064442/http://xyresearch.com/posts/win-probability-probabilities with code available here https://github.com/dcervone/winProb Big Data Bowl 2021: https://www.kaggle.com/c/nfl-big-data-bowl-2021
01:03:30
October 31, 2020
Player Chemistry in Soccer with Lotte Bransen
We discuss 'Player Chemistry: Striving for a Perfectly Balanced Soccer Team' with Lotte Bransen. This paper builds on the VAEP framework previously introduced by Lotte and her colleagues, in order to quantify player chemistry. Our discussion covers details of the paper along with general challenges of estimating player chemistry in soccer and other sports, as well as the importance of interpretable machine learning. Lotte Bransen (@LotteBransen) is a Lead Data Scientist at SciSports, where she leads the Data Analytics team that develops analytical tools to derive actionable insights from soccer data. An avid soccer player herself, Lotte primarily works on developing machine learning models to measure the impact of soccer players’ in-game actions and decisions on the courses and outcomes of matches. Prior to SciSports, Lotte obtained a Master of Science degree in Econometrics & Management Science from Erasmus University Rotterdam and a Bachelor of Science degree in Mathematics from Utrecht University. References: 'Player Chemistry: Striving for a Perfectly Balanced Soccer Team' - https://arxiv.org/pdf/2003.01712.pdf 'Actions Speak Louder than Goals: Valuing Player Actions in Soccer' - https://arxiv.org/pdf/1802.07127.pdf 'Wide Open Spaces: A statistical technique for measuring space creation in professional soccer' - http://www.sloansportsconference.com/wp-content/uploads/2018/03/1003.pdf Interpretable Machine Learning - https://christophm.github.io/interpretable-ml-book/ San Francisco 49ers recently hired Harvard Biostatistics PhD Matt Ploenzke (@MPloenzke) whose thesis was on 'Interpretable Machine Learning Methods with Applications in Genomics'
35:52
September 13, 2020
Models for hockey player ratings with Andrew Thomas and Sam Ventura
In the third episode of the show we discuss 'Competing process hazard function models for player ratings in ice hockey' with two guests, Andrew Thomas and Sam Ventura.  The discussion ranges from paper details to thoughts on modeling in hockey and sports in general. Andrew Thomas (@acthomasca) is the Director of Data Science for SMT (SportsMEDIA Technology), and former lead hockey researcher for the Minnesota Wild. He received his PhD in Statistics at Harvard University. Sam Ventura is the Director of Hockey Research for the Pittsburgh Penguins, and an affiliated faculty member at Carnegie Mellon's Statistics & Data Science department, where he received his PhD in Statistics.  Along with Andrew, he is the co-creator of war-on-ice.com and nhlscrapr. Additionally, he is the co-creator of nflscrapr with Maksim Horowitz and Ron Yurko, which no longer works... Additional resources mentioned include: Previous work by Brian Macdonald, e.g. https://arxiv.org/abs/1201.0317 Total Hockey Rating by Michael Schuckers and James Curro Asmae Toumi - 'From Grapes and Prunes to Apples and Apples: Using Matched Methods to Estimate Optimal Zone Entry Decision-Making in the National Hockey League' And check out recent work by Micah Blake McCurdy and Evolving Wild
01:14:28
August 10, 2020
Rao-Blackwellizing FG% with Daniel Daly-Grafstein
In the second episode we discuss two papers by our guest Daniel Daly-Grafstein and Luke Bornn: Rao-Blackwellizing field goal percentage (published in JQAS and available at: http://www.lukebornn.com/papers/dalygrafstein_jqas_2019.pdf) and Using In-Game Shot Trajectories to Better Understand Defensive Impact in the NBA (available at: https://arxiv.org/pdf/1905.00822.pdf). Daniel is currently a soccer data analyst at Sportlogiq, a sports AI company that, in soccer, focuses on generating tracking data using computer vision. The papers discussed in this episode were part of Daniel’s Master's degree in statistics at Simon Fraser University. In the fall Daniel is going to be starting his PhD in Statistics at the University of British Columbia. Additional resources mentioned in the show: Daniel's GitHub repository: https://github.com/danieldalygrafstein/nba-raoblackwellizing-field-goal Sloan conference papers by: (1) Rachel Marty and Simon Lucey: A data-driven method for understanding and increasing 3-point shooting percentage (http://www.sloansportsconference.com/wp-content/uploads/2017/02/1505.pdf) and (2) Rachel Marty: High-resolution shot capture reveals systematic biases and an improved method for shooter evaluation (http://www.sloansportsconference.com/wp-content/uploads/2018/02/1005.pdf) Also, you should read the Wikipedia page on the Rao-Blackwell theorem: https://en.wikipedia.org/wiki/Rao%E2%80%93Blackwell_theorem
43:53
June 15, 2020
openWAR with Gregory J. Matthews
In the first official Open Source Sports podcast episode, we discuss openWAR (available on arXiv at https://arxiv.org/abs/1312.7158 and in JQAS) with author Gregory J. Matthews (@StatsInTheWild), Associate Professor of Statistics at Loyola University Chicago. Additional resources mentioned in the show: openWAR code repository: https://github.com/beanumber/openWAR Baseball Prospectus example articles (*deserved runs created, NOT defensive runs created as incorrectly stated in the show): https://www.baseballprospectus.com/news/article/48293/entirely-beyond-wowy-a-breakdown-of-drc/ and https://www.baseballprospectus.com/news/article/41748/prospectus-feature-the-expected-contribution/ Bill Petti’s baseballr package: http://billpetti.github.io/baseballr/
01:39:19
May 10, 2020
Open Source Sports preview!
Ron Yurko and Kostas Pelechrinis host the 'Open Source Sports' podcast to serve as a public reading group for discussing the latest research from sports analytics and statistics in sports. This teaser episode introduces the hosts and discusses the podcast format. Each episode will focus on a single paper featuring authors as guests, with discussions about the statistical methodology, relevance and future directions of the research. Follow along on Twitter: Open Source Sports podcast: @OpenSrcSports Ron: @Stat_Ron Kostas: @kpelechrinis
07:49
May 6, 2020