RAPIDSFire
By RAPIDS
Join the conversation and send feedback on Twitter @rapidsai
Feb 09, 2021
RAPIDSFire Sports Spectacular 1 - Sam Moss and Cameron Weinert of Every Day is Saturday
I talk with Sam Moss and Cameron Weinert about using data science to predict college football. We talk about feature engineering, following your passion, the role of analytics in sports and sports fandom, how to be an intelligent consumer of data science as a non-data scientist, and a lot more.
Everyday is Saturday on Spotify.
Everyday is Saturday on Apple.
Raw data on college football is available at collegefootballdata.com
Marlene Mhangami on Python, Pivots, and Personal Growth (and RAPIDS on Windows)
We talk with Marlene Mhangami, a director and chair of the Python Software Foundation, co-founder of coding education non-profit ZimboPy, someone who made a huge career pivot from pre-med to software engineering, and one of the folks who helped bring RAPIDS to Windows. We talk about changing careers, creativity and confidence in tech, and of course RAPIDS on Windows.
Marlene's home page
https://marlenemhangami.com/
Marlene's blog post about RAPIDS on Windows
https://medium.com/rapids-ai/running-rapids-on-microsoft-windows-10-using-wsl-2-the-windows-subsystem-for-linux-c5cbb2c56e04
Tutorial on using RAPIDS on Windows via WSL2
https://www.youtube.com/watch?v=jnEd3IDsF-I
ZimboPy on GitHub
https://github.com/ZimboPy
Even Oldridge on Tabular Deep Learning and the Future of Recommender Systems
This week we’re joined by Even Oldridge, Senior Manager, RecSys Platform Team at NVIDIA. We talk about Tabular Deep Learning, NVMerlin, how bookstores aren’t like recommender systems, his team’s recent repeat win in the ACM RecSys Challenge, the future of recommender systems, and more.
NVIDIA Merlin on the NVIDIA Developer Blog
https://developer.nvidia.com/blog/tag/merlin/
NVIDIA Merlin blogs on Medium
https://medium.com/nvidia-merlin
Merlin on GitHub
https://github.com/NVIDIA-Merlin/Merlin
NVTabular Blogs
https://developer.nvidia.com/blog/tag/nvtabular/
NVTabular on GitHub
https://github.com/NVIDIA/NVTabular
REES46 data set mentioned toward the end of the podcast
Way of the Grandmaster 2 with Christof Henkel
We talk with 4-time Kaggle winner Christof Henkel about how he got started in Kaggle, important skills for Kaggle success, his most memorable contests, his most recent victory, how an alien radio signal is like a bird call, climbing at the 2021 Olympics, and much more!
Christof's Kaggle Profile: https://www.kaggle.com/christofhenkel
Christof's Twitter: https://twitter.com/kagglingdieter
Way of the Grandmaster with Chris Deotte
We sit down and talk with 4x Kaggle Grandmaster Chris Deotte about his career, how he got started doing Kaggle, how you can get started doing Kaggle, feature engineering, the perks of AGI, and a lot more!
Chris on Kaggle: https://www.kaggle.com/cdeotte
Data Science, Social Science, and the Near Future of RAPIDS with John Zedlewski
I sit down and talk with the new Director of Engineering for RAPIDS at NVIDIA, John Zedlewski, about what economics can learn from machine learning practitioners, engineering challenges that ended up being harder than first thought, how increased automation will change the day-to-day work of data scientists, and much more.
Simulating large-scale numerical models in natural science with Zahra Ronaghi and Christoph Keller
I talk with Zahra Ronaghi, Engineering Manager of AI Infrastructure at NVIDIA, and Christoph Keller, Atmospheric Chemist with the NASA Goddard Space Flight Center, about their collaboration to bring GPU-accelerated data science to the study of air pollution.
You can find their first blog on the collaboration here and their work around the atmospheric impact of COVID here.
For more about GPU-accelerated SHAP, this blog is a good place to start.
Community, Whisky, Fitness, and Data Science with Jim Scott
On this week’s episode, we have NVIDIA’s Head of Developer Relations, Data Science, Jim Scott. We talk about the data science of fine whiskey, data science for fitness, the “secret” of Kaggle Grandmasters (spoiler: it’s giving back to the community), learning and community resources as the future of data science, classic “paradoxes” in basic probability, and some great resources for being a better data scientist.
Kaggle Grandmaster YouTube Interviews - Here’s the most recent sit-down Jim did with the Kaggle Grandmasters of NVIDIA. https://www.youtube.com/watch?v=bHuww-l_Sq0
Data Science of the Day - we talk about this toward the end of the episode, and this is a GREAT resource to keep up-to-date with everything going on in data science. https://forums.developer.nvidia.com/c/ai-data-science/data-science-of-the-day/323/none
Jim on Twitter: https://twitter.com/kingmesal
Jim and I reminisce about the Birthday Paradox - here’s a good piece on it from Scientific American. Jim and I were way off on remembering how likely birthday sharing is in a small handful of people. https://www.scientificamerican.com/article/bring-science-home-probability-birthday-paradox/
Don’t let us get your goat talking about the Monty Hall Problem. This explainer shows how an example with a larger number of doors can help give more intuition about what’s actually happening by changing your guess. https://www.statisticshowto.com/probability-and-statistics/monty-hall-problem/
Cantor’s Diagonalization Theorem is mentioned in passing. Here’s a link to the Wikipedia article - if you aren’t familiar with it, you should check it out. https://en.wikipedia.org/wiki/Cantor%27s_diagonal_argument
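If you want to check the numbers behind the Birthday Paradox and Monty Hall items above, here is a minimal Python sketch. It is not from the episode. It assumes 365 equally likely birthdays (no leap years) and the many-doors version of Monty Hall in which the host knowingly opens every other goat door except one. Both function names are made up for illustration.

```python
from fractions import Fraction


def birthday_collision_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday.

    Assumes every day is equally likely and birthdays are independent.
    """
    if people > days:
        return 1.0  # pigeonhole: more people than days forces a match
    p_all_distinct = 1.0
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def monty_hall_switch_win_probability(doors: int) -> Fraction:
    """Chance of winning by switching when the host opens all but one other door.

    The first pick is right with probability 1/doors, so switching wins
    in every other case.
    """
    return Fraction(doors - 1, doors)


if __name__ == "__main__":
    for n in (5, 10, 23):
        print(n, "people:", round(birthday_collision_probability(n), 3))
    for d in (3, 100):
        print(d, "doors, switch wins with probability", monty_hall_switch_win_probability(d))
```

Under these assumptions, 23 people already give slightly better than even odds of a shared birthday (about 0.507), and with 100 doors switching wins 99 times out of 100, which is the intuition the larger-doors explainer is going for.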
Neural Nets, the History of Data Science, and Applied Spatial Analysis with John Murray
This week I talk with John Murray. John has been a data scientist, a CTO, and a professor and has unique insight on the history of data science and where it is going. It’s a great episode and I hope you’ll enjoy!
Links described in the episode:
Pandas and Arrow with Wes McKinney
Our guest this week is the one and only Wes McKinney, creator of Pandas and Apache Arrow. We have a great conversation about his career journey, funding and maintaining open-source software projects, his new company Ursa Computing, how Pandas grew from a passion project to the lingua franca of Python data science, and a lot more.
Data Visualization at Scale with Allan Enemark and Bryan Van de Ven
We sit down with Allan Enemark, data viz lead for RAPIDS, and Bryan Van de Ven, Senior Engineer and co-creator of Bokeh, to talk about what GPUs are doing for the visualization of data sets across many different tools, and what the future holds for showing your audience what the data is saying.
Links to things discussed in the episode:
Datashader
Plotly
HoloViz
Bokeh
Vis.gl
JupyterCon Tutorial - check it out!
cuxfilter (pronounced "cu - crossfilter") - code on GitHub
Twitter accounts to follow to keep your finger on the pulse of the latest in data viz:
https://twitter.com/DataVizSociety
https://twitter.com/jonmmease
https://twitter.com/AlbertoCairo
https://twitter.com/visualisingdata
https://twitter.com/Elijah_Meeks
https://twitter.com/viegasf
https://twitter.com/giorgialupi
https://twitter.com/flowingdata
https://twitter.com/infobeautiful
BlazingSQL with Felipe Aramburu and William Malpica
Join me as I sit down with Felipe Aramburu and William Malpica to talk about BlazingSQL's GPU-accelerated SQL queries, start-up life, the tech talent in Peru, things we used to hate about SQL, and a lot more.
Give BlazingSQL a try at app.blazingsql.com and, once you're convinced, go to beta.blazingsql.com for the beta of the paid version, which will give you access to very large GPU clusters. Thanks!
Artificial General Intelligence in Our Lifetimes with Rachel Allen
In the first bonus episode of RAPIDSFire, I sit down with Rachel Allen, who holds a PhD in Neuroscience. We talk about how neural net models relate to and differ from real brains, ethical issues around conscious machines and their training, and what steps might be taken to get closer to true thinking machines. I had a lot of fun recording this, and I hope you enjoy it!
Reconstructing visual experiences from brain activity evoked by natural movies
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3326357/
Dead Salmon Study
http://prefrontal.org/files/posters/Bennett-Salmon-2009.jpg
Existential Comics - Turing Tests and Other Things of That Nature
https://existentialcomics.com/comic/357
Spiteful Octopi
https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecy.3266
Corrupted Microsoft Language Model
Cybersecurity Data Science with Rachel Allen and Bartley Richardson
Join me as I talk with Rachel Allen and Bartley Richardson about applying data science to cybersecurity with GPUs and RAPIDS. We'll also talk in-depth about an amazing extension of the BERT transformer model: CyBERT, the pre-built GPU pipelines in CLX, a super fast GPU tokenizer, and what to expect from them next.
The link to the repos discussed in the episode is here: https://github.com/rapidsai/clx
RAPIDSFire Episode One: The Birth of RAPIDS with Josh Patterson and Keith Kraus
Welcome to the first episode of RAPIDSFire! My rotating cohosts this week are Josh Patterson, Senior Director of Engineering at NVIDIA, and Keith Kraus, Systems Software Senior Manager at NVIDIA. These two gentlemen were driving forces behind RAPIDS from the very start, and this is an illuminating talk about GPU data science, open source software, and the past, present, and future of RAPIDS.