Plumbers of Data Science

By Andreas Kretz

Data Engineering is the plumbing of data science. Almost invisible, but super important and a big mess when done wrong.
We talk about interesting Data Engineering trends and topics. I also teach Data Engineering in my Data Engineering Academy at LearnDataEngineering.com.
Available on Apple Podcasts, Google Podcasts, Overcast, Pocket Casts, RadioPublic, and Spotify.
#90 Taylor McGrath - The Future of the Modern Data Stack

Super happy to have Taylor with me on this stream. She is the VP of Data Labs at Rivery and therefore has a lot of experience with data platforms. We'll talk about the modern data stack and where it's going. I'm excited to hear about her experience with the changes happening in the data space and what they mean for data engineers and data teams.

Jan 25, 2023 | 47:01
#89 Piyush Sachdeva - Getting Into Google After Eight Rejections from Amazon!

In this video I talk to Piyush who's an engineer at Google and has his own YouTube channel: "Tech Tutorials with Piyush". He's a really good guy and I love how he's dedicated to teaching engineering. We are talking about some awesome topics like: 

  • Is LinkedIn a must for getting a job?
  • Tips for recording yourself 
  • Cloud Engineering vs Data Engineering
  • Which Cloud Platform should you choose right now?
  • The amazing Google work culture explained
  • Everybody should learn how to use Kubernetes
  • How getting rejected over and over at Amazon got him into Google 
  • The hiring process at Google  

Have fun!

You can also check this one out on YouTube: https://youtu.be/FZemVaqQcnM

If you want to get into Data Engineering check out my Academy at https://learndataengineering.com

Jan 16, 2023 | 44:27
#88 - Wouter Trappers - How to Realize a Data Strategy Like a Pro!

I have seen people get data strategy wrong a few times. Luckily, Wouter Trappers, who helps companies with this professionally, can help. We talked about the steps you need to take from value proposition to dashboards. Wouter is really knowledgeable, and it was super fun talking with him and hearing his approach.

Apr 12, 2022 | 39:48
#87 - Dhruba Borthakur - From Hadoop to real time analytics

Dhruba Borthakur is CTO at Rockset and a passionate Data Engineer. Before co-founding Rockset, he played a big role in the development of Hadoop HDFS at Yahoo as well as HBase and RocksDB at Facebook. His current project is the serverless Rockset platform, where you can gain real time analytics insight into your data. I tried it out before our talk and really liked it.

Apr 12, 2022 | 01:05:37
#86 The Ultimate Data Engineering Introduction

The Podcast is back!!!! I promise I am going to keep it up to date this time ;)
In this episode I talk about my newest Data Engineering course. I think it's the ultimate 1 hour 15 minute introduction to Data Engineering.
There were also a ton of questions from the chat that I answered. I think you'll really enjoy this.

Jan 14, 2021 | 01:14:35
#085 Big Data and Data Science Landscape plus trying to read Tweets with Nifi

We are looking into the network communication protocol map. I first saw this like 10 years ago and it's awesome.

Then we check out the Big Data and Data Science Landscape image. It shows you all the tools available to do data science, machine learning and data engineering, which is very helpful if you are researching tools to use.

Before using the Twitter API you have to create a developer account. So, I show you how I created one. After that I tried to get Nifi to download Tweets, but it is not working.


May 28, 2019 | 43:07
#084 Behind the scenes: Audio podcast, free transcriptions and GitHub

Today's podcast is a bit of a behind the scenes. 

What it takes to do an audio podcast. How you can get audio to text transcriptions for free.

Also GitHub questions on how to work with branches on the Cookbook.

May 27, 2019 | 51:21
#083 Data Engineering at OLX Case Study
May 27, 2019 | 01:10:53
#082 Reading Tweets With Apache Nifi & IaaS vs PaaS vs SaaS

In this episode we install the Nifi Docker container and look into how we can extract the Twitter data.

We are also talking about the differences between infrastructure as a service, platform as a service and software as a service.

May 27, 2019 | 01:19:07
#081 How to get tweets from the Twitter API

In this episode we look into the Twitter API documentation, which I love by the way.

How can we get old tweets for certain hashtags, and how do we get current live tweets for these hashtags?

May 27, 2019 | 01:09:47
#080 How To Find A Job In Germany & Answering Mails

Tips on how you find a job in Germany and two super interesting mails.

May 27, 2019 | 54:54
#079 Trying to stay true to myself and making the cookbook public on GitHub
May 27, 2019 | 24:34
#078 Cookbook collaboration and updates

Updates to the cookbook and how to collaborate on it.

May 27, 2019 | 31:08
#077 Lambda and Kappa Architecture

In this episode we talk about the Lambda Architecture with stream and batch processing, as well as an alternative, the Kappa Architecture, which consists only of streaming. We also cover data engineer vs data scientist and discuss Andrew Ng's AI Transformation Playbook.

May 27, 2019 | 01:22:02
#076 Cloud vs On Premise How To Decide

How do you choose between cloud and on-premise? We go over the pros and cons and what you have to think about, because there are good reasons not to go cloud.

Also thoughts on how to choose between the cloud providers by just comparing instance prices. Otherwise the comparison will drive you insane.

May 27, 2019 | 01:15:56
#075 Creating the Course Structure For My Data Engineering Course

In this episode we go over the ideas I have for the data engineering course structure. It was your chance to influence what we put in there.

May 27, 2019 | 53:19
#074 Starting My Data Engineering Online Course

In this video we go over some of the 100+ comments I received on LinkedIn about a data engineering training. 

May 27, 2019 | 01:01:19
#073 Data Engineering At LinkedIn Case Study

Let's check out how LinkedIn is processing data

May 27, 2019 | 01:12:21
#072 Data Engineering At Twitter Case Study

How is Twitter doing Data Engineering? Oh man, they have a lot of cool things to share about these tweets.

May 27, 2019 | 56:27
#071 Data Engineering At Spotify Case Study

In this episode we are looking at the data engineering at Spotify, my favorite music streaming service. How do they process all that data?

May 27, 2019 | 43:04
#070 The Engineering Culture At Spotify

In this podcast we look at the engineering culture at Spotify, my favorite music streaming service. 

The process behind the development of Spotify is really awesome.

May 27, 2019 | 54:37
#069 Data Engineering At Pinterest Case Study

A look into how Pinterest is doing data engineering.

May 27, 2019 | 01:06:57
#068 A Budget Data Science PC Build
May 27, 2019 | 21:25
#067 Data Engineering At NASA Case Study

A look into how NASA is doing data engineering.

May 27, 2019 | 01:01:43
#066 How To Do Data Science From A Data Engineers Perspective

A simple introduction to doing data science in the context of the Internet of Things.

May 27, 2019 | 31:48
#065 Data Engineering At CERN Case Study

A look into how CERN is doing Data Engineering. They get huge amounts of data from the Large Hadron Collider. Let's check it out.

May 27, 2019 | 01:16:24
#064 Data Engineering At Booking.com Case Study

A look into how Booking.com is doing data engineering.

May 27, 2019 | 59:09
#063 Data Engineering At Airbnb Case Study

A look into how Airbnb is doing Data Engineering.

May 27, 2019 | 01:02:59
#062 Data Engineering At Netflix Case Study

How Netflix is doing Data Engineering using their Keystone platform

May 27, 2019 | 49:15
#061 Reworking My Cookbook For Data Engineering

I decided to rework the cookbook focusing more on case studies and less on explaining tools.

People keep asking me for a path to become a data engineer and, let's be honest, you will never achieve that with just knowledge of the tools.

Finding out how companies do data engineering on their data science platforms is way more useful.

Over the next weeks we will go over each study on my YouTube channel. The stuff we talk about will then go into the cookbook too.

May 27, 2019 | 16:58
#060 What Is Hadoop And Is Hadoop Still Relevant In 2019?

An introduction to Hadoop HDFS, YARN and MapReduce.

Yes, Hadoop is still relevant in 2019 even if you look into serverless tools. 

May 27, 2019 | 24:13
#059 A Look Into The Siemens Mindsphere IoT Platform?

The Internet of Things is a huge deal. There are many platforms available. But which one is actually good?

Join me on a 50-minute dive into the Siemens Mindsphere online documentation.

I have to say I was super unimpressed by what I found.

Many limitations, unclear architecture and no pricing available? 

Not good!

May 27, 2019 | 53:31
#058 Guitars And Data Live Stream

A stream full of mediocre guitar playing and great Q&A about Hadoop. 

May 27, 2019 | 01:36:30
#057 Introducing The Plumbers Medium Publication

I have created a Medium Publication especially for us Plumbers of Data Science who work in Data Engineering and Big Data.

It's called, you guessed it, Plumbers of Data Science.

May 27, 2019 | 12:59
#056 NoSQL Key Value Stores Explained With HBase

What is the difference between SQL and NoSQL?

In this episode I use HBase as an example to show you how a key/value store works.

May 27, 2019 | 58:16
#055 Data Warehouse vs Data Lake

On this podcast I talk about data warehouses and data lakes.

When do people use which? What are the pros and cons of both?

We look at architecture examples for both and ask whether it makes sense to completely move to a data lake.

May 27, 2019 | 34:60
#054 How to Market Yourself in 2019 Student or Professional

In this episode I talk about how you can gain a competitive edge on the job market. It's super simple: you can and should start TODAY by putting yourself out there.

May 27, 2019 | 42:11
#053 The Data Science Depression Is Coming? What You Can Do

The Data Science Hype is still strong. Where's the industry going, towards a cliff? Here's what you can do.

May 27, 2019 | 41:53
#052 Data Engineering Cookbook Live Stream

In this episode I show you the first version of my data engineering cookbook.

May 27, 2019 | 55:04
#051 Five Books To Buy As A Data Engineer & My Book Buying Strategy

Getting a book and reading it cover to cover is useless. In this episode I show you my strategy of buying books complementary to your work. I also share 5 great books I read over the years that helped me get where I am now.

May 27, 2019 | 12:49
#050 Data Engineer Scientist or Analyst Which One Is For You?

In this podcast we talk about the differences between data scientists, analysts and engineers, the three main data science jobs.

All three are super important.

May 27, 2019 | 05:29
#049 I Found A REAL Use For Blockchain, At Least I thought So

After all the BS solutions using Blockchain I thought I finally found one that makes sense. Of all the possibilities, it's the EU data protection law GDPR. Well, one problem I overlooked in this podcast is that it is impossible to delete data once it is in the chain. Being able to delete data, however, is a GDPR requirement.

So, I was wrong. Again :D


May 27, 2019 | 10:07
#048 From Wannabe Data Scientist To Engineer My Journey

In this episode Kate Strachnyi interviews me for her Humans of Data Science podcast. We talk about how I found out that I am more into the engineering part of data science.

May 27, 2019 | 12:17
#047 The Truth About Data Science Salary For Graduates

In this episode I show you how much data science graduates are actually paid in Germany.

All over the internet you can read that data science salaries are over 100k dollars, whether for Data Engineers or Data Scientists. It's way lower than that.

Then I give you a few really good tips on how to choose the right company to work for. Huge corporation, startup or small company? Here's how to choose.

May 27, 2019 | 16:49
#046 How To Use GitHub for LaTeX Version Control

In this podcast I am showing you how I use GitHub to write my Data Engineering Cookbook with LaTeX.

May 27, 2019 | 05:57
#045 Why I Use LaTeX to Write Professionally And You Should Too
Dec 07, 2018 | 12:59
#044 How to Increase Your Chances for Internships or a Full-time Job
Nov 27, 2018 | 11:33
#041 Agile Development Is Important But Please Don't Do Scrum

I love agile development. People keep telling you to do Scrum, like it's the only and best choice to be agile. It's not. Here's my take on Scrum and my four main beefs with it. Watch out for these issues if you are doing Scrum.
Oct 18, 2018 | 18:48
#040 Huge Big Data News! Cloudera and Hortonworks Merge

So, Cloudera and Hortonworks merge... In today's Plumbers of Data Science Podcast I talk about what these big data vendors do and how they enable companies, admins and developers to do data science and many more things.

If you are interested in the whole Hadoop ecosystem you need to check out this episode. You won't regret it ;)
Oct 09, 2018 | 23:55
#039 Is ETL Dead For Data Science and Big Data?

Is ETL dead in Data Science and Big Data? In today's podcast I share my views on your questions regarding ETL (extract, transform, load):

  • Data lakes and data warehouses: where is the difference?
  • Is ETL still practiced, or did preprocessing and cleansing replace it?
  • What would replace ETL in Data Engineering?
  • How to become a data engineer? (check out my Facebook note)
  • How to get experience training at home?
  • Real time analytics with RDBMS or HDFS?
Oct 03, 2018 | 28:47