DataTalks.Club
By DataTalks.Club
DataTalks.Club, Feb 11, 2022
Community Building and Teaching in AI & Tech - Erum Afzal
Links:
LinkedIn: https://www.linkedin.com/in/erum-afzal-64827b24/
Twitter: https://twitter.com/Erum55449739
Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Working in Open Source - Probabl.ai and sklearn - Vincent Warmerdam
Links:
- probabl. YouTube channel: https://www.youtube.com/@UCIat2Cdg661wF5DQDWTQAmg
- Calmcode website: https://calmcode.io/
- probabl. website: https://probabl.ai/
AI for Ecology, Biodiversity, and Conservation - Tanya Berger-Wolf
Links:
- Biodiversity and Artificial Intelligence pdf: https://www.gpai.ai/projects/responsible-ai/environment/biodiversity-and-AI-opportunities-recommendations-for-action.pdf
Knowledge Graphs and LLMs Across Academia and Industry - Anahita Pakiman
Links:
- GitHub repo: https://github.com/antahiap/ADPT-LRN-PHYS/tree/main
Inclusive Data Leadership Coaching - Tereza Iofciu
We talked about:
- Tereza’s background
- Switching from an Individual Contributor to Lead
- Python Pizza and the pizza management metaphor
- Learning to figure things out on your own and how to receive feedback
- Tereza as a leadership coach
- Podcasts
- Tereza’s coaching framework (selling yourself vs bragging)
- The importance of retrospectives
- The importance of communication and active listening
- Convincing people you don’t have power over
- Building relationships and empathy
- Inclusive leadership
Links:
- LinkedIn: https://www.linkedin.com/in/tereza-iofciu/
- Twitter: https://twitter.com/terezaif
- Github: https://github.com/terezaif
- Website: https://terezaiofciu.com
Building Production Search Systems - Daniel Svonava
Links:
- VectorHub: https://superlinked.com/vectorhub/?utm_source=community&utm_medium=podcast&utm_campaign=datatalks
- Daniel's LinkedIn: https://www.linkedin.com/in/svonava/
This podcast is sponsored by VectorHub, a free open-source learning community for all things vector embeddings and information retrieval systems.
Building Machine Learning Products - Reem Mahmoud
We talked about:
- Reem’s background
- Context-aware sensing and transfer learning
- Shifting focus from PhD to industry
- Reem’s experience with startups and dealing with prejudices towards PhDs
- AI interviewing solution
- How candidates react to getting interviewed by an AI avatar
- End-to-end overview of a machine learning project
- The pitfalls of using LLMs in your process
- Mitigating biases
- Addressing specific requirements for specific roles
- Reem’s resource recommendations
Links:
- LinkedIn: https://www.linkedin.com/in/reemmahmoud/recent-activity/all/
- Website: https://topmate.io/reem_mahmoud
Make an Impact Through Volunteering Open Source Work - Sara EL-ATEIF
We talked about:
- Sara’s background
- On being a Google PhD fellow
- Sara’s volunteer work
- Finding AI volunteer work
- Sara’s Fruit Punch challenge
- How to take part in AI challenges
- AI Wonder Girls
- Hackathons
- Things people often miss in AI projects and hackathons
- Getting creative
- Fostering your social media
- Tips on applying for volunteer projects
- Why it’s worth doing volunteer projects
- Opportunities for data engineers and students
- Sara’s newsletter suggestions
Links:
- Dev and AI hackathons: https://devpost.com/
- Healthcare-focused challenges: https://grand-challenge.org/challenges/
- Volunteering in projects (AI4Good): https://www.fruitpunch.ai/
- Volunteering in projects (AI4Good) 2: https://www.omdena.com/
- Twitter: https://twitter.com/el_ateifSara
- Instagram: https://www.instagram.com/saraelateif/
- LinkedIn: https://www.linkedin.com/in/sara-el-ateif/
- Youtube: www.youtube.com/@elateifsara
Accelerating The Job Hunt for The Perfect Job in Tech - Sarah Mestiri
We talked about:
- Sarah’s background
- How Sarah became a coach and found her niche
- Sarah’s clients
- How Sarah helps her clients find the perfect job
- Finding a specialization
- Informational interviews
- Building a connection for mutual benefit
- The networking strategy
- Listing your projects in the CV
- The importance of doing research yourself and establishing your interests
- How to land a part-time job when the company wants full-time
- Age is not a factor
- Applying for jobs after finishing a course and the importance of sharing your learnings
- Sarah’s resource recommendations
Links:
- LinkedIn: https://www.linkedin.com/in/sarahmestiri/
- Website: https://thrivingcareermoms.com/
- Personal Website: https://www.sarahmestiri.com/
- Youtube channel: https://www.youtube.com/@thrivingcareermoms444
Machine Learning Engineering in Finance - Nemanja Radojkovic
We talked about:
- Nemanja’s background
- When Nemanja first worked as a data person
- Typical problems that ML Ops folks solve in the financial sector
- What Nemanja currently does as an ML Engineer
- The obstacle of implementing new things in financial sector companies
- Going through the hurdles of DevOps
- Working with an on-premises cluster
- “ML Ops on a Shoestring” (You don’t need fancy stuff to start with ML Ops)
- Tactical solutions
- Platform work and code work
- Programming and soft skills needed to be an ML Engineer
- The challenges of transitioning from electrical engineering and sales to ML Ops
- The ML Ops tech stack for beginners
- Working on projects to determine which skills you need
Links:
- LinkedIn: https://www.linkedin.com/in/radojkovic/
Stock Market Analysis with Python and Machine Learning - Ivan Brigida
We talked about:
- Ivan’s background
- How Ivan became interested in investing
- Getting financial data to run simulations
- Open, High, Low, Close, Volume
- Risk management strategy
- Testing your trading strategies
- Sticking to your strategy
- Important metrics and remembering about trading fees
- Important features
- Deployment
- How DataTalks.Club courses helped Ivan
- Ivan’s site and course sign-up
Links:
- Exploring Finance APIs: https://pythoninvest.com/long-read/exploring-finance-apis
- Python Invest Blog Articles: https://pythoninvest.com/blog
Free ML Engineering course: http://mlzoomcamp.com
Bayesian Modeling and Probabilistic Programming - Rob Zinkov
We talked about:
- Rob’s background
- Going from software engineering to Bayesian modeling
- Frequentist vs Bayesian modeling approach
- About integrals
- Probabilistic programming and samplers
- MCMC and Hakaru
- Language vs library
- Encoding dependencies and relationships into a model
- Stan, HMC (Hamiltonian Monte Carlo), and NUTS
- Sources for learning about Bayesian modeling
- Reaching out to Rob
Links:
- Book 1: https://bayesiancomputationbook.com/welcome.html
- Book/Course: https://xcelab.net/rm/statistical-rethinking/
Navigating Challenges and Innovations in Search Technologies - Atita Arora
We talked about:
- Atita’s background
- How NLP relates to search
- Atita’s experience with Lucidworks and OpenSource Connections
- Atita’s experience with Qdrant and vector databases
- Utilizing vector search
- Major changes to search Atita has noticed throughout her career
- RAG (Retrieval-Augmented Generation)
- Building a chatbot out of transcripts with LLMs
- Ingesting the data and evaluating the results
- Keeping humans in the loop
- Application of vector databases for machine learning
- Collaborative filtering
- Atita’s resource recommendations
Links:
- LinkedIn: https://www.linkedin.com/in/atitaarora/
- Twitter: https://x.com/atitaarora
- Github: https://github.com/atarora
- Human-in-the-Loop Machine Learning: https://www.manning.com/books/human-in-the-loop-machine-learning
- Relevant Search: https://www.manning.com/books/relevant-search
- Let's learn about Vectors: https://hub.superlinked.com/
- Langchain: https://python.langchain.com/docs/get_started/introduction
- Qdrant blog: https://blog.qdrant.tech/
- OpenSource Connections Blog: https://opensourceconnections.com/blog/
The Entrepreneurship Journey: From Freelancing to Starting a Company - Adrian Brudaru
We talked about:
- Adrian’s background
- The benefits of freelancing
- Having an agency vs freelancing
- What led Adrian to switch over from freelancing
- The conception of DLT (Growth Full Stack)
- The investment required to start a company
- Growth through the provision of services
- Growth through teaching (product-market fit)
- Moving on to creating docs
- Adrian’s current role
- Strategic partnerships and community growth through DocDB
- Plans for the future of DLT
- DLT vs Airbyte vs Fivetran
- Adrian’s resource recommendations
Links:
- Adrian's LinkedIn: https://www.linkedin.com/in/data-team/
- Twitter: https://twitter.com/dlt_library
- Github: https://github.com/dlt-hub/dlt
- Website: https://dlthub.com/docs/intro
Become a Data Freelancer - Dimitri Visnadi
We talked about:
- Dimitri’s background
- The first steps of transitioning into freelance
- Working with recruiters (contracting)
- Deciding on what to charge for your services
- Establishing your network
- Self-marketing
- Contracting vs freelancing
- Which channel is better for those starting out?
- Cutting out the middleman
- Where to look for clients and how to vet them
- The different ways of getting into freelancing
- Going back to a full-time job after freelancing
- Common mistakes freelancers make
- Dimitri’s resource suggestions
- Reaching out to Dimitri
Links:
- LinkedIn profile: http://www.linkedin.com/in/visnadi
- The DataFreelancer website: https://thedatafreelancer.com/
AI for Digital Health - Maria Bruckert
We talked about:
- Maria’s background
- Deciding to go into telecare (healthcare)
- Current difficulties in healthcare
- Getting into the healthcare industry as a lifestyle brand
- The importance of a plan B and being flexible
- What is SQIN and the importance of communication
- Going from lipstick to skin health analysis
- The importance of community and broadening your audience
- The importance of feedback and communicating benefits
- The current state and growth of SQIN
- Convincing investors and the importance of proving profitability
- Maria’s role at SQIN
- Balancing a newborn child and a new company
Cracking the Code: Machine Learning Made Understandable - Christoph Molnar
We talked about:
- Christoph’s background
- Kaggle and other competitions
- How Christoph became interested in interpretable machine learning
- Interpretability vs Accuracy
- Christoph’s current competition engagement
- How Christoph chooses topics for books
- Why Christoph started the writing journey with a book
- Self-publishing vs via a publisher
- Christoph’s other books
- What is conformal prediction?
- Christoph’s book on SHAP
- Explainable AI vs Interpretable AI
- Working alone vs with other people
- Christoph’s other engagements and how to stay hands-on
- Keeping a logbook
- Does one have to be an expert on the topic to write a book about it?
- Writing in the open and other feedback gathering methods
- Advice for those who want to be technical writers
- Self-publishing tools
- Finding Christoph online
Links:
- LinkedIn: https://www.linkedin.com/in/christoph-molnar/
- Website: https://christophmolnar.com/
The Unwritten Rules for Success in Machine Learning - Jack Blandin
We talked about:
- Jack’s background
- Transitioning from IC to management
- Lessons not taught in traditional school
- The importance of people’s perception, trust, and respect
- How soft skills are relevant to machine learning
- How to put on a salesman hat in machine learning management
- The importance of visuals and building a POC as fast as possible
- 1st Rule of Machine Learning – don’t be afraid to start without machine learning
- The importance of understanding the reality that data represents
- The importance of putting yourself in the shoes of customers
- The importance of software engineering skills in machine learning
- Where to find Jack’s content
- Jack’s next venture
Links:
- Jack's LinkedIn profile: https://www.linkedin.com/in/jackblandin/
From a Research Scientist at Amazon to a Machine Learning/AI Consultant - Verena Webber
Links:
- Mini sound bath: https://www.youtube.com/watch?v=g-lDrcSqcrQ
From Marketing to Product Owner in Search - Lera Kaimashnikova
We talked about:
- Lera’s background
- Lera’s move from Ukraine to Germany
- The transition from Marketing to Product Ownership
- The importance of communication and one-on-ones
- The role of Product Owner
- Utilizing Scrum as a Product Owner
- Building teams and cross-functionality
- Lera’s experience learning about search
- The importance of having both technical knowledge and business context
- Open developer positions at AUTODOC
- What experience Lera came to AUTODOC with
- How marketing skills helped Lera in her current role
- Lera’s resource recommendations
- Everything is possible
Links:
- Post: https://www.linkedin.com/posts/leracaiman_elasticsearch-ecommerce-activity-7106615081588674560-5WQO
Collaborative Data Science in Business - Ioannis Mesionis
Links:
- LinkedIn: https://www.linkedin.com/in/ioannis-mesionis/
- Github: https://github.com/ioannismesionis
- Website: https://ioannismesionis.github.io/
Bridging Data Science and Healthcare - Eleni Stamatelou
DataTalks.Club Anniversary Interview - Alexey Grigorev, Johanna Bayer
Data Engineering for Fraud Prevention - Angela Ramirez
We talked about:
- Angela's background
- Angela's role at Sam's Club
- The usefulness of knowing ML as a data engineer
- Angela's career path
- Transitioning from data analyst to data engineer/system designer
- Best practices for system design and data engineering
- Working with document databases
- Working with network-based databases
- Detecting fraud with a network-based database
- Selecting the database type to work with
- Neo4j vs Postgres
- The importance of having software engineering knowledge in data engineering
- Data quality check tooling
- The greatest challenges in data engineering
- Debugging and finding the root cause of a failed job
- What kinds of tools Angela uses on a daily basis
- Working with external data sources
- Angela's resource recommendations
Links:
- LinkedIn: https://www.linkedin.com/in/aramirez1305/
- Twitter: https://twitter.com/angelamaria__r
- Github: https://github.com/aramir62
- Previous podcast talk: https://twitter.com/i/spaces/1OwGWwZAZDnGQ?s=20
From Data Manager to Data Architect - Loïc Magnien
We talked about:
- Loïc's background
- Data management
- Loïc's transition to data engineer
- Challenges in the transition to data engineering
- What is a data architect?
- The output of a data architect's work
- Establishing metrics and dimensions
- The importance of communication
- Setting up best practices for the team
- Staying relevant and tech-watching
- Setting up specifications for a pipeline
- Be agile, create a POC, iterate ASAP, and build reusable templates
- Reaching out to Loïc for questions
Links:
- Loïc's LinkedIn: https://www.linkedin.com/in/loicmagnien/
Pragmatic and Standardized MLOps - Maria Vechtomova
We talked about:
- Maria's background
- Marvelous MLOps
- Maria's definition of MLOps
- Alternate team setups without a central MLOps team
- Pragmatic vs non-pragmatic MLOps
- Must-have ML tools (categories)
- Maturity assessment
- What to start with in MLOps
- Standardized MLOps
- Convincing DevOps to implement
- Understanding what the tools are used for instead of knowing all the tools
- Maria's next project plans
- Is LLM Ops a thing?
- What Ahold Delhaize does
- Resource recommendations to learn more about MLOps
- The importance of data engineering knowledge for ML engineers
Links:
- LinkedIn: https://www.linkedin.com/company/marvelous-mlops/
- Website: https://marvelousmlops.substack.com/
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Democratizing Causality - Aleksander Molak
We talked about:
- Aleksander's background
- Aleksander as a Causal Ambassador
- Using causality to make decisions
- Counterfactuals and Judea Pearl
- Meta-learners vs classical ML models
- Average treatment effect
- Reducing causal bias, the super efficient estimator, and model uplifting
- Metrics for evaluating a causal model vs a traditional ML model
- Is the added complexity of a causal model worth implementing?
- Utilizing LLMs in causal models (text as outcome)
- Text as treatment and style extraction
- The viability of A/B tests in causal models
- Graphical structures and nonparametric identification
- Aleksander's resource recommendations
Links:
- The Book of Why: https://amzn.to/3OZpvBk
- Causal Inference and Discovery in Python: https://amzn.to/46Pperr
- Book's GitHub repo: https://github.com/PacktPublishing/Causal-Inference-and-Discovery-in-Python
- The Battle of Giants: Causality vs NLP (PyData Berlin 2023): https://www.youtube.com/watch?v=Bd1XtGZhnmw
- New Frontiers in Causal NLP (papers repo): https://bit.ly/3N0TFTL
Mastering Data Engineering as a Remote Worker - José María Sánchez Salas
We talked about:
- José's background
- How José relocated to Norway and his schedule
- Tech companies in Norway and José's role
- Challenges of working as a remote data engineer
- José's newsletter on how to make use of data
- The process of making data useful
- Where José gets inspiration for his newsletter
- Dealing with burnout
- When in Norway, do as the Norwegians do
- The legalities of working remotely in Norway
- The benefits of working remotely
Links:
- LinkedIn: https://www.linkedin.com/in/jmssalas
- Github: https://github.com/jmssalas
- Website & Newsletter: https://jmssalas.com
The Good, the Bad and the Ugly of GPT - Sandra Kublik
We talked about:
- Sandra's background
- Making a YouTube channel to break into the LLM space
- The business cases for LLMs
- LLMs as amplifiers
- The benefits of keeping a human in the loop when using LLMs (AI limitations)
- Using LLMs as assistants
- Building an app that uses an LLM
- Prompt whisperers and how to improve your prompts
- Sandra's 7-day LLM experiment
- Sandra's LLM content recommendations
- Finding Sandra online
Links:
- LinkedIn: https://www.linkedin.com/in/sandrakublik/
- Twitter: https://twitter.com/sandra_kublik
- Youtube: https://www.youtube.com/@sandra_kublik
LLMs for Everyone - Meryem Arik
We talked about:
- Meryem's background
- The constant evolution of startups
- How Meryem became interested in LLMs
- What is an LLM (generative vs non-generative models)?
- Why LLMs are important
- Open source models vs API models
- What TitanML does
- How fine-tuning a model helps in LLM use cases
- Fine-tuning generative models
- How generative models change the landscape of human work
- How to adjust models over time
- Vector databases and LLMs
- How to choose an open source LLM or an API
- Measuring input data quality
- Meryem's resource recommendations
Links:
- Website: https://www.titanml.co/
- Beta docs: https://titanml.gitbook.io/iris-documentation/overview/guide-to-titanml...
- Using llama2.0 in TitanML Blog: https://medium.com/@TitanML/the-easiest-way-to-fine-tune-and-inference-llama-2-0-8d8900a57d57
- Discord: https://discord.gg/83RmHTjZgf
- Meryem's LinkedIn: https://www.linkedin.com/in/meryemarik/
Investing in Open-Source Data Tools - Bela Wiertz
We talked about:
- Bela's background
- Why startups even need investors
- Why open source is a viable go-to-market strategy
- Building a bottom-up community
- The investment thesis for the TKM Family Office and the blurriness of the funding round naming convention
- Angel investors vs VC Funds vs family offices
- Bela's investment criteria and GitHub stars as a metric
- Inbound sourcing, outbound sourcing, and investor networking
- Making a good impression on an investor
- Balancing open and closed source parts of a product
- The future of open source
- Recent successes of open source companies
- Bela's resource recommendations
Links:
- Understand who is engaging with your open source project article: https://www.crowd.dev/
- Top 6 Books on Developer Community Building: https://www.crowd.dev/post/top-6-books-on-developer-community-building
- Which open source software metrics matter: https://www.bvp.com/atlas/measuring-the-engagement-of-an-open-source-software-community#Which-open-source-software-metrics-matter
Why Machine Learning Design is Broken - Valerii Babushkin
Links:
- Book: https://www.manning.com/books/machine-learning-system-design?utm_source=AGMLBookcamp&utm_medium=affiliate&utm_campaign=book_babushkin_machine_4_25_23&utm_content=twitter
- Discount: poddatatalks21 (35% off)
- Evidently: https://www.evidentlyai.com/
- Article: https://medium.com/people-ai-engineering/design-documents-for-ml-models-bbcd30402ff7
Interpretable AI and ML - Polina Mosolova
We talked about:
- Polina's background
- How common it is for PhD students to build ML pipelines end-to-end
- Simultaneous PhD and industry experience
- Support from both the academic and industry sides
- How common the industrial PhD setup is and how to get into one
- Organizational trust theory
- How price relates to trust
- How trust relates to explainability
- The importance of actionability
- Explainability vs interpretability vs actionability
- Complex glass box models
- Does the explainability of a model follow explainability?
- What explainable AI brings to customers and end users
- Can all trust be turned into a KPI?
Links:
- LinkedIn: https://www.linkedin.com/in/polina-mosolova/
- Neural Additive Models paper: https://proceedings.neurips.cc/paper/2021/file/251bd0442dfcc53b5a761e050f8022b8-Paper.pdf
- Neural Basis Model paper: https://arxiv.org/pdf/2205.14120.pdf
- Interpretable Feature Spaces paper: https://kdd.org/exploration_files/vol24issue1_1._Interpretable_Feature_Spaces_revised.pdf
From Scratch to Success: Building an MLOps Team and ML Platform - Simon Stiebellehner
We talked about:
- Simon's background
- What MLOps is and what it isn't
- Skills needed to build an ML platform that serves 100s of models
- Ranking the importance of skills
- The point where you should think about building an ML platform
- The importance of processes in ML platforms
- Weighing your options with SaaS platforms
- The exploratory setup, experiment tracking, and model registry
- What comes after deployment?
- Stitching tools together to create an ML platform
- Keeping data governance in mind when building a platform
- What comes first – the model or the platform?
- Do MLOps engineers need to have deep knowledge of how models work?
- Is API design important for MLOps?
- Simon's recommendations for furthering MLOps knowledge
Links:
- LinkedIn: https://www.linkedin.com/in/simonstiebellehner/
- Github: https://github.com/stiebels
- Medium: https://medium.com/@sistel
From MLOps to DataOps - Santona Tuli
We talked about:
- Santona's background
- Focusing on data workflows
- Upsolver vs DBT
- ML pipelines vs Data pipelines
- MLOps vs DataOps
- Tools used for data pipelines and ML pipelines
- The “modern data stack” and today's data ecosystem
- Staging the data and the concept of a “lakehouse”
- Transforming the data after staging
- What happens after the modeling phase
- Human-centric vs Machine-centric pipeline
- Applying skills learned in academia to ML engineering
- Crafting user personas based on real stories
- A framework of curiosity
- Santona's book and resource recommendations
Links:
- LinkedIn: https://www.linkedin.com/in/santona-tuli/
- Upsolver website: upsolver.com
- Why we built a SQL-based solution to unify batch and stream workflows: https://www.upsolver.com/blog/why-we-built-a-sql-based-solution-to-unify-batch-and-stream-workflows
Data Developer Relations - Hugo Bowne-Anderson
We talked about:
- Hugo's background
- Why do tools and the companies that run them have wildly different names?
- Hugo's other projects besides Metaflow
- Transitioning from educator to DevRel
- What is DevRel?
- DevRel vs Marketing
- How DevRel coordinates with developers
- How DevRel coordinates with marketers
- What skills a DevRel needs
- The challenges that come with being an educator
- Becoming a good writer: nature vs nurture
- Hugo's approach to writing and suggestions
- Establishing a goal for your content
- Choosing a form of media for your content
- Is DevRel intercompany or intracompany?
- The Vanishing Gradients podcast
- Finding Hugo online
Links:
- Hugo Bowne-Anderson's GitHub: http://hugobowne.github.io/
- Vanishing Gradients: https://vanishinggradients.fireside.fm/
- MLOps and DevOps: Why Data Makes It Different: https://www.oreilly.com/radar/mlops-and-devops-why-data-makes-it-different/
- Evaluate Metaflow for free, right from your Browser: https://outerbounds.com/sandbox/
Lessons Learned from Freelancing and Working in a Start-up - Antonis Stellas
We talked about:
- Antonis' background
- The pros and cons of working for a startup
- Useful skills for working at a startup and the Lean way to work
- How Antonis joined the DataTalks.Club community
- Suggestions for students joining the MLOps course
- Antonis contributing to Evidently AI
- How Antonis started freelancing
- Getting your first clients on Upwork
- Pricing your work as a freelancer
- The process after getting approved by a client
- Wearing many hats as a freelancer and while working at a startup
- Other suggestions for getting clients as a freelancer
- Antonis' thoughts on the Data Engineering course
- Antonis' resource recommendations
Links:
- Lean Startup by Eric Ries: https://theleanstartup.com/
- Lean Analytics: https://leananalyticsbook.com/
- Designing Machine Learning Systems by Chip Huyen: https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/
- Kafka Streaming with Python, tutorial video by Kris Jenkins: https://youtu.be/jItIQ-UvFI4
Data Access Management - Bart Vandekerckhove
We talked about:
- Bart's background
- What is data governance?
- Data dictionaries and data lineage
- Data access management
- How to learn about data governance
- What skills are needed to do data governance effectively
- When an organization needs to start thinking about data governance
- Good data access management processes
- Data masking and the importance of automating data access
- DPO and CISO roles
- How data access management works with a data mesh approach
- Avoiding the role explosion problem
- The importance of data governance integration in DataOps
- Terraform as a stepping stone to data governance
- How Raito can help an organization with data governance
- Open-source data governance tools
Links:
- LinkedIn: https://www.linkedin.com/in/bartvandekerckhove/
- Twitter: https://twitter.com/Bart_H_VDK
- GitHub: https://github.com/raito-io
- Website: https://www.raito.io/
- Data Mesh Learning Slack: https://data-mesh-learning.slack.com/join/shared_invite/zt-1qs976pm9-ci7lU8CTmc4QD5y4uKYtAA#/shared-invite/email
- DataQG Website: https://dataqg.com/
- DataQG Slack: https://dataqgcommunitygroup.slack.com/join/shared_invite/zt-12n0333gg-iTZAjbOBeUyAwWr8I~2qfg#/shared-invite/email
- DMBOK (Data Management Body of Knowledge): https://www.dama.org/cpages/body-of-knowledge
- DMBOK Wheel describing the data governance activities: https://www.dama.org/cpages/dmbok-2-wheel-images
Data Strategy: Key Principles and Best Practices - Boyan Angelov
We talked about:
- Boyan's background
- What is data strategy?
- Due diligence and establishing a common goal
- Designing a data strategy
- Impact assessment, portfolio management, and DataOps
- Data products
- DataOps, Lean, and Agile
- Data Strategist vs Data Science Strategist
- The skills one needs to be a data strategist
- How does one become a data strategist?
- Data strategist as a translator
- Transitioning from a Data Strategist role to a CTO
- Using ChatGPT as a writing co-pilot
- Using ChatGPT as a starting point
- How ChatGPT can help in data strategy
- Pitching a data strategy to a stakeholder
- Setting baselines in a data strategy
- Boyan's book recommendations
Links:
- LinkedIn: https://www.linkedin.com/in/angelovboyan/
- Twitter: https://twitter.com/thinking_code
- GitHub: https://github.com/boyanangelov
- Website: https://boyanangelov.com/
Practical Data Privacy - Katharine Jarmul
We talked about:
- Katharine's background
- Katharine's ML privacy startup
- GDPR, CCPA, and the “opt-in as the default” approach
- What is data privacy?
- Finding Katharine's book – Practical Data Privacy
- The various definitions of data privacy and “user profiles”
- Privacy engineering and privacy-enhancing technologies
- Why data privacy is important
- What is differential privacy?
- The importance of keeping privacy in mind when designing systems
- Data privacy, using ChatGPT as an example
- Katharine's resource suggestions for learning about data privacy
Links:
- LinkedIn: https://www.linkedin.com/in/katharinejarmul/
- Twitter: https://twitter.com/kjam
Building Scalable and Reliable Machine Learning Systems - Arseny Kravchenko
We talked about:
- Arseny's background
- Working on machine learning in startups
- What is Machine Learning System Design?
- Constraints and requirements
- Known unknowns vs unknown unknowns (Design stage)
- Writing a design document
- Technical problems vs product-oriented problems
- The solution part of the Design Document
- What motivated Arseny to write a book on ML System Design
- Examples of a Design Document in the book
- The types of readers for ML System Design
- Working with the co-author
- Reacting to constraints and feedback when writing a book
- Arseny's favorite chapter of the book
- Other resources where you can learn about ML System Design
- Twitter Giveaway
Links:
- Book: https://www.manning.com/books/machine-learning-system-design?utm_source=AGMLBookcamp&utm_medium=affiliate&utm_campaign=book_babushkin_machine_4_25_23&utm_content=twitter
- Discount: poddatatalks21 (35% off)
Building an Open-Source NLP Tool - Johannes Hötter
We talked about:
- Johannes’s background
- Johannes’s Open Source Spotlight demos – Refinery and Bricks
- The difficulties of working with natural language processing (NLP)
- Incorporating ChatGPT into a process as a heuristic
- What is Bricks?
- The process of starting a startup – Kern
- Making the decision to go with open source
- Pros and cons of launching as open source
- Kern’s business model
- Working with enterprises
- Johannes as a salesperson
- The team at Kern
- Johannes’s role at Kern
- How Johannes and Henrik separate responsibilities at Kern
- Working with very niche use cases
- The short story of how Kern got its funding
- Johannes’s resource recommendation
Links:
- Refinery's GitHub repo: https://github.com/code-kern-ai/refinery
- Bricks' GitHub repo: https://github.com/code-kern-ai/bricks
- Bricks Open Source Spotlight demo: https://www.youtube.com/watch?v=r3rXzoLQy2U
- Refinery Open Source Spotlight demo: https://www.youtube.com/watch?v=LlMhN2f7YDg
- Discord: https://discord.com/invite/qf4rGCEphW
- Kern's website: https://www.kern.ai
Navigating Industrial Data Challenges - Rosona Eldred
We talked about:
- Rosona’s background
- How mathematics knowledge helps in industry
- What is industrial data?
- Setting up an industrial process using blue paint
- Internet companies’ data vs industrial data
- Explaining industrial processes using packing peanuts
- Why productive industry needs data
- Measuring product qualities
- How data specialists use industrial data
- Defining and measuring sustainability
- Using data to react to changing regulations
- Types of industrial data
- Solving problems and optimizing with industrial data
- Industrial solvers
- Tiny data vs Big data in productive industry
- The advantages of coming from academia into productive industry
- Materials and resources for industrial data
- Women in industry
- Why Rosona decided to shift to industrial data
Links:
- Kaggle dataset: https://www.kaggle.com/datasets/paresh2047/uci-semcom
Mastering Self-Learning in Machine Learning - Aaisha Muhammad
We talked about:
- Aaisha’s background
- How homeschooling affects self-study
- Deciding on what to learn about
- Establishing whether a resource is good
- How Aaisha focuses on learning
- Deciding on what kind of project to build
- Finding research materials
- Aaisha’s experience with the DataTalks.Club ML Zoomcamp
- ML Zoomcamp projects
- Aaisha’s interest in bioinformatics
- Keeping motivated with deadlines
- Notes and time-tracking tools
- Drawbacks to self-studying
- Aaisha’s interest in machine learning
- Aaisha’s least favorite part of ML Zoomcamp
- Helping people as a way to learn
- Using ChatGPT as a “study group”
- Is it possible to learn high-level topics through self-study?
- Switching topics to avoid burnout
- Aaisha’s resource recommendations
Links:
- LinkedIn: https://www.linkedin.com/in/aaisha-muhammad/
- Twitter: https://twitter.com/ZealousMushroom
- GitHub: https://github.com/AaishaMuhammad
- Website: http://www.aaishamuhammad.co.za/
The Secret Sauce of Data Science Management - Shir Meir Lador
We talked about:
- Shir’s background
- Debrief culture
- The responsibilities of a group manager
- Defining the success of a DS manager
- The three pillars of data science management
- Managing up
- Managing down
- Managing across
- Managing data science teams vs business teams
- Scrum teams, brainstorming, and sprints
- The most important skills and strategies for DS and ML managers
- Making sure proofs of concept get into production
Links:
- The secret sauce of data science management: https://www.youtube.com/watch?v=tbBfVHIh-38
- Lessons learned leading AI teams: https://blogs.intuit.com/2020/06/23/lessons-learned-leading-ai-teams/
- How to avoid conflicts and delays in the AI development process (Part I): https://blogs.intuit.com/2020/12/08/how-to-avoid-conflicts-and-delays-in-the-ai-development-process-part-i/
- How to avoid conflicts and delays in the AI development process (Part II): https://blogs.intuit.com/2021/01/06/how-to-avoid-conflicts-and-delays-in-the-ai-development-process-part-ii/
- Leading AI teams deck: https://drive.google.com/drive/folders/1_CnqjugtsEbkIyOUKFHe48BeRttX0uJG
- Leading AI teams video: https://www.youtube.com/watch?app=desktop&v=tbBfVHIh-38
SE4ML - Software Engineering for Machine Learning - Nadia Nahar
We talked about:
- Nadia’s background
- Academic research in software engineering
- Design patterns
- Software engineering for ML systems
- Problems that people in industry have with software engineering and ML
- Communication issues and setting requirements
- Artifact research in open source products
- Product vs model
- Nadia’s open source product dataset
- Failure points in machine learning projects
- Finding solutions to issues using Nadia’s dataset and experience
- The problem of siloing data scientists and other structure issues
- The importance of documentation and checklists
- Responsible AI
- How data scientists and software engineers can work in an Agile way
Links:
- Model Card: https://arxiv.org/abs/1810.03993
- Datasheets: https://arxiv.org/abs/1803.09010
- Factsheets: https://arxiv.org/abs/1808.07261
- Research Paper: https://www.cs.cmu.edu/~ckaestne/pdf/icse22_seai.pdf
- arXiv version: https://arxiv.org/pdf/2110.
Starting a Consultancy in the Data Space - Aleksander Kruszelnicki
We talked about:
- Aleksander’s background
- The difficulty of selling data stack as a service
- How Aleksander got into consulting
- The Mom Test – extracting feedback from people
- User interviews
- Why Aleksander’s data stack as a service startup was not viable
- How Aleksander decided to switch to consulting
- Finding clients to consult
- Figuring out how to position your services
- Geographical limitations
- Figuring out your target audience
- The importance of networking and marketing
- Pricing your services
- The pitfalls of daily and hourly pricing and how to balance incentives
- Is Germany a good place to found a company?
- Aleksander’s book recommendations
Links:
- LinkedIn: https://www.linkedin.com/in/alkrusz/
- Twitter: https://twitter.com/alkrusz
- Website: www.leukos.io
Biohacking for Data Scientists and ML Engineers - Ruslan Shchuchkin
We talked about:
- Ruslan’s background
- Fighting procrastination and perfectionism
- What is biohacking?
- The role of dopamine and other hormones in daily life
- How meditation can help
- The influence light has on our bodies
- Behavioral biohacking
- Daylight lamps and using light to wake up
- Sleep cycles
- How nutrition affects productivity
- Measuring productivity
- Examples of unsuccessful biohacking attempts
- Stoicism, voluntary discomfort, and self-challenges
- Biohacking risks and ways to prevent them
- Coffee and tea biohacking
- Using self-reflection and tracking to measure results
- Mindset shifting
- Stoicism book recommendation
- Work/life balance
- Ruslan’s biohacking resource recommendation
Links:
- LinkedIn: https://www.linkedin.com/in/ruslanshchuchkin/
Analytics for a Better World - Parvathy Krishnan
We talked about:
- Parvathy’s background
- Brainstorming sessions with nonprofits to establish data maturity
- Example of an Analytics for a Better World project
- The overall data maturity situation of nonprofits vs private sector
- Solving the skill gap
- Publicly available content
- The Analytics for a Better World Academy
- The Academy’s target audience
- How researchers can work with Analytics for a Better World
- Improving data maturity in nonprofit organizations
- People, processes, and technology
- Typical tools that Analytics for a Better World recommends to nonprofits
- Profiles in nonprofits
- Does Analytics for a Better World have a need for data engineers?
- The Analytics for a Better World team
- Factors that help organizations become more data-driven
- Parvathy’s resource recommendations
Links:
- LinkedIn: https://www.linkedin.com/in/parvathykrishnank/
- Twitter: https://twitter.com/ABWInstitute
- GitHub: https://github.com/Analytics-for-a-Better-World
- Website: https://analyticsbetterworld.org/
Accelerating the Adoption of AI through Diversity - Dânia Meira
We talked about:
- Dânia’s background
- Founding the AI Guild
- Datalift Summit
- Coming up with meetup topics
- Diversity in Berlin
- Other types of diversity besides gender
- The pitfalls of lacking diversity
- Creating an environment where people can safely share their experiences
- How the AI Guild helps organizations become more diverse
- How the AI Guild finds women in the fields of AI and data science
- Advice for people in underrepresented groups
- Organizing a welcoming environment and creating a code of conduct
- AI Guild’s consulting work and community
- AI Guild team
- Dânia’s resource recommendations
- Upcoming Datalift Summit
Links:
- Call for Speakers for the #datalift summit (Berlin, 14 to 16 June 2023): https://eu1.hubs.ly/H02RXvX0
- Coded Bias documentary on Netflix: https://www.netflix.com/de/title/81328723#:~:text=This%20documentary%20investigates%20the%20bias,flaws%20in%20facial%20recognition%20technology.
- Book Weapons of Math Destruction by Cathy O'Neil: https://en.wikipedia.org/wiki/Weapons_of_Math_Destruction
- Book Lean In by Sheryl Sandberg: https://en.wikipedia.org/wiki/Lean_In