DataTalks.Club

By DataTalks.Club

DataTalks.Club - the place to talk about data!

From Hackathons to Developer Advocacy - Will Russell

In this podcast episode, we talked with Will Russell about From Hackathons to Developer Advocacy.

About the Speaker:

Will Russell is a Developer Advocate at Kestra, known for his videos on workflow orchestration. Previously, Will built open source education programs to help up-and-coming developers make their first contributions in open source. With a passion for developer education, Will creates technical video content and documentation that makes technologies more approachable for developers.

In this episode, we sit down with Will—developer advocate, content creator, and passionate community builder. We’ll hear about his unique path through tech, the lessons he’s learned, and his approach to making complex topics accessible and engaging. Whether you’re curious about open source, hackathons, or what it’s like to bridge the gap between developers and the broader tech community, this conversation is full of insights and inspiration.

🕒 TIMECODES

0:00 Introduction, career journeys, and video setup and workflow

10:41 From hackathons to open source: Early experiences and learning

16:04 Becoming a hackathon organizer and the value of soft skills

23:18 How to organize a hackathon, memorable projects, and creativity

33:39 Major League Hacking: Building community and scaling student programs

41:16 Mentorship, development environments, and onboarding in open source

49:14 Developer advocacy, content strategy, and video tips

57:16 Will’s current projects and future plans for content creation

🔗 CONNECT WITH DataTalksClub

Join the community - https://datatalks.club/slack.html

Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

Check other upcoming events - https://lu.ma/dtc-events

LinkedIn - https://www.linkedin.com/company/datatalks-club/

Twitter - https://twitter.com/DataTalksClub

Website - https://datatalks.club/

🔗 CONNECT WITH WILL

LinkedIn - https://www.linkedin.com/in/wrussell1999/

Twitter - https://x.com/wrussell1999

GitHub - https://github.com/wrussell1999

Website - https://wrussell.co.uk/

May 26, 2025 · 57:11

Build a Strong Career in Data - Lavanya Gupta

In this podcast episode, we talked with Lavanya Gupta about Building a Strong Career in Data.

About the Speaker:

Lavanya is a Carnegie Mellon University (CMU) alumna of the Language Technologies Institute (LTI). She works as a Sr. AI/ML Applied Associate at JPMorgan Chase in their specialized Machine Learning Center of Excellence (MLCOE) vertical. Her latest research on long-context evaluation of LLMs was published in EMNLP 2024.

In addition to having a strong industrial research background of 5+ years, she is also an enthusiastic technical speaker. She has delivered talks at events such as Women in Data Science (WiDS) 2021, PyData, Illuminate AI 2021, TensorFlow User Group (TFUG), and MindHack! Summit. She also serves as a reviewer at top-tier ML and NLP conferences (NeurIPS 2024, ICLR 2025, NAACL 2025). Additionally, through her collaborations with various prestigious organizations, like AnitaB.org and Women in Coding and Data Science (WiCDS), she is committed to mentoring aspiring machine learning enthusiasts.

In this episode, we talk about Lavanya Gupta’s journey from software engineer to AI researcher. She shares how hackathons sparked her passion for machine learning, her transition into NLP, and her current work benchmarking large language models in finance. Tune in for practical insights on building a strong data career and navigating the evolving AI landscape.

🕒 TIMECODES

00:00 Lavanya’s journey from software engineer to AI researcher

10:15 Benchmarking long context language models

12:36 Limitations of large context models in real domains

14:54 Handling large documents and publishing research in industry

19:45 Building a data science career: publications, motivation, and mentorship

25:01 Self-learning, hackathons, and networking

33:24 Community work and Kaggle projects

37:32 Mentorship and open-ended guidance

51:28 Building a strong data science portfolio

🔗 CONNECT WITH LAVANYA

LinkedIn - / lgupta18

🔗 CONNECT WITH DataTalksClub

Join the community - https://datatalks.club/slack.html

Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/...

Check other upcoming events - https://lu.ma/dtc-events

LinkedIn - / datatalks-club

Twitter - / datatalksclub

Website - https://datatalks.club/

May 09, 2025 · 52:00

From Supply Chain Management to Digital Warehousing and FinOps - Eddy Zulkifly

In this podcast episode, we talked with Eddy Zulkifly about From Supply Chain Management to Digital Warehousing and FinOps.

About the Speaker:

Eddy Zulkifly is a Staff Data Engineer at Kinaxis, building robust data platforms across Google Cloud, Azure, and AWS. With a decade of experience in data, he actively shares his expertise as a Mentor on ADPList and Teaching Assistant at Uplimit. Previously, he was a Senior Data Engineer at Home Depot, specializing in e-commerce and supply chain analytics. Currently pursuing a Master’s in Analytics at the Georgia Institute of Technology, Eddy is also passionate about open-source data projects and enjoys watching/exploring the analytics behind the Fantasy Premier League.

In this episode, we dive into the world of data engineering and FinOps with Eddy Zulkifly, Staff Data Engineer at Kinaxis. Eddy shares his unconventional career journey—from optimizing physical warehouses with Excel to building digital data platforms in the cloud.

🕒 TIMECODES

0:00 Eddy’s career journey: From supply chain to data engineering

8:18 Tools & learning: Excel, Docker, and transitioning to data engineering

21:57 Physical vs. digital warehousing: Analogies and key differences

31:40 Introduction to FinOps: Cloud cost optimization and vendor negotiations

40:18 Resources for FinOps: Certifications and the FinOps Foundation

45:12 Standardizing cloud cost reporting across AWS/GCP/Azure

50:04 Eddy’s master’s degree and closing thoughts

🔗 CONNECT WITH EDDY

Twitter - https://x.com/eddarief

LinkedIn - https://www.linkedin.com/in/eddyzulkifly/

GitHub - https://github.com/eyzyly/eyzyly

ADPList: https://adplist.org/mentors/eddy-zulkifly

🔗 CONNECT WITH DataTalksClub

Join the community - https://datatalks.club/slack.html

Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

Check other upcoming events - https://lu.ma/dtc-events

LinkedIn - https://www.linkedin.com/company/datatalks-club/

Twitter - https://twitter.com/DataTalksClub

Website - https://datatalks.club/

Apr 04, 2025 · 52:08

Data Intensive AI - Bartosz Mikulski

In this podcast episode, we talked with Bartosz Mikulski about Data Intensive AI.

About the Speaker:

Bartosz is an AI and data engineer. He specializes in moving AI projects from the good-enough-for-a-demo phase to production by building a testing infrastructure and fixing the issues detected by tests. On top of that, he teaches programmers and non-programmers how to use AI. He contributed one chapter to the book 97 Things Every Data Engineer Should Know, and he was a speaker at several conferences, including Data Natives, Berlin Buzzwords, and Global AI Developer Days.

In this episode, we discuss Bartosz’s career journey, the importance of testing in data pipelines, and how AI tools like ChatGPT and Cursor are transforming development workflows. From prompt engineering to building Chrome extensions with AI, we dive into practical use cases, tools, and insights for anyone working in data-intensive AI projects. Whether you’re a data engineer, AI enthusiast, or just curious about the future of AI in tech, this episode offers valuable takeaways and real-world experiences.

0:00 Introduction to Bartosz and his background

4:00 Bartosz’s career journey from Java development to AI engineering

9:05 The importance of testing in data engineering

11:19 How to create tests for data pipelines

13:14 Tools and approaches for testing data pipelines

17:10 Choosing Spark for data engineering projects

19:05 The connection between data engineering and AI tools

21:39 Use cases of AI in data engineering and MLOps

25:13 Prompt engineering techniques and best practices

31:45 Prompt compression and caching in AI models

33:35 Thoughts on DeepSeek and open-source AI models

35:54 Using AI for lead classification and LinkedIn automation

41:04 Building Chrome extensions with AI integration

43:51 Comparing Cursor and GitHub Copilot for coding

47:11 Using ChatGPT and Perplexity for AI-assisted tasks

52:09 Hosting static websites and using AI for development

54:27 How blogging helps attract clients and share knowledge

58:15 Using AI to assist with writing and content creation

🔗 CONNECT WITH Bartosz

LinkedIn: https://www.linkedin.com/in/mikulskibartosz/

GitHub: https://github.com/mikulskibartosz

Website: https://mikulskibartosz.name/blog/

🔗 CONNECT WITH DataTalksClub

Join the community - https://datatalks.club/slack.html

Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

Check other upcoming events - https://lu.ma/dtc-events

LinkedIn - https://www.linkedin.com/company/datatalks-club/

Twitter - https://twitter.com/DataTalksClub

Website - https://datatalks.club/

Mar 21, 2025 · 54:55

MLOps in Corporations and Startups - Nemanja Radojkovic

In this podcast episode, we talked with Nemanja Radojkovic about MLOps in Corporations and Startups.

About the Speaker:

Nemanja Radojkovic is Senior Machine Learning Engineer at Euroclear.

In this event, we’re diving into the world of MLOps, comparing life in startups versus big corporations. Joining us again is Nemanja, a seasoned machine learning engineer with experience spanning Fortune 500 companies and agile startups. We explore the challenges of scaling MLOps on a shoestring budget, the trade-offs between corporate stability and startup agility, and practical advice for engineers deciding between these two career paths, whether you’re navigating legacy frameworks or experimenting with cutting-edge tools.

1:00 MLOps in corporations versus startups

6:03 The agility and pace of startups

7:54 MLOps on a shoestring budget

12:54 Cloud solutions for startups

15:06 Challenges of cloud complexity versus on-premise

19:19 Selecting tools and avoiding vendor lock-in

22:22 Choosing between a startup and a corporation

27:30 Flexibility and risks in startups

29:37 Bureaucracy and processes in corporations

33:17 The role of frameworks in corporations

34:32 Advantages of large teams in corporations

40:01 Challenges of technical debt in startups

43:12 Career advice for junior data scientists

44:10 Tools and frameworks for MLOps projects

49:00 Balancing new and old technologies in skill development

55:43 Data engineering challenges and reliability in LLMs

57:09 On-premise vs. cloud solutions in data-sensitive industries

59:29 Alternatives like Dask for distributed systems

🔗 CONNECT WITH NEMANJA

LinkedIn - / radojkovic

GitHub - https://github.com/baskervilski

🔗 CONNECT WITH DataTalksClub

Join the community - https://datatalks.club/slack.html

Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/...

Check other upcoming events - https://lu.ma/dtc-events

LinkedIn - / datatalks-club

Twitter - / datatalksclub

Website - https://datatalks.club/

Mar 14, 2025 · 58:04

Trends in Data Engineering – Adrian Brudaru

In this podcast episode, we talked with Adrian Brudaru about the past, present and future of data engineering.

About the speaker:

Adrian Brudaru studied economics in Romania but soon got bored with how creative the industry was, and chose to go instead for the more factual side. He ended up in Berlin at the age of 25 and started a role as a business analyst. At the age of 30, he had enough of startups and decided to join a corporation, but quickly found out that it did not provide the challenge he wanted.

As going back to startups was not a desirable option either, he decided to postpone his decision by taking freelance work and has never looked back since. Five years later, he co-founded a company in the data space to try new things. This company is also looking to release open source tools to help democratize data engineering.

0:00 Introduction to DataTalks.Club

1:05 Discussing trends in data engineering with Adrian

2:03 Adrian's background and journey into data engineering

5:04 Growth and updates on Adrian's company, dltHub

9:05 Challenges and specialization in data engineering today

13:00 Opportunities for data engineers entering the field

15:00 The "Modern Data Stack" and its evolution

17:25 Emerging trends: AI integration and Iceberg technology

27:40 DuckDB and the emergence of portable, cost-effective data stacks

32:14 The rise and impact of dbt in data engineering

34:08 Alternatives to dbt: SQLMesh and others

35:25 Workflow orchestration tools: Airflow, Dagster, Prefect, and GitHub Actions

37:20 Audience questions: Career focus in data roles and AI engineering overlaps

39:00 The role of semantics in data and AI workflows

41:11 Focusing on learning concepts over tools when entering the field

45:15 Transitioning from backend to data engineering: challenges and opportunities

47:48 Current state of the data engineering job market in Europe and beyond

49:05 Introduction to Apache Iceberg, Delta, and Hudi file formats

50:40 Suitability of these formats for batch and streaming workloads

52:29 Tools for streaming: Kafka, SQS, and related trends

58:07 Building AI agents and enabling intelligent data applications

59:09 Closing discussion on the place of tools like dbt in the ecosystem

🔗 CONNECT WITH ADRIAN BRUDARU

LinkedIn - / data-team

Website - https://adrian.brudaru.com/

🔗 CONNECT WITH DataTalksClub

Join the community - https://datatalks.club/slack.html

Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/...

Check other upcoming events - https://lu.ma/dtc-events

LinkedIn - /datatalks-club

Twitter - /datatalksclub

Website - https://datatalks.club/

Mar 07, 2025 · 56:59

Competitive Machine Learning And Teaching – Alexander Guschin

In this podcast episode, we talked with Alexander Guschin about launching a career off Kaggle.

About the Speaker:

Alexander Guschin is a Machine Learning Engineer with 10+ years of experience, a Kaggle Grandmaster ranked 5th globally, and a teacher to 100K+ students. He leads DS and SE teams and contributes to open-source ML tools.

0:00 Starting with Machine Learning: Challenges and Early Steps

13:05 Community and Learning Through Kaggle Sessions

17:10 Broadening Skills Through Kaggle Participation

18:54 Early Competitions and Lessons Learned

21:10 Transitioning to Simpler Solutions Over Time

23:51 Benefits of Kaggle for Starting a Career in Machine Learning

29:08 Teamwork vs. Solo Participation in Competitions

31:14 Schoolchildren in AI Competitions

42:33 Transition to Industry and MLOps

50:13 Encouraging teamwork in student projects

50:48 Designing competitive machine learning tasks

52:22 Leaderboard types for tracking performance

53:44 Managing small-scale university classes

54:17 Experience with Coursera and online teaching

59:40 Convincing managers about Kaggle's value

61:38 Secrets of Kaggle competition success

63:11 Generative AI's impact on competitive ML

65:13 Evolution of automated ML solutions

66:22 Reflecting on competitive data science experience

🔗 CONNECT WITH ALEXANDER GUSCHIN

LinkedIn - https://www.linkedin.com/in/1aguschin/

Website - https://www.aguschin.com/

🔗 CONNECT WITH DataTalksClub

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Datalike Substack - https://datalike.substack.com/

LinkedIn: / datatalks-club

Feb 14, 2025 · 53:27

Redefining AI Infrastructure: Open-Source, Chips, and the Future Beyond Kubernetes – Andrey Cheptsov

In this podcast episode, we talked with Andrey Cheptsov about The future of AI infrastructure.

About the Speaker:

Andrey Cheptsov is the founder and CEO of dstack, an open-source alternative to Kubernetes and Slurm, built to simplify the orchestration of AI infrastructure. Before dstack, Andrey worked at JetBrains for over a decade helping different teams make the best developer tools.

In this episode, Andrey Cheptsov, founder and CEO of dstack, discusses the complexities of AI infrastructure. We explore topics like the challenges of using Kubernetes for AI workloads, the need to rethink container orchestration, and the future of hybrid and cloud-only infrastructures. Andrey also shares insights into the role of on-premise and bare-metal solutions, edge computing, and federated learning.

00:00 Andrey's Career Journey: From JetBrains to dstack

5:00 The Motivation Behind dstack

7:00 Challenges in Machine Learning Infrastructure

10:00 Transitioning from Cloud to On-Prem Solutions

14:30 Reflections on OpenAI's Evolution

17:30 Open Source vs Proprietary Models: A Balanced Perspective

21:01 Monolithic vs. Decentralized AI businesses

22:05 The role of privacy and control in AI for industries like banking and healthcare

30:00 Challenges in training large AI models: GPUs and distributed systems

37:03 DeepSpeed's efficient training approach vs. brute force methods

39:00 Challenges for small and medium businesses: hosting and fine-tuning models

47:01 Managing Kubernetes challenges for AI teams

52:00 Hybrid vs. cloud-only infrastructure

56:03 On-premise vs. bare-metal solutions

58:05 Exploring edge computing and its challenges

🔗 CONNECT WITH ANDREY CHEPTSOV

Twitter - / andrey_cheptsov

LinkedIn - / andrey-cheptsov

GitHub - https://github.com/dstackai/dstack/

Website - https://dstack.ai/

🔗 CONNECT WITH DataTalksClub

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Datalike Substack - https://datalike.substack.com/

LinkedIn: / datatalks-club

Jan 31, 2025 · 56:55

Linguistics and Fairness - Tamara Atanasoska

In this podcast episode, we talked with Tamara Atanasoska about building fair AI systems.

About the Speaker:

Tamara works on ML explainability, interpretability and fairness as an Open Source Software Engineer at Probabl. She is a maintainer of fairlearn and a contributor to scikit-learn and skops. Tamara has a background in both computer science/software engineering and computational linguistics (NLP).

During the event, the guest discussed their career journey from software engineering to open-source contributions, focusing on explainability in AI through Scikit-learn and Fairlearn. They explored fairness in AI, including challenges in credit loans, hiring, and decision-making, and emphasized the importance of tools, human judgment, and collaboration. The guest also shared their involvement with PyLadies and encouraged contributions to Fairlearn.

00:00 Introduction to the event and the community

01:51 Topic introduction: Linguistic fairness and socio-technical perspectives in AI

02:37 Guest introduction: Tamara’s background and career

03:18 Tamara’s career journey: Software engineering, music tech, and computational linguistics

09:53 Tamara’s background in language and computer science

14:52 Exploring fairness in AI and its impact on society

21:20 Fairness in AI models

26:21 Automating fairness analysis in models

32:32 Balancing technical and domain expertise in decision-making

37:13 The role of humans in the loop for fairness

40:02 Joining Probabl and working on open-source projects

46:20 The skops library and its integration with Hugging Face

50:48 PyLadies and community involvement

55:41 The ethos of Scikit-learn and Fairlearn

🔗 CONNECT WITH TAMARA ATANASOSKA

LinkedIn - https://www.linkedin.com/in/tamaraatanasoska

GitHub - https://github.com/TamaraAtanasoska

🔗 CONNECT WITH DataTalksClub

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Datalike Substack - https://datalike.substack.com/

LinkedIn: / datatalks-club

Jan 17, 2025 · 53:12

Career choices, transitions and promotions in and out of tech - Agita Jaunzeme

In this podcast episode, we talked with Agita Jaunzeme about Career choices, transitions and promotions in and out of tech.

About the Speaker:

Agita has designed a career spanning DevOps/DataOps engineering, management, community building, education, and facilitation. She has worked on projects across corporate, startup, open source, and non-governmental sectors. Following her passion, she founded an NGO focusing on the inclusion of expats and locals in Porto. Embodying the values of innovation, automation, and continuous learning, Agita provides practical insights on promotions, career pivots, and aligning work with passion and purpose.

During this event, the guest discussed their career journey, starting with their transition from art school to programming and later into DevOps, eventually taking on leadership roles. They explored the challenges of burnout and the importance of volunteering, founding an NGO to support inclusion, gender equality, and sustainability. The conversation also covered key topics like mentorship, the differences between data engineering and data science, and the dynamics of managing volunteers versus employees. Additionally, the guest shared insights on community management, developer relations, and the importance of product vision and team collaboration.

0:00 Introduction and Welcome

1:28 Guest Introduction: Agita’s Background and Career Highlights

3:05 Transition to Tech: From Art School to Programming

5:40 Exploring DevOps and Growing into Leadership Roles

7:24 Burnout, Volunteering, and Founding an NGO

11:00 Volunteering and Mentorship Initiatives

14:00 Discovering Programming Skills and Early Career Challenges

15:50 Automating Work Processes and Earning a Promotion

19:00 Transitioning from DevOps to Volunteering and Project Management

24:00 Managing Volunteers vs. Employees and Building Organizational Skills

31:07 Personality traits in engineering vs. data roles

33:14 Differences in focus between data engineers and data scientists

36:24 Transitioning from volunteering to corporate work

37:38 The role and responsibilities of a community manager

39:06 Community management vs. developer relations activities

41:01 Product vision and team collaboration

43:35 Starting an NGO and legal processes

46:13 NGO goals: inclusion, gender equality, and sustainability

49:02 Community meetups and activities

51:57 Living off-grid in a forest and sustainability

55:02 Unemployment party and brainstorming session

59:03 Unemployment party: the process and structure

🔗 CONNECT WITH AGITA JAUNZEME

LinkedIn - /agita

🔗 CONNECT WITH DataTalksClub

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Datalike Substack - https://datalike.substack.com/

LinkedIn: / datatalks-club

Jan 10, 2025 · 55:21

Career advice, learning, and featuring women in ML and AI - Isabella Bicalho

In this podcast episode, we talked with Isabella Bicalho about Career advice, learning, and featuring women in ML and AI.

About the Speaker:

Isabella is a Machine Learning Engineer and Data Scientist with three years of hands-on AI development experience. She draws upon her early computational research expertise to develop ML solutions. While contributing to open-source projects, she runs a newsletter dedicated to showcasing women's accomplishments in data science.

During this event, the guest discussed her transition into machine learning, her freelance work in AI, and the growing AI scene in France. She shared insights on freelancing versus full-time work, the value of open-source contributions, and developing both technical and soft skills. The conversation also covered career advice, mentorship, and her Substack series on women in data science, emphasizing leadership, motivation, and career opportunities in tech.

0:00 Introduction

1:23 Background of Isabella Bicalho

2:02 Transition to machine learning

4:03 Study and work experience

5:00 Living in France and language learning

6:03 Internship experience

8:45 Focus areas of Inria

9:37 AI development in France

10:37 Current freelance work

11:03 Freelancing in machine learning

13:31 Moving from research to freelancing

14:03 Freelance vs. full-time data science

17:00 Finding first freelance client

18:00 Involvement in open-source projects

20:17 Passion for open-source and teamwork

23:52 Starting new projects

25:03 Community project experience

26:02 Teaching and learning

29:04 Contributing to open-source projects

32:05 Open-source tools vs. projects

33:32 Importance of community-driven projects

34:03 Learning resources

36:07 Green space segmentation project

39:02 Developing technical and soft skills

40:31 Gaining insights from industry experts

41:15 Understanding data science roles

41:31 Project challenges and team dynamics

42:05 Turnover in open-source projects

43:05 Managing expectations in open-source work

44:50 Mentorship in projects

46:17 Role of AI tools in learning

47:59 Overcoming learning challenges

48:52 Discussion on Substack

49:01 Interview series on women in data

50:15 Insights from women in data science

51:20 Impactful stories from Substack

53:01 Leadership challenges in projects

54:19 Career advice and opportunities

56:07 Motivating others to step out of comfort zone

57:06 Contacting for Substack story sharing

58:00 Closing remarks and connections

🔗 CONNECT WITH ISABELLA BICALHO

GitHub: https://github.com/bellabf

LinkedIn: / isabella-frazeto

🔗 CONNECT WITH DataTalksClub

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Datalike Substack - https://datalike.substack.com/

LinkedIn: / datatalks-club

Dec 13, 2024 · 54:41

AI in Industry: Trust, Return on Investment and Future - Maria Sukhareva

Reflection on an Almost Two-Year Journey of Generative AI in Industry – Maria Sukhareva

About the speaker:

Maria Sukhareva is a principal key expert in Artificial Intelligence at Siemens with over 15 years of experience at the forefront of generative AI technologies. Known for her keen eye for technological innovation, Maria excels at transforming cutting-edge AI research into practical, value-driven tools that address real-world needs. Her approach is both hands-on and results-focused, with a commitment to creating scalable, long-term solutions that improve communication, streamline complex processes, and empower smarter decision-making. Maria's work reflects a balanced vision, where the power of innovation is met with ethical responsibility, ensuring that her AI projects deliver impactful and production-ready outcomes.

We talked about:

00:00 DataTalks.Club intro

02:13 Career journey: From linguistics to AI

08:02 The Evolution of AI Expertise and its Future

13:10 AI vulnerabilities: Bypassing bot restrictions

17:00 Non-LLM classifiers as a more robust solution

22:56 Risks of chatbot deployment: Reputational and financial

27:13 The role of AI as a tool, not a replacement for human workers

31:41 The role of human translators in the age of AI

34:49 Evolution of English and its Germanic roots

38:44 Beowulf and Old English

39:43 Impact of the Norman occupation on English grammar

42:34 Identifying mushrooms with AI apps and safety precautions

45:08 Decoding ancient languages like Sumerian

49:43 The evolution of machine translation and multilingual models

53:01 Challenges with low-resource languages and inconsistent orthography

57:28 Transition from academia to industry in AI

Join our Slack: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Dec 06, 2024 · 52:59

Large Hadron Collider and Mentorship – Anastasia Karavdina

We talked about:

00:00 DataTalks.Club intro

00:00 Large Hadron Collider and Mentorship

02:35 Career overview and transition from physics to data science

07:02 Working at the Large Hadron Collider

09:19 How particles collide and the role of detectors

11:03 Data analysis challenges in particle physics and data science similarities

13:32 Team structure at the Large Hadron Collider

20:05 Explaining the connection between particle physics and data science

23:21 Software engineering practices in particle physics

26:11 Challenges during interviews for data science roles

29:30 Mentoring and offering advice to job seekers

40:03 The STAR method and its value in interviews

50:32 Paid vs unpaid mentorship and finding the right fit

About the speaker:

Anastasia is a particle physicist turned data scientist, with experience in large-scale experiments like those at the Large Hadron Collider. She also worked at Blue Yonder, scaling AI-driven solutions for global supply chain giants, and at Kaufland e-commerce, focusing on NLP and search. Anastasia is a mentor for ML/AI, dedicated to helping her mentees achieve their goals. She is passionate about growing the next generation of data science elite in Germany: from Data Analysts up to ML Engineers.

Join our Slack: https://datatalks.club/slack.html

Nov 22, 2024 · 54:14

MLOps as a Team - Raphaël Hoogvliets

We talked about:

00:00 DataTalks.Club intro

02:34 Career journey and transition into MLOps

08:41 Dutch agriculture and its challenges

10:36 The concept of "technical debt" in MLOps

13:37 Trade-offs in MLOps: moving fast vs. doing things right

14:05 Building teams and the role of coordination in MLOps

16:58 Key roles in an MLOps team: evangelists and tech translators

23:01 Role of the MLOps team in an organization

25:19 How MLOps teams assist product teams

27:56 Standardizing practices in MLOps

32:46 Getting feedback and creating buy-in from data scientists

36:55 The importance of addressing pain points in MLOps

39:06 Best practices and tools for standardizing MLOps processes

42:31 Value of data versioning and reproducibility

44:22 When to start thinking about data versioning

45:10 Importance of data science experience for MLOps

46:06 Skill mix needed in MLOps teams

47:33 Building a diverse MLOps team

48:18 Best practices for implementing MLOps in new teams

49:52 Starting with CI/CD in MLOps

51:21 Key components for a complete MLOps setup

53:08 Role of package registries in MLOps

54:12 Using Docker vs. packages in MLOps

57:56 Examples of MLOps success and failure stories

1:00:54 What MLOps is in simple terms

1:01:58 The complexity of achieving easy deployment, monitoring, and maintenance

Join our Slack: https://datatalks.club/slack.html

Nov 08, 2024 · 55:36

Using Data to Create Liveable Cities - Rachel Lim

We talked about:

00:00 DataTalks.Club intro

01:56 Using data to create livable cities

02:52 Rachel's career journey: from geography to urban data science

04:20 What does a transport scientist do?

05:34 Short-term and long-term transportation planning

06:14 Data sources for transportation planning in Singapore

08:38 Rachel's motivation for combining geography and data science

10:19 Urban design and its connection to geography

13:12 Defining a livable city

15:30 Livability of Singapore and urban planning

18:24 Role of data science in urban and transportation planning

20:31 Predicting travel patterns for future transportation needs

22:02 Data collection and processing in transportation systems

24:02 Use of real-time data for traffic management

27:06 Incorporating generative AI into data engineering

30:09 Data analysis for transportation policies

33:19 Technologies used in text-to-SQL projects

36:12 Handling large datasets and transportation data in Singapore

42:17 Generative AI applications beyond text-to-SQL

45:26 Publishing public data and maintaining privacy

45:52 Recommended datasets and projects for data engineering beginners

49:16 Recommended resources for learning urban data science

About the speaker:

Rachel is an urban data scientist dedicated to creating liveable cities through the innovative use of data. With a background in geography and a master's in urban data science, she blends qualitative and quantitative analysis to tackle urban challenges. Her aim is to integrate data-driven techniques with urban design to foster sustainable and equitable urban environments.

Links:

- https://datamall.lta.gov.sg/content/datamall/en/dynamic-data.html

Join our Slack: https://datatalks.club/slack.html

Nov 01, 2024 · 45:36

DataTalks.Club 4th Anniversary AMA Podcast – Alexey Grigorev and Johanna Bayer

We talked about:

00:00 DataTalks.Club intro

00:00 DataTalks.Club anniversary "Ask Me Anything" event with Alexey Grigorev

02:29 The founding of DataTalks.Club

03:52 Alexey's transition from Java work to DataTalks.Club

04:58 Growth and success of DataTalks.Club courses

12:04 Motivation behind creating a free-to-learn community

24:03 Staying updated in data science through pet projects

26:37 Hosting a second podcast and maintaining programming skills

28:56 Skepticism about LLMs and their relevance

31:53 Transitioning to DataTalks.Club and personal reflections

33:32 Memorable moments and the first event's success

36:19 Community building during the pandemic

38:31 AI's impact on data analysts and future roles

42:24 Discussion on AI in healthcare

44:37 Age and reflections on personal milestones

47:54 Building communities and personal connections

49:34 Future goals for the community and courses

51:18 Community involvement and engagement strategies

53:46 Ideas for competitions and hackathons

54:20 Inviting guests to the podcast

55:29 Course updates and future workshops

56:27 Podcast preparation and research process

58:30 Career opportunities in data science and transitioning fields

1:01:10 Book recommendations and personal reading experiences

About the speaker:

Alexey Grigorev is the founder of DataTalks.Club.

Join our Slack: https://datatalks.club/slack.html

Oct 26, 2024 · 53:41

Human-Centered AI for Disordered Speech Recognition - Katarzyna Foremniak

We talked about:

00:00 DataTalks.Club intro

08:06 Background and career journey of Katarzyna

09:06 Transition from linguistics to computational linguistics

11:38 Merging linguistics and computer science

15:25 Understanding phonetics and morpho-syntax

17:28 Exploring morpho-syntax and its relation to grammar

20:33 Connection between phonetics and speech disorders

24:41 Improvement of voice recognition systems

27:31 Overview of speech recognition technology

30:24 Challenges of ASR systems with atypical speech

30:53 Strategies for improving recognition of disordered speech

37:07 Data augmentation for training models

40:17 Transfer learning in speech recognition

42:18 Challenges of collecting data for various speech disorders

44:31 Stammering and its connection to fluency issues

45:16 Polish consonant combinations and pronunciation challenges

46:17 Use of Amazon Transcribe for generating podcast transcripts

47:28 Role of language models in speech recognition

49:19 Contextual understanding in speech recognition

51:27 How voice recognition systems analyze utterances

54:05 Personalization of ASR models for individuals

56:25 Language disorders and their impact on communication

58:00 Applications of speech recognition technology

1:00:34 Challenges of personalized and universal models

1:01:23 Voice recognition in automotive applications

1:03:27 Humorous voice recognition failures in cars

1:04:13 Closing remarks and reflections on the discussion

About the speaker:

Katarzyna is a computational linguist with over 10 years of experience in NLP and speech recognition. She has developed language models for automotive brands like Audi and Porsche and specializes in phonetics, morpho-syntax, and sentiment analysis.

Kasia also teaches at the University of Warsaw and is passionate about human-centered AI and multilingual NLP.

Join our Slack: https://datatalks.club/slack.html

Oct 10, 2024 · 48:01