The Data Pulse

By Anika Gupta
Dive into the growing role that data science plays in the latest biomedical innovations. I’m your host, Anika Gupta, a PhD student in Bioinformatics at Harvard and the Broad Institute. Join me for ~30 minutes each week as I go behind the scenes and check the pulse with domain experts and rising stars who are leading advances in data-driven human health. For a glossary of terms and resources my guests recommend, check out: bit.ly/datapulse-glossary
Where to listen: Apple Podcasts, Breaker, Google Podcasts, Overcast, RadioPublic, Spotify
Season 1 Wrap-up
Thank you for joining me for Season 1 of the Data Pulse podcast. I hope you found the conversations over the past 6 months to be both enlightening and enjoyable. With chaos dominating the world this year, the podcast has been a grounding force for me, tapping into the power that lies in using data science to effectively tackle some of the grandest challenges in biomedicine. I have thoroughly enjoyed diving into the minds of peers and mentors whom I deeply admire, building a resource for a shared vocabulary (through both the conversations and the podcast glossary), and growing a community of folks passionate about improving medicine through the lens of data. If you have a moment, check out the podcast website (linktr.ee/thedatapulse) to provide feedback on Season 1. I’d like to make it as relevant and useful for *you* as possible! The link also includes an application to join the Data Pulse team as publicity head, audio lead, or script lead. Definitely apply if you are interested! Finally, if you enjoyed this podcast, I’d deeply appreciate a rating on whichever platform you listen on. From the bottom of my heart, thank you for being a part of the Data Pulse community!
01:23
December 23, 2020
Design for inference and massively parallel single cell -omics with Aviv Regev (Broad Institute -> Genentech)
From single cells to international consortia and from striving despite fear to creating a "vector field" to inspire teams working in sync, Dr. Aviv Regev shares countless insights into how she has merged the worlds of computation and biomedicine, first at the Broad Institute and now at Genentech.
48:46
December 23, 2020
Diagnosing the undiagnosed and combining international health records with Isaac Kohane
Rare diseases present a unique challenge in both diagnosis and treatment, given the small number of cases, often leaving them undiagnosed. I talk with Professor Isaac Kohane of Harvard Medical School about the Undiagnosed Diseases Network's efforts to catalogue and diagnose rare diseases, focusing on the role that data science, specifically merging genotype and phenotype information, can play in bringing hope to the families of affected individuals. We also discuss efforts to aggregate EHR data from hospitals across the world in order to better predict COVID-related symptoms and outcomes, faster than most healthcare systems have been able to. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
31:02
December 22, 2020
Digital diagnostics + therapies for autism with Dennis Wall (Stanford)
Autism Spectrum Disorder remains a pressing yet elusive spectrum of conditions. In my conversation with Dennis Wall of Stanford University, we discuss the promise of technology- and augmented reality-based systems in both diagnosis and behavioral treatment for affected individuals. He shares the importance of understanding the context in which data is collected, as well as the ability of simple algorithms to yield actionable insights in the clinic. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
28:26
December 15, 2020
Single cell-based drug discovery with Greg Ryslik (Celsius Therapeutics)
In my conversation with Greg Ryslik, previously Chief Data Officer and now Special Adviser to Celsius Therapeutics, we talk about single cell sequencing technologies and the nuance they enable with respect to identifying and targeting the cell populations that are responsible for driving diseases, as well as the machine learning frameworks he employs when approaching problems with large-scale datasets. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
23:25
December 8, 2020
Clinical trials and longitudinal studies with Manisha Desai (Stanford University)
Working with humans poses significant challenges to acquiring robust and complete data, but also remarkable opportunity, as I learn in today's episode with Professor Manisha Desai of Stanford University. We discuss inferring causality from longitudinal data, clinical trial and observational study considerations, and the intersection of statistics and medicine at large. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
32:12
December 1, 2020
Biomedical data integration with Zainab Doctor (nference)
Natural language processing has long yielded exciting predictions from word-based knowledge. In my conversation with Zainab Doctor, Translational Science Head of nference, we chat about how we can now use text as a lens into the biomedical world, as well as the ability to integrate diverse data types to synthesize knowledge of all scales and yield clinically actionable insights. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
28:16
November 24, 2020
Biologics and active learning in ML with Peyton Greenside (BigHat Biosciences)
Can we create new biological therapies with machine-guided design? Today I chat with Peyton Greenside, Co-Founder and CSO of BigHat Biosciences, on using machine learning to design therapeutic proteins, the advantages of using "smart" data over "big" data, and the importance of interpretability. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
25:34
November 17, 2020
Protective mutations with Vyas Ramanan (Third Rock Ventures, Maze Therapeutics)
Today I discuss with Vyas Ramanan, of Third Rock Ventures and Maze Therapeutics, the evolution of the field's understanding of genetic modifiers and the role they play in disease, how drug discovery works when attempting to recapitulate the protective effects of certain mutations, and how the convergence of advances in statistics, genomic sequencing, and perturbation tools enables pressure testing of hypotheses at an unprecedented scale. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
28:17
November 10, 2020
Systematizing drug development with Ankit Gupta (Reverie Labs)
How should one think about building "the new wave" of biopharma teams? Turns out we can rely on principles from existing domains. In this episode, I talk with Ankit Gupta, CTO and Co-Founder of Reverie Labs, on systematization as a foundation, and on optimizing properties of a drug using machine learning and software-oriented frameworks. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
31:31
November 3, 2020
Prescription digital therapeutics with Corey McCann (Pear Therapeutics)
Prescription digital therapeutics are emerging as an entirely new therapeutic modality. Corey McCann, President and CEO of Pear Therapeutics, chats with me about what they are, how they evolve as more data are collected, and the subsequent changes that are made to the treatment paradigms of neurological and psychological disorders. We also discuss what it took to get FDA approval for "software as a therapy," what skills are essential to succeeding in the space, and the growing presence of remote and asynchronous care. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
26:49
October 27, 2020
Using social media to infer health with computational epidemiologist Elaine Nsoesie (Boston University)
What does an individual's digital presence reveal about imminent infectious disease outbreaks, obesity prevalence, and the spread of medical misinformation? How can Google Maps images reveal socioeconomic factors that contribute to disparities in health? When and how does community context matter? Elaine Nsoesie, Professor at Boston University School of Public Health, shares how she has led data-driven efforts to address these questions in the context of various diseases and communities around the world. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
28:45
October 20, 2020
Molecular diagnostics and classifiers with Ava Soleimany (Harvard/MIT)
Molecular diagnostics are emerging as a precise way to detect diseases early on in prognosis. In today's episode, I chat with Harvard+MIT PhD Student Ava Soleimany on the role of data science in activity-based molecular diagnostics for early cancer detection and how the confluence of classification techniques with feature representation of cancer biomarkers can enable non-invasive, early detection. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
29:12
October 13, 2020
Linking the environment and public health through data with Francesca Dominici (Harvard)
Climate change has been called perhaps the greatest public health crisis of our times, and Professor Francesca Dominici of Harvard University agrees. We discuss her data-driven efforts to demonstrate the negative health impacts of air pollutants, particularly for resource-poor communities, and how the current pandemic has further elucidated these disparities. She shares the steps necessary to move from academic science to changing policy, as well as her work in bridging the gender representation gap in STEM. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
30:55
October 6, 2020
Partially automated pathology diagnoses with Andy Beck (PathAI)
Deep learning has shown tremendous advances in making predictions from imaging data. Today I talk with Andy Beck, Co-Founder and CEO of PathAI, about deep learning specifically applied to pathology to both diagnose and lead to treatment options for patients, as well as the global implications that deploying such a technology could have. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
27:58
September 29, 2020
Precision medicine with Gaurav Singal (Foundation Medicine)
In today's conversation, Gaurav Singal, who is the outgoing Chief Data Officer of Foundation Medicine, discusses the advent of molecularly-driven oncology in changing the paradigm of cancer diagnosis and treatment through precision medicine, the importance of using observational data, and the unpredictable nature of one's career that may yield a convergence of what were once disparate fields. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
27:17
September 22, 2020
Machine-guided gene therapy design with Dyno Therapeutics
Gene therapies have emerged as a promising new modality for curing genetically-defined diseases; however, the naturally occurring variation remains limited. I chat with Sam Sinai and Jeff Gerold, Co-Founder/Lead ML Scientist and Head of Data Science, respectively, of Dyno Therapeutics about the role machine learning can play in better identifying and designing gene therapy vectors for a suite of traditional bottlenecks in the gene therapy development workflow, as well as the importance of building an environment where cross-talk between different domains is seamless. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
23:51
September 15, 2020
Radiology diagnostics & antibiotic development with Kyle Swanson (Marshall Scholar @ Cambridge University)
Deep learning can be applied to tasks that involve a breadth of data types. I talk with Kyle Swanson, currently a Marshall Scholar at Cambridge University and previously at MIT, about his projects with Regina Barzilay on predicting breast cancer from mammograms, with performance on par with radiologists, and on designing antibiotics in a high-throughput manner using a new class of neural networks, identifying a compound that traditional chemists may have overlooked. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
28:28
September 8, 2020
Bias in machine learning for healthcare with Marzyeh Ghassemi (University of Toronto)
Humans tend towards bias, but are our algorithms objective? Today I discuss fairness and bias in machine learning for healthcare with Professor Marzyeh Ghassemi of the University of Toronto. We delve into the ways in which bias pops up in the data that are used to train computational models, the particular dangers of systemic inequalities in healthcare being perpetuated by algorithms, and some of the steps needed to combat these deeply rooted issues. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
28:45
September 1, 2020
Deep clinical annotation & hybrid startups with Vineeta Agarwala (Andreessen Horowitz + Stanford)
Today I speak with Vineeta Agarwala, General Partner at Andreessen Horowitz and a physician at Stanford. She shares the importance of capturing time-course, evolving data on patients that range from the molecular to the clinical levels ("deep clinical annotation"), as well as key ingredients of a successful tech-biotech hybrid startup. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
28:26
August 25, 2020
Phenomics + computer vision for drug discovery with Imran Haque (Recursion Pharma)
Conventional drug discovery ignores the spatial relationships between cells when assessing the effects of drugs. I talk with Imran Haque, VP of Data Science at Recursion Pharmaceuticals, about their data-first approach that combines "cell painting" with deep learning to assess changes in cellular morphology, thus providing a crucial lens into the molecular wirings of a given drug's mechanism of action. We discuss his team's work in repurposing therapies to tackle COVID-19 and what it takes to successfully merge deep learning with cellular images in a way that yields therapies. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
27:37
August 18, 2020
Representative genomics and healthcare with Carlos Bustamante (Stanford, F-Prime Venture Partner)
Genomics has been a largely homogeneous field, both with respect to the researchers and the individuals whose data have been collected. Carlos Bustamante, Professor at Stanford University and F-Prime Venture Partner, whose work studies populations of diverse ancestry, argues that COVID has brought these disparities to light and that we have a unique opportunity to sequence the next generation of babies as one step towards more equitable precision medicine and drug development. He also discusses how the patient, as the healthcare consumer, will soon govern what solutions are adopted, including digital medicine and continuous tracking. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
28:50
August 11, 2020
Deep learning-based imaging diagnostics with Lily Peng (Google Health)
In 2016, a team at Google Health led by Product Manager Lily Peng accurately predicted diabetic retinopathy in patients solely from images of their eyes. Today I chat with Lily about this work in diagnosing diabetic retinopathy by tapping into advances in deep learning, what factors determine whether a medical problem is well-suited to machine learning tasks, and the iterative process to achieve model performance in the context of a clinical environment. We also discuss tactics to effectively deploy such technologies in resource-poor regions in a sustainable way. Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
26:32
August 4, 2020
Building a bilingual culture for data-first drug discovery with Daphne Koller (insitro)
What does a hybrid team working at the interface of machine learning and biomedicine look like? In this episode, I chat with Daphne Koller, Founder and CEO of insitro, about a data-first approach to drug discovery, building the systems that enable large-scale learning, and the importance of a bilingual culture in "digital biology". Check out the glossary of terms, definitions, and resources (and get a sneak peek of the future conversations lined up!) here: bit.ly/datapulse-glossary
31:57
July 28, 2020
Introducing The Data Pulse, featuring Anthony Philippakis (GV + Broad)
Welcome to The Data Pulse, a podcast that explores the growing role that data science and computation play in the latest biotechnology and biomedical innovations. Join me as I speak with pioneers in the space, to learn more about how they use data-driven tactics to advance biology and medicine, to understand the principles that guide their work, and to better equip yourself with the language and thinking essential to effectively contributing to this space. I’m your host, Anika Gupta, a PhD student in Bioinformatics at Harvard and the Broad Institute. To introduce the podcast, I'm joined by Anthony Philippakis, Chief Data Officer of the Broad Institute and a Venture Partner at GV (formerly Google Ventures). We discuss his unique journey and roles, why this intersection is promising, and what the key ingredients for innovation in this space are. To bring you each episode, I’ve been fortunate to tap into the wisdom of many of my mentors and peers who are trailblazing at this convergence. In addition to my conversations, I’ve created a glossary of terms, definitions, and links to references (bit.ly/datapulse-glossary) that my guests mention.
15:19
July 26, 2020