Diaries of Social Data Research

By Katherine A. Keith, Naitian Zhou, & Lucy Li

Large-scale data has become a major component of research about human behavior and society. But how are interdisciplinary collaborations that use large-scale social data formed and maintained? What obstacles are encountered on the journey from idea conception to publication? In this podcast, we investigate these questions by probing the “research diaries” of scholars in computational social science and adjacent fields. We unmask the research process in the hope of normalizing its challenges and making academia more accessible.
Music: Jon Gillick.
Available on Apple Podcasts, Google Podcasts, Pocket Casts, RadioPublic, and Spotify.
20. Navigating the Shores of Computational Text Analysis Validity with Christian Baden, Christian Pipal, and Mariken van der Velden

In this episode, we speak to Christian Baden, Christian Pipal, and Mariken van der Velden about their 2022 paper in Communication Methods and Measures, “Three Gaps in Computational Text Analysis Methods for Social Sciences: A Research Agenda.” They co-authored this paper with Martijn Schoonvelde, and the authors span several disciplines, from communication to political science.

We discuss the challenges and joys of writing for a cross-disciplinary audience, how their frustrations with the validity of computational methods are shared across fields with different methodological conventions, and how this paper laid the groundwork for a larger project on European political text analysis.

Oct 30, 2023 · 57:11
19. Constructing a Taxonomy of Implicit Hate Speech Grounded in Social Theory with Diyi Yang and David Muchlinski

Our guests on this episode are Diyi Yang, assistant professor at the School of Interactive Computing, and David Muchlinski, assistant professor in the Sam Nunn School of International Affairs, both at Georgia Tech. We discuss their EMNLP 2021 paper, "Latent Hatred: A Benchmark for Understanding Implicit Hate Speech." This paper is co-authored with Mai ElSherief, Caleb Ziems, Vaishnavi Anupindi, Jordyn Seybolt, and Munmun De Choudhury.

Diyi and David reveal that the annotation process behind this paper took two years and incorporated domain expertise on the broader context around hateful language. That is, an understanding of the social groups who produce this language allowed for better categorization and interpretation of implicit hate. We also discuss the cross-disciplinary connections they have forged, past and present, and the ongoing challenges this type of work poses for computational methods.

Jul 09, 2022 · 56:03
18. Gender Patterns in English-Language Fiction and Interrogating Data with Ted Underwood and David Bamman

This episode features Ted Underwood, a professor in the School of Information Sciences and Department of English at the University of Illinois Urbana-Champaign, and David Bamman, an associate professor at UC Berkeley’s School of Information. We discuss their 2018 Cultural Analytics paper co-authored with literary studies PhD student Sabrina Lee, titled “The Transformation of Gender in English-Language Fiction.”

We trace how Twitter brought Ted and David together as collaborators, and the email that sparked the beginnings of this project. They describe how this paper uses predictive modeling for an unconventional purpose, and various “means of interrogating data.” They also provide tips for establishing collaborative relationships, and advocate using substantive research questions to motivate learning technical skills.

May 09, 2022 · 53:50
17. Hashtag Network Analysis and Interwoven Research Ethics with Ryan Gallagher and Brooke Foucault Welles

Our guests in this episode are Ryan Gallagher, a PhD Candidate in Network Science at Northeastern University, and Brooke Foucault Welles, an Associate Professor in Communication Studies and the Network Science Institute at Northeastern University. We discuss their 2019 CSCW paper, "Reclaiming Stigmatized Narratives: The Networked Disclosure Landscape of #MeToo" with co-authors Elizabeth Stowell and Andrea G. Parker.

We talk about their substantive motivation for focusing on #MeToo, the networked counterpublic, and hashtags' influence on social change. Ryan and Brooke also walk us through the advantages of pairing qualitative and quantitative work, weaving ethics through every stage of the research process, dealing with missing tweets, and taking seriously both the "computational" and "social science" sides of CSS.

Apr 24, 2022 · 55:50
16. Measuring Uptake in Classroom Conversations and Using NLP to Support Teachers with Dora Demszky

This episode features Dora Demszky, a PhD student in Linguistics at Stanford University. Dora works at the intersection of natural language processing and education. We discuss her ACL 2021 paper titled "Measuring Conversational Uptake: A Case Study on Student-Teacher Interactions", co-authored with Jing Liu, Zid Mancenido, Julie Cohen, Heather Hill, Dan Jurafsky, and Tatsunori Hashimoto.

Dora's work is motivated by the goal of creating tools that are useful for educators, so her research aims to be not only descriptive or predictive but also applicable in classrooms. She talks about managing large interdisciplinary teams, approaching research with care, and working with practicing teachers to annotate data.

Mar 20, 2022 · 50:27
15. Race in Computational Disinformation Analysis and Deep Reading with Deen Freelon

Our guest in this episode is Deen Freelon, Associate Professor in the School of Journalism and Media at the University of North Carolina. We chat about his 2020 Social Science Computer Review paper "Black Trolls Matter: Racial and Ideological Asymmetries in Social Media Disinformation" with co-authors Michael Bossetta, Chris Wells, Josephine Lukito, Yiping Xia, and Kirsten Adams.

Deen also talks about writing a "behind the scenes" book chapter about the process of making this paper, being an early mover in applying computational methods to communication studies, and how he learns programming best when it is tied to the goals of a specific project. He emphasizes that many of his best research ideas come from reading deeply, and he recommends devoting at least half a day a week solely to reading.

Mar 06, 2022 · 51:34
14. The Past Decade of Computational Social Science Research with David Lazer

In this episode, we talk with David Lazer, the University Distinguished Professor of Political Science and Computer Sciences at Northeastern University and the Co-Director of the NULab for Texts, Maps, and Networks. We discuss two seminal papers in computational social science he co-authored a decade apart: "Life in the network: the coming age of computational social science" (Science 2009) and "Computational social science: Obstacles and opportunities" (Science 2020).

David shares with us events from his long and distinguished CSS research career. He recounts how, in the early 2000s, he helped gather a small group of people working with new "data streams," and how they intentionally created the term "computational social science." He also talks about his own struggles on the academic job market, offers advice for aspiring CSS researchers, and expresses a wish for better data availability structures.

Feb 20, 2022 · 53:00
13. Finding (Mis)alignments in Public Opinion and Wisdom in Collaboration Management with Kenneth Joseph and Sarah Shugars

Our guests on this episode are Kenneth Joseph, an assistant professor in Computer Science and Engineering at the University at Buffalo, and Sarah Shugars, a Faculty Fellow at New York University’s Center for Data Science. We discuss the process behind their EMNLP 2021 paper, “(Mis)alignment Between Stance Expressed in Social Media Data and Public Opinion Surveys,” co-authored with Ryan Gallagher, Jon Green, Alexi Quintana Mathé, Zijian An, and David Lazer.

Kenneth and Sarah offer tips on communication, collaboration, and project management, especially for papers written during a pandemic. Kenneth talks about “privileging ethics” when making decisions around data privacy and experimental replicability, and Sarah reflects on navigating differences in terminology use in interdisciplinary environments.

Feb 10, 2022 · 49:44
12. Understanding Conversational Patterns in Police Community Interactions with Vinodkumar Prabhakaran and Camilla Griffiths

Our guests on this episode are Vinodkumar Prabhakaran, who was a computer science postdoc at Stanford and is now a senior research scientist at Google, and Camilla Griffiths, who is a postdoc at Stanford SPARQ (Social Psychological Answers to Real-world Questions). With Hang Su, Prateek Verma, Nelson Morgan, Jennifer Eberhardt, and Dan Jurafsky, they are co-authors on a TACL 2018 paper, "Detecting Institutional Dialog Acts in Police Traffic Stops".

Vinod and Camilla share with us how this collaboration formed over a common goal and a deep respect for each other’s disciplines. We discuss the considerations that went into forming community partnerships, handling sensitive police body-camera data, and recognizing the implications of their findings.

Jan 18, 2022 · 49:08
11. The Effects of Friend-to-Friend Texting on Voter Turnout and Overcoming Project Setbacks with Aaron Schein

This episode features Aaron Schein, a computer scientist and postdoctoral fellow at Columbia University. We discuss his WWW 2021 paper "Assessing the Effects of Friend-to-Friend Texting on Turnout in the 2018 US Midterm Elections", co-authored with Keyon Vafa, Dhanya Sridhar, Victor Veitch, Jeffery Quinn, James Moffet, David Blei, and Donald Green.

Aaron shares with us how he collaborated with industry partners, overcame the discovery of a confounder that challenged the experiment’s original design, and responded to public feedback. He also maps his interdisciplinary journey through linguistics, political science, and computer science, and shares his twist on imposter syndrome.

Jan 01, 2022 · 56:54
10. Political Discourse and Substantive-Methodological Intersections with Justine Zhang and Arthur Spirling

In this episode, we talk with Justine Zhang and Arthur Spirling. Justine is currently a postdoctoral researcher at Stanford University and Arthur is a Professor of Politics and Data Science at New York University. We discuss their 2017 EMNLP paper, with Cristian Danescu-Niculescu-Mizil, "Asking too much? The rhetorical role of questions in political discourse."

Justine and Arthur touch on how collaborations can provide real insight into other disciplines, including their different paces and writing norms. We also discuss substantive validation for unsupervised learning methods, marinating in "fun" data, the responsibility of studying political institutions that touch all aspects of human life, and a call for administrators to incentivize these kinds of collaborations.

Dec 10, 2021 · 43:17
9. Reddit Debates and Interdisciplinary Multilingualism with Emaad Manzoor

Our guest on this episode is Emaad Manzoor, an Assistant Professor of Operations and Information Management at the University of Wisconsin-Madison. Along with George H. Chen, Dokyun Lee, and Michael D. Smith, he wrote "Influence via Ethos: On the Persuasive Power of Reputation in Deliberation Online," which is currently under review at Management Science.

Emaad illuminates this project's long journey, from manually labeling argumentation schemas, to using observational data from Reddit, to designing experiments. He talks with us about how economics and NLP can learn from one another and the importance of "interdisciplinary multilingualism" in highlighting different aspects of one's work to different audiences. We also chat about the importance of personal drive in research projects and strategies for developing resilience to "emotional punches."

Nov 28, 2021 · 52:03
8. The Evolution of Computational Social Science from a Sociology Perspective with Chris Bail

This unique episode centers on a "meta" discussion of interdisciplinary work involving large-scale social data. We interview Chris Bail, a Professor of Sociology and Public Policy at Duke University. Last year, Chris and co-authors Achim Edelmann, Tom Wolff, and Danielle Montagne published an overview paper titled "Computational Social Science and Sociology" in the Annual Review of Sociology.

We discuss the challenges of defining this large research area, the benefits of making "lateral connections" with potential colleagues as a graduate student, and taking risks in pursuing new research directions. We also highlight the process behind the creation and growth of the Summer Institute in Computational Social Science, which Chris co-founded with Matt Salganik.

Sep 27, 2021 · 51:12
7. The Power of Birth Stories’ Narratives and Intellectual Generosity with Maria Antoniak and Karen Levy

This episode features Maria Antoniak, a PhD student, and Karen Levy, an assistant professor, who are both in the Department of Information Science at Cornell. Maria, who has a background in computational linguistics, and Karen, who has a background in law and sociology, are co-authors, along with David Mimno, on the CSCW 2019 paper "Narrative Paths and Negotiation of Power in Birth Stories".

We discuss the formation of identity in online communities, approaches for protecting the privacy of users, the different submission and review processes in computing venues, and balancing new methodology and applications. Within an interdisciplinary department, Karen and Maria advocate for "learning to lift up each other’s work" and being "intellectually generous" across disciplines.

Sep 17, 2021 · 53:32
6. Extracting Events from Text and Grad School Memories with Brendan O'Connor and Brandon Stewart

Our guests in this episode are Brendan O'Connor, Associate Professor of Computer Science at UMass Amherst, and Brandon Stewart, Assistant Professor of Sociology at Princeton University. We talk with them about their 2013 ACL paper (with co-author Noah Smith) “Learning to Extract International Relations from Political Context” which presents a probabilistic model for extracting events between countries and international organizations from news articles.

Brendan and Brandon also discuss how their collaboration grew from "saying nice things" about each other's work to 30-page written research memos sent back and forth. We also discuss the "ballooning and focusing" scope of research, clunky computer labs in the early 2000s, challenges in incentive structures for interdisciplinary collaborations, and data replicability standards.

Sep 07, 2021 · 1:10:34
5. Opioid Use Recovery on Social Media and Mentoring Undergrad Collaborators with Stevie Chancellor

In this episode, we talk to Stevie Chancellor, the lead author of a 2019 CHI paper titled "Discovering Alternative Treatments for Opioid Use Recovery Using Social Media". Along with Stevie, a computer scientist, the team of authors included clinical psychologist and addiction researcher George Nitzburg, Stevie’s advisor Munmun De Choudhury, and two undergraduate students, Andrea Hu and Francisco Zampieri.

Stevie shares with us her strategies for successful student mentoring, working within page limits, and using milestones and reflection points in the project’s timeline to help it reach completion.

Jul 12, 2021 · 49:07
4. COVID-19 Mobility Networks and Post-Publication Scientific Communication with Serina Chang

We discuss the paper "Mobility network models of COVID-19 explain inequities and inform reopening" with first author and Stanford computer science PhD student Serina Chang. This paper's team of interdisciplinary authors includes other computer scientists (Emma Pierson, Pang Wei Koh, and Jure Leskovec), sociologists (Beth Redbird and David Grusky), and an epidemiologist (Jaline Gerardin).

Serina shares with us the challenges of navigating post-publication scientific communication and translating scientific research into real-world policy tools, as well as the success that came from grounding research questions in the needs of real people.

Jun 27, 2021 · 49:58
3. Digital Health Communication and Punk Rock Academics with Ethan Zuckerman

In this episode, we talk to Ethan Zuckerman, associate professor at the University of Massachusetts Amherst, where he teaches public policy, communication, and information. We discuss his paper "Digital Health Communication and Global Public Influence: A Study of the Ebola Epidemic," which was published in the Journal of Health Communication in 2017. His co-authors on this paper include technical and visualization experts (Hal Roberts and Sands Alden Fish II), a global public health expert (Brittany Seymour), and an expert in education policy (Emily Robinson).

Ethan talks about building Media Cloud, an open-source platform for media analysis that tracks millions of stories published online, over the course of two decades, and about the "fearsome process" of scaling it up. He also discusses being an unconventional "punk-rock" academic and offers advice to "scratch your deep itch" when choosing which research directions to pursue.

Link: https://www.tandfonline.com/doi/full/10.1080/10810730.2016.1209598

Jun 14, 2021 · 41:34
2. Analyzing Menstrual Cycle Data and Math Transcending Boundaries with Emma Pierson
Jun 11, 2021 · 44:11
1. Abolitionist Newspapers and Maintaining 8-Year Project Momentum with Lauren Klein and Sandeep Soni
Jun 11, 2021 · 54:39