Ethics in Data Science
Social and moral considerations for responsible data scientists
1 Preface
This is a placeholder page. The book is a work in progress.
How can data be collected responsibly? Which data are considered private or sensitive? What makes some questions inappropriate? When and how can data be published? How does one responsibly handle the persuasive power of data?
The purpose of this mini-book is to introduce data scientists to ethical considerations that arise throughout the data science lifecycle. There are many versions of the lifecycle, and this book is organized around the one described in Learning Data Science, which divides data work into four broad stages (Lau et al. 2023).
These broad stages correspond to the chapters in this book as follows:
| Learning Data Science | Corresponding material in this book |
|---|---|
| | Introduction (this chapter) |
| | Working Toward Wisdom |
| 1.1. The Stages of the Lifecycle: Ask a Question | Ethics in Asking Questions |
| 1.1. The Stages of the Lifecycle: Obtain Data | Ethics in Obtaining Data |
| 1.1. The Stages of the Lifecycle: Understand the Data | Ethics in Understanding |
| 1.1. The Stages of the Lifecycle: Understand the World | Ethics in Understanding |
| Reports, decisions, solutions | Ethics in Reporting Decisions & Solutions |
| | Concluding remarks |
This is not the only book to discuss ethics in data science. Many of the topics covered here have been described elsewhere, in books, articles, and essays that often serve as source material for this book, including:
- Alberto Cairo, How Charts Lie (Cairo 2019)
- Catherine D’Ignazio and Lauren Klein, Data Feminism (D’Ignazio and Klein 2020)
- Solon Barocas and Andrew Selbst, “Big Data’s Disparate Impact” (Barocas and Selbst 2016)
- Cathy O’Neil, Weapons of Math Destruction (O’Neil 2017)
- Luciano Floridi and Mariarosaria Taddeo, “What is data ethics?” (Floridi and Taddeo 2016)
- Benjamin S. Baumer, Daniel T. Kaplan, and Nicholas J. Horton, “Data science ethics” in Modern Data Science with R (Baumer et al. 2021)
- Rachel Thomas, Practical Data Ethics (Thomas 2020)
- Lauren Klein and Catherine D’Ignazio, Data Feminism for Data Visualization (Klein and D’Ignazio 2025)
What is intended to be unique and useful about this book is its alignment with the "data science lifecycle" from Learning Data Science (Lau et al. 2023). The lifecycle model offers a helpful way to organize the wide array of ethical questions, dilemmas, and decisions that arise in a data scientist's work.
Before exploring these questions in detail in Ethics in Asking Questions, Ethics in Obtaining Data, Ethics in Understanding, and Ethics in Reporting Decisions & Solutions, I have taken the liberty of including one framing chapter, Working Toward Wisdom, which zooms out and asks a somewhat audacious question: why are we doing any of this at all?