4 Ethics in Obtaining Data
This chapter is in progress.
The process of obtaining data is perhaps the aspect of data science with the most concrete and direct ethical implications.
4.1 Where do data come from?
“‘Raw data’ is an oxymoron.”
– Lisa Gitelman and Virginia Jackson (Gitelman and Jackson 2013)
Do you generate them? Do you collect them? Do you obtain them?
4.2 What “obtaining data” includes
TK
4.3 Sensitive information
TK
4.3.1 Defining Sensitive Information
4.3.2 Handling Sensitive Information
4.4 Risks from data triangulation
TK
4.5 Ethics in Maintaining Data
TK
4.6 Case Study: Census Data
The U.S. census offers a helpful example of ethical decision-making in the data science lifecycle. It has been explored in Data Feminism, as well as X and Y and other materials. It is included here in brief as a canonical example of subjectivity in the process of obtaining data.
4.6.1 Census form in 1790
Massachusetts printed schedule used in the 1790 census.
Categories: free White males 16 and over, free White males under 16, free White females, all other free persons, and slaves.

Source: Census Bureau questionnaire page (U.S. Census Bureau 1790) and National Archives scan hosted by Census.gov (National Archives and Records Administration 1790).
4.6.2 Census form in 1850
Categories: name, age, sex, color, occupation, value of real estate, birthplace, married within the year, school attendance, literacy, deafness, dumbness, blindness, insanity, idiocy, pauper status, and conviction.

Source: Census Bureau questionnaire page (U.S. Census Bureau 1850a) and direct image file from Census.gov (U.S. Census Bureau 1850b).
4.6.3 Census form in 1940
Categories: name, relationship, personal description, residence, birthplace, citizenship, education, employment, occupation, income, veteran status, Social Security, and selected supplemental questions for sample respondents.

Source: Census Bureau questionnaire page (U.S. Census Bureau 1940a) and direct PDF from Census.gov (U.S. Census Bureau 1940b).
4.6.4 Census form in 2020
Informational bilingual questionnaire.
Categories: household count, ownership or tenure, phone number, name, sex, age and date of birth, Hispanic/Latino/Spanish origin, race, relationship, and whether the person usually lives or stays elsewhere.

Source: Census Bureau questionnaire page (U.S. Census Bureau 2020a) and direct PDF from Census.gov (U.S. Census Bureau 2020b).
Notes from Wikipedia’s “Race and ethnicity in the United States census,” section “Relation between ethnicity and race in census results” (Wikipedia contributors 2026):
- The census treats “Hispanic or Latino” as an ethnicity, not a race, so it asks about the two in separate questions.
- In 2000, a large share of Hispanic/Latino respondents selected “Some other race,” showing a mismatch between official categories and how respondents describe themselves.
- Since 2000, respondents have been allowed to select more than one race, so race totals can exceed the total population and are not directly comparable with older censuses.
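The point about race totals exceeding the total population is a counting effect, and a small sketch may make it concrete. The responses below are hypothetical and invented purely for illustration (they are not census data), and the function name `tally_selections` is an assumption of this sketch, not a Census Bureau method. Each respondent is counted once under every category they select, so the category counts can sum to more than the number of respondents.

```python
from collections import Counter

# Hypothetical responses, invented for illustration only (not census data).
# Each respondent may select one or more categories.
responses = [
    {"White"},
    {"Black or African American"},
    {"White", "Asian"},
    {"Some other race"},
    {"White", "Black or African American"},
]


def tally_selections(responses):
    """Count each respondent once under every category they selected."""
    counts = Counter()
    for selected in responses:
        counts.update(selected)
    return counts


counts = tally_selections(responses)
total_respondents = len(responses)
total_selections = sum(counts.values())

# The sum of category counts exceeds the number of respondents
# because multi-category respondents appear in more than one tally.
print(f"Respondents: {total_respondents}")
print(f"Sum of category counts: {total_selections}")
```

With single-selection responses the two numbers would match, which is one reason tallies built this way are hard to compare with those from censuses that allowed only one selection.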
4.7 Case Study: Facebook Profile
Similar dynamics can be observed in the Facebook signup process. Until 2014, Facebook offered essentially three gender options: male, female, or no answer. On February 13, 2014, Facebook substantially expanded its gender field, with 58 different options, custom fields, and additional pronoun settings.
Sources: ABC News (ABC News 2014) and CNN (Kelly 2014).