4 Ethics in Obtaining Data



Figure 4.1: Obtaining data, in the context of the data science lifecycle.

This chapter is in-progress.

The process of obtaining data is perhaps the aspect of data science with the most concrete and direct ethical implications.

4.1 Where do data come from?

“‘Raw data’ is an oxymoron.”

– Lisa Gitelman and Virginia Jackson (Gitelman and Jackson 2013)

Do you generate them? Do you collect them? Do you obtain them?

4.2 What “obtaining data” includes

TK

4.3 Sensitive information

TK

4.3.1 Defining Sensitive Information

4.3.2 Handling Sensitive Information

4.4 Risks from data triangulation

TK

4.5 Ethics in Maintaining Data

TK

4.6 Case Study: Census Data

The U.S. census offers a helpful example of ethical decision-making in the data science lifecycle. It has been explored in Data Feminism, as well as X and Y and other materials. It is included here in brief as a canonical example of subjectivity in the process of obtaining data.

4.6.1 Census form in 1790

Massachusetts printed schedule used in the 1790 census.

Categories: free White males 16 and over, free White males under 16, free White females, all other free persons, and slaves.


Source: Census Bureau questionnaire page (U.S. Census Bureau 1790) and National Archives scan hosted by Census.gov (National Archives and Records Administration 1790).

4.6.2 Census form in 1850

Free Inhabitants schedule.

Categories: name, age, sex, color, occupation, value of real estate, birthplace, married within the year, school attendance, literacy, deafness, dumbness, blindness, insanity, idiocy, pauper status, and conviction.


Source: Census Bureau questionnaire page (U.S. Census Bureau 1850a) and direct image file from Census.gov (U.S. Census Bureau 1850b).

4.6.3 Census form in 1940

Population questionnaire.

Categories: name, relationship, personal description, residence, birthplace, citizenship, education, employment, occupation, income, veteran status, Social Security, and selected supplemental questions for sample respondents.


Source: Census Bureau questionnaire page (U.S. Census Bureau 1940a) and direct PDF from Census.gov (U.S. Census Bureau 1940b).

4.6.4 Census form in 2020

Informational bilingual questionnaire.

Categories: household count, ownership or tenure, phone number, name, sex, age and date of birth, Hispanic/Latino/Spanish origin, race, relationship, and whether the person usually lives or stays elsewhere.


Source: Census Bureau questionnaire page (U.S. Census Bureau 2020a) and direct PDF from Census.gov (U.S. Census Bureau 2020b).

Notes from Wikipedia’s “Race and ethnicity in the United States census,” section “Relation between ethnicity and race in census results” (Wikipedia contributors 2026):

  • The census treats Hispanic or Latino origin as an ethnicity, not a race, so it asks about ethnicity and race in separate questions.
  • In 2000, a large share of Hispanic/Latino respondents selected “Some other race,” showing a mismatch between the official categories and how respondents describe themselves.
  • Since 2000, respondents have been allowed to select more than one race, so race totals can exceed the total population (not directly comparable with older censuses).
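The counting consequences of the first and last notes can be illustrated with a small tally. The sketch below uses five hypothetical, made-up responses (not real census data) and assumes a simple record layout of our own: a separate yes/no `hispanic` field for the ethnicity question and a `races` list for the race question, where more than one race may be marked. Counting every race a person selects means the race totals sum to more than the number of people, while Hispanic origin is tallied on its own rather than as one more race.

```python
from collections import Counter

# Hypothetical toy responses (not real census data). Ethnicity is recorded
# in its own field, separate from race, and each person may mark more than
# one race, as respondents have been allowed to since 2000.
responses = [
    {"hispanic": True,  "races": ["Some other race"]},
    {"hispanic": False, "races": ["White"]},
    {"hispanic": False, "races": ["White", "Black or African American"]},
    {"hispanic": True,  "races": ["White", "Some other race"]},
    {"hispanic": False, "races": ["Asian"]},
]

population = len(responses)

# Count every race each person selected, so a multiracial respondent
# contributes to more than one race total.
race_mentions = Counter(race for r in responses for race in r["races"])

# Because ethnicity is a separate question, Hispanic origin is counted
# independently of the race tallies.
hispanic_count = sum(r["hispanic"] for r in responses)

print("population:", population)                              # → population: 5
print("sum of race totals:", sum(race_mentions.values()))     # → sum of race totals: 7
print("hispanic:", hispanic_count)                            # → hispanic: 2
```

Five people produce seven race mentions, which is why race totals from post-2000 censuses cannot simply be summed and compared with the population count or with older single-race tables.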

4.7 Case Study: Facebook Profile

Similar dynamics can be observed in the Facebook signup process. Until 2014, Facebook offered essentially three gender options: male, female, or no answer. On February 13, 2014, Facebook substantially expanded its gender field, adding 58 different options, custom fields, and additional pronoun settings.

Sources: ABC News (ABC News 2014) and CNN (Kelly 2014).

4.8 References

ABC News. 2014. “Here’s a List of 58 Gender Options for Facebook Users.” February 13. https://abcnews.com/blogs/headlines/2014/02/heres-a-list-of-58-gender-options-for-facebook-users.
Gitelman, Lisa, and Virginia Jackson. 2013. “Introduction: ‘Raw Data’ Is an Oxymoron.” In “Raw Data” Is an Oxymoron, edited by Lisa Gitelman. MIT Press. https://doi.org/10.7551/mitpress/9302.003.0002.
Kelly, Heather. 2014. “Facebook Goes Beyond ‘Male’ and ‘Female’ with New Gender Options.” February 13. https://www.cnn.com/2014/02/13/tech/social-media/facebook-gender-custom/.
National Archives and Records Administration. 1790. “Massachusetts Printed Schedule Used in the 1790 Census.” https://s3.amazonaws.com/NARAprodstorage/lz/dc-metro/rg-029/5634994/5634994_01_01790.pdf.
U.S. Census Bureau. 1790. “1790 Census: Instructions to Enumerators.” https://www.census.gov/programs-surveys/decennial-census/technical-documentation/questionnaires.1790_Census.html.
U.S. Census Bureau. 1850a. “1850 Census: Instructions to Enumerators.” https://www.census.gov/programs-surveys/decennial-census/technical-documentation/questionnaires.1850_Census.html.
U.S. Census Bureau. 1850b. “1850 Free Inhabitants Schedule.” https://www2.census.gov/programs-surveys/decennial/technical-documentation/questionnaires/1850/1850-free-inhabitants-schedule.png.
U.S. Census Bureau. 1940a. “1940 Census: Instructions to Enumerators.” https://www.census.gov/programs-surveys/decennial-census/technical-documentation/questionnaires.1940_Census.html.
U.S. Census Bureau. 1940b. “1940 Population Questionnaire.” https://www.census.gov/content/dam/Census/programs-surveys/decennial/technical-documentation/questionnaires/1940_population_questionnaire.pdf.
U.S. Census Bureau. 2020a. “2020 Census: Instructions to Enumerators.” https://www.census.gov/programs-surveys/decennial-census/technical-documentation/questionnaires.2020_Census.html.
U.S. Census Bureau. 2020b. “2020 Informational Questionnaire.” https://www2.census.gov/programs-surveys/decennial/2020/technical-documentation/questionnaires-and-instructions/questionnaires/2020-informational-questionnaire.pdf.
Wikipedia contributors. 2026. “Race and Ethnicity in the United States Census.” https://en.wikipedia.org/wiki/Race_and_ethnicity_in_the_United_States_census#Relation_between_ethnicity_and_race_in_census_results.