2  Working Toward Wisdom

Introduction to Ethics in Data Science

“Those analysis droids you’ve got over there only focus on symbols. Hagh! I should think you Jedi would have more respect for the difference between knowledge and, hu-hu-hu… wisdom.”

– Dexter Jettster in Attack of the Clones, written by George Lucas and Jonathan Hales (Lucas and Hales 2002, 35)

“Where is the Life we have lost in living? Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?”

– T. S. Eliot in The Rock (Eliot 1934)

This chapter is in-progress.

Overview: This chapter describes the DIKW pyramid — data, information, knowledge, wisdom — and its relevance to the data science lifecycle, especially as a way to consider the fundamental goals and purposes of data science work. The chapter also introduces concepts of understanding and common knowledge, and their relevance to data science. It closes with a brief discussion of statistical worldviews, where “just do the math” requires choosing particular beliefs about math and the world at large.

As one works through the various stages of the data science lifecycle, it is helpful to consider how each stage relates to what is often called the data-information-knowledge-wisdom “hierarchy,” or the “DIKW pyramid” for short (Rowley 2007). The framing has a long history across systems thinking and information science (e.g., Ackoff 1989; Vance 1997; Bernstein 2011). As argued in this chapter and elsewhere, the strict logical hierarchy implied by the DIKW pyramid is somewhat misleading (Frické 2009). However, the intuitions behind the pyramid metaphor offer helpful framing for the work of data science.

The DIKW pyramid with four labeled layers: data, information, knowledge, and wisdom.

Figure 2.1: A rendition of the DIKW pyramid, with data at the base supporting successive layers of information, knowledge, and wisdom.

The basic premise of the DIKW pyramid is as follows. To build a pyramid with wisdom at the top, start with a brick: a datum. Use multiple bricks to create a large organized layer of bricks: data. Upon data, one can begin to construct information; upon information, one can construct knowledge; and upon knowledge, wisdom.

Like all metaphors, the DIKW pyramid eventually breaks down. Although wisdom is not literally the top layer of a pyramid, the model does accurately suggest that work with data (i.e., data science) generally aims toward wisdom. That may be a surprising word for a data science book.

Sure, data scientists want information, and probably knowledge.

2.1 What is wisdom?

Anyone who wants to answer the question “what is wisdom” would benefit from also being able to answer the question, “what is a triangle?”

Let’s try to display a perfect triangle on your screen, or at least get as close as we can. Here is an attempt.

An equilateral triangle on a golden-ratio canvas.

Figure 2.2: An attempt to display a perfect triangle.

Looks like a pretty good triangle! It uses scalable vector graphics (i.e., an SVG file) to make the lines look as crisp as they possibly can on a computer screen. The image only uses a few pieces of information: the size of the frame (using the golden ratio, of course), the location of the three points for an equilateral triangle within that frame, and the color and width of the line that connects those points. This particular image uses a black line with a width of six pixels (a multiple of three, of course).
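To make those "few pieces of information" concrete, here is a minimal Python sketch that writes out SVG markup for such a figure. It is not the code behind Figure 2.2. The 1600-pixel frame width, the 600-pixel side length, the centering, and the function names are all assumptions chosen for illustration; only the golden-ratio frame, the black stroke, and the six-pixel width come from the description above.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def triangle_points(width=1600, side=600):
    """Vertices of an equilateral triangle centered in a golden-ratio frame.

    The width and side length are assumed values, not the author's.
    """
    height = width / PHI
    tri_height = side * math.sqrt(3) / 2
    cx, cy = width / 2, height / 2
    # SVG's y-axis grows downward, so the apex has the smallest y value.
    return [
        (cx, cy - tri_height / 2),
        (cx - side / 2, cy + tri_height / 2),
        (cx + side / 2, cy + tri_height / 2),
    ]


def triangle_svg(width=1600, side=600, stroke="black", stroke_width=6):
    """Return SVG markup: a frame size, three points, a color, and a line width."""
    height = width / PHI
    pts = " ".join(f"{x:.2f},{y:.2f}" for x, y in triangle_points(width, side))
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {width} {height:.2f}">'
        f'<polygon points="{pts}" fill="none" '
        f'stroke="{stroke}" stroke-width="{stroke_width}"/>'
        f"</svg>"
    )


print(triangle_svg())
```

Notice that the markup never mentions a pixel. The file describes an ideal shape, and the screen is left to approximate it.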

But is there really a triangle in this image? Is it a perfect triangle? (Mathematicians sometimes ask similar questions about perfect circles (Payne 2019).) One way to find out if it’s “perfect” is to zoom in on a portion of it. We will zoom in by a power of ten, inspired by the Eames films (Eames and Eames 1968, 1977).

An equilateral triangle with a highlighted rectangular frame around its left edge.

Figure 2.3: The red rectangular frame will be the outer frame for the next image.

The red rectangle represents another golden-ratio rectangle, this one being 1/10th the size of the original rectangle (160 pixels instead of the original 1600 pixels). Now, we will look at that frame up close.

A cropped and enlarged view of the triangle's left edge.

Figure 2.4: The left edge of the “perfect” triangle, which may not be perfect after all.

Although the original picture may have looked like a perfect triangle at first, alas, closer inspection shows the left edge of the triangle is not even a straight line. Those edges are pretty jagged, and indeed, that is the only way to draw “lines” on a computer screen. We can look even closer to see how this works, zooming in by another factor of ten.
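One way to see why the edge comes out jagged is to mimic, very roughly, what a screen has to do: pick whole pixels to stand in for an ideal line. The Python sketch below is a simplification and an assumption on my part (real renderers also blend partially covered pixels, which is called anti-aliasing). It lights one pixel per row along an edge inclined at 60 degrees, like the left edge of an equilateral triangle; the grid size and function names are hypothetical.

```python
import math


def edge_columns(rows=8, start_col=7):
    """Return the pixel column lit in each row along an ideal 60-degree edge."""
    shift_per_row = 1 / math.sqrt(3)  # horizontal run per unit of vertical drop
    return [round(start_col - shift_per_row * row) for row in range(rows)]


def render(columns, width=8):
    """Draw the lit pixels as '#' on a grid of '.' characters."""
    return "\n".join(
        "".join("#" if col == lit else "." for col in range(width))
        for lit in columns
    )


columns = edge_columns()
print(render(columns))
```

The ideal edge shifts left by about 0.58 pixels per row, but a pixel can only shift by zero or one whole column, so the rendered edge becomes a staircase.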

An enlarged edge view with a smaller frame marking the next crop.

Figure 2.5: The red rectangular frame will be the outer frame for the next zoomed-in image.

Again, the red frame shows the area where we will zoom in.

An enlarged crop of the triangle's edge.

Figure 2.6: The left edge of the “perfect” triangle breaks down even further.

Now, the flaws of the triangle are even closer and more apparent. We have laid bare its imperfections (or at least some of its imperfections). Then again, those imperfections have lain there all along: the pixels (picture elements) in the original, perfect-looking triangle were always there. They were just too small to see.

Even now, this image of the black squares is not really showing you the pixels. This is what a pixel actually looks like up close:

Microscopic close-up of an LCD display showing red, green, and blue subpixels.

Figure 2.7: A microscopic image of an LCD display showing subpixels. Source: Jacek Halicki, 2023 Mikroskopowy obraz matrycy LCD, licensed under CC BY-SA 4.0.

Yet again, even this statement is somewhat inaccurate: this is not really what a pixel on your screen looks like. There is no way to show a zoomed-in picture of your own screen at this very moment, but you could go get a magnifying glass if you are curious. If you looked through that magnifying glass and zoomed in further, you might see “subpixels” of red, green, and blue in your screen.

Ten-times zoomed view of the LCD subpixel pattern.

Figure 2.8: A 10x zoom into the center of the LCD subpixel image, showing rectangular subpixels. Source: Jacek Halicki, 2023 Mikroskopowy obraz matrycy LCD, licensed under CC BY-SA 4.0.

But the subpixels on your screen might not be the same shape as the subpixels on someone else’s screen. For example, subpixels look rather different on a standard definition CRT television, a CRT computer monitor, and LCD laptop screens, as shown below.

Microscopic photos comparing pixel geometries from CRT and LCD displays.

Figure 2.9: Pixel geometries from CRT and LCD displays, center-cropped to a golden-ratio rectangle. Source: Peter Halasz (Pengo), Pixel geometry 02 Pengo.jpg, licensed under CC BY-SA 3.0.

We could go on for quite a while with this repeated zoom-in effect. But what does all of this have to do with data science?

For one, during the work of data science, you may find yourself (or your team) in a “repeated zoom-in” cycle, and it is useful to be able to recognize such cycles. These repeated loops can support exploration, but they are infuriating when they seem to go on forever (as the saying goes, they make “helpful servants but terrible masters”).

You may have been tasked with answering a deceptively simple question:

  • Do guests like the new cold brew recipe?
  • Is the running plan helping people run faster?
  • Do people sleep better with noise machines?

But any real, living, curious human asking these questions will want more than “yes” or “no” as an answer. The task of a data scientist is not merely to deliver the answer up the chain, like a machine that takes data as input and gives knowledge as output.

And this is the second point for which our triangle adventure is relevant. The real value of a wise and competent data scientist is to understand, in detail, how the subpixels of data can become images of information. That is, a data scientist must explore the many possible decisions involved in turning data into information and/or knowledge.

Wisdom involves the ability to say that the original figure was, in some sense, a triangle. And wisdom also involves being able to explain why the figure was not exactly a triangle. Wisdom shows us the process of going from subpixels to pixels to lines to triangles. In some contexts, a data scientist may need to simply say “yes, that is a triangle,” while in other contexts the data scientist may need to explain why it is technically not a triangle. As will be discussed in Chapter 5, this requires an awareness of goals, audience knowledge, and other contextual factors which can improve communication.

2.2 Understanding

Notably missing from the DIKW pyramid is the word “understanding.” This points to one limitation of the metaphor: we are not just working with fixed, static data that are automatically converted to information and/or knowledge and/or wisdom. The existence of the DIKW pyramid thus implies human understanding.

To quote Ursula Le Guin, “What good are all the objects in the universe, if there is no subject?” (Le Guin 1979). And to rephrase this sentiment in the context of the DIKW pyramid, “what good are all the data, if there is no data scientist?”

There must be some subject that turns data into information, information into knowledge, and knowledge into wisdom. If these transformations can happen, they happen through human understanding.

Intuitively, knowing something is more than just storing some piece of information – knowledge is more than compiled information. Zeleny (1987) describes knowledge as a “network of relations through which humans coordinate their actions,” adding that “knowledge brings (through language) coherence and coordination to the otherwise turbulent and chaotic world of human action.”

What makes Wikipedia a source of knowledge is not merely the text on the page(s), but rather the coordination process of the Wikipedia editor network which iteratively writes, reviews, and updates the Wikipedia page(s). We trust the text on the page(s) because of the subjects who crafted and re-crafted the language, ensuring its coherence and alignment with existing human language.

In short, the DIKW pyramid does not build itself.

2.3 Common Knowledge

Something profound happens when multiple people “read” (look at, watch, draw, view, etc.) information together. In brief, this is the phenomenon of common knowledge: shared awareness about what other people know, and/or what others have contributed to the pyramid.

Common knowledge is often confused with mutual knowledge. If we both know that there is only one ice cream bar left in the freezer, that is mutual knowledge. But if we each also know that the other person knows there is only one ice cream bar left (and know that the other knows that we know, and so on), that is common knowledge: awareness about what other people know.

Perhaps the best way to explain common knowledge is to recognize its use in storytelling methods which create suspense in film and television. For example, Alfred Hitchcock described the distinction between “surprise” and “suspense” as a difference in common knowledge: in a surprise, the audience discovers an important fact at the same time as the characters. The information is withheld from everyone (Hitchcock 2019).

But suspense depends on partially shared information. For example, the audience knows that there is a bomb under the dining table, but they also know that Bob was gone when the bomb was placed there. In Hitchcock’s framing, this scenario creates suspense because the audience anticipates danger from the bomb, and also knows that Bob does not anticipate the same danger.

A data scientist can benefit from understanding and leveraging common knowledge (or lack thereof). When communicating results, for example, it may not be sufficient just to know what the audience knows. You may also want to know what audience members know about what other audience members know. The intricacies of common knowledge are fascinating, and two classic puzzles show how it can coordinate action (Fagin et al. 1995).

In “the hat puzzle,” a row of people each wear a hat they cannot see, and no one is allowed to move until they deduce their own hat color. The line stays frozen. Then, a public announcement states, “at least one of you wears a red hat.” The announcement adds no new visible fact, yet it converts mutual knowledge into common knowledge, and that is what finally allows the line to move.

A related puzzle, the muddy children puzzle, is described by Baltag and Renne (2024) as follows:

Three children are playing in the mud. Father calls the children to the house, arranging them in a semicircle so that each child can clearly see every other child. “At least one of you has mud on your forehead”, says Father. The children look around, each examining every other child’s forehead. Of course, no child can examine his or her own. Father continues, “If you know whether your forehead is dirty, then step forward now.” No child steps forward. Father repeats himself a second time, “If you know whether your forehead is dirty, then step forward now.” Some but not all of the children step forward. Father repeats himself a third time, “If you know whether your forehead is dirty, then step forward now.” All of the remaining children step forward. How many children have muddy foreheads?

The father’s announcement adds no new visible fact — every child could already see the others’ foreheads — yet it creates the common knowledge that enables the deductions. Each round of silence carries information, and counting those silences lets each muddy child infer their own state.
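For readers who like to check this kind of reasoning mechanically, here is a small Python sketch of the puzzle using a "possible worlds" model: each world is one assignment of mud to foreheads, and a child knows their own state when every world still consistent with what they see agrees about it. This is my own illustrative encoding, not the formalism used by Baltag and Renne; the function names and the 0/1 representation are assumptions.

```python
from itertools import product


def knows(child, world, worlds):
    """True if every remaining world matching what `child` sees agrees on their state."""
    others = [j for j in range(len(world)) if j != child]
    candidates = {w[child] for w in worlds if all(w[j] == world[j] for j in others)}
    return len(candidates) == 1


def simulate(actual):
    """Return, for each round, the set of children who newly step forward."""
    n = len(actual)
    # Father's announcement publicly rules out the world where nobody is muddy.
    worlds = {w for w in product((0, 1), repeat=n) if any(w)}
    stepped, rounds = set(), []
    while len(stepped) < n:
        knowers = {i for i in range(n) if knows(i, actual, worlds)}
        rounds.append(knowers - stepped)
        stepped |= knowers
        # Everyone observes who stepped forward, so worlds that would have
        # produced a different set of knowers are eliminated for all children.
        worlds = {
            w for w in worlds
            if {i for i in range(n) if knows(i, w, worlds)} == knowers
        }
    return rounds


def matches_story(rounds, n=3):
    """Nobody in round one, some but not all in round two, the rest in round three."""
    return len(rounds) == 3 and not rounds[0] and 0 < len(rounds[1]) < n


answers = {
    sum(w)
    for w in product((0, 1), repeat=3)
    if any(w) and matches_story(simulate(w))
}
```

Printing `answers` gives the number of muddy children consistent with the story. The elimination step is the important part: nothing visible changes between rounds, yet the set of worlds everyone considers possible shrinks each time no one moves.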

2.4 Statistical worldviews

One final component of wisdom for the data scientist is an awareness of different worldviews and their relevance to the data science lifecycle. Although some may view data scientists as having a neutral, objective, “view from nowhere” (D’Ignazio and Klein 2020), there are many “researcher degrees of freedom” exercised throughout the lifecycle. Making decisions within these degrees of freedom entails a degree of subjectivity.

To demonstrate this and conclude the chapter, we will consider one historical example: frequentist versus Bayesian statistics.

Consider a simple scenario from Ipeirotis (2008):

  • You have a coin that, when flipped, ends up heads with probability p and tails with probability 1−p
  • The value of p is unknown, but you want to know it
  • You flip the coin 14 times and get 10 heads
  • A stranger walks by and offers a bet as to whether the next two flips will both be heads
  • Do you take the bet?

I will leave the mathematical details to the original source and focus on the relevant aspect for this book: the different conclusions reached through different statistical worldviews.

In this particular example, a frequentist would estimate p from the 14 observations, estimating a 51% chance of two consecutive heads. A Bayesian reaches a different estimate, 48.5% (by treating p as a distribution, incorporating prior beliefs, and using Bayes’ theorem to account for the observations).
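The two numbers can be reproduced in a few lines of Python. This is a sketch under one loudly stated assumption: the Bayesian calculation here uses a uniform Beta(1, 1) prior on p, which is the choice that reproduces 48.5%; a different prior would give a different answer. The variable names are mine.

```python
from fractions import Fraction

heads, flips = 10, 14

# Frequentist: plug the maximum-likelihood estimate of p into p**2.
p_hat = Fraction(heads, flips)
frequentist = p_hat ** 2

# Bayesian, ASSUMING a uniform Beta(1, 1) prior: the posterior is Beta(a, b),
# and the chance of two more heads is the posterior mean of p**2.
a = 1 + heads
b = 1 + (flips - heads)
bayesian = Fraction(a * (a + 1), (a + b) * (a + b + 1))

print(f"frequentist: {float(frequentist):.3f}")
print(f"bayesian:    {float(bayesian):.3f}")
```

The frequentist figure squares a single best guess for p, while the Bayesian figure averages p² over everything the posterior still considers plausible. The two sit on opposite sides of 50%, which is exactly what makes the bet interesting.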

There are some underlying beliefs behind these numbers: frequentists treat probability as long-run frequency and judge procedures by their error rates across repeated trials (Romeijn 2025). Bayesians, on the other hand, treat probability as a degree of belief (or “credence”) that is updated as evidence arrives (Hajek and Hartmann 2024).

So, do you take the bet? One might expect there to be a single answer to the statistical question: “just do the math!” But “the math” was not given to us on a stone tablet, and we have multiple paradigms upon which we might base our statistical calculations. This is a case where one must choose which math to use, and to some extent, what to believe about the world.

To my understanding, this particular debate has largely settled down. In 1986, Bradley Efron asked “Why Isn’t Everyone a Bayesian?” and framed the choice as something of a live contest (Efron 1986). More recently, however, Richard McElreath and others have suggested that the Bayesian-versus-frequentist debate has essentially been subsumed by the question of causal inference (McElreath 2020).

The rise of the causal inference paradigm does not imply a resolution to subjectivity. Causal modeling may offer more regularity in asking questions, e.g., “what would happen under intervention?” This lends itself to causal graphs (DAGs, which we’ll talk about in class) and more formal structures, rather than just loose associations (Hitchcock 2024).

But causal modeling opens up another can of worms about what it actually means for one thing to cause another:

  • Spurious regularities: The rooster crows every morning right before sunrise, but the rooster does not cause the sun to rise. How do we separate real causes from things that just consistently show up at the same time?
  • Multiple necessary conditions: When a fire needs heat, fuel, and oxygen, which one is considered “the cause” of the fire?
  • Causation without pattern: How do we make sense of (or prove) one-time “causes” and effects, like the meteor that killed the dinosaurs?
  • Common causes: Ice cream sales and drownings rise and fall together, but ice cream does not cause drowning (and drowning does not cause ice cream sales). How do we know when there is a lurking third factor (in this case summer weather) that drives multiple effects?
  • Directionality: A train’s speedometer needle turns as the train goes faster, and the train goes faster as the speedometer needle turns. Of course, flicking the needle will not speed up the train. How do we determine causal directionality in more complicated scenarios?
  • Overdetermination: Two people each empty a full bucket of water onto a campfire at the same moment, and either bucket alone would have been enough to put out the fire. Which bucket “caused” the fire to go out?
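The common-cause puzzle in the list above is the easiest one to poke at with a simulation. The Python sketch below uses entirely synthetic, made-up numbers (the temperature range, the coefficients, and the noise levels are all assumptions, not real sales or drowning data). Temperature drives both variables, and neither variable affects the other.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing its least-squares linear trend on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic world: temperature is the only cause of both outcomes.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 3) for t in temperature]

raw = pearson(ice_cream, drownings)
adjusted = pearson(
    residuals(ice_cream, temperature), residuals(drownings, temperature)
)
print(f"raw correlation:            {raw:.2f}")
print(f"after removing temperature: {adjusted:.2f}")
```

In this toy world the raw correlation is strong, and it all but vanishes once temperature is accounted for. The catch is that the adjustment only works because we already knew to measure temperature; the data alone never told us which third factor to look for.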

For further discussion of these questions, see Andreas and Guenther (2026). And if you enjoy these kinds of philosophical puzzles, you will hopefully enjoy the next chapter.

2.5 References

Ackoff, Russell. 1989. “From Data to Wisdom.” Journal of Applied Systems Analysis. https://faculty.ung.edu/kmelton/Documents/DataWisdom.pdf.
Andreas, Holger, and Mario Guenther. 2026. “Regularity and Inferential Theories of Causation.” In The Stanford Encyclopedia of Philosophy, Spring 2026, edited by Edward N. Zalta and Uri Nodelman. Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/spr2026/entries/causation-regularity/.
Baltag, Alexandru, and Bryce Renne. 2024. “Supplement to Dynamic Epistemic Logic.” In The Stanford Encyclopedia of Philosophy, Winter 2024, edited by Edward N. Zalta and Uri Nodelman. Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/entries/dynamic-epistemic/appendix-B-solutions.html.
Bernstein, Jay H. 2011. “The Data-Information-Knowledge-Wisdom Hierarchy and Its Antithesis.” NASKO 2 (1): 68–75. https://doi.org/10.7152/nasko.v2i1.12806.
D’Ignazio, Catherine, and Lauren F. Klein. 2020. Data Feminism. MIT Press. https://doi.org/10.7551/mitpress/11805.001.0001.
Eames, Charles, and Ray Eames. 1968. A Rough Sketch for a Proposed Film Dealing with the Powers of Ten and the Relative Size of Things in the Universe. Directed by Charles Eames and Ray Eames. https://en.wikipedia.org/wiki/Powers_of_Ten_(film_series).
Eames, Charles, and Ray Eames. 1977. Powers of Ten: A Film Dealing with the Relative Size of Things in the Universe and the Effect of Adding Another Zero. Directed by Charles Eames and Ray Eames. https://en.wikipedia.org/wiki/Powers_of_Ten_(film_series).
Efron, Bradley. 1986. “Why Isn’t Everyone a Bayesian?” The American Statistician 40 (1): 1–5. https://doi.org/10.1080/00031305.1986.10475342.
Eliot, T. S. 1934. The Rock: A Pageant Play. Faber and Faber. https://archive.org/details/in.ernet.dli.2015.501089/page/n9/mode/2up.
Fagin, Ronald, Joseph Y. Halpern, Yoram Moses, and Moshe Y. Vardi. 1995. Reasoning about Knowledge. MIT Press.
Frické, Martin. 2009. “The Knowledge Pyramid: A Critique of the DIKW Hierarchy.” Journal of Information Science 35 (2): 131–42. https://doi.org/10.1177/0165551508094050.
Hajek, Alan, and Stephan Hartmann. 2024. “Bayesian Epistemology.” In The Stanford Encyclopedia of Philosophy, Summer 2024, edited by Edward N. Zalta and Uri Nodelman. Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/sum2024/entries/epistemology-bayesian/.
Hitchcock, Alfred. 2019. “Interview with Alfred Hitchcock (1973).” In Hitchcock on Hitchcock, Volume 2, edited by Sidney Gottlieb. University of California Press. https://doi.org/10.1525/9780520960398-035.
Hitchcock, Christopher. 2024. “Causal Models.” In The Stanford Encyclopedia of Philosophy, Summer 2024, edited by Edward N. Zalta and Uri Nodelman. Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/sum2024/entries/causal-models/.
Ipeirotis, Panos. 2008. “Are You a Bayesian or a Frequentist? (Or Bayesian Statistics 101).” January. https://www.behind-the-enemy-lines.com/2008/01/are-you-bayesian-or-frequentist-or.html.
Le Guin, Ursula K. 1979. “Science Fiction and Mrs. Brown.” In The Language of the Night: Essays on Writing, Science Fiction, and Fantasy, edited by Susan Wood. Putnam. https://openlibrary.org/books/OL4100441M/The_language_of_the_night.
Lucas, George, and Jonathan Hales. 2002. Star Wars: Episode II – Attack of the Clones. Directed by George Lucas. https://assets.scriptslug.com/live/pdf/scripts/star-wars-episode-ii-attack-of-the-clones-2002.pdf?v=1729114998#page=35.
McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. 2nd ed. Chapman and Hall/CRC. https://doi.org/10.1201/9780429029608.
Payne, Emily. 2019. “Do Perfect Circles Exist? Maybe.” March 14. https://www.cmu.edu/mcs/news-events/2019/0314_pi-day-perfect-circles.html.
Romeijn, Jan-Willem. 2025. “Philosophy of Statistics.” In The Stanford Encyclopedia of Philosophy, Winter 2025, edited by Edward N. Zalta and Uri Nodelman. Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/win2025/entries/statistics/.
Rowley, J. E. 2007. “The Wisdom Hierarchy: Representations of the DIKW Hierarchy.” Journal of Information Science 33 (2): 163–80. https://doi.org/10.1177/0165551506070706.
Vance, David. 1997. “Information, Knowledge and Wisdom: The Epistemic Hierarchy and Computer-Based Information Systems.” AMCIS 1997 Proceedings, August. https://aisel.aisnet.org/amcis1997/165.
Zeleny, Milan. 1987. “Management Support Systems: Towards Integrated Knowledge Management.” Human Systems Management 7 (1): 59–70. https://doi.org/10.3233/HSM-1987-7108.