I occasionally have discussions on blogs where basic statistics concerning “averages” is useful. Rather than referring to another site that explains these statistics, I thought I’d try to summarize them here.
Imagine a town, Sudsville, located on a beautiful lake. The town grew around a thriving, prosperous family-owned industry that has made soap for 3 generations. The soap facilities are in a low-polluting industrial park with a railroad running through both the town and the park.
The town has 5 major income brackets and, oddly enough, they tend to group into different neighborhoods. Each of the colored squares here lists the average income of one neighborhood. The size of each rectangle is proportional to the number of people living in that neighborhood. As percentages of the town’s population, the neighborhoods are:
- $12,000/yr: 15%
- $40,000/yr: 30%
- $80,000/yr: 25%
- $210,000/yr: 20%
- $1,000,000/yr: 10%
Given the above, the following question is a classic problem for both elementary statistics and policy courses: If someone were to ask you for the average income in Sudsville, what number would you give? If you are not familiar with this problem, please take a moment to jot down your answer before clicking “more”.
“Average” is actually a very slippery concept. It is the lay version of “measure of central tendency”. And like many slippery concepts, its slipperiness becomes more apparent the larger your data set gets. In elementary statistics, there are three basic measures of “average”: mean, median and mode.
To the right I again represent the neighborhood groups to help you better visualize why all three of these averages come out blatantly different.
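If you are not familiar with the three different measures of “average”, I will link to their wiki articles. Here are the calculations for Sudsville:
- Mean: (0.15 × $12,000) + (0.30 × $40,000) + (0.25 × $80,000) + (0.20 × $210,000) + (0.10 × $1,000,000) = $175,800/yr
- Median: $80,000/yr (the bottom two brackets cover only 45% of the town, so the middle person falls in the third bracket)
- Mode: $40,000/yr (the largest bracket, with 30% of the town)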
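For readers who want to check the arithmetic, here is a minimal Python sketch of the three calculations. It assumes, as a simplification, that everyone in a neighborhood earns exactly that neighborhood’s average income. The function names are my own and do not come from any library.

```python
# Neighborhood average income (dollars/yr) and share of the town's
# population (percent), copied from the list above.
# Simplifying assumption: everyone in a neighborhood earns its average.
brackets = [
    (12_000, 15),
    (40_000, 30),
    (80_000, 25),
    (210_000, 20),
    (1_000_000, 10),
]


def weighted_mean(data):
    """Sum of income * share, divided by the total share."""
    total_share = sum(share for _, share in data)
    return sum(income * share for income, share in data) / total_share


def weighted_median(data):
    """Income of the first bracket at which the running share reaches half.

    If the running share hit exactly half at a bracket boundary, this
    would return the lower bracket; that does not happen for Sudsville.
    """
    half = sum(share for _, share in data) / 2
    running = 0
    for income, share in sorted(data):
        running += share
        if running >= half:
            return income


def weighted_mode(data):
    """Income of the bracket with the largest share."""
    return max(data, key=lambda pair: pair[1])[0]


print("mean:  ", weighted_mean(brackets))    # 175800.0
print("median:", weighted_median(brackets))  # 80000
print("mode:  ", weighted_mode(brackets))    # 40000
```

The shares are kept as whole percentages rather than fractions like 0.15 so that the sums stay exact.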
So, it should be obvious that each of the “averages” distorts the image of the town in some way. Interestingly, the mean (which is the most common notion of “average”) is often used to describe the “normal” for any group. But in our example, we can see that the mean for Sudsville doesn’t even clearly fall in any existing neighborhood. So if you thought about the normal citizen of Sudsville as someone who makes around $175,000 per year, no one would match that description. This is called the ecological fallacy.
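A quick way to see how poorly the mean describes anyone in town is to compare it against the brackets directly. This is only a sketch under the same simplifying assumption that each resident earns the neighborhood average; the variable names are mine.

```python
# Same Sudsville data: (average income in dollars/yr, percent of population).
brackets = [
    (12_000, 15),
    (40_000, 30),
    (80_000, 25),
    (210_000, 20),
    (1_000_000, 10),
]

total_share = sum(share for _, share in brackets)
mean = sum(income * share for income, share in brackets) / total_share

# Percent of the town living in neighborhoods whose average is below the mean.
share_below = sum(share for income, share in brackets if income < mean)

# The bracket whose income is closest to the mean, and how far off it is.
nearest = min(brackets, key=lambda pair: abs(pair[0] - mean))[0]
gap = abs(nearest - mean)

print(f"mean income: ${mean:,.0f}")                   # $175,800
print(f"share below the mean: {share_below}%")        # 70%
print(f"nearest bracket: ${nearest:,} (off by ${gap:,.0f})")  # $210,000, off by $34,200
```

So under these assumptions, 70% of the town sits below the mean, and even the closest neighborhood misses it by more than $34,000.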
As my sketches below illustrate, there are two main types of fallacies that mistakenly treat groups as homogeneous. The first is the ecological fallacy we just discussed: by saying “the average person in Sudsville makes $175,000/year”, you blur over important differences. You invite inferences about the nature of specific individuals based solely upon aggregate statistics collected for the group.
The second homogeneity mistake is the faulty-generalization fallacy, in which one of the many subgroups of a population is assumed to stand for the whole group. This subgroup is often the speaker’s own group and/or a powerful, outspoken, large or otherwise influential one. In the diagram, I made the $80,000 group much larger to illustrate this fallacy. While the ecological fallacy blurs the details with some favorite, arbitrary measure of central tendency, the faulty-generalization fallacy picks an existing favorite group and conveniently erases the others.