I occasionally have discussions on blogs where basic statistics concerning “averages” are useful. Rather than referring to another site that explains these statistics, I thought I’d try to summarize them here.

Imagine a town, Sudsville, located on a beautiful lake. The town grew around a thriving, prosperous family-owned industry that has made soap for 3 generations. The soap facilities are in a low-polluting industrial park with a railroad running through both the town and the park.

The town has 5 major income brackets and, oddly enough, they tend to group into different neighborhoods. Each of the colored rectangles here lists the average income of one neighborhood. The size of each rectangle is proportional to the number of people living in the neighborhood. As percentages of the town’s population, these are:

- $12,000/yr: 15%
- $40,000/yr: 30%
- $80,000/yr: 25%
- $210,000/yr: 20%
- $1,000,000/yr: 10%

Given the above, the following question is a classic problem for both elementary statistics and policy courses: **If someone were to ask you for the average income in Sudsville, what number would you give?** If you are not familiar with this problem, please take a moment to jot down your answer before reading on.

“Average” is actually a very slippery concept. It is the lay version of “measure of central tendency”. And like many slippery concepts, its slipperiness becomes more apparent the larger your data set gets. In elementary statistics, there are three basic measures of “average”: mean, median and mode.

To the right I again represent the neighborhood groups to help you better visualize why all three of these averages come out blatantly different.

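If you are not familiar with the three different measures for “average”, I will link to their wiki articles. Here are the calculations for Sudsville, treating everyone in a neighborhood as earning that neighborhood’s average:

- Mean: (0.15 × $12,000) + (0.30 × $40,000) + (0.25 × $80,000) + (0.20 × $210,000) + (0.10 × $1,000,000) = $175,800/yr
- Median: $80,000/yr (the bottom two brackets cover only 45% of the town, so the middle person falls in the third bracket)
- Mode: $40,000/yr (the largest single group, 30% of the town)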
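The three measures can be reproduced from the neighborhood list with a short Python sketch. The function names here are my own, not from any library, and the sketch assumes that every resident earns exactly their neighborhood’s average income, which is a simplification.

```python
# (income in $/yr, share of the town's population in percent),
# copied from the neighborhood list.
brackets = [
    (12_000, 15),
    (40_000, 30),
    (80_000, 25),
    (210_000, 20),
    (1_000_000, 10),
]


def weighted_mean(groups):
    """Sum of income * share, divided by the total share."""
    total_share = sum(share for _, share in groups)
    return sum(income * share for income, share in groups) / total_share


def weighted_median(groups):
    """Income of the bracket where the cumulative share first reaches 50%.

    If a bracket boundary landed exactly on 50%, this returns the lower
    of the two candidate values; that case does not arise for Sudsville.
    """
    total_share = sum(share for _, share in groups)
    cumulative = 0
    for income, share in sorted(groups):
        cumulative += share
        if 2 * cumulative >= total_share:
            return income


def weighted_mode(groups):
    """Income of the bracket with the largest share of the population."""
    return max(groups, key=lambda group: group[1])[0]


print(f"mean:   ${weighted_mean(brackets):,.0f}")    # mean:   $175,800
print(f"median: ${weighted_median(brackets):,}")     # median: $80,000
print(f"mode:   ${weighted_mode(brackets):,}")       # mode:   $40,000
```

Note that the mean lands between the $80,000 and $210,000 neighborhoods, matching neither.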

So, it should be obvious that each of the “averages” distorts the image of the town in some way. Interestingly, the mean (which is the most common notion of “average”) is often used to describe the “normal” for any group. But in our example, we can see that the mean for Sudsville doesn’t fall in any existing neighborhood. So if you thought about the normal citizen of Sudsville as someone who makes around $175,000 per year, no one would match that description. This is called the ecological fallacy.

As my sketches below illustrate, there are two main types of fallacies which mistakenly see groups as homogeneous. The first is the ecological fallacy we just discussed: by saying “the average person in Sudsville makes $175,000/year”, you blur over important differences. You are inviting inferences about the nature of specific individuals based solely upon aggregate statistics collected for the group.

The second homogeneity mistake is the faulty-generalization fallacy, in which one of the many subgroups of a population is assumed to generalize to the whole group. This subgroup is often the speaker’s own and/or a powerful, outspoken, large or otherwise influential group. In the diagram, I made the $80,000 group much larger to illustrate this fallacy. While the ecological fallacy blurs the details with some favorite, arbitrary measure of central tendency, the faulty-generalization fallacy chooses an existing favorite group to conveniently erase the others.

Your examples also show a third mistake that people often make. Your examples show that the distribution can make all the difference. When given a mean or a median, people often make assumptions about the distribution. If the distribution is relatively Gaussian, like height or weight of individuals, the mean can be quite informative. You may not be able to predict specific individuals, but you’ll be able to do a lot with just the mean.

However, if the distribution is a long tail or a fat tail, and the person assumes that it’s Gaussian, then the mean will be worse than useless.

Not many things that are important follow a Gaussian distribution, and income is not one of them. People often assume Gaussian by default, to their own detriment.

@JS Allen: Indeed: The Gaussian Assumption

Building on JS Allen’s comment, consideration of the mean, mode, and median relative to one another can go a step further and suggest the type of distribution curve which exists. Having a median which is less than half of the mean, and a mode which is half of the median, suggests a distribution which is heavily weighted towards lower incomes. In other words, a person familiar with statistics would immediately realize that the mean was not a good representation of the majority when presented with the associated mode and median in this example.

@TWF, Indeed, all elementary statistics stuff. So, if you go back to my post on EVIL, you will see that you declared a “norm for society” as a standard. I was accusing you of making a homogenization error: either the ecological fallacy or the faulty-generalization fallacy. If you wish, we could discuss the issue on that post. (I’d prefer not to discuss it on this thread.)

Very interesting … and a bit disconcerting. I’ll keep a closer watch on my tendency to “average” things out.

Thanx, Louise, glad you enjoyed. Watch those averages!