is called a census. If you then summarize all of the census information into one number, that number is a parameter, not a statistic. Most of the time, researchers are trying to estimate the parameters using statistics. In the case of the U.S. Census Bureau, that agency wants to report the total number of people in the United States, so it conducts a census. However, due to logistical problems in doing such an arduous task (such as being able to contact homeless folks), the census numbers can only be called estimates in the end, and they're adjusted upward to account for the people the census missed. The long form for the census is filled out by a random sample of households; the U.S. Census Bureau uses this information to draw conclusions about the entire population (without asking every person to fill out the long form).
Mean (average)
The mean, also referred to by statisticians as the average, is the most common statistic used to measure the center, or middle, of a numerical data set. The mean is the sum of all the numbers divided by the total number of numbers. See Chapter 5 for more on the mean.
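As a quick illustration of that definition, here is a minimal Python sketch. The function name `mean` and the sample numbers are made up for this example and do not come from the text.

```python
def mean(values):
    """Return the sum of the values divided by how many values there are."""
    if len(values) == 0:
        # The mean of an empty data set is undefined.
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Hypothetical data set: (2 + 4 + 6 + 8) / 4
print(mean([2, 4, 6, 8]))  # prints 5.0
```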
HEADS UP
The mean may not be a fair representation of the data, because the average is easily influenced by outliers (very large or very small values in the data set that are not typical).
Median
The median is another way to measure the center of a numerical data set (besides the good old standby, the average). A statistical median is much like the median of an interstate highway. On a highway, the median is the middle of the road, and an equal number of lanes lie on either side of the median. In a numerical data set, the median is the point at which there are an equal number of data points whose values lie above and below the median value. Thus, the median is truly the middle of the data set. See Chapter 5 for more on the median.
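One way to find that middle point is to sort the values and pick the one in the center. The sketch below assumes a common convention that this section does not spell out: when the data set has an even number of values, the median is taken as the average of the two middle values. The function name `median` and the sample data are illustrative only.

```python
def median(values):
    """Return the middle value of a data set after sorting it."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one value sits in the middle.
        return ordered[mid]
    # Even count (assumed convention): average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))     # prints 3
print(median([4, 1, 3, 2]))  # prints 2.5
```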
REMEMBER
The next time you hear an average reported, look to see whether the median is also reported. If not, ask for it! The average and the median are two different representations of the middle of a data set and can often give two very different stories about the data.
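To see how far apart the two can be, consider a small made-up data set with one very large value. The salaries below are hypothetical, and the sketch uses the `mean` and `median` functions from Python's standard `statistics` module.

```python
import statistics

# Hypothetical salaries: five typical values and one outlier.
salaries = [30_000, 32_000, 35_000, 38_000, 40_000, 1_000_000]

average = statistics.mean(salaries)   # pulled upward by the outlier
middle = statistics.median(salaries)  # stays near the typical values

print(f"mean:   {average:,.0f}")
print(f"median: {middle:,.0f}")
```

Here the single outlier drags the mean far above what most of the people in the data set earn, while the median barely notices it.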
Standard deviation
Have you heard anyone report that a certain result was found to be "2 standard deviations above the mean"? More and more, people want to report how significant their results are, and the number of standard deviations above or below average is one way to do it. But exactly what is a standard deviation?
The standard deviation is a measure statisticians use to describe the amount of variability (or spread) among the numbers in a data set. As the term implies, a standard deviation is a standard (or typical) amount of deviation (or distance) from the average (or mean, as statisticians like to call it). So, the standard deviation, in very rough terms, is the average distance from the mean. See Chapter 5 for calculations and more information.
The standard deviation is also used to describe where most of the data should fall, in a relative sense, compared to the average. For example, when the data have a roughly bell-shaped distribution, about 95% of the data will lie within two standard deviations of the mean. (This result is called the empirical rule. See Chapter 8 for more on this.)
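The 95% figure can be checked on simulated data. The sketch below is only an illustration: it assumes the data are bell-shaped (drawn from a normal distribution with an arbitrary mean of 100 and standard deviation of 15), and the helper name `fraction_within` is made up for this example.

```python
import random
import statistics


def fraction_within(values, k):
    """Return the fraction of values within k standard deviations of the mean."""
    center = statistics.mean(values)
    spread = statistics.stdev(values)  # sample standard deviation
    inside = [v for v in values if abs(v - center) <= k * spread]
    return len(inside) / len(values)


# Simulated bell-shaped data; the seed makes the run repeatable.
rng = random.Random(0)
bell_shaped = [rng.gauss(100, 15) for _ in range(10_000)]

# For bell-shaped data this should come out close to 0.95.
print(fraction_within(bell_shaped, 2))
```

For data that are strongly skewed or contain extreme outliers, the fraction can be noticeably different from 95%.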
TECHNICAL STUFF
The formula for standard deviation (s) is as follows:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
where
n = the number of values in the data set
$\bar{x}$ = the average of all the values
x = each value in the data set
For detailed instructions on calculating the standard deviation, see Chapter 5.
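As a rough sketch of the calculation, the Python below assumes the usual sample formula: subtract the mean from each value, square the differences, add them up, divide by n - 1, and take the square root. The function name `sample_std` and the data are made up for illustration. The last line compares the result with `statistics.stdev` from Python's standard library, which uses the same n - 1 divisor.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation, dividing by n - 1 (assumed formula)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / n                             # the mean
    squared_devs = [(x - x_bar) ** 2 for x in values]   # squared distances from the mean
    return math.sqrt(sum(squared_devs) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set with mean 5
print(sample_std(data))
print(statistics.stdev(data))    # library result for comparison
```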
HEADS UP
The standard deviation is an important statistic, but it is often absent when statistical results are reported. Without it, you're getting only part of the story about the data. Statisticians like to tell the story about the man who had one foot in a bucket of ice water and the other foot in a bucket of boiling water. He said that, on average, he felt just great! But think about the variability in the two temperatures for each of his feet. Closer to home, the average house price, for example,