Variance and standard deviation: Use and misuse (2024)

(Use for skewed data, corrections for bias, repeatability, within-subject standard deviation)

Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...

Use and Misuse

The variance provides a measure of the spread or dispersion of a population. The population variance is computed as the average of the squared deviations of the observations from their mean, hence its alternative name 'mean square error'. If you take a sample and compute its variance in the same way, dividing by n, you will on average underestimate the true value of the population variance, because the deviations are measured from the sample mean rather than from the true population mean. You correct for this bias by dividing by n - 1 (where n is the number of observations), rather than by n. The standard deviation of a population is simply the square root of the population variance. Similarly, the standard deviation of a sample is the square root of the sample variance.
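
The two divisors can be compared numerically. The following is a minimal sketch using Python's standard statistics module; the five data values are invented purely for illustration.

```python
import statistics

# Hypothetical sample of five observations (illustrative values only).
sample = [4.0, 7.0, 6.0, 3.0, 5.0]
n = len(sample)
mean = sum(sample) / n

# Sum of squared deviations from the sample mean.
ss = sum((x - mean) ** 2 for x in sample)

var_n = ss / n                # divisor n: biased low when applied to a sample
var_n_minus_1 = ss / (n - 1)  # divisor n - 1: the usual sample variance
sd_sample = var_n_minus_1 ** 0.5  # sample standard deviation

# The standard library offers both conventions under separate names.
print("divisor n:    ", var_n, statistics.pvariance(sample))
print("divisor n - 1:", var_n_minus_1, statistics.variance(sample))
print("sample SD:    ", sd_sample)
```

With so few observations the choice of divisor matters (2.0 against 2.5 here); as n grows the two values converge.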

In our review of the literature we found that the standard deviation suffers from the same problem as other measures of dispersion - namely that it is usually just quoted or used to obtain a confidence interval - and then forgotten about. This is because of the tendency to focus excessively on the effect of a treatment on the mean value of the response variable, and to pay no attention to its effect on dispersion or on other characteristics of the distribution. The standard deviation is also overused at the expense of other measures of dispersion. For example, its use with the arithmetic mean (as mean ± SD) is misleading for data with a skewed distribution. This is because deviations are no longer distributed symmetrically around the mean. Similarly, neither the mean nor the standard deviation is an appropriate measure for ordinal variables. For skewed or ordinal data a box and whisker plot of the five quantile summary (minimum, lower quartile, median, upper quartile, and maximum) is much more informative. Often researchers admit their distribution is not remotely normal, and consequently use a non-parametric statistical test to compare medians - yet still (misleadingly) present their results as means with standard deviations.
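
The contrast between mean ± SD and the five quantile summary can be seen with a small right-skewed data set. This is only a sketch: the counts are invented, and the quartiles depend on the interpolation method chosen (here the "inclusive" method of Python's statistics.quantiles, an arbitrary choice).

```python
import statistics

# Hypothetical right-skewed data (for example parasite counts); illustrative only.
counts = [0, 1, 1, 2, 2, 3, 4, 5, 9, 33]

mean = statistics.mean(counts)
sd = statistics.stdev(counts)

# Five quantile summary: minimum, lower quartile, median, upper quartile, maximum.
q1, median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
five_number = (min(counts), q1, median, q3, max(counts))

# mean - SD is negative here, an impossible value for a count.
lower = mean - sd
print(f"mean ± SD: {mean:.1f} ± {sd:.1f} (mean - SD = {lower:.1f})")
print("five quantile summary:", five_number)
```

In this example the mean lies above the upper quartile and mean - SD falls below zero, whereas the five quantile summary shows directly that most values are small and one is very large.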

Much more seriously, the standard deviation (of the observations) is still regularly confused with the standard deviation of the mean (also known as the standard error). The latter is an estimate of the variability of the sample mean from one sample to the next. If a researcher quotes 'mean ± standard deviation' without further explanation, we do not know if the 'standard deviation' is of the observations or of the mean. In general, medical researchers nearly always use the standard deviation (of the observations) as their 'default' measure of dispersion. Researchers in other disciplines may use either, which can create confusion!
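
The size of the ambiguity is easy to demonstrate. The sketch below assumes independent observations, for which the standard error of the mean is the standard deviation divided by the square root of n; the nine measurements are invented for illustration.

```python
import math
import statistics

# Hypothetical measurements (illustrative values only).
obs = [12.1, 11.4, 13.0, 12.6, 11.9, 12.3, 12.8, 11.7, 12.2]
n = len(obs)

sd = statistics.stdev(obs)  # spread of the individual observations
se = sd / math.sqrt(n)      # estimated variability of the sample mean

# With n = 9 the standard error is one third of the standard deviation,
# so an unlabeled "±" value is ambiguous by a factor of three.
print(f"mean = {statistics.mean(obs):.2f}, SD = {sd:.2f}, SE = {se:.2f}")
```

The larger the sample, the wider the gap between the two, which is why a quoted '±' value needs to say which one it is.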

Another important measure of spread is the within-subject standard deviation. This describes the random component of measurement error and hence provides a useful measure of both reproducibility (if the same test material is sent to different laboratories) and repeatability (the same test material analyzed by the same person in the same laboratory). For instance, you might wish to assess the repeatability of a technician's PCV readings by getting her to measure the PCV of each of ten blood samples, each from a different cow, on five consecutive occasions. Because the ten samples are not identical, the results you obtain will include the variation between cows in addition to the measurement error. So simply pooling the results, and calculating the overall standard deviation, will overestimate the variation arising from measurement error. Instead, the set of individual standard deviations is combined into a single, overall measure of measurement error - the within-subject standard deviation.
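
One common way of combining the individual standard deviations, assuming every sample has the same number of repeat readings, is to average the per-sample variances and take the square root. The sketch below uses four samples with three readings each rather than the ten by five of the example above, and all readings are invented for illustration.

```python
import math
import statistics

# Hypothetical PCV readings (%): each row is one blood sample measured
# three times by the same technician. Values are illustrative only.
readings = [
    [30.0, 31.0, 29.0],
    [36.0, 35.0, 37.0],
    [25.0, 25.0, 28.0],
    [41.0, 40.0, 42.0],
]

# Variance of the repeat readings within each sample (divisor n - 1).
within_vars = [statistics.variance(row) for row in readings]

# With equal numbers of repeats per sample, the within-subject variance is
# the mean of the per-sample variances; its square root is the
# within-subject standard deviation.
sw = math.sqrt(statistics.mean(within_vars))

# Naive approach: pool every reading and take one overall SD. This mixes
# between-sample variation with measurement error.
pooled = [x for row in readings for x in row]
overall_sd = statistics.stdev(pooled)

print(f"within-subject SD = {sw:.2f}, overall SD = {overall_sd:.2f}")
```

Here the overall standard deviation is several times larger than the within-subject value, because it is dominated by the differences between samples rather than by measurement error.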

This measure tends to be used rather little, reflecting a lack of interest in measurement error, or sometimes an apparent refusal to admit it even exists. Where the within-subject standard deviation was estimated, we found that sometimes the standard deviation was not independent of the mean, which is a requirement for the validity of this measure. Under such circumstances, the data should have been transformed in an attempt to normalize distributions.
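
A simple check is to compare each subject's standard deviation with that subject's mean, before and after transformation. The sketch below assumes a log transformation, which is one common choice when the spread grows in proportion to the mean; the text above does not name a particular transformation, and the data are constructed artificially so that the effect is obvious.

```python
import math
import statistics

# Hypothetical repeat measurements for three subjects, constructed so that
# the spread grows with the mean. Values are illustrative only.
subjects = [
    [9.0, 10.0, 11.0],
    [45.0, 50.0, 55.0],
    [180.0, 200.0, 220.0],
]

def mean_and_sd(rows):
    """Return (mean, SD) for each subject's repeat measurements."""
    return [(statistics.mean(r), statistics.stdev(r)) for r in rows]

raw = mean_and_sd(subjects)
logged = mean_and_sd([[math.log(x) for x in r] for r in subjects])

# On the raw scale the SD rises with the mean (1, 5, 20); after a log
# transformation the three SDs are essentially equal.
for (m, s), (_, ls) in zip(raw, logged):
    print(f"mean = {m:6.1f}  SD = {s:5.2f}  SD of log values = {ls:.3f}")
```

Real data will rarely behave this cleanly, so a plot of subject standard deviation against subject mean on each scale is a sensible accompaniment to any such check.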

What the statisticians say

Woodward (1999) covers measures of location and dispersion in Chapter 2. He recommends the five quantile summary for general use, with the mean and standard deviation reserved for variables with a symmetrical distribution. and Bland (2000) introduces the variance and standard deviation in Chapters 2 and 4 respectively. cover the variance and standard deviation in Chapter 4. However, little attention is given to the best descriptive statistics to use for skewed distributions. Bart et al. (1998) introduce the standard deviation in Chapter 2.

give a useful review of the difference between the standard deviation (of the observations) and the standard error (of the mean). They point out that the standard deviation is a valid measure of variability regardless of distribution - even though one may choose a different summary statistic for a skewed distribution. Lehmann et al. (1996) point out that whilst the mean and standard deviation (or standard error) are appropriate if a variable has a normal distribution, populations with skewed distributions cannot be adequately represented in this way. Bland & Altman (1996) (1) (2) (3) provide a clear account of how to use the within-subject standard deviation as a measure of repeatability. Massé (1997) and other letter writers comment on the assumptions made when estimating the within-subject standard deviation. Benedetti-Cecchi (2003) stresses the importance of the variance around the mean effect size of ecological processes. Anderson et al. (2001) provide a number of helpful suggestions to wildlife biologists for presenting the results of data analyses, in particular the need to distinguish between standard deviation and standard error! Good (1973) attempts to explain the meaning of the term degrees of freedom, following up the much earlier paper by Walker (1940). Wikipedia provides sections on the standard deviation, the variance and degrees of freedom. Stephen Gorard argues the advantages of the average absolute deviation over the standard deviation. Gerard Dallal takes a practical approach to explaining degrees of freedom.

FAQs

What is the usefulness of variance and standard deviation?

In short, the mean is the average of the data values, the variance measures how far the values are dispersed around that mean (in squared units), and the standard deviation, its square root, expresses the same dispersion in the original units of the data.

When can standard deviation be misleading?

Even modest departures from normality, such as a skewed distribution or a few outliers, can make the standard deviation a misleading description of the data. Metrics built on the standard deviation, such as Cohen's d and the usual confidence intervals for a mean, are affected in the same way and are not robust.

When can you not use standard deviation?

The standard deviation is used in conjunction with the mean to summarise continuous data, not categorical data. In addition, the standard deviation, like the mean, is normally only appropriate when the continuous data are not markedly skewed and contain no outliers.

What are the uses of variance?

Statisticians use variance to see how individual numbers relate to each other within a data set, rather than using broader techniques such as arranging numbers into quartiles. Because the deviations are squared, the variance treats deviations above and below the mean alike, so they cannot cancel each other out.

What are the advantages and disadvantages of standard deviation?

In finance, for example, the advantage of the standard deviation is that it captures the variability of returns for a stock, which is useful for assessing overall volatility. Its disadvantage is that it does not show how a stock relates to the market, so it may miss market-specific risk.

What are the disadvantages of variance in statistics?

A disadvantage of the variance for practical applications is that, unlike the standard deviation, its units are the square of the units of the variable itself (for example cm² for lengths measured in cm). This is why the standard deviation is more commonly reported as a measure of dispersion once the calculation is finished.

What is the main disadvantage of standard deviation?

The main disadvantage of the standard deviation is that it is strongly influenced by outliers and extremely high or low values. As a measure of investment risk it also treats deviations above and below the mean alike, so uncertainty that works to an investor's advantage, such as above-average returns, is still counted as risk.

Why is standard deviation a bad measure of risk?

As a risk measurement metric, the standard deviation only shows how the past annual returns of an investment are spread out. It does not guarantee that future outcomes will follow the same pattern.

Is standard deviation used for accuracy?

The standard deviation quantifies how closely repeated measurements cluster around their mean, so it describes the precision of an experiment rather than its accuracy: the higher the standard deviation, the less precise the measurements. A small standard deviation does not by itself show that the results are accurate, because a consistent bias shifts every measurement without changing their spread.

What does variance tell you?

Variance is a measure of how data points differ from the mean. In layman's terms, it describes how far a set of numbers is spread out from their average value. Formally, it is the expected squared deviation of a value from the mean.

What is the 3 standard deviation rule?

The Empirical Rule states that, for data following a normal distribution, about 68% of observations fall within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three.
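
These percentages can be checked against the normal distribution itself. A minimal sketch using NormalDist from Python's standard statistics module:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, SD 1

# Proportion of a normal distribution lying within k SDs of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} SD: {p:.4f}")
```

The rule applies only to normally distributed data; for skewed distributions the proportions can differ considerably.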

What can standard deviation be used for?

Standard deviation describes how dispersed a set of data is. It is calculated from the deviation of each data point from the mean of all data points, and gives a single value that indicates whether the data points lie close together or are spread out.

What are the three main uses for variance analysis?

Variance analysis in the budgeting sense (comparing planned with actual figures, as distinct from the statistical variance) is:

Useful when developing a future budget.
Usable as a benchmark for performance and quality expectations.
Able to identify individual areas of success and areas for improvement.

Why is variance useful in statistics?

Statistical tests such as tests of equal variance and the analysis of variance (ANOVA) use the variances of samples to assess whether the populations they come from differ significantly from each other.

What do both the variance and standard deviation tell us about a distribution?

Both describe how widely the values in a distribution are spread around the mean. The variance is the average squared deviation from the mean. The standard deviation is its square root, which is in the same units as the data and so can be used to judge how unusual an individual value is, for instance when screening for outliers.

What is the usefulness of variance of sample mean?

Sample variance is used to quantify the variability in a given sample. A sample is a set of observations drawn from a population and intended to represent it. The sample variance is measured with respect to the mean of the sample, and serves as an estimate of the population variance.