Ellipsix Informatics - standard deviation

2013

Nov

Stable standard deviation

Posted by David Zaslavsky on November 3, 2013 11:40 PM

To kick off National Blog Writing Month, here’s something I’ve been saving up for my more mathematically inclined readers — and also for anyone who does scientific computing. It’s a quick computational trick to come out of all the stuff I’ve been doing lately that is not writing blog posts.

Sometimes you have to calculate the standard deviation of a set of numbers. Although there are various definitions of the standard deviation, the usual way to do this in physics is

$$\sigma = \sqrt{\frac{1}{N}\sum (x_k - \expect{x})^2} = \sqrt{\frac{1}{N}\sum x_k^2 - \biggl(\frac{1}{N}\sum x_k\biggr)^2} = \sqrt{\expect{x^2} - \expect{x}^2}$$

where $N$ is the number of data points and $x_k$ are the values. But this can be pretty inaccurate when the values themselves are much larger than the variance — you wind up taking a difference of two very large numbers, the two terms under the square root, and expecting to come out with a small result. That’s bad. Sometimes the difference even winds up being negative!

There’s a better method, described on John D. Cook’s website: instead of computing two large …