Definition of standard deviation

Hi, I know that the “ExportToSpreadsheet” module can generate the standard deviation of a property for each image. However, there are two ways the standard deviation can be defined:

one:
http://office.microsoft.com/en-us/excel-help/stdeva-HP005209279.aspx
two:
http://office.microsoft.com/en-us/excel-help/stdevp-HP005209281.aspx

I’m wondering which one CellProfiler uses as its definition?

Hi,
The short answer is that it looks like we are using the method that Excel calls “STDEVP” (i.e. the ‘divide by n’ method).

The long answer (and probably more than you bargained for! :smile: but I was curious myself) is that since our code is open source, you too can see this, first in ExportToSpreadsheet:
github.com/CellProfiler/CellPro … adsheet.py
and then in the module that actually calls std:
github.com/CellProfiler/CellPro … rements.py
Line 813 here is:

stdev = values.std() if values is not None else np.NaN
This uses NumPy’s std:
docs.scipy.org/doc/numpy/referen … y.std.html
and since we didn’t specify any “ddof”, it defaults to zero (meaning the divisor is simply “n”).
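If you want to check this for yourself, here is a minimal sketch comparing the two divisors in NumPy. The sample values are made up for illustration and have nothing to do with CellProfiler measurements; the only library behaviour relied on is the “ddof” argument of std.

```python
import numpy as np

# Made-up illustrative values (not CellProfiler output).
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Default ddof=0: divide by n (what Excel calls STDEVP).
population_sd = values.std()

# ddof=1: divide by n-1 (the STDEVA-style definition).
sample_sd = values.std(ddof=1)

# The same "divide by n" result computed by hand, to confirm the default.
sum_sq_dev = np.sum((values - values.mean()) ** 2)
manual_population_sd = np.sqrt(sum_sq_dev / len(values))

print(population_sd, sample_sd, manual_population_sd)
```

For these eight values the sum of squared deviations is 32, so the “divide by n” result is exactly 2.0 and the “divide by n-1” result is sqrt(32/7), a little larger.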

The other method (n-1, or “STDEVA”) is generally the less biased estimate when your values are a sample from a larger population; however, the two are not much different if you have a reasonably high number of samples (n). (If not, acquire more!) In addition, in my experience, distributions in biology are rarely Gaussian, so any difference one might see between STDEVA and STDEVP is overwhelmed by the effects of the non-normal distribution, and the standard deviation is then not a strictly appropriate measure of spread in the first place.
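To put a number on “not much different”: for the same n values, the n-1 result divided by the n result is always sqrt(n / (n - 1)), whatever the data are. A small sketch of that ratio (the function name sd_ratio is just something I made up for this example):

```python
import math


def sd_ratio(n):
    """Ratio of the 'divide by n-1' SD to the 'divide by n' SD for n values."""
    if n < 2:
        raise ValueError("need at least two values for the n-1 definition")
    return math.sqrt(n / (n - 1))


# How far apart the two definitions are for a few sample sizes.
for n in (5, 10, 100, 1000):
    print(n, round(sd_ratio(n), 4))
```

So the two definitions differ by roughly 12% at n = 5, about 5% at n = 10, about 0.5% at n = 100, and about 0.05% at n = 1000.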