Calculating Standard Deviation in Google Sheets vs Python
Hello, I'm Dang, an AI and machine learning engineer at Knowledgelabo, Inc. We provide a service called "Manageboard," which helps aggregate, analyze, and manage corporate data scattered throughout our organization. Manageboard is set to enhance its AI capabilities in the future. In my articles, I will share the challenges we encountered during our research and development.
Background
Standard deviation is an important statistical measure used to understand the variability of data, indicating how spread out the values are. There are two types of standard deviation: population standard deviation and sample standard deviation. Each has a different formula, and which one to use depends on the nature of the data. In this article, I will explain the differences in how standard deviation is calculated in Google Sheets and Python.
Basic Understanding of Standard Deviation
Standard deviation can be calculated in two ways: population standard deviation and sample standard deviation. The formulas differ, and the appropriate method depends on the data you are working with.
- Population Standard Deviation: This method is used when you are calculating the standard deviation based on an entire population. The formula is as follows:
\sigma=\sqrt{\frac{1}{N}\sum^N_{n=1} (x_n-\overline{x})^2}
Where is the total number of data points,N is each individual data point, andx_n is the mean of the data.\overline{x} - Sample Standard Deviation: This method is used when the data represents a sample from a larger population. It is an unbiased estimate of the population standard deviation. The formula for the sample standard deviation is:
s=\sqrt{\frac{1}{N-1}\sum^N_{n=1} (x_n-\overline{x})^2}
In this case, the denominator is to apply a correction for the sample size, which improves the estimate for the population standard deviation.N−1
Calculating Standard Deviation in Google Sheets
In Google Sheets (and Microsoft Excel), the STDEV
function is used to calculate the standard deviation. However, this function calculates the sample standard deviation by default. It applies the formula with
- Sample Standard Deviation: To calculate the sample standard deviation, simply use the
STDEV
function. For example, if you want to calculate the sample standard deviation for the data range from A1 to A10, use the following formula:
=STDEV(A1:A10)
- Population Standard Deviation:
If you need to calculate the population standard deviation, you must use theSTDEV.P
(orSTDEVP
) function. For example, to calculate the population standard deviation for the data range from A1 to A10, use the following formula:
=STDEV.P(A1:A10)
In Google Sheets, it’s important to note that the default behavior is to calculate the sample standard deviation. If you want the population standard deviation, you need to use a different function.
Calculating Standard Deviation in Python
In Python, it is common to use the NumPy library to calculate standard deviation. The std()
function in NumPy calculates the population standard deviation by default. To calculate the sample standard deviation, you need to specify a parameter.
- Population Standard Deviation: By default, the
numpy.std()
function calculates the population standard deviation. For example, if you have a list calleddata
, you can calculate the population standard deviation as follows:
import numpy as np
data = [10, 12, 14, 18, 20]
population_std = np.std(data)
print(population_std)
In this case, the function uses the formula with
- Sample Standard Deviation: To calculate the sample standard deviation, you need to use the
ddof
(Delta Degrees of Freedom) parameter to set the denominator to . You can calculate the sample standard deviation as follows:N−1
sample_std = np.std(data, ddof=1)
print(sample_std)
Summary
Default Standard Deviation | Google Sheets | Python (NumPy) |
---|---|---|
Default Standard Deviation | Sample Standard Deviation | Population Standard Deviation |
Calculating Population Standard Deviation |
STDEV.P or STDEVP
|
numpy.std(data) (no ddof specified) |
Calculating Sample Standard Deviation |
STDEV function (default) |
numpy.std(data, ddof=1) |
The method of calculating standard deviation differs depending on the tool you use. In Google Sheets, the default calculation is the sample standard deviation, so you need to use a different function (STDEV.P
) for the population standard deviation. On the other hand, Python's NumPy library calculates the population standard deviation by default, and you need to specify ddof=1
to calculate the sample standard deviation.
Understanding these differences and choosing the appropriate function or parameter for your specific needs is crucial to ensure accurate results.
Discussion