In this post, we'll look at a couple of statistics functions in Python. These statistics functions are part of the Python Standard Library in the statistics module. The four functions we'll use in this post are common in statistics:
- mean - average value
- median - middle value
- mode - most common value
- standard deviation - spread of values
To access Python's statistics functions, we need to import the functions from the statistics module using the statement:
from statistics import mean, median, mode, stdev
After the import statement, the functions mean(), median(), mode() and stdev() (standard deviation) can be used. Since the statistics module is part of the Python Standard Library, no external packages need to be installed.
Let's imagine we have a data set of 5 test scores. The test scores are 60, 83, 83, 91, and 100. These test scores can be stored in a Python list. Python lists are defined with square brackets [ ]. Elements in Python lists are separated with commas.
from statistics import mean, median, mode, stdev
test_scores = [60, 83, 83, 91, 100]
Calculate the mean
To calculate the mean, or average of our test scores, use the statistics module's mean() function.
mean(test_scores)
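As a quick check on what mean() is doing, the same value can be worked out by hand: add up the scores and divide by how many there are. The snippet below is only a sketch; the name manual_mean is introduced here for illustration.

```python
from statistics import mean

test_scores = [60, 83, 83, 91, 100]

# Sum of the scores divided by the number of scores
manual_mean = sum(test_scores) / len(test_scores)

print(manual_mean)        # 83.4
print(mean(test_scores))  # 83.4
```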
Calculate the median
To calculate the median, or middle value of our test scores, use the statistics module's median() function.
If there is an odd number of values, median() returns the middle value. If there is an even number of values, median() returns the average of the two middle values.
median(test_scores)
83
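To see the odd and even cases side by side, here is a small sketch. The list even_scores is a made-up variation of our data with one of the 83s removed, used only to show the averaging behavior.

```python
from statistics import median

odd_scores = [60, 83, 83, 91, 100]  # five values: the middle one is returned
even_scores = [60, 83, 91, 100]     # four values: the two middle ones are averaged

print(median(odd_scores))   # 83
print(median(even_scores))  # 87.0, the average of 83 and 91
```

Note that median() sorts the data internally, so the list does not need to be in order before it is passed in.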
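Calculate the mode
To calculate the mode, or most common value of our test scores, use the statistics module's mode() function.
If more than one value is tied for most common, the behavior of mode() depends on the Python version. In Python 3.7 and earlier, mode() raises a StatisticsError:
>>> mode([1, 1, 2, 2, 3])
StatisticsError: no unique mode; found 2 equally common values
In Python 3.7 and earlier, the same error is raised when no value occurs more often than the others (all the values are unique or occur the same number of times):
>>> mode([1,2,3])
StatisticsError: no unique mode; found 3 equally common values
In Python 3.8 and later, mode() no longer raises an error in these cases. Instead, it returns the first of the tied values it encounters in the data.
Our test scores contain a single most common value, so mode() gives the same answer in every Python version: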
mode(test_scores)
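On Python 3.8 and later, the statistics module also provides multimode(), which returns every tied value as a list rather than just one. The sketch below assumes Python 3.8 or newer; on older versions the multimode import fails and mode() raises StatisticsError for tied data.

```python
from statistics import mode, multimode

test_scores = [60, 83, 83, 91, 100]
tied_values = [1, 1, 2, 2, 3]  # 1 and 2 both occur twice

print(mode(test_scores))       # 83
print(mode(tied_values))       # 1, the first tied value (Python 3.8+)
print(multimode(tied_values))  # [1, 2], all tied values (Python 3.8+)
```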
Calculate the standard deviation
To calculate the standard deviation, or spread of the test scores, use the statistics module's stdev() function. A large standard deviation indicates the data is spread out; a small standard deviation indicates the data is clustered close together.
stdev(test_scores)
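It is worth knowing that stdev() computes the sample standard deviation, which divides by n - 1 rather than n. The sketch below reproduces that calculation by hand and compares it with pstdev(), the module's population standard deviation. The names avg, squared_diffs, and manual_stdev are introduced here only for illustration.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

test_scores = [60, 83, 83, 91, 100]

# Sample standard deviation by hand: squared distances from the mean,
# summed, divided by (n - 1), then square-rooted
avg = mean(test_scores)
squared_diffs = [(score - avg) ** 2 for score in test_scores]
manual_stdev = sqrt(sum(squared_diffs) / (len(test_scores) - 1))

print(round(manual_stdev, 2))         # 14.84
print(round(stdev(test_scores), 2))   # 14.84, sample (divides by n - 1)
print(round(pstdev(test_scores), 2))  # 13.28, population (divides by n)
```

Use stdev() when the scores are a sample drawn from a larger group, and pstdev() when the list contains every value in the group being described.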
Alternatively, we can import the whole statistics module at once (all the functions in the statistics module) using the line:
import statistics
Then to use the functions from the module, we need to call them with the module name as a prefix: statistics.mean(), statistics.median(), statistics.mode(), and statistics.stdev(). See below:
import statistics
test_scores = [60, 83, 83, 91, 100]
statistics.mean(test_scores)
statistics.median(test_scores)
statistics.mode(test_scores)
statistics.stdev(test_scores)
Summary
The statistics module is part of the Python Standard Library. To use statistics module functions, you first have to import the functions with the line from statistics import <function_name>, where <function_name> is the name of the function you want to use. Then you can call <function_name>() and pass in a list of values.
The following functions are part of Python's statistics module:
statistics module function | name | description | example | result |
---|---|---|---|---|
mean() | mean | mean or average | mean([1,4,5,5]) | 3.75 |
median() | median | middle value | median([1,4,5,5]) | 4.5 |
mode() | mode | most common value | mode([1,4,5,5]) | 5 |
stdev() | standard deviation | spread of data | stdev([1,4,5,5]) | 1.893 (rounded) |