17.5: Summary statistics for DataFrames


Summary statistics such as the mean, median, minimum, and maximum can of course all be computed on individual columns of a DataFrame, because each column is a Series:

Code 17.5.1 (Python):

print(simpsons['IQ'].median())

| 95.0

Code 17.5.2 (Python):

print(simpsons['salary'].sum())

| 52000.0
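To make the two snippets above runnable on their own, here is a minimal sketch. The DataFrame in it is a stand-in, not the original: the character names, the species and gender values, and the index are my assumptions, and the age, IQ, and salary numbers were chosen only so that they agree with the outputs shown in this section.

```python
import pandas as pd

# Hypothetical stand-in for the simpsons DataFrame (assumed values,
# picked to be consistent with the outputs printed in this section).
simpsons = pd.DataFrame(
    {
        "species": ["human", "human", "human", "human", "human", "dog"],
        "age": [36, 34, 10, 8, 1, 4],
        "gender": ["M", "F", "M", "F", "F", "M"],
        "IQ": [74, 120, 90, 200, 100, 30],
        "salary": [52000.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    },
    index=["Homer", "Marge", "Bart", "Lisa", "Maggie", "SLH"],
)

# Selecting one column gives a Series, which has its own statistics methods.
iq = simpsons["IQ"]
print(type(iq).__name__)         # Series
print(iq.median())               # 95.0
print(iq.min(), iq.max())        # 30 200
print(simpsons["salary"].sum())  # 52000.0
```

Any other Series method (.mean(), .std(), .quantile(), and so on) can be called on a column in the same way.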

You can also, believe it or not, compute the sum/mean/max/etc. on the entire DataFrame. Doing so computes the statistic for each column separately:

Code 17.5.3 (Python):

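print(simpsons.mean(numeric_only=True))

| age          15.500000
| IQ          102.333333
| salary     8666.666667
| dtype: float64

Passing numeric_only=True tells Pandas to leave out the non-numeric columns (species, gender, etc.) and compute the mean of each of the others, giving us a Series containing their values. (Older versions of Pandas skipped non-numeric columns automatically; since version 2.0, calling .mean() without this argument on a DataFrame that has text columns raises a TypeError.)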
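Here is a small sketch of the whole-DataFrame mean on a stand-in DataFrame. The rows are my assumptions (hypothetical names and values chosen to match the means shown above), and the code passes numeric_only=True explicitly because recent versions of Pandas (2.0 and later) raise a TypeError on text columns instead of skipping them.

```python
import pandas as pd

# Hypothetical stand-in for the simpsons DataFrame (assumed values,
# chosen so the column means match the ones printed in this section).
simpsons = pd.DataFrame(
    {
        "species": ["human", "human", "human", "human", "human", "dog"],
        "age": [36, 34, 10, 8, 1, 4],
        "gender": ["M", "F", "M", "F", "F", "M"],
        "IQ": [74, 120, 90, 200, 100, 30],
        "salary": [52000.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    },
    index=["Homer", "Marge", "Bart", "Lisa", "Maggie", "SLH"],
)

# Ask for the mean of the numeric columns only.
means = simpsons.mean(numeric_only=True)
print(means)

# Another way to get the same result: keep only the numeric columns first.
numeric_means = simpsons.select_dtypes(include="number").mean()

# The result is a Series indexed by column name, so one value can be looked up.
print(round(means["IQ"], 2))  # 102.33
```

Selecting the numeric columns first with select_dtypes is handy when you want to run several different statistics on the same numeric subset.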

Finally, I often find the .describe() method useful:

Code 17.5.4 (Python):

print(simpsons.describe())

|                age          IQ        salary
| count     6.000000    6.000000      6.000000
| mean     15.500000  102.333333   8666.666667
| std      15.436969   56.645094  21228.911104
| min       1.000000   30.000000      0.000000
| 25%       5.000000   78.000000      0.000000
| 50%       9.000000   95.000000      0.000000
| 75%      28.000000  115.000000      0.000000
| max      36.000000  200.000000  52000.000000

Neat! We get the number of values, the mean, the standard deviation, the minimum and maximum, and the quartiles for each of the numeric columns. Lots of dashboard information at a glance!
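The result of .describe() is itself a DataFrame, so individual numbers can be pulled out of it with .loc. The sketch below shows this on a stand-in DataFrame; the rows are my assumptions (hypothetical names and values chosen to agree with the summary above), not the original data.

```python
import pandas as pd

# Hypothetical stand-in for the simpsons DataFrame (assumed values,
# chosen to be consistent with the summary printed in this section).
simpsons = pd.DataFrame(
    {
        "species": ["human", "human", "human", "human", "human", "dog"],
        "age": [36, 34, 10, 8, 1, 4],
        "gender": ["M", "F", "M", "F", "F", "M"],
        "IQ": [74, 120, 90, 200, 100, 30],
        "salary": [52000.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    },
    index=["Homer", "Marge", "Bart", "Lisa", "Maggie", "SLH"],
)

# By default, describe() summarizes only the numeric columns.
summary = simpsons.describe()
print(list(summary.columns))      # ['age', 'IQ', 'salary']

# Rows are labeled by statistic, columns by the original column name.
print(summary.loc["50%", "IQ"])   # 95.0 (the median)
print(summary.loc[["min", "max"], "age"])

# describe() also works on a single column (a Series).
print(simpsons["salary"].describe())
```

Saving the summary in a variable like this is useful when a later step needs one of the numbers, such as a quartile to use as a cutoff.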