Engineering LibreTexts

17.5: Summary statistics for DataFrames


    Summary statistics such as the mean, median, minimum, and maximum can all be computed on individual columns of a DataFrame, because each column is a Series:

    Code \(\PageIndex{1}\) (Python):

    print(simpsons['IQ'].median())

    | 95.0

    Code \(\PageIndex{2}\) (Python):

    print(simpsons['salary'].sum())

    | 52000.0
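To try the same idea on something you can run yourself, here is a minimal sketch. It assumes a small stand-in DataFrame named `demo`, whose values are made up for illustration and are not the simpsons data, and it calls a few Series methods on single columns.

```python
import pandas as pd

# Hypothetical stand-in data; NOT the simpsons DataFrame from the text.
demo = pd.DataFrame({
    'species': ['human', 'human', 'dog', 'cat'],
    'age': [40, 8, 4, 2],
    'IQ': [80, 150, 30, 40],
    'salary': [30000.0, 0.0, 0.0, 0.0],
})

# Selecting one column gives a Series, so Series methods apply directly.
iq = demo['IQ']
print(type(iq).__name__)      # Series
print(iq.median())            # 60.0
print(demo['salary'].sum())   # 30000.0
print(demo['age'].max())      # 40
```

Any other Series method (`.min()`, `.mean()`, `.std()`, and so on) can be swapped in the same way.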

    You can also, believe it or not, compute the sum/mean/max/etc. on the entire DataFrame. This computes the statistic on each column individually:

    Code \(\PageIndex{3}\) (Python):

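    print(simpsons.mean(numeric_only=True))

    | age          15.500000
    | IQ          102.333333
    | salary     8666.666667
    | dtype: float64

    Passing numeric_only=True tells Pandas to leave out the non-numeric columns (species, gender, etc.) and compute the mean of each of the others, giving us a Series containing their values. (Older versions of Pandas dropped those columns silently; since version 2.0, calling .mean() on a DataFrame that has text columns raises a TypeError unless you pass this argument.)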
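Here is a minimal sketch of whole-frame aggregation. It uses a made-up stand-in frame named `demo` (hypothetical values, not the simpsons data) and passes `numeric_only=True`, which recent pandas versions need when text columns are present. It also shows that the result is an ordinary Series indexed by column name.

```python
import pandas as pd

# Hypothetical stand-in data; NOT the simpsons DataFrame from the text.
demo = pd.DataFrame({
    'species': ['human', 'human', 'dog', 'cat'],
    'age': [40, 8, 4, 2],
    'IQ': [80, 150, 30, 40],
})

# numeric_only=True asks pandas to skip the non-numeric 'species' column.
means = demo.mean(numeric_only=True)
print(means)

# The result is a Series keyed by column name, so one value can be looked up.
print(means['age'])   # 13.5
print(means['IQ'])    # 75.0

# Alternative: select the numeric columns first, then aggregate.
same = demo.select_dtypes(include='number').mean()
print(list(same.index))   # ['age', 'IQ']
```

Selecting the numeric columns up front with `select_dtypes` is handy when several aggregations will be run on the same subset.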

    Finally, I often find the .describe() method useful:

    Code \(\PageIndex{4}\) (Python):

    print(simpsons.describe())

    |              age          IQ        salary
    | count   6.000000    6.000000      6.000000
    | mean   15.500000  102.333333   8666.666667
    | std    15.436969   56.645094  21228.911104
    | min     1.000000   30.000000      0.000000
    | 25%     5.000000   78.000000      0.000000
    | 50%     9.000000   95.000000      0.000000
    | 75%    28.000000  115.000000      0.000000
    | max    36.000000  200.000000  52000.000000

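    Neat! We get the number of values, the mean, the standard deviation, the minimum and maximum, and the quartiles for each of the numeric columns. (The 50% row is the median, which is why the IQ entry there matches the 95.0 we computed earlier.) Lots of dashboard information at a glance!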
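Since .describe() returns a DataFrame of its own, individual statistics can be pulled out of it with .loc. The sketch below assumes a made-up stand-in frame named `demo` (hypothetical values, not the simpsons data) and also shows the `percentiles` argument for requesting cut points other than the quartiles.

```python
import pandas as pd

# Hypothetical stand-in data; NOT the simpsons DataFrame from the text.
demo = pd.DataFrame({
    'species': ['human', 'human', 'dog', 'cat'],
    'age': [40, 8, 4, 2],
    'IQ': [80, 150, 30, 40],
})

# By default, describe() summarizes only the numeric columns.
summary = demo.describe()
print(list(summary.columns))        # ['age', 'IQ']

# Rows are statistics and columns are the numeric columns.
print(summary.loc['50%', 'IQ'])     # 60.0 (the median)
print(summary.loc['max', 'age'])    # 40.0
print(summary.loc['mean'])          # a Series: the mean of every numeric column

# Ask for the 10th and 90th percentiles in addition to the usual rows.
tails = demo.describe(percentiles=[0.1, 0.9])
print(tails.loc[['10%', '90%']])
```

Pulling values out with .loc is useful when only one or two of the summary numbers are needed for a later computation.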


    This page titled 17.5: Summary statistics for DataFrames is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by Stephen Davies (allthemath.org) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.