
18.3: Looping with DataFrames


    Just as we wrote for loops to iterate through the elements of a NumPy array (section 14.3) and a Pandas Series (section 14.4), so we can iterate through the rows of a DataFrame. We’ll do so using the weirdly-named .itertuples() method.

    By using .itertuples() in the loop header, we have available to us in the loop body a named tuple representing the current row. (“Current row” just means “the row we’re on as we successively go through all the rows in sequence.”) We can access individual elements of it by using column names and the dot syntax, as follows. (The boxie syntax with a column name doesn’t work here, because a named tuple isn’t a Series; inside the boxies it only accepts integer positions.)

    Code 18.3.1 (Python):

    for row in simpsons.itertuples():
        print("A certain {}-year old has {} hair.".format(row.age, row.hair))

    | A certain 36-year old has none hair.

    | A certain 34-year old has stacked tall hair.

    | A certain 10-year old has buzz hair.

    | A certain 8-year old has curly hair.

    | A certain 1-year old has curly hair.

    | A certain 4-year old has shaggy hair.

    I called the loop variable “row” because it represents a row of the DataFrame (duh), although you can call it anything you want to. (I could have named it “family_member” or “Simpson” instead.)
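To try the loop above on its own, here is a minimal, self-contained sketch. The simpsons DataFrame built here is only a stand-in reconstructed from the output shown above (just the age and hair columns, with an index named "name"); the real DataFrame from earlier sections may have more columns. It also prints what kind of object each row is.

```python
import pandas as pd

# Stand-in for the simpsons DataFrame (an assumption, rebuilt from the
# printed output above; the original may contain additional columns).
simpsons = pd.DataFrame(
    {
        "age": [36, 34, 10, 8, 1, 4],
        "hair": ["none", "stacked tall", "buzz", "curly", "curly", "shaggy"],
    },
    index=pd.Index(["Homer", "Marge", "Bart", "Lisa", "Maggie", "SLH"], name="name"),
)

# Collect one sentence per row, using dot syntax on the loop variable.
lines = []
for row in simpsons.itertuples():
    lines.append("A certain {}-year old has {} hair.".format(row.age, row.hair))
print("\n".join(lines))

# Peek at the first row: it is a named tuple whose fields are "Index"
# followed by the column names.
first_row = next(simpsons.itertuples())
print(isinstance(first_row, tuple))
print(first_row._fields)
```

The last two lines are just a sanity check: the loop variable behaves like an ordinary tuple that also lets you reach its elements by name.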

    This is very intuitive. Slightly less intuitive is that if we want the index column (in simpsons, it’s called “name,” remember) we can’t use the name of the index column. Instead, we have to literally say “.Index,” and yes that’s a capital ‘I’.

    To illustrate:

    Code 18.3.2 (Python):

    for family_member in simpsons.itertuples():
        print("{} Simpson, {} years of age, has {} hair.".format(
            family_member.Index, family_member.age, family_member.hair))

    | Homer Simpson, 36 years of age, has none hair.

    | Marge Simpson, 34 years of age, has stacked tall hair.

    | Bart Simpson, 10 years of age, has buzz hair.

    | Lisa Simpson, 8 years of age, has curly hair.

    | Maggie Simpson, 1 years of age, has curly hair.

    | SLH Simpson, 4 years of age, has shaggy hair.
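The “.Index” rule can be checked directly with a short sketch. As before, the simpsons DataFrame here is a hypothetical stand-in reconstructed from the output above, with its index named "name"; only the age and hair columns are assumed.

```python
import pandas as pd

# Stand-in for the simpsons DataFrame (an assumption, not the original data file).
simpsons = pd.DataFrame(
    {
        "age": [36, 34, 10, 8, 1, 4],
        "hair": ["none", "stacked tall", "buzz", "curly", "curly", "shaggy"],
    },
    index=pd.Index(["Homer", "Marge", "Bart", "Lisa", "Maggie", "SLH"], name="name"),
)

# Build one sentence per family member, using .Index for the row label.
messages = []
for family_member in simpsons.itertuples():
    messages.append("{} Simpson, {} years of age, has {} hair.".format(
        family_member.Index, family_member.age, family_member.hair))
print("\n".join(messages))

# The index is reachable as .Index (or position 0), not by its own name.
first = next(simpsons.itertuples())
print(first.Index)
print(first[0])
print(hasattr(first, "name"))
```

The final hasattr check shows why “.name” wouldn’t work in the loop body: with this stand-in data, the row object simply has no attribute by that name.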


    This page titled 18.3: Looping with DataFrames is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by Stephen Davies (allthemath.org) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.