Skip to main content
Engineering LibreTexts

18.1: Queries

  • Page ID
    39320
  • \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)

    Back in section 13.1 (p. 124), we learned how to write simple queries to selectively match only certain elements of a Series. The same technique is available to us with DataFrames, only it’s more powerful since there are more columns to work with at a time.

    Let’s return to the Simpsons example from p. 174, which is reproduced here:

    |                        species   age     gender       fave           IQ         hair        salary

    |    name

    |   Homer       human    36         M         beer           74.0       none        52000.0

    |   Marge       human    34         F    helping others     120.0   stacked tall        0.0

    |   Bart        human    10         M      skateboard        90.0       buzz            0.0

    |   Lisa        human     8         F      saxophone        200.0      curly            0.0

    | Maggie       human     1         F        pacifier       100.0      curly            0.0

    |   SLH         dog       4         M          NaN          100.0      shaggy           0.0

    We can filter this on certain rows by including a query in boxies:

    Code \(\PageIndex{1}\) (Python):

    adults = simpsons[simpsons.age > 18]

     

    |                        species   age     gender       fave           IQ         hair        salary

    |    name

    |   Homer       human    36         M         beer           74.0       none        52000.0

    |   Marge       human    34         F    helping others     120.0   stacked tall        0.0

     

    As with Serieses, we can’t forget to repeat the name of the variable (“simpsons”) before giving the query criteria (“> 18”). Unlike with Serieses, we also specify a column name (“.age”) that we want to query.

    We can also provide compound conditions in just the same way as before (section 13.1, p. 128). If we want only human children, we say:

    Code \(\PageIndex{2}\) (Python):

    kids = simpsons[(simpsons.age <= 18) & (simpsons.species == "human")]

     

    |                        species   age     gender       fave           IQ         hair         salary

    |    name

    |   Bart        human    10         M      skateboard        90.0       buzz            0.0

    |   Lisa        human     8         F       saxophone       200.0      curly            0.0

    | Maggie       human     1         F        pacifier       100.0      curly            0.0

    whereas if we want everybody who’s smart and/or old, we say:

    Code \(\PageIndex{3}\) (Python):

    old_andor_wise = simpsons[(simpsons.IQ > 100) | (simpsons.age > 30)]

     

    |                        species   age     gender       fave           IQ         hair        salary

    |    name

    |   Homer       human    36         M         beer           74.0       none        52000.0

    |   Marge       human    34         F    helping others     120.0   stacked tall        0.0

    |   Lisa        human     8         F       saxophone       200.0      curly            0.0

     

    To narrow it down to only specific columns, we can combine our query with the syntax from section 17.1 (p. 177). You see, our query gave us another (shorter) DataFrame as a result, which has the same rights and privileges as any other DataFrame. So tacking on another pair of boxies gives us just a column:

    Code \(\PageIndex{4}\) (Python):

    print(simpsons[simpsons.age > 18]['fave'])

     

    |   name

    Homer       beer

    Marge  helping others

    Name: fave, dtype: object 

     

    while tacking on double boxies gives us columns:

    Code \(\PageIndex{5}\) (Python):

    print(simpsons[simpsons.age > 18])

     

    |                         fave            gender     IQ

     name

    |    Homer        beer             M       74.0

    |    Marge   helping others        F      120.0

     

    Note that in the first of these cases, we got a Series back, whereas in the second (with the double boxies) we got a DataFrame with multiple columns. Combining all these operations takes practice, but lets you slice and dice a DataFrame up in innumerable different ways.


    18.1: Queries is shared under a CC BY-SA license and was authored, remixed, and/or curated by Stephen Davies (allthemath.org) .

    • Was this article helpful?