Skip to main content
Engineering LibreTexts

18.1: Queries

  • Page ID
    39320
  • \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

    Back in section 13.1 (p. 124), we learned how to write simple queries to selectively match only certain elements of a Series. The same technique is available to us with DataFrames, only it’s more powerful since there are more columns to work with at a time.

    Let’s return to the Simpsons example from p. 174, which is reproduced here:

    | species age gender fave IQ hair salary

    | name

    | Homer human 36 M beer 74.0 none 52000.0

    | Marge human 34 F helping others 120.0 stacked tall 0.0

    | Bart human 10 M skateboard 90.0 buzz 0.0

    | Lisa human 8 F saxophone 200.0 curly 0.0

    | Maggie human 1 F pacifier 100.0 curly 0.0

    | SLH dog 4 M NaN 100.0 shaggy 0.0

    We can filter this on certain rows by including a query in boxies:

    Code \(\PageIndex{1}\) (Python):

    adults = simpsons[simpsons.age > 18]

    | species age gender fave IQ hair salary

    | name

    | Homer human 36 M beer 74.0 none 52000.0

    | Marge human 34 F helping others 120.0 stacked tall 0.0

    As with Serieses, we can’t forget to repeat the name of the variable (“simpsons”) before giving the query criteria (“> 18”). Unlike with Serieses, we also specify a column name (“.age”) that we want to query.

    We can also provide compound conditions in just the same way as before (section 13.1, p. 128). If we want only human children, we say:

    Code \(\PageIndex{2}\) (Python):

    kids = simpsons[(simpsons.age <= 18) & (simpsons.species == "human")]

    | species age gender fave IQ hair salary

    | name

    | Bart human 10 M skateboard 90.0 buzz 0.0

    | Lisa human 8 F saxophone 200.0 curly 0.0

    | Maggie human 1 F pacifier 100.0 curly 0.0

    whereas if we want everybody who’s smart and/or old, we say:

    Code \(\PageIndex{3}\) (Python):

    old_andor_wise = simpsons[(simpsons.IQ > 100) | (simpsons.age > 30)]

    | species age gender fave IQ hair salary

    | name

    | Homer human 36 M beer 74.0 none 52000.0

    | Marge human 34 F helping others 120.0 stacked tall 0.0

    | Lisa human 8 F saxophone 200.0 curly 0.0

    To narrow it down to only specific columns, we can combine our query with the syntax from section 17.1 (p. 177). You see, our query gave us another (shorter) DataFrame as a result, which has the same rights and privileges as any other DataFrame. So tacking on another pair of boxies gives us just a column:

    Code \(\PageIndex{4}\) (Python):

    print(simpsons[simpsons.age > 18]['fave'])

    | name

    | Homer beer

    | Marge helping others

    | Name: fave, dtype: object

    while tacking on double boxies gives us columns:

    Code \(\PageIndex{5}\) (Python):

    print(simpsons[simpsons.age > 18])

    | fave gender IQ

    | name

    | Homer beer M 74.0

    | Marge helping others F 120.0

    Note that in the first of these cases, we got a Series back, whereas in the second (with the double boxies) we got a DataFrame with multiple columns. Combining all these operations takes practice, but lets you slice and dice a DataFrame up in innumerable different ways.


    This page titled 18.1: Queries is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by Stephen Davies (allthemath.org) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.