Skip to main content
Engineering LibreTexts

17.4: Sorting DataFrames

  • Page ID
    39316
  • \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

    Sorting a DataFrame is largely like sorting a Series, except we have more choices: instead of just the keys and the values, we have the index and potentially many different columns.

    The .sort_index() method works just like it did for Serieses:

    Code \(\PageIndex{1}\) (Python):

    print(simpsons.sort_index())

    | species age gender fave IQ hair salary

    | name

    | Bart human 10 M skateboard 90.0 buzz 0.0

    | Homer human 36 M beer 74.0 none 52000.0

    | Lisa human 8 F saxophone 200.0 curly 0.0

    | Maggie human 1 F pacifier 100.0 curly 0.0

    | Marge human 34 F helping others 120.0 stacked tall 0.0

    | SLH dog 4 M NaN 30.0 shaggy 0.0

    The result is rows sorted alphabetically by name. And I hate to keep repeating myself, but remember that .sort_index() returns a modified copy, unless you pass the inplace=True argument. The ascending=False argument is also allowed, and will sort by the index highest-to-lowest instead of lowest-to-highest.

    To sort by one of the columns, we call .sort_values() and pass it the column name:

    Code \(\PageIndex{2}\) (Python):

    print(simpsons.sort_values('IQ')

    | species age gender fave IQ hair salary

    | name

    | SLH dog 4 M NaN 30.0 shaggy 0.0

    | Homer human 36 M beer 74.0 none 52000.0

    | Bart human 10 M skateboard 90.0 buzz 0.0

    | Maggie human 1 F pacifier 100.0 curly 0.0

    | Marge human 34 F helping others 120.0 stacked tall 0.0

    | Lisa human 8 F saxophone 200.0 curly 0.0

    Sometimes we want to include more than one column in the sort. Why? As a tie-breaker. Consider sorting a roster for a student club, which has first_name and last_name columns, among other things. We might want to sort the list alphabetically by last name, but for students with the same last name, we should go to the first name as a tie-breaker (so that Angela Smith shows up after Velma Patterson but before Brad Smith).

    To do this, we pass a list of columns, instead of a single column:

    Code \(\PageIndex{3}\) (Python):

    print(simpsons.sort_values(['gender','hair','IQ']))

    | species age gender fave IQ hair salary

    | name

    | Maggie human 1 F pacifier 100.0 curly 0.0

    | Lisa human 8 F saxophone 200.0 curly 0.0

    | Marge human 34 F helping others 120.0 stacked tall 0.0

    | Bart human 10 M skateboard 90.0 buzz 0.0

    | Homer human 36 M beer 74.0 none 52000.0

    | SLH dog 4 M NaN 30.0 shaggy 0.0

    Here, we said “sort the rows alphabetically by gender. For rows with the same gender, use hair as a tie-breaker. And for rows with the same gender and the same hair, use IQ as a second tiebreaker.” Glance at that output and convince yourself that it’s correct.

    We control the “ascendingness” of the multi-column sort by specifying a list of each ascending value, one for each column we’re sorting by. Consider this:

    Code \(\PageIndex{4}\) (Python):

    print(simpsons.sort_values(['gender','hair','IQ'],

    ascending=[False,True,False]))

    | species age gender fave IQ hair salary

    | name

    | Bart human 10 M skateboard 90.0 buzz 0.0

    | Homer human 36 M beer 74.0 none 52000.0

    | SLH dog 4 M NaN 30.0 shaggy 0.0

    | Lisa human 8 F saxophone 200.0 curly 0.0

    | Maggie human 1 F pacifier 100.0 curly 0.0

    | Marge human 34 F helping others 120.0 stacked tall 0.0

    Now we’re saying “sort reverse alphabetically by gender, breaking ties by comparing hair alphabetically, and breaking further ties by reverse sorted order by IQ.”

    Oh, and the inplace=True argument works for all these examples as well.


    This page titled 17.4: Sorting DataFrames is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by Stephen Davies (allthemath.org) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.