Engineering LibreTexts

11.1: The Pandas Series


    A Series is conceptually a set of key-value pairs. The keys are normally homogeneous, and so are the values, although the keys might be of a different type than the values. Any of the three atomic types are permissible for either.

    Somewhat confusing is that the Pandas package calls the keys “the index,” which is an overlap with the term we used for ordinary arrays. It’s not a total loss, though, since if you think hard about it, you’ll realize that in some sense, a regular array is really just an associative array with consecutive integer keys. Oooo, deep. If you study the two halves of Figure 11.1.1, I think you’ll agree.


    Figure \(\PageIndex{1}\): An ordinary array, and an associative array, that represent the same information.
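To make that correspondence concrete, here is a minimal sketch in plain Python. The data is made up for illustration (it is not the data from the figure), and a Python list stands in for the ordinary array while a dict stands in for the associative array.

```python
# Hypothetical data, not taken from Figure 11.1.1.
ordinary = ['Ghost', 'Pumpkin', 'Vampire']                # positions 0, 1, 2 are implicit
associative = {0: 'Ghost', 1: 'Pumpkin', 2: 'Vampire'}    # same info, keys spelled out

for position in range(len(ordinary)):
    # Looking up by position and looking up by key give the same answer.
    print(position, ordinary[position], associative[position])
```

The only real difference is that the dict is free to use keys other than consecutive integers, which is exactly the freedom a Series index gives us.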

    Creating Serieses

    Here are a few common ways of creating a Pandas Series object in memory.

    Way 1: create an empty Series

    Perhaps this first one sounds dumb, but we will indeed have occasion to start off with an empty Series and then add key/value pairs to it from there. The code is simple:

    Code \(\PageIndex{1}\) (Python):

    import pandas as pd

    my_new_series = pd.Series(dtype=object)   # explicit dtype avoids a warning in some Pandas versions

    Voilà.
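As a rough sketch of the "start empty, then add pairs" pattern: the variable name and the data below are invented for illustration, and the dtype=object argument is my assumption about what the values will be (strings). It is there because some Pandas versions warn when an empty Series is created with no dtype given.

```python
import pandas as pd

# Start empty. dtype=object is an assumption: we plan to store strings.
villains = pd.Series(dtype=object)
print(len(villains))    # nothing in it yet

# Assigning to a key that does not exist yet adds a new key/value pair.
villains['Thor'] = 'Loki'
villains['Peter'] = 'Green Goblin'
print(villains)
```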

    Way 2: pd.Series([], index=[])

    As with NumPy ndarrays, we can explicitly list the values we want in a new Series. We also have to list the index values (the keys). The syntax for doing so is:

    Code \(\PageIndex{2}\) (Python):

    alter_egos = pd.Series(['Hulk','Spidey','Iron Man','Thor'], index=['Bruce','Peter','Tony','Thor'])

    This creates the Series shown in Figure 11.1.2.


    Be careful to keep all your boxies and bananas straight. Note that both the keys and the values are in their own sets of boxies.
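One way to see why this care matters: if the list of values and the list of keys end up with different lengths, Pandas refuses to build the Series. The sketch below deliberately leaves one key out. The mismatch_caught flag is just a name I made up to record what happened.

```python
import pandas as pd

mismatch_caught = False
try:
    # Four values but only three keys: the two lists do not line up.
    broken = pd.Series(['Hulk', 'Spidey', 'Iron Man', 'Thor'],
                       index=['Bruce', 'Peter', 'Tony'])
except ValueError as err:
    mismatch_caught = True
    print('Pandas refused:', err)
```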

    We can print (smallish) Serieses to the screen to inspect their contents:

    Code \(\PageIndex{3}\) (Python):

    print(alter_egos)

    | Bruce Hulk

    | Peter Spidey

    | Tony Iron Man

    | Thor Thor

    | dtype: object

    Also, as we did on p. 63, we can inquire as to both the overarching type of alter_egos and also to the kind of underlying data it contains:

    Code \(\PageIndex{4}\) (Python):

    print(type(alter_egos))

    print(alter_egos.dtype)

    | <class 'pandas.core.series.Series'>

    | object

    Just as it did on p. 71, the “object” here is just a confusing way of saying “str”. Don’t read anything more into it than that.
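If you want to convince yourself, the following sketch pulls one value out of the Series and asks Python for its type. It also builds two numeric Serieses for comparison. The wait-time numbers are invented, and the exact dtype names printed can vary with your Pandas version and platform, so treat the output as something to inspect rather than memorize.

```python
import pandas as pd

alter_egos = pd.Series(['Hulk', 'Spidey', 'Iron Man', 'Thor'],
                       index=['Bruce', 'Peter', 'Tony', 'Thor'])

# Each individual value is an ordinary Python string.
print(type(alter_egos['Bruce']))

# For comparison: whole numbers and decimals (made-up data).
rides = ['Pirates', 'Small World', 'Peter Pan']
whole = pd.Series([25, 20, 29], index=rides)
decimal = pd.Series([25.5, 20.0, 29.25], index=rides)
print(whole.dtype)      # an integer dtype
print(decimal.dtype)    # a floating-point dtype
```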

    Way 3: “wrapping” an array

    Associative arrays, and the Pandas Serieses we’ve been using to implement them, are inherently one-dimensional data structures. This is just like the NumPy arrays we used before. Pandas Serieses also provide a bunch of features for manipulating, querying, computing, and even graphing aspects of their content. It’s a lot of rich stuff on top of plain-old NumPy.

    For this reason, it’s common to want to create a Series that just “wraps” (or encloses) an underlying NumPy ndarray, and provides all that rich stuff.

    The way to do this is simple:

    Code \(\PageIndex{5}\) (Python):

    import numpy as np

    my_numpy_array = np.array(['Ghost','Pumpkin','Vampire','Witch'])

    my_pandas_enhanced_thang = pd.Series(my_numpy_array)

    You can then treat my_pandas_enhanced_thang as an ordinary aggregate variable which has the more sophisticated operations of next chapter automatically glommed on to it. The keys (index values) of this thang will simply be the integers 0 through 3.
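Here is a small sketch that checks that claim about the keys. It repeats the two lines from above so that it runs on its own, then compares a lookup in the Series with a lookup in the original array.

```python
import numpy as np
import pandas as pd

my_numpy_array = np.array(['Ghost', 'Pumpkin', 'Vampire', 'Witch'])
my_pandas_enhanced_thang = pd.Series(my_numpy_array)

# With no index given, the keys default to consecutive integers.
print(list(my_pandas_enhanced_thang.index))

# So a lookup by key looks just like an array lookup by position.
print(my_pandas_enhanced_thang[2])
print(my_numpy_array[2])
```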

    Way 4: pd.read_csv()

    Finally, there’s reading data from a text file, which as I mentioned back in section 8.2 is actually the most common way. Data typically resides in sources and files external to our programming environment, and we want to do everything we can to play ball with this open universe.

    One common data format is called CSV, which stands for comma-separated values. Files in this format are normally named with a “.csv” extension. As the name suggests, the lines in such a file consist of values separated by commas. For example, suppose there’s a file called disney_rides.csv whose contents looked like this:


    Pirates of the Caribbean,25

    Small World,20

    Peter Pan,29


    These are the current expected wait times (in minutes) for each of these Disney World rides at some point of the day.

    To read this into Python, we use the pd.read_csv() function. It’s a bit awkward since it needs several extra pieces if you want to deal with Serieses. Here’s how it works:

    Code \(\PageIndex{6}\) (Python):

    wait_times = pd.read_csv('disney_rides.csv', index_col=0, header=None).squeeze('columns')

    Most of that junk is just to memorize for now, not to fully understand. If you’re curious, index_col=0 tells Pandas that the first (0th) column (namely, the ride names) should be treated as the index for the Series. The header=None means “there is no separate header row at the top of the file, Pandas, so don’t try to treat it like one.” If our .csv file did have a header row at the top, containing labels for the two columns, then we’d skip the header=None part. Finally, the .squeeze('columns') on the end tells Pandas, “since this is so skinny anyway (just two columns, one of which became the index), hand us back a Series, rather than the more complex DataFrame object that pd.read_csv() normally returns (DataFrames are the subject of Chapter 16).” Older versions of Pandas spelled this as a squeeze=True argument inside the parentheses of pd.read_csv(); that argument was deprecated in version 1.4 and removed in version 2.0.
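If you'd like to try this without creating a file first, here is a self-contained sketch. It fakes the file with io.StringIO (my substitution, so the example runs anywhere) and collapses the result into a Series by calling .squeeze('columns') on what pd.read_csv() returns, which is the form recent Pandas versions expect.

```python
import io
import pandas as pd

# Stand-in for disney_rides.csv so the sketch runs without a file on disk.
csv_text = "Pirates of the Caribbean,25\nSmall World,20\nPeter Pan,29\n"

wait_times = pd.read_csv(io.StringIO(csv_text), index_col=0, header=None)
wait_times = wait_times.squeeze('columns')   # one-column DataFrame -> Series

print(type(wait_times))
print(wait_times['Peter Pan'])
```

With a real file, you would pass the filename string in place of io.StringIO(csv_text).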


    This page titled 11.1: The Pandas Series is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by Stephen Davies (allthemath.org) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.