Skip to main content
Engineering LibreTexts

13.4: Most common words

  • Page ID
    40804
  • \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

    To find the most common words, we can make a list of tuples, where each tuple contains a word and its frequency, and sort it.

    The following function takes a histogram and returns a list of word-frequency tuples:

    def most_common(hist):
        t = []
        for key, value in hist.items():
            t.append((value, key))
    
        t.sort(reverse=True)
        return t
    

    In each tuple, the frequency appears first, so the resulting list is sorted by frequency. Here is a loop that prints the ten most common words:

    t = most_common(hist)
    print('The most common words are:')
    for freq, word in t[:10]:
        print(word, freq, sep='\t')
    

    I use the keyword argument sep to tell print to use a tab character as a “separator”, rather than a space, so the second column is lined up. Here are the results from Emma:

    The most common words are:
    to      5242
    the     5205
    and     4897
    of      4295
    i       3191
    a       3130
    it      2529
    her     2483
    was     2400
    she     2364
    

    This code can be simplified using the key parameter of the sort function. If you are curious, you can read about it at https://wiki.python.org/moin/HowTo/Sorting.


    This page titled 13.4: Most common words is shared under a CC BY-NC 3.0 license and was authored, remixed, and/or curated by Allen B. Downey (Green Tea Press) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.

    • Was this article helpful?