Skip to main content
Engineering LibreTexts

13.12: Exercises

  • Page ID
    42391
  • \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

    Exercise \(\PageIndex{1}\)

    The “rank” of a word is its position in a list of words sorted by frequency: the most common word has rank 1, the second most common has rank 2, etc.

    Zipf’s law describes a relationship between the ranks and frequencies of words in natural languages (http://en.Wikipedia.org/wiki/Zipf's_law). Specifically, it predicts that the frequency, \(f\), of the word with rank \(r\) is:

    \[ f = c r^{-s} \nonumber \]

    where \(s\) and \(c\) are parameters that depend on the language and the text. If you take the logarithm of both sides of this equation, you get:

    \[ \log{f} = \log{c} - s \log{r} \nonumber \]

    So if you plot \( \log{f} \) versus \( \log{r} \), you should get a straight line with slope \( -s \) and intercept \( \log{c} \).

    Write a program that reads a text from a file, counts word frequencies, and prints one line for each word, in descending order of frequency, with \( \log{f} \) and \( \log{r} \). Use the graphing program of your choice to plot the results and check whether they form a straight line. Can you estimate the value of \( s \)?

    Solution

    http://thinkpython2.com/code/zipf.py. To run my solution, you need the plotting module matplotlib. If you installed Anaconda, you already have matplotlib; otherwise you might have to install it.


    This page titled 13.12: Exercises is shared under a CC BY-NC 3.0 license and was authored, remixed, and/or curated by Allen B. Downey (Green Tea Press) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.

    • Was this article helpful?