# 13.12: Exercises

$$\newcommand{\vecs}{\overset { \rightharpoonup} {\mathbf{#1}} }$$ $$\newcommand{\vecd}{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}}$$$$\newcommand{\id}{\mathrm{id}}$$ $$\newcommand{\Span}{\mathrm{span}}$$ $$\newcommand{\kernel}{\mathrm{null}\,}$$ $$\newcommand{\range}{\mathrm{range}\,}$$ $$\newcommand{\RealPart}{\mathrm{Re}}$$ $$\newcommand{\ImaginaryPart}{\mathrm{Im}}$$ $$\newcommand{\Argument}{\mathrm{Arg}}$$ $$\newcommand{\norm}{\| #1 \|}$$ $$\newcommand{\inner}{\langle #1, #2 \rangle}$$ $$\newcommand{\Span}{\mathrm{span}}$$ $$\newcommand{\id}{\mathrm{id}}$$ $$\newcommand{\Span}{\mathrm{span}}$$ $$\newcommand{\kernel}{\mathrm{null}\,}$$ $$\newcommand{\range}{\mathrm{range}\,}$$ $$\newcommand{\RealPart}{\mathrm{Re}}$$ $$\newcommand{\ImaginaryPart}{\mathrm{Im}}$$ $$\newcommand{\Argument}{\mathrm{Arg}}$$ $$\newcommand{\norm}{\| #1 \|}$$ $$\newcommand{\inner}{\langle #1, #2 \rangle}$$ $$\newcommand{\Span}{\mathrm{span}}$$$$\newcommand{\AA}{\unicode[.8,0]{x212B}}$$

Exercise $$\PageIndex{1}$$

The “rank” of a word is its position in a list of words sorted by frequency: the most common word has rank 1, the second most common has rank 2, etc.

Zipf’s law describes a relationship between the ranks and frequencies of words in natural languages (http://en.Wikipedia.org/wiki/Zipf's_law). Specifically, it predicts that the frequency, $$f$$, of the word with rank $$r$$ is:

$f = c r^{-s} \nonumber$

where $$s$$ and $$c$$ are parameters that depend on the language and the text. If you take the logarithm of both sides of this equation, you get:

$\log{f} = \log{c} - s \log{r} \nonumber$

So if you plot $$\log{f}$$ versus $$\log{r}$$, you should get a straight line with slope $$-s$$ and intercept $$\log{c}$$.

Write a program that reads a text from a file, counts word frequencies, and prints one line for each word, in descending order of frequency, with $$\log{f}$$ and $$\log{r}$$. Use the graphing program of your choice to plot the results and check whether they form a straight line. Can you estimate the value of $$s$$?

Solution

http://thinkpython2.com/code/zipf.py. To run my solution, you need the plotting module matplotlib. If you installed Anaconda, you already have matplotlib; otherwise you might have to install it.

This page titled 13.12: Exercises is shared under a CC BY-NC 3.0 license and was authored, remixed, and/or curated by Allen B. Downey (Green Tea Press) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.