19.5: Sets
- Page ID
- 40847
In Section 13.6 I use dictionaries to find the words that appear in a document but not in a word list. The function I wrote takes d1
, which contains the words from the document as keys, and d2
, which contains the list of words. It returns a dictionary that contains the keys from d1
that are not in d2
.
def subtract(d1, d2): res = dict() for key in d1: if key not in d2: res[key] = None return res
In all of these dictionaries, the values are None
because we never use them. As a result, we waste some storage space.
Python provides another built-in type, called a set
, that behaves like a collection of dictionary keys with no values. Adding elements to a set is fast; so is checking membership. And sets provide methods and operators to compute common set operations.
For example, set subtraction is available as a method called difference
or as an operator, -
. So we can rewrite subtract
like this:
def subtract(d1, d2): return set(d1) - set(d2)
The result is a set instead of a dictionary, but for operations like iteration, the behavior is the same.
Some of the exercises in this book can be done concisely and efficiently with sets. For example, here is a solution to has_duplicates
, from Exercise 10.15.7, that uses a dictionary:
def has_duplicates(t): d = {} for x in t: if x in d: return True d[x] = True return False
When an element appears for the first time, it is added to the dictionary. If the same element appears again, the function returns True
.
Using sets, we can write the same function like this:
def has_duplicates(t): return len(set(t)) < len(t)
An element can only appear in a set once, so if an element in t
appears more than once, the set will be smaller than t
. If there are no duplicates, the set will be the same size as t
.
We can also use sets to do some of the exercises in Chapter 9. For example, here’s a version of uses_only
with a loop:
def uses_only(word, available): for letter in word: if letter not in available: return False return True
uses_only
checks whether all letters in word
are in available
. We can rewrite it like this:
def uses_only(word, available): return set(word) <= set(available)
The <=
operator checks whether one set is a subset of another, including the possibility that they are equal, which is true if all the letters in word
appear in available
.
As an exercise, rewrite avoids
using sets.