9.2: The Varieties of Collections
- Page ID
- 36379
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)
( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\id}{\mathrm{id}}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\kernel}{\mathrm{null}\,}\)
\( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\)
\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\)
\( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)
\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)
\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)
\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vectorC}[1]{\textbf{#1}} \)
\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)
\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)
\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)To make good use of the collection classes, the reader needs at least a superficial knowledge of the wide variety of collections that they implement, and their commonalities and differences.
Programming with collections rather than individual elements is an important way to raise the level of abstraction of a program. The Lisp function map
, which applies an argument function to every element of a list and returns a new list containing the results is an early example of this style, but Smalltalk-80 adopted collection-based programming as a central tenet. Modern functional programming languages such as ML and Haskell have followed Smalltalk’s lead.
Why is this a good idea? Suppose you have a data structure containing a collection of student records, and wish to perform some action on all of the students that meet some criterion. Programmers raised to use an imperative language will immediately reach for a loop. But the Smalltalk programmer will write:
students select: [ :each | each gpa < threshold ]
which evaluates to a new collection containing precisely those elements of students for which the bracketed function returns true
.1 The Smalltalk code has the simplicity and elegance of a domain-specific query language.
The message select:
is understood by all collections in Smalltalk. There is no need to find out if the student data structure is an array or a linked list: the select:
message is understood by both. Note that this is quite different from using a loop, where one must know whether students is an array or a linked list before the loop can be set up.
In Smalltalk, when one speaks of a collection without being more specific about the kind of collection, one means an object that supports well-defined protocols for testing membership and enumerating the elements. All collections understand the testing messages includes:
, isEmpty
and occurrencesOf:
. All collections understand the enumeration messages do:
, select:
, reject:
(which is the opposite of select:
), collect:
(which is like lisp’s map
), detect:ifNone:
, inject:into:
(which performs a left fold) and many more. It is the ubiquity of this protocol, as well as its variety, that makes it so powerful.
Protocol | Methods |
---|---|
accessing | size, capacity, at: anIndex, at: anIndex put: anElement |
testing | isEmpty, includes: anElement, contains: aBlock, occurrencesOf: anElement |
adding | add: anElement, addAll: aCollection |
removing | remove: anElement, remove: anElement ifAbsent: aBlock, removeAll: aCollection |
enumerating | do: aBlock, collect: aBlock, select: aBlock, reject: aBlock, detect: aBlock, detect: aBlock ifNone: aNoneBlock, inject: aValue into: aBinaryBlock |
converting | asBag, asSet, asOrderedCollection, asSortedCollection, asArray, asSortedCollection: aBlock |
creation | with: anElement, with:with:, with:with:with:, with:with:with:with:, withAll: aCollection |
Table \(\PageIndex{1}\) summarizes the standard protocols supported by most of the classes in the collection hierarchy. These methods are defined, redefined, optimized or occasionally even forbidden by subclasses of Collection
.
Beyond this basic uniformity, there are many different kinds of collection either supporting different protocols, or providing different behaviour for the same requests. Let us briefly survey some of the key differences:
- Sequenceable: Instances of all subclasses of
SequenceableCollection
start from a first element and proceed in a well-defined order to a last element. Instances ofSet
,Bag
andDictionary
, on the other hand, are not sequenceable. - Sortable: A
SortedCollection
maintains its elements in sort order. - Indexable: Most sequenceable collections are also indexable, that is, elements can be retrieved with
at:
.Array
is the familiar indexable data structure with a fixed size;anArray at: n
retrieves the nth element ofanArray
, andanArray at: n put: v
changes the nth element tov
.LinkedLists
andSkipLists
are sequenceable but not indexable, that is, they understandfirst
andlast
, but notat:
. - Keyed: Instances of
Dictionary
and its subclasses are accessed by keys instead of indices. - Mutable: Most collections are mutable, but
Interval
s andSymbol
s are not. AnInterval
is an immutable collection representing a range ofInteger
s. For example,5 to: 16 by: 2
is an interval that contains the elements 5, 7, 9, 11, 13 and 15. It is indexable withat:
, but cannot be changed withat:put:
. - Growable: Instances of
Interval
andArray
are always of a fixed size. Other kinds of collections (sorted collections, ordered collections, and linked lists) can grow after creation. - Accepts duplicates: A
Set
will filter out duplicates, but aBag
will not.Dictionary
,Set
andBag
use the = method provided by the elements; theIdentity
variants of these classes use the == method, which tests whether the arguments are the same object, and thePluggable
variants use an arbitrary equivalence relation supplied by the creator of the collection. - Heterogeneous: Most collections will hold any kind of element. A
String
,CharacterArray
orSymbol
, however, only holdsCharacter
s. AnArray
will hold any mix of objects, but aByteArray
only holdsByte
s, anIntegerArray
only holdsInteger
s and aFloatArray
only holdsFloat
s. ALinkedList
is constrained to hold elements that conform to theLink ⊳ accessing
protocol.
- The expression in brackets can be thought of as a λ-expression defining an anonymous function λx.x gpa < threshold.