10.3: Streaming over Collections


Andrew P. Black, Stéphane Ducasse, Oscar Nierstrasz, Damien Pollet


Streams are really useful when dealing with collections of elements. They can be used for reading and writing elements in collections. We will now explore the stream features for the collections.

Reading collections

This section presents features used for reading collections. Using a stream to read a collection essentially gives you a pointer into the collection. That pointer moves forward as you read, and you can place it wherever you want. Use the class ReadStream to read elements from collections.

Methods next and next: are used to retrieve one or more elements from the collection.

stream := ReadStream on: #(1 (a b c) false).
stream next.        →    1
stream next.        →    #(#a #b #c)
stream next.        →    false

stream := ReadStream on: 'abcdef'.
stream next: 0.    →    ''
stream next: 1.    →    'a'
stream next: 3.    →    'bcd'
stream next: 2.    →    'ef'

The message peek is used when you want to know what the next element in the stream is without moving forward.

stream := ReadStream on: '-143'.
negative := (stream peek = $-).        "look at the first element without reading it"
negative.        →    true
negative ifTrue: [stream next].        "ignores the minus character"
number := stream upToEnd.
number.          →    '143'

This code sets the boolean variable negative according to the sign of the number in the stream and number to its absolute value. The method upToEnd returns everything from the current position to the end of the stream and sets the stream to its end. This code can be simplified using peekFor:, which moves forward if the following element equals the parameter and doesn’t move otherwise.

stream := '-143' readStream.
(stream peekFor: $-)        →    true
stream upToEnd              →    '143'

peekFor: also returns a boolean indicating if the parameter equals the element.
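The example above shows only the matching case. As a small sketch of the other case (the variable name matched is chosen here for illustration), peekFor: answers false and leaves the stream where it was when the next element differs from the parameter:

```smalltalk
| stream matched |
stream := '143' readStream.
matched := stream peekFor: $-.    "→ false: the next element is $1, not $-"
stream position.                  "→ 0: the stream has not moved"
stream upToEnd.                   "→ '143'"
```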

You might have noticed a new way of constructing a stream in the above example: one can simply send readStream to a sequenceable collection to get a reading stream on that particular collection.

Positioning. There are methods to position the stream pointer. If you have the index, you can go directly to it using position:. You can request the current position using position. Please remember that a stream is not positioned on an element, but between two elements. The index corresponding to the beginning of the stream is 0.

You can obtain the state of the stream depicted in Figure \(\PageIndex{1}\) with the following code:

stream := 'abcde' readStream.
stream position: 2.
stream peek            →    $c

Figure \(\PageIndex{1}\): A stream at position 2.

To position the stream at the beginning or the end, you can use reset or setToEnd. skip: and skipTo: are used to go forward to a location relative to the current position: skip: accepts a number as argument and skips that number of elements whereas skipTo: skips all elements in the stream until it finds an element equal to its parameter. Note that it positions the stream after the matched element.

stream := 'abcdef' readStream.
stream next.            →    $a    "stream is now positioned just after the a"
stream skip: 3.                    "stream is now after the d"
stream position.        →    4
stream skip: -2.                   "stream is after the b"
stream position.        →    2
stream reset.
stream position.        →    0
stream skipTo: $e.                 "stream is just after the e now"
stream next.            →    $f
stream contents.        →    'abcdef'

As you can see, the letter e has been skipped.

The method contents always returns a copy of the entire stream.
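The example above uses reset but not setToEnd. A minimal sketch of the latter, which also illustrates that contents does not depend on the current position:

```smalltalk
| stream last |
stream := 'abcdef' readStream.
stream setToEnd.
stream position.       "→ 6: just after the last element"
stream atEnd.          "→ true"
stream skip: -1.       "step back over the last element"
last := stream next.   "→ $f"
stream contents.       "→ 'abcdef', whatever the position"
```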

Testing. Some methods allow you to test the state of the current stream: atEnd returns true if and only if no more elements can be read whereas isEmpty returns true if and only if there is no element at all in the collection.
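The difference between the two tests can be seen in a short sketch (the variable names are illustrative): a stream that has been read to its end is atEnd, but it is not empty as long as its collection has elements.

```smalltalk
| stream empty |
stream := 'ab' readStream.
stream next: 2.      "→ 'ab'"
stream atEnd.        "→ true: nothing is left to read"
stream isEmpty.      "→ false: the collection still holds two elements"

empty := '' readStream.
empty atEnd.         "→ true"
empty isEmpty.       "→ true"
```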

Here is a possible implementation of an algorithm using atEnd that takes two sorted collections as parameters and merges those collections into another sorted collection:

stream1 := #(1 4 9 11 12 13) readStream.
stream2 := #(1 2 3 4 5 10 13 14 15) readStream.

"The variable result will contain the sorted collection."
result := OrderedCollection new.
[stream1 atEnd not & stream2 atEnd not]
    whileTrue: [stream1 peek < stream2 peek
        "Remove the smallest element from either stream and add it to the result."
        ifTrue: [result add: stream1 next]
        ifFalse: [result add: stream2 next]].

"One of the two streams might not be at its end. Copy whatever remains."
result
    addAll: stream1 upToEnd;
    addAll: stream2 upToEnd.

result.    →    an OrderedCollection(1 1 2 3 4 4 5 9 10 11 12 13 13 14 15)

Writing to collections

We have already seen how to read a collection by iterating over its elements using a ReadStream. We’ll now learn how to create collections using WriteStreams.

WriteStreams are useful for appending a lot of data to a collection at various locations. They are often used to construct strings that are based on static and dynamic parts as in this example:

stream := String new writeStream.
stream
    nextPutAll: 'This Smalltalk image contains: ';
    print: Smalltalk allClasses size;
    nextPutAll: ' classes.';
    cr;
    nextPutAll: 'This is really a lot.'.

stream contents.    →    'This Smalltalk image contains: 2322 classes.
This is really a lot.'

This technique is used, for example, in the various implementations of the method printOn:. There is a simpler and more efficient way of creating streams if you are only interested in the content of the stream:

string := String streamContents:
    [:stream |
        stream
            print: #(1 2 3);
            space;
            nextPutAll: 'size';
            space;
            nextPut: $=;
            space;
            print: 3. ].
string.    →    '#(1 2 3) size = 3'

The method streamContents: creates a collection and a stream on that collection for you. It then executes the block you gave passing the stream as a parameter. When the block ends, streamContents: returns the content of the collection.
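As a rough sketch of what this saves you, the following builds the same string twice: once by creating the stream by hand, and once with streamContents:. The variable names manual and short are illustrative, and the real implementation of streamContents: may differ in detail (for instance in how it sizes the underlying collection).

```smalltalk
| stream manual short |
"By hand: create the stream, fill it, then ask for its contents."
stream := WriteStream on: String new.
stream print: #(1 2 3); nextPutAll: ' size = '; print: 3.
manual := stream contents.

"With streamContents:, the stream is created and its contents are returned for us."
short := String streamContents: [:s |
    s print: #(1 2 3); nextPutAll: ' size = '; print: 3].

manual = short.    "→ true"
```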

The following WriteStream methods are especially useful in this context:

nextPut: adds the parameter to the stream;
nextPutAll: adds each element of the collection, passed as a parameter, to the stream;
print: adds the textual representation of the parameter to the stream.

There are also methods useful for printing different kinds of characters to the stream like space, tab and cr (carriage return). Another useful method is ensureASpace which ensures that the last character in the stream is a space; if the last character isn’t a space it adds one.
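The following sketch puts these methods side by side. Note in particular the difference between nextPut:, which adds the object itself, and print:, which adds the object's printed form:

```smalltalk
| result |
result := String streamContents: [:stream |
    stream
        nextPut: $a;          "adds the single character a"
        nextPutAll: 'bc';     "adds the characters b and c"
        space;
        print: $a;            "adds the printed form of the character, i.e. $a"
        tab;
        print: 42].           "adds the digits 4 and 2"
result.    "→ 'abc $a', then a tab character, then '42'"
```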

About Concatenation. Using nextPut: and nextPutAll: on a WriteStream is often the best way to concatenate characters. Using the comma concatenation operator (,) is far less efficient:

[| temp |
    temp := String new.
    (1 to: 100000)
        do: [:i | temp := temp, i asString, ' ']] timeToRun    →    115176 "(milliseconds)" 

[| temp |
    temp := WriteStream on: String new.
    (1 to: 100000)
        do: [:i | temp nextPutAll: i asString; space].
    temp contents] timeToRun    →    1262 "(milliseconds)"

The reason that using a stream can be much more efficient is that comma creates a new string containing the concatenation of the receiver and the argument, so it must copy both of them. When you repeatedly concatenate onto the same receiver, it gets longer each time, so the total number of characters that must be copied grows quadratically with the number of concatenations. This also creates a lot of garbage, which must be collected. Using a stream instead of string concatenation is a well-known optimization. In fact, you can use streamContents: to help you do this:

String streamContents: [ :tempStream |
    (1 to: 100000)
        do: [:i | tempStream nextPutAll: i asString; space]]
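To make the copying cost of repeated concatenation concrete, here is a rough estimate under simplifying assumptions that are not part of the original text: each of the \(n\) pieces is about \(k\) characters long, and each concatenation copies the whole result once. The \(i\)-th concatenation then produces a string of about \(ik\) characters, so the total number of characters copied is roughly

\[ \sum_{i=1}^{n} i\,k = k\,\frac{n(n+1)}{2}, \]

which grows with the square of \(n\). With \(n = 100000\) and \(k \approx 6\), this is on the order of \(10^{10}\) character copies, whereas the stream only has to store about \(nk \approx 6 \times 10^{5}\) characters (plus occasional copies when its internal buffer grows).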

Reading and writing at the same time

It’s possible to use a stream to access a collection for reading and writing at the same time. Imagine you want to create a History class that manages the backward and forward buttons of a web browser. A history would behave as shown in Figures \(\PageIndex{2}\) to \(\PageIndex{8}\).

Figure \(\PageIndex{2}\): A new history is empty. Nothing is displayed in the web browser.

Figure \(\PageIndex{3}\): The user opens page 1.

Figure \(\PageIndex{4}\): The user clicks on a link to page 2.

Figure \(\PageIndex{5}\): The user clicks on a link to page 3.

Figure \(\PageIndex{6}\): The user clicks on the back button and is now viewing page 2 again.

Figure \(\PageIndex{7}\): The user clicks the back button again. Page 1 is now displayed.

Figure \(\PageIndex{8}\): From page 1, the user clicks on a link to page 4. The history forgets pages 2 and 3.

This behaviour can be implemented using a ReadWriteStream.

Object subclass: #History
    instanceVariableNames: 'stream'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'SBE-Streams'

History>>initialize
    super initialize.
    stream := ReadWriteStream on: Array new.

Nothing really difficult here: we define a new class that holds a stream. The stream is created in the initialize method.

We need methods to go backward and forward:

History>>goBackward
    self canGoBackward ifFalse: [self error: 'Already on the first element'].
    stream skip: -2.
    ↑ stream next.

History>>goForward
    self canGoForward ifFalse: [self error: 'Already on the last element'].
    ↑ stream next

So far, the code has been pretty straightforward. Now we have to deal with the goTo: method, which should be invoked when the user clicks on a link. A possible solution is:

History>>goTo: aPage
    stream nextPut: aPage.

This version is incomplete, however. When the user clicks on a link, there should be no more future pages to go to, i.e., the forward button must be deactivated. The simplest solution is to write nil just after the new page to mark the end of the history:

History>>goTo: anObject
    stream nextPut: anObject.
    stream nextPut: nil.
    stream back.

Now, only methods canGoBackward and canGoForward have to be implemented.

A stream is always positioned between two elements. To go backward, there must be two pages before the current position: one page is the current page, and the other one is the page we want to go to.

History>>canGoBackward
    ↑ stream position > 1

History>>canGoForward
    ↑ stream atEnd not and: [stream peek notNil]
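To see why these two tests work, it can help to replay the same steps directly on a ReadWriteStream, outside the History class. The following is only a sketch: it inlines the bodies of goTo:, goBackward and the two tests by hand, and the variable name previous is chosen for illustration.

```smalltalk
| stream previous |
stream := ReadWriteStream on: Array new.

"Same steps as goTo: #page1"
stream nextPut: #page1; nextPut: nil; back.
stream position > 1.                          "→ false: no page to go back to"
stream atEnd not and: [stream peek notNil].   "→ false: the next element is the nil marker"

"Same steps as goTo: #page2"
stream nextPut: #page2; nextPut: nil; back.
stream position > 1.                          "→ true: page1 lies behind the current page"

"Same steps as goBackward"
stream skip: -2.
previous := stream next.                      "→ #page1"
stream atEnd not and: [stream peek notNil].   "→ true: page2 is ahead"
```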

Let us add a method to peek at the contents of the stream:

History>>contents
    ↑ stream contents

And the history works as advertised:

History new
    goTo: #page1;
    goTo: #page2;
    goTo: #page3;
    goBackward;
    goBackward;
    goTo: #page4;
    contents    →    #(#page1 #page4 nil nil)
