16.4: Read and Interpret the Results

Last updated
Save as PDF

Page ID: 43760

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

The message tally profiler essentially provides two kinds of information:

execution time is represented using a tree representing the profiled code execution (**Tree**. Each node of this tree is annotated with the time spent in each leaf method (**Leaves**)).
memory activity contains the memory consumption (**Memory** and the garbage collector usage (**GC**).

For illustration purpose, let us consider the following scenario: the string character 'A' is cumulatively appended 9,000 times to an initial empty string.

MessageTally spyOn:
    [ 500 timesRepeat: [
            | str |
            str := ''.
            9000 timesRepeat: [ str := str, 'A' ]]].

The complete result is:

- 24038 tallies, 24081 msec.

**Tree**
--------------------------------
Process: (40s) 535298048: nil
--------------------------------
29.7% {7152ms} primitives
11.5% {2759ms} ByteString(SequenceableCollection)>>copyReplaceFrom:to:with:
 5.9% {1410ms} primitives
 5.6% {1349ms} ByteString class(String class)>>new:

**Leaves**
29.7% {7152ms} ByteString(SequenceableCollection)>>,
9.2% {2226ms} SmallInteger(Integer)>>timesRepeat:
5.9% {1410ms} ByteString(SequenceableCollection)>>copyReplaceFrom:to:with:
5.6% {1349ms} ByteString class(String class)>>new:
4.4% {1052ms} UndefinedObject>>DoIt

**Memory**
    old        +0 bytes
    young      +9,536 bytes
    used       +9,536 bytes
    free       -9,536 bytes

**GCs**
    full        0 totalling 0ms (0.0% uptime)
    incr        9707 totalling 7,985ms (16.0% uptime), avg 1.0ms
    tenures     0
    root table  0 overflows

The first line gives the overall execution time and the number of samplings (also called tallies, we will come back on sampling at the end of the chapter).

Tree: Cumulative information

The **Tree** section represents the execution tree per processes. The tree tells the time the Pharo interpreter has spent in each method. It also tells the different invocation using a call graph. Different execution flows are kept separated according to the process in which they have been executed. The process priority is also displayed, this helps distinguishing between different processes. The example tells:

**Tree**
--------------------------------
Process: (40s) 535298048: nil
--------------------------------
29.7% {7152ms} primitives
11.5% {2759ms} ByteString(SequenceableCollection)>>copyReplaceFrom:to:with:
5.9% {1410ms} primitives
5.6% {1349ms} ByteString class(String class)>>new:

This tree shows that the interpreter spent 29.7% of its time by executing primitives. 11.5% of the total execution time is spent in the method SequenceableCollection>>copyReplaceFrom:to:with:. This method is called when concatenating character strings using the message comma (,), itself indirectly invoking new: and some virtual machine primitives.

The execution takes 11.5% of the execution time, this means that the interpreter effort is shared with other processes. The invocation chain from the code to the primitives is relatively short. Reaching hundreds of nested calls is no exception for most of applications. We will optimize this example later on.

Two primitives are listed as tree leaves. These correspond to different primitives. Unfortunately, MessageTally does not tell exactly which primitive has been invoked.

Leaves: leaf methods

The ** Leaves** part lists leaf methods for the code block that has been profiled. A leaf method is a method that does not call other methods. More exactly, it is a method m for which no method invoked by \( \mathit{m} \) have been “detected”. This is the case for variable accessors (e.g., Point»x), primitive methods and methods that are very quickly executed. For the previous example, we have:

**Leaves**
29.7% {7152ms} ByteString(SequenceableCollection)>>,
9.2% {2226ms} SmallInteger(Integer)>>timesRepeat:
5.9% {1410ms} ByteString(SequenceableCollection)>>copyReplaceFrom:to:with:
5.6% {1349ms} ByteString class(String class)>>new:
4.4% {1052ms} UndefinedObject>>DoIt

Memory

The statistical part on memory consumption tells the observed changes on the quantity of memory allocated and the garbage collector usage. To fully understand this information, one needs to keep in mind that Pharo’s garbage collector (GC) is a scavenging GC, relying on the principle that an old object has greater change to live even longer. It is designed following the fact that an old object will probably be kept referenced in the future. On the contrary, a young object has greater change to be quickly dereferenced.

Several memory zones are considered and the migration of a young object to the space dedicated for old object is qualified as tenured. (Following the metaphor of American academic scientists, when a permanent position is obtained.)

An example of the memory analyze realized by MessageTally:

**Memory**
    old    +0 bytes
    young  +9,536 bytes
    used   +9,536 bytes
    free   -9,536 bytes

MessageTally describes the memory usage using four values:

the old value is about the grow of the memory space dedicated to old objects. An object is qualified as “old” when its physical memory location is in the “old memory space”. This happens when a full garbage collector is triggered, or when there are too many object survivors (according to some threshold specified in the virtual machine). This memory space is cleaned by a full garbage collection only. (An incremental GC does not reduce its size therefore).
An increase of the old memory space is likely to be due to a memory leak: the virtual machine is unable to release memory, promoting young objects as old.
the young value tells about the increase of the memory space dedicated to young objects. When an object is created, it is physically located in this memory space. The size of this memory space changes frequently.
the used value is the total amount of used memory.
the free value is the remaining amount of memory available.

In our example, none of the objects created during the execution have been promoted as old. 9 536 bytes are used by the current process, located in the young memory space. The amount of available memory has been reduced accordingly.

GCs

The **GCs** provides statistics about the garbage collector. An example of a garbage collector report is:

**GCs**
full        0 totalling 0ms (0.0% uptime)
incr        9707 totalling 7,985ms (16.0% uptime), avg 1.0ms
tenures     1 (avg 9707 GCs/tenure)
root table  0 overflows

Four values are available.

The full value totals the amount of full garbage collections and the amount of time it took. Full garbage collection are not that frequent. They results from often allocating large memory chunks.
The incr is about incremental garbage collections. Incremental GCs are usually frequent (several times per second) and quickly performed (about 1 or 2 ms). It is wise to keep the amount of time spent in incremental GCs below 10%.
The number of tenures tells the amount of objects that migrated to the old memory space. This migration happens when the size of the young memory space is above a given threshold. This typically happens when launching an application, when all the necessary objects haven’t been created and referenced.
The root table overflows is the amount of root objects used by the garbage collector to navigate the image. This navigation happens when the system is running short on memory and need to collect all the objects relevant for the future program execution. The overflow value identifies the rare situation when the number of roots used by the incremental GC is greater than the its internal table. This situation forces the GC to promote some objects as tenured.

In the example, we see that only the incremental GC is used. As we will subsequently see, the amount of created objects is quite relevant when one wants to optimize performances.

**Tree**: Cumulative information

**Leaves**: leaf methods

**Memory**

**GCs**

Tree: Cumulative information

Leaves: leaf methods

Memory

GCs