
19.2: Multiprocessing


    As noted in Chapter 2, Architecture Overview, most current CPU chips include multiple cores, all of which have equal access to the main memory resource. Multiprocessing is a form of parallel processing that refers specifically to using multiple cores to execute multiple processes simultaneously.

    Focusing on a single large project, a main or initial process might generate subprocesses, referred to as threads (For more information, refer to: http://en.Wikipedia.org/wiki/Thread_(computing)). When the initial process is started and placed in memory by the system loader (For more information, refer to Chapter 5, Tool Chain), the operating system creates a new process, including the required operating system data structures, memory allocations, and page/swap file allotments. A thread is often referred to as a light-weight process since it uses the initial process's allocations and address space, making it quicker to create. It is also quicker to terminate since those deallocations are not performed. Because the threads share memory with the initial process and with any other threads, very fast communication between simultaneously executing threads is possible. This sharing carries the added requirement that the communications, in the form of memory writes, be carefully coordinated to ensure results are not corrupted. Such a problem is referred to as a race condition (For more information, refer to: http://en.Wikipedia.org/wiki/Race_condition). Race conditions are addressed in the following section.

    The threaded approach will not scale to the extent of the distributed approach. CPU chips have a limited number of cores, which places an upper bound on the number of simultaneous computations that can occur; this limit does not apply to distributed computing. Conversely, the limitations regarding network communication speeds do not apply to threaded computations.

    POSIX Threads

    POSIX Threads (For more information, refer to: http://en.Wikipedia.org/wiki/POSIX_Threads), commonly called pThreads, is a widely available thread library on Ubuntu and many other operating systems. The example in this section uses the pThreads Application Programmer Interface (API) for creating and joining threads.

    The initial or main process is created during the load process. The main process may create additional threads with the pthread_create() library function. While there is no specific limit on the number of threads that can be created, the number of available cores imposes a practical limit on how many threads can execute simultaneously.

    The initial or main process can check whether a thread has completed with the pthread_join() library function. If the thread has not completed, the join function waits until it does. Ideally, to maximize the overall parallel operations, the main process would perform other computations while the thread or threads are executing and only check for thread completion once that other work has been completed.

    There are other pThread functions that address mutexes (For more information, refer to: http://en.Wikipedia.org/wiki/Mutual_exclusion), condition variables, and synchronization (For more information, refer to: http://en.Wikipedia.org/wiki/Synchro...puter_science)) between threads, which are not addressed in this very brief overview.

    It should be noted that there are other approaches to threading not addressed here.

    Race Conditions

    A race condition is a generic term for the situation in which multiple threads write to the same memory location at the same time. Threaded programming is typically done in a high-level language; however, fully understanding the specific cause of a race condition is best done at the assembly language level.

    A simple example is provided to purposely create a race condition and examine the problem in detail. This example is not computationally useful.

    Assume we wish to compute the following formula MAX times, where MAX is a defined constant:

    \[myValue = \left (\dfrac{myValue}{X} \right ) + Y\nonumber\]

    We could write a high-level language program something along the lines of:

           for (int i=0; i < MAX; i++)
               myValue = (myValue / X) + Y;
    

    If we wish to speed-up this computation, perhaps because MAX is extremely large, we might create two thread functions, each to perform MAX/2 computations. Each of the thread functions would have shared access to the variables myValue, X, and Y. Thus, the code might be;

           for (int i=0; i < MAX/2; i++)
               myValue = (myValue / X) + Y;
    

    This code would be repeated in each of the thread functions. It may not be obvious, but assuming both threads execute simultaneously, this causes a race condition on the myValue variable. Specifically, each of the two threads attempts to update the variable at the same time, and some of the updates to the variable may be lost.

    To further simplify this example, we will assume that X and Y are both set to 1. As such, the result of each calculation is to increment myValue by 1. If myValue is initialized to 0, repeating the calculation MAX times should result in myValue being set to MAX. This simplification allows easy verification of the results. It is not computationally useful in any meaningful way.

    Implementing this in assembly would first require a main program that creates each of the two threads. Then, each of the thread functions would need to be created. In this very simple example, each thread function is the same, performing MAX/2 iterations (assuming MAX is an even number).

    Assuming one of the thread functions is named threadFunction0() and given the below pThread thread data structure;

        pthreadID0     dd     0, 0, 0, 0, 0 

    The following code fragment would create and start threadFunction0() executing.

        ; pthread_create (&pthreadID0, NULL,
        ;                      &threadFunction0, NULL);
            mov    rdi, pthreadID0
            mov    rsi, NULL
            mov    rdx, threadFunction0
            mov    rcx, NULL
            call   pthread_create
    

    This would need to be performed for each thread function.

    The following code fragment will check if a thread function is completed;

        ; pthread_join (pthreadID0, NULL); 
            mov     rdi, qword [pthreadID0] 
            mov     rsi, NULL
            call    pthread_join 

    If the thread function is not done, the join call will wait until it is completed. The thread function, threadFunction0(), itself might contain the following code;

    ; -----
    
    global threadFunction0
    threadFunction0:
    
    ;  Perform MAX / 2 iterations to update myValue.
        mov    rcx, MAX
        shr    rcx, 1                    ; divide MAX by 2
        mov    r10, qword [x]            ; read-only copies of x and y
        mov    r11, qword [y]
        
    incLoop0:
        ; myValue = (myValue / x) + y
        
        mov    rax, qword [myValue]      ; read the shared variable
        cqo                              ; sign-extend rax into rdx:rax
        div    r10
        add    rax, r11
        mov    qword [myValue], rax      ; write the shared variable
        loop   incLoop0
        
        ret    
    

    The code for the second thread function would be similar.


    If both threads are simultaneously executing, they are both trying to update the myValue variable. For example, assuming that thread function 0 is executing on core 0 and thread function 1 is executing on core 1 (arbitrarily chosen), the following execution trace is possible;

    Step   Code: Core 0, Thread 0        Code: Core 1, Thread 1
    ----   --------------------------    --------------------------
     1     mov  rax, qword [myValue]
     2     cqo                           mov  rax, qword [myValue]
     3     div  qword [x]                cqo
     4     add  rax, qword [y]           div  qword [x]
     5     mov  qword [myValue], rax     add  rax, qword [y]
     6                                   mov  qword [myValue], rax
    As a reminder, each core has its own set of registers. Thus, the core 0 rax register is different than the core 1 rax register.

    If the variable myValue is currently 730, two thread executions should increase it to 732. On core 0, at step 1, the 730 is copied into core 0's rax. On core 1, at step 2, the 730 is also copied into core 1's rax. As execution progresses, core 0 performs steps 2 through 4 and its rax is incremented from 730 to 731. During this time, core 1 performs steps 3 through 5 and its copy of 730 is also incremented to 731. At step 5, core 0 writes the 731 to the myValue variable. At step 6, core 1 again writes the value 731 to the myValue variable. Two executions should have incremented the variable twice; however, since core 1 read the value before core 0 was able to write its result, one of the increments was lost. The end result of this overlap is that the final value of myValue will not be correct. And, since the amount of overlapping execution is not predictable, the degree of incorrectness is not easily predicted and may vary between different executions.

    For example, if the value of MAX is fairly small, such as 10,000, there would likely be no overlap. Each thread would start and complete so quickly that the chance of overlapping execution would be very small. The problem still exists, but such executions would likely appear mostly correct, making it easy to ignore occasional anomalous output. As MAX is increased, the odds of overlapping execution increase.

    Such problems can be very challenging to debug and require an understanding of the low-level operations and the technical architecture.


    This page titled 19.2: Multiprocessing is shared under a CC BY-NC-SA license and was authored, remixed, and/or curated by Ed Jorgensen.
