17.2: Buffering Algorithm

Last updated
Save as PDF

Page ID: 19968

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

The first step when developing an algorithm is to understand the problem. We will assume the file is already open and available for reading. For our buffering problem, we wish to provide a myGetLine() function to a calling routine.

The routine might be called as follows:

       status = myGetLine(fileDescriptor, textLine, MAXLEN);

The file opening and associated error checking is required but would be addressed separately and is outlined in Chapter 13, System Services. Once a file descriptor is obtained, it can be made available to the myGetLine() function. It is shown here as an explicit parameter for clarity.

As you may already be aware there is no special end of file code or character. We must infer the end of file based on the number of characters actually read. The file read system service will return the actual number of characters read. If the process requests 100,000 characters to be read and less than 100,000 are read, the end of file has been reached and all characters have been read. Further attempts at reading the file will generate an error. While it is easy to recognize the end of file and thus the last time the file needs to be read, the remaining characters in the buffer need to be processed. So while the program may know the end of file has been found, the program must continue until the buffer is depleted.

The calling routine should not need to know any of the details regarding the buffering, buffer management, or file operations. If a line of text from the file is available, the myGetLine() function will return the line from the file and a TRUE status. The line is returned via the passed argument. An argument for the maximum length of the text line is also provided to ensure that the text line array is not overwritten. If a line is not available, possibly due to all lines having been already returned or an unrecoverable read error, the function should return FALSE. An unrecoverable read error might occur if the storage medium goes off-line (drive removed), the file is deleted from another console, or a hardware error occurs. Such errors will make continued reading impossible. In such an event, an error message will be displayed and a FALSE returned which will halt the program (with incomplete results).

In general, the function would read characters, up to and including the LF, from the large buffer and place them in the line buffer, named textLine in the above example call.

This is easily done when there are characters available in the buffer. It will require an extra step to ensure that the buffer has characters. The file must be read if the buffer is empty, either due to the very first call or when all characters in the buffer have been returned. It is possible to check if characters are available by checking the current location of the next character to read in the buffer against the buffer size. The current location cannot be allowed to exceed the buffer size (or we would be accessing a location past the end of the buffer array). This is fairly straightforward when the buffer is filled entirely. The buffer may be partially filled due to a file size smaller than the buffer size or on the last file read after a series of file reads. To address this, the end of the buffer, initially set to the buffer size, must be set based on the actual number of characters read. There is a special case when the number of characters in the file is an exact multiple of the buffer size. If this occurs, the returned number of characters read is 0. An additional check is required to address this possibility.

Now that the problem and all the associated subtle issues are understood, we can take the next step of developing an algorithm. In general this process is typically iterative. That is, a first cut is developed, reviewed and refined. This would occur multiple times until a comprehensive algorithm is obtained. While some try to rush through this step, it is a mistake. Saving minutes on the algorithm development step means extra hours of debugging.

Based on this earlier example, the passed arguments include;

      ; file descriptor → file descriptor for open file, 
      ;     as required for read system service
      ; text line → starting address of
      ; max length → maximum length of the array for
      ;     the text line

Some variables are required and defined as follows:

    ; BUFFER_SIZE → parameter for size of buffer
    ; currIndex → index for current location in
    ;     the buffer, initially set to BUFFER_SIZE 
    ; buffMaximum → current maximum size of buffer, 
    ;     initially set to BUFFER_SIZE
    ; eofFlag → boolean to mark if the end of file 
    ;     has been found, initially set to false

Based on an understanding of the problem, a number of different algorithmic approaches are possible. Using the arguments and local variables, one such approach is presented for reference.

    ; myGetLine(fileDescriptor, textLine, maxLength) {
    ;     repeat {
    ;        if current index 3 buffer maximum        
    ;           read buffer (buffer size)
    ;           if error           
    ;               handle read error
    ;               display error message
    ;               exit routine (with false)
    ;           reset pointers
    ;           if chars read < characters request read
    ;               set eofFlag = TRUE
    ;       get one character from buffer at current index
    ;       place character in text line buffer
    ;       increment current index
    ;       if character is LF
    ;           exit with true
    ;     }
    ; }

This algorithm outline does not verify that the text line buffer is not overwritten nor does it handle the case when the file size is an exact multiple of the buffer size. Refining, implementing, and testing the algorithm is left to the reader as an exercise.

As presented, this algorithm will require statically declared variables. Stack dynamic variables cannot be used here since the information is required to be maintained between successive calls.

The variables and algorithm are provided as program comments. The algorithm is the most important part of the program documentation and will not only help when writing the code but in the debugging process. Whatever time you think you are saving by skipping the documentation, it will cost far, far more time when debugging.