
12.3: Retrieving an image over HTTP


    In the above example, we retrieved a plain text file that contained newlines and simply copied the data to the screen as the program ran. We can use a similar program to retrieve an image using HTTP. Instead of copying the data to the screen as the program runs, we accumulate the data in a bytes object, trim off the headers, and then save the image data to a file as follows:

    Code 12.3.1 (Python)
    
    import socket
    import time
    
    HOST = 'data.pr4e.org'
    PORT = 80
    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    mysock.connect((HOST, PORT))
    mysock.sendall(b'GET http://data.pr4e.org/cover3.jpg HTTP/1.0\r\n\r\n')
    count = 0
    picture = b''
    
    while True:
        data = mysock.recv(5120)
        if (len(data) < 1): break
        # time.sleep(0.25)
        count = count + len(data)
        print(len(data), count)
        picture = picture + data
    
    mysock.close()
    
    # Look for the end of the header (2 CRLF)
    pos = picture.find(b'\r\n\r\n')
    print('Header length', pos)
    print(picture[:pos].decode())
    
    # Skip past the header and save the picture data
    picture = picture[pos+4:]
    fhand = open('stuff.jpg', 'wb')
    fhand.write(picture)
    fhand.close()
    
    # Code: http://www.py4e.com/code3/urljpeg.py
    
    

    When the program runs it produces the following output:

    $ python urljpeg.py
    2920 2920
    1460 4380
    1460 5840
    1460 7300
    ...
    1460 62780
    1460 64240
    2920 67160
    1460 68620
    1681 70301
    Header length 240
    HTTP/1.1 200 OK
    Date: Sat, 02 Nov 2013 02:15:07 GMT
    Server: Apache
    Last-Modified: Sat, 02 Nov 2013 02:01:26 GMT
    ETag: "19c141-111a9-4ea280f8354b8"
    Accept-Ranges: bytes
    Content-Length: 70057
    Connection: close
    Content-Type: image/jpeg

    You can see that for this URL, the Content-Type header indicates that the body of the document is an image (image/jpeg). Once the program completes, you can view the image data by opening the file stuff.jpg in an image viewer.
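    The header inspection described above can also be done in code rather than by eye. The following is a minimal sketch, not part of the original program: the function name split_response and the tiny sample response are made up for illustration, and the parsing assumes a well-formed response with one header per line.

```python
def split_response(raw):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # The header ends at the first blank line, i.e. two CRLFs in a row
    head, sep, body = raw.partition(b'\r\n\r\n')
    if not sep:
        raise ValueError('no blank line found; header is incomplete')
    lines = head.decode('iso-8859-1').split('\r\n')
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        # Header names are case-insensitive, so store them in lower case
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body

# Hypothetical response: a short header followed by 5 bytes of "image" data
sample = (b'HTTP/1.1 200 OK\r\n'
          b'Content-Length: 5\r\n'
          b'Content-Type: image/jpeg\r\n'
          b'\r\n'
          b'\xff\xd8\xff\xe0\x00')

status, headers, body = split_response(sample)
print(status)                                       # HTTP/1.1 200 OK
print(headers['content-type'])                      # image/jpeg
print(int(headers['content-length']) == len(body))  # True
```

    The same check works on the output shown above: 70301 bytes were received in total, the header is 240 bytes long, and the blank-line separator takes 4 more bytes, which leaves 70301 - 240 - 4 = 70057 bytes of image data, matching the Content-Length header.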

    As the program runs, you can see that we don't get 5120 bytes each time we call the recv() method. We get as many bytes as have been transferred across the network to us by the web server at the moment we call recv(). In this example, we get either 1460 or 2920 bytes each time we request up to 5120 bytes of data.

    Your results may be different depending on your network speed. Also note that on the last call to recv() we get 1681 bytes, which is the end of the stream. In the next call to recv() we get a zero-length bytes object, which tells us that the server has called close() on its end of the socket and there is no more data forthcoming.
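    The read-until-empty pattern can be tried without a network connection. The sketch below is an illustration under stated assumptions: it uses socket.socketpair() to create two connected local sockets in place of a real web server, and the helper name recv_until_closed, the 3000-byte payload, and the 1024-byte buffer size are all made up for the example.

```python
import socket

def recv_until_closed(sock, bufsize=5120):
    """Call recv() until it returns zero bytes; return the list of chunks."""
    chunks = []
    while True:
        data = sock.recv(bufsize)   # returns at most bufsize bytes, often fewer
        if len(data) < 1:           # zero-length result: the peer closed its end
            break
        chunks.append(data)
    return chunks

# Two connected local sockets stand in for the server and our program
server_end, client_end = socket.socketpair()
payload = b'x' * 3000
server_end.sendall(payload)
server_end.close()                  # like the web server calling close()

chunks = recv_until_closed(client_end, bufsize=1024)
client_end.close()
received = b''.join(chunks)

print([len(c) for c in chunks])     # chunk sizes can vary from system to system
print(len(received))                # 3000
```

    Collecting the chunks in a list and joining them once at the end avoids repeatedly copying the growing bytes object, which is a small design change from the picture = picture + data approach in the program above.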

    We can slow down our successive recv() calls by uncommenting the call to time.sleep(). This way, we wait a quarter of a second after each call so that the server can "get ahead" of us and send more data before we call recv() again. With the delay in place, the program executes as follows:

    $ python urljpeg.py
    1460 1460
    5120 6580
    5120 11700
    ...
    5120 62900
    5120 68020
    2281 70301
    Header length 240
    HTTP/1.1 200 OK
    Date: Sat, 02 Nov 2013 02:22:04 GMT
    Server: Apache
    Last-Modified: Sat, 02 Nov 2013 02:01:26 GMT
    ETag: "19c141-111a9-4ea280f8354b8"
    Accept-Ranges: bytes
    Content-Length: 70057
    Connection: close
    Content-Type: image/jpeg

    Now, other than the first and last calls to recv(), we get 5120 bytes each time we ask for new data.

    There is a buffer between the server making send() requests and our application making recv() requests. When we run the program with the delay in place, at some point the server might fill up the buffer in the socket and be forced to pause until our program starts to empty the buffer. The pausing of either the sending application or the receiving application is called "flow control".
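    The effect of a full buffer can be observed on a single machine. The following is a rough sketch, assuming a local socket pair behaves like the network connection described above; the helper names fill_send_buffer and drain are hypothetical, and the number of bytes the buffer holds depends on the operating system. The sender is switched to non-blocking mode so that, instead of pausing, it raises BlockingIOError at the point where a normal (blocking) sender would be forced to wait.

```python
import socket

def fill_send_buffer(sender, chunk=b'x' * 4096):
    """Send until the operating system accepts no more; return bytes queued."""
    sender.setblocking(False)
    queued = 0
    try:
        while True:
            queued += sender.send(chunk)
    except BlockingIOError:
        # The buffer is full: a blocking sender would pause here
        pass
    return queued

def drain(receiver, bufsize=65536):
    """Read everything currently waiting in the buffer; return bytes read."""
    receiver.setblocking(False)
    drained = 0
    try:
        while True:
            data = receiver.recv(bufsize)
            if not data:
                break
            drained += len(data)
    except BlockingIOError:
        # Nothing more is waiting right now
        pass
    return drained

sender, receiver = socket.socketpair()
queued = fill_send_buffer(sender)    # the receiver is not reading yet
drained = drain(receiver)            # the receiver empties the buffer
sent_again = sender.send(b'more')    # now there is room, so this succeeds

print('queued before the buffer filled:', queued)
print('drained by the receiver:', drained)
print('bytes accepted afterwards:', sent_again)

sender.close()
receiver.close()
```

    In the image program the roles are the same: the web server is the sender, and while our program sleeps between recv() calls it plays the part of the receiver that has not yet emptied the buffer.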


    This page titled 12.3: Retrieving an image over HTTP is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Chuck Severance via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.