In the above example, we retrieved a plain text file that contained newlines, and we simply copied the data to the screen as the program ran. We can use a similar program to retrieve an image using HTTP. Instead of copying the data to the screen as the program runs, we accumulate the data in a bytes object, trim off the headers, and then save the image data to a file as follows:
    import socket
    import time

    HOST = 'data.pr4e.org'
    PORT = 80
    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    mysock.connect((HOST, PORT))
    mysock.sendall(b'GET http://data.pr4e.org/cover3.jpg HTTP/1.0\r\n\r\n')
    count = 0
    picture = b""

    while True:
        data = mysock.recv(5120)
        if len(data) < 1:
            break
        # time.sleep(0.25)
        count = count + len(data)
        print(len(data), count)
        picture = picture + data

    mysock.close()

    # Look for the end of the header (2 CRLF)
    pos = picture.find(b"\r\n\r\n")
    print('Header length', pos)
    print(picture[:pos].decode())

    # Skip past the header and save the picture data
    picture = picture[pos+4:]
    fhand = open("stuff.jpg", "wb")
    fhand.write(picture)
    fhand.close()

    # Code: http://www.py4e.com/code3/urljpeg.py
When the program runs it produces the following output:
    $ python urljpeg.py
    2920 2920
    1460 4380
    1460 5840
    1460 7300
    ...
    1460 62780
    1460 64240
    2920 67160
    1460 68620
    1681 70301
    Header length 240
    HTTP/1.1 200 OK
    Date: Sat, 02 Nov 2013 02:15:07 GMT
    Server: Apache
    Last-Modified: Sat, 02 Nov 2013 02:01:26 GMT
    ETag: "19c141-111a9-4ea280f8354b8"
    Accept-Ranges: bytes
    Content-Length: 70057
    Connection: close
    Content-Type: image/jpeg
You can see that for this URL, the Content-Type header indicates that the body of the document is an image (image/jpeg). Once the program completes, you can view the image data by opening the file stuff.jpg in an image viewer.
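The header trimming above can be taken one step further by turning the header lines into a dictionary, so that values such as Content-Type and Content-Length can be looked up by name. The following is a minimal sketch, not part of urljpeg.py: the function name split_response and the tiny sample response are made up for illustration, and the code assumes each header fits on a single line.

```python
def split_response(raw):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # The header ends at the first blank line (2 CRLF)
    pos = raw.find(b"\r\n\r\n")
    if pos < 0:
        raise ValueError("no blank line found; header is incomplete")
    header_lines = raw[:pos].decode("iso-8859-1").split("\r\n")
    status_line = header_lines[0]
    headers = {}
    for line in header_lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, raw[pos + 4:]

# A made-up response with a four-byte body, used only for illustration
sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Length: 4\r\n"
          b"Content-Type: image/jpeg\r\n"
          b"\r\n"
          b"\xff\xd8\xff\xe0")

status, headers, body = split_response(sample)
print(status)
print(headers["content-type"])
# Check that we received as many body bytes as the server promised
print(int(headers["content-length"]) == len(body))
```

Comparing Content-Length with the length of the body is a simple way to check that the whole image arrived before writing it to a file.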
As the program runs, you can see that we don't get 5120 bytes each time we call the recv() method. We get as many bytes as have been transferred across the network to us by the web server at the moment we call recv(). In this example, we get either 1460 or 2920 bytes each time we request up to 5120 bytes of data.
Your results may be different depending on your network speed. Also note that on the last call to recv() we get 1681 bytes, which is the end of the stream, and in the next call to recv() we get a zero-length bytes object that tells us that the server has called close() on its end of the socket and there is no more data forthcoming.
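We can watch the same end-of-stream behavior without a network connection. The sketch below assumes a pair of connected local sockets created with socket.socketpair() as a stand-in for the web server and our program; the names server_side and client_side and the 3000-byte payload are made up for illustration.

```python
import socket

# Two connected local sockets: one plays the server, one plays our program
server_side, client_side = socket.socketpair()

# The "server" sends 3000 bytes and then closes its end of the socket
server_side.sendall(b"x" * 3000)
server_side.close()

chunks = []
while True:
    # Ask for up to 5120 bytes; we may get fewer
    data = client_side.recv(5120)
    if len(data) < 1:
        # A zero-length result means the other end has closed
        break
    chunks.append(data)
client_side.close()

total = sum(len(chunk) for chunk in chunks)
print("Received", total, "bytes in", len(chunks), "call(s) to recv()")
```

The number of calls and the size of each chunk may vary from one system to another, but the total is always the number of bytes that were sent.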
We can slow down our successive recv() calls by uncommenting the call to time.sleep(). This way, we wait a quarter of a second after each call so that the server can "get ahead" of us and send more data to us before we call recv() again. With the delay in place, the program executes as follows:
    $ python urljpeg.py
    1460 1460
    5120 6580
    5120 11700
    ...
    5120 62900
    5120 68020
    2281 70301
    Header length 240
    HTTP/1.1 200 OK
    Date: Sat, 02 Nov 2013 02:22:04 GMT
    Server: Apache
    Last-Modified: Sat, 02 Nov 2013 02:01:26 GMT
    ETag: "19c141-111a9-4ea280f8354b8"
    Accept-Ranges: bytes
    Content-Length: 70057
    Connection: close
    Content-Type: image/jpeg
Now, other than the first and last calls to recv(), we get 5120 bytes each time we ask for new data.
There is a buffer between the server making send() requests and our application making recv() requests. When we run the program with the delay in place, at some point the server might fill up the buffer in the socket and be forced to pause until our program starts to empty the buffer. The pausing of either the sending application or the receiving application is called "flow control".
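Flow control can also be observed on a single computer. The following is a rough sketch under a few assumptions: a pair of local sockets from socket.socketpair() stands in for the server and our program, and the sender is put into non-blocking mode so that a full buffer shows up as a BlockingIOError instead of a pause. The names sender and receiver and the 4096-byte chunk size are made up for illustration, and the buffer size you see depends on your operating system.

```python
import socket

# A connected pair of local sockets stands in for the server and our program
sender, receiver = socket.socketpair()
sender.setblocking(False)   # raise an error instead of pausing when full

chunk = b"x" * 4096
sent = 0
try:
    while True:
        # send() returns the number of bytes that fit in the buffer
        sent += sender.send(chunk)
except BlockingIOError:
    # The buffer is full; a blocking sender would be paused at this point
    print("Buffer full after", sent, "bytes")

# The receiver empties the buffer ...
received = 0
while received < sent:
    received += len(receiver.recv(65536))

# ... and now the sender is able to make progress again
more = sender.send(chunk)
print("Sent", more, "more bytes after the receiver caught up")

sender.close()
receiver.close()
```

With an ordinary blocking socket, the sender would simply wait inside send() until the receiver made room, which is the pause described above.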