While the file handle does not contain the data for the file, it is quite easy to construct a
for loop to read through and count each of the lines in a file:
    fhand = open('mbox-short.txt')
    count = 0
    for line in fhand:
        count = count + 1
    print('Line Count:', count)

    # Code: http://www.py4e.com/code3/open.py
We can use the file handle as the sequence in our
for loop. Our
for loop simply counts the number of lines in the file, and the program then prints the total. The rough translation of the
for loop into English is, "for each line in the file represented by the file handle, add one to the
count variable."
The reason that the
open function does not read the entire file is that the file might be quite large with many gigabytes of data. The
open call takes roughly the same amount of time regardless of the size of the file. The
for loop actually causes the data to be read from the file.
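As a minimal sketch of this behavior, the following example asks the file handle for lines only when they are needed. The file name sample.txt and its three lines are assumptions made up for illustration; the file is created in a temporary directory so the example runs on its own.

```python
import os
import tempfile

# Hypothetical sample file, created here so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as out:
    out.write('first line\nsecond line\nthird line\n')

fhand = open(path)      # returns a handle; no lines have been requested yet
first = next(fhand)     # ask the handle for exactly one line
rest = list(fhand)      # the remaining lines are read only now
fhand.close()

print(repr(first))
print(len(rest))
```

The call to next is what a for loop does behind the scenes on each iteration, which is why the loop, and not open, is where the data actually gets read.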
When the file is read using a
for loop in this manner, Python takes care of splitting the data in the file into separate lines using the newline character. Python reads each line through the newline and includes the newline as the last character in the
line variable for each iteration of the
for loop. Because the
for loop reads the data one line at a time, it can efficiently read and count the lines in very large files without running out of main memory to store the data. The above program can count the lines in a file of any size using very little memory, since each line is read, counted, and then discarded.
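To see the trailing newline for ourselves, we can print each line with repr, which shows the newline as \n instead of moving to the next line. This is only a sketch: the file sample.txt, its contents, and the names count_lines and lines_seen are assumptions introduced for illustration.

```python
import os
import tempfile

def count_lines(path):
    """Count lines by looping over the file handle, one line at a time."""
    count = 0
    with open(path) as fhand:       # 'with' closes the file when the block ends
        for line in fhand:
            count = count + 1       # each line is counted, then discarded
    return count

# Hypothetical sample file, created here so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as out:
    out.write('From a\nTo b\n')

lines_seen = []
with open(path) as fhand:
    for line in fhand:
        lines_seen.append(line)
        # repr shows the newline that Python keeps at the end of each line;
        # rstrip('\n') returns a copy of the line without it.
        print(repr(line), '->', repr(line.rstrip('\n')))

print('Line Count:', count_lines(path))
```

Note that lines_seen keeps every line in memory, which is fine for a two-line file but would defeat the purpose for a very large one. The count_lines function keeps only the current line and the running count.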
If you know the file is relatively small compared to the size of your main memory, you can read the whole file into one string using the
read method on the file handle.
    >>> fhand = open('mbox-short.txt')
    >>> inp = fhand.read()
    >>> print(len(inp))
    94626
    >>> print(inp[:20])
    From stephen.marquar
In this example, the entire contents (all 94,626 characters) of the file
mbox-short.txt are read directly into the variable
inp. We use string slicing to print out the first 20 characters of the string data stored in
inp.
When the file is read in this manner, all the characters, including all of the line contents and newline characters, are one big string in the variable inp. Remember that reading a whole file with the
read method should only be done if the file data will fit comfortably in the main memory of your computer.
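A small sketch can make the "one big string" idea concrete. Here the file sample.txt and its two lines are assumptions made up for illustration, chosen small enough that the lengths can be checked by hand.

```python
import os
import tempfile

# Hypothetical sample file, created here so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as out:
    out.write('From a\nTo b\n')

fhand = open(path)
inp = fhand.read()              # the whole file as a single string
fhand.close()

print(len(inp))                 # counts every character, newlines included
print(inp.count('\n'))          # the newlines are ordinary characters in inp
print(repr(inp[:4]))            # slicing works as on any other string
```

Because the newlines are still inside the string, counting them is one way to recover the number of lines after the file has been read in one piece.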
If the file is too large to fit in main memory, you should write your program to read the file in chunks using a