
3.8 Reading Records from a File

A file frequently contains groups of lines that relate to each other in some way. For example, consider an address book program. Each entry in your address book may contain a last name, first name, street, city, zip code, home phone number, and mobile number. Typically, each of these pieces of information would be stored on a separate line in a file. A program that reads such a file needs to read all the lines of an entry together, and a for loop will not suffice. A while loop can do the job. A while loop looks like this:

<statements before while loop>
while <condition>:
    <body of while loop>
<statements after the while loop>

The condition of the while loop is evaluated first. If the condition evaluates to true, then the body of the while loop is executed. The condition is then evaluated again, and if it is still true, the body is executed again. The body of the while loop is repeated until the condition evaluates to false. If the condition evaluates to false the first time, the body of the while loop is never executed, as depicted graphically in Fig. 3.7.
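As a small illustration of these rules, the sketch below counts how many times a number can be halved while it is still at least 1. The function name count_halvings and the sample values are made up for this illustration.

```python
def count_halvings(n):
    """Count how many times n can be halved while it is still at least 1."""
    count = 0
    # The condition is checked before every pass through the body.
    while n >= 1:
        n = n / 2
        count = count + 1
    # Execution continues here once the condition evaluates to false.
    return count


print(count_halvings(8))  # the body runs four times, for 8, 4, 2 and 1
print(count_halvings(0))  # the condition is false at once, so the body never runs
```

The second call shows the case described above: the condition is false on the first check, so the function returns 0 without ever entering the body.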

Fig. 3.7 A While Loop

A while loop is used to read records from a file that are composed of multiple lines. A for loop will not suffice because a for loop only reads one line per iteration. Since multiple lines must be read, a while loop gives you the extra control you need. To read a multi-line record from a file we can use this pattern:

<read first line of first record>
while <line> != "":
    <read the rest of the record>
    <process the record>
    <read the first line of the next record>
<close the file>

This pattern can be illustrated by looking at part of an address book application where each address book record occupies six lines of a file.

Example 3.14 Here is a program that counts the number of entries in your phonebook. This assumes that the file looks something like the following:

Lie
Sophus
2234 Valdres Rd
Decorah, IA 52101
777-555-1234
777-554-4765
Lee
Kent D.
700 College Drive
Decorah, IA 52101
777-555-1212
777-554-0789
...

To read this file and count the entries the code would look like this:

 1  phonebook = open("addressbook.txt", "r")
 2  numEntries = 0
 3  # reads the first line of the first record
 4  lastName = phonebook.readline().rstrip()
 5  while lastName != "":
 6      # when the file is completely read the lastName string
 7      # will be empty. Since the lastName wasn't an empty
 8      # string, read the rest of the record.
 9      firstName = phonebook.readline().rstrip()
10      street = phonebook.readline().rstrip()
11      citystatezip = phonebook.readline().rstrip()
12      homephone = phonebook.readline().rstrip()
13      mobilephone = phonebook.readline().rstrip()
14
15      # Process the record by adding to the accumulator
16      numEntries = numEntries + 1
17
18      # Read the first line of the next record
19      lastName = phonebook.readline().rstrip()
20
21  print("You have", numEntries, "entries in your address book.")
22  phonebook.close()

The code in Example 3.14 reads the first line of a record, or at least it tries to. Every opened file has a current position that is set to the beginning of the file when the file is opened. As lines are read from the file, the current position advances through the file. When the current position reaches the end of the file, the program in Example 3.14 will attempt to read one more line on either line 4 or line 19, depending on whether the file is empty or not. When the current position is at the end of the file and the program attempts to read a line, the lastName variable will refer to an empty string. This is the indication in Python that the current position is at the end of file, sometimes abbreviated EOF. When this happens the code exits the while loop and prints the output on line 21.

If the lastName variable is not empty, then the code assumes that because one line was present, all six lines will be present in the file. The code depends on each record being a six-line record in the input file called addressbook.txt.
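The behavior of readline at the end of a file can be checked directly. The sketch below is only an illustration: it writes a two-line temporary file (the file contents and the name read_until_eof are assumptions made for this example, not part of the address book program) and keeps calling readline until the empty string comes back.

```python
import os
import tempfile


def read_until_eof(path):
    """Return every string readline produces, including the final empty one."""
    results = []
    with open(path, "r") as infile:
        line = infile.readline()
        results.append(line)
        while line != "":
            line = infile.readline()
            results.append(line)
    return results


# Build a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Lie\nSophus\n")
    sample_path = tmp.name

lines_read = read_until_eof(sample_path)
os.remove(sample_path)
print(lines_read)  # ['Lie\n', 'Sophus\n', '']
```

The last element of the list is the empty string that signals the end of file. The earlier elements still carry their newline characters because rstrip was not applied here.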

When you read a line from a file using the readline method, you get not only the data on that line but also the newline character at the end of the line. Calling the rstrip method on the string returned by readline strips away any whitespace, including the newline, from the right end of the string. If you need to look at the data at all, you probably don't want the newline character on the end of each line of the record.
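To see what rstrip removes, the following sketch applies it to strings like the ones readline returns. The sample strings are made up for illustration. The second and third cases show a consequence worth knowing: a blank line in the middle of a file is read as "\n", and after rstrip it becomes the empty string, so the programs in this section would treat it the same way as the end of the file.

```python
raw_line = "2234 Valdres Rd\n"   # what readline returns for a line of data
blank_line = "\n"                # what readline returns for an empty line
end_of_file = ""                 # what readline returns at the end of file

print(repr(raw_line.rstrip()))     # '2234 Valdres Rd'
print(repr(blank_line.rstrip()))   # ''
print(repr(end_of_file.rstrip()))  # ''

# rstrip() removes all trailing whitespace, including spaces and tabs.
# Passing "\n" removes only trailing newline characters.
print(repr("Decorah, IA 52101  \n".rstrip("\n")))  # 'Decorah, IA 52101  '
```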

Whether you are writing code in Python or some other language, this Reading Records From a File pattern comes up over and over again. It is sometimes called the loop and a half problem. The idea is that you must attempt to read a line from the file before you know whether you are at the end of file or not. This can also be done by introducing a boolean variable to control the while loop. The boolean variable serves as the condition that gets you out of the while loop, and it must be initialized so that the body of the while loop executes at least once.

Example 3.15 As with nearly every program, there is more than one way to do the same thing. The loop and a half code can be written differently as well. Here is another variation that, while slightly different, accomplishes the same thing as Example 3.14.

phonebook = open("addressbook.txt", "r")
numEntries = 0
eof = False
while not eof:
    # when the file is completely read the lastName string
    # will be empty. So will the other lines, but if the
    # lastName is empty then we know not to process the record.
    lastName = phonebook.readline().rstrip()
    firstName = phonebook.readline().rstrip()
    street = phonebook.readline().rstrip()
    citystatezip = phonebook.readline().rstrip()
    homephone = phonebook.readline().rstrip()
    mobilephone = phonebook.readline().rstrip()

    # if lastName is empty then we didn't really read a record.
    if lastName != "":
        # Process the record by adding to the accumulator
        numEntries = numEntries + 1
    else:
        eof = True

print("You have", numEntries, "entries in your address book.")
phonebook.close()

Examples 3.14 and 3.15 do exactly the same thing. They each perform a loop and a half. The half part is one half of the body of the loop. In Example 3.14 this was reading the lastName variable before the loop started. In Example 3.15 this was the first half of the body of the while loop. Some may feel one is easier to memorize than the other. Some experienced programmers may even prefer another way of writing the loop and a half. The important thing is that one of these patterns should be memorized. You can use it any time you need to read multi-line records from a file.
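As one example of another way of writing the loop and a half, the sketch below uses while True together with a break statement, so that the first line of a record is read in only one place and no boolean variable is needed. It is offered as an illustration, not as the approach taken in Examples 3.14 and 3.15. The function name count_entries and the sample file built at the bottom are assumptions made for this sketch.

```python
import os
import tempfile


def count_entries(path):
    """Count six-line records, leaving the loop as soon as a read comes back empty."""
    num_entries = 0
    with open(path, "r") as phonebook:
        while True:
            last_name = phonebook.readline().rstrip()
            if last_name == "":
                break  # end of file (or a blank line): stop reading
            # Read the remaining five lines of the record.
            first_name = phonebook.readline().rstrip()
            street = phonebook.readline().rstrip()
            city_state_zip = phonebook.readline().rstrip()
            home_phone = phonebook.readline().rstrip()
            mobile_phone = phonebook.readline().rstrip()
            num_entries = num_entries + 1
    return num_entries


# Write a two-record sample file so the sketch can be run on its own.
sample = (
    "Lie\nSophus\n2234 Valdres Rd\nDecorah, IA 52101\n777-555-1234\n777-554-4765\n"
    "Lee\nKent D.\n700 College Drive\nDecorah, IA 52101\n777-555-1212\n777-554-0789\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

entry_count = count_entries(sample_path)
os.remove(sample_path)
print("You have", entry_count, "entries in your address book.")
```

The with statement closes the file automatically when the block ends, which takes the place of an explicit call to close.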

William Edwards Deming was a mathematician and consultant who is widely recognized as an important contributor to the rebuilding of Japan after the Second World War [15]. One of his principles emphasized that you should not repeat the same process in more than one location. In Computer Science this translates to “You should avoid writing the same code in more than one location in your program”. If you write code more than once and have to make a change later, you have to remember to change it in every location. If you've only written the code once, you only have to remember to change it in that one location. Copying code within your program increases the risk of introducing a bug by changing only some of the locations and not all of them when new functionality is being added or when a bug is being fixed.

This guiding principle should be followed whenever possible. Example 3.14 appears to violate this principle with one line of repeated code. That's the tradeoff for not having to include an extra if statement in the body of the while loop as was done in Example 3.15.
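The principle can be applied to record reading by putting the code that reads one record into a single function, so that a change to the record layout only has to be made in one place. The following is a minimal sketch under several assumptions: the names read_record, read_all_records, and LINES_PER_RECORD are invented for this illustration, the := operator requires Python 3.8 or later, and io.StringIO stands in for an opened addressbook.txt so the code can be run on its own.

```python
import io

LINES_PER_RECORD = 6  # last name, first name, street, city/state/zip, two phones


def read_record(infile):
    """Read one record; return None once the end of the file is reached."""
    last_name = infile.readline().rstrip()
    if last_name == "":
        return None
    rest = [infile.readline().rstrip() for _ in range(LINES_PER_RECORD - 1)]
    return [last_name] + rest


def read_all_records(infile):
    """Collect every record; the reading logic appears only in read_record."""
    records = []
    # The := operator (Python 3.8+) reads and tests in one expression.
    while (record := read_record(infile)) is not None:
        records.append(record)
    return records


# io.StringIO stands in for an opened addressbook.txt in this sketch.
sample = io.StringIO(
    "Lie\nSophus\n2234 Valdres Rd\nDecorah, IA 52101\n777-555-1234\n777-554-4765\n"
    "Lee\nKent D.\n700 College Drive\nDecorah, IA 52101\n777-555-1212\n777-554-0789\n"
)
records = read_all_records(sample)
print("You have", len(records), "entries in your address book.")
print([record[0] for record in records])  # ['Lie', 'Lee']
```

Both the count and the list of last names are produced from the same records, so neither task needs its own copy of the reading code.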

 