
1.16 Reading XML Files

XML files would be difficult to read if we had to read them the way we read a regular text file, especially because XML files are not line-oriented. They conform to the XML grammar, but that grammar says nothing about how the content is divided into lines. So instead of reading an XML file line by line, we use a special tool called a parser. A parser is written according to the rules of a grammar, in this case the XML grammar. Many XML parsers have been written, and different parsers have different features. The one we will use in this text is one of the simpler ones, called minidom. The parse function of the minidom module reads an entire XML file at once and places its contents into a Document object that holds a tree of Element objects. An Element object contains the child data and attributes of an XML element, along with any other elements nested inside it.
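To illustrate that the parser does not care how the text is broken into lines, here is a minimal sketch that parses the same small XML document written once on a single line and once spread over several lines, then lists the element names found in each resulting tree. It uses minidom's parseString function, which parses a string instead of a file; the helper element_names and the sample XML are hypothetical, made up for this illustration.

```python
import xml.dom.minidom

# Hypothetical sample: the same content on one line and spread over many lines.
one_line = '<GraphicsCommands><Command x="1.0" y="2.0">GoTo</Command></GraphicsCommands>'
many_lines = """<GraphicsCommands>
  <Command
      x="1.0"
      y="2.0">GoTo</Command>
</GraphicsCommands>"""

def element_names(node):
    """Hypothetical helper: tag names of all elements below node, in document order."""
    names = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            names.append(child.tagName)
            names.extend(element_names(child))
    return names

doc1 = xml.dom.minidom.parseString(one_line)
doc2 = xml.dom.minidom.parseString(many_lines)
print(element_names(doc1))  # ['GraphicsCommands', 'Command']
print(element_names(doc2))  # ['GraphicsCommands', 'Command']
```

Both calls produce the same tree of elements; only the whitespace text between the tags differs.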

To use the minidom parser, you must first import the module where the minidom parser is defined.

import xml.dom.minidom

Then you can read an entire XML file by calling the module's parse function with the name of the file, as follows.

xmldoc = xml.dom.minidom.parse(filename)
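The sketch below assumes a small, hypothetical graphics-commands file: it writes one to a temporary directory, passes its filename to xml.dom.minidom.parse, and checks the root element of the returned Document. The attribute names in the sample are assumptions, chosen to match the attributes read by the loop in Sect. 1.16.1.

```python
import os
import tempfile
import xml.dom.minidom

# Hypothetical file contents; attribute names are assumed, not taken from the book.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<GraphicsCommands>
  <Command x="0.0" y="0.0" width="1.0" color="#000000">GoTo</Command>
  <Command radius="20.0" width="1.0" color="#ff0000">Circle</Command>
</GraphicsCommands>
"""

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "commands.xml")
    with open(filename, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    # parse reads the whole file and returns an in-memory Document,
    # so xmldoc stays usable after the temporary file is deleted.
    xmldoc = xml.dom.minidom.parse(filename)

print(xmldoc.documentElement.tagName)  # GraphicsCommands
```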

Once you have done that, you can read a specific type of element from the XML file by calling the getElementsByTagName method on the resulting document. For instance, to get the GraphicsCommands element from the graphics commands XML file, you would write the following.

graphicsCommands = xmldoc.getElementsByTagName("GraphicsCommands")[0]

The XML document contains a single GraphicsCommands element. Calling getElementsByTagName on the document with the tag name "GraphicsCommands" returns a list of all elements with that tag name. Since we know there is only one such element in the file, we write [0] to get the first (and only) element from the list. The graphicsCommands variable then refers to that single element, and all the Command elements of the file are nested within it. To go through all of these Command elements, we can use a for loop as in the code in Sect. 1.16.1.
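One reason to ask for elements by tag name rather than walking an element's children directly is that the children also include the whitespace text between the tags. The minimal sketch below, using a hypothetical three-line document, shows both views of the same GraphicsCommands element.

```python
import xml.dom.minidom

# Hypothetical document with two Command elements on separate lines.
xmldoc = xml.dom.minidom.parseString("""<GraphicsCommands>
  <Command>PenUp</Command>
  <Command>PenDown</Command>
</GraphicsCommands>""")

# getElementsByTagName returns a list-like NodeList; [0] takes the first match.
graphicsCommands = xmldoc.getElementsByTagName("GraphicsCommands")[0]

# childNodes also holds the whitespace text nodes between the tags...
kinds = [child.nodeType == child.ELEMENT_NODE for child in graphicsCommands.childNodes]
print(kinds)  # [False, True, False, True, False]

# ...while asking for Command elements by name returns only the elements.
commands = graphicsCommands.getElementsByTagName("Command")
print([c.firstChild.data for c in commands])  # ['PenUp', 'PenDown']
```

Note that getElementsByTagName searches all descendants, not just direct children, which is fine here because Command elements are not nested inside one another.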

1.16.1 Using an XML Parser

# This loop runs inside a method of a list-like class (it calls self.append);
# GoToCommand, CircleCommand, and the other command classes are defined
# elsewhere in the program.
for commandElement in graphicsCommands.getElementsByTagName("Command"):
    # The command's name is the text inside the Command element.
    command = commandElement.firstChild.data.strip()
    # Dictionary-like map from attribute names to Attr objects.
    attr = commandElement.attributes

    if command == "GoTo":
        x = float(attr["x"].value)
        y = float(attr["y"].value)
        width = float(attr["width"].value)
        color = attr["color"].value.strip()
        cmd = GoToCommand(x, y, width, color)

    elif command == "Circle":
        radius = float(attr["radius"].value)
        width = float(attr["width"].value)
        color = attr["color"].value.strip()
        cmd = CircleCommand(radius, width, color)

    elif command == "BeginFill":
        color = attr["color"].value.strip()
        cmd = BeginFillCommand(color)

    elif command == "EndFill":
        cmd = EndFillCommand()

    elif command == "PenUp":
        cmd = PenUpCommand()

    elif command == "PenDown":
        cmd = PenDownCommand()

    else:
        raise RuntimeError("Unknown Command: " + command)

    self.append(cmd)

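In the code in Sect. 1.16.1 the attr variable is a dictionary-like object (a NamedNodeMap) that maps each attribute name (i.e. key) to an Attr object. The attribute's value, always a string, is stored in that object's value field, which is why the code writes attr["x"].value and converts it with float where a number is needed. The child data of a Command node can be found by looking at firstChild.data for the node. The strip method removes any unwanted leading and trailing blanks, tabs, or newline characters from the string.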
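The following minimal sketch, built on a single hypothetical Command element, shows these pieces in isolation: the attributes map, the value field of each attribute, and strip applied to both an attribute value and the element's text.

```python
import xml.dom.minidom

# Hypothetical Command element with padded attribute text and an indented body.
element = xml.dom.minidom.parseString(
    '<Command x="10.5" y="-3" color=" #00ff00 ">\n  GoTo\n</Command>'
).documentElement

attr = element.attributes                  # NamedNodeMap: name -> Attr object
print(sorted(attr.keys()))                 # ['color', 'x', 'y']
x = float(attr["x"].value)                 # Attr.value is always a string
color = attr["color"].value.strip()        # drop the surrounding blanks
command = element.firstChild.data.strip()  # text child without newlines/indent
print(x, color, command)                   # 10.5 #00ff00 GoTo
```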

 