Interview Questions

What is parsing and how do I do it in XML

XML Interview Questions and Answers


(Continued from previous question...)

47. What is parsing and how do I do it in XML

Parsing is the act of splitting up information into its component parts (schools used to teach this in language classes until the teaching profession collectively caught the anti-grammar disease).
‘Mary feeds Spot’ parses as
1. Subject = Mary, proper noun, nominative case
2. Verb = feeds, transitive, third person singular, present tense
3. Object = Spot, proper noun, accusative case
In computing, a parser is a program (or a piece of code or API that you can reference inside your own programs) which analyses files to identify the component parts. All applications that read input have a parser of some kind, otherwise they'd never be able to figure out what the information means. Microsoft Word contains a parser which runs when you open a .doc file and checks that it can identify all the hidden codes. Give it a corrupted file and you'll get an error message.
XML applications are just the same: they contain a parser which reads XML and identifies the function of each the pieces of the document, and it then makes that information available in memory to the rest of the program.
While reading an XML file, a parser checks the syntax (pointy brackets, matching quotes, etc) for well-formedness, and reports any violations (reportable errors). The XML Specification lists what these are.
Validation is another stage beyond parsing. As the component parts of the program are identified, a validating parser can compare them with the pattern laid down by a DTD or a Schema, to check that they conform. In the process, default values and datatypes (if specified) can be added to the in-memory result of the validation that the validating parser gives to the application.

<person corpid="abc123" birth="1960-02-31" gender="female"> <name> <forename>Judy</forename> <surname>O'Grady</surname> </name> </person>
The example above parses as: 1. Element person identified with Attribute corpid containing abc123 and Attribute birth containing 1960-02-31 and Attribute gender containing female containing ...
2. Element name containing ...
3. Element forename containing text ‘Judy’ followed by ...
4. Element surname containing text ‘O'Grady’
(and lots of other stuff too).
As well as built-in parsers, there are also stand-alone parser-validators, which read an XML file and tell you if they find an error (like missing angle-brackets or quotes, or misplaced markup). This is essential for testing files in isolation before doing something else with them, especially if they have been created by hand without an XML editor, or by an API which may be too deeply embedded elsewhere to allow easy testing.

(Continued on next question...)

Other Interview Questions