XML is a widely-used text-based format for data as well as instructions. This format arranges information in a hierarchical fashion.
An XML document is made of tags. There are two types of tags: start and end. A start tag has the form <tagname> and an end tag has the form </tagname>, where tagname is the name of the tag.
1. Each start tag has a corresponding end tag, with the same tag name. For example:
<html>This is an HTML page</html>
2. As the '<' and '>' characters are used to start and end a tag, they are reserved (i.e. they cannot be included in a valid XML input except for this purpose).
3. Between a start tag and its corresponding end tag, information can be in the form of (a) another tag pair or (b) an arbitrary string or both. For example
4. Tags must be properly nested. That is, if tag A starts before tag B, then B must end before A. Thus B is completely enclosed within A. For example:
5. A tag name can contain only the characters 'a-z', 'A-Z', '0-9', ':', '_', and '-'. However, a tag name cannot start with a digit or '-'.
Valid examples include:
Invalid examples include:
6. The outermost tag pair is called the root. Every valid XML document has exactly one root. For example, the following is invalid:
This is a body
This is another body
7. Whitespace characters (space, tab) are allowed anywhere except inside tag names.
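Rule 5 can be checked one character at a time. The following is a minimal sketch only: the class TagNames and its method names are hypothetical, and treating an empty name as invalid is an assumption the rules above do not state explicitly.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class TagNames {
    private TagNames() {}

    // True for the characters rule 5 allows anywhere in a tag name.
    static boolean isNameChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || c == ':' || c == '_' || c == '-';
    }

    // True if the character may begin a tag name: allowed, but not a digit or '-'.
    static boolean isNameStart(char c) {
        return isNameChar(c) && !(c >= '0' && c <= '9') && c != '-';
    }

    // Assumption: an empty name is treated as invalid.
    static boolean isValid(String name) {
        if (name.isEmpty() || !isNameStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!isNameChar(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Splitting the check into isNameStart and isNameChar suits a parser that sees one character at a time, since it can reject a bad character immediately instead of waiting for the whole name.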
An XML parser is a program that reads text, parses it according to the rules above, and produces some result. This result may be as simple as reporting whether the given text forms valid XML or not.
When an XML document is transmitted over a network, it is received character by character. In this assignment you will implement a parser that receives text one character at a time, validates it, and produces additional textual output.
Note: The actual XML specification has many other features, but for this assignment, we will consider only the above features.
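One possible way to track matching end tags, proper nesting, and the single root (rules 1, 4 and 6) is a stack of open tag names. The sketch below makes several assumptions: the class OpenTags and its methods are hypothetical, it is handed complete tag names by some other code, and it throws IllegalStateException as a stand-in for whatever exception the real parser would use.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper that tracks open tags; names are illustrative only.
final class OpenTags {
    private final Deque<String> stack = new ArrayDeque<>();
    private boolean rootClosed = false;

    // Called when a complete start tag has been read.
    void open(String name) {
        if (rootClosed) {
            // Rule 6: nothing may open after the root has ended.
            throw new IllegalStateException("second root tag: " + name);
        }
        stack.push(name);
    }

    // Called when a complete end tag has been read.
    void close(String name) {
        // Rules 1 and 4: an end tag must match the most recently opened tag.
        if (stack.isEmpty() || !stack.peek().equals(name)) {
            throw new IllegalStateException("unexpected end tag: " + name);
        }
        stack.pop();
        if (stack.isEmpty()) {
            rootClosed = true;
        }
    }

    // True once the root tag pair has been opened and closed.
    boolean isComplete() {
        return rootClosed;
    }
}
```

Because only the most recently opened tag can be closed, a mismatched end tag is detected as soon as its name is complete.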
2.2 The XMLParser interface
You have been provided an XMLParser interface. This interface contains two methods:
· A method that takes a single character as input, and returns an XMLParser object that is the result of parsing this character along with all others input before it. The returned object makes it possible to chain inputs:
xmlObj.input('<').input('h').input('t')...
This method also throws a custom InvalidXMLException when the input character causes the inputs given thus far to be invalid XML.
· A method that returns the output of the parser as a String. The nature and format of this output depend on the implementation.
2.3 XML Validator
You must write an implementation XMLValidator that implements the provided XMLParser interface. This class acts as a validator of XML, reporting whether the characters given to it collectively form valid XML.
This implementation's behavior should have the following characteristics:
1. It should check all of the characteristics above regarding valid XML.
2. The output method should return a single word that represents the current status of the input provided thus far. If no inputs have been provided yet, the method should return "Status:Empty". If the inputs provided form complete, valid XML (i.e. all tag names are valid, each start tag has a corresponding end tag, tags are properly nested, root tag occurs only once) then the method should return "Status:Valid". If the inputs thus far can be part of valid XML but the data is not yet complete (e.g. part of any of the above valid examples) it should return "Status:Incomplete".
3. It should throw the InvalidXMLException at the input character that causes the XML to be invalid. For example, if the inputs are '<', 'h', 't', 'm', '<', '>' then the outputs after each of the first four characters should be "Status:Incomplete" and it should throw an exception with the fifth input. The parser becomes unusable after this (i.e. its behavior if inputs are continued is undefined).
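To illustrate the shape of the interface and how chained calls work, here is a rough sketch. Everything in it is an assumption: the interface and exception declared here are stand-ins for the provided ones, whose exact signatures may differ; output is a hypothetical name for the second method; and StubParser is deliberately incomplete, checking a single rule and never reporting "Status:Valid".

```java
// Stand-in for the provided exception; the real class may differ.
class InvalidXMLException extends Exception {
    InvalidXMLException(String message) {
        super(message);
    }
}

// Assumed shape of the provided interface; only input(...) is named in the text.
interface XMLParser {
    XMLParser input(char c) throws InvalidXMLException;

    String output(); // hypothetical method name
}

// Deliberately incomplete stub: it only rejects a '>' seen before any '<'.
final class StubParser implements XMLParser {
    private int count = 0;
    private boolean sawOpen = false;

    @Override
    public XMLParser input(char c) throws InvalidXMLException {
        if (c == '<') {
            sawOpen = true;
        } else if (c == '>' && !sawOpen) {
            throw new InvalidXMLException("'>' before any '<'");
        }
        count++;
        return this; // returning the parser itself is what allows chaining
    }

    @Override
    public String output() {
        // A real validator would also report "Status:Valid" when appropriate.
        return count == 0 ? "Status:Empty" : "Status:Incomplete";
    }
}
```

Returning this is one way to support chaining; returning a fresh object that carries the updated state would also fit the description of the method.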
2.4 XML Logger
An XML parser can be used to not only check the validity of XML input but also to parse and extract data from it.
You must write an XMLInfoLogger class that implements the XMLParser interface. This class provides a more elaborate output in the form of a log of the tags and data as it detects them.
This implementation’s behavior should have the following characteristics:
1. The output method should return a string that represents the parts of the input that have been successfully processed up to this point:
a. If a start tag
b. If an end tag
c. If there are characters that are not part of a tag name, it should add Characters: followed by those characters verbatim (including whitespace characters) to the output, all on one line unless the characters themselves include new lines. These characters are added only once they are followed by a valid start or end tag.
2. This class should check for all the validity constraints and throw exceptions in the same manner as specified in earlier sections.
*There should be a new line after the last line.*
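The requirement that characters are logged only once a valid tag follows them suggests holding them in a pending buffer. The sketch below is one way to do that, under stated assumptions: CharacterLog and its methods are hypothetical names, the caller supplies the already-formatted log line for each tag, no extra space is added after Characters:, and pending characters are written before the line for the tag that follows them.

```java
// Hypothetical helper for rule 1c; names are illustrative only.
final class CharacterLog {
    private final StringBuilder log = new StringBuilder();     // confirmed output
    private final StringBuilder pending = new StringBuilder(); // text awaiting a tag

    // Called for each character that is not part of a tag.
    void text(char c) {
        pending.append(c);
    }

    // Called once a valid start or end tag has been fully read.
    // tagLine is the log line for that tag, formatted by the caller.
    void tagCompleted(String tagLine) {
        if (pending.length() > 0) {
            // Assumption: buffered text is logged before the tag that follows it.
            log.append("Characters:").append(pending).append('\n');
            pending.setLength(0);
        }
        log.append(tagLine).append('\n');
    }

    // Everything logged so far; each line, including the last, ends in a new line.
    String output() {
        return log.toString();
    }
}
```

With this arrangement, text typed after the last completed tag stays out of the output until another tag is completed.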
For each of the examples below, the output represents the output after all the characters in the input have been entered. Please note that these inputs are partial (i.e. there may be more characters after them, which will change the output accordingly).
Highlight the outputs with the mouse to see whitespace characters.
This is a body
Characters: This is a body
This is \n a body <
*Output:* Started:html
*Input:* This is a body
Characters: This is a body
_ This is a heading
Characters: This is a heading
*You are not allowed to use any existing XML parsing classes in your implementations!*