Interview Questions

How can I allow the prefixes in my document to be different from the prefixes in my DTD?

XML Interview Questions and Answers


(Continued from previous question...)

92. How can I allow the prefixes in my document to be different from the prefixes in my DTD?

One of the problems with the solution proposed in question is that it requires the prefixes in the document to match those in the DTD. Fortunately, there is a workaround for this problem, although it does require that a single prefix be used for a particular namespace URI throughout the document. (This is a good practice anyway, so it's not too much of a restriction.) The solution assumes that you are using a DTD that is external to the document, which is common practice.
To use different prefixes in the external DTD and XML documents, you declare the prefix with a pair of parameter entities in the DTD. You can then override these entities with declarations in the internal DTD in a given XML document. This works because the internal DTD is read before the external DTD and the first definition of a particular entity is the one that is used. The following paragraphs describe how to use a single namespace in your DTD. You will need to modify them somewhat to use multiple namespaces.
To start with, declare three parameter entities in your DTD:

<!ENTITY % p "" >
<!ENTITY % s "" >
<!ENTITY % nsdecl "xmlns%s;" >

The p entity ("p" is short for "prefix") is used in place of the actual prefix in element type and attribute names. The s entity ("s" is short for "suffix") is used in place of the actual prefix in namespace declarations. The nsdecl entity ("nsdecl" is short for "namespace declaration") is used in place of the name of the xmlns attribute in declarations of that attribute.
Now use the p entity to define parameter entities for each of the names in your namespace. For example, suppose element type names A, B, and C and attribute name D are in your namespace.

<!ENTITY % A "%p;A">
<!ENTITY % B "%p;B">
<!ENTITY % C "%p;C">
<!ENTITY % D "%p;D">

Next, declare your element types and attributes using the "name" entities, not the actual names. For example:

<!ELEMENT %A; ((%B;)*, %C;)>
<!ATTLIST %A;
%nsdecl; CDATA "http://www.google.org/">
<!ELEMENT %B; EMPTY>
<!ATTLIST %B;
%D; NMTOKEN #REQUIRED
E CDATA #REQUIRED>
<!ELEMENT %C; (#PCDATA)>

There are several things to notice here.
* Attribute D is in a namespace, so it is declared with a "name" entity. Attribute E is not in a namespace, so no entity is used.
* The nsdecl entity is used to declare the xmlns attribute. (xmlns attributes must be declared on every element type on which they can occur.) Note that a default value is given for the xmlns attribute.
* The reference to element type B in the content model of A is placed inside parentheses. The reason for this is that a modifier -- * in this case -- is applied to it. Using parentheses is necessary because the replacement values of parameter entities are padded with spaces; directly applying the modifier to the parameter entity reference would result in illegal syntax in the content model.
For example, suppose the value of the A entity is "google:A", the value of the B entity is "google:B", and the value of the C entity is "google:C". The declaration:
<!ELEMENT %A; (%B;*, %C;)>
would resolve to:
<!ELEMENT google:A ( google:B *, google:C )>

This is illegal because the * modifier must directly follow the reference to the google:B element type. By placing the reference to the B entity in parentheses, the declaration resolves to:

<!ELEMENT google:A (( google:B )*, google:C )>

This is legal because the * modifier directly follows the closing parenthesis.

Now let's see how this all works. Suppose our XML document won't use prefixes, but instead wants the default namespace to be the http://www.google.org/ namespace. In this case, no entity declarations are needed in the document. For example, our document might be:

<!DOCTYPE A SYSTEM "http://www.google.org/google.dtd">
<A>
<B D="bar" E="baz buz" />
<B D="boo" E="biz bez" />
<C>bizbuz</C>
</A>

This document is valid because the declarations for p, s, and nsdecl in the DTD set p and s to "" and nsdecl to "xmlns". That is, after replacing the p, s, and nsdecl parameter entities, the DTD is as follows. Notice that both the DTD and document use the element type names A, B, and C and the attribute names D and E.
<!ELEMENT A (( B )*, C )>
<!ATTLIST A
xmlns CDATA "http://www.google.org/">
<!ELEMENT B EMPTY>
<!ATTLIST B
D NMTOKEN #REQUIRED
E CDATA #REQUIRED>
<!ELEMENT C (#PCDATA)>

But what if the document wants to use a different prefix, such as google? In this case, the document must override the declarations of the p and s entities in its internal DTD. That is, it must declare these entities so that they use google as a prefix (followed by a colon) and a suffix (preceded by a colon). For example:

<!DOCTYPE google:A SYSTEM "http://www.google.org/google.dtd" [
<!ENTITY % p "google:">
<!ENTITY % s ":google">
]>
<google:A>
<google:B google:D="bar" E="baz buz" />
<google:B google:D="boo" E="biz bez" />
<google:C>bizbuz</google:C>
</google:A>

In this case, the internal DTD is read before the external DTD, so the values of the p and s entities from the document are used. Thus, after replacing the p, s, and nsdecl parameter entities, the DTD is as follows. Notice that both the DTD and document use the element type names google:A, google:B, and google:C and the attribute names google:D and E.

<!ELEMENT google:A (( google:B )*, google:C )>
<!ATTLIST google:A
xmlns:google CDATA "http://www.google.org/">
<!ELEMENT google:B EMPTY>
<!ATTLIST google:B
google:D NMTOKEN #REQUIRED
E CDATA #REQUIRED>
<!ELEMENT google:C (#PCDATA)>

(Continued on next question...)

Other Interview Questions