The technique described here assumes that one needs to render the infoset to XML text using standard printing, i.e., using no special XML library.
Hence, every namespace prefix binding must be explicitly represented in the output XML text.
-
For every element declaration, capture the lexical namespace scope (scala.xml.NamespaceBinding) from its element declaration XML in its schema document.
Save this on the runtime data structure for the element. In Runtime 1, this would be the DPathElementCompileInfo. (This has been Daffodil functionality since before version 1.0.0.)
-
Except on the root, remove any namespace binding that is unambiguous across the schema and that also appears on the root.
-
For each element declaration, the remaining namespace bindings and the prefix to be used are assigned based on the minimization rules described above (e.g., avoiding "tns" when possible).
That is all that is done at schema compile time and at parse time up to the point where a textual representation (such as XML) needs to be output.
The DFDL Infoset tree is constructed with InfosetElement nodes that point to this compile time DPathElementCompileInfo structure, and no processing of namespace bindings occurs.
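As a rough illustration of the compile-time pruning step (dropping from non-root elements any binding that is unambiguous across the schema and already present on the root), here is a minimal Python sketch. It is not Daffodil code: the dictionary model of bindings and the name `prune_against_root` are assumptions made for this example, and the prefix-minimization rules are not modeled.

```python
from typing import Dict, Set

# Hypothetical model: prefix -> namespace URI ("" would be the default namespace).
Bindings = Dict[str, str]


def prune_against_root(scopes: Dict[str, Bindings], root: str) -> Dict[str, Bindings]:
    """Drop from every non-root element each binding that the root also
    carries and whose prefix maps to a single URI schema-wide."""
    # Collect every URI each prefix is bound to anywhere in the schema.
    uris_by_prefix: Dict[str, Set[str]] = {}
    for bindings in scopes.values():
        for prefix, uri in bindings.items():
            uris_by_prefix.setdefault(prefix, set()).add(uri)

    root_bindings = scopes[root]
    pruned = {root: dict(root_bindings)}  # the root keeps everything
    for name, bindings in scopes.items():
        if name == root:
            continue
        pruned[name] = {
            prefix: uri
            for prefix, uri in bindings.items()
            # Keep the binding if it is ambiguous or missing from the root.
            if not (len(uris_by_prefix[prefix]) == 1
                    and root_bindings.get(prefix) == uri)
        }
    return pruned
```

In this sketch a binding whose prefix is ambiguous stays on every element that declares it, even where it matches the root; that remaining redundancy is left for the parent/child comparison performed when XML is output.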
However, when converting an infoset element into XML, examine the namespace bindings of the element and those of the enclosing parent element:
-
Any that are redundant across the two are dropped.
-
New definitions introduced by the child are output as bindings.
-
Redefinitions are output as bindings.
-
If the element has no namespace, and the parent (or any super-parent) has a default namespace binding, then add an undefine binding for the default namespace.
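The comparison rules in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Daffodil implementation: bindings are modeled as a plain prefix-to-URI dictionary, `in_scope` is assumed to hold the effective bindings inherited from all ancestors, and the function name `xmlns_attributes` is hypothetical.

```python
from typing import Dict, List
from xml.sax.saxutils import quoteattr

# Hypothetical model: prefix -> namespace URI; "" stands for the default namespace.
Bindings = Dict[str, str]


def xmlns_attributes(child: Bindings, in_scope: Bindings,
                     has_namespace: bool) -> List[str]:
    """Return the xmlns attributes to print on the child's start tag."""
    attrs = []
    for prefix, uri in sorted(child.items()):
        if prefix == "" and not has_namespace:
            continue  # a no-namespace element is handled separately below
        if in_scope.get(prefix) == uri:
            continue  # redundant: an ancestor already binds it identically
        # Either a new definition or a redefinition: both are output.
        name = f"xmlns:{prefix}" if prefix else "xmlns"
        attrs.append(f"{name}={quoteattr(uri)}")
    # A no-namespace element under a default namespace needs an undefine.
    if not has_namespace and in_scope.get(""):
        attrs.append('xmlns=""')
    return attrs
```

When descending to the element's own children, the inherited scope in this model would become `{**in_scope, **child}`, with the default namespace reset to the empty string if an undefine binding was emitted.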
This algorithm requires non-constant-time (worst case) processing at runtime; however, there is no overhead unless ambiguities among the namespace bindings force bindings to be output at nodes beneath the root.
In addition, the number of such cases in any real schema will be small, so the worst-case algorithmic complexity is far less important than the constant factor.
Attaching an element to the infoset is a common operation.
These namespace binding machinations have the potential to be as costly, per binding, as the general overhead of attaching the infoset element node.
Our standard design principle, however, is not to worry about overheads like this, which will often not occur in real schemas, unless performance profiling shows them to be a hot spot.
Sensibly-designed schemas will have no overhead from this namespace-binding combining.
Converting the DFDL Infoset to XML in One Pass (Streaming)
Note that eliminating prefix definitions that are unused in a particular XML document is not compatible with streaming.
It requires two passes to determine if a prefix is ever used to decide whether it can be omitted or must be included.
The only alternative to this is to introduce new namespace prefix definitions only at their point of use.
That would, however, be inconsistent with our goal of clarity and avoiding namespace prefix clutter in the schema.
It is preferable to output extra namespace bindings on the root element rather than litter the document with namespace bindings at interior XML elements.
Daffodil aspires to streaming parsing and unparsing. A streaming parser will output parts of the infoset without waiting to know if children will eventually appear that require the namespace prefix definitions.
As a result, all namespace prefix definitions which may be required are included.
Most commonly this will result in extra unused namespace prefix definitions having been output on the start tag of the root element.
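To make the two-pass requirement concrete, the following Python sketch computes the minimal set of root bindings for a document. It is only an illustration; the function names and the representation of the document as a sequence of qualified element names are assumptions. Note that it must see every element name before it can decide what belongs on the root start tag, which is exactly what a streaming converter cannot do.

```python
from typing import Dict, Iterable, Set


def used_prefixes(element_names: Iterable[str]) -> Set[str]:
    """Pass 1: scan every qualified element name in the whole document."""
    return {name.split(":", 1)[0] for name in element_names if ":" in name}


def minimal_root_bindings(all_bindings: Dict[str, str],
                          element_names: Iterable[str]) -> Dict[str, str]:
    """Keep only the bindings whose prefix is actually used.
    This needs the complete document before the root start tag is written."""
    used = used_prefixes(element_names)
    return {p: u for p, u in all_bindings.items() if p in used}
```

A streaming converter skips the first pass and writes every binding in `all_bindings` on the root.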
API XML-Fragment Mode - For Clarity: Avoid Namespace Bindings on the Root
When using the message streaming API and converting the parse infoset to XML, each message is created as XML text by the parser and its associated InfosetOutputter. Converting one relatively small message to XML may result in far more characters being used for the namespace bindings on the root of the message than for the rest of the message.
The API provides a method to enable XML-fragment mode.
In this mode, a method can be called to retrieve the namespace prefix bindings that would appear on the root (i.e., on the root XML element of each message) if a complete XML data document were being created.
Subsequent calls to parse in XML-Fragment mode create XML which has no namespace bindings on the root element of each message.
This is an XML fragment because it lacks the namespace bindings needed for it to be a complete XML document.
This is in effect leaving it up to the caller whether and when to append the namespace bindings to the XML text.
This option is provided because the caller may not want to construct complete XML documents, and the namespace bindings in use for the XML may be well-known by the processing system.
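A caller that does need a complete XML document can re-attach the bindings itself. The Python sketch below shows one way this might be done on the caller's side; it does not use any Daffodil API, and the function name `attach_root_bindings` and the dictionary of bindings are assumptions standing in for whatever the fragment-mode API returns.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def attach_root_bindings(fragment: str, bindings: dict) -> str:
    """Insert xmlns declarations into the start tag of the fragment's root."""
    attrs = "".join(
        f" xmlns:{p}={quoteattr(u)}" if p else f" xmlns={quoteattr(u)}"
        for p, u in sorted(bindings.items())
    )
    # Match the leading '<' and the root element's qualified name.
    match = re.match(r"\s*<([^\s/>]+)", fragment)
    if match is None:
        raise ValueError("fragment does not start with an element start tag")
    end = match.end()
    return fragment[:end] + attrs + fragment[end:]


# Usage: the bare fragment is not namespace well-formed, the result is.
fragment = "<gn:Feature><gn:name>x</gn:name></gn:Feature>"
bindings = {"gn": "http://www.geonames.org/ontology#"}
document = attach_root_bindings(fragment, bindings)
root = ET.fromstring(document)  # parses; the bare fragment would not
```

Because the bindings are the same for every message, a consumer can retrieve them once and apply them only to the messages it actually needs as standalone documents.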
This is not primarily a performance optimization: XML text is very verbose, and omitting the root bindings only scratches the surface.
Rather, this feature is about clarity and coping with XML when using it for small data documents corresponding to small communications messages.
Small XML data documents can be overwhelmed by the volume of namespace definitions.
This is particularly likely if they are for small data messages created from large DFDL schemas with many schema documents and many namespaces.
<gn:Feature xmlns:cc="http://creativecommons.org/ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:gn="http://www.geonames.org/ontology#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:wgs84_pos="http://www.w3.org/2003/01/geo/wgs84_pos#"><rdfs:isDefinedBy>sws/3/about.rdf</rdfs:isDefinedBy><gn:name>Zamīn Sūkhteh</gn:name><gn:alternateName><lang>fa</lang><name>Zamīn Sūkhteh</name></gn:alternateName><gn:alternateName><lang>fa</lang><name>زمين سوخته</name></gn:alternateName><gn:featureClass>ontology#S</gn:featureClass><gn:featureCode>ontology#S.CRRL</gn:featureCode><gn:countryCode>IR</gn:countryCode><wgs84_pos:lat>32.45831</wgs84_pos:lat><wgs84_pos:long>48.96335</wgs84_pos:long><gn:parentFeature>sws/3202991/</gn:parentFeature><gn:parentCountry>sws/130758/</gn:parentCountry><gn:parentADM1>sws/127082/</gn:parentADM1><gn:nearbyFeatures>sws/3/nearby.rdf</gn:nearbyFeatures><gn:locationMap>3/zamin-sukhteh.html</gn:locationMap></gn:Feature>
This is almost impossible to understand, given that the first third of it is just namespace bindings.
Without the namespace bindings it is easier to read. It looks like:
<gn:Feature><rdfs:isDefinedBy>sws/3/about.rdf</rdfs:isDefinedBy><gn:name>Zamīn Sūkhteh</gn:name><gn:alternateName><lang>fa</lang><name>Zamīn Sūkhteh</name></gn:alternateName><gn:alternateName><lang>fa</lang><name>زمين سوخته</name></gn:alternateName><gn:featureClass>ontology#S</gn:featureClass><gn:featureCode>ontology#S.CRRL</gn:featureCode><gn:countryCode>IR</gn:countryCode><wgs84_pos:lat>32.45831</wgs84_pos:lat><wgs84_pos:long>48.96335</wgs84_pos:long><gn:parentFeature>sws/3202991/</gn:parentFeature><gn:parentCountry>sws/130758/</gn:parentCountry><gn:parentADM1>sws/127082/</gn:parentADM1><gn:nearbyFeatures>sws/3/nearby.rdf</gn:nearbyFeatures><gn:locationMap>3/zamin-sukhteh.html</gn:locationMap></gn:Feature>
With line endings after each element end tag, it is quite easy to understand.
<gn:Feature><rdfs:isDefinedBy>sws/3/about.rdf</rdfs:isDefinedBy>
<gn:name>Zamīn Sūkhteh</gn:name>
<gn:alternateName><lang>fa</lang><name>Zamīn Sūkhteh</name></gn:alternateName>
<gn:alternateName><lang>fa</lang><name>زمين سوخته</name></gn:alternateName>
<gn:featureClass>ontology#S</gn:featureClass>
<gn:featureCode>ontology#S.CRRL</gn:featureCode>
<gn:countryCode>IR</gn:countryCode>
<wgs84_pos:lat>32.45831</wgs84_pos:lat>
<wgs84_pos:long>48.96335</wgs84_pos:long>
<gn:parentFeature>sws/3202991/</gn:parentFeature>
<gn:parentCountry>sws/130758/</gn:parentCountry>
<gn:parentADM1>sws/127082/</gn:parentADM1>
<gn:nearbyFeatures>sws/3/nearby.rdf</gn:nearbyFeatures>
<gn:locationMap>3/zamin-sukhteh.html</gn:locationMap>
</gn:Feature>
The XML Infoset Inputter also has a feature allowing an API method to supply the root-level namespace bindings once, rather than on the root element of every XML fragment delivered for unparsing.
The symmetry of the API ensures that one can unparse the XML output from a parse that creates XML in this fragment mode.
The Daffodil CLI has an option to add XML-fragment mode to message streaming behavior for parsing and unparsing.