package sapi
Provides the classes necessary to compile DFDL schemas, parse and unparse files using the compiled objects, and retrieve results and parsing diagnostics
Overview
The Daffodil object is a factory object to create a Compiler. The Compiler provides a method to compile a provided DFDL schema into a ProcessorFactory, which creates a DataProcessor:
val c = Daffodil.compiler()
val pf = c.compileFile(file)
val dp = pf.onPath("/")
The DataProcessor provides the necessary functions to parse and unparse data, returning a ParseResult or UnparseResult, respectively. These contain information about the parse/unparse, such as whether the processing succeeded, along with any diagnostic information.
The DataProcessor also provides two functions that can be used to perform parsing/unparsing via the SAX API. The first creates a DaffodilParseXMLReader which is used for parsing, and the second creates a DaffodilUnparseContentHandler which is used for unparsing.
val xmlReader = dp.newXMLReaderInstance
val unparseContentHandler = dp.newContentHandlerInstance(output)
The DaffodilParseXMLReader has several methods that allow one to set properties and handlers (such as ContentHandlers or ErrorHandlers) for the reader. One can use any contentHandler/errorHandler as long as they implement the org.xml.sax.ContentHandler and org.xml.sax.ErrorHandler interfaces, respectively. One can also set properties for the DaffodilParseXMLReader using DaffodilParseXMLReader.setProperty.
The properties can be set as follows:
The constants below have literal values starting with "urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:sax:" and ending with "BlobDirectory", "BlobPrefix" and "BlobSuffix" respectively.
xmlReader.setProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_BLOBDIRECTORY, Paths.get(System.getProperty("java.io.tmpdir"))) // value type: java.nio.file.Path
xmlReader.setProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_BLOBPREFIX, "daffodil-sax-") // value type: String
xmlReader.setProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_BLOBSUFFIX, ".bin") // value type: String
The properties can be retrieved using the same variables with DaffodilParseXMLReader.getProperty and casting to the appropriate type as listed above.
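As a rough illustration of how a directory, prefix, and suffix of these value types conventionally combine into a file name, the sketch below uses the JDK's Files.createTempFile. The names BlobNaming, newBlobFile, and demoFileName are hypothetical, and whether Daffodil creates its blob files with this exact call is an assumption, not something stated here.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper, not part of the Daffodil API.
object BlobNaming {
  // Creates an empty file named <prefix><random><suffix> inside `dir`.
  def newBlobFile(dir: Path, prefix: String, suffix: String): Path =
    Files.createTempFile(dir, prefix, suffix)

  // Uses the same example values as the property settings above,
  // returns the generated file name, and removes the file again.
  def demoFileName(): String = {
    val dir: Path = Paths.get(System.getProperty("java.io.tmpdir"))
    val blob = newBlobFile(dir, "daffodil-sax-", ".bin")
    try blob.getFileName.toString
    finally Files.deleteIfExists(blob)
  }
}
```

Note that Paths.get returns a java.nio.file.Path, which is the value type expected for the blob directory property.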
The handlers can be set as follows:
xmlReader.setContentHandler(contentHandler)
xmlReader.setErrorHandler(errorHandler)
The handlers above must implement the following interfaces respectively:
org.xml.sax.ContentHandler
org.xml.sax.ErrorHandler
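A minimal sketch of a handler that satisfies both interfaces is shown below. It extends org.xml.sax.helpers.DefaultHandler, which implements ContentHandler and ErrorHandler, counts start-element events, and collects reported errors. The names ElementCountingHandler and HandlerDemo are hypothetical, and the demo drives the handler with the JDK's own SAX parser rather than a DaffodilParseXMLReader so that it runs without Daffodil on the classpath.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler
import scala.collection.mutable.ListBuffer

// Hypothetical handler: DefaultHandler implements both ContentHandler and ErrorHandler.
class ElementCountingHandler extends DefaultHandler {
  var elementCount = 0
  val errors = ListBuffer.empty[String]

  // Called once per element start event.
  override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit =
    elementCount += 1

  // Record recoverable errors instead of throwing, so the caller can inspect them afterwards.
  override def error(e: SAXParseException): Unit =
    errors += e.getMessage
}

object HandlerDemo {
  // Drives the handler with the JDK SAX parser (an assumption made for runnability).
  def countElements(xml: String): Int = {
    val factory = SAXParserFactory.newInstance()
    factory.setNamespaceAware(true)
    val reader = factory.newSAXParser().getXMLReader
    val handler = new ElementCountingHandler
    reader.setContentHandler(handler)
    reader.setErrorHandler(handler)
    reader.parse(new InputSource(new StringReader(xml)))
    handler.elementCount
  }
}
```

The same handler instance could presumably be passed to both setContentHandler and setErrorHandler on a DaffodilParseXMLReader, since it implements both interfaces.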
The ParseResult can be found as a property within the DaffodilParseXMLReader using this URI: "urn:ogf:dfdl:2013:imp:daffodil.apache.org:2018:sax:ParseResult" or the constant DaffodilParseXMLReader.DAFFODIL_SAX_URN_PARSERESULT.
In order for a successful unparse to happen, the SAX API requires the unparse to be kicked off by a parse call to any org.xml.sax.XMLReader implementation that has the DaffodilUnparseContentHandler registered as its content handler. To retrieve the UnparseResult, one can use DaffodilUnparseContentHandler.getUnparseResult once the XMLReader.parse run is complete.
Parse
DataProcessor Parse
The DataProcessor.parse method accepts input data to parse in the form of an InputSourceDataInputStream and an InfosetOutputter to determine the output representation of the infoset (e.g. Scala XML Nodes, JDOM2 Documents, etc.):
val scalaOutputter = new ScalaXMLInfosetOutputter()
val is = new InputSourceDataInputStream(data)
val pr = dp.parse(is, scalaOutputter)
val node = scalaOutputter.getResult
The DataProcessor.parse method is thread-safe and may be called multiple times without the need to create other data processors. However, InfosetOutputters are not thread-safe, so each thread requires its own instance. Before an InfosetOutputter is reused, InfosetOutputter.reset should be called on it (or a new one should be allocated). For example:
val scalaOutputter = new ScalaXMLInfosetOutputter()
files.foreach { f =>
  scalaOutputter.reset()
  val is = new InputSourceDataInputStream(new FileInputStream(f))
  val pr = dp.parse(is, scalaOutputter)
  val node = scalaOutputter.getResult
}
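One common way to give each thread its own instance of a non-thread-safe object is a ThreadLocal. The sketch below shows the technique with a hypothetical stand-in class, Scratch, in place of an InfosetOutputter so that it runs without Daffodil on the classpath. PerThread and its methods are likewise illustrative names, not part of the API.

```scala
import java.util.concurrent.{Callable, Executors}

// Hypothetical stand-in for a non-thread-safe object such as an InfosetOutputter.
class Scratch { val sb = new StringBuilder }

object PerThread {
  // Each thread lazily receives its own Scratch on first access.
  private val local = new ThreadLocal[Scratch] {
    override def initialValue(): Scratch = new Scratch
  }

  // Returns the calling thread's instance.
  def get: Scratch = local.get()

  // Fetches the instance that a different (worker) thread sees.
  def instanceFromOtherThread(): Scratch = {
    val pool = Executors.newSingleThreadExecutor()
    try pool.submit(new Callable[Scratch] { def call(): Scratch = local.get() }).get()
    finally pool.shutdown()
  }
}
```

Repeated calls to PerThread.get on one thread return the same object, while another thread gets a distinct one. With a real outputter, reset would still need to be called before each reuse.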
One can repeat calls to parse() using the same InputSourceDataInputStream to continue parsing where the previous parse ended. For example:
val is = new InputSourceDataInputStream(dataStream)
val scalaOutputter = new ScalaXMLInfosetOutputter()
var keepParsing = true
while (keepParsing && is.hasData()) {
  scalaOutputter.reset()
  val pr = dp.parse(is, scalaOutputter)
  ...
  keepParsing = !pr.isError()
}
SAX Parse
The DaffodilParseXMLReader.parse method accepts input data to parse in the form of an InputSourceDataInputStream. The output representation of the infoset, as well as how parse errors are handled, depends on the content handler and the error handler provided to the DaffodilParseXMLReader. For example, the org.jdom2.input.sax.SAXHandler provides a JDOM representation, whereas other ContentHandlers may output directly to a java.io.OutputStream or java.io.Writer.
val contentHandler = new SAXHandler()
xmlReader.setContentHandler(contentHandler)
val is = new InputSourceDataInputStream(data)
xmlReader.parse(is)
val pr = xmlReader.getProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_PARSERESULT).asInstanceOf[ParseResult]
val doc = contentHandler.getDocument
The DaffodilParseXMLReader.parse method is not thread-safe and may only be called again/reused once a parse operation is completed. This can be done multiple times without the need to create new DaffodilParseXMLReaders, ContentHandlers or ErrorHandlers. It might be necessary to reset whatever ContentHandler is used (or allocate a new one). A thread-safe implementation would require unique instances of the DaffodilParseXMLReader and its components. For example:
val contentHandler = new SAXHandler()
xmlReader.setContentHandler(contentHandler)
files.foreach { f =>
  contentHandler.reset()
  val is = new InputSourceDataInputStream(new FileInputStream(f))
  xmlReader.parse(is)
  val pr = xmlReader.getProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_PARSERESULT).asInstanceOf[ParseResult]
  val doc = contentHandler.getDocument
}
The value of the supported features cannot be changed during a parse, and the parse will run with the value of the features as they were when the parse was kicked off. To run a parse with different feature values, one must wait until the running parse finishes, set the feature values using the XMLReader's setFeature and run the parse again.
One can repeat calls to parse() using the same InputSourceDataInputStream to continue parsing where the previous parse ended. For example:
val is = new InputSourceDataInputStream(dataStream)
val contentHandler = new SAXHandler()
xmlReader.setContentHandler(contentHandler)
var keepParsing = true
while (keepParsing && is.hasData()) {
  contentHandler.reset()
  xmlReader.parse(is)
  val pr = xmlReader.getProperty(DaffodilParseXMLReader.DAFFODIL_SAX_URN_PARSERESULT).asInstanceOf[ParseResult]
  ...
  keepParsing = !pr.isError()
}
Unparse
DataProcessor Unparse
The same DataProcessor used for parse can be used to unparse an infoset via the DataProcessor.unparse method. An InfosetInputter provides the infoset to unparse, with the unparsed data written to the provided java.nio.channels.WritableByteChannel. For example:
val inputter = new ScalaXMLInfosetInputter(node)
val ur = dp.unparse(inputter, wbc)
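The wbc value above is any java.nio.channels.WritableByteChannel. As a minimal sketch of obtaining one that collects output in memory, the example below wraps a ByteArrayOutputStream with Channels.newChannel. The bytes written directly in the sketch merely stand in for what an unparse would write, and ChannelDemo and writeThroughChannel are hypothetical names.

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.nio.channels.{Channels, WritableByteChannel}
import java.nio.charset.StandardCharsets

// Hypothetical demo object, not part of the Daffodil API.
object ChannelDemo {
  // Writes `text` through a WritableByteChannel backed by an in-memory stream
  // and returns the bytes that reached the stream.
  def writeThroughChannel(text: String): Array[Byte] = {
    val os = new ByteArrayOutputStream()
    val wbc: WritableByteChannel = Channels.newChannel(os)
    try {
      val buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8))
      // A single write call may not consume the whole buffer, so loop until it is drained.
      while (buf.hasRemaining) wbc.write(buf)
    } finally {
      wbc.close()
    }
    os.toByteArray
  }
}
```

For output to a file, a channel obtained from a FileOutputStream or from java.nio.channels.FileChannel would serve the same role.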
SAX Unparse
In order to kick off an unparse via the SAX API, one must register the DaffodilUnparseContentHandler as the contentHandler for an XMLReader implementation. The call to the DataProcessor.newContentHandlerInstance method must be provided with the java.nio.channels.WritableByteChannel to which the unparsed data should be written. Any XMLReader implementation is permissible, as long as it has XML Namespace support.
val is = new ByteArrayInputStream(data)
val os = new ByteArrayOutputStream()
val wbc = java.nio.channels.Channels.newChannel(os)
val unparseContentHandler = dp.newContentHandlerInstance(wbc)
val xmlReader = SAXParserFactory.newInstance.newSAXParser.getXMLReader
xmlReader.setContentHandler(unparseContentHandler)
try {
  xmlReader.parse(new InputSource(is))
} catch {
  case _: DaffodilUnparseErrorSAXException => ...
  case _: DaffodilUnhandledSAXException => ...
}
The call to the XMLReader.parse method must be wrapped in a try/catch, as DaffodilUnparseContentHandler relies on throwing an exception to end processing in the case of any errors/failures. There are two kinds of errors to expect: DaffodilUnparseErrorSAXException, thrown when UnparseResult.isError is true, and DaffodilUnhandledSAXException, thrown for any other errors.
In the case of a DaffodilUnhandledSAXException, DaffodilUnparseContentHandler.getUnparseResult will return null.
try {
  xmlReader.parse(new InputSource(is))
} catch {
  case _: DaffodilUnhandledSAXException => ...
  case _: DaffodilUnparseErrorSAXException => ...
}
val ur = unparseContentHandler.getUnparseResult
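Because the XMLReader used for unparsing needs XML Namespace support, and the JDK's SAXParserFactory is not namespace-aware by default, one option is to enable namespace awareness on the factory before creating the reader. The sketch below is a suggestion under that assumption rather than something this documentation prescribes. NamespaceAwareReader and its methods are hypothetical names, and a plain DefaultHandler stands in for the DaffodilUnparseContentHandler.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource, XMLReader}
import org.xml.sax.helpers.DefaultHandler

// Hypothetical helper, not part of the Daffodil API.
object NamespaceAwareReader {
  // Creates an XMLReader that reports namespace URIs and local names to its handler.
  def create(): XMLReader = {
    val factory = SAXParserFactory.newInstance()
    factory.setNamespaceAware(true)
    factory.newSAXParser().getXMLReader
  }

  // Parses `xml` and returns the namespace URI reported for the root element.
  def rootNamespace(xml: String): String = {
    var ns: Option[String] = None
    val handler = new DefaultHandler {
      override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit =
        if (ns.isEmpty) ns = Some(uri)
    }
    val reader = create()
    reader.setContentHandler(handler)
    reader.parse(new InputSource(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))))
    ns.getOrElse("")
  }
}
```

A reader created this way could then have the DaffodilUnparseContentHandler registered as its content handler in place of the stand-in handler.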
Failures and Diagnostics
It is possible that failures could occur during the creation of the ProcessorFactory, DataProcessor, or ParseResult. However, rather than throwing an exception on error (e.g. invalid DFDL schema, parse error, etc), these classes extend WithDiagnostics, which is used to determine if an error occurred, and any diagnostic information (see Diagnostic) related to the step. Thus, before continuing, one must check WithDiagnostics.isError. For example:
val pf = c.compileFile(file)
if (pf.isError()) {
  val diags = pf.getDiagnostics()
  diags.foreach { d => System.out.println(d.toString()) }
  return -1
}
Saving and Reloading Parsers
In some cases, it may be beneficial to save a parser and reload it. For example, when starting up, it may be quicker to reload an already compiled parser than to compile it from scratch. To save a DataProcessor:
val dp = pf.onPath("/")
dp.save(saveFile)
And to restore a saved DataProcessor:
val dp = Daffodil.compiler().reload(saveFile)
The reloaded DataProcessor can then be used in the usual way:
val pr = dp.parse(is, scalaOutputter)
or
val xmlReader = dp.newXMLReaderInstance
... // setting appropriate handlers
xmlReader.parse(data)
val pr = xmlReader.getProperty("...ParseResult")
Type Members
- class Compiler extends AnyRef
  Compile DFDL schemas into ProcessorFactory instances or reload saved parsers into DataProcessor instances.
- class DaffodilParseXMLReader extends XMLReader
  SAX method of parsing data and getting the DFDL infoset via a designated org.xml.sax.ContentHandler, based on the org.xml.sax.XMLReader interface.
- class DaffodilUnhandledSAXException extends SAXException
  This exception is thrown when an unexpected error occurs while unparsing an infoset with an XMLReader and a DaffodilUnparseContentHandler. If caught, DaffodilUnparseContentHandler.getUnparseResult returns null. This most likely represents a bug in Daffodil.
- class DaffodilUnparseContentHandler extends ContentHandler
  Accepts SAX callback events from any SAX XMLReader for unparsing.
- class DaffodilUnparseErrorSAXException extends SAXException
  This exception is thrown when UnparseResult.isError returns true while unparsing an infoset with an XMLReader and a DaffodilUnparseContentHandler. If caught, the DaffodilUnparseContentHandler.getUnparseResult function can be used to get the UnparseResult and error diagnostics.
- class DataLocation extends AnyRef
  Information related to a location in data
- class DataProcessor extends WithDiagnostics with Serializable
  Compiled version of a DFDL Schema, used to parse data and get the DFDL infoset
- class Diagnostic extends AnyRef
  Class containing diagnostic information
- class ExternalVariableException extends Exception
  This exception will be thrown if an error occurs when setting an external variable. Examples of errors include:
  - Ambiguity in variable to set
  - Variable definition not found in a schema
  - Variable value does not have a valid type with regards to the variable type
  - Variable cannot be set externally
- class InvalidParserException extends Exception
  This exception will be thrown as a result of attempting to reload a saved parser that is invalid (not a parser file, corrupt, etc.) or is not in the GZIP format.
- class InvalidUsageException extends Exception
  This exception will be thrown as a result of an invalid usage of the Daffodil API
- class LocationInSchemaFile extends AnyRef
  Information related to locations in DFDL schema files
- class ParseResult extends WithDiagnostics
  Result of calling DataProcessor.parse, containing any diagnostic information, and the final data location
- class ProcessorFactory extends WithDiagnostics
  Factory to create DataProcessors, used for parsing data
- class UnparseResult extends WithDiagnostics
  Result of calling DataProcessor.unparse, containing diagnostic information
- abstract class WithDiagnostics extends Serializable
  Abstract class that adds diagnostic information to classes that extend it.
  When a function returns a class that extends this, one should call WithDiagnostics.isError on that class before performing any further actions. If an error exists, any use of that class, aside from those functions in WithDiagnostics, is invalid and will result in an Exception.
Value Members
- object Daffodil
  Factory object to create a Compiler and set global configurations
- object DaffodilParseXMLReader
  The full URIs needed for setting/getting properties for the DaffodilParseXMLReader
- object DaffodilXMLEntityResolver
  Returns the EntityResolver used by Daffodil to resolve import/include schemaLocations.
  The entity resolver attempts to resolve namespaces and systemId's in the following order:
  1. Use an org.apache.xml.resolver.Catalog/CatalogManager. By default the Catalog only includes the daffodil-built-in-catalog.xml, but additional catalogs can be added by putting CatalogManager.properties on the classpath when daffodil is run.
  2. If not resolved in step 1, schemaLocations are resolved relative to the importing schema URI, which could either be a file on the filesystem or in a jar on the classpath.
  The EntityResolver isn't thread safe, but it also is expensive and stateful, so we use ThreadLocal to only create one instance per thread.
- object ValidationMode extends Enumeration
  Validation modes for validating the resulting infoset against the DFDL schema
This is the documentation for the Apache Daffodil Scala API.
Package structure
org.apache.daffodil.sapi - Provides the classes necessary to compile DFDL schemas, parse and unparse files using the compiled objects, and retrieve results and parsing diagnostics
org.apache.daffodil.udf - Provides the classes necessary to create User Defined Functions to extend the DFDL expression language
org.apache.daffodil.runtime1.layers.api - Provides the classes necessary to create custom Layer extensions to DFDL.