There is a specific way of organizing a DFDL schema project that has been found to be helpful. It uses specific directory naming conventions and tree structure to manage name conflicts in a manner similar to how Java package names correspond to directory names. It also uses the SBT build tool with the Daffodil SBT Plugin to more easily manage, build, and test DFDL schema projects.
To quickly get started and generate this directory structure, you can install SBT and run the following command:
sbt new apache/daffodil-schema.g8
This prompts for various properties and creates the directory structure described below, including git and sbt configuration files, a basic DFDL schema file, and TDML and test files.
This set of conventions provides a number of benefits:
These conventions are actually usable for regular XML-schema projects, that is, they're not really DFDL-specific conventions. They're general conventions for organizing projects so as to achieve the above benefits.
Let's assume the DFDL schema contains two files named rfmt.dfdl.xsd and format.dfdl.xsd, and that our format is named RFormat (rfmt) with an organization web identity of example.com.
The standard file tree would be:
rfmt/
├── .gitattributes          - Git revision control system 'attributes' (see below)
├── .gitignore              - Git revision control system 'ignore' file (should contain
│                             'target' and 'lib_managed' entries)
├── build.sbt               - Simple Build Tool (sbt) specification file. Edit to change version
│                             of Daffodil needed, or versions of other DFDL schemas needed
├── README.md               - Documentation about the DFDL schema in Markdown file format
├── project/
│   ├── build.properties    - Defines the sbt version
│   └── plugins.sbt         - Defines plugins, including the Daffodil SBT plugin
└── src/
    ├── main/
    │   └── resources/
    │       └── com/
    │           └── example/
    │               └── rfmt/
    │                   ├── xsd/
    │                   │   ├── rfmt.dfdl.xsd   - Primary RFormat DFDL schema file
    │                   │   └── format.dfdl.xsd - DFDL schema file imported/included from rfmt.dfdl.xsd
    │                   └── xsl/
    │                       └── xforms.xsl      - Resources other than XSD go in other directories
    └── test/
        ├── resources/
        │   └── com/
        │       └── example/
        │           └── rfmt/
        │               ├── Tests1.tdml         - TDML test files
        │               ├── data/               - Test data files not embedded in a TDML file
        │               │   └── test01.rfmt     - or .bin, .dat, .txt, etc.
        │               └── infosets/           - Test infoset files not embedded in a TDML file
        │                   └── test01.rfmt.xml
        └── scala/
            └── com/
                └── example/
                    └── rfmt/
                        └── Tests1.scala        - Scala test driver file
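If you prefer not to use the template, the same skeleton can be created by hand. The following is a minimal sketch, assuming a POSIX shell with awk available; the variable names (org_domain, fmt, pkg_dir, main_dir, test_dir) are illustrative and not part of Daffodil or sbt. It derives the package-style directory path by reversing the labels of the organization's domain name, in the same way Java package names map to directories, and creates empty placeholder files that you would then fill in.

```shell
#!/bin/sh
# Sketch: create the standard layout by hand (an alternative to the g8 template).
# org_domain and fmt are the example values used in the text above.
org_domain="example.com"
fmt="rfmt"

# Reverse the domain labels, Java-package style: example.com -> com/example
pkg_dir=$(printf '%s\n' "$org_domain" |
  awk -F. '{ for (i = NF; i > 0; i--) printf "%s%s", $i, (i > 1 ? "/" : "\n") }')

main_dir="$fmt/src/main/resources/$pkg_dir/$fmt"
test_dir="$fmt/src/test/resources/$pkg_dir/$fmt"

# Directories from the tree above (xsl/ is optional and omitted here)
mkdir -p "$fmt/project" \
         "$main_dir/xsd" \
         "$test_dir/data" "$test_dir/infosets" \
         "$fmt/src/test/scala/$pkg_dir/$fmt"

# Empty placeholders for some of the files named in the tree
touch "$main_dir/xsd/$fmt.dfdl.xsd" "$main_dir/xsd/format.dfdl.xsd" \
      "$test_dir/Tests1.tdml" "$fmt/build.sbt" "$fmt/project/plugins.sbt"
```

Because the schema files live under a path built from the organization and format names, two projects that both contain a file named format.dfdl.xsd do not collide.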
Add the Daffodil SBT plugin to the project/plugins.sbt file:
addSbtPlugin("org.apache.daffodil" % "sbt-daffodil" % "1.1.0")
This adds the Daffodil SBT Plugin to the project, which configures settings and capabilities commonly needed to manage, build, and test DFDL schema projects. For more information about the settings that can be added to build.sbt or commands to run via SBT, see the GitHub README.
Use the below template for the build.sbt file:
name := "dfdl-rfmt"
organization := "com.example"
version := "0.0.1"
enablePlugins(DaffodilPlugin)
In some cases, git may convert the line endings of files so that they match the line-ending convention of the system. In most cases, this is done in such a way that you may never notice. However, where a file format requires specific line endings, this mangling of test data can lead to test failures. To prevent this, we recommend creating a .gitattributes file in the root of the format directory with the following content to disable the mangling:
/src/test/resources/**/data/** -text
The above tells git that any test files in the data directory should be treated as if they were binary, and thus not to mangle newlines.
DFDL schemas should have the .dfdl.xsd suffix to distinguish them from ordinary XML Schema files.
A DFDL schema should have a target namespace.
Stylistically, the XSD elementFormDefault="unqualified" is the preferred style for DFDL schemas.
The xs:include or xs:import elements of a DFDL schema can import/include a DFDL schema that follows these conventions like this:
<xs:import namespace="urn:example.com/rfmt" schemaLocation="/com/example/rfmt/xsd/rfmt.dfdl.xsd"/>
The above form is for using a DFDL schema as a library from a different DFDL schema.
Within a DFDL schema, one DFDL schema file can reference another peer file that appears in the same directory (the src/main/resources/…/xsd directory) via:
<xs:include schemaLocation="format.dfdl.xsd"/>
That is, peer files need not carry the long /com/example/rfmt/xsd/ prefix that makes the reference globally unique.
However, if one schema wants to include a different schema, this standard way of organizing schema projects ensures that, when packaged into jar files, the contents of the src/main/resources directory are at the "root" of the jar file, so that a schemaLocation of an xs:import or xs:include containing the fully qualified path (/com/example/rfmt/xsd/rfmt.dfdl.xsd) will be found on the CLASSPATH unambiguously. This convention is what allows the schema files themselves to have short names like rfmt.dfdl.xsd and format.dfdl.xsd. Those names only need to be unique within a single schema project. Across schema projects, the standard DFDL schema project layout ensures that unambiguous qualification is available.
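The lookup idea can be illustrated with a small shell sketch. This is not how Daffodil resolves schema locations internally; resolve_schema_location is a hypothetical helper that only handles plain directories (a real classpath also contains jars). It shows why an absolute schemaLocation works: the path is appended to each classpath root in turn until a matching file is found.

```shell
#!/bin/sh
# Hypothetical helper, not part of Daffodil: search each root directory, in
# order, for an absolute schemaLocation and print the first match.
# Only directory roots are handled; real classpaths also hold jar files.
resolve_schema_location() {
  location=$1   # e.g. /com/example/rfmt/xsd/rfmt.dfdl.xsd
  shift         # remaining arguments are the classpath root directories
  for root in "$@"; do
    if [ -f "$root$location" ]; then
      printf '%s\n' "$root$location"
      return 0
    fi
  done
  return 1      # not found under any root
}

# Usage, run from the project root:
#   resolve_schema_location /com/example/rfmt/xsd/rfmt.dfdl.xsd \
#     src/main/resources src/test/resources
```

Since the path includes the organization and format names, at most one project on the classpath should supply a match, which is what makes the reference unambiguous.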
You don't have to use Git version control, but many people do, and github.com is one of the reasons for this popularity.
Each DFDL schema should have its own Git repository if it is going to be revised independently. We encourage users to join the DFDLSchemas project on GitHub and to create repositories and publish schemas there for any publicly available formats. For formats that are not publicly available, one may want to put a placeholder for them on DFDLSchemas anyway (as IBM has done for some formats, such as Swift-MT).
A DFDL schema that uses the recommended file structure described here can be packaged into a jar for convenient import/include from other schemas.
The sbt command does all the work:
sbt package # creates the jar
sbt publishLocal # publishes it to the local Ivy repository (~/.ivy2/local), where other sbt builds will find it
The resulting jar has the contents of the src/main/resources directory at its root. If this jar is on the classpath, then XSD import or include statements in other schemas resolve their schema locations against the contents of the jar.
That enables a different schema's build.sbt to contain a library dependency on our hypothetical dfdl-rfmt schema using a dependency like this:
libraryDependencies += "com.example" % "dfdl-rfmt" % "0.0.1" % "test"
That will result in the contents of the src/main/resources directory above being on the classpath. XSD include and import statements search the classpath directories.
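To confirm that a packaged jar has the expected layout, you can inspect its listing. The sketch below assumes the JDK's jar tool is installed; jar_listing_has_entry is a hypothetical helper, and the jar file name in the usage comment is an assumption (check your target directory for the actual name). Note that entries inside a jar have no leading slash.

```shell
#!/bin/sh
# Hypothetical helper: reads a jar listing (one entry per line, as printed by
# `jar tf`) on stdin and succeeds only if the given entry is an exact line.
jar_listing_has_entry() {
  grep -Fxq -- "$1"
}

# Usage (the jar file name below is an assumption):
#   jar tf target/dfdl-rfmt-0.0.1.jar |
#     jar_listing_has_entry com/example/rfmt/xsd/rfmt.dfdl.xsd && echo "layout ok"
```

If the entry shows up under a longer path such as src/main/resources/com/..., the jar was not built from the standard layout and absolute schema locations will not resolve against it.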