Daffodil provides extensions to the DFDL specification.
These functions and properties are in the namespace defined by the URI
http://www.ogf.org/dfdl/dfdl-1.0/extensions which is normally bound to the dfdlx prefix
like so:
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions">
The DFDL language extensions described below have Long Term Support (LTS) in Daffodil going forward, and are proposed for inclusion in a future revision of the DFDL standard. DFDL schema authors can depend on the features and behaviors defined here without fear that these extensions will be withdrawn in the future.
Daffodil supports processing data that contains large opaque binary objects, also known as BLOBs. These enable processing of data types such as images, audio, or video where the data content is surrounded by important metadata. The DFDL Schema can expose the metadata fields for processing and carry along the opaque BLOB data in files.
There is separate documentation for the Binary Large Object (BLOB) feature.
dfdlx:trace(value, label)A function that can be used to debug DFDL expressions, similar to
the XPath fn:trace(value, label)
function.
This creates a message from the string argument label followed by value converted to a
string and logs the message.
The function returns the value.
The second label argument must be of type xs:string.
dfdlx:lookAhead(offset, bitSize)Read bitSize bits, where the first bit is located at an offset (in bits)
from the current location. The result is a xs:nonNegativeInteger. Restrictions:
dfdl:lengthKind="explicit" surrounding element.dfdl:bitOrder and dfdl:byteOrder are determined by the current schema component
and data location.dfdlx:lookAheadThe following two elements both populate element a with the value of the next 3 bits as an
unsignedInt.
They are not completely equivalent because the first will consume 3 bits of the
input stream where the second will not advance the input stream.
<xs:element name="a" type="xs:unsignedInt" dfdl:length="3" dfdl:lengthUnits="bits" />
<xs:element name="a" type="xs:unsignedInt" dfdl:inputValueCalc="{ dfdlx:lookAhead(0,3) }" />
The following example demonstrates using lookAhead to branch based on a field in the future.
In this case the choice of elements a vs. b depends on the value of the tag field which is
found after fields a and b:
<xs:choice dfdl:choiceDispatchKey="{ dfdlx:lookAhead(16,8) }">
<xs:element name="a" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="1"/>
<xs:element name="b" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="2"/>
</xs:choice>
<xs:element name="tag" type="xs:int" dfdl:length="8" />
bitAnd, bitOr, bitXor, bitNot, leftShift, rightShiftThese functions are defined on types long, int, short, byte, unsignedLong,
unsignedInt, unsignedShort, and unsignedByte
dfdlx:bitAnd(arg1, arg2)This computes the bitwise AND of two integers.
dfdlx:bitOr(arg1, arg2)This computes the bitwise OR of two integers.
dfdlx:bitXor(arg1, arg2)This computes the bitwise Exclusive OR of two integers.
dfdlx:bitNot(arg)This computes the bitwise NOT of an integer. Every bit is inverted. The result type is the same as the argument type.
dfdlx:leftShift(value, shiftCount)This is the logical shift left, meaning that bits are shifted from less-significant positions to more-significant positions.
value argument type.shiftCount argument is < 0.shiftCount argument is greater than the number of
bits in the type of the value argument.dfdlx:rightShift(value, shiftCount)This is the arithmetic shift right, meaning bits move from most-significant to less-significant positions. If logical (zero-filling) shift right is needed, you must use unsigned types.
value argument is shifted by the shiftCount.value is signed, then the sign bit is shifted in for the left-most bits.value is unsigned, then zeros are shifted in for the left-most bits.value argument type.shiftCount argument is < 0.shiftCount argument is greater than the number of
bits in the type of the value argument.dfdlx:doubleFromRawLong(longArg) and dfdlx:doubleToRawLong(doubleArg)IEEE binary float and double values that are not NaN will parse to base 10 text and unparse back
to the same exact IEEE binary bits.
However, the same cannot be said for NaN (not a number) values, of which there are many bit
patterns.
To preserve float and double NaN values bit for bit you can use these functions to compute
xs:long values that enable the DFDL Infoset to preserve the bits of a float or double value
even if it is a NaN.
dfdlx:alignmentKindValid values for this property are manual or automatic with automatic being the default
behavior.
When specified, the manual value turns off all automatic alignment based on the
dfdl:alignment and dfdl:alignmentUnits properties.
The schema author must use dfdl:leadingSkip, dfdl:trailingSkip, or just ensure all the
elements/terms are aligned based on their length.
This property is sometimes needed to facilitate creation of schemas where interactions occur between computed lengths (that is, stored length fields) and alignment regions that are automatically being inserted. It can be easier to do all alignment manually than to debug these interactions.
dfdlx:parseUnparsePolicyA property applied to simple and complex elements, which specifies whether the element supports only parsing, only unparsing, or both parsing and unparse. Valid values for this property are parse, unparse, or both. This allows one to leave off properties that are required for only parse or only unparse, such as dfdl:outputValueCalc or dfdl:outputNewLine, so that one may have a valid schema if only a subset of functionality is needed.
All elements must have a compatible parseUnparsePolicy with the compilation parseUnparsePolicy (which is defined by the root element daf:parseUnparsePolicy and/or the Daffodil parseUnparsePolicy tunable) or it is a Schema Definition Error. An element is defined to have a compatible parseUnparsePolicy if it has the same value as the compilation parseUnparsePolicy or if it has the value both.
For compatibility, if this property is not defined, it is assumed to be both.
dfdlx:layerLayers provide algorithmic capabilities for decoding/encoding data or computing checksums. Some are built-in to Daffodil. New layers can be created in Java/Scala and plugged-in to Daffodil dynamically. There is separate Layer documentation.
dfdlx:directionThis property can appear only on DFDL defineVariable statement annotations.
This property has possible values both (the default), parseOnly, or unparseOnly.
It declares
whether the variable is to be available for only parsing, only unparsing, or both.
Since this is a newly introduced extension property and existing schemas won't contain a definition
for it, it has a default value of both.
This property can conflict with the dfdlx:parseUnparsePolicy property which takes the same
values (both, parseOnly, and unparseOnly).
If dfdlx:parseUnparsePolicy='parseOnly' then it is a Schema Definition Error if
variables in the DFDL schema have dfdlx:direction='unparseOnly'.
Similarly if dfdlx:parseUnparsePolicy='unparseOnly' then it is a Schema Definition Error if
variables in the DFDL schema have dfdlx:direction='parseOnly'.
It is a Schema Definition Error if a variable defined with direction parseOnly is accessed
from an expression used by the unparser.
Symmetrically, it is a Schema Definition Error if a variable defined with direction
unparseOnly is accessed from an expression used by the parser.
This error is detected at DFDL schema compilation time, not runtime.
These properties take expressions for their values and are generally evaluated at both parse and
unparse time.
Hence, unless the whole schema is constrained by dfdlx:parseUnparsePolicy, any expressions for
these properties1 cannot
cannot reference DFDL variables with dfdlx:direction of parseOnly or unparseOnly.
byteOrderencodinginitiatorterminatorseparatorescapeCharacterescapeEscapeCharacterlengthoccursCounttextStandardDecimalSeparatortextStandardGroupingSeparatortextStandardExponentRepbinaryFloatReptextBooleanTrueReptextbooleanFalseRepcalendarLanguagedfdl:setVariable, a dfdl:newVariableInstance default value expression, or a
dfdl:defineVariable default value expression when
that variable being set/defaulted is itself referenced from a another expression and the variable
being set/defaulted has dfdlx:direction of both (the default)Parser-specific expressions include
dfdl:inputValueCalcdfdl:length (when dfdl:lengthKind='explicit')dfdl:occursCount (when `dfdl:occursCountKind='expression')dfdl:choiceDispatchKeymessage and test attributes of the dfdl:assert and dfdl:discriminator statement annotationsdfdl:setVariable, a dfdl:newVariableInstance default value expression, or a
dfdl:defineVariable default value expression when
that variable being set/defaulted is itself referenced from a another expression being
accessed at parser creation time, and the variable being set/defaulted has dfdlx:direction
of parseOnlyUnparser-specific expressions include:
dfdl:outputValueCalcdfdl:length (when `dfdl:lengthKind='explicit')dfdl:outputNewLinedfdl:setVariable, a dfdl:newVariableInstance default value expression, or a
dfdl:defineVariable default value expression when
that variable being set/defaulted is itself referenced from a another expression being
accessed at unparser creation time, and the variable being set/defaulted has dfdlx:direction
of unparseOnlydfdlx:repType, dfdlx:repValuesThese properties work together to allow DFDL schemas to define enumerations; that is, symbolic representations for integer constants. When parsing, Daffodil will convert these integers into the corresponding string values. When unparsing, Daffodil will convert strings into the corresponding integers.
An element of type (or derived from) xs:string can be defined using XSD enumeration facets
which constrain the valid values of this string.
These enumeration values are effectively symbolic constants.
The dfdlx:repType and dfdlx:repValues properties are then used to define the correspondence of
the symbolic strings to the corresponding integer values.
dfdlx:repTypeThe value of this property is an XSD QName of a simple type definition that must be derived
from xs:int, or xs:unsignedInt.
A simple type definition for a string can be annotated with dfdlx:repType
in order to declare that the representation of the string is not as text characters but is a
numeric integer value.
The type referenced from dfdlx:repType is usually a fixed length binary integer, but can be any
DFDL type derived from xs:int or xs:unsignedInt, with any DFDL representation properties.
The mapping between the representation integer and the symbolic constants is specified using the
dfdlx:repValues property.
dfdlx:repValuesThe value of this property is one or more integer values within
the numeric range defined for the type referenced by dfdlx:repType. When more than one value
is specified, they are in a whitespace separated list.
This property is placed on the xs:enumeration facets of a symbolic string constant having a
dfdlx:repType.
At parse time, if the value of the dfdlx:repType integer is found within the dfdlx:repValues
list, then the infoset value for the symbolic string gets the corresponing enumeration facet value.
It is a parse error if the dfdlx:repType integer is not found in any of the dfdlx:repValues
lists of the xs:enumeration facets.
At unparse time, the symbolic constant is mapped to the first integer in the dfdlx:repValues list.
It is an unparse error if the symbolic string value is not found among the xs:enumeration
facet values of the symbolic string type.
A simple example of a basic enum is:
<simpleType name="rep3Bit" dfdl:lengthUnits="bits" dfdl:length="3" dfdl:lengthKind="explicit">
<restriction base="xs:unsignedInt"/>
</simpleType>
<simpleType name="precedenceEnum" dfdlx:repType="pre:rep3Bit">
<restriction base="xs:string">
<enumeration value="Reserved_0" dfdlx:repValues="0"/>
<enumeration value="Reserved_1" dfdlx:repValues="1"/>
<enumeration value="Emergency" dfdlx:repValues="2"/>
<enumeration value="Reserved_3" dfdlx:repValues="3"/>
<enumeration value="Flash" dfdlx:repValues="4"/>
<enumeration value="Immediate" dfdlx:repValues="5"/>
<enumeration value="Priority" dfdlx:repValues="6"/>
<enumeration value="Routine" dfdlx:repValues="7"/>
</restriction>
</simpleType>
Above we see the dfdlx:repType is rep3Bit which is a 3 bit xs:unsignedInt. This can
represent the values 0 to 7 which one can see are the dfdlx:repValues of the xs:enumeration
facets for this enumeration string type which is named precedenceEnum.
In the above you can also see that the symbolic strings are in one-to-one correspondence with every possible value of the 3-bit representation integer. This one-to-one correspondence assures that data that is first parsed and then unparsed will recreate the exact numeric bits used.
However, in data security applications the following may be preferred:
<simpleType name="precedenceEnum" dfdlx:repType="pre:rep3Bit">
<restriction base="xs:string">
<enumeration value="Reserved" dfdlx:repValues="0 1 3"/>
<enumeration value="Emergency" dfdlx:repValues="2"/>
<enumeration value="Flash" dfdlx:repValues="4"/>
<enumeration value="Immediate" dfdlx:repValues="5"/>
<enumeration value="Priority" dfdlx:repValues="6"/>
<enumeration value="Routine" dfdlx:repValues="7"/>
</restriction>
</simpleType>
In the above we see that three numeric values, 0, 1, and 3 are the dfdlx:repValues mapped to
the symbolic string Reserved.
This technique has the advantage of blocking covert signals being transmitted by use of the
different reserved values since when unparsed, the constant string Reserved will always be
canonicalized to integer 0.
Putting data into canonical form when unparsing generally improves data security.
xs:hexBinaryDaffodil allows dfdlx:lengthUnits='bits' for this simple type.
New properties added as part of errata corrections to the DFDL v1.0 standard which take expressions for their values will need to be added to this list or those for parser-specific or unparser-specific properties. ↩