DFDL Training

Table of Contents

  1. Introduction
  2. Prerequisites
    1. XML - Extensible Markup Language
    2. XML Schema (aka XSD or XSDL)
    3. Basic Data Format Concepts
    4. DFDL Basic Terminology
  3. Training Courses with Lab Exercises
    1. Tools Needed for Hands-On Training
    2. CSV to Bin Training Course
    3. FakeTDL DFDL Training Course
  4. Learning from Example DFDL Schemas
  5. Other Learning Resources
  6. Intermediate and Advanced DFDL Training Topics
    1. DFDL Schema Composition
    2. Learning the Daffodil API

Introduction

There are a few paths to learning DFDL, depending on your goals and background. You can work your way through labs that gradually introduce DFDL concepts, or you can plunge right into well-crafted DFDL schemas for basic data formats and gradually work up to more complex ones. This page provides resources for both.

There is a general Overview Presentation about DFDL with sections about:

  • Motivation: Why we need DFDL
  • Introduction to DFDL - including a tiny example
  • Larger Examples: CSV, PCAP, MIL-STD-2045

The remaining sections are perhaps less interesting for DFDL beginners.

  • Where to get DFDL Schemas
  • Apache Daffodil - What is in it
  • Example of using Daffodil's Java API (yes it's "Hello World!")

Section 1 (the introduction) of the DFDL Specification also serves as a basic introduction to DFDL.

Prerequisites

There are several things that you need to be familiar with to learn DFDL. These include:

  • XML - Extensible Markup Language
  • XML Schema (aka XSDL or XSD)
  • Basic Data Format Concepts
  • DFDL Basic Terminology

XML - Extensible Markup Language

Daffodil supports both JSON and XML, but the learning/training materials are very biased towards using XML.

XML Schema (aka XSD or XSDL)

DFDL uses a subset of the XML Schema Definition Language (XSDL or XSD) to express the structure of data, meaning the field names, their order, and nesting of hierarchical structure. This means that a DFDL Schema is an XML Schema.

If you are familiar with the notion of a grammar or BNF, XML Schema is conceptually similar. It is more verbose, but it provides standardized places to add annotations to the schema, which allow DFDL to attach the data format information to the schema in a standard way.
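
To make the "a DFDL Schema is an XML Schema" point concrete, here is a minimal sketch in Python that loads a small schema fragment with an ordinary XML parser and lists the schema nodes that carry DFDL annotations. The element names (`record`, `name`, `age`) and the property values are made up for illustration, and the fragment leaves out many format properties, so it should not be read as a complete schema that a DFDL processor would accept.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DFDL = "http://www.ogf.org/dfdl/dfdl-1.0/"

# Illustrative fragment only: element names and property values are made up,
# and many properties that a real DFDL schema needs are omitted.
SCHEMA = f"""
<xs:schema xmlns:xs="{XS}" xmlns:dfdl="{DFDL}">
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix">
        <xs:element name="name" type="xs:string" dfdl:lengthKind="delimited"/>
        <xs:element name="age" type="xs:int" dfdl:lengthKind="delimited"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def describe(schema_text):
    """Return (kind, name, dfdl_properties) for each node with dfdl: attributes."""
    root = ET.fromstring(schema_text)
    rows = []
    for node in root.iter():
        # Namespaced attributes appear as "{namespace-uri}localName".
        props = {key.split("}")[1]: value
                 for key, value in node.attrib.items()
                 if key.startswith("{" + DFDL + "}")}
        if props:
            kind = node.tag.split("}")[1]
            rows.append((kind, node.get("name"), props))
    return rows

for kind, name, props in describe(SCHEMA):
    print(kind, name, props)
```

The structure (a sequence of two elements) is plain XSD. The `dfdl:` attributes are the annotations that describe how that structure is represented in the data.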

Basic Data Format Concepts

You will need to be familiar with these concepts:

DFDL Basic Terminology

DFDL uses these terms often:

  • Native - the raw data format of the input to a DFDL parse process
  • Infoset - the output representation of the parsed data. For learning purposes we'll assume the Infoset is represented in XML.
  • Parse - to convert data from native format to an infoset
  • Unparse - The preferred term for the opposite of parse: to convert data from an infoset back into native form. This is often called serialization or marshalling in other non-DFDL contexts.
  • DFDL Processor - either a DFDL Parser or DFDL Unparser
  • Well-Formed - data is well-formed if a DFDL Parser can successfully produce an infoset. Note that well-formed data may be invalid.
  • Valid - A formal term meaning the infoset (as XML) is schema valid, in that it has been (or can be) validated using the DFDL schema (as an XML Schema). Note that the DFDL schema can express more complex rules beyond just the usual XSD constraints by way of Schematron.
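
The terms above can be illustrated with a toy, hand-written sketch in Python. It does not use Daffodil or DFDL at all: the "name,age" native format, the function names, and the age-range rule are all assumptions invented here, chosen only to show parse, unparse, well-formed, and valid as separate ideas.

```python
import xml.etree.ElementTree as ET

def parse(native: bytes) -> ET.Element:
    """Parse: convert native 'name,age' text into an XML infoset.

    Raises ValueError when the data is not well-formed.
    """
    fields = native.decode("ascii").rstrip("\n").split(",")
    if len(fields) != 2:
        raise ValueError("not well-formed: expected 2 fields")
    int(fields[1])  # raises ValueError if age is not an integer
    record = ET.Element("record")
    ET.SubElement(record, "name").text = fields[0]
    ET.SubElement(record, "age").text = fields[1]
    return record

def unparse(infoset: ET.Element) -> bytes:
    """Unparse: convert the infoset back into the native form."""
    line = infoset.findtext("name") + "," + infoset.findtext("age") + "\n"
    return line.encode("ascii")

def is_valid(infoset: ET.Element) -> bool:
    """Hypothetical validity rule standing in for an XSD facet: 0 <= age <= 150."""
    return 0 <= int(infoset.findtext("age")) <= 150

infoset = parse(b"Ada,36\n")
print(ET.tostring(infoset, encoding="unicode"))
print(unparse(infoset))
# Well-formed (it parses) but invalid (it breaks the age rule):
print(is_valid(parse(b"Ada,300\n")))
```

In this sketch, `b"Ada,300\n"` parses into an infoset, so it is well-formed, yet it fails the validity check. Data such as `b"Ada\n"` is not well-formed because no infoset can be produced.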

There is also a Glossary of DFDL Terms in the DFDL Specification.

Training Courses with Lab Exercises

There are training courses for DFDL which include hands-on lab exercises.

Tools Needed for Hands-On Training

To do hands-on learning of DFDL you will want to interact with many of the open-source DFDL schema projects on GitHub.

You will need to download and install these tools:

  • git - This may be pre-installed if you are running Linux.
  • SBT (Simple Build Tool) - This build tool is used by most DFDL Schemas created for use with Apache Daffodil. It will automatically pull in the Daffodil SBT Plugin when a DFDL schema project requires it.
  • Apache Daffodil - The Daffodil libraries and its Command Line Interface (CLI)

If you are familiar with the VSCode IDE, you may also want to install:

Many developers use their preferred Java IDE, such as JetBrains IntelliJ IDEA. Others like the low-level approach of just using the Daffodil Command Line Interface (CLI).

CSV to Bin Training Course

The CSV to Bin course has 7 labs that start from CSV (Comma Separated Values), create variants of it, and eventually end with simple binary data examples. This course was intended to take 3 days, with the last day devoted to implementing the NTP (Network Time Protocol) message format.

You can review the slides which accompany the labs.

FakeTDL DFDL Training Course

The FakeTDL DFDL Training course has 5 labs, all of which are about developing your own DFDL schema for the FakeTDL data format, starting from its specification document. On completion, your DFDL schema should be equivalent to the official FakeTDL DFDL Schema.

You can review the slides which accompany the labs.

Learning from Example DFDL Schemas

There are several simple DFDL Schemas that are well-structured, follow best practices, and include self-testing, and so serve as good starting points for learning DFDL.

If a data layout diagram like this one for NTP (Network Time Protocol) doesn't intimidate you, then perhaps you will want to just dig directly into:

Another publicly available schema intended to help with understanding of binary data, specifically military messaging formats, is:

The MIL-STD-2045 Header Schema is a useful example showcasing:

  • enums - a Daffodil extension of the DFDL v1.0 language.
  • bit order - this format numbers the bits of each byte starting with the least significant bit.
  • Multi-version support - this schema handles both revisions C and D1 of the format simultaneously.
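
The effect of bit order can be sketched in a few lines of Python. This is an illustration only: the field widths (a 3-bit field followed by a 5-bit field) and the function name `split_byte` are assumptions made up here, not the actual MIL-STD-2045 layout, and the sketch handles a single byte to stay short.

```python
def split_byte(byte, widths, lsb_first):
    """Split one byte into consecutive fields of the given bit widths.

    With lsb_first=True the first field occupies the least significant bits;
    otherwise the first field occupies the most significant bits.
    """
    assert sum(widths) == 8, "widths must cover exactly one byte"
    fields = []
    consumed = 0  # number of bits already assigned to earlier fields
    for width in widths:
        if lsb_first:
            shift = consumed
        else:
            shift = 8 - consumed - width
        fields.append((byte >> shift) & ((1 << width) - 1))
        consumed += width
    return fields

# The same byte, 0xB4 = 0b10110100, read as a 3-bit field then a 5-bit field:
print(split_byte(0xB4, [3, 5], lsb_first=False))  # [5, 20]
print(split_byte(0xB4, [3, 5], lsb_first=True))   # [4, 22]
```

The same byte yields different field values depending on which end of the byte the first field is taken from, which is why a schema has to state the bit order explicitly.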

Other Learning Resources

There are a variety of other materials on the Internet that provide some DFDL training:

Note: AI bots like ChatGPT and Gemini don't know much about DFDL yet (as of December 2025).

Intermediate and Advanced DFDL Training Topics

DFDL Schema Composition

The DFDL language is designed to allow large, complex DFDL schemas to be created as assemblies of smaller component schemas. This way a library of reusable DFDL component schemas can be built up, and each component can be developed and tested in isolation.

The DFDLSchemas site has these schemas which provide an extensive example of the techniques for composing a larger DFDL schema from smaller components:

  • Envelope Payload - an assembly of the next 3 schemas
  • TCP Message
  • MIL-STD-2045 Header
  • PCAP - a component of the above, but also assembles PCAP-specific schema content with the next schema.
    • EthernetIP - schema for Ethernet packets
      • EthernetIP makes use of an advanced Daffodil DFDL Language Extension called Layers to compute IPv4 packet checksums.

A slide deck on Schema Composition illustrates the nesting of DFDL schema payload components with surrounding DFDL schema headers/envelopes, which are also components.

Learning the Daffodil API

Most uses of Apache Daffodil will embed it within a data processing system by way of its API. As of Daffodil 4.0.0 there is only a Java API for Daffodil, though it is usable from other JVM languages such as Scala. Prior versions of Daffodil had a Java API and a separate Scala API. All the API documentation is available on this site via links such as:

Examples showing how to use the API from Java are available on the OpenDFDL site. See:

  • helloWorld - Shows how to parse, perform an XSLT transformation, and unparse data from Java code. Slides that walk through this hello-world API example are included at the end of the Overview Presentation about DFDL.
  • hexWords - An advanced example of a DFDL schema for a data format that is not byte-oriented. The data records are a multiple of 4 bits in length, so a data record can end in the middle of a byte. Using Daffodil via its API, hexWords shows how one can parse such data from a data stream, letting Daffodil keep track of the bit position internally.
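
To see why tracking a bit position matters for data that is not byte-oriented, here is a small hand-written sketch in Python. It does not use Daffodil and it does not implement the real hexWords format: the record layout (one 4-bit count followed by that many 4-bit values) and the `NibbleReader` class are hypothetical, invented only to show records that end in the middle of a byte.

```python
class NibbleReader:
    """Reads 4-bit units from bytes while tracking the position in bits."""

    def __init__(self, data: bytes):
        self.data = data
        self.bit_pos = 0  # always a multiple of 4 in this sketch

    def read_nibble(self) -> int:
        byte = self.data[self.bit_pos // 8]
        at_byte_start = self.bit_pos % 8 == 0
        self.bit_pos += 4
        # High nibble first, then the low nibble of the same byte.
        return byte >> 4 if at_byte_start else byte & 0x0F

    def read_record(self) -> list:
        """Hypothetical record: a 4-bit count, then that many 4-bit values."""
        count = self.read_nibble()
        return [self.read_nibble() for _ in range(count)]

reader = NibbleReader(bytes.fromhex("2AB1C0"))
print(reader.read_record(), reader.bit_pos)  # [10, 11] 12
print(reader.read_record(), reader.bit_pos)  # [12] 20
```

The first record ends at bit 12, halfway through the second byte, and the second record starts from there. A reader that only tracked whole bytes could not resume at that point.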