

The primary use of a metaschema is to generate format-specific schema definitions for a given serializable form (e.g., XML, JSON, YAML). These generated schemas can be used to validate that data conforms to the associated format, and thus to the model defined by the Metaschema.

A Metaschema can also be used to automatically generate converters that translate content conforming to one Metaschema-generated format into another format generated by the same Metaschema. This is possible because the Metaschema allows data in one format to be mapped to the defined information model, and from the information model to the other format. In this way the Metaschema provides a supermodel, or information model, that unifies each related format.

Additionally, a Metaschema can be used to automatically generate format-specific model documentation that is aligned with the concepts used in a given format.

Finally, a Metaschema can be used to automatically generate language-specific data structures or classes, together with serializers and deserializers that use those data structures to write and read data conforming to a given Metaschema-derived format. This generative approach allows application developers to focus right away on business logic and user interface features instead of building the basic common data structures needed for all applications that work with information from a given domain.

These Metaschema capabilities, which can be applied to any information domain, serve developers who need to support multiple data formats for a given domain, or who need to choose a specific technology stack that is well suited to their application. In either case, use of the generative capabilities of the Metaschema allows content to be easily converted to a given format, and allows modeling efforts to be multiplied across a number of formats without incurring additional overhead for the additional formats.

We hope and expect that developers’ experience with different approaches will inform further efforts to unify and consolidate a coherent Metaschema-based information modeling framework.

This specification provides a basis for the development of interoperable toolchains supporting the generative capabilities of the Metaschema framework, and serves as a reference for information modelers producing Metaschema-based information models.

Design Goals

The design of the Metaschema modeling approach addresses the following needs:

  1. Unify support for compatible data descriptions in multiple disparate formats, such as XML, JSON, YAML and potentially others
  2. Produce schema documentation from the same source as schema files and tools
  3. Enable distributed, semi-coordinated experimentation with the format(s) and related tools as supported by a given Metaschema model

The primary goal is to reduce the overhead of maintaining multiple formats and data descriptions for a given model. The Metaschema modeling approach addresses this by providing a mechanism for all modeling to occur at the information model level. Commodity tooling can then be used to produce schemas in a given format, saving time and maintenance costs.

A secondary goal is to reduce the cost of adopting a given model, supporting a robust community of use for content created in related data formats. This is accomplished through commodity tooling that provides content conversion utilities and programming language APIs that ease the burden of implementation for a given Metaschema-based model.

Design Approach

The Metaschema provides a reduced, lightweight modeling language with certain specially enforced constraints. The following philosophy was used in the current design:

  - Use schema constructs that map cleanly into features offered by XML and JSON schema technologies. This ensures that all information can be preserved in lossless, bidirectional conversion.
  - Mediate between the structural differences in the XML, JSON, and YAML data formats by providing format-specific tailoring capabilities to improve the expression and conciseness of Metaschema-based data in a given format.
  - Beyond the applicable metaschema, no further inputs or reliance on arbitrary conventions or runtime settings should be necessary to reliably produce corresponding JSON or YAML from any metaschema-described, schema-valid XML, and vice versa.
  - Focus on the production of specifications and running code supporting automated generation of schemas and model-related artifacts, consistent with the model defined (and documented) by a given Metaschema definition.
  - Test all implementations against the semantics of metaschema elements and constructs by means of continual use of unit tests.

Since the Metaschema is designed to support only a “greatest common factor” of schema features, the supported features are a reduced set of those available in a given schema language. This has the added benefit of making Metaschema easier to learn and use.

A metaschema definition describes a data structure comprising assemblies, each of which is composed of (more) assemblies and fields. Fields present data content while assemblies organize things. Both fields and assemblies can have flags, which are name-value pairs that further qualify the field or assembly they belong to with identifying or characterizing data.
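
The relationship between assemblies, fields, and flags can be sketched as a small in-memory structure. This is only an illustration of the concepts described above, not the Metaschema syntax or any tool's API: the class names `Field` and `Assembly`, the helper `count_fields`, and the sample `catalog` content are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Field:
    """A leaf node that presents data content (hypothetical class)."""
    name: str
    value: str
    # Flags are name-value pairs that qualify this field.
    flags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Assembly:
    """A container that organizes fields and nested assemblies (hypothetical class)."""
    name: str
    # Flags are name-value pairs that qualify this assembly.
    flags: Dict[str, str] = field(default_factory=dict)
    children: List[Union["Assembly", Field]] = field(default_factory=list)


def count_fields(node: Union[Assembly, Field]) -> int:
    """Count the Field nodes found at or below the given node."""
    if isinstance(node, Field):
        return 1
    return sum(count_fields(child) for child in node.children)


# Sample content: an assembly holding one field and one nested assembly.
catalog = Assembly(
    name="catalog",
    flags={"id": "cat-1"},
    children=[
        Field(name="title", value="Example Catalog"),
        Assembly(
            name="control",
            flags={"id": "c1"},
            children=[Field(name="statement", value="Limit access.")],
        ),
    ],
)
```

Here `count_fields(catalog)` walks the tree and counts the two fields, one directly under `catalog` and one inside the nested `control` assembly.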

Comprehensive mappings from assembly, field, and flag definitions into analogous (representative) XSD and/or JSON Schema structures enable data to be modeled in a neutral form. Because these mappings are fully defined, the mappings between corresponding XML and JSON expressions come for free.
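
As a rough sketch of why fully defined mappings make conversion come for free, the following serializes one neutral model to both XML and JSON. The mapping rules here (flags become XML attributes and JSON properties, a field without flags collapses to a bare JSON value, and the `"value"` key name) are assumptions chosen for illustration, not the rules defined by this specification. The dict-based node layout and the function names are likewise hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical neutral model: each node is a dict with a "name", optional
# "flags", and either a "value" (a field) or "children" (an assembly).
model = {
    "name": "control",
    "flags": {"id": "c1"},
    "children": [
        {"name": "title", "flags": {}, "value": "Access Control"},
    ],
}


def to_xml(node):
    """Map a node to an XML element; flags become attributes (assumed rule)."""
    elem = ET.Element(node["name"], node.get("flags", {}))
    if "value" in node:
        elem.text = node["value"]
    for child in node.get("children", []):
        elem.append(to_xml(child))
    return elem


def to_json_obj(node):
    """Map a node to a JSON-ready object; flags become properties (assumed rule)."""
    if "value" in node and not node.get("flags"):
        return node["value"]  # a field without flags collapses to a bare value
    obj = dict(node.get("flags", {}))
    if "value" in node:
        obj["value"] = node["value"]  # illustrative key name, an assumption
    for child in node.get("children", []):
        # Simplification: children sharing a name would overwrite each other.
        obj[child["name"]] = to_json_obj(child)
    return obj


xml_text = ET.tostring(to_xml(model), encoding="unicode")
json_text = json.dumps({model["name"]: to_json_obj(model)}, sort_keys=True)
```

Both `xml_text` and `json_text` are derived from the same `model`, so a converter between the two formats only needs to parse one format into the model and serialize the model to the other.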


This page was last updated on January 10, 2020.