Data Serialization Language

This post covers data serialization formats: markup languages that focus on the data itself rather than on how text is formatted.

Do not confuse them with document markup languages, which focus on text formatting.

List of Data Serialization Languages

Data serialization languages can be classified as:

  • Binary
  • Text-based

Binary formats are used for communication between machines.

Text-based formats aim to stay human-readable, though they are still used for communication.
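To make the difference concrete, here is a minimal sketch that encodes the same record both ways using only the Python standard library. The record and its field layout are made up for illustration; `struct` stands in for a binary format and is not one of the languages covered below.

```python
import json
import struct

# Hypothetical record: a sensor id, a temperature reading and a status flag.
record = {"id": 1234, "temperature": 21.5, "ok": True}

# Text-based: anyone can read it, and the field names travel with the data.
text_bytes = json.dumps(record).encode("utf-8")

# Binary: a fixed layout that both sides must agree on beforehand.
# "<" = little-endian without padding, "I" = unsigned 32-bit integer,
# "d" = 64-bit float, "?" = boolean.
LAYOUT = "<Id?"
binary_bytes = struct.pack(
    LAYOUT, record["id"], record["temperature"], record["ok"]
)

print(text_bytes)
print(len(text_bytes), "bytes as text,", len(binary_bytes), "bytes as binary")

# Decoding the binary form only works if the reader knows the layout.
decoded = struct.unpack(LAYOUT, binary_bytes)
print(decoded)
```

The binary form is smaller, but it is unreadable without the layout, which is the trade-off between the two families.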

Binary Data Serialization Languages

Binary data serialization languages featured in this post:

  • ASN.1
  • MessagePack
  • Protobuf
  • FlatBuffers
  • Apache Thrift
  • Apache Avro
  • XOP

ASN.1

ASN.1 is a framework rather than just a data serialization format.

You can read this post about ASN.1.

MessagePack

MessagePack is community-driven.

Protobuf

Protocol Buffers (protobuf) is developed by the American company Google.

FlatBuffers

FlatBuffers is also developed by Google.

Apache Thrift

Apache Thrift was originally developed by Facebook and later donated to the Apache Software Foundation.

Apache Avro

Apache Avro is part of the Hadoop ecosystem.

XOP

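XML-binary Optimized Packaging (XOP) is a W3C recommendation for serializing XML that contains binary data. The serialized result is called an XOP package.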

Text-based Data Serialization Languages

Text-based data serialization languages featured in this post:

  • XML
  • JSON
  • YAML
  • TOML
  • SGML

The most popular are XML and JSON.

XML

XML stands for Extensible Markup Language.

The structure of an XML document can be defined with:

  • Document type definition (DTD)
  • XML Schema

Document Type Definition (DTD) has a limited set of data types and does not allow new types to be created, so it is not extensible.

XML Schema is newer than DTD. It is strongly typed and is itself written in XML syntax.

This definition file is optional. Without one, an XML document with correct syntax is just well-formed. When the document additionally conforms to a definition, it is valid.
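As a rough illustration of the well-formed part, the sketch below uses Python's standard `xml.etree.ElementTree` parser, which checks syntax only. The helper name `is_well_formed` and the sample documents are made up for this example. Checking validity against a DTD or an XML Schema needs a validating parser, which the standard library does not provide; a third-party package such as lxml is one option.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, i.e. its syntax is correct."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = "<book><title>XML basics</title></book>"
bad = "<book><title>XML basics</book>"  # <title> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

Note that a well-formed document can still be invalid: the parser above would accept a `<book>` with no `<title>` at all, even if a schema required one.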

JSON

JavaScript Object Notation (JSON) removes some of the redundancy that XML adds.
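A small sketch of that redundancy, using Python's standard `json` module. The `person` record is invented for the example, and the XML string is written by hand as one possible encoding of the same data.

```python
import json

# The same hypothetical record in both notations.
xml_text = "<person><name>Ada</name><age>36</age></person>"
person = {"person": {"name": "Ada", "age": 36}}

# Compact separators drop the optional whitespace after "," and ":".
json_text = json.dumps(person, separators=(",", ":"))

print(xml_text, len(xml_text))
print(json_text, len(json_text))

# JSON round-trips back to the same Python structure.
restored = json.loads(json_text)
print(restored == person)
```

XML repeats every element name in a closing tag, while JSON names each field once. JSON also keeps `36` as a number, whereas plain XML carries it as text unless a schema says otherwise.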

YAML

YAML Ain’t Markup Language (YAML) is more human-readable than JSON because it uses indentation and line breaks instead of square brackets and curly braces. On the other hand, it is slower to parse and not as universal or popular as JSON.

PyYAML is a popular Python package for reading and writing YAML.
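A minimal sketch of reading YAML with PyYAML, assuming the package is installed (`pip install pyyaml`). The configuration content is invented for the example. `yaml.safe_load` is used rather than `yaml.load` because it only builds plain Python objects.

```python
import json

import yaml  # third-party package: PyYAML

# Indentation expresses nesting; "- " starts a list item.
yaml_text = """\
server:
  host: example.org
  ports:
    - 80
    - 443
"""

config = yaml.safe_load(yaml_text)

# The same data as JSON, with the brackets and braces YAML avoids.
print(json.dumps(config))
```

The parsed result is an ordinary dictionary, so the same data can be written back out as JSON when a more widely supported format is needed.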

TOML

Tom’s Obvious, Minimal Language (TOML) is aimed at configuration files.

TOML code repository

SGML

SGML stands for Standard Generalized Markup Language.

HTML is based on SGML.

XPDL

XML Process Definition Language (XPDL) is a serialization language for BPMN diagrams. It is defined by the Workflow Management Coalition (WfMC).
