This post explains data serialization formats: markup languages that focus on the data itself rather than on how text is formatted.
Do not confuse them with document markup languages, which focus on text formatting.
List of Data Serialization Languages
Data serialization languages can be classified as:
- Binary
- Text-based
Binary formats are used for communication between machines.
Text-based formats aim to stay human-readable, though they are also used for communication.
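The difference is easy to see when one record is serialized both ways. The sketch below is illustrative only; the record and its layout are made up. It writes the record as JSON text and as raw bytes with Python's struct module. The binary form is smaller, but a reader can only decode it if it knows the layout in advance.

```python
import json
import struct

# A small, hypothetical record to serialize in two ways.
record = {"id": 1, "temp": 21.5}

# Text-based: JSON, readable in any text editor.
text_bytes = json.dumps(record).encode("utf-8")

# Binary: a fixed layout both sides must agree on in advance
# ("<If" = little-endian unsigned 32-bit int followed by a 32-bit float).
# Field names are not stored at all.
binary_bytes = struct.pack("<If", record["id"], record["temp"])

print(text_bytes)                        # b'{"id": 1, "temp": 21.5}'
print(len(text_bytes), len(binary_bytes))  # 23 8

# Decoding the binary form requires the same layout string.
record_id, temp = struct.unpack("<If", binary_bytes)
print(record_id, temp)                   # 1 21.5
```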
Binary Data Serialization Languages
Binary data serialization languages featured in this post:
- ASN.1
- MessagePack
- Protobuf
- FlatBuffers
- Apache Thrift
- Apache Avro
- XOP
ASN.1
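ASN.1 is a framework rather than just a data serialization format. It describes data structures in an abstract notation and defines several encoding rules (such as BER, DER and PER) for serializing them.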
You can read this post about ASN.1.
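In ASN.1, the schema is kept separate from the encoding rules used on the wire. DER, one of those rules, writes every value as a tag-length-value triple. The hand-written sketch below DER-encodes a non-negative ASN.1 INTEGER to illustrate this. It is a simplification: it handles only non-negative integers and short-form lengths, and the function name is hypothetical. Real projects would use an ASN.1 library.

```python
def der_encode_uint(n: int) -> bytes:
    """DER-encode a non-negative ASN.1 INTEGER as tag-length-value (sketch)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    # Minimal big-endian two's complement: reserve one bit for the sign,
    # so 128 becomes 00 80 rather than 80 (which would read as negative).
    content = n.to_bytes(n.bit_length() // 8 + 1, "big")
    if len(content) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    tag = 0x02  # universal tag number for INTEGER
    return bytes([tag, len(content)]) + content


for value in (0, 5, 127, 128, 256):
    print(value, der_encode_uint(value).hex(" "))
# 0 02 01 00
# 5 02 01 05
# 127 02 01 7f
# 128 02 02 00 80
# 256 02 02 01 00
```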
MessagePack
MessagePack is a community-driven binary format for JSON-like data, designed to be more compact than JSON.
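MessagePack gives each value a one-byte type marker, and for small values that marker often also carries the value or length itself. This is how it beats JSON on size. The sketch below encodes only a tiny, hand-picked subset of the MessagePack specification: nil, booleans, integers 0 to 127, strings under 32 bytes, and maps under 16 entries. The function name is hypothetical, and real code would use a MessagePack library.

```python
import json


def msgpack_encode(value) -> bytes:
    """Encode a small subset of MessagePack (illustrative sketch only)."""
    if value is None:
        return b"\xc0"                       # nil
    if isinstance(value, bool):              # check bool before int (bool is an int)
        return b"\xc3" if value else b"\xc2"
    if isinstance(value, int) and 0 <= value <= 0x7F:
        return bytes([value])                # positive fixint: the byte is the value
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) < 32:
            return bytes([0xA0 | len(data)]) + data  # fixstr: length in low 5 bits
    if isinstance(value, dict) and len(value) < 16:
        out = bytes([0x80 | len(value)])     # fixmap: entry count in low 4 bits
        for key, item in value.items():
            out += msgpack_encode(key) + msgpack_encode(item)
        return out
    raise TypeError(f"not supported by this sketch: {value!r}")


doc = {"compact": True, "schema": 0}
packed = msgpack_encode(doc)
compact_json = json.dumps(doc, separators=(",", ":"))
print(packed.hex(" "))
print(len(packed), "bytes vs", len(compact_json), "bytes of JSON")  # 18 bytes vs 27 bytes of JSON
```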
Protobuf
Protocol Buffers (protobuf) is developed by the American company Google.
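In the protobuf wire format, each field starts with a key. The key is the field number shifted left by three bits, OR-ed with a wire type. The value follows the key, and integers are stored as base-128 varints: seven bits per byte, with the high bit set on every byte except the last. The sketch below hand-encodes one integer field to show the idea. The helper names are hypothetical, and real code would use classes generated by the protobuf compiler (protoc).

```python
def encode_varint(n: int) -> bytes:
    """Base-128 varint, least significant 7-bit group first (non-negative only)."""
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # high bit set: more bytes follow
        else:
            out.append(group)         # last byte: high bit clear
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one integer field with wire type 0 (VARINT)."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)


# A message whose int32 field number 1 holds the value 150.
print(encode_int_field(1, 150).hex(" "))  # 08 96 01
```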
FlatBuffers
FlatBuffers is also developed by the American company Google.
Apache Thrift
Apache Thrift was originally developed by Facebook and later donated to the Apache Software Foundation.
Apache Avro
Apache Avro is part of the Hadoop ecosystem.
XOP
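XML-binary Optimized Packaging (XOP) is a W3C recommendation for serializing XML that contains binary data. Instead of embedding that data as base64 text, XOP stores it as raw binary parts next to the XML inside an XOP package, for example a MIME Multipart/Related message as used by SOAP MTOM.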
Text-based Data Serialization Languages
Text-based data serialization languages featured in this post:
- XML
- JSON
- YAML
- TOML
- SGML
- XPDL
The most popular are XML and JSON.
XML
Extensible Markup Language (XML)
The structure of an XML document can be defined with:
- Document type definition (DTD)
- XML Schema
A Document Type Definition (DTD) has a limited set of data types and does not allow creating new ones, so DTD is not extensible.
XML Schema is newer than DTD. It is strongly typed, and it is itself written in XML syntax.
The definition file is optional. An XML document with correct syntax is well-formed. If it additionally conforms to a definition (a DTD or an XML Schema), it is also valid.
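Python's standard library can check the first level, well-formedness: xml.etree.ElementTree raises ParseError when the syntax is broken. The sketch below relies only on that behavior, and the helper name is hypothetical. Checking validity against a DTD or XML Schema needs a validating parser (for example the third-party lxml package), which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses, i.e. is well-formed XML (not validated)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<person><name>Ada</name></person>"))  # True
print(is_well_formed("<person><name>Ada</person>"))         # False: mismatched tag
```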
JSON
JavaScript Object Notation (JSON) removes some of the redundancy of XML, such as repeating every element name in a closing tag.
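Putting the same data side by side makes the difference visible. XML writes each element name twice, once in the opening tag and once in the closing tag. JSON names each field only once. The record below is a made-up example.

```python
import json

# The same hypothetical record in XML and in (compact) JSON.
xml_text = "<person><name>Ada</name><year>1815</year></person>"
json_text = json.dumps({"name": "Ada", "year": 1815}, separators=(",", ":"))

print(json_text)                      # {"name":"Ada","year":1815}
print(len(xml_text), len(json_text))  # 50 26
```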
YAML
YAML Ain’t Markup Language (YAML) is more human-readable than JSON because it uses indentation and line breaks instead of square brackets and curly braces. On the other hand, it is slower to parse and not as universal or popular as JSON.
A popular Python library for YAML is PyYAML, which is imported in code as yaml.
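A short PyYAML example shows how indentation stands in for JSON's brackets and braces. The sketch below assumes PyYAML is installed (pip install pyyaml). It uses yaml.safe_load, which builds only plain Python objects, and then prints the same data as JSON for comparison. The configuration content is made up.

```python
import json

import yaml  # PyYAML, assumed installed: pip install pyyaml

config_text = """
server:
  host: localhost
  ports:
    - 8080
    - 8081
debug: true
"""

# safe_load only builds plain Python objects (dict, list, str, int, bool, ...).
config = yaml.safe_load(config_text)
print(config["server"]["ports"])  # [8080, 8081]

# The same data as JSON, with brackets and braces instead of indentation.
print(json.dumps(config))
```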
TOML
Tom’s Obvious, Minimal Language (TOML) is designed for configuration files.
SGML
Standard Generalized Markup Language (SGML) is an ISO standard (ISO 8879) for defining markup languages.
HTML is based on SGML.
XPDL
XML Process Definition Language (XPDL) is an XML-based serialization language for BPMN diagrams. It is defined by the Workflow Management Coalition (WfMC).