Filetypes – Project Haystack

Filetypes


Overview

Haystack defines several text formats for encoding the fixed set of standard data types. Every file type is mapped to a filetype definition.

Zinc

Zinc is a recursive acronym for "Zinc Is Not CSV". It is the original Haystack format, designed to encode CSV-like data with strong typing. Zinc encodes all Haystack kinds with full fidelity and without loss of typing. Its syntax is compact and readable, at the expense of requiring a non-trivial custom parser. Zinc's scalar encodings are also the basis for Trio and JSON. Zinc is the default format used for the HTTP API, although today JSON is equally supported and widely used.
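To make the idea of "CSV with strong typing" concrete, here is a minimal Python sketch of a Zinc-style encoder. It is an illustration only: the `Marker`, `Ref` and `Number` classes and the `zinc_scalar` and `zinc_grid` functions are hypothetical names, and the syntax shown (a `ver` line, `N`/`M`/`T`/`F` literals, `@` refs, units appended to numbers) is a simplified reading of the Zinc grammar. Grid meta, column meta and most scalar kinds are left out, so treat the Zinc chapter as authoritative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Marker:
    """Hypothetical stand-in for the Haystack marker kind."""


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for the Haystack ref kind."""
    id: str


@dataclass(frozen=True)
class Number:
    """Hypothetical stand-in for a number with an optional unit."""
    val: float
    unit: str = ""


def zinc_scalar(v):
    """Encode one value as a Zinc-style scalar (simplified)."""
    if v is None:
        return "N"
    if isinstance(v, Marker):
        return "M"
    if isinstance(v, bool):
        return "T" if v else "F"
    if isinstance(v, Ref):
        return "@" + v.id
    if isinstance(v, Number):
        return f"{v.val:g}{v.unit}"
    if isinstance(v, date):
        return v.isoformat()
    if isinstance(v, str):
        escaped = v.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    raise TypeError(f"unsupported value: {v!r}")


def zinc_grid(cols, rows):
    """Encode a list of dicts as a Zinc-style grid without any meta."""
    lines = ['ver:"3.0"', ",".join(cols)]
    for row in rows:
        lines.append(",".join(zinc_scalar(row.get(c)) for c in cols))
    return "\n".join(lines) + "\n"


sites = [{"id": Ref("a"), "dis": "Site A", "site": Marker(), "area": Number(1200, "ft²")}]
print(zinc_grid(["id", "dis", "site", "area"], sites))
```

Unlike plain CSV, each cell carries its own type: the quoted string, the `@` ref, the marker and the unit-bearing number can all be told apart by a parser.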

See Zinc chapter for further discussion and grammar.

JSON

The JSON format specifies a standardized method for mapping the Haystack data types to JSON without loss of information. There are two methods for encoding Haystack types to JSON:

  • Version 4 - The default JSON encoding for Haystack
  • Version 3 - The Haystack 3 encoding for JSON. This encoding has been superseded by the version 4 encoding but is still supported for backwards compatibility.
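The difference between the two encodings can be sketched in Python as follows. This is a hedged illustration, not a reference implementation: `to_json_v4` and `to_json_v3` are hypothetical helper names, and the shapes used (objects tagged with `_kind` for version 4, type-prefixed strings such as `n:` and `m:` for version 3) reflect my reading of the JSON chapter and should be verified there. Only a few kinds are covered.

```python
import json
from collections import namedtuple

MARKER = object()                       # hypothetical marker sentinel
Ref = namedtuple("Ref", "id")           # hypothetical ref type
Number = namedtuple("Number", "val unit")


def to_json_v4(v):
    """Map a value to a version 4 style JSON value (assumed shapes)."""
    if v is MARKER:
        return {"_kind": "marker"}
    if isinstance(v, Ref):
        return {"_kind": "ref", "val": v.id}
    if isinstance(v, Number):
        if not v.unit:
            return v.val                # unitless numbers stay plain JSON numbers
        return {"_kind": "number", "val": v.val, "unit": v.unit}
    return v                            # strings and booleans pass through


def to_json_v3(v):
    """Map a value to a version 3 style type-prefixed string (assumed shapes)."""
    if v is MARKER:
        return "m:"
    if isinstance(v, Ref):
        return "r:" + v.id
    if isinstance(v, Number):
        return f"n:{v.val:g} {v.unit}" if v.unit else f"n:{v.val:g}"
    if isinstance(v, str):
        return "s:" + v
    return v                            # booleans pass through


rec = {"id": Ref("a"), "dis": "Site A", "site": MARKER, "temp": Number(72, "°F")}
print(json.dumps({k: to_json_v4(v) for k, v in rec.items()}, ensure_ascii=False))
print(json.dumps({k: to_json_v3(v) for k, v in rec.items()}, ensure_ascii=False))
```

In both cases the goal is the same: a consumer can recover the original Haystack kind of every value from the JSON alone.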

See JSON chapter for further details.

Trio

Trio is an acronym for Tag Record Input/Output. Trio is derived from YAML. It uses the Zinc encodings for scalar types to provide full type fidelity. Trio targets use cases where humans need to write Haystack data by hand. For example, it is the format used for defs. Most examples in the documentation are formatted in Trio.

Trio is a line-oriented format. Dicts are encoded with each tag on its own line. Dicts are separated by a line of "---". There is also some syntactic sugar for multi-line strings and nested collection data values.
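The line-oriented structure can be sketched with a small Python reader. This is a simplified illustration under stated assumptions: `parse_trio` and `MARKER` are hypothetical names, a tag name on a line by itself is assumed to be a marker tag, lines starting with `//` are assumed to be comments, and values are kept as raw Zinc-encoded text rather than decoded. Multi-line strings and nested collections are not handled.

```python
MARKER = object()  # hypothetical sentinel for a tag with no value


def parse_trio(text):
    """Split Trio-style text into a list of dicts, one per record."""
    dicts, current = [], {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            continue                      # skip blank lines and comments
        if stripped.startswith("---"):
            if current:                   # separator closes the current dict
                dicts.append(current)
                current = {}
            continue
        name, sep, value = stripped.partition(":")
        current[name.strip()] = value.strip() if sep else MARKER
    if current:
        dicts.append(current)
    return dicts


sample = """\
dis: "Site A"
site
area: 1200ft²
---
dis: "Site B"
site
"""

for rec in parse_trio(sample):
    print(sorted(rec))
```

Each record comes back as a plain dict keyed by tag name, which mirrors how the text reads on the page.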

Trio is not ideal for representing grids because it supports neither grid meta nor column meta. As such, Trio should not typically be used to encode requests/responses in the HTTP API.

See Trio chapter for further details.

CSV

CSV stands for Comma Separated Values. CSV files are easily imported into and exported from spreadsheets and relational databases. CSV is the inspiration for Zinc. The main drawback of CSV is that there is essentially only one collection type (table/grid) and one scalar type (string). As such, CSV does not provide full fidelity with Haystack kinds. However, because CSV is a widely supported open format, we specify a standard mechanism to export Haystack data to CSV.
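The loss of fidelity is easy to demonstrate with Python's standard `csv` module. The sketch below is not the standard Haystack CSV mapping (see the CSV chapter for that); `export_csv` is a hypothetical helper that simply stringifies each cell, which is enough to show that a number and a string become indistinguishable once written.

```python
import csv
import io


def export_csv(cols, rows):
    """Write a list of dicts as CSV text; every cell becomes a string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(cols)
    for row in rows:
        writer.writerow(["" if row.get(c) is None else str(row[c]) for c in cols])
    return buf.getvalue()


rows = [{"dis": "Site A", "temp": 72, "label": "72"}]
text = export_csv(["dis", "temp", "label"], rows)
print(text)

# Reading the text back yields only strings: the numeric 72 and the
# string "72" can no longer be told apart.
read_back = list(csv.reader(io.StringIO(text)))
print(read_back)
```

This is why CSV is treated as an export format rather than a full-fidelity interchange format.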

See CSV chapter for further details.

RDF

Project Haystack specifies a standard export to RDF triples via two different formats: Turtle and JSON-LD. Both defs and instance data have a standard export mapping.

See RDF chapter for details.