- Index
- »
- docHaystack
- »
- Zinc
Zinc
Overview
Zinc stands for "Zinc Is Not CSV". Zinc is a plaintext syntax for serializing Haystack grids using a souped up CSV format. Unlike CSV, Zinc supports typed scalar values (such as Bool, Int, Float, Str, Date, etc) and arbitrary meta-data at the grid and column level. Unlike JSON, Zinc results in much higher compression for tabular data.
Zinc is represented by the def filetype:zinc.
Literals
The basic syntax of Zinc uses a custom literal syntax for each type:
- Null: N
- Marker: M
- Remove: R
- NA: NA
- Bool: TorF(for true, false)
- Number: 1,-34,10_000,5.4e-45,9.23kg,74.2°F,4min,INF,-INF,NaN
- Str: "hello","foo\nbar\"(uses all standard escape chars as C like languages)
- Uri: `http://project-haystack.com/`
- Ref: @17eb0f3a-ad607713,@xyz "Display Name"
- Symbol: ^hot-water
- Date: 2010-03-13(YYYY-MM-DD)
- Time: 08:12:05(hh:mm:ss.FFF)
- DateTime: 2010-03-11T23:55:00-05:00 New_Yorkor2009-11-09T15:39:00Z
- Coord: C(37.55,-77.45)
- XStr: Type("value")
- List: [1, 2, 3]
- Dict: {dis:"Building" site area:35000ft²}
- Grid: <<ver:"3.0" ... >>
Syntax
Every grid has one line of meta-data applied to the entire grid, followed by one line of column definitions, then zero or more lines of rows. Each line is separated by a "\n" newline character.
The meta-data line must always begin with a ver tag and a value of "3.0". Let's look at a simple example:
ver:"3.0" firstName,bday "Jack",1973-07-23 "Jill",1975-11-15
Note the first line defines the grid meta-data, which is just the version tag.  The second line defines two columns named firstName and bday.  There are two data rows each with a Str value for firstName and a Date value for bday.  Every row must define a cell value for each column.
Metadata may be specified on the grid itself or on each column as a set of name/value tags. Tags are specified as "name: val" or if value is omitted, then it is a marker tag. Tags are separated by a space. Here is an example:
ver:"3.0" database:"test" dis:"Site Energy Summary" siteName dis:"Sites", val dis:"Value" unit:"kW" "Site 1", 356.214kW "Site 2", 463.028kW
It is common to have sparse tables where rows have a null value for a given column.  This is indicated either using the N literal or by omitting a the cell entirely.  For example these two rows are semantically identical:
"a",N,2,N,N,"z" "a",,2,,,"z"
If there is only one column, then a null row must be represented with the N character.
Nested lists, dicts, or grids may be used for any meta data value or cell:
ver:"3.0"
type,val
"list",[1,2,3]
"dict",{dis:"Dict!" foo}
"grid",<<
  ver:"2.0"
  a,b
  1,2
  3,4
  >>
"scalar","simple string"
  
Nested dicts are optionally allowed to use a comma between name value pairs. However, commas are not allowed for grid and column meta-data.
Grammar
Grammar legend:
:= is defined as <x> non-terminal "x" literal [x] optional (x) grouping x+ one or more times x* zero or more times x|x or
The formal grammar for Zinc:
<grid>        :=  <gridMeta> <cols> [<row>]*
<gridMeta>    :=  <ver> <tagsNoComma> <nl>
<ver>         :=  "ver:" <str>  // must be "3.0"
<tagsNoComma> :=  <tag>*   // separated by one space (0x20)
<tagsCommaOk> :=  (<tag>, [","])*  // trailing comma allowed/optional
<tag>         :=  <tagMarker> | <tagPair>
<tagMarker>   :=  <id>  // val is assumed to be Marker
<tagPair>     :=  <id> ":" <val>
<cols>        :=  <col> ("," <col>)* <nl>
<col>         :=  <id> <tagsNoComma>
<row>         :=  <cell> ["," <cell>]* <nl>
<cell>        :=  <val>  // empty cell is same as null
<val>         :=  <scalar> | <list> | <dict> | <grid>
<list>        :=  "[" (<val> ",")* "]"  // trailing comma allowed/optional
<dict>        :=  "{" <tagsCommaOk> "}"
<grid>        :=  "<<" <grid> ">>"
Zinc tokens:
<id>          :=  <alphaLo> (<alphaLo> | <alphaHi> | <digit> | '_')*
<scalar>      :=  <null> | <marker> | <remove> | <na> | <bool> | <ref> | <symbol> | <str> |
                  <uri> | <number> | <date> | <time> | <dateTime> | <coord> | <xstr>
<null>        := "N"
<marker>      := "M"
<remove>      := "R"
<na>          := "NA"
<bool>        := "T" | "F"
<symbol>      := "^" <refChar>+
<ref>         := "@" <refChar>+ [ " " <str> ]
<refChar>     := <alpha> | <digit> | "_" | ":" | "-" | "." | "~"
<str>         := """ <strChar>* """
<uri>         := "`" <uriChar>* "`"
<strChar>     := <unicodeChar> | <strEscChar>
<uriChar>     := <unicodeChar> | <uriEscChar>
<unicodeChar> := any 16-bit Unicode char >= 0x20 (except str/uri quote)
<strEscChar>  := "\b" | "\f" | "\n" | "\r" | "\r" | "\t" | "\"" | "\\" | "\$" | <uEscChar>
<uriEscChar>  := "\:" | "\/" | "\?" | "\#" | "\[" | "\]" | "\@" | "\`" | "\\" | "\&" | "\=" | "\;" | <uEscChar>
<uEscChar>    := "\u" <hexDigit> <hexDigit> <hexDigit> <hexDigit>
<xstr>        := <xstrType> "(" <str> ")"
<xstrType>    := <alphaHi> (<alphaLo> | <alphaHi> | <digit> | '_')*
<number>      := <decimal> | "INF" | "-INF" | "NaN"
<decimal>     := ["-"] <digits> ["." <digits>] [<exp>] [<unit>]
<exp>         := ("e"|"E") ["+"|"-"] <digits>
<unit>        := <unitChar>*
<unitChar>    := <alpha> | "%" | "_" | "/" | "$" | any char > 128  // see Units
<date>        := YYYY-MM-DD
<time>        := hh:mm:ss.FFFFFFFFF
<dateTime>    := YYYY-MM-DD'T'hh:mm:ss.FFFFFFFFFz zzzz
<coord>       := "C(" <coordDeg> "," <coordDeg> ")"
<coordDeg>    := ["-"] <digits> ["." <digits>]
<alphaLo>     := ('a' - 'z')
<alphaHi>     := ('A' - 'Z')
<alpha>       := <alphaLo> | <alphaHi>
<digit>       := ('0' - '9')
<digits>      := <digit> (<digit> | "_")*
<hexDigit>    := ('a'-'f') | ('A'-'F') | digit
The space character 0x20 is allowed between tokens.
Notes
The following are notes for implementators:
Identifiers vs Keywords
Identifiers must start with a lower case letter. Keywords begin with an upper case letter: "N", "T", "F", "M", "NA", "INF", "NaN", etc
URIs
Escape chars in URIs are used to remove special meaning for reserved characters.  For example if a filename contains the # character, then it must be escaped so that the # is not treated as a fragment identifier:
`file \#2`
Parsers should be prepared to encounter and preserve the backslash in these cases.
Number Tokens
When parsing, a leading digit may be a number, date, time, or datetime. You can use the following technique to consume these scalars:
- consume all the various chars into a string
- if dashes and no colons must be date
- if colons and no dashes must be time
- if colons and dashes must be dateTime, check for Zor timezone
- must be number with optional unit
DateTime
DateTime scalars are encoded using both offset and the timezone name:
2010-11-28T07:23:02.773-08:00 Los_Angeles // negative offset and timezone 2010-11-28T23:19:29.741+08:00 Taipei // positive offset and timezone 2010-11-28T18:21:58+03:00 GMT-3 // timezone may include '-' 2010-11-28T12:22:27-03:00 GMT+3 // timezone may include '+' 2010-01-08T05:00:00Z UTC // UTC example 2010-01-08T05:00:00Z // UTC may omit timezone name
Version History
Zinc 1.0
- initial version
- Bin format: Bin mime:"text/plain"
Zinc 2.0
- change hex RecId syntax to @ Ref syntax
- remove support for cell display strings and metadata
- remove support for column display strings (use dis metadata tag)
- update Bin format: Bin(text/plain)
Zinc 3.0
- add nested lists, dicts, grids
- add NA
- add XStr
- remove Bin format to use XStr syntax
Zinc 3.0 Haystack 4 features
- Version remains the same "3.0"
- Symbol literals
- Allow commas in nested dict literals