#258 Proposal for new JSON Encoding

Brian Frank Fri 20 Feb 2015

By way of this post, I would like to propose a new JSON encoding for Haystack tags. There is a lot of momentum to bring JSON up to the level where it could replace Zinc. But the current JSON encoding is designed more as an easy-to-use casual API; it doesn't currently provide 100% fidelity with the Haystack type system. So I am not comfortable calling it a suitable replacement for Zinc in its current state until we iron out those dark corner cases.

The main issue with the current JSON spec is that the types Date, Time, DateTime, URI, Ref, and Marker must be encoded as a JSON string. The current spec leaves it up to the parser to check whether a string matches one of those types and to parse it back into the appropriate type. This makes for very clean JSON. The problem is that if I have a string value which happens to match one of those types, it won't get decoded back into a string correctly. For example:

// zinc
ver: "2.0"
str,date
"2014-01-01",2015-02-02

// JSON
{"meta":{"ver":"2.0"},
 "cols":[{"name":"str"}, {"name":"date"}],
 "rows":[
   {"str":"2014-01-01", "date":"2015-02-02"},
 ]}

In the JSON above it is impossible to tell that the str tag should be decoded back into a Str and not a Date. That is the big issue we need to solve. Other smaller issues:

  • the way we add units as an additional field in the row object is ugly (at least that seems to be the consensus I've heard :)
  • there is no explicit handling for INF, -INF, and NaN

The proposed new design is to continue to encode all scalar values as a single JSON string (with the exception of Bool). But now non-Str types are encoded with a leading bang followed by an ASCII char indicating the type. In the special case of a string which itself starts with a bang, it's prefixed with "!S".

"!@xyz-abc Display Name"     // Ref
"!M"                         // Marker 
"!N45.5kW"                   // Number with unit
"!NINF", "!N-INF", "!NNaN"   // Number specials
"!D2014-01-03"               // Date
"!H23:49"                    // Time
"!T2015-02-28T20:14:30Z UTC" // DateTime
"!Uhttp://acme.org"          // URI
"!C(30,-40)                  // Coord
"!S!hello"                   // Str with bang

This would be a breaking change to the existing JSON encoding. I don't know how much it has been used to date. If not heavily used, we could just change it. Or we could roll the ver tag to something like "2.1" (but then it wouldn't match the Zinc version tag).

Mike Jarmy Sat 21 Feb 2015

Sounds good to me, +1.

With the old encoding, there were definitely some subtle bugs that could show up. And I'm glad we are proposing to get rid of the separate-tag-for-units approach; that was cumbersome.

Is a number without a unit encoded as

"!N45.5"

or with the normal JSON encoding as

45.5

I feel like both should be acceptable, at least when parsing.

Christian Tremblay Mon 23 Feb 2015

I believe it's a good thing, as the JSON format is simple and widely adopted.

Regarding nHaystack, we'll need to address the fact that old devices (NPM2) aren't able to deal with JSON as-is, since they run Java 1.3... Zinc is possible though...

Regarding pyhaystack, I built everything based on JSON. But since it's just getting started, now is the time to change things and make modifications.

Brian Frank Tue 10 Mar 2015

Any more comments on this? If not, then I say it's a go, and I'm going to switch over to the new JSON format for the docs and the Java toolkit.

Brian Frank Thu 19 Mar 2015

After some more thought, I would like to tweak the syntax of this proposal to use a prefix of "x:" instead of "!X" - I think it makes the JSON a bit more readable for debugging and is still just as easy to generate/parse.

I would also like to suggest that the encoding of Numbers use a space between the floating point value and the unit to make it easier to parse.

Updated syntax would be:

"r:xyz-abc Display Name"      // Ref
"m:"                          // Marker 
"n:45.5 kW"                   // Number with unit
"n:INF", "n:-INF", "n:NaN"    // Number specials
"d:2014-01-03"                // Date
"h:23:49"                     // Time
"t:2015-02-28T20:14:30Z UTC"  // DateTime
"u:http://acme.org"           // URI
"c:30,-40                     // Coord
"s:x:hello"                   // any Str with colon as 2nd char

Does anyone prefer bang syntax over that one? Further ideas?

Tucker Watson Fri 20 Mar 2015

I much prefer the new format using colons. The only one that's a little weird to me is the marker, but true/false is already being used for Bool. Will the parser accept a numeric type and consider it unitless, or will it only accept "n:" encoded numbers?

Brian Frank Fri 20 Mar 2015

The only one that's a little weird to me is the marker, but true/false is already being used for Bool

To ensure 100% fidelity we wouldn't want to mix up Bool and Marker, so although it's a little weird, I think "m:" makes the most sense for consistency.

Will the parser accept a numeric type and consider it unitless, or will it only accept "n:" encoded numbers?

I would say parsers should accept raw JSON number values as Numbers, but by convention we should always generate Numbers as "n:XXX" strings.
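
A tiny sketch of that convention in Python (hypothetical helper names, not an actual API):

def parse_number(val):
    # On read, accept either a bare JSON number or an "n:" string
    if isinstance(val, (int, float)):
        return float(val), None               # bare number -> unitless
    num, _, unit = val[2:].partition(" ")     # "n:45.5 kW" -> (45.5, "kW")
    return float(num), unit or None

def write_number(value, unit=None):
    # On write, always generate the "n:" string form
    return "n:%s%s" % (value, " " + unit if unit else "")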

Shawn Jacobson Mon 30 Mar 2015

+1 on the new format

Shawn Jacobson Wed 1 Apr 2015

I am going to implement this in nodehaystack. Can we agree on the following syntax for previously undefined types?

"n:"            // Null
"b:T"           // Bool (T for true, F for false)
"i:text/plain"  // Bin
"x:"            // Remove

Brian Frank Wed 1 Apr 2015

Null should just be a JSON null, and Bool should just be a JSON bool - in those cases we have clean mappings to JSON, and I think we should keep them.

And then how about "b:text/plain" for Bin and "x:" for Remove?
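
Putting those mappings together, a row containing each of those kinds would presumably look like this (tag names invented for illustration):

{"nullTag":null, "boolTag":true, "binTag":"b:text/plain", "removeTag":"x:"}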

Shawn Jacobson Wed 1 Apr 2015

Makes sense and sounds good.

Shawn Jacobson Wed 1 Apr 2015

Any reason not to use the same format with CSV to provide the same fidelity there as well?

// Zinc
ver:"2.0" projName:"test"
dis "Equip Name",equip,siteRef,installed
"RTU-1",M,@153c600e-699a1886 "HQ",2005-06-01
"RTU-2",M,@153c600e-699a1886 "HQ",1999-07-12

// JSON
{"meta":{"ver":"2.0", "projName":"test"},
 "cols":[{"name":"dis", "dis":"Equip Name"}, {"name":"equip"}, {"name":"siteRef"}, {"name":"installed"}],
 "rows":[
   {"dis":"s:RTU-1", "equip":"m:", "siteRef":"r:153c600e-699a1886 HQ", "installed":"d:2005-06-01"},
   {"dis":"s:RTU-2", "equip":"m:", "siteRef":"r:153c600e-699a1886 HQ", "installed":"d:1999-07-12"}
]}

// CSV
ver:"2.0",projName:"test"
dis Equip Name,equip,siteRef,installed
s:RTU-1,m:,r:153c600e-699a1886 HQ,d:2005-06-01
s:RTU-2,m:,r:153c600e-699a1886 HQ,d:1999-07-12

Brian Frank Thu 2 Apr 2015

Any reason not to use the same format with CSV to provide the same fidelity there as well?

You can't really make CSV support grid-level or column metadata, so it's never going to be full fidelity. And if you do, then you pretty much have Zinc at that point. Zinc and now JSON should really be the two 100% full fidelity formats for Haystack-to-Haystack communication. I think CSV is more for exporting data to external systems like Excel, in which case that format makes life more difficult, not easier.

Shawn Jacobson Fri 3 Apr 2015

That makes sense, I concur.
