#50 JSON parsing of query?his

Christian Tremblay Fri 18 May 2012

I've started to work with haystack to gather data from a Jace and bring it on a Web page to build reports using Google APIs.

I've used JSON as my standard to build tables from the CSV read from haystack.

Modifying the retrieved data is not too complicated, but I can see a potential problem when I retrieve the list of histories (because of the way I build the JSON). Maybe you can help.

A standard request (query?his) returns: kind:"Num",id:<JACE.Point>,his,dis:"Point",point,unit:"null",tz:"New_York"

All we have to do to turn this string into JSON is put quotes around the tags, surround it with braces, and so on:

{"kind":"Num", "id":"JACE.Point","his":"","dis":"Point","point":"","unit":"null","tz":"New_York"}

The problem is that if tags have been added to the points (sensor, discharge, temp, etc.), I will have to handle every one of them and add the quotes. (I've used a simple string.replace to get rid of the extra tags, replacing each tag with "".)
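
The quoting step described above can be done generically, so that extra marker tags (sensor, discharge, temp, etc.) need no special handling. This is only a minimal sketch, assuming each line looks exactly like the sample above: comma-separated tags, string values in double quotes with no escaped quotes inside, ids in angle brackets, and bare names for markers. The function name parseTagLine is made up for illustration and is not part of any Haystack library.

```javascript
// Hypothetical helper: turn one line of the tag format quoted above
// into a plain object that JSON.stringify can serialize.
function parseTagLine(line) {
  // name, optionally followed by :"string", :<ref> or :bareValue,
  // then a comma or the end of the line. The sticky flag (y) makes
  // the match start exactly at lastIndex, so malformed input is detected.
  const tag = /([A-Za-z]\w*)(?::(?:"([^"]*)"|<([^>]*)>|([^,]*)))?(?:,|$)/y;
  const rec = {};
  let pos = 0;
  while (pos < line.length) {
    tag.lastIndex = pos;
    const m = tag.exec(line);
    if (m === null) {
      throw new Error("Unexpected input at offset " + pos);
    }
    const [, name, str, ref, bare] = m;
    // Marker tags have no value; they are stored as "" as in the post above.
    rec[name] = str ?? ref ?? bare ?? "";
    pos = tag.lastIndex;
  }
  return rec;
}

const sample =
  'kind:"Num",id:<JACE.Point>,his,dis:"Point",point,unit:"null",tz:"New_York"';
console.log(JSON.stringify(parseTagLine(sample)));
```

Because markers are recognized by the absence of a value, a newly added tag such as sensor simply becomes "sensor":"" without any per-tag string.replace.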

Would it be possible to limit the details retrieved by a request? Something like query?his.id&dis, where the answer would then be id:<JACE.Point>,his,dis:"Point". This way, we would know exactly which tags we have to handle.

Or maybe we could have an option to get JSON directly from the servlet?

Thanks for your hints, advice, and help.

Brian Frank Fri 18 May 2012

I think the correct way to do it is to have your HTTP request send the "Accept" header to request the data be encoded as "text/json".

The problem with JSON, and the reason I didn't use it to begin with, is that JSON has a very limited set of scalar data types (bool, number, string). This is really a problem when our key data types include Marker, Ref, Date, Time, and DateTime. So the question then becomes: do we just encode these types as strings and lose some valuable type information? Or do we encode these values using some nested structure such as:

{"ts":{"type":"DateTime", "val":"2012-...."}, ...}

The problem with that model is that it's sort of annoying to work with.

Christian Tremblay Fri 18 May 2012

First, I totally agree that we don't want an oBix-like structure that would be bigger and less readable.

If the server gives us only strings, it's no big deal to "translate" them to whatever format we like. We only have to know what is coming from the server.

After some quick tests, the format retrieved with the Accept header set to application/json or text/json is not standard JSON (it doesn't pass the jsonlint test). We also lose the IDs surrounded by < and >.

Since I'm not so familiar with JavaScript, I may be missing something somewhere...

Brian Frank Fri 18 May 2012

After quick tests, format retrieved with accept header set to application/json or text/json is

No, that code isn't in there yet :-) It is pretty easy to add though once we figure out what the JSON encoding should be.

If the server gives us only strings, it's no big deal to "translate" them to whatever format we like. We only have to know what is coming from the server.

For discussion purposes, let's suppose we have this list of tags:

site
dis: "Store-A"
area: 15,000ft²
openDate: 1990-06-01

So we have a marker, string, number with unit, and date tag. The simplest way to encode into JSON would be:

{"site":"Marker", "dis":"Store-A", "area":15000, "openDate":"1990-06-01"}

But you can see a couple of problems there. Area should most likely be a JSON number, but then we have to throw out the unit. And it is no longer evident that site is a Marker and that openDate is supposed to be a Date.

So unless we use a more complex encoding, we have to settle for a loss of fidelity. This means you cannot "round trip" JSON and expect to get back the proper data types. We may decide to give up fidelity in order to have convenience, but I've struggled with this a bit. I have also avoided JSON in SkySpark up until this point for this very reason - convenience or full fidelity?

Other options I have considered:

Assume strings which match a given pattern such as YYYY-MM-DD are actually a specific type such as Date. And then use some special escape sequence so that the actual string literal "2012-04-30" would be encoded as "!2012-04-30", so that we could tell it's a string versus a date.

Use some naming convention to add extra fields into the JSON like this:

{"area":15000, "area-unit":"ft²", 
 "openDate":"1990-06-01", "openDate-type":"Date"}

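The first option above (pattern matching plus an escape character) can be sketched as follows. This is only an illustration under stated assumptions: the {type, val} shape for in-memory values is hypothetical, only the YYYY-MM-DD Date pattern is handled, and the rule that strings already starting with "!" are escaped as well is an addition made here so that decoding stays unambiguous.

```javascript
// Assumption: only the YYYY-MM-DD Date pattern is recognized in this sketch.
const DATE_PATTERN = /^\d{4}-\d{2}-\d{2}$/;

// Encode a typed value as a JSON string.
// The {type, val} input shape is hypothetical.
function encodeValue(v) {
  if (v.type === "Date") return v.val;
  // A plain string is escaped when it could be mistaken for a Date,
  // or when it already starts with the escape character itself.
  if (DATE_PATTERN.test(v.val) || v.val.startsWith("!")) return "!" + v.val;
  return v.val;
}

// Decode a JSON string back into a typed value.
function decodeValue(s) {
  if (s.startsWith("!")) return { type: "Str", val: s.slice(1) };
  if (DATE_PATTERN.test(s)) return { type: "Date", val: s };
  return { type: "Str", val: s };
}

console.log(encodeValue({ type: "Str", val: "2012-04-30" }));
console.log(decodeValue("1990-06-01"));
```

The cost of this approach is that every consumer has to know the pattern list and the escape rule, which is the kind of hidden convention the thread is weighing against convenience.
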
Christian Tremblay Fri 18 May 2012

Depending on why we need the data presented in a specific format, I think maybe we should not try to cover too many situations.

From my understanding, when I import data, I only want a quick way to create a kind of table. JSON format seemed easier than XML to achieve that.

For my app, I take the data, add a list tag, and build something like the following (after creating an array of strings with .split("\n")):

{ "list":[ {"id":"myId","kind":"myKind","stuff":"myStuff"},{"id":"myId","kind":"myKind","stuff":"myStuff"}, {"id":"myId","kind":"myKind","stuff":"myStuff"}] }

It gives me an easy way to iterate through the data and create a table with what I need.

Example :

for (var i in json.list) { id[i] = json.list[i].id; }

To be honest, if the output stayed as it is but used JSON-like syntax, it would be perfect.

Example :

FROM kind:"Num",id:<SERVISYS.Random>,his,dis:"Random",point,unit:"null",tz:"New_York"

TO "kind":"Num","id":"<SERVISYS.Random>","his":"","dis":"Random","point":"","unit":"null","tz":"New_York"

OR (to organize additional tags so they won't look like "tag":"") "kind":"Num","id":"<SERVISYS.Random>","type":"his,point","tags":"discharge,air,sensor","dis":"Random","point":"","unit":"null","tz":"New_York"

For the history value/timestamp format, I think the app receiving the data has to know that "ts" will be a string representing a date, "val" will be a string you can pass to parseFloat, etc.

Again, as I'm not a perfectly well-trained programmer, I may be missing some important subtleties.
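
On the client side, the conversion described above can be as small as the sketch below. The sample rows are made up, and the timestamp format (ISO 8601 with a UTC offset) is an assumption; the thread does not specify how "ts" would be formatted in a JSON response.

```javascript
// Hypothetical rows, shaped the way a string-only JSON history
// response might look. The ts format is an assumption.
const rows = [
  { ts: "2012-05-18T10:00:00-04:00", val: "72.5" },
  { ts: "2012-05-18T10:15:00-04:00", val: "73.1" },
];

// The receiving app knows what each field means, so it converts
// the strings itself: ts becomes a Date, val becomes a number.
function convertHisRows(rawRows) {
  return rawRows.map((row) => ({
    ts: new Date(row.ts),
    val: parseFloat(row.val),
  }));
}

for (const item of convertHisRows(rows)) {
  console.log(item.ts.toISOString(), item.val);
}
```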

Kevin Kelley Fri 18 May 2012

I wanna talk about this a bit, kind of on a tangent...

This is like the "lenses" idea from Haskell. For a bi-directional lens (a typesafe transform from one representation to another), you need a get and a put, and a means of re-injecting information lost when coming from the more lossy medium.

We deal with this for data import by keeping a (project-specific) TagTypes dictionary that maps tag names to (Axon) datatypes. I bring it up here because I think the ongoing efforts to standardize the tag libraries ought to include type info somehow.

If you constrain your tags to be single-typed, and if you limit the JSON to keyed values (everything in an object where the key is the tagname and value is the string rep)... I think it works.
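
A minimal sketch of that idea follows: a dictionary from tag name to type is used to re-inject the type information that a string-only JSON object has lost. The dictionary contents, the function name applyTagTypes, the {type, val} output shape, and the fallback to Str for unknown tags are all assumptions made for illustration.

```javascript
// Hypothetical project-specific dictionary: tag name -> type name.
const tagTypes = new Map([
  ["site", "Marker"],
  ["dis", "Str"],
  ["area", "Number"],
  ["openDate", "Date"],
]);

// Re-attach types to a record whose values are all strings.
function applyTagTypes(rec, types) {
  const typed = {};
  for (const [name, text] of Object.entries(rec)) {
    // Unknown tags fall back to Str (an assumption of this sketch).
    const type = types.get(name) ?? "Str";
    typed[name] = { type: type, val: text };
  }
  return typed;
}

const lossy = { site: "", dis: "Store-A", area: "15000", openDate: "1990-06-01" };
console.log(applyTagTypes(lossy, tagTypes));
```

This only works if each tag name is constrained to a single type, which is exactly the condition stated above.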

(In fact a Lint check, for improperly-typed tag-values, would catch a lot of problems -- if there were tag type metadata.

Type signatures for Axon functions would be nice, too -- but here we'd need to have union-types like (Grid or List) to fully describe current-practice.)

Getting back to interchange... every distributed system has to deal with this somehow. Either you control the format by defining everything up-front, or you provide a discovery mechanism. For example, a SOAP wsdl that gets returned by a default query, and defines the available APIs and the datatypes they use. Or, a REST system that lets you query for contents, then explore into those items for sub-elements...

All this is to say: I'm not so much talking about JSON; but this problem is one of many similar ones, which could maybe be solved by either: predefined, or: discoverable, tag-type metadata. I think metadata is necessary: I don't think you can embed everything, at least not without making your representation be a fully bootstrappable lisp...

Andy Frank Fri 18 May 2012

From a practicality standpoint, I think I side with Christian. There are other options for a full fidelity interchange format. But a really common case is that I want data, and I know exactly what I'm expecting. So most of the time it's a non-issue to convert the data myself. Dropping to strings seems the simplest approach here.

Brian Frank Wed 23 May 2012

After talking to a few people offline, I think it makes sense to focus the JSON encoding on convenience even if it means losing type information. I'll map things as follows:

Marker: "✓" string literal
Number: as JSON number (we will lose unit)
Bool: as JSON boolean
Everything else: as JSON string literal
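
As a rough illustration of the mapping listed above, the sketch below encodes a record of typed values into convenience-first JSON. The {type, val, unit} input shape and the function names are hypothetical; only the four mapping rules come from the post.

```javascript
// Map one typed value to its simplified JSON form.
// The {type, val, unit} input shape is an assumption of this sketch.
function toSimpleJson(value) {
  switch (value.type) {
    case "Marker":
      return "\u2713"; // the check mark string literal
    case "Number":
      return value.val; // JSON number; the unit is dropped
    case "Bool":
      return value.val; // JSON boolean
    default:
      return String(value.val); // everything else as a string
  }
}

// Apply the mapping to every tag of a record.
function encodeRecord(rec) {
  const out = {};
  for (const [name, value] of Object.entries(rec)) {
    out[name] = toSimpleJson(value);
  }
  return out;
}

const store = {
  site: { type: "Marker" },
  dis: { type: "Str", val: "Store-A" },
  area: { type: "Number", val: 15000, unit: "ft²" },
  openDate: { type: "Date", val: "1990-06-01" },
};
console.log(JSON.stringify(encodeRecord(store)));
```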

I'll post those changes to bitbucket sometime in the next few weeks.

Tucker Watson Fri 30 Nov 2012

Any update on JSON support? A javascript Zinc library would be ideal but in the meantime JSON would be great for js apps.

Also I was wondering why Haystack REST API doesn't define an add/edit/delete (commit) method. This isn't hard to implement per project but would be nice to have standardized.

Brian Frank Fri 30 Nov 2012

Also I was wondering why Haystack REST API doesn't define an add/edit/delete (commit)

Right now this is because I'm not sure how simple it would be to implement across different systems. But maybe we should kick off separate thread to discuss strategies to tackle that.

Any update on JSON support?

Well, with the new stuff we should be all set to add JSON. But first we have to define how we want it to work. There are two basic paths:

1. full fidelity with no loss of typing, but harder to process
2. simpler format, but give up typing info

Example of 1:

{"dis": {"type": "Str", "val":"Whitehouse"},
 "area": {"type":"Number", "val": "55000ft²"},
 "built": {"type":"Date", "val": "1792-01-01"}}

Example of 2:

{"dis": "Whitehouse",
 "area": "55000ft²",
 "built": "1792-01-01"}

Tucker Watson Sat 1 Dec 2012

I've been waffling on this and while I think either would work I'd vote for the simpler format. My main use of JSON would be for history queries and object queries where I have an expected result. For example, readAll(widget)

Simple

widget = {"x":5,"y":8,"data":"/widget1.xml"}
widget.x = 5
widget.y = 8
widget.data = "/widget1.xml"

Full fidelity

widget = {"x": {"type":"Number", "val":5},
          "y": {"type":"Number", "val":8},
          "data": {"type":"Uri", "val":"/widget1.xml"}
         }

widget.x.val = 5;
widget.y.val = 8;
widget.data.val = "/widget1.xml"

Zinc can still be used for full fidelity and does a better job with less bandwidth.

Alper Üzmezler Sat 1 Dec 2012

I vote for the simpler format as well. We can reference the types from standard.csv.

Andy Frank Mon 3 Dec 2012

I think I'd lean towards #2 - the simple format as well.

Brian Frank Wed 7 May 2014

This morning I was working on adding JSON receive support to SkySpark. I actually think with a few tweaks we can maintain full fidelity with the Haystack type system and provide read/write support without loss of type information.

My proposal is to make two changes to the JSON encoding:

If the value is a URI then encode it with tick marks such as "`file.txt`"
If the value is a Number with a unit, then add an additional field in the containing JSON Object formatted as "{tag}.unit" with the unit name. Because tags must not contain a dot, there are no conflicts with other tags in the meta/rows.

With these changes you can safely infer the correct type of a JSON string, which might be Marker, Ref, Uri, Date, Time, DateTime, Bin, or Coord.

Here is an example with the proposed changes:

// Zinc
ver:"2.0"
uri,area
`http://project-haystack.org/`,5000ft²

// JSON
{"meta":{"ver":"2.0"},
 "cols": [{"name":"uri"}, {"name":"area"}],
 "rows": [
   {"uri":"`http://project-haystack.org/`", "area":5000.0, "area.unit":"ft²"}
 ]}
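
A reader of this encoding could recover the two cases covered by the proposal roughly as sketched below. The function name decodeRow and the {type, val, unit} output shape are made up for illustration, and only backtick URIs and "{tag}.unit" fields are handled; how the other types are recognized is not shown in this thread.

```javascript
// Decode one row object of the proposed JSON encoding.
// Handles only the two proposed changes: backtick-wrapped URIs
// and the "{tag}.unit" companion field for Numbers.
function decodeRow(row) {
  const out = {};
  for (const [key, value] of Object.entries(row)) {
    // Unit fields are folded into their Number, not kept as tags.
    if (key.endsWith(".unit")) continue;
    if (typeof value === "number") {
      out[key] = { type: "Number", val: value, unit: row[key + ".unit"] ?? null };
    } else if (
      typeof value === "string" &&
      value.length >= 2 &&
      value.startsWith("`") &&
      value.endsWith("`")
    ) {
      out[key] = { type: "Uri", val: value.slice(1, -1) };
    } else {
      out[key] = value; // left untouched in this sketch
    }
  }
  return out;
}

const row = {
  uri: "`http://project-haystack.org/`",
  area: 5000.0,
  "area.unit": "ft²",
};
console.log(decodeRow(row));
```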

What does everyone think?

Jason Briggs Wed 7 May 2014

I'm on board, you have my vote. I think this is a good strategy.

Richard McElhinney Thu 8 May 2014

Anything that allows us to use standard web encoding technologies is a good thing. It is a little bit of a compromise because we are relying on naming conventions to allow higher fidelity, but I think our use case is strong enough that we should go for it.

It will be clearly documented in the spec so there should be no problems.

Cheers, Richard

Mike Jarmy Thu 8 May 2014

+1 to this idea, I've always really disliked the one we've had until now because of the lack of fidelity.

If we can get full fidelity with JSON, then browser-based projects will be far faster, since there will be no need for a Zinc parser written in JavaScript.

Brian Frank Mon 12 May 2014

Okay unless there are more comments I'll consider this settled and update docs with my two proposed changes.

Ben Cromwell Fri 22 Aug 2014

If the value is a URI then encode it with tick marks such as "`file.txt`"

I know it's a bit late now as it's implemented, however, I am curious as to why this is the case.

What problem are the backticks solving?

Does it not just add an extra step for a parser to implement (stripping the backticks out)? As we already know from the key that it's a URI, and the value is quoted, it feels like the backticks are redundant.

Brian Frank Fri 22 Aug 2014

As we already know from the key that it's a URI, and the value is quoted, it feels like the backticks are redundant.

That is the problem: you need to know the value's type independent of tag names. Otherwise you can't do proper deserialization without knowing what every tag means in the source data. There are many important use cases for maintaining type fidelity without knowledge of what the tags mean (such as round tripping data across a protocol/file system).

Shawn Jacobson Mon 30 Mar 2015

Not 100% on how this maps to grids. In the Java Haystack Toolkit, it appears that units are lost in grids. Any thoughts on the best way to solve this?

We could parse all rows to determine if a column requires the tag.unit column
We could move HNum object into quotes so the unit can be included
We could make the HNum object an array so the number and unit are included [num, unit]
Move all JSON objects into quotes and keep Zinc as the parser for all objects. This appears to affect only HNum and HMarker objects, and to me would be the best approach.
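
To make the first option above concrete, the sketch below scans every row of a grid to find the columns where at least one Number carries a unit. The in-memory row shape ({val, unit} objects for Numbers) and the function name are assumptions for illustration only; this is not how the Java Haystack Toolkit represents grids.

```javascript
// Return the set of column names that would need a "{tag}.unit"
// companion, by scanning all rows (option one in the list above).
// Rows are hypothetical: Numbers are {val, unit} objects.
function columnsNeedingUnit(rows) {
  const cols = new Set();
  for (const row of rows) {
    for (const [name, value] of Object.entries(row)) {
      if (value !== null && typeof value === "object" && value.unit) {
        cols.add(name);
      }
    }
  }
  return cols;
}

const grid = [
  { dis: "Store-A", area: { val: 15000, unit: "ft²" }, floors: { val: 2 } },
  { dis: "Store-B", area: { val: 9000, unit: "ft²" }, floors: { val: 1 } },
];
console.log([...columnsNeedingUnit(grid)]);
```

The scan costs one extra pass over the grid before any output can be written, which may be one reason to prefer the other options.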

I'm sure I'm missing some options, but what would be the best way to ensure grids maintain units?

Brian Frank Mon 30 Mar 2015

Shawn, see topic 258. Given that no one seems to have additional feedback, that is the new design I'll be using in the Java Toolkit sometime in the next few weeks.

Shawn Jacobson Thu 2 Apr 2015

No problem, I like the solution in topic 258 which solves this issue as well. Thanks!
