#124 NHaystack -- request for comment on new approach to ID generation

Mike Jarmy Thu 5 Sep 2013

Currently, nhaystack generates IDs by creating a base64-encoded version of the component's slotPath, or history's ID, like so:

c.c2xvdDovRm9vL1NpbmVXYXZlMg~~
h.L25oYXlzdGFja19zaW1wbGUvQXVkaXRIaXN0b3J5
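
For readers who want to reproduce or decode the IDs above, here is a minimal Python sketch. It is not taken from the nhaystack source. It assumes the standard base64 alphabet with the "=" padding swapped for "~", which is only inferred from the two sample IDs, and the function names are hypothetical.

```python
import base64

def old_id(space: str, path: str) -> str:
    """Build an old-style ID: '<space>.' + base64(path), with '=' shown as '~'.

    Assumption: the '~' padding substitution is inferred from the sample IDs.
    """
    encoded = base64.b64encode(path.encode("utf-8")).decode("ascii")
    return space + "." + encoded.replace("=", "~")

def old_id_to_path(ident: str) -> str:
    """Reverse old_id(): drop the space prefix and decode the base64 body."""
    _space, _, body = ident.partition(".")
    return base64.b64decode(body.replace("~", "=")).decode("utf-8")

print(old_id("c", "slot:/Foo/SineWave2"))
print(old_id_to_path("h.L25oYXlzdGFja19zaW1wbGUvQXVkaXRIaXN0b3J5"))
```

Note that in the history example the encoded text is the history ID without the "history:" prefix.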

Several people have complained that these IDs are too opaque and difficult to work with. After consulting with a couple of other members of the community, I've come up with a new approach, but I'd like to request comments from the rest of the community before we finalize the design.

First off, I originally adopted the base64 approach because the Haystack REST API is an HTTP protocol. As such, it would be really nice if all IDs followed the URI specification, so that they can be resolved in a browser via HTTP GET. The URI specification has a restricted range of characters that are allowed. In particular, the "/", "$" and " " characters are disallowed in identifiers. After a fair amount of experimentation, I decided the simplest approach would be to just base64 encode the slotPath, so that we would have a bi-directional encoding. This approach was simplest for me, but has proved problematic for several users.

The new design will always accept the old base64 IDs and resolve them. However, it will no longer generate them. Instead, by default it will generate IDs that are "mangled" versions of the slotPath and historyID, like so:

slot:/Foo/SineWave1 --> C.Foo.SineWave1

history:/nhaystack_simple/AuditHistory --> H.nhaystack_simple.AuditHistory

As you can see, we use "C" and "H" for the space identifiers, and mangle the slotPaths by replacing the "/" characters with ".".
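
The basic mangling can be sketched in a few lines of Python. This is only an illustration of the two examples above, not the nhaystack implementation; the function name `mangle` is hypothetical, and it ignores "$"-escaped special characters.

```python
def mangle(path: str) -> str:
    """Turn a slot: or history: path into a dotted ID (illustrative only)."""
    prefixes = {"slot:/": "C.", "history:/": "H."}
    for scheme, space in prefixes.items():
        if path.startswith(scheme):
            # Drop the scheme, then replace each "/" separator with ".".
            return space + path[len(scheme):].replace("/", ".")
    raise ValueError("unsupported path: " + path)

print(mangle("slot:/Foo/SineWave1"))
print(mangle("history:/nhaystack_simple/AuditHistory"))
```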

Another major change forthcoming in the new design is that, for Components, if they are tagged up properly as part of a Site-Equip-Point hierarchy, then an encoding of their "nav path" will be used instead. This is a really important feature that will improve the ease with which third party tools can analyze nhaystack data.

So the Site-Equip-Point (aka "SEP") style ID will look something like this:

S.Carytown.RTU1.OAT

but only for Components that are tagged up properly. Note that in the example, each of the three levels in the ID contains the navName of the Component at that level.

Note that the component in that example can actually be resolved via two different IDs, the SEP ID and the Component ID (actually it's three if you include the old base64 version).

By the way, within the AX station, the linkages between refs will actually be encoded in the bog file as Component IDs, although to the outside world the SEP ID will be presented.
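
As a rough illustration of the SEP-style ID, the Python sketch below joins the three navNames with "." under an "S" prefix. The function name `sep_id` and the choice to return `None` when a level is missing are assumptions made for the example, not a description of how nhaystack decides that a Component is "tagged up properly".

```python
from typing import Optional

def sep_id(site: Optional[str], equip: Optional[str],
           point: Optional[str]) -> Optional[str]:
    """Build 'S.<site>.<equip>.<point>' from navNames (illustrative only)."""
    names = (site, equip, point)
    if not all(names):
        # Hypothetical rule: no SEP ID unless all three levels are known.
        return None
    return "S." + ".".join(names)

print(sep_id("Carytown", "RTU1", "OAT"))
```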

However, there is one more major wrinkle -- namely special characters like $20 and the like. $20 represents " ", the space character. It's common to find these special character encodings in Niagara slotPaths, for all kinds of characters, like ",", "#", "-", "/", etc.

We could still maintain the bidirectionality of the ID encoding if we just replaced the "$" with a URI-friendly character like "~". The problem with that approach is that the resulting ID is pretty ugly.

slot:/Foo/Sine$20Wave1 --> C.Foo.Sine~20Wave1
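
A small Python sketch of this fully reversible variant follows. The function names are hypothetical, and it assumes that slot names never contain a raw "." or "~", so that every "." in an ID is a path separator and every "~" is a former "$".

```python
def to_id(slot_path: str) -> str:
    """slot:/Foo/Sine$20Wave1 -> C.Foo.Sine~20Wave1 (illustrative only)."""
    if not slot_path.startswith("slot:/"):
        raise ValueError("not a slot path: " + slot_path)
    body = slot_path[len("slot:/"):]
    return "C." + body.replace("/", ".").replace("$", "~")

def to_slot_path(ident: str) -> str:
    """Exact inverse of to_id(), under the assumption stated above."""
    if not ident.startswith("C."):
        raise ValueError("not a component ID: " + ident)
    return "slot:/" + ident[2:].replace(".", "/").replace("~", "$")

print(to_id("slot:/Foo/Sine$20Wave1"))
print(to_slot_path("C.Foo.Sine~20Wave1"))
```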

So what I've chosen to do is use a dash, "-", to represent spaces, like so:

slot:/Foo/Sine$20Wave1 --> C.Foo.Sine-Wave1

This is much more readable. At this point we still have bi-directionality, but I am leaning towards using the "-" character for a couple of other characters in the interest of promoting readability of the IDs (and using the ugly "~" for the rest). So far the two candidates are "#" and "-", which encode in AX to $23 and $2d respectively.

slot:/Bar/Sine$23Wave2 --> C.Bar.Sine-Wave2
slot:/Abc/Sine$2dWave3 --> C.Abc.Sine-Wave3

We would lose bidirectional ID encoding if we did this, which would make the internal workings of nhaystack rather more complicated, but I think it's probably worth the trade-off.
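
To make the trade-off concrete, here is a Python sketch of the "readable" variant, assuming two-hex-digit "$xx" escapes and the three dash candidates named above ($20, $23, $2d). The names `DASHED` and `readable_id` are hypothetical. The last line shows two different slotPaths collapsing to the same ID, which is why the ID can no longer be decoded back to a unique slotPath.

```python
import re

# Escapes rendered as "-" in this sketch: space, "#", and "-" itself.
DASHED = {"20", "23", "2d"}

def readable_id(slot_path: str) -> str:
    """Mangle a slotPath, using '-' for DASHED escapes and '~xx' otherwise."""
    if not slot_path.startswith("slot:/"):
        raise ValueError("not a slot path: " + slot_path)
    body = slot_path[len("slot:/"):].replace("/", ".")

    def replace_escape(match: "re.Match[str]") -> str:
        code = match.group(1).lower()
        return "-" if code in DASHED else "~" + code

    return "C." + re.sub(r"\$([0-9a-fA-F]{2})", replace_escape, body)

print(readable_id("slot:/Bar/Sine$23Wave2"))
print(readable_id("slot:/Abc/Sine$2dWave3"))
# Two distinct slotPaths, one ID: the mapping is no longer reversible.
print(readable_id("slot:/Foo/A$20B") == readable_id("slot:/Foo/A$23B"))
```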

So my primary question to the community about this design is: which characters (if any) other than " " ought to be blessed as being made extra-readable in the new IDs? Which special characters do you come across most often? Also, if you have any sample config.bog files that you would like to send me that have a lot of special characters and/or unusual slotPath names, that would help me considerably.

Of course, comments or questions about the overall design itself are welcome as well.

I am most of the way done with coding this new design, so once it's firmed up, a new version of nhaystack will be posted in a few days.

Christian Tremblay Thu 5 Sep 2013

How will you deal with accented characters like é, è, à, û, î, etc.?

Mike Jarmy Thu 5 Sep 2013

They would be encoded according to their $-style hexadecimal encoding.

For instance, the identifier "extérieur" is encoded as "ext$e9rieur" in a bog file, because the Latin-1 character code in hexadecimal for "é" is E9.

That identifier would be encoded as "ext~e9rieur" in the URI-friendly haystack encoding. Since we will publish how the encoding works, an external tool that is calling into the Niagara station as a client could take the encoding and display it to the user in its correct form as "extérieur".
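
A client-side decoder for display purposes might look like the Python sketch below. It assumes each "~" is followed by exactly two hex digits giving a character code below 0x100, as in the "é" example; the function name `display_name` is hypothetical, and the published encoding may differ in its details.

```python
import re

def display_name(encoded: str) -> str:
    """Replace each '~xx' escape with the character whose code is hex xx."""
    return re.sub(
        r"~([0-9a-fA-F]{2})",
        lambda match: chr(int(match.group(1), 16)),
        encoded,
    )

print(display_name("ext~e9rieur"))
```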

Stuart Longland Thu 4 May

Hate to bump an old thread, but a silly question. How are other non-latin1 characters encoded?

If a JACE was deployed with nHaystack in China, how would the Chinese characters be represented and how do we determine this as a Project Haystack client?

Richard McElhinney Tue 16 May

Hi Stuart,

I haven't tried so that would be something that would need to be looked into.

Have you tried doing a deployment with non-latin1 characters?

Regards, Richard
