#716 An open source tool for tagging point names

Si Chen Thu 13 Jun

Hello everybody,

It seems a lot of us are looking for ways to parse and tag point names efficiently, and this is a problem we should try to work on together to get more adoption for haystack.

I'm thinking about building some tools into our open source app, opentaps SEAS (https://github.com/opentaps/opentaps_seas), for doing this:

  1. User interface to filter through the list of data points. You can choose the data points and then add tags to all of them at the same time. The filter could combine several "contain" and "not contain" conditions, so you can say "contain DAT" and "not contain SP", for example. The filter could also be a regexp, for those of you who want to use it :)
  2. Save the filter and the tags applied as a "rule", so "contain DAT" and "not contain SP" -> discharge, air, temp, sensor
  3. Save many rules together as a "rule set" - "Siemens points" or "Building X points"
  4. Run a rule set on all points. It will show all the points and what tags would be applied, and you can choose the ones you want to apply.
  5. Run a report of all your data points to see what tags have been applied, and also which points have no tags applied.
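
A minimal sketch of how the rule, rule set, preview, and report steps above could fit together. It is not code from opentaps SEAS: the `Rule` class, the `preview` and `untagged` helpers, and the point names are all hypothetical and only illustrate the idea.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A saved filter plus the tags it applies (hypothetical structure)."""
    contains: list = field(default_factory=list)
    not_contains: list = field(default_factory=list)
    regex: str = ""  # optional regular expression; empty means unused
    tags: tuple = ()

    def matches(self, point_name):
        # Every "contain" string must be present.
        if any(s not in point_name for s in self.contains):
            return False
        # No "not contain" string may be present.
        if any(s in point_name for s in self.not_contains):
            return False
        # The regexp, if given, must match somewhere in the name.
        if self.regex and not re.search(self.regex, point_name):
            return False
        return True


def preview(rule_set, point_names):
    """Dry run: map each point name to the tags the rule set would apply."""
    proposed = {}
    for name in point_names:
        tags = set()
        for rule in rule_set:
            if rule.matches(name):
                tags.update(rule.tags)
        proposed[name] = sorted(tags)
    return proposed


def untagged(proposed):
    """Report helper: point names that no rule matched."""
    return [name for name, tags in proposed.items() if not tags]


# Made-up point names, for illustration only.
points = ["AHU1.DAT", "AHU1.DAT_SP", "AHU1.ZN_T"]

# A "rule set" is simply a list of rules saved together.
siemens_points = [
    Rule(contains=["DAT"], not_contains=["SP"],
         tags=("discharge", "air", "temp", "sensor")),
]

proposed = preview(siemens_points, points)
print(proposed)
print(untagged(proposed))
```

Because `preview` only returns proposals, nothing is written until the user picks which ones to apply, which matches step 4.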

Please let me know what you think of this. Do you know of a better way to do this?

Joel Bender Thu 13 Jun

Sounds like you are talking about classification trees, a kind of Decision Tree. If you can find some syntax patterns in your names that you can rip apart into tokens, feed the set of tokens along with a classification label (the "leaf nodes" of your tree) into a machine learning application as training data. Rinse and repeat.
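
A rough sketch of the "rip apart into tokens" step, assuming names are split on punctuation, letter/digit boundaries, and camelCase. The `tokenize` and `to_features` helpers, the point names, and the labels are made up for illustration. The output is a set of feature rows and labels that a decision tree learner could take as training data; the training step itself is left out.

```python
import re


def tokenize(point_name):
    """Split a point name on separators, letter/digit boundaries and camelCase."""
    tokens = []
    for part in re.split(r"[^A-Za-z0-9]+", point_name):
        # Uppercase runs (acronyms), capitalized or lowercase words, digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in tokens]


# Made-up labeled examples: (point name, classification label).
labeled = [
    ("AHU1.DAT", "discharge air temp sensor"),
    ("AHU1.DAT_SP", "discharge air temp sp"),
    ("RTU2.ZoneTemp", "zone air temp sensor"),
]

# Vocabulary of every token seen in the training names.
vocabulary = sorted({tok for name, _ in labeled for tok in tokenize(name)})


def to_features(point_name):
    """Binary vector: 1 if the vocabulary token occurs in the name, else 0."""
    present = set(tokenize(point_name))
    return [1 if tok in present else 0 for tok in vocabulary]


X = [to_features(name) for name, _ in labeled]  # feature rows
y = [label for _, label in labeled]             # leaf labels

print(vocabulary)
print(X)
```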

For generating Brick content, which can then be turned into collections of tags, you might want to start with plastering.

Si Chen Fri 14 Jun

That is very interesting. Thanks for the suggestion.

Have you tried this approach?

What kind of classification accuracy can you get with it?

How well does it work across different vendors’ or building owners’ naming schemes?

Gabe Fierro Fri 14 Jun

I would recommend checking out some of the academic work in this space, which presents some real-world results at applying these techniques to different naming schemes and vendors:

It's hard to reason generally about the kind of accuracy you can get because this depends so much on the particular idiosyncratic and inconsistent naming scheme used in a building. These papers present a semi-supervised approach where a set of rules for normalizing data is learned from asking an "expert". The key then becomes how to be sample-efficient with that expert: how do you choose a minimal set of examples that lets you parse the most point names you have available?
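
To make the sample-efficiency question concrete, here is one naive heuristic, which is not the method from the papers: collapse digit runs so that names differing only in numbering share a "shape", then show the expert one example from each of the most common shapes. The `shape` and `pick_examples` helpers and the point names are assumptions made up for this sketch.

```python
import re
from collections import Counter


def shape(point_name):
    """Collapse digit runs to '#' so 'AHU1.DAT' and 'AHU2.DAT' share a shape."""
    return re.sub(r"\d+", "#", point_name)


def pick_examples(point_names, budget):
    """Pick one representative name for each of the `budget` most common
    shapes, and report the fraction of all names those shapes cover."""
    if not point_names:
        return [], 0.0
    counts = Counter(shape(n) for n in point_names)
    representative = {}
    for n in point_names:
        # Keep the first name seen for each shape.
        representative.setdefault(shape(n), n)
    top = counts.most_common(budget)
    covered = sum(count for _, count in top)
    return [representative[s] for s, _ in top], covered / len(point_names)


# Made-up point names, for illustration only.
names = [
    "AHU1.DAT", "AHU2.DAT", "AHU3.DAT",
    "AHU1.DAT_SP", "AHU2.DAT_SP",
    "RTU1.ZoneTemp",
]

examples, coverage = pick_examples(names, budget=2)
print(examples, coverage)
```

With a budget of two questions, the expert would label two names whose shapes account for five of the six points in this toy list. Real naming schemes are far less regular, so this only illustrates the trade-off.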

I've started work on translating between Brick classes and tags (not Haystack tags yet, but they will be) here. This could be applied to the output from the Plaster library Joel linked above.

Si Chen Mon 17 Jun

Thanks for the links, Gabe. I've read the Plaster article, but not the first paper you cited, so it was pretty interesting to see their approach as well.

After some more thinking, it seems the real issue is what users feel comfortable with, especially with any kind of AI-enabled technology. For example, maybe autonomous vehicles are safer, but most people prefer driving. Or a closer example: there's a tool in Quickbooks for classifying bank and credit card expenses. It will let you create rules (if "Citi Autopay" then vendor = "CitiBank" and account="..."), and it will also infer them for you. When I use this tool, if it's not classifying because I haven't created a rule, I'll go and create a rule. But if it mis-classifies some entries, I'll think "This program is broken again."

Carlos Rivera Tue 18 Jun

Hi Si Chen, that looks like a great idea. At Sistrol, where I work, I have developed a semi-automatic classifier, but it is kind of messy and of course it is not 100% reliable. I take the URI of a point and split it into tokens. Then, using a set of dictionaries, I try to obtain the classes to which the point belongs. The result is that each point is classified into a vector of the form: (Installation, Site, Equip, SecEquip, EquipName, SecEquipName, Point, PointName)

After that we build a haystack model, again extracting tags from dictionaries.
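
A simplified sketch of the dictionary approach described above, not the Sistrol implementation. The dictionaries, the `split_uri` and `classify` helpers, and the example URIs are hypothetical, and only a few of the vector's fields (Site, Equip, SecEquip, Point) are handled. Tokens missing from the dictionaries are returned separately, since those are the ones a user would have to fix by hand.

```python
import re

# Hypothetical dictionaries mapping a token to the class it indicates
# and, for point tokens, to the tags it implies.
CLASS_DICTIONARY = {
    "site": "Site",
    "ahu": "Equip",
    "vav": "SecEquip",
    "dat": "Point",
    "zt": "Point",
}
TAG_DICTIONARY = {
    "dat": ["discharge", "air", "temp", "sensor"],
    "zt": ["zone", "air", "temp", "sensor"],
}


def split_uri(uri):
    """Split on '/', '.', '_' and '-', then separate letters from digits."""
    tokens = []
    for part in re.split(r"[/._\-]+", uri):
        tokens.extend(re.findall(r"[A-Za-z]+|\d+", part))
    return [t.lower() for t in tokens]


def classify(uri):
    """Return (classes, tags, unknown tokens) for one point URI."""
    classes, unknown = {}, []
    for token in split_uri(uri):
        cls = CLASS_DICTIONARY.get(token)
        if cls is not None:
            # Keep the first token found for each class.
            classes.setdefault(cls, token)
        elif not token.isdigit():
            # Numbers are instance ids; anything else needs a dictionary entry.
            unknown.append(token)
    tags = TAG_DICTIONARY.get(classes.get("Point"), [])
    return classes, tags, unknown


# Made-up URIs, for illustration only.
print(classify("site1/AHU2/VAV3/DAT"))
print(classify("plant/AHU1/XYZ"))
```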

It is not perfect, and it requires a lot of user input, fixing the dictionaries and fixing the classification (it would be great to use machine learning to automatically update the dictionaries, but we are far away from that).

Also, the user interface is pretty bad; just imagine that the dictionaries are written using Microsoft Excel!! That is changing, though, as we are slowly migrating everything into MongoDB (the application is web-based, written in Python with Flask).

I like your idea because you are thinking right away about a user interface; I think that's a big part of the problem.

What I do is in-house development. It would be great if I could open source it, but it is not up to me. I will ask though, or maybe they will let me collaborate with opentaps SEAS (I still need to take a good look at it).

Regards.

Gabe Fierro Tue 18 Jun

I would agree with you that a key part of an effective machine learning approach will be making the results explainable. The tool should inform the user why it chose the tagging/classification it did so that the user can provide feedback. Hopefully this feedback can be used to improve the tool in an automated way.

Setting aside the issue of how to determine which Haystack tags are "sufficient" for a consistent interpretation of some concept, the primary issue in developing these automated techniques is a lack of source (training) data (with or without accompanying ground truth). The point-parsing approaches developed in the literature rely upon a semi-supervised approach in part because there is no authoritative dataset of building metadata that could be used to train a more general model.

Would folks/companies be willing to contribute dumps of point/equip names from their installations in order to produce such a dataset?

Si Chen Thu 20 Jun

Thanks for your feedback! We're going to be working on this and will let you know when we have something to show you :)

Si Chen Mon 15 Jul

Hey everybody,

Here's a video of our new feature to define rules for haystack tags: https://www.youtube.com/watch?v=pTu_ITKoGHc&feature=youtu.be

Let me know what you think!

Terry Herr Sat 20 Jul

Great topic and thread.

I consider this problem [the time/cost involved with tagging and its inconsistency] to be the second most important problem to solve to achieve mass adoption of analytics in commercial buildings. The first is just having one standard.

We have manually tagged a lot of buildings and would be happy to contribute data sets to anyone who can help contribute to an open source solution.
