#90 i18n

Matthew Lohbihler Mon 3 Jun 2013

Has anyone done any work in internationalizing Haystack?

Christian Tremblay Wed 26 Jun 2013

Don't think so. I would gladly help for french.

Brian Frank Wed 26 Jun 2013

We have SkySpark customers who have translated many of the tags, but we will need to check how willing they would be to donate their work.

Matthew Lohbihler Fri 28 Jun 2013

Actually, I'm unsure how it would be i18n'ed anyway. We would need alternate-language equivalents of all tags and enum values (plus "and", "or", "not", "true", "false", etc.). Queries would need some manner of declaring the language they are in so that they could be translated to the language of the target tag database. We might even need to standardize number and date formats per locale.

Custom tags and values would break the whole thing, but if we leave them out or declare them to be un-translated, the rest is at least feasible.

Christian Tremblay Fri 28 Jun 2013

I would not translate queries or code related terms... It's common to program using English.

From my point of view, translation is a GUI matter. We shouldn't fall into the same error as Microsoft Excel, which causes a lot of problems when creating macros for both the English and French versions. Function names are translated, so in French you use:

=SOMME(A1:A2) instead of =SUM(A1:A2)

If possible, I would vote for a kind of "display name" which changes depending on the selected language... but under the hood, English tags are used.

Christian Tremblay Fri 28 Jun 2013

And one last detail... a problem that could complicate things if we translate requests: special characters.

  • température
  • dérogation
  • è, à, ç, û, î ... etc...

REST API requests will be polluted by a lot of %20-style percent-encoded replacement characters...
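To make the encoding concern concrete, here is a minimal Python sketch. It assumes requests use the usual UTF-8 percent-encoding for URLs; the helper name `encode_tag` and the translated tag names are hypothetical and only for illustration.

```python
from urllib.parse import quote, unquote

def encode_tag(tag: str) -> str:
    """Percent-encode a tag name as it would appear in a URL (UTF-8)."""
    return quote(tag)

# Hypothetical translated tag names containing accented characters.
french_tags = ["température", "dérogation"]

for tag in french_tags:
    # Each accented letter becomes two %XX escapes, one per UTF-8 byte.
    print(tag, "->", encode_tag(tag))

# An ASCII English tag passes through unchanged.
print("temp", "->", encode_tag("temp"))

# Decoding recovers the original text: nothing is lost except readability.
print(unquote(encode_tag("température")))
```

Nothing breaks technically, but the URLs become harder to read and to type by hand, which is the point being made above.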

Matthew Lohbihler Fri 28 Jun 2013

Christian, good points. A lingua franca works for me, as long as it's English. :)

Brian, to what extent did your customers translate tags? As I had described? Given Christian's comments, why did they feel the need to do so?

Brian Frank Fri 28 Jun 2013

I think the programmatic name of tags should always be ASCII and English - as Christian said, it's the lingua franca of the programming world.

However, it's nice to have a mapping of tag names to their translated text in different languages. For example, in our UI the tag names become the column names in most tabular views. We can then map those tags to their localized strings for end users.
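A minimal sketch of such a mapping in Python, assuming a simple in-memory table. The table contents, the locale codes, and the function names are hypothetical; this is not how SkySpark implements it. Tags without a translation (custom tags, for instance) fall back to their programmatic name.

```python
# Hypothetical display-name table: ASCII English tag -> locale -> label.
DISPLAY_NAMES = {
    "temp": {"en": "Temperature", "fr": "Température"},
    "sensor": {"en": "Sensor", "fr": "Sonde"},
    "discharge": {"en": "Discharge", "fr": "Alimentation"},
}

def display_name(tag: str, locale: str) -> str:
    """Return the localized label, or the tag itself if none is known."""
    return DISPLAY_NAMES.get(tag, {}).get(locale, tag)

def column_headers(tags, locale):
    """Build localized column headers for a tabular view."""
    return [display_name(tag, locale) for tag in tags]

# The data and queries keep using English tags; only the headers change.
print(column_headers(["sensor", "temp", "myCustomTag"], "fr"))
print(column_headers(["sensor", "temp", "myCustomTag"], "en"))
```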

Matthew Lohbihler Mon 1 Jul 2013

I hear what you guys are saying, but I don't think this issue will go away so easily. Haystack is a tag library and query language, not a programming language. It differs significantly in terms of audience and translatable scale.

Most programming languages have fewer than 50 reserved keywords. If you've ever seen code written by non-English speakers, you could not claim that English is the lingua franca of programming; only the language of the keywords. Haystack has around 130 tag names and some extra number of enum values, and with the current rapid advancement and others that are sure to follow, this number could easily climb into the several hundreds. People will not be as willing to learn all of these terms when there are so many of them. And note that many of them are not even words, but acronyms and abbreviations (if that).

And the people who need to learn them are also greater in number. It's not just the relatively limited number of programmers that will use haystack; its audience will be much wider.

I expect that non-English speakers will eventually develop tag sets in their own language, and that sooner or later a reconciliation will need to be done.

Christian Tremblay Tue 2 Jul 2013

There won't be an easy answer. One big challenge will be to find a way around language "order" for terms... let me explain by an example.

In English, tags are ordered in a logical way... that fits with the language.

Outdoor air temperature sensor is pretty cool: you can even say it, and if you print that on a GUI, it won't appear like a bunch of words put together.

In French, it's another thing.

extérieur air température sonde is incorrect; it feels like a robot trying to speak French :-). It lacks order and prepositions, which could lead to an error for the word extérieur depending on whether we link the adjective to température or air.

Sonde de température d'air extérieur, and more commonly

Sonde de température extérieure

(see the last e in extérieure: it's needed because in French température is feminine... but air is masculine, so no e is needed when it is used with air.)

What I'm trying to illustrate: it's not only a matter of translating words. It's translating sentences, and that is really hard for software to do.

I think that if there's a kind of lexicon, it should be filled based on complete tag combinations, not only single words, if the goal is to use them in a GUI.

We could also make the choice to translate only words. This would lead to terms that cannot be directly used in a GUI, but search would be possible (rechercher : sonde, température, air, alimentation)
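The two options can be sketched side by side in Python. Everything here is an assumption made for illustration: the lexicon contents, the use of a tag set as the lookup key, and the function names.

```python
# Hypothetical phrase-level lexicon, keyed by the complete set of English tags.
PHRASES_FR = {
    frozenset({"outside", "air", "temp", "sensor"}):
        "Sonde de température d'air extérieur",
}

# Hypothetical word-level lexicon: usable for search, not for display.
WORDS_FR = {
    "outside": "extérieur",
    "air": "air",
    "temp": "température",
    "sensor": "sonde",
}

def label_fr(tags):
    """Prefer a hand-written phrase; otherwise list the translated words."""
    phrase = PHRASES_FR.get(frozenset(tags))
    if phrase is not None:
        return phrase
    return ", ".join(WORDS_FR.get(tag, tag) for tag in tags)

def search_fr(word, entities):
    """Find entities (sets of English tags) matching a French search word."""
    reverse = {fr: en for en, fr in WORDS_FR.items()}
    tag = reverse.get(word, word)
    return [entity for entity in entities if tag in entity]

print(label_fr(["outside", "air", "temp", "sensor"]))
print(label_fr(["air", "temp"]))
print(search_fr("sonde", [{"sensor", "temp"}, {"equip"}]))
```

The phrase table gives GUI-quality text but has to be filled in by hand for every combination; the word table scales, but only produces a list of terms.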

I keep thinking that under the hood, programming should be done in English.

Deborah MacPherson Wed 3 Jul 2013

Hi Christian -

I understand what you are saying about French syntax due to working on architectural specifications for a large hospital in Montréal. All of the spec sections are translated through a translation service and QAQC process we have devised for this project. What is making this all work however are "keynotes" - pushed from a .txt file I manage - into +/- 30 BIM models with a huge number of users in multiple offices. We don't want anyone physically typing any French, especially Québécois, where the accent marks are very important.

Each keynote presents on the drawings as the spec section number with the French term followed by English in parenthesis. For example 042000 - BLOC DE BÉTON (CONCRETE UNIT MASONRY). In the .txt file there are also some codes used by Revit that tie each note to objects and parts of objects. This is great because placeholders can be used such as 074213 - FRENCH TEXT HERE FRENCH TEXT HERE (PROVIDE WEEP TUBES AT ALL SOFFIT JOINTS). When translated, the French can be inserted to automatically update without users needing to retag. The French is always longer than English. Since translations are automatically pushed into detail components and drawing sheets, a general idea of the length is needed so the final versions do not obscure important drawing information.

This same system is also being used for annotations - phrases - like DIFFUSEUR LINÉAIRE – VOIR MÉCANIQUE (LINEAR DIFFUSER – SEE MECHANICAL). You are completely right that many translations need discussion amongst the team to decide issues such as ....linking the adjective to température or air.

In addition to the .txt file which includes all keynotes and annotations currently in use, a Lexicon in Excel is also needed to keep track of what is to come, what has been deleted, to sort and check the same terms are used whether part of a roofing system, waterproofing, or other so the quality will be the same in all applications. For example BARRE D'EXTRÉMITÉ (TERMINATION BAR) means stainless steel of a specified size and material everywhere, not aluminum in some places.

Many sources are available for translating word by word. Termium especially is great http://www.btb.termiumplus.gc.ca/ with different definitions for different domains. However there are no effective resources for phrases and syntax.

If Project Haystack's tag model were to expand to include phrases, situations like linking the adjective to température or air would need to be part of the GUI, since either could be needed and choosing one versus the other might make a huge difference.

Matthew Lohbihler Wed 3 Jul 2013

Christian,

Also note that I was not suggesting that Haystack should support natural language queries. Indeed, queries in English are not English: "point and (air or outside) and not occupied". Like SQL, queries can make grammatical sense when they are short, but quickly become robotic-sounding with complexity. And note that query terms are commutative, so the order in which terms are provided doesn't matter.

As long as there is a one-to-one mapping of tag names et al between supported languages, query translation is trivial.
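A minimal Python sketch of that idea, assuming a strictly one-to-one lexicon covering both operators and tags. The French words and the function names are hypothetical, and the sketch ignores quoted string values, which a real translator would have to leave untouched.

```python
import re
import unicodedata

# Hypothetical one-to-one lexicon: English term -> French term.
EN_TO_FR = {
    "and": "et", "or": "ou", "not": "non",
    "point": "point", "air": "air",
    "outside": "extérieur", "occupied": "occupé",
}

def invert(mapping):
    """Invert a lexicon, refusing any mapping that is not one-to-one."""
    inverse = {value: key for key, value in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("lexicon is not one-to-one")
    return inverse

FR_TO_EN = invert(EN_TO_FR)

def translate(query, lexicon):
    """Replace each word token via the lexicon; unknown tokens pass through."""
    # Normalize so accented letters are single code points matched by \w.
    query = unicodedata.normalize("NFC", query)
    return re.sub(r"\w+", lambda m: lexicon.get(m.group(0), m.group(0)), query)

french = "point et (air ou extérieur) et non occupé"
english = translate(french, FR_TO_EN)
print(english)
print(translate(english, EN_TO_FR))
```

Because the mapping is one-to-one, translating there and back returns the original query; custom tags simply pass through untranslated.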

Christian Tremblay Wed 3 Jul 2013

You are right about natural language vs queries. I was referring more to the fact that English tags, when viewed (if used directly in a GUI), read almost perfectly as language...

Discharge Air Temperature Sensor makes sense if I'm talking to someone.

This said, I would be perfectly comfortable with a query like

SELECT air, alimentation, température FROM db WHERE type = sonde

Maybe a better idea would be something handled by a user interface, so the user would select tags in their own language. The query would be built with the English equivalents, using an AS parameter with the locale...

Example : From an interface I'm looking for sonde, température....

The query built would be:

SELECT sensor AS sonde, temperature AS température from db etc...

The result can then be shown in the local language.

This is easy to implement, as we can just choose to work with "masculine" words, using a kind of lexicon table.
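A minimal sketch of that query builder in Python, keeping the SQL-like form used in the example above rather than real Haystack filter syntax. The lexicon table, the table name `db`, and the function names are all hypothetical.

```python
# Hypothetical lexicon table: local (French) term -> English tag.
LEXICON_FR = {
    "sonde": "sensor",
    "température": "temperature",
}

def build_query(local_terms, table="db"):
    """Build an English query whose columns are aliased to the local terms."""
    columns = []
    for term in local_terms:
        english = LEXICON_FR[term]  # raises KeyError for an unknown term
        columns.append(f"{english} AS {term}")
    return "SELECT " + ", ".join(columns) + " FROM " + table

def localize_row(row, local_terms):
    """Relabel a result row keyed by English tags with the local terms."""
    return {term: row[LEXICON_FR[term]] for term in local_terms}

selected = ["sonde", "température"]
print(build_query(selected))
print(localize_row({"sensor": "S-1", "temperature": 21.5}, selected))
```

The user only ever sees the local terms, while the stored tags and the query itself stay in English.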
