Working Group

#1070 Working Group for Machine Learning

Jan Široký Thu 8 Jun 2023

Hi, my colleagues and I have been developing various machine learning applications in Haystack-compatible environments (see https://energytwin.io/). We have consistently faced challenges in defining machine learning model tags. Questions have arisen, such as how to link a model with a specific point, whether it is possible to define multiple models for one point, and how to reference model-independent variables, among others.

I would like to propose the creation of a working group to help define tags related to machine learning. We will be happy to share our experiences and ideas with the community and collaborate in the process of defining these machine learning tags.

Who in the community would be interested in joining this working group?

Rick Jennings Thu 8 Jun 2023

Hi Jan,

This sounds like a great initiative and I would be interested in joining this working group.

Thanks for setting this up!

Rick

Georgios Grigoriou Fri 9 Jun 2023

Hi Jan,

Happy to join the cause and contribute!

Georgios

Stephen Frank Fri 9 Jun 2023

I am also interested. We have some custom tagging we developed at NREL for this purpose that we can share.

annie dehghani Fri 9 Jun 2023

I would be happy to contribute what we have developed for this purpose as well. Like Stephen, we have developed some custom tagging. It would be great to define this in a more formal and uniform way.

Also, there was an ask at Haystack Connect to help model simulation data, and it occurs to me that there may be a lot of crossover with this group. Maybe this group is open to modelling "simulation" results more broadly, whether it's physics-based simulation, machine learning, simple regression, etc.?

Adam J Wallen Mon 12 Jun 2023

Hi ML Team,

Craig Stevenson and I presented at Haystack Connect and are interested in the possible synergy between physics-based simulations tagging and machine learning tagging. Thanks for the shout-out, Annie!

Thanks, Adam

Jan Široký Fri 16 Jun 2023

Hi Annie,

This is a good point. We can consider referencing simulation data as well. This referencing principle can be applied to both physics-based models and machine learning models.

Thanks, Jan

Jan Široký Tue 20 Jun 2023

Hi all,

I have sent you the invitation for the kickoff meeting scheduled on August 10th at 11 am ET.

Please feel free to share your ideas or requirements with me prior to the kickoff. I will make an effort to incorporate them into the first draft.

Thanks, Jan

Keith Bishop Tue 20 Jun 2023

Looking forward to working with this group.

Jan Široký Fri 10 Nov 2023

It took us some time to process all the inputs (special thanks to Keith B. and Stephen F.). However, we have successfully formulated the initial proposal.

As deliberated during the ML WG calls, our preference is to commence with a less exhaustive definition, allowing flexibility for the end user (e.g., we refrain from specifying how the identified ML parameters should be stored).

The definition provided below is a preliminary draft; we invite your comments on any aspect, with a particular focus on naming conventions, any missing elements, and potential use cases that may not be addressed.

def:^mlModel
is:^entity
mandatory
doc:
  Machine learning model entity representing an overarching container for 
  various components, including inputs, outputs, parameters, and metrics.
---
def:^mlInputVarRefs
is:^list
of:^mlVarRef
tagOn:^mlModel
doc:
  List of independent variables, also known as model inputs or features,
  associated with a machine learning model.
---
def:^mlOutputVarRef
is:^ref
of:^mlVar
tagOn:^mlModel
doc:
  Dependent variable, also known as the model output or target,
  associated with a machine learning model.
  Represents the predicted outcome generated by the model.
---
def:^mlIdentificationPeriod
is:^span
tagOn:^mlModel
doc:
  Time span of the data used to train the model, also known as
  the identification period or baseline.
---
def:^mlModelParameters
is:^dict
ro
tagOn:^mlModel
doc:
  Result of model identification, which may appear as a list of
  model parameters for simpler models or as a reference to a stored model,
  in the form of a file uri. The structure of the dict is user-specific.
---
def:^mlModelMetrics
is:^dict
ro
tagOn:^mlModel
doc:
  Goodness-of-fit metrics provided in the form of a simple dictionary.
  For example: {r2:0.7889, cvrmse:58}.
---
def:^mlVar
is:^entity
mandatory
doc:
  Machine learning variable representing both model inputs and outputs.
---
def:^mlVarPoint
is:^ref
of:^point
tagOn:^mlVar
doc:
  Reference to a point associated with a machine learning variable,
  known as a machine learning variable point.
---
def:^mlVarFilter
is:^filterStr
tagOn:^mlVar
doc:
  Filter used for querying points by tags, providing more flexibility
  than mlVarPoint, although it is not mandatory.
---
def:^mlVarRef
is:^ref
of:^mlVar
doc: Reference to a machine learning variable.
---
def:^mlModelRef
is:^ref
of:^mlModel
tagOn:^mlPrediction
doc:
  Applied to a prediction point, referencing the specific
  machine learning model used for generating predictions.
---
def:^mlPrediction
is:^pointFunction
doc: Point is a prediction or forecast of another point. 
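
To make the draft easier to comment on, here is a minimal sketch of how records using these tags might look in Trio. All ids, dis strings, and the `mlVarFilter` value are hypothetical and chosen only for illustration; the metrics dict is copied from the example in the `mlModelMetrics` doc string. `mlIdentificationPeriod` and `mlModelParameters` are left out because the draft does not fix their encoding.

```trio
// Hypothetical example records, not part of the proposal itself.
// Input variable linked to one specific point
id: @var-oat
dis: "Outside air temperature (input)"
mlVar
mlVarPoint: @p-oat
---
// Input variable resolved by a filter instead of a fixed point
id: @var-occ
dis: "Occupancy (input)"
mlVar
mlVarFilter: "occupancy and sensor and point"
---
// Output variable: the point the model is trained to predict
id: @var-elec
dis: "Site electricity (output)"
mlVar
mlVarPoint: @p-elec
---
// The model ties inputs, output, and metrics together
id: @model-a
dis: "Baseline model A"
mlModel
mlInputVarRefs: [@var-oat, @var-occ]
mlOutputVarRef: @var-elec
mlModelMetrics: {r2:0.7889, cvrmse:58}
---
// Prediction point that references the model which generates it
id: @p-elec-pred
dis: "Site electricity prediction"
point
mlPrediction
mlModelRef: @model-a
```

Under this reading of the draft, a second `mlModel` record could use the same `mlOutputVarRef` (here `@var-elec`), which would be one way to attach multiple models to a single point; each model would then get its own `mlPrediction` point carrying its own `mlModelRef`.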
