Data and information models – Rationale

The working group's data model work aims to show how the different data models and initiatives on the market relate to each other. The work is by no means exhaustive; it is meant to give a procurer a better idea of what to expect from different models. The working group aims to:

  • Create the conditions for reuse of data in business systems

    • Make it easier to open up silo systems and make data available.

    • Facilitate horizontal integration of data

  • Create conditions for data-driven innovation

  • Make it easier for municipalities and regions to expose IoT systems and system components to competition.

NOTE: In this document, the term information model covers information models, data models and ontologies. This is not entirely optimal, but it simplifies the text, and the material from the respective publishers does not always make clear which of the three a given model is.

The working group has looked at different data models in order to give a procurer a better idea of the way forward. However, the market is fragmented and changes quickly, so the working group has not been able to provide specific guidance beyond the recommendations in the IoT-2 and IoT-3 principles. Data models are continuously being developed in different parts of the world, and procurers therefore need to monitor developments continuously.

For the sake of analysis, the group has divided the information sets into three parts throughout this section:

  • Sensor domain – the part closest to the sensor, handling sensor-level information

  • IoT domain – the part focused on sharing information between different applications

  • Application domain – where the needs of the application drive the design.

About IoT information and data models

The different models that the working group looked at can be classified as information models, data models or ontologies, or some mixture of these. Each is intended to serve a specific purpose, and it is this purpose that creates the benefit for the implementer. At the same time, it poses a problem, because some of the models are quite rigidly controlled in their content while others are more rigidly controlled in their structure.

The working group looked at LightWeightM2M, OneM2M, OGC/W3C SSN/SOSA and FIWARE.

The above figure is intended to visualise how the different standards relate to the above-mentioned information domains. The working group would like to stress that this is not an exact science, but a way to try to clarify what the landscape around information models looks like.

Synchronicity

Synchronicity has developed a recommendation on which data models should be used for IoT solutions.

LightWeightM2M

LightWeightM2M has developed a number of standards for data models.

Open Geospatial Consortium and W3C

Open Geospatial Consortium and W3C have jointly developed proposals for ways to access information from physical IoT devices.

SAREF

SAREF, published by ETSI (the telecom standards body), is a reference ontology for the IoT area. The SAREF ontology consists of a so-called core and a set of extensions. Extensions are available for several domains, e.g. Energy, Environment, Building, eHealth/Aging well and Water. The OASC MIMs recommend the use of SAREF.

OneM2M

OneM2M (a body for open industry standards in IoT) has a stated strategy of enabling data sharing across silos. The ontology is based on the idea of a “Common Service Layer”. OneM2M includes a REST API through which data is translated between different data models.

FIWARE

FIWARE is a framework for achieving interoperability, with a number of data models which they call Smart Data Models. In conjunction with the data models, a context broker is implemented based on the NGSI-LD standard from ETSI. These data models are recommended by OASC (Open Agile Smart Cities), and NGSI is recommended by CEF (Connecting Europe Facility).
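As an illustration of what such a data model can look like in practice, the sketch below builds an NGSI-LD style entity for a swimming spot as a Python dictionary. The `id` and `type` members, the `Property` and `GeoProperty` attribute types and the `observedAt` timestamp follow the general NGSI-LD entity shape. The entity type `SwimmingSpot`, the identifier scheme, the attribute name `waterTemperature` and the coordinates are assumptions made for illustration and are not taken from any published Smart Data Model.

```python
import json

# Core JSON-LD context for NGSI-LD (assumed here; a real deployment would
# normally also reference the context of the chosen data model).
NGSI_LD_CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"


def build_swimming_spot_entity(spot_id, water_temp_c, observed_at, lon, lat):
    """Return an NGSI-LD style entity as a plain dict (illustrative only)."""
    return {
        "id": f"urn:ngsi-ld:SwimmingSpot:{spot_id}",  # hypothetical identifier scheme
        "type": "SwimmingSpot",                       # hypothetical entity type
        "waterTemperature": {                         # hypothetical attribute name
            "type": "Property",
            "value": water_temp_c,
            "unitCode": "CEL",                        # UN/CEFACT code for degrees Celsius
            "observedAt": observed_at,
        },
        "location": {
            "type": "GeoProperty",
            # GeoJSON coordinate order is longitude, latitude
            "value": {"type": "Point", "coordinates": [lon, lat]},
        },
        "@context": [NGSI_LD_CORE_CONTEXT],
    }


# Example values are invented for the illustration.
entity = build_swimming_spot_entity(
    "alby-lake", 18.5, "2024-07-01T12:00:00Z", 17.85, 59.24
)
print(json.dumps(entity, indent=2))
```

A context broker would receive an entity like this as a JSON payload; the point of the shared structure is that every application reading it can find the value, unit and observation time in the same places.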

Data model problems are nothing new in IoT: they are a classic integration problem between industries and sectors that already have established data models. In IoT, however, the problem becomes particularly clear because IoT data is meant to be shared and used in a number of business systems. There are several solution patterns for this, and the choice of data model therefore often comes down to a sensor perspective versus an operations perspective.

What the working group has seen is that the problems with data models can be broken down into four different elements that need to be considered:

  • Problem one – the object definition

    • What is the object? A swimming spot is different things to different actors/administrative departments (the Leisure and Recreation department needs to keep track of everything there, Planning just wants to be able to plan for the swimming spot, and Communication wants to publish the water temperature). In our example, we do not know whether all three departments can agree that it is actually a swimming spot. Someone might consider it an outdoor pool, a beach, etc.

      • In the IoT perspective, it is about defining the different objects being managed.

      • The physical reality (defined as FeatureOfInterest in several ontologies) – e.g. the water in Alby Lake, whose temperature is being observed.

      • The virtual object describing the FeatureOfInterest – e.g. the field for temperature value, the coordinates of the lake

    • You could say that problem one concerns which terminology (concept) model to use.

  • Problem two – the object description

    • How are the defined objects described? Is there a list/lists to base this on?

    • In the IoT perspective, it is about the information models and classifications available to describe the FeatureOfInterest, Observation, Sensor and ObservableProperty (see definitions here https://www.w3.org/TR/vocab-ssn)

  • Problem three – the degree of generalisation or LevelOfDetail (LOD)

    • This is actually a sub-problem of problems one and two. For example, is a swimming spot described with a detailed polygon or with a single point? Is the object a whole building complex or a single wall in a room?

    • This problem is strongly related to digital twins.

  • Problem four – How to model the digital model (which should represent reality to some extent)

    • There is a plethora of models here, e.g. SAREF, FIWARE, OneM2M, etc.
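To make the terms in problems one and two concrete, the following sketch expresses the swimming spot example using the four SSN/SOSA concepts named above. It is a plain Python illustration, not an RDF encoding of the ontology. The class names mirror the SOSA terms and the field names follow them loosely; the identifiers and values are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FeatureOfInterest:
    """The real-world thing being observed, e.g. a lake."""
    identifier: str
    label: str


@dataclass(frozen=True)
class ObservableProperty:
    """A quality of the feature that can be observed, e.g. water temperature."""
    identifier: str
    unit: str


@dataclass(frozen=True)
class Sensor:
    """The device that produces the observation."""
    identifier: str


@dataclass(frozen=True)
class Observation:
    """One act of observing a property of a feature with a sensor."""
    feature_of_interest: FeatureOfInterest
    observed_property: ObservableProperty
    made_by_sensor: Sensor
    result: float
    result_time: datetime


# Invented example data for the swimming spot case.
lake = FeatureOfInterest("foi/alby-lake", "Alby Lake")
water_temperature = ObservableProperty("prop/water-temperature", "degree Celsius")
thermometer = Sensor("sensor/temp-01")

observation = Observation(
    feature_of_interest=lake,
    observed_property=water_temperature,
    made_by_sensor=thermometer,
    result=18.5,
    result_time=datetime(2024, 7, 1, 12, 0, tzinfo=timezone.utc),
)
print(observation)
```

Separating the four concepts lets the three departments in the example disagree about what the place is called while still sharing the same observation of the same property.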

Data and information management in IoT systems – Rationale

The working group sees the following factors as important for municipalities and regions to consider when managing data and information from IoT systems:

  • Cost of maintenance of data and information from IoT systems

    • The working group sees that the cost of storing data and information is negligible in most cases. Maintaining the collected data, however, is a more noticeable cost. Maintenance requires municipalities and regions to describe the quality of information, appoint information owners, handle information maintenance, handle information in accordance with current legislation (e.g. archiving, data purging, GDPR and other privacy legislation) and manage information (storage, backup, data sharing, etc.).

  • Information ownership for data and information from IoT systems

    • Within each municipality and region, there needs to be one or more people who own the data and information collected in IoT systems. Without active information ownership, there is a high risk that too much or too little data and information is stored, which can have major consequences for the activities that use the information from the IoT system. The information owner is responsible for ensuring that the information coming in from IoT systems is valuable to the applications that use the information.

    • The information owner needs to ensure that their data sets comply with all relevant legislation related to data and information. In practice, this means that there should be no data without an information owner.
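The rule that there should be no data without an information owner can be checked mechanically if the organisation keeps a register of its IoT data sets. The sketch below assumes that such a register exists as a simple list of records; the field names `name` and `owner` and the example entries are hypothetical.

```python
def datasets_without_owner(register):
    """Return the names of data sets whose owner is missing or blank."""
    return [
        entry["name"]
        for entry in register
        if not (entry.get("owner") or "").strip()
    ]


# Hypothetical register entries, for illustration only.
register = [
    {"name": "swimming-water-temperature", "owner": "Leisure and Recreation"},
    {"name": "parking-occupancy", "owner": ""},
    {"name": "air-quality-raw", "owner": None},
    {"name": "street-light-status"},
]

unowned = datasets_without_owner(register)
print(unowned)
```

A check like this only finds missing entries; whether the named owner actively takes responsibility remains an organisational question.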

In simplified terms, information and data from IoT systems can be described as illustrated above. Here, the information is divided into three different domains, depending on its nature. A municipality/region needs to manage information and data in all three domains in one way or another. The working group has based its rationale regarding information and data on the above illustration.

 

DESCRIPTION OF FIGURE:

The figure above describes how data and information flow between IoT devices and applications.

Issues that a procurer needs to consider regarding data and information

Who is the information owner and who is the information manager?

In an IoT system, data and information flow between the IoT device (sensor/actuator) and the application. The IoT system offers a variety of services to manage data and information along that path. These can be processes that modify data and information (e.g. data editing and enrichment), storage services, etc. A reasonable assumption is that there is one and only one information owner who owns the information in all systems that manage it along the path to the application, e.g. one information owner for swimming water temperatures in the entire information flow. The information owner is then responsible for ensuring that the needs of all applications are met.

The information manager, by contrast, can vary along the IoT data flow. The working group finds that the information manager can be considered responsible for:

  1. The systems that manage and process IoT data and information on its path between IoT device and application.

    1. An information manager might maintain a specific value for only a few milliseconds, as it is transported (e.g. in a mobile network).

    2. Another information manager maintains stored information for a longer period of time.

    3. Yet another information manager is responsible for processing the information, e.g. adding context, quality assuring data series, etc.

    4. Further information managers may be responsible for the application of data, i.e. ensuring that the organisation receives the services and data it needs.

In relation to the IoT system, the working group believes it is important to detach information ownership from any specific system and instead look at the entire flow and the usefulness of the information in the organisation.

What data must or should a municipality/region store?

Which data should be stored, and for how long, depends entirely on the operational needs to be met. A municipality/region can choose different approaches to data and information storage. Some of these are listed below, along with their advantages and disadvantages.

Different strategies can be chosen for different data sets within the same IoT system, provided that the needs of the application are always met.

  1. Store everything (both raw data and processed data)

    1. A municipality/region can play it safe and require all data to be stored.

    2. Advantages of storing all data

      1. All data is available and it will be easy to change either raw data or processed data.

    3. Disadvantages of storing all data

      1. This type of strategy is likely to be costly for the municipality/region, mainly because the maintenance of data and information is costly (see the discussion above about why data maintenance is costly).

      2. There is a risk that the same information will exist in many different versions and processing variants in the IoT system, with the risk of data being used incorrectly.

      3. By storing all data, there is an increased risk of violating various laws and regulations.

  2. Store only raw data and have defined quality processes for processing data for applications

    1. Supplemental description: All raw data is stored as the lowest common denominator for all applications that use data from the IoT system. The municipality/region defines which processing routines are required for the different applications, and procedures for these are developed.

    2. Advantages of storing only raw data

      1. Raw data exists to be able to reproduce processed data

      2. It is relatively easy to change processing routines and maintain a complete history (e.g. if a bug is detected in processing routines)

      3. Data is stored only once, except for the processed data held in the respective application.

    3. Disadvantages of storing only raw data

      1. By storing raw data, there is a risk that the municipality/region will store large quantities of data that will never be used.

      2. Processing routines can be complicated to maintain if they have to take into account different levels of data quality from different sensors or time periods. By the same token, metadata can become complex and difficult to understand, making the data difficult to use.

  3. Store only processed data

    1. With this approach, only the processed data that is created is stored. Raw data and intermediate data are deleted.

    2. Advantages of storing only processed data

      1. Less data to maintain, store and describe.

    3. Disadvantages of storing only processed data

      1. No raw data available to recreate processed data, e.g. if there is a bug in processing routines or if a new application needs customised historical data.
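The difference between the strategies can be illustrated with strategy 2, where raw data is kept and processed data is derived by a defined routine. The sketch below is a minimal in-memory illustration under assumed names (`RawReading`, `calibrate_v1`, `calibrate_v2`, `process`) and invented values. It shows how processed values can be regenerated from raw data after a bug is found in a processing routine, which is not possible when only processed data is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawReading:
    sensor_id: str
    timestamp: str    # ISO 8601 string, kept as delivered
    raw_value: float  # uncalibrated sensor value


def calibrate_v1(reading):
    """First processing routine; contains a bug (offset added instead of subtracted)."""
    return round(reading.raw_value + 0.5, 2)


def calibrate_v2(reading):
    """Corrected routine: the assumed sensor offset of 0.5 is subtracted."""
    return round(reading.raw_value - 0.5, 2)


def process(raw_store, routine, version):
    """Derive processed records, tagging each with the routine version used."""
    return [
        {
            "sensor_id": r.sensor_id,
            "timestamp": r.timestamp,
            "value": routine(r),
            "routine_version": version,
        }
        for r in raw_store
    ]


# Raw data is stored once and never modified.
raw_store = [
    RawReading("temp-01", "2024-07-01T12:00:00Z", 19.0),
    RawReading("temp-01", "2024-07-01T13:00:00Z", 19.4),
]

processed_v1 = process(raw_store, calibrate_v1, "v1")

# The bug is discovered later; because raw data was kept, the full
# history can be regenerated with the corrected routine.
processed_v2 = process(raw_store, calibrate_v2, "v2")
print(processed_v2)
```

Tagging each processed record with the routine version is one way to limit the risk, noted above, of the same information circulating in several processing variants without anyone knowing which is which.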

Some questions to ask to get a picture of what data the municipality/region wants to store

A useful strategy in this context is to decide, as early as the start of collection, how data will be purged and erased. Some data may need to be archived, and perhaps handed over to archives for future generations.

  • What are the legal requirements for the collected data?

    • Personal data (Personal Data Processor, Personal Data Officer), archiving, Open Data Directive, principle of public access to information and secrecy, open data, etc.

  • How important is it to be able to modify historical processed data?

    • E.g. if errors are found in processing routines, or if additional applications need to be taken into consideration.

  • How important is it to be able to use and understand data and its quality after a long period of time?

    • This places demands on how well documented the data should be.
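The advice to decide on purging and archiving at the start of collection can be captured as a small retention rule per data set. The following sketch uses hypothetical data set names and retention periods; actual periods must come from the applicable legislation and the organisation's own archiving rules.

```python
from datetime import date, timedelta

# Hypothetical retention rules, decided before collection starts.
RETENTION = {
    "swimming-water-temperature": {"keep_days": 365 * 5, "archive": True},
    "parking-occupancy": {"keep_days": 90, "archive": False},
}


def retention_action(dataset, collected_on, today):
    """Return 'keep', 'archive' or 'purge' for a record of the given data set."""
    rule = RETENTION[dataset]
    expires_on = collected_on + timedelta(days=rule["keep_days"])
    if today < expires_on:
        return "keep"
    # Past the retention period: hand over to the archive or erase.
    return "archive" if rule["archive"] else "purge"


today = date(2024, 6, 1)
print(retention_action("parking-occupancy", date(2024, 5, 1), today))
print(retention_action("parking-occupancy", date(2024, 1, 1), today))
print(retention_action("swimming-water-temperature", date(2015, 7, 1), today))
```

A data set that has no entry in the rule table raises a `KeyError`, which is deliberate in this sketch: collection should not start before a retention decision exists.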