Health information model

From Endeavour Knowledge Base

This article describes the approach taken to producing information models, including what they are, what their purpose is, and what their technical components are.

The article does not include the content of any particular model.

What is the health information model (IM) and what is its purpose?

The IM is a representation of the meaning and structure of data held in the electronic records of the health and social care sector, together with libraries of queries, extracts and mappings.

The main purpose is to bridge the chasm that exists between highly technical digital representations and plain language, so that a lay person can ask questions of the data in plain language without prior knowledge of the underlying models.

It is a computable abstract logical model, not a physical structure or schema. "Computable" means that operational software operates directly from the model artefacts, as opposed to using the model for illustration purposes. As a logical model, it models data that may be physically held in a variety of different types of data stores, including relational or graph data stores. Because the model is independent of the physical schemas, the model itself has to be interoperable and free of any proprietary lock-in.

The IM is a broad model that integrates a set of different approaches to modelling using a common ontology. The components of the model are:

  1. A concept ontology, which is a vocabulary and definitions of the concepts used in healthcare, or more simply put, a vocabulary of health. The ontology is made up of the world's leading ontology, Snomed-CT, with a London extension, and various code-based taxonomies (e.g. ICD10, Read, supplier codes and local codes).
  2. A data model, which is a set of classes and properties, using the vocabulary, that represent the data and relationships as published by live systems. Note that this data model is NOT a standard model but a collated set of entities and relationships, bound to the concepts and based on real data, that are mapped to a common model.
  3. A library of business-specific concept sets and value sets, which are expression constraints on the ontology for the purpose of query.
  4. A catalogue of reference data such as geographical areas, organisations and people derived and updated from public resources.
  5. A library of Queries for querying and extracting instance data from reference data or health records.
  6. A set of maps creating mappings between published concepts and the core ontology as well as structural mappings between submitted data and the data model.
  7. A super language including the main semantic web vocabularies (RDF, RDFS, OWL2, SHACL) as well as a set of Discovery vocabularies designed for health data modelling.
  8. A query model, which is a high level model of processes and queries held in the query library and directly mapped to mainstream query languages such as SPARQL and SQL.
  9. An open source set of utilities that can be used to browse, search, or maintain the model.

The remainder of this article considers how models and ontologies can be constructed using this approach.

Conceptualisation?

The main difference between the Discovery IM and many other approaches is the harmonisation of the terms used in the conventional 'terminology' domain and the terms used in the conventional 'data model' domain. Both are considered part of the one ontology and the two link at the level of property values.

All types, classes, property identifiers and object value identifiers are uniquely named using internationalised resource identifiers (IRIs). In most cases the identifiers are externally provided (e.g. Snomed-CT identifiers), whilst in others they have been created for a particular model. Organisations that author elements of the models use their own identifiers.

From a data modelling perspective the arrangements of types may be referred to as archetypes, which are conceptually similar to FHIR profiles. There are an unlimited number of these which frees the model from any particular conventional relational database schema. Inheritance of types is supported which enables broad classifications of types and re-usability.

The two disciplines (terminology and data modelling) are different, but they are obviously related. The binding between a data model and the range of values that may be applied to a property of an entry creates an interdependency, which keeps the data model and the values synchronised. For example, an encounter record entry may be defined as a record of an "interaction between a patient (or on behalf of the patient) and a health professional or health provider". The encounter entry has a property which is bound to the concept of encounter, which is itself semantically defined. In other words, the data model of an encounter entry links to the type of encounter it is a record of.
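
The binding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the IM's actual representation: the IRIs under http://example.org/, the PropertyBinding and EntryType classes, and the tiny is-a table are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical IRIs, invented for illustration only.
EX = "http://example.org/im#"
ENCOUNTER = EX + "Encounter"                # concept: any encounter
GP_CONSULTATION = EX + "GPConsultation"     # concept: a kind of encounter
SYSTOLIC_BP = EX + "SystolicBloodPressure"  # concept: not an encounter

# Toy ontology: child concept -> set of parent concepts (is-a).
IS_A = {GP_CONSULTATION: {ENCOUNTER}}


def is_subsumed_by(concept: str, ancestor: str) -> bool:
    """True if concept equals ancestor or reaches it through is-a links."""
    if concept == ancestor:
        return True
    return any(is_subsumed_by(p, ancestor) for p in IS_A.get(concept, ()))


@dataclass
class PropertyBinding:
    path: str         # IRI of the data model property
    value_class: str  # IRI of the concept the value must be a kind of


@dataclass
class EntryType:
    iri: str
    properties: list = field(default_factory=list)

    def validate(self, record: dict) -> list:
        """Return the property IRIs whose values break their binding."""
        errors = []
        for binding in self.properties:
            value = record.get(binding.path)
            if value is None or not is_subsumed_by(value, binding.value_class):
                errors.append(binding.path)
        return errors


# The encounter entry type binds its "encounterType" property to the
# Encounter concept, so only kinds of encounter are acceptable values.
encounter_entry = EntryType(
    iri=EX + "EncounterEntry",
    properties=[PropertyBinding(EX + "encounterType", ENCOUNTER)],
)

good = {EX + "encounterType": GP_CONSULTATION}
bad = {EX + "encounterType": SYSTOLIC_BP}
print(encounter_entry.validate(good))
print(encounter_entry.validate(bad))
```

Because the permitted values are defined by reference to the ontology rather than copied into the data model, adding a new kind of encounter to the ontology makes it valid for the property without changing the entry type.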

The data model does not use the idea of "tables", although tables in the relational database sense of the word may be used to implement the model. There are an unlimited number of data model entity types, each one varying according to its properties, and arranged in a class hierarchy. If records are implemented in a graph database there would be a 1:1 relationship between a data model shape and a type, but if implemented in a relational database the number of tables could vary from ONE to ANY number, depending on performance and maintenance factors.

Benefits of harmonisation accrue in user interfaces. For example, if a user elects to search for a systolic blood pressure, the application can use the information model to discover that an entry for a systolic blood pressure will have a date and probably a numeric value.
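
A minimal sketch of how an application might make that discovery is shown below. The type names, property names and the concept-to-type lookup are hypothetical and chosen only to mirror the systolic blood pressure example; the real model would hold these as ontology content rather than Python dictionaries.

```python
# Hypothetical entity types. Each names its parent type and its own properties.
TYPES = {
    "Entry": {
        "parent": None,
        "properties": {"effectiveDate": "date"},
    },
    "Observation": {
        "parent": "Entry",
        "properties": {"concept": "concept"},
    },
    "NumericObservation": {
        "parent": "Observation",
        "properties": {"value": "number", "units": "concept"},
    },
}

# Hypothetical link from a concept to the entity type that records it.
CONCEPT_TO_TYPE = {"SystolicBloodPressure": "NumericObservation"}


def properties_of(type_name):
    """Collect own and inherited properties by walking up the hierarchy."""
    collected = {}
    while type_name is not None:
        definition = TYPES[type_name]
        for name, datatype in definition["properties"].items():
            # A property declared lower in the hierarchy takes precedence.
            collected.setdefault(name, datatype)
        type_name = definition["parent"]
    return collected


def search_fields(concept):
    """Properties a search form could offer for entries of this concept."""
    return properties_of(CONCEPT_TO_TYPE[concept])


fields = search_fields("SystolicBloodPressure")
print(sorted(fields))
```

The user interface never hard-codes the fact that a blood pressure has a date and a numeric value; it reads that from the type hierarchy, so the same code serves any other concept that is mapped to a type.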

Types of data as a graph

The data stored in a health record can be conceptualised as a set of relationships between one thing and many others. Some people call this a graph. Others call these objects, properties and values. When entities and properties are grouped, they are sometimes called archetypes.

Information model language

The main article, Information modelling language, describes the language in more detail.

The semantic web approach is adopted for the purposes of identifiers and grammar. In this approach, data can be described using a plain language grammar consisting of a subject, a predicate, and an object (a triple), with an additional context referred to as a graph or RDF data set. The theory is that all health data can be described in this way (with predicates being extended to include functions).
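
To make the subject, predicate, object and graph grammar concrete, here is a small sketch using plain Python tuples. It deliberately avoids any RDF library; the "ex:" names, the two practice graphs and the match helper are hypothetical and exist only for this example.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # the additional context: which data set the triple is from


# Two blood pressure observations published by two hypothetical practices.
DATA = [
    Quad("ex:obs1", "ex:patient", "ex:patient1", "ex:practiceA"),
    Quad("ex:obs1", "ex:concept", "ex:SystolicBloodPressure", "ex:practiceA"),
    Quad("ex:obs1", "ex:value", "120", "ex:practiceA"),
    Quad("ex:obs2", "ex:patient", "ex:patient2", "ex:practiceB"),
    Quad("ex:obs2", "ex:concept", "ex:SystolicBloodPressure", "ex:practiceB"),
    Quad("ex:obs2", "ex:value", "135", "ex:practiceB"),
]


def match(data, subject=None, predicate=None, obj=None, graph=None):
    """Return quads matching the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj, graph)
    return [
        quad for quad in data
        if all(want is None or want == got for want, got in zip(pattern, quad))
    ]


# All systolic blood pressure entries, regardless of source.
all_bp = match(DATA, predicate="ex:concept", obj="ex:SystolicBloodPressure")
# The same question restricted to one graph.
bp_a = match(DATA, predicate="ex:concept",
             obj="ex:SystolicBloodPressure", graph="ex:practiceA")
print([q.subject for q in all_bp])
print([q.subject for q in bp_a])
```

Every statement has the same shape, so a single pattern-matching function can answer questions about any kind of record, and the graph field keeps track of where each statement came from.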

However, the semantic web languages are highly complex, so a set of more pragmatic approaches is taken for the more specialised structures.

The consequence of this approach is that W3C web standards can be used, such as the Resource Description Framework (RDF). This sees the world as a set of triples (subject, predicate, object), with some things named and some things anonymous. Systems that adopt this approach can exchange data in a way that preserves the semantics. Whilst RDF is an arcane language at the machine level, the things it can describe can be very intuitive when represented visually. In other words, the information modelling approach involves an RDF graph.

In addition to semantic web languages, other commonly used languages are used to make the model accessible to more people. For example, the Snomed-CT Expression Constraint Language (ECL) is a common way of defining concept sets. ECL is logically equivalent to a closed world query on an open world OWL ontology. The IM language uses the semantic language of SPARQL together with entailment to model ECL, but ECL can be exported, or used as input, as an alternative.
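
As a rough illustration of the closed world reading, the sketch below expands the ECL "descendant or self" operator (written << in ECL) against a toy is-a hierarchy. The "ex:" concepts are hypothetical, and the function is a plain graph traversal, not the IM's SPARQL-based implementation. In SPARQL a comparable expansion is commonly written with a property path such as rdfs:subClassOf*, though this article does not say which form the IM uses.

```python
# Toy asserted hierarchy: child concept -> set of parent concepts (is-a).
# All identifiers are hypothetical.
IS_A = {
    "ex:SystolicBloodPressure": {"ex:BloodPressure"},
    "ex:DiastolicBloodPressure": {"ex:BloodPressure"},
    "ex:BloodPressure": {"ex:Observable"},
    "ex:HeartRate": {"ex:Observable"},
}


def descendants_or_self(focus, is_a):
    """Expand '<< focus': the focus concept plus every known descendant."""
    # Invert the child -> parents table into parent -> children.
    children = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)

    result, stack = set(), [focus]
    while stack:
        concept = stack.pop()
        if concept not in result:
            result.add(concept)
            stack.extend(children.get(concept, ()))
    return result


blood_pressures = descendants_or_self("ex:BloodPressure", IS_A)
print(sorted(blood_pressures))
```

The result contains only concepts that are present in the hierarchy at query time. That is the closed world aspect: a concept that is not stated to be a kind of blood pressure is simply left out, even though the underlying OWL ontology would not treat its absence as proof that it is not one.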