Health information model: Difference between revisions

From Endeavour Knowledge Base
(6 intermediate revisions by the same user not shown)
The article does not include the content of any particular model.


== What is the Discovery health information model (IM)? ==
== What is the health information model (IM) and what is its purpose? ==
The IM is a representation of the meaning and structure of data held in the electronic records of the health and social care sector, together with libraries of query, extract and mappings.
The IM is a representation of the meaning and structure of data held in the electronic records of the health and social care sector, together with libraries of query, value sets, concept sets, data set definitions and mappings.


It is an abstract logical model, not a physical structure or schema. As a logical model it models data that may be physically held in a variety of different types of data stores, including relational or graph data stores. Because the model is independent of the physical schemas, the model itself has to be interoperable and free of proprietary lock-in.
The main purpose is to bridge the chasm that exists between highly technical digital representations and plain language so that when questions are asked of data, a lay person could use plain language without prior knowledge of the underlying models.


The IM is a broad model that integrates a set of different approaches to modelling using a common vocabulary. The components of the model are:
It is a computable abstract logical model, not a physical structure or schema. "Computable" means that operational software operates directly from the model artefacts, as opposed to using the model for illustration purposes. As a logical model it models data that may be physically held in a variety of different types of data stores, including relational or graph data stores. Because the model is independent of the physical schemas, the model itself has to be interoperable and free of proprietary lock-in.


# An ontology, which is a vocabulary and definitions of the concepts used in healthcare, or more simply put, a vocabulary of health. The ontology is made up of the world's leading ontology Snomed-CT, with a London extension and supplemented with additional concepts for data modelling.
The IM is a broad model that integrates a set of different approaches to modelling using a common ontology. The components of the model are:
# A data model, which is a set of classes and properties, using the vocabulary, that represent the data and relationships as published by live systems. Note that this data model is NOT a standard model but a collated set of entities and relationships, bound to the concepts and based on real data, that are mapped to a single model.
 
# A library of business specific concept and value sets, which are  expression constraints on the ontology for the purpose of query
# A set of ontologies, which is a vocabulary and definitions of the concepts used in healthcare, or more simply put, a vocabulary of health. The ontologies are made up of the world's leading ontology, Snomed-CT, with a London extension, and various code-based taxonomies (e.g. ICD10, Read, supplier codes and local codes).
# A common data model, which is a set of classes and properties, using the vocabulary, that represent the data and relationships as published by live systems. Note that this data model is NOT a standard model but a collated set of entities and relationships, bound to the concepts and based on real data, that are mapped to a common model.
# A library of business-specific concept value sets (also known as reference sets), which are expression constraints on the ontology for the purpose of query.
# A catalogue of reference data such as geographical areas, organisations and people derived and updated from public resources.
# A library of Queries for querying and extracting instance data from reference data or health records.  
# A library of Data set (query) definitions  for querying and extracting instance data from the information model, reference data, or health records.
# A set of maps creating mappings between published concepts and the core ontology as well as structural mappings between submitted data and the data model.
# An open source set of utilities that can be used to browse, search, or maintain the model.
# A modelling language using the World wide web semantic languages that can be used to exchange all elements of the model.


The remainder of this article considers how models and ontologies can be constructed using this approach.
<br />


== What is different? ==
== Model building blocks and visualisation ==
The main difference between the Discovery IM and other approaches is the harmonisation of the terms used in the conventional 'terminology' domain and the terms used in the conventional 'data model' domain. Both are considered part of the one ontology, and the concepts often differ only in the approach to their definition.
The model consists of classes, sets and objects that are instances of classes. 
[[File:Ethnicity.jpg|thumb|Ethnicity]]
Objects can act as objects in their own right (e.g. an instance of chest pain) or may also act as classes (e.g. the class of objects that are chest pain). Likewise, sets have members that are objects, and those objects may also act as classes or sets. For example, a set for the 2011 census ethnicity categories will contain a member object "British", which is itself a set with members such as "English", and so on.


For example, an encounter may be defined as an "interaction between a patient (or on behalf of the patient) and a health professional or health provider". The definition of an encounter with Description logic as a subclass with some properties can be produced using the open world OWL2 grammar.
The model itself is stored as an RDF based  knowledge graph, which means it is implementable in any mainstream Graph database technology. There are no vendor specific extensions to RDF.  


However, an encounter can also be said to be a type of entry in a health record, as modelled in FHIR or openEHR. From this perspective an encounter will be defined as a logical 'schema' with a set of properties such as a date  or encounter type.
In line with the RDF standard, all persistent types, classes, property identifiers and object value identifiers are uniquely named using internationalised resource identifiers (IRIs). In most cases the identifiers are externally provided (e.g. Snomed-CT identifiers), whilst in others they have been created for a particular model. Organisations that author elements of the models use their own identifiers.


The two disciplines (Description logic and data model constraints) are different, but  they are obviously related. The binding between a data model and the range of values that should be applied to a property blurs the boundary between one and the other. This boundary is artificial and historical and harmonising the two into a single holistic whole enables very flexible storage database designs.
From a data modelling perspective the arrangements of types may be referred to as archetypes, which are conceptually similar to FHIR profiles. In the semantic web world they would be considered "shapes". There are an unlimited number of these which frees the model from any particular conventional relational database schema. Inheritance of types is supported which enables broad classifications of types and re-usability.  


Because ontologies use inheritance and entailment as their main approach to definitions, this also enables simple hierarchical approaches to data models. Instead of everyone having to make the same choice between having an encounter table, a consultation table or an A&E attendance table, a system can be designed to suit, for example having a single high level table such as an event table, or a small number of core tables with extensions, or a complete library of tables, or no tables at all. A map between the information model and the chosen schema is all that is needed.<br />Benefits of harmonisation accrue in user interfaces. For example, if a user elects to search for a systolic blood pressure, an application that uses the information model knows that a systolic blood pressure is held as a concept which is a value set of the concept property in an observation, and can automatically produce the related properties such as date and value.
The parts of the model that model terminology concepts and those that model data use slightly different grammars, in keeping with their different purposes. The information model language describes the differences.
== Conceptualisation ==
[[File:Graph.jpg|thumb|Types of data as a graph]]


The data in a health record stored can be conceptualised as a set of relationships between one thing and many others.
The models can be viewed in their raw technical form (in JSON or Turtle) or can be viewed by the information model viewer at the online tool [https://im.endeavourhealth.net/#/ Information model directory] 
 
Some people call this a graph. Others call these objects , properties and values. From a grammatical language perspective they are subjects, predicates and objects.
 
The example on the right is entirely arbitrary but illustrates a problem. What does "condition record" mean, or indeed what is a "condition"? Why is a patient linked to a person and what does "linked to" mean? 
 
The answer is that the "terms" or "concepts" used in a model should be derived from a vocabulary whose terms have meaning and are formally defined. Some terms have the same meaning in whatever context they are used, whereas others have different meanings in different contexts. In defining terms, it is necessary to define them precisely enough for a computer to interpret the meaning safely, i.e. the context of an idea is part of the idea itself.
 
The most difficult challenge is to agree the definition and meaning of the concepts in the context they are used. The agreement as to a particular model is less important. A definition defines a concept in relation to other concepts. Within a domain of interest such as healthcare, all concepts are indirectly related in some way to all other concepts in that domain. 
 
Luckily, standards have evolved to enable machine readable definitions.
 
The crucial step in the discovery approach is to apply this principle to both the things that are being recorded (such as clinical concepts), as well as the structure of entries in records themselves.


== Information model language ==
''Main article:'' [[Health Information modelling language - overview|information modelling language]], which describes the language in more detail.


The semantic web approach is adopted. In this approach, data can be described via the use of a plain language grammar consisting of a subject, a predicate, and an object. A triple. The theory is that all health data can be described  in this way (with predicates being extended to include functions).
The semantic web approach is adopted for the purposes of identifiers and grammar. In this approach, data can be described via a plain language grammar consisting of a subject, a predicate, and an object (a triple), with an additional context referred to as a graph or RDF data set. The theory is that all health data can be described in this way (with predicates being extended to include functions).
 
However, the semantic web languages are highly complex and a set of more pragmatic approaches are taken for the more specialised structures.


The consequence of this approach is that W3C web standards can be used, such as the [[wikipedia:Resource_Description_Framework|Resource Description Framework]] or RDF. This sees the world as a set of triples (subject/predicate/object) with some things named and some things anonymous. Systems that adopt this approach can exchange data in a way that preserves the semantics. Whilst RDF is an incredibly arcane language at a machine level, the things it can describe can be very intuitive when represented visually. In other words, the information modelling approach involves an RDF graph.


However RDF has no inherent semantics or schematics. To bring those in the model uses the semantic web languages of RDFS, OWL2 DL and SHACL as its main languages with SPARQL as its query representation. It incorporates ontologies such as Snomed-CT and W3C-PROV.
In addition to semantic web languages, other commonly used languages are supported to enable the model to be accessed by more people. For example, the Snomed-CT expression constraint language (ECL) is a common way of defining concept sets. ECL is logically equivalent to a closed world query on an open world OWL ontology. The IM language uses the semantic language of SPARQL together with entailment to model ECL, but ECL can be exported or used as input as an alternative.
 
It is recognised that other languages are in common use. Consequently mappings are constructed to enable the model to be used. For example the Snomed-CT expression constraint language is a common way of defining concept sets. ECL is logically equivalent to a closed world query on an open world OWL ontology. The IM language uses the semantic language of SPARQL together with entailment to model ECL but ECL can be exported or used as input as an alternative.


<br />
<br />
== Information manager==
''Main article:'' [[Information model service|information model services]]. For an information model to be usable, it has to be accessible in some way, either via user interfaces or via APIs.
Thus the information model comes with a set of open source modules making up an application, "Information manager", which is a web-based application designed to show the model.
For a web application or set of APIs to be useful there has to be at least one service. There is a free-to-use [[information model service]], i.e. an operational service that provides access to one or more information models.
The service provides a set of APIs as well as instances of the model for implementations to use directly should they wish to.
All implementation code including the evolving service, APIs, language grammars and object models are also available on Github in the following repositories:
https://github.com/endeavourhealth-discovery/IMAPI
A viewer of the information model and an early version of the manager is at:
https://github.com/endeavourhealth-discovery/IMViewer

Revision as of 16:38, 5 May 2022

This article describes the approach taken to producing information models, including what they are, what their purpose is, and what the technical components of the models are.

The article does not include the content of any particular model.

What is the health information model (IM) and what is its purpose?

The IM is a representation of the meaning and structure of data held in the electronic records of the health and social care sector, together with libraries of query, value sets, concept sets, data set definitions and mappings.

The main purpose is to bridge the chasm that exists between highly technical digital representations and plain language so that when questions are asked of data, a lay person could use plain language without prior knowledge of the underlying models.

It is a computable abstract logical model, not a physical structure or schema. "Computable" means that operational software operates directly from the model artefacts, as opposed to using the model for illustration purposes. As a logical model it models data that may be physically held in a variety of different types of data stores, including relational or graph data stores. Because the model is independent of the physical schemas, the model itself has to be interoperable and free of proprietary lock-in.

The IM is a broad model that integrates a set of different approaches to modelling using a common ontology. The components of the model are:

  1. A set of ontologies, which is a vocabulary and definitions of the concepts used in healthcare, or more simply put, a vocabulary of health. The ontologies are made up of the world's leading ontology, Snomed-CT, with a London extension, and various code-based taxonomies (e.g. ICD10, Read, supplier codes and local codes).
  2. A common data model, which is a set of classes and properties, using the vocabulary, that represent the data and relationships as published by live systems. Note that this data model is NOT a standard model but a collated set of entities and relationships, bound to the concepts and based on real data, that are mapped to a common model.
  3. A library of business-specific concept value sets (also known as reference sets), which are expression constraints on the ontology for the purpose of query.
  4. A catalogue of reference data such as geographical areas, organisations and people derived and updated from public resources.
  5. A library of Data set (query) definitions for querying and extracting instance data from the information model, reference data, or health records.
  6. A set of maps creating mappings between published concepts and the core ontology as well as structural mappings between submitted data and the data model.
  7. An open source set of utilities that can be used to browse, search, or maintain the model.


Model building blocks and visualisation

The model consists of classes, sets and objects that are instances of classes.

[Figure: Ethnicity]

Objects can act as objects in their own right (e.g. an instance of chest pain) or may also act as classes (e.g. the class of objects that are chest pain). Likewise, sets have members that are objects, and those objects may also act as classes or sets. For example, a set for the 2011 census ethnicity categories will contain a member object "British", which is itself a set with members such as "English", and so on.
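The nesting of sets described above can be sketched in a few lines of Python. This is only an illustration, not the IM's own representation: the set names and members are placeholders rather than the actual census categories, and the `expand` helper is a hypothetical name.

```python
# Hypothetical set definitions: each key is a set, each value its direct members.
# A member that is also a key (e.g. "British") is itself a set.
SETS = {
    "Ethnicity 2011 census": ["British", "Irish"],
    "British": ["English", "Scottish", "Welsh"],
}


def expand(name, sets, seen=None):
    """Return the name followed by all of its direct and indirect members."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against accidental cycles in the definitions
        return []
    seen.add(name)
    result = [name]
    for member in sets.get(name, []):
        result.extend(expand(member, sets, seen))
    return result


members = expand("Ethnicity 2011 census", SETS)
```

Here "British" is returned both as a member of the census set and as the parent of its own members, which mirrors the dual role of objects and sets in the model.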

The model itself is stored as an RDF-based knowledge graph, which means it is implementable in any mainstream graph database technology that supports RDF. There are no vendor-specific extensions to RDF.

In line with the RDF standard, all persistent types, classes, property identifiers and object value identifiers are uniquely named using internationalised resource identifiers (IRIs). In most cases the identifiers are externally provided (e.g. Snomed-CT identifiers), whilst in others they have been created for a particular model. Organisations that author elements of the models use their own identifiers.
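As a rough illustration of identifier naming, the sketch below expands short prefixed names into full IRIs. The prefixes and namespace IRIs are placeholders chosen for the example and are not the namespaces used by any real model; `expand_iri` is a hypothetical helper.

```python
# Placeholder namespaces (assumptions for illustration only).
PREFIXES = {
    "sn": "http://example.org/snomed#",  # stands in for an externally provided scheme
    "im": "http://example.org/im#",      # stands in for a model author's own namespace
}


def expand_iri(prefixed_name, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full IRI using the prefix map."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


chest_pain_iri = expand_iri("sn:29857009")
```

Because each authoring organisation uses its own namespace, identifiers created by different authors cannot collide even when their local parts are the same.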

From a data modelling perspective the arrangements of types may be referred to as archetypes, which are conceptually similar to FHIR profiles. In the semantic web world they would be considered "shapes". There is no limit on the number of these, which frees the model from any particular conventional relational database schema. Inheritance of types is supported, which enables broad classifications of types and re-usability.
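Inheritance between shapes can be pictured with a small Python sketch. It assumes a simplified notion of a shape as a named collection of properties with an optional parent; the `Shape` class, the IRIs and the property names are all hypothetical and are not taken from the IM or from SHACL.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Shape:
    """A simplified, hypothetical shape: a named set of properties with an optional parent."""
    iri: str
    properties: Dict[str, str] = field(default_factory=dict)  # property name -> expected range
    parent: Optional["Shape"] = None

    def all_properties(self) -> Dict[str, str]:
        """Own properties plus everything inherited from ancestor shapes."""
        inherited = self.parent.all_properties() if self.parent else {}
        return {**inherited, **self.properties}


# A broad type and a more specific type that re-uses its properties.
event = Shape("ex:Event", {"effectiveDate": "xsd:dateTime"})
encounter = Shape("ex:Encounter", {"encounterType": "ex:EncounterType"}, parent=event)

encounter_properties = encounter.all_properties()
```

The more specific shape only declares what is new, and the broad shape can be re-used by any number of other specialisations.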

The parts of the model that model terminology concepts and those that model data use slightly different grammars, in keeping with their different purposes. The information model language describes the differences.

The models can be viewed in their raw technical form (in JSON or Turtle) or in the information model viewer, available online as the Information model directory (https://im.endeavourhealth.net/#/).

Information model language

Main article: information modelling language, which describes the language in more detail.

The semantic web approach is adopted for the purposes of identifiers and grammar. In this approach, data can be described via a plain language grammar consisting of a subject, a predicate, and an object (a triple), with an additional context referred to as a graph or RDF data set. The theory is that all health data can be described in this way (with predicates being extended to include functions).

However, the semantic web languages are highly complex, so a set of more pragmatic approaches is taken for the more specialised structures.

The consequence of this approach is that W3C web standards can be used, such as the Resource Description Framework (RDF). This sees the world as a set of triples (subject/predicate/object) with some things named and some things anonymous. Systems that adopt this approach can exchange data in a way that preserves the semantics. Whilst RDF is an incredibly arcane language at a machine level, the things it can describe can be very intuitive when represented visually. In other words, the information modelling approach involves an RDF graph.
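To make the triple idea concrete, the following is a minimal Python sketch that uses plain tuples rather than an RDF library. The namespace, predicate names and graph name are assumptions made up for the example; anonymous nodes are written with the conventional `_:` prefix.

```python
from typing import List, NamedTuple


class Quad(NamedTuple):
    """A triple (subject, predicate, object) plus the graph it belongs to."""
    subject: str
    predicate: str
    obj: str
    graph: str


EX = "http://example.org/im#"  # hypothetical namespace

quads = [
    # A named patient has an anonymous record entry ...
    Quad(EX + "patient1", EX + "hasEntry", "_:b0", EX + "graphA"),
    # ... and the anonymous entry is described by further triples.
    Quad("_:b0", EX + "concept", EX + "ChestPain", EX + "graphA"),
    Quad("_:b0", EX + "effectiveDate", '"2022-05-05"', EX + "graphA"),
]


def objects(data: List[Quad], subject: str, predicate: str) -> List[str]:
    """Return every object linked from the subject by the predicate."""
    return [q.obj for q in data if q.subject == subject and q.predicate == predicate]


def is_anonymous(node: str) -> bool:
    """Anonymous (blank) nodes carry no IRI of their own."""
    return node.startswith("_:")


entry = objects(quads, EX + "patient1", EX + "hasEntry")[0]
entry_concepts = objects(quads, entry, EX + "concept")
```

Following links from one subject to the next in this way is all that "walking the graph" means; a real implementation would normally delegate this to an RDF store and SPARQL.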

In addition to semantic web languages, other commonly used languages are supported to enable the model to be accessed by more people. For example, the Snomed-CT expression constraint language (ECL) is a common way of defining concept sets. ECL is logically equivalent to a closed world query on an open world OWL ontology. The IM language uses the semantic language of SPARQL together with entailment to model ECL, but ECL can be exported or used as input as an alternative.
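The relationship between an ECL constraint and a query with entailment can be sketched as follows. The example evaluates the equivalent of the ECL "descendant or self" operator (`<<`) by following is-a links transitively, which is roughly what a SPARQL property path such as `rdfs:subClassOf*` would do. The concept names and the tiny hierarchy are invented for illustration and are not real Snomed-CT content.

```python
# Hypothetical is-a relationships as (child, parent) pairs.
IS_A = [
    ("ex:PleuriticPain", "ex:ChestPain"),
    ("ex:ChestPain", "ex:Pain"),
    ("ex:Headache", "ex:Pain"),
]


def descendants_or_self(concept, is_a=IS_A):
    """Return the concept and every concept that is transitively a kind of it."""
    found = {concept}
    frontier = [concept]
    while frontier:
        parent = frontier.pop()
        for child, child_parent in is_a:
            if child_parent == parent and child not in found:
                found.add(child)
                frontier.append(child)
    return found


chest_pain_set = descendants_or_self("ex:ChestPain")
```

The result is a closed list of members computed from the hierarchy as it stands at query time, which is the sense in which a concept set is a closed world query over the ontology.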