Health information model: Difference between revisions

From Endeavour Knowledge Base
(26 intermediate revisions by the same user not shown)
Line 3: Line 3:
The article does not include the content of any particular model.  


== What is a model? ==
== What is the health information model (IM)? ==
In this context, a model is a representation of the arrangement of data within health records, together with a model of the means of interrogating the data in a way that a human can understand and a machine can interpret.
The IM is a representation of the meaning and structure of data held in the electronic records of the health and social care sector, together with libraries of query, extract and mappings.


There are as many models as there are business processes in healthcare. It is impractical to consider a single 'standard' model of healthcare data. Apart from the fact that the nature of health data is constantly changing, there is rarely agreement as to what the data should be, beyond that needed for a particular set of business processes.
It is a computable abstract logical model, not a physical structure or schema. "Computable" means that operational software operates directly from the model artefacts, as opposed to using the model for illustration purposes. As a logical model it models data that may be physically held in any of a variety of different types of data stores, including relational or graph data stores. Because the model is independent of the physical schemas, the model itself has to be interoperable and without any proprietary lock-in.


The lack of a single health and care common model, or 'standard' leads to the challenge of how systems can interoperate with each other in the absence of a standard model, without loss of semantic meaning.
The IM is a broad model that integrates a set of different approaches to modelling using a common ontology. The components of the model are:


This challenge is accepted by the Discovery information model.  
# A concept ontology, which is a vocabulary and definitions of the concepts used in healthcare, or more simply put, a vocabulary of health. The ontology is made up of the world's leading ontology Snomed-CT, with a London extension, and various code based taxonomies (e.g. ICD10, Read, supplier codes and local codes).
# A data model, which is a set of classes and properties, using the vocabulary, that represent the data and relationships as published by live systems. Note that this data model is NOT a standard model but a collated set of entities and relationships, bound to the concepts and based on real data, that are mapped to a common model.
# A library of business specific concept and value sets, which are expression constraints on the ontology for the purpose of query.
# A catalogue of reference data such as geographical areas, organisations and people derived and updated from public resources.
# A library of Queries for querying and extracting instance data from reference data or health records.
# A set of maps creating mappings between published concepts and the core ontology as well as structural mappings between submitted data and the data model.
# A super language including the main semantic web vocabularies (RDF, RDFS, OWL2, SHACL) as well as a set of Discovery vocabularies designed for health data modelling.
# A query model, which is a high level model of processes and queries held in the query library and directly mapped to mainstream query languages such as SPARQL and SQL.
# An open source set of utilities that can be used to browse, search, or maintain the model.


Hitherto, the response within the healthcare IT community to this challenge has been to establish specific models with agreed structure and content, and hope they can be used for many purposes. Examples of this approach are HL7 FHIR and OpenEHR. Such agreed models have enabled systems to communicate with each other by sharing or exchanging data, and have improved the efficiency of many health management processes resulting in better health outcomes.  
The remainder of this article considers how models and ontologies can be constructed using this approach.


The approach taken to modelling in Discovery is a little different. The theory behind the approach is that semantic interoperability can be achieved by the use of a community agreed grammar and a community agreed vocabulary, implemented using a well established, sector independent set of W3C language standards used by the semantic web. In other words, just as humans understand each other by sharing a grammar and vocabulary and a protocol such as written text, so can machines.
== What is different? ==
The main difference between the Discovery IM and other approaches is the harmonisation of the terms used in the conventional 'terminology' domain and the terms used in the conventional 'data model' domain. Both are considered part of the one ontology, with one combined language, albeit with different grammar for the different parts of the model.  


The only difference between a machine interpretable model and human exchange is that a particular language construct must be logical. Humans can say illogical things such as "it was the best of times, it was the worst of times". Systems that comply with the agreed grammar cannot, or at least, should not.  
For example, an encounter record entry  may be defined as a record of an  "interaction between a patient (or on behalf of the patient) and a health professional or health provider". The encounter entry is bound to the concept of encounter which is itself semantically defined. In other words the data model of an entry of an encounter links to the type of encounter it is a record of.


There is still a need to agree models for particular businesses but as these use the same grammatical rules and use the same vocabulary, it may be much quicker to create them. If system A sends data in model 1 and system  B receives that data, but the users of system B do not use model 1, then system B can still accept and interpret the data from system A.
The two disciplines (Description logic and data model schema constraints) are different, but  they are obviously related. The binding between a data model and the range of values that should be applied to a property of an entry creates an interdependency, making sure that the data model and the values are synchronised.


The remainder of this article considers how this can be done.
The data model does not use the idea of "tables". Tables in the relational database sense of the word may be used to implement the model. There are an unlimited number of data model entity types, each one varying according to their properties, and arranged in a class hierarchy. If records are implemented in a graph database there would be a 1:1 relationship between a data model shape and a type, but if implemented in a relational database the number of tables could vary from ONE to ANY number, depending on performance and maintenance factors.<br />Benefits of harmonisation accrue in user interfaces. For example, if a user elects to search for a systolic blood pressure, the application can use the information model to discover that an entry for a systolic blood pressure will have a date and probably a numeric value.
 
== Conceptualisation ==
== Visualisation ==
[[File:Graph.jpg|thumb|Types of data as a graph]]


The data stored in a health record can be conceptualised as a set of relationships between one thing and many others.


 
Some people call this a graph. Others call these objects, properties and values. From a grammatical language perspective they are subjects, predicates and objects.
The data stored in a health record can be visualised as a graph, and a model of health data can be visualised as a graph of types of data.
 
In a graph, anything can, in theory, link to many other things, and links are constantly changing, just as the human brain appears to operate. This intuitive approach means that there are as many potential nodes and relationships (vertices and edges) as needed and they can change rapidly.
 
This contrasts with the relatively fixed structures often adopted in healthcare at the moment.


The example on the right is entirely arbitrary but illustrates a problem. What does "condition record" mean, or indeed what is a "condition"? Why is a patient linked to a person and what does "linked to" mean?   


The answer is that the "terms" or "concepts" used in a model should be derived from a vocabulary whose terms have meaning and are formally defined. Some terms have meaning in whatever context they are used whereas others have different meanings in different contexts. In defining terms, it is necessary to define them precisely enough for a computer to interpret the meaning safely, i.e. the context of an idea is part of the idea itself.
The answer is that the "terms" used in a model should be derived from a vocabulary whose terms have meaning and are formally defined. Some terms have meaning in whatever context they are used whereas others have different meanings in different contexts. In defining terms, it is necessary to define them precisely enough for a computer to interpret the meaning safely, i.e. the context of an idea is part of the idea itself.


The most difficult challenge is to agree the definition and meaning of the concepts in the context they are used.  The agreement as to a particular model is less important. A definition defines a concept in relation to other concepts. Within a domain of interest such as healthcare, all concepts are indirectly related in some way to all other concepts in that domain.   


Luckily, standards have evolved to enable machine readable definitions.  
W3C semantic web standards have evolved to enable machine readable definitions. The ubiquitous JSON enables these to be used by modern software applications, whereas the W3C Turtle language provides a slightly more human readable version.  


The crucial step in the Discovery approach is to apply this principle both to the things that are being recorded (such as clinical concepts) and to the structure of entries in records themselves.
<br />
 
== Semantic Web ==
The classic approach to modelling in the software domain has been via the idea of an object with properties, those properties having values, which may be simple data types or other objects. Models of objects are often presented using a diagrammatic notation, the Unified Modelling Language (UML). Where there are many objects with the same properties (albeit those properties having slightly different values), these are designed as "classes".
 
The approach to the Discovery information models starts from a different starting point, which is the use of a grammar and a vocabulary to define data. Likewise the approach to interrogating data involves a query language that uses the same grammatical constructs.
 
The semantic web approach is adopted. In this approach, data can be described via the use of a plain language grammar consisting of a subject and a predicate and an object. The theory is that all health data can be described  in this way (with predicates being extended to include functions).
 
The consequence of this approach is that web standards can be used, i.e. the [[wikipedia:Resource_Description_Framework|Resource Description Framework]] or RDF. This sees the world as a set of triples (subject/predicate/object) with some things named and some things anonymous. Systems that adopt this approach can exchange data in a way that the semantics can be preserved. Whilst RDF is an incredibly arcane language at a machine level, the things it can describe can be very intuitive when represented visually.
 
Put together with the idea of a graph, this means that a graph can be organised with subjects as nodes, objects as nodes, and predicates as relationships.
 
In other words the Information modelling approach involves an RDF Graph.
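
The subject/predicate/object idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the <code>ex:</code> identifiers are invented for the example and are not real IM or Snomed-CT identifiers, and plain tuples stand in for an RDF library.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); all "ex:" names are invented.
triples = [
    ("ex:Patient123", "rdf:type", "ex:Patient"),
    ("ex:Patient123", "ex:hasEntry", "ex:Entry1"),
    ("ex:Entry1", "rdf:type", "ex:Observation"),
    ("ex:Entry1", "ex:concept", "ex:SystolicBloodPressure"),
    ("ex:Entry1", "ex:numericValue", 120),
]


def build_graph(triples):
    """Index triples by subject: each subject node maps to its outgoing
    (predicate, object) edges, in the order the triples were given."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(triples)
```

Here subjects and objects become nodes and each predicate becomes a labelled edge, so <code>graph["ex:Entry1"]</code> lists everything stated about that entry.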
 
== Concepts, classification, sets or classes==
A general language consists of a vocabulary of words arranged according to a syntax that follows grammatical rules.
 
Information consists of ideas. Another word for ideas is a 'concept' and a concept is usually described using a term (or a set of terms to provide context), that term implying the idea. The term "chest pain" implying the idea of a pain in the chest is one example.
 
Every entry in a health record can be considered to be an instance of a concept. In other words, an entry in a care record can be described in terms of a "type". For example, a record of a blood pressure may be considered as a type of observation, or an entry in a GP record of a consultation could be considered as a type of GP consultation entry.
 
Ideas (concepts) may be described and defined. However, there are as many definitions for one idea as there are people using the idea i.e. billions of ideas with many objects being represented for each i.e. trillions. 
 
To make sense of this, ideas are normally [[wikipedia:Classification_(general_theory)|classified]]. The approach taken in Discovery is to classify according to "sets", thus adopting the approach taken by modern ontologies. A set is a definition of a set of things that have the same properties (i.e. a class and a set are the same thing). Sets of ideas may contain subsets, which are objects that have the same and more specific properties than the super set (or super class), i.e. a subclass of a superclass.<br />Putting together RDF and sets, the net result aligns with RDF, RDFS and OWL2, i.e. [https://www.w3.org/TR/owl2-syntax/ the Web Ontology Language]. The vocabulary of OWL2 is used to precisely define concepts in relation to other concepts. OWL2 uses an underlying idea of "Description Logic", which is a way of defining things in a logical and consistent way so that a classification can be reliably produced.
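
The subclass/superclass relationship can be illustrated with a minimal sketch. It assumes a hand-written table of direct subclass statements with invented <code>ex:</code> names, and it only follows those statements transitively; it is not a Description Logic reasoner and does no OWL2 classification.

```python
# Hypothetical subclass statements (concept -> direct superclasses); names are invented.
SUBCLASS_OF = {
    "ex:ChestPain": {"ex:Pain", "ex:ChestFinding"},
    "ex:Pain": {"ex:ClinicalFinding"},
    "ex:ChestFinding": {"ex:ClinicalFinding"},
    "ex:ClinicalFinding": set(),
}


def superclasses(concept, subclass_of):
    """Return every direct and inherited superclass of a concept."""
    seen = set()
    stack = list(subclass_of.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(subclass_of.get(parent, ()))
    return seen


def is_a(concept, ancestor, subclass_of):
    """True if the concept is the ancestor itself or one of its subclasses."""
    return concept == ancestor or ancestor in superclasses(concept, subclass_of)
```

With this table, <code>is_a("ex:ChestPain", "ex:ClinicalFinding", SUBCLASS_OF)</code> holds even though that relationship was never stated directly.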


== Data sets and schemas ==
== Information model language ==
Having a grammar and a vocabulary represented in RDF and OWL is not enough. To model things for specific purposes it is necessary to describe precise structures. These may be referred to as data sets or schemas, or more commonly, information models.


N.B. In Discovery, the term information model encompasses structures, semantics and query.
''Main article'' [[Health Information modelling language - overview|information modelling language]] describes the language in more detail.


A data set takes rather vague general statements and arranges concepts in a precise manner. This aligns precisely with the semantic web language [https://www.w3.org/TR/shacl/ Shapes Constraint Language] (SHACL).
The semantic web approach is adopted. In this approach, data can be described via the use of a plain language grammar consisting of a subject, a predicate, and an object: a triple, with an additional context referred to as a graph or RDF data set. The theory is that all health data can be described in this way (with predicates being extended to include functions).


To support both machine readability and standards based interoperability, Discovery adopts the necessary elements of SHACL in addition to RDF/RDFS and OWL.
The modelling language also models process, e.g. a query and the steps in a set of queries. These are held as JSON serializable objects and are translatable to standard query languages.


== Interrogating data for information ==
The consequence of this approach is that W3C web standards can be used, such as the [[wikipedia:Resource_Description_Framework|Resource Description Framework]] or RDF. This sees the world as a set of triples (subject/predicate/object) with some things named and some things anonymous. Systems that adopt this approach can exchange data in a way that the semantics can be preserved. Whilst RDF is an incredibly arcane language at a machine level, the things it can describe can be very intuitive when represented visually. In other words the Information modelling approach involves an RDF Graph.
Having established a representation of data using a set of grammars (RDF/RDFS/OWL/SHACL) it is necessary to represent a means by which the data could be interrogated to produce useful information.


Once again the semantic web community has established a machine readable common grammar for query known as [https://www.w3.org/TR/sparql11-query/ SPARQL.] SPARQL is designed to ask questions of RDF and is thus an ideal way of representing query logic.
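
The way a SPARQL style query asks questions of triples can be sketched as simple pattern matching. The code below is an assumption-laden illustration, not SPARQL itself: the <code>ex:</code> names and data are invented, terms beginning with <code>?</code> are treated as variables, and only a list of triple patterns is supported.

```python
# Hypothetical data; all "ex:" names and values are invented for the example.
triples = [
    ("ex:Entry1", "rdf:type", "ex:Observation"),
    ("ex:Entry1", "ex:concept", "ex:SystolicBloodPressure"),
    ("ex:Entry1", "ex:numericValue", 120),
    ("ex:Entry2", "rdf:type", "ex:Observation"),
    ("ex:Entry2", "ex:concept", "ex:BodyWeight"),
    ("ex:Entry2", "ex:numericValue", 80),
]


def match(pattern, triple, bindings):
    """Unify one triple pattern with one triple.
    Returns the extended variable bindings, or None if they do not match."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            if term in new and new[term] != value:
                return None  # variable already bound to something else
            new[term] = value
        elif term != value:
            return None  # constant term differs
    return new


def query(patterns, triples):
    """Evaluate a list of triple patterns; every pattern must match."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Roughly analogous to a SPARQL query selecting ?entry and ?value where
# ?entry has concept ex:SystolicBloodPressure and numeric value ?value.
results = query(
    [
        ("?entry", "ex:concept", "ex:SystolicBloodPressure"),
        ("?entry", "ex:numericValue", "?value"),
    ],
    triples,
)
```

Because the same variable <code>?entry</code> appears in both patterns, only values belonging to the systolic blood pressure entry are returned.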
To populate the data models and ontologies, the semantic web languages of RDF, RDFS, OWL2 DL and SHACL are used as the main languages, with SPARQL as the "target" run time query language.


== Dialects and alternative languages ==
In addition, mappings to other commonly used languages are constructed to enable the model to be used. For example, the Snomed-CT expression constraint language (ECL) is a common way of defining concept sets. ECL is logically equivalent to a closed world query on an open world OWL ontology. The IM language uses the semantic language of SPARQL together with entailment to model ECL, but ECL can be exported or used as input as an alternative.
It is all very well supporting semantic web standards, but the world often adopts alternative approaches.


To that end, Discovery modelling tends to support grammars and syntaxes that are in common use, as long as they do not distract from the core models.
== Mapping from published data ==
''main article'' [[Mapping and matching concepts|mapping concepts and structures]]


Examples include [https://confluence.ihtsdotools.org/display/DOCECL/Expression+Constraint+Language+-+Specification+and+Guide Expression constraint language (ECL)], which is a way of expressing entailment queries of complex class expressions. Another is [https://graphql.org/ GraphQL], which is a way of querying constructs by presenting a template of expected results.
which describes the current mapping and approach to mappings of codes or text, in context from publisher systems.
== Information manager==


== Information model APIs and languages ==
''Main article'' [[Information model service|information model services.]] For an information model to be useable, it has to be accessible in some way, either via user interfaces or by APIs.
For an information model to be useable, it has to be accessible in some way. The means of accessing an information model is via the use of a language i.e. an [[Health Information modelling language - overview|information modelling language]] and this is described in a separate article. The language assumes a graph representation of the model and uses RDF concepts as its basis.
Thus the information model comes with a set of open source modules making up an application, "Information manager", which is a web based application designed to show the model.
[[File:IM logical object model.png|thumb|IM Service architecture]]


For an information model to be useful, it has to have at least one [[information model service]], i.e. an operational service that provides access to one or more information models. A service must provide a set of APIs as well as provide instances of the model for implementations to use directly should they wish to.  
For a web application or set of APIs to be useful there has to be at least one service. There is a free to use  [[information model service]], i.e. an operational service that provides access to one or more information models.  


The diagram on the right shows a tiered architecture for such a service. Information model APIs are described in a separate article.
The service provides a set of APIs as well as instances of the model for implementations to use directly should they wish to.


All implementation code including the evolving service, APIs, language grammars and object models are also available on Github in the following repositories:
Line 100: Line 78:
https://github.com/endeavourhealth-discovery/IMAPI


Utilities that use it and transform between syntaxes are at:
A viewer of the information model and an early version of the manager is at:
 
https://github.com/endeavourhealth-discovery/InformationManager
 
A viewer of the information model is at:


https://github.com/endeavourhealth-discovery/IMViewer
== Information model purposes and functions ==
The information models have 4 core functional requirements internal to a model: '''Description of the model, validation of model content, population of the model, and query of the model.''' In support of query there is also the need to support '''inference''', which generates new insights that were not necessarily authored.
In addition the information model must support the same 4 core functional requirements on actual health data that is modelled.
Systems that use the models can use any or all of three approaches:
# Direct use of the model data content as a database (or set of files that can populate a database via script)
# Use via a set of APIs (both local and remote) designed to provide access to the data within the model, or to trigger outputs of the model for 1)
# Use of the information model technologies themselves via the use of the published open source code
The main functional purposes of an information model are described further below:
*'''Description of the model.''' There is little point in having a model unless it can be described and understood. Knowing what is in a model is a pre-requisite to using it. For  example, there is no point in trying to find out if a patient record indicates whether or not they have diabetes if the model doesn't include the ability to record it. In order to understand a model, two techniques are required: diagrammatic representation and human readable text representation. A model must support both.
*'''Data Validation''' is essential for consistent business operations. Data models, user input forms, and data set specifications are designed to enable data collections to be validated. Maintaining a standard for data collection is essential. For example, if you have a patient record in front of you, you will likely need to know their approximate age. To work this out, a date of birth must be recorded. Validating that the date of birth can be and has been recorded is important. However, if ''more than one'' date of birth was recorded for the same patient, it would be less valuable. Thus a modelling language must include the ability to '''constrain''' data models to suit particular business needs as part of validation, even when the data model shows more than one.
*'''Population of the model.'''  It is impractical to build model content from scratch and likewise virtually impossible to populate instances with existing data without some manipulation. An information model must contain the ability to model mappings between currently held data and model conformant data.
*'''Enquiry (or query)''' is necessary to generate information from data. There is little point in recording data unless it can be interrogated and the results of the interrogation acted upon. Thus a modelling language must include the ability to query the data as defined or described, including the use of inference rules to find data that was recorded in one context for use in another.
*'''Inference''' is pivotal to decision making. For example, if you are about to prescribe a drug containing methicillin to a patient, and the patient has previously stated that they are allergic to penicillin, it is reasonable to infer that if they take the drug, an allergic reaction might ensue, and thus another drug is prescribed. Thus a modelling language must include the ability to infer things and classify things for safe decisions to be made.
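
The date of birth example under data validation can be sketched as a cardinality check. This is a simplified illustration in the spirit of a shape constraint, not SHACL itself: the shape table, the <code>ex:</code> property names and the record layout are all assumptions made for the example.

```python
# Hypothetical shape: property -> (min_count, max_count); None means unbounded.
PATIENT_SHAPE = {
    "ex:dateOfBirth": (1, 1),
    "ex:address": (0, None),
}


def validate(record, shape):
    """Return a list of human readable cardinality violations.
    A record maps each property to the list of values recorded for it."""
    violations = []
    for prop, (min_count, max_count) in shape.items():
        count = len(record.get(prop, []))
        if count < min_count:
            violations.append(f"{prop}: expected at least {min_count}, found {count}")
        if max_count is not None and count > max_count:
            violations.append(f"{prop}: expected at most {max_count}, found {count}")
    return violations
```

A record with two dates of birth fails the check, while any number of addresses is accepted, which reflects the idea that constraints are chosen to suit a particular business need.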
<br />
== Example of model content basic assumptions ==
In constructing a model of health data, it is necessary to have an agreement as to the sort of things that a model will contain and how they will be categorised.
It is fair to say that there will probably never be a universally accepted approach to this problem, but nevertheless, any information model needs to at least put a few markers down.
Healthcare modelling approaches such as HL7 and openEHR have each made some basic assumptions as to their respective starting categorisations. They are, however, incompatible and, as a result, transfer of information between systems using the different approaches has proved expensive. The fall back position has been to continue with whatever model a particular system has, and progress is delayed.
A safe starting point is to consider some categorical terms that are unlikely to be controversial and would be consistent with the open standards in place. For the sake of making a start, the following categorisations are proposed: '''Event, Entry, Provenance, ontology, types, state, query'''
* Everything that is recorded starts with an '''''event.''''' In this context an event is a machine level event that signals a change of state or a desire to change a state. The event is usually associated with a description of what the event is and some data associated with the event. The data associated with the event normally includes the intention, such as a desire to add/amend or delete data in a record, as well as the data which was recorded as part of the event.
* The net result of an event is the creation, update or deletion of an '''''Entry''''' in a health record. The term ‘Entry’ is used in its intuitive meaning here. If one were to look at a record it would consist of entries, not events.
* Because an entry is generated from one or more events, an entry has '''''provenance'''''. Provenance enables the audit and validation of an entry, including all events that led to the state of the current entry. A subset of an entry's provenance is the “audit trail”, which is pivotal for medico-legal purposes.
* An entry in a record has a number of attributes which describe the entry. For an information model to succeed there must be an agreement as to what these attributes mean. This is achieved by the use of a shared '''''Ontology'''''. An ontology precisely defines the meaning of an attribute, and the type of values that an attribute might have. This means that ANY data can be exchanged as long as an entry uses attributes from the agreed ontology.
* Agreement on the definition of concepts is not enough. Agreement on '''''context''''' is also important. Most would agree that a date of birth is the date on which a person was born. But what about an entry in a record for Diabetes? Does it mean the person has the condition or does it mean the clinician is considering the condition? Context is also provided by the ontology, but it must use an ontology structure that can preserve context.
* There are a huge number of business processes in healthcare. Each business process is associated with a requirement to exchange data that is relevant to the business. This is partly achieved by assigning '''''types''''' to entries. Types indicate the main purpose of the entry. An agreement as to what the types are, and consequently, what the associated attributes of an entry of a type should be, and what the values of the attributes should be, is essential for business.
* It is generally the case that an entry can be considered as either representing an event in time (a different use of the word event) or a persistent state. Technically these categories are conceptual rather than real but are important for business level modelling. For example, a date of birth might be considered as a state and therefore might be modelled with a cardinality of 1 against a person, even though a series of historical entries have recorded a date of birth. State can be described by the use of types to indicate state versus event entries to indicate things that happened but do not persist. Many types are both.
* Put together this equates to an ontology of concepts which are used as types, attributes and values, together with structural definitions of their relationships for context and business purpose. Terms used to describe these things are purely convention: resources, resource profiles, archetypes, templates, value sets and dataset definitions are all simply ontological relationships.
* All of this is irrelevant unless entries can be queried. Query itself produces new structures such as the above. Consequently a means of querying records, which are projected as a graph, is needed.<br />
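
The relationship between events, entries and provenance described in the list above can be sketched as follows. The event layout (<code>entry_id</code>, <code>intent</code>, <code>data</code>) is an assumption made purely for illustration and is not a published IM structure.

```python
def apply_events(events):
    """Fold a sequence of add/amend/delete events into the current entries,
    keeping every event that touched an entry as that entry's provenance."""
    entries = {}      # entry id -> current data
    provenance = {}   # entry id -> list of events, oldest first
    for event in events:
        entry_id = event["entry_id"]
        provenance.setdefault(entry_id, []).append(event)
        if event["intent"] == "delete":
            entries.pop(entry_id, None)
        else:  # "add" or "amend": merge the new data over the existing entry
            entries[entry_id] = {**entries.get(entry_id, {}), **event["data"]}
    return entries, provenance


# Hypothetical event stream; identifiers and values are invented.
events = [
    {"entry_id": "e1", "intent": "add", "data": {"concept": "ex:SystolicBloodPressure", "value": 120}},
    {"entry_id": "e1", "intent": "amend", "data": {"value": 125}},
    {"entry_id": "e2", "intent": "add", "data": {"concept": "ex:BodyWeight", "value": 80}},
    {"entry_id": "e2", "intent": "delete"},
]
entries, provenance = apply_events(events)
```

The record then shows only the surviving entry, while the provenance still holds all events, including those for the deleted entry, which is what an audit trail would draw on.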
== Model structure and content ==
Surprisingly, with the use of an agreed ontology and an agreed way of representing it via an open standard language such as the [[Health Information modelling language|information modelling language]], there is no real need to have one model structure.
Content of a model, including the definition of types,  is driven entirely by the business which it is designed to support. A specialist in immunology is likely to need different content than a General Practitioner. However, there needs to be  an agreement on what the concepts in use mean, particularly in context. Otherwise data cannot be exchanged.
The information modelling language means that one can have as many information model instances as needed. The language is like any other language but with some logical constraints. It may be possible to model the novel of War and Peace, but to state that "it was the best of times, it was the worst of times" is NOT allowed.
Thus the common information model is in fact no more than a model that models information as used in a common way. The idea that somehow models can be "Standardised", is somewhat quaint unless the business itself is standardised. If the business is standardised (i.e. everyone agrees to do the same thing) then a common model is a standard.
Thus in the Endeavour Discovery model the only standardisation is:
# The basic assumptions as to the difference between events, entries, and their provenance
# The selection of the best fit ontologies for particular purposes, as long as those ontologies conform with the information model language constructs, which enable world wide adoption by the systems that already use the language
For the content of the models themselves this can be accessed through the IM viewer (under development) or by downloading the model and viewing via a generic RDF graph viewer.
The approach to modelling covers 3 aspects of health record information:
# Models of data stored in health records and their supporting records.
# Ways of retrieving data both from the model itself and the health record data stored, i.e. various forms of query.
# Models of maps between originally entered data and a selected model designed so that one semantically defined query will pick up data entered in a variety of ways.
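
The third aspect, maps that let one semantically defined query pick up data entered in different ways, can be sketched as a lookup from publisher codes to core concepts. The supplier names, local codes and <code>ex:</code> concepts below are invented for illustration; real maps are also context dependent, which this sketch ignores.

```python
# Hypothetical map from (publisher scheme, local code) to a core concept.
CONCEPT_MAP = {
    ("supplierA", "BP-SYS"): "ex:SystolicBloodPressure",
    ("supplierB", "2469"): "ex:SystolicBloodPressure",
    ("supplierB", "22A"): "ex:BodyWeight",
}


def to_core(entry, concept_map):
    """Return the core concept for a published entry, or None if unmapped."""
    return concept_map.get((entry["scheme"], entry["code"]))


def find_entries(entries, core_concept, concept_map):
    """Select entries whose published code maps to the requested core concept."""
    return [e for e in entries if to_core(e, concept_map) == core_concept]


published = [
    {"scheme": "supplierA", "code": "BP-SYS", "value": 120},
    {"scheme": "supplierB", "code": "2469", "value": 135},
    {"scheme": "supplierB", "code": "22A", "value": 80},
]
systolic = find_entries(published, "ex:SystolicBloodPressure", CONCEPT_MAP)
```

A single query for the core concept returns both systolic readings even though the two suppliers recorded them with different codes.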
Ideally a model should be designed both for human visualisation and for computers to use. This is the approach taken to the Discovery information model.
This article describes the meta data model of an information model (and does not include the content of a particular model). The article makes reference to the languages that may be used to access the model, using either interoperability standards or a pragmatic approach, and this language is described in the article introducing the [[Health Information modelling language - overview|health modelling language.]]
The information model component types can be illustrated as follows:
[[File:Modelling Components.png|center|thumb|800x800px|Main information model component types]]

Revision as of 10:38, 27 November 2021

This article describes the approach taken to producing information models, including what they are, what their purpose is, and what their technical components are.

The article does not include the content of any particular model.

What is the health information model (IM)?

The IM is a representation of the meaning and structure of data held in the electronic records of the health and social care sector, together with libraries of query, extract and mappings.

It is a computable abstract logical model, not a physical structure or schema. "Computable" means that operational software operates directly from the model artefacts, as opposed to using the model for illustration purposes. As a logical model it models data that may be physically held in a variety of different types of data stores, including relational or graph data stores. Because the model is independent of the physical schemas, the model itself has to be interoperable and without any proprietary lock-in.

The IM is a broad model that integrates a set of different approaches to modelling using a common ontology. The components of the model are:

  1. A concept ontology, which is a vocabulary and definitions of the concepts used in healthcare, or more simply put, a vocabulary of health. The ontology is made up of the world's leading ontology, Snomed-CT, with a London extension, and various code-based taxonomies (e.g. ICD10, Read, supplier codes and local codes).
  2. A data model, which is a set of classes and properties, using the vocabulary, that represent the data and relationships published by live systems. Note that this data model is NOT a standard model but a collated set of entities and relationships bound to the concepts, based on real data, that are mapped to a common model.
  3. A library of business specific concept and value sets, which are expression constraints on the ontology for the purpose of query.
  4. A catalogue of reference data such as geographical areas, organisations and people derived and updated from public resources.
  5. A library of queries for querying and extracting instance data from reference data or health records.
  6. A set of maps creating mappings between published concepts and the core ontology as well as structural mappings between submitted data and the data model.
  7. A super language including the main semantic web vocabularies (RDF, RDFS, OWL2, SHACL) as well as a set of Discovery vocabularies designed for health data modelling.
  8. A query model, which is a high-level model of processes and queries held in the query library and directly mapped to mainstream query languages such as SPARQL and SQL.
  9. An open source set of utilities that can be used to browse, search, or maintain the model.

The remainder of this article considers how models and ontologies can be constructed using this approach.

What is different?

The main difference between the Discovery IM and other approaches is the harmonisation of the terms used in the conventional 'terminology' domain and the terms used in the conventional 'data model' domain. Both are considered part of the one ontology, with one combined language, albeit with different grammar for the different parts of the model.

For example, an encounter record entry may be defined as a record of an "interaction between a patient (or on behalf of the patient) and a health professional or health provider". The encounter entry is bound to the concept of encounter which is itself semantically defined. In other words the data model of an entry of an encounter links to the type of encounter it is a record of.

The two disciplines (Description logic and data model schema constraints) are different, but they are obviously related. The binding between a data model and the range of values that should be applied to a property of an entry creates an interdependency, making sure that the data model and the values are synchronised.
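The interdependency can be sketched as follows. This is an illustrative Python fragment only, assuming a toy is-a hierarchy and a hypothetical `ex:encounterType` property; it is not the IM's own OWL or SHACL representation. The data model states which concept a property value must fall under, and the ontology decides whether a given value does.

```python
# Toy ontology: child concept -> parent concept (all names are hypothetical).
IS_A = {
    "ex:GPConsultation": "ex:Encounter",
    "ex:HospitalAdmission": "ex:Encounter",
    "ex:Asthma": "ex:Condition",
}

# Toy data model: property -> concept that every value must be a kind of.
ENCOUNTER_SHAPE = {"ex:encounterType": "ex:Encounter"}


def is_kind_of(concept, ancestor):
    """Walk up the is-a chain to test whether ancestor subsumes concept."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def violations(shape, entry):
    """List the properties whose value is missing or outside the bound concept."""
    return [
        prop
        for prop, required in shape.items()
        if not is_kind_of(entry.get(prop), required)
    ]
```

If a concept is moved in the ontology, the same shape immediately accepts or rejects different values, which is the synchronisation the binding is meant to provide.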

The data model does not use the idea of "tables". Tables in the relational database sense of the word may be used to implement the model. There is an unlimited number of data model entity types, each one varying according to its properties, and arranged in a class hierarchy. If records are implemented in a graph database there would be a 1:1 relationship between a data model shape and a type, but if implemented in a relational database the number of tables could vary from one to any number, depending on performance and maintenance factors.

Benefits of harmonisation accrue in user interfaces. For example, if a user elects to search for a systolic blood pressure, the application can use the information model to discover that an entry for a systolic blood pressure will have a date and probably a numeric value.
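A minimal sketch of how an application might make that discovery is given below. It assumes a hypothetical set of entry shapes arranged in a class hierarchy and a hypothetical binding from a concept to the entry type that records it; none of the identifiers are taken from the actual IM.

```python
# Hypothetical shapes: each entry type names its parent and its own properties.
SHAPES = {
    "ex:Entry": {"parent": None, "properties": ["ex:effectiveDate"]},
    "ex:Observation": {"parent": "ex:Entry", "properties": ["ex:concept"]},
    "ex:NumericObservation": {
        "parent": "ex:Observation",
        "properties": ["ex:numericValue", "ex:units"],
    },
}

# Hypothetical binding from a concept to the entry type that records it.
RECORDED_AS = {"ex:SystolicBloodPressure": "ex:NumericObservation"}


def properties_for(concept):
    """Collect own and inherited properties, most general first."""
    chain = []
    shape = RECORDED_AS.get(concept)
    while shape is not None:
        chain.append(shape)
        shape = SHAPES[shape]["parent"]
    props = []
    for name in reversed(chain):
        props.extend(SHAPES[name]["properties"])
    return props
```

A search form could call `properties_for("ex:SystolicBloodPressure")` and offer a date filter and a numeric range filter without any of that being hard-coded in the user interface.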

Conceptualisation

Types of data as a graph

The data in a health record stored can be conceptualised as a set of relationships between one thing and many others.

Some people call this a graph. Others call these objects, properties and values. From a grammatical language perspective they are subjects, predicates and objects.
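As a minimal sketch of the subject, predicate and object view, the following Python fragment holds a few statements as plain tuples and lists everything stated about one subject. The identifiers (`ex:patient1`, `ex:linkedTo` and so on) are invented for illustration and are not taken from the Discovery model.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical example data; none of these identifiers come from the IM.
GRAPH = [
    Triple("ex:patient1", "ex:linkedTo", "ex:person1"),
    Triple("ex:patient1", "ex:hasConditionRecord", "ex:record1"),
    Triple("ex:record1", "ex:condition", "ex:asthma"),
]


def describe(graph, subject):
    """Return the (predicate, object) pairs stated about one subject."""
    return [(t.predicate, t.obj) for t in graph if t.subject == subject]
```

The object of one statement (`ex:record1`) is the subject of another, which is what turns a flat list of statements into a graph.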

The example on the right is entirely arbitrary but illustrates a problem. What does "condition record" mean, or indeed what is a "condition"? Why is a patient linked to a person and what does "linked to" mean?

The answer is that the "terms" used in a model should be derived from a vocabulary whose terms have meaning and are formally defined. Some terms have meaning in whatever context they are used whereas others have different meanings in different contexts. In defining terms, it is necessary to define them precisely enough for a computer to interpret the meaning safely, i.e. the context of an idea is part of the idea itself.

The most difficult challenge is to agree the definition and meaning of the concepts in the context they are used. The agreement as to a particular model is less important. A definition defines a concept in relation to other concepts. Within a domain of interest such as healthcare, all concepts are indirectly related in some way to all other concepts in that domain.

W3C semantic web standards have evolved to enable machine readable definitions. The ubiquitous JSON enables these to be used by modern software applications, whereas the W3C Turtle language provides a slightly more human readable version.


Information model language

Main article: information modelling language, which describes the language in more detail.

The semantic web approach is adopted. In this approach, data can be described via the use of a plain language grammar consisting of a subject, a predicate, and an object (a triple), with an additional context referred to as a graph or RDF dataset. The theory is that all health data can be described in this way (with predicates being extended to include functions).

The modelling language also models processes, e.g. queries and the steps in a set of queries. These are held as JSON serialisable objects and are translatable to standard query languages.
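The idea of a query held as a JSON serialisable object and translated to a standard query language can be sketched as follows. The object layout (`prefixes`, `select`, `where`) and the `to_sparql` helper are assumptions made for this illustration and do not reflect the actual IM query model.

```python
import json

# A hypothetical, JSON-serialisable query object (not the actual IM schema).
QUERY = {
    "prefixes": {"ex": "http://example.org/"},
    "select": ["entry", "value"],
    "where": [
        ["?entry", "ex:concept", "ex:SystolicBloodPressure"],
        ["?entry", "ex:numericValue", "?value"],
    ],
}


def to_sparql(query):
    """Translate the toy query object into a SPARQL SELECT string."""
    lines = [f"PREFIX {p}: <{iri}>" for p, iri in query["prefixes"].items()]
    lines.append("SELECT " + " ".join("?" + v for v in query["select"]))
    lines.append("WHERE {")
    for s, p, o in query["where"]:
        lines.append(f"  {s} {p} {o} .")
    lines.append("}")
    return "\n".join(lines)


# The object survives a JSON round trip unchanged, so it can be stored or sent.
assert json.loads(json.dumps(QUERY)) == QUERY
```

Keeping the query as data rather than as a query string is what would allow the same object to be translated to more than one target language.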

The consequence of this approach is that W3C web standards can be used, such as the Resource Description Framework (RDF). This sees the world as a set of triples (subject/predicate/object) with some things named and some things anonymous. Systems that adopt this approach can exchange data in a way that preserves the semantics. Whilst RDF is an incredibly arcane language at a machine level, the things it can describe can be very intuitive when represented visually. In other words, the information modelling approach involves an RDF graph.

To populate the data models and ontologies, the semantic web languages of RDF, RDFS, OWL2 DL and SHACL are used as the main languages, with SPARQL as the "target" run time query language.

In addition, mappings to other commonly used languages are constructed to enable the model to be used. For example, the Snomed-CT Expression Constraint Language (ECL) is a common way of defining concept sets. ECL is logically equivalent to a closed world query on an open world OWL ontology. The IM language uses the semantic language of SPARQL, together with entailment, to model ECL, but ECL can be exported, or used as input, as an alternative.
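To illustrate what a simple expression constraint evaluates to, the sketch below computes the ECL constraint `<< concept` (the concept and all of its descendants) over a toy is-a hierarchy in Python. The concepts are invented, and the loop stands in for the subclass entailment that a SPARQL engine would supply; it is not how the IM itself evaluates ECL.

```python
# Toy is-a relationships: child -> set of parents (hypothetical concepts).
IS_A = {
    "ex:Asthma": {"ex:RespiratoryDisorder"},
    "ex:AllergicAsthma": {"ex:Asthma", "ex:AllergicDisorder"},
    "ex:RespiratoryDisorder": {"ex:Disorder"},
    "ex:AllergicDisorder": {"ex:Disorder"},
}


def descendants_or_self(concept):
    """Evaluate the ECL constraint '<< concept' over the stated hierarchy."""
    members = {concept}
    changed = True
    while changed:
        changed = False
        for child, parents in IS_A.items():
            # A child joins the set once any of its parents is already in it.
            if child not in members and parents & members:
                members.add(child)
                changed = True
    return members
```

The result contains only concepts known to the hierarchy at the time of the query, which is the closed world reading described above.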

Mapping from published data

Main article: mapping concepts and structures, which describes the current mappings and the approach to mapping codes or text, in context, from publisher systems.
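Why context matters when mapping can be sketched with a small lookup. Everything here is hypothetical (the publisher, the codes, the field names and the target concepts) and the sketch says nothing about how the IM actually stores its maps; it only shows that the same published code may need a different core concept depending on where it was entered.

```python
# Hypothetical context-free map: (publisher, code scheme, code) -> core concept.
CODE_MAP = {
    ("ex:SupplierA", "local", "BP1"): "ex:SystolicBloodPressure",
}

# Hypothetical context-sensitive map: the source field is part of the key,
# because the same local code can mean different things in different places.
CONTEXT_MAP = {
    ("ex:SupplierA", "local", "Y", "smoker_flag"): "ex:CurrentSmoker",
    ("ex:SupplierA", "local", "Y", "consent_flag"): "ex:ConsentGiven",
}


def map_code(publisher, scheme, code, field=None):
    """Return the core concept for a published code, or None if unmapped."""
    if field is not None:
        hit = CONTEXT_MAP.get((publisher, scheme, code, field))
        if hit is not None:
            return hit
    return CODE_MAP.get((publisher, scheme, code))
```

With a map of this kind in place, one query written against the core concept can pick up data that was originally entered in several different ways.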

Information manager

Main article: information model services. For an information model to be usable, it has to be accessible in some way, either via user interfaces or via APIs. Thus the information model comes with a set of open source modules making up an application, "Information manager", which is a web based application designed to show the model.

For a web application or set of APIs to be useful there has to be at least one service. There is a free to use information model service, i.e. an operational service that provides access to one or more information models.

The service provides a set of APIs and also provides instances of the model for implementations to use directly should they wish to.

All implementation code, including the evolving service, APIs, language grammars and object models, is also available on GitHub in the following repository:

https://github.com/endeavourhealth-discovery/IMAPI

A viewer of the information model and an early version of the manager is at:

https://github.com/endeavourhealth-discovery/IMViewer