Health information model

From Endeavour Knowledge Base

The Discovery health information model is a semantically interpreted model of the data held in Discovery. The model covers 3 aspects:

  1. Model of the data stored
  2. Model of the means of access to retrieve data both from the model itself and the data stored, i.e. various forms of query
  3. Model of the means of adding data to the store and to the model.

The model is designed both for human visualisation and for computers to use. More precisely, the model can be considered as a set of modular models, each depending on the business purpose, with a 'common model' encompassing data from all the modular models.

This article describes the metadata model of the information model (and does not include the content of a particular model). The article refers to the languages that may be used to access the model, using either interoperability standards or a pragmatic approach; these are described in the article introducing the health modelling language.

Basic assumptions

In constructing a model of health data, it is necessary to have an agreement as to the sort of things that a model will contain and how they will be categorised.

It is fair to say that there will probably never be a universally accepted approach to this problem, but nevertheless, any information model needs to at least put a few markers down.

Healthcare modelling approaches such as HL7 and openEHR have each made some basic assumptions as to their respective starting categorisations. They are, however, incompatible, and as a result transfer of information between systems using the different approaches has proved expensive. The fallback position has been to continue with whatever model a particular system has, and progress is delayed.

A safe starting point is to consider some categorical terms that are unlikely to be controversial and would be consistent with the open standards in place. For the sake of making a start, the following categorisations are proposed:

  • Everything starts with an event. In this context an event is a machine level event that signals a change of state or a desire to change a state. The event is usually associated with a description of what the event is and some data associated with the event. The data associated with the event normally includes the intention, such as a desire to add/amend or delete data in a record, as well as the data which was recorded as part of the event.
  • The net result of an event is the creation, update or deletion of an Entry in a health record. The term ‘Entry’ is used in its intuitive meaning here. If one were to look at a record it would consist of entries, not events.
  • Because an entry is generated from one or more events, an entry has provenance. Provenance enables the audit and validation of an entry, including all events that led to the state of the current entry. A subset of an entry's provenance is the “audit trail”, which is pivotal for medico-legal purposes.
  • An entry in a record has a number of attributes which describe the entry. For an information model to succeed there must be an agreement as to what these attributes mean. This is achieved by the use of a shared Ontology. An ontology precisely defines the meaning of an attribute, and the type of values that an attribute might have. This means that ANY data can be exchanged as long as an entry uses attributes from the agreed ontology.
  • Agreement on the definition of concepts is not enough. Agreement on context is also important. Most would agree that a date of birth is the date a person was born. But what about an entry in a record for Diabetes? Does it mean the person has the condition or does it mean the clinician is considering the condition? Context is also provided by the ontology, but this requires an ontology structure that can preserve context.
  • There are a huge number of business processes in healthcare. Each business process is associated with a requirement to exchange data that is relevant to the business. This is partly achieved by assigning types to entries. Types indicate the main purpose of the entry. An agreement as to what the types are, and consequently what the associated attributes of an entry of a type should be and what the values of the attributes should be, is essential for business.
  • It is generally the case that an entry can be considered as either representing an event in time (a different use of the word event) or a persistent state. Technically these categories are conceptual rather than real but are important for business level modelling. For example, a date of birth might be considered as a state and therefore might be modelled with a cardinality of 1 against a person, even though a series of historical entries have recorded a date of birth. State can be described by the use of types to indicate state, versus event entries to indicate things that happened but do not persist. Many types are both.
  • Put together this equates to an ontology of concepts which are used as types, attributes and values, together with structural definitions of their relationships for context and business purpose. Terms used to describe these things are purely convention: resources, resource profiles, archetypes, templates, value sets and dataset definitions are all simply ontological relationships.
  • All of this is irrelevant unless entries can be queried. Query itself produces new structures such as the above. Consequently a means of querying records, which are projected as a graph, is needed.
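The relationship between events, entries and provenance described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Discovery implementation: the class names `Event` and `Entry`, the function `apply_event`, the intention values "add", "amend" and "delete", and the `dateOfBirth` attribute are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """A machine-level event: an intention plus the data recorded with it."""
    event_id: str
    intention: str   # hypothetical values: "add", "amend" or "delete"
    entry_id: str    # the entry this event is about
    data: dict


@dataclass
class Entry:
    """An entry in a record; provenance holds every event that shaped it."""
    entry_id: str
    attributes: dict = field(default_factory=dict)
    deleted: bool = False
    provenance: list = field(default_factory=list)


def apply_event(record: dict, event: Event) -> Entry:
    """Apply one event to a record (a dict mapping entry_id to Entry)."""
    entry = record.get(event.entry_id)
    if event.intention == "add":
        if entry is not None:
            raise ValueError(f"entry {event.entry_id} already exists")
        entry = Entry(event.entry_id, dict(event.data))
        record[event.entry_id] = entry
    elif event.intention == "amend":
        if entry is None:
            raise KeyError(event.entry_id)
        entry.attributes.update(event.data)
    elif event.intention == "delete":
        if entry is None:
            raise KeyError(event.entry_id)
        # Mark rather than remove, so the provenance remains auditable.
        entry.deleted = True
    else:
        raise ValueError(f"unknown intention: {event.intention}")
    entry.provenance.append(event)
    return entry


# Usage: two events produce one entry whose provenance lists both.
record = {}
apply_event(record, Event("ev1", "add", "entry1", {"dateOfBirth": "1970-01-01"}))
apply_event(record, Event("ev2", "amend", "entry1", {"dateOfBirth": "1970-01-02"}))
```

In this sketch the record holds entries rather than events, and the audit trail is simply the ordered list of events kept on each entry.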

Visualisations

Types of data as a graph


The data stored in a health record can be visualised as a graph, and a model of that data can be visualised as a graph of types, as on the right hand side. In this type of approach an entry in a record would be represented as a node, with the type of the node as a label on the node.

Attributes of each record would either be relationships with other records (e.g. a person's address), or data properties. Properties of records that are themselves concepts would be data nodes in a pure graph, and in a property graph may be data properties with references to the concept operating as a particular node.
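A minimal property-graph sketch of this idea follows. It is an assumption-laden illustration rather than the Discovery data store: the `Node` and `Graph` classes, the labels `Person`, `Address` and `Condition`, the relationship name `hasAddress`, and the concept identifier `concept:Diabetes` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A record entry: the label is its type, properties are data properties."""
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    edges: list = field(default_factory=list)   # (source_id, relationship, target_id)

    def add_node(self, node_id, label, **properties):
        node = Node(node_id, label, dict(properties))
        self.nodes[node_id] = node
        return node

    def relate(self, source_id, relationship, target_id):
        """Record a relationship between two existing nodes."""
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both ends of a relationship must exist")
        self.edges.append((source_id, relationship, target_id))

    def related(self, source_id, relationship):
        """Return the nodes reached from source_id by the named relationship."""
        return [self.nodes[t] for s, r, t in self.edges
                if s == source_id and r == relationship]


graph = Graph()
graph.add_node("p1", "Person", dateOfBirth="1970-01-01")
graph.add_node("a1", "Address", postcode="AB1 2CD")
# Property-graph style: the concept is held as a reference in a data property.
graph.add_node("c1", "Condition", concept="concept:Diabetes")
graph.relate("p1", "hasAddress", "a1")
```

In a pure graph the `concept` value would instead be a node of its own, linked to the condition by an edge.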

Information model APIs and languages

For an information model to be useable, it has to be accessible in some way. An information model is accessed by means of a language, i.e. an information modelling language, which is described in a separate article. The language assumes a graph representation of the model and uses RDF concepts as its basis.

IM Service architecture

For an information model to be useful, it has to have at least one information model service, i.e. an operational service that provides access to one or more information models. A service must provide a set of APIs as well as provide instances of the model for implementations to use directly should they wish to.

The diagram on the right shows a tiered architecture for such a service. Information model APIs are described in a separate article.

All implementation code, including the evolving service, APIs, language grammars and object models, is also available on GitHub in the following repositories:

https://github.com/endeavourhealth-discovery/IMAPI

https://github.com/endeavourhealth-discovery/InformationManager

https://github.com/endeavourhealth-discovery/IMViewer

Information model purposes and functions

Information models have 4 core functional requirements internal to the model: description of the model, validation of model content, population of the model, and query of the model. In support of query there is also the need to support inference, which generates new insights that were not necessarily authored.

In addition the information model must support the same 4 core functional requirements on actual health data that is modelled.

Systems that use the models can use any or all of three approaches:

  1. Direct use of the model data content as a database (or set of files that can populate a database via script)
  2. Use via a set of APIs (both local and remote) designed to provide access to the data within the model, or to trigger outputs of the model for 1)
  3. Use of the information model technologies themselves via the use of the published open source code

The main functional purposes of an information model are described further:

  • Description of the model. There is little point in having a model unless it can be described and understood. Knowing what is in a model is a pre-requisite to using it. For example, there is no point in trying to find out if a patient record indicates whether or not they have diabetes if the model doesn't include the ability to record it. In order to understand a model, two techniques are required: diagrammatic representation and human readable text representation. A model must support both.
  • Data Validation is essential for consistent business operations. Data models, user input forms, and data set specifications are designed to enable data collections to be validated. Maintaining a standard for data collection is essential. For example, if you have a patient record in front of you, you will likely need to know their approximate age. To work this out, the date of birth must be recorded. Validating that the date of birth can be and has been recorded is important. However, if more than one date of birth was recorded for the same patient, it would be less valuable. Thus a modelling language must include the ability to constrain data models to suit particular business needs as part of validation, even when the underlying data model allows more than one value.
  • Population of the model. It is impractical to build model content from scratch and likewise virtually impossible to populate instances with existing data without some manipulation. An information model must contain the ability to model mappings between currently held data and model conformant data.
  • Enquiry (or query) is necessary to generate information from data. There is little point in recording data unless it can be interrogated and the results of the interrogation acted upon. Thus a modelling language must include the ability to query the data as defined or described, including the use of inference rules to find data that was recorded in one context for use in another.
  • Inference is pivotal to decision making. For example, if you are about to prescribe a drug containing methicillin to a patient, and the patient has previously stated that they are allergic to penicillin, it is reasonable to infer that if they take the drug, an allergic reaction might ensue, and thus another drug is prescribed. Thus a modelling language must include the ability to infer things and classify things for safe decisions to be made.
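Two of the functions above, validation and inference, can be illustrated with a short sketch. It is not clinical content and not the Discovery implementation: the `IS_A` hierarchy is a tiny hand-written example, and the function names `cardinality_violations`, `ancestors` and `may_trigger_allergy`, as well as the `dateOfBirth` attribute, are assumptions made for illustration.

```python
# Hypothetical "is a" hierarchy: each concept maps to its parent concepts.
IS_A = {
    "Methicillin": {"Penicillin"},
    "Penicillin": {"Beta-lactam antibacterial"},
}


def cardinality_violations(entries, attribute, minimum=1, maximum=1):
    """Validation: check how many distinct values of an attribute were recorded.

    `entries` is a list of dicts. Returns a list of problem descriptions,
    which is empty when the constraint is satisfied.
    """
    values = {entry[attribute] for entry in entries if attribute in entry}
    problems = []
    if len(values) < minimum:
        problems.append(f"{attribute}: expected at least {minimum}, found {len(values)}")
    if len(values) > maximum:
        problems.append(f"{attribute}: expected at most {maximum}, found {len(values)}")
    return problems


def ancestors(concept, is_a):
    """Inference: return every concept reachable from `concept` via is-a links."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def may_trigger_allergy(drug, allergies, is_a):
    """True if the drug, or any class it belongs to, is a recorded allergy."""
    classes = ancestors(drug, is_a) | {drug}
    return bool(classes & set(allergies))


# Usage: the allergy was recorded against the class, the drug is a member of it.
warn = may_trigger_allergy("Methicillin", ["Penicillin"], IS_A)
```

The point of the sketch is that the allergy warning is never authored against methicillin directly; it follows from the classification held in the ontology.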


Model structure and content

Surprisingly, with the use of an agreed ontology and an agreed way of representing it via an open standard language such as the information modelling language, there is no real need to have one model structure.

Content of a model, including the definition of types, is driven entirely by the business which it is designed to support. A specialist in immunology is likely to need different content than a General Practitioner. However, there needs to be an agreement on what the concepts in use mean, particularly in context. Otherwise data cannot be exchanged.

The information modelling language means that one can have as many information model instances as needed. The language is like any other language but with some logical constraints. It may be possible to model the novel of War and Peace, but to state that "it was the best of times, it was the worst of times" is NOT allowed.

Thus the common information model is in fact no more than a model that models information as used in a common way. The idea that models can somehow be "standardised" is somewhat quaint unless the business itself is standardised. If the business is standardised (i.e. everyone agrees to do the same thing) then a common model is a standard.

Thus in the Endeavour Discovery model the only standardisation is:

  1. The basic assumptions as to the difference between events, entries, and their provenance
  2. The selection of the best-fit ontologies for particular purposes, as long as those ontologies conform to the information model language constructs, which enable worldwide adoption by the systems that already use the language.