Health Information modelling language - overview

From Endeavour Knowledge Base

Revision as of 09:25, 2 June 2022

Purpose and scope of the language

This article describes the languages used for creating, querying and maintaining the information model, as well as the means by which health record queries can be defined in a system-independent manner.

As the information model is an RDF graph, the modelling language uses the mainstream semantic web languages. In addition, there is a pragmatic JSON-LD based domain specific language for query definition, which is based on a constrained, specialised form of SPARQL with support for GraphQL-type queries.

The language includes description logic, shape constraints, expression constraints, and a pragmatic approach to modelling query of real data.

Details on the W3C standard languages that make up the grammar are described below.

It should be noted that these are modelling languages, not the physical data schema or the actual query. Those are defined in languages commensurate with the technology (e.g. SQL).

The main purpose of a modelling language is to exchange data and information about information models in a way that machines can understand. It is not expected that clinicians would use the languages directly. The use of standard languages ensures that the models are interoperable across many domains including non health care domains.

The languages cover the following areas:

  1. An ontology, which is a vocabulary and definitions of the concepts used in healthcare, or more simply put, a vocabulary of health. The ontology is made up of the world's leading ontology Snomed-CT, with a London extension and supplemented with additional concepts for data modelling.
  2. A data model, which is a set of classes and properties, using the vocabulary, that represent the data and relationships as published by live systems that have published data to a data service that uses these models. Note that this data model is NOT a standard model but a collated set of entities and relationships bound to the concepts based on real data, that are mapped to a single model.
  3. A library of business specific concept and value sets, which are expression constraints on the ontology for the purpose of query.
  4. A catalogue of reference data such as geographical areas, organisations and people derived and updated from public resources.
  5. A library of queries for querying and extracting instance data from reference data or health records.
  6. A set of maps creating mappings between published concepts and the core ontology as well as structural mappings between submitted data and the data model.

Contributory languages

Health data can be conceptualised as a graph, and thus the model is a graph model.

When exchanging models using the language grammar, both JSON-LD and TURTLE are supported, as well as more specialised syntaxes such as OWL functional syntax or expression constraint language.

The modelling language is an amalgam of the following languages:

  • RDF. An information model can be modelled as a graph, i.e. a set of nodes and edges (nodes and relationships, nodes and properties). Likewise, health data can be modelled as a graph conforming to the information model graph. RDF forms the statements describing the data. RDF in itself holds no semantics whatsoever, i.e. it is not practical to infer, validate or query based purely on an RDF structure. To use RDF it is necessary to provide semantic definitions for certain predicates and adopt certain conventions. In providing those semantic definitions, the predicates themselves can then be used to semantically define many other things. RDF can be represented using either TURTLE syntax or JSON-LD.
  • RDFS. This is the first of the semantic languages. It is used for some of the ontology axioms such as subclasses, domains and ranges, as well as the standard annotation properties such as 'label'.
  • OWL2 DL. For the ontology. This brings with it more sophisticated description logic such as equivalent classes and existential quantifications, and is used in the ontology and for defining things when an open world assumption is required.
  • SHACL. For the data models. Used for everything that defines the shape of data or logical entities and attributes. Although SHACL is designed for validation of RDF, as SHACL describes what things 'should be' it can be used as a data modelling language.
  • SPARQL. Used as the logical means of querying model conformant data (not to be confused with the actual query language used, which may be SQL).
  • IM Query Language. Used as a bridge between plain language and the mainstream query languages such as SQL or SPARQL.

Example (OWL2)

Consider a definition of chest pain

Chest pain
 is Equivalent to : pain of truncal structure
                    and
                    has site : Thoracic structure
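
The definition above can be read as a test that a candidate concept either passes or fails. The following is a minimal sketch of that idea in Python, assuming a deliberately simplified encoding (plain sets of supertypes and property/value pairs). The dictionary layout and the `satisfies` function are hypothetical illustrations, not part of the IM, and a real reasoner would also account for subsumption between values.

```python
# Hypothetical, simplified encoding of an "equivalent to" definition:
# the supertypes and the (property, value) roles a concept must state.
CHEST_PAIN = {
    "supertypes": {"Pain of truncal structure"},
    "roles": {("has site", "Thoracic structure")},
}


def satisfies(candidate: dict, definition: dict) -> bool:
    """Return True when the candidate states everything the definition requires."""
    return (
        definition["supertypes"] <= candidate["supertypes"]
        and definition["roles"] <= candidate["roles"]
    )


# Illustrative candidates (assumed data, not taken from the ontology).
acute_chest_pain = {
    "supertypes": {"Pain of truncal structure", "Acute pain"},
    "roles": {("has site", "Thoracic structure")},
}
abdominal_pain = {
    "supertypes": {"Pain of truncal structure"},
    "roles": {("has site", "Abdominal structure")},
}

print(satisfies(acute_chest_pain, CHEST_PAIN))
print(satisfies(abdominal_pain, CHEST_PAIN))
```

Because the definition is an equivalence rather than a plain subclass statement, anything that meets both conditions can be classified under the defined concept.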

Grammars and syntaxes

Foundation grammars and syntaxes - RDF, TURTLE and JSON-LD

Discovery language has its own grammars built on the foundations of the W3C RDF grammars:

  • A terse abbreviated language, TURTLE
  • SPARQL for query
  • JSON-LD representation, which can be used by systems that prefer JSON, wish to use standard approaches, and are able to resolve identifiers via the JSON-LD context structure.


Identifiers, aliasing prefixes and context

Concepts are identified and referenced by the use of Internationalized Resource Identifiers (IRIs).

Identifiers are universal and presented in one of the following forms:

  1. Full IRI (Internationalized Resource Identifier), which is the fully resolved identifier enclosed in <>
  2. Abbreviated IRI: a prefix followed by a ":" followed by the local name, which is resolved to a full IRI
  3. Aliases. The core language tokens (that are themselves concepts) have aliases for ease of use. For example rdfs:subClassOf is aliased to subClassOf.

There is, of course, nothing to stop applications using their own aliases; when JSON-LD is used, @context may be used to enable them.
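
As an illustration of how the three identifier forms relate, the sketch below resolves each of them to a full IRI. The `PREFIXES` and `ALIASES` tables and the `to_full_iri` function are assumptions made for this example only. The two namespace IRIs are the standard W3C ones for RDFS and OWL.

```python
# Prefix table: the standard W3C namespaces for RDFS and OWL.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}

# Alias table (hypothetical): bare tokens mapped to abbreviated IRIs.
ALIASES = {"subClassOf": "rdfs:subClassOf"}


def to_full_iri(identifier: str) -> str:
    """Resolve a full IRI in <>, an abbreviated IRI, or an alias to a full IRI."""
    # Form 1: a full IRI enclosed in angle brackets.
    if identifier.startswith("<") and identifier.endswith(">"):
        return identifier[1:-1]
    # Form 3: an alias is first swapped for its abbreviated IRI.
    identifier = ALIASES.get(identifier, identifier)
    # Form 2: prefix, ":", local name.
    prefix, separator, local_name = identifier.partition(":")
    if not separator or prefix not in PREFIXES:
        raise ValueError(f"cannot resolve identifier: {identifier!r}")
    return PREFIXES[prefix] + local_name


print(to_full_iri("subClassOf"))
print(to_full_iri("rdfs:label"))
```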

Data is considered to be linked across the world, which means that IRIs are the main identifiers. However, IRIs can be unwieldy to use and some of the languages such as GraphQL do not use them. Furthermore, when used in JSON (the main exchange syntax via APIs) they can cause significant bloat. Also, identifiers such as codes or terms have often been created for local use in local single systems and in isolation are ambiguous.

To create linked data from local identifiers or vocabulary, the concept of Context is applied. The main forms of context in use are:

  1. PREFIX declaration for IRIs, which enables the use of abbreviated IRIs. This approach is used in OWL, RDF TURTLE, SHACL and Discovery itself.
  2. VOCABULARY CONTEXT declaration for both IRIs and other tokens. This approach is used in JSON-LD, which converts local JSON properties and objects into linked data identifiers via the @context keyword. This enables applications that know their context to use simple identifiers such as aliases.
  3. MAPPING CONTEXT definitions for system level vocabularies. This provides sufficient context to uniquely identify a local code or term by including details such as the health care provider, the system and the table within a system. In essence, this is a specialised class with the various property values making up the context.
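
To make the mapping context idea concrete, the sketch below shows why a local code alone is ambiguous and how adding the provider, system and table makes it unique. The `MappingContext` class, the `resolve` function and all of the values are hypothetical and chosen only for illustration. They do not reflect the actual IM classes or any real system's codes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MappingContext:
    """Hypothetical context: enough detail to disambiguate a local code."""
    provider: str
    system: str
    table: str


# Illustrative map: the same local code "123" means different things
# depending on where it was recorded (all values are made up).
CONCEPT_MAP = {
    (MappingContext("Hospital A", "System X", "event"), "123"): "ex:TestRequested",
    (MappingContext("Practice B", "System Y", "observation"), "123"): "ex:PositiveResult",
}


def resolve(context: MappingContext, local_code: str) -> Optional[str]:
    """Return the concept for a local code in its context, or None if unmapped."""
    return CONCEPT_MAP.get((context, local_code))


hospital = MappingContext("Hospital A", "System X", "event")
practice = MappingContext("Practice B", "System Y", "observation")
print(resolve(hospital, "123"))
print(resolve(practice, "123"))
```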

OWL2 and RDFS

For the purposes of authoring and reasoning, the semantic ontology axiom and class expression vocabulary uses the tokens and structure from the OWL2 profile OWL EL, which itself is a sublanguage of the OWL2 language.

In addition to the open world assumption of OWL, the RDFS constructs of domain and range are used in a closed world manner.
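
Under the open world assumption, a domain or range statement would lead a reasoner to infer a type for the subject or object. Used in a closed world manner, the same statement acts as a constraint that data can fail. The sketch below illustrates the closed world reading only. The property, class and instance names are illustrative assumptions, as is the `violations` function.

```python
# Illustrative domain and range declarations (assumed names).
DOMAIN = {"hasCausativeAgent": "AllergicReaction"}
RANGE = {"hasCausativeAgent": "Substance"}

# Explicitly stated types; in a closed world, anything not stated is not true.
TYPES = {
    "reaction1": {"AllergicReaction"},
    "atenolol": {"Substance"},
    "patient1": {"Patient"},
}


def violations(triples):
    """List (subject, predicate, object, kind) for each domain or range failure."""
    problems = []
    for subject, predicate, obj in triples:
        if predicate in DOMAIN and DOMAIN[predicate] not in TYPES.get(subject, set()):
            problems.append((subject, predicate, obj, "domain"))
        if predicate in RANGE and RANGE[predicate] not in TYPES.get(obj, set()):
            problems.append((subject, predicate, obj, "range"))
    return problems


print(violations([("reaction1", "hasCausativeAgent", "atenolol")]))
print(violations([("patient1", "hasCausativeAgent", "atenolol")]))
```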

Within an information model instance itself the data relationships are held in their post-inferred closed form, i.e. inferred properties and relationships are explicitly stated, using a normalisation process to eliminate duplications from super types. In other words, whereas an ontology may be authored using the open world assumption, classifications and inheritance are resolved prior to population of the live IM. This is the same approach as followed by Snomed-CT, whereby the inferred relationships, containing the inherited properties and the "isa" relationship, are included explicitly.
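
A rough sketch of what resolving classifications and inheritance ahead of time could look like is given below. It computes every ancestor of a concept and collects inherited properties into a set, so a property stated at several levels appears only once. The data and the function names are assumptions for illustration, and the actual normalisation process used by the IM is not described here.

```python
# Stated (authored) hierarchy and properties; illustrative data only.
PARENTS = {
    "Chest pain": {"Pain of truncal structure"},
    "Pain of truncal structure": {"Pain"},
}
STATED = {
    "Pain": {("ex:hasSeverityScale", "ex:PainScale")},
    "Chest pain": {("has site", "Thoracic structure")},
}


def ancestors(concept: str) -> set:
    """All direct and indirect super types of a concept."""
    seen = set()
    stack = list(PARENTS.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def closed_form(concept: str) -> dict:
    """Explicit 'isa' closure plus own and inherited properties, without duplicates."""
    supers = ancestors(concept)
    properties = set(STATED.get(concept, ()))
    for parent in supers:
        properties |= STATED.get(parent, set())
    return {"isA": supers, "properties": properties}


print(closed_form("Chest pain"))
```

Once the closed form has been stored, a query for everything that "is a" Pain is a simple lookup rather than an inference at query time.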

In the live IM many of these OWL constructs are replaced with the RDFS standard terms and simplified. For example, OWL existential quantifications are mapped to "role groups" in line with Snomed-CT.

Use of Annotation properties

Annotation properties are the properties that provide information beyond that needed for reasoning.  They form no part in the ontological reasoning, but without them, the information model would be impossible for most people to understand. 

Typical annotation properties are names and descriptions.

{| class="wikitable"
!OWL construct
!Usage example
!IM live conversion
|-
|Class
|An entity that is a class concept, e.g. a Snomed-CT concept or a general concept
|rdfs:Class
|-
|ObjectProperty
|'hasSubject' (an observation has a subject that is a patient)
|rdf:Property
|-
|DataProperty
|'dateOfBirth' (a patient record has a date of birth attribute)
|owl:DatatypeProperty
|-
|annotationProperty
|'description' (a concept has a description)
|
|-
|SubClassOf
|Patient is a subclass of a Person
|rdfs:subClassOf
|-
|Equivalent To
|Adverse reaction to Atenolol is equivalent to an adverse reaction to a drug AND has causative agent of Atenolol (substance)
|rdfs:subClassOf
|-
|Sub property of
|has responsible practitioner is a subproperty of has responsible agent
|rdfs:subPropertyOf
|-
|Property chain
|'is sibling of' / 'is parent of' / 'has parent' is a sub property chain of 'is first cousin of'
|owl:Property chain
|-
|Existential quantification (ObjectSomeValuesFrom)
|Chest pain and Finding site of - {some} thoracic structure
|im:roleGroup
|-
|Object Intersection
|Chest pain is equivalent to pain of truncal structure AND finding in region of thorax AND finding site of thoracic structure
|rdfs:subClassOf + role groups
|-
|DataType definition
|Date time is a restriction on a string with a regex that allows approximate dates
|
|-
|Property domain
|A property domain of has causative agent is allergic reaction
|rdfs:domain
|-
|Property range
|A property range of has causative agent is a substance
|rdfs:range
|}

{| class="wikitable"
!Annotation
!Meaning
|-
|rdfs:label
|The name or term for an entity
|-
|rdfs:comment
|The description of an entity
|}


SHACL shapes - data model

As with the semantic ontology, the shapes constraint language borrows its constructs from a W3C standard, in this case SHACL, which can be represented in any of the RDF supporting syntaxes such as TURTLE or JSON-LD.

Example

SHACL for part of the Encounter record type data model. Note that it is both a class and a shape, so it is classified as a subclass of an event, which means it inherits the properties of an event (such as effective date). However, the super class "has concept" property has its range constrained to a "London extension" class, which is the class of encounter types such as GP consultation.

im:Encounter
 a sh:NodeShape , owl:Class;
     rdfs:label "Encounter (record type)" ;
     im:isA im:Event ;
     im:status im:Active;
     rdfs:subClassOf im:PatientEvent;
     
     rdfs:comment "An interaction between a patient (or on behalf of the patient) and a health professional or health provider. It includes consultations as well as care processes such as admission, discharges. It also includes the noting of a filing of a document or report.";
     
     sh:property 
          [sh:path im:additionalPractitioners;
           sh:class im:PractitionerInRole] , 
          [sh:path im:completionStatus;
           sh:class im:894281000252100] , 
          [sh:path im:duration;
           sh:minCount "1"^^xsd:integer;
           sh:class im:894281000252100] , 
          [sh:path im:linkedAppointment;
           sh:class im:Appointment] , 
          [sh:path im:concept;
           sh:maxCount "1"^^xsd:integer;
           sh:minCount "1"^^xsd:integer;
           sh:class im:1741000252102]
         ......
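
To show how the cardinality constraints in a shape such as the one above describe what a record 'should be', the sketch below checks a record against minimum and maximum counts. It is a simplified illustration and not a SHACL engine: `ENCOUNTER_SHAPE` is a hand-written subset of the shape above, and the `validate` function and the sample record values are assumptions.

```python
# Hand-written subset of the shape above: property path -> cardinality limits.
ENCOUNTER_SHAPE = {
    "im:duration": {"minCount": 1},
    "im:concept": {"minCount": 1, "maxCount": 1},
    "im:linkedAppointment": {},
}


def validate(record: dict, shape: dict) -> list:
    """Return one message per cardinality violation (empty list when valid)."""
    errors = []
    for path, limits in shape.items():
        count = len(record.get(path, []))
        minimum = limits.get("minCount", 0)
        if count < minimum:
            errors.append(f"{path}: expected at least {minimum} value(s), found {count}")
        if "maxCount" in limits and count > limits["maxCount"]:
            maximum = limits["maxCount"]
            errors.append(f"{path}: expected at most {maximum} value(s), found {count}")
    return errors


# Sample records with made-up values.
valid_record = {"im:duration": [15], "im:concept": ["ex:GPConsultation"]}
missing_concept = {"im:duration": [15]}

print(validate(valid_record, ENCOUNTER_SHAPE))
print(validate(missing_concept, ENCOUNTER_SHAPE))
```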

Query

As the IM itself is held as RDF quads (triples plus graphs), the IM is generally queried using SPARQL for graph query and Lucene query (OpenSearch) for text query.

However, in order to support real world queries, which are usually articulated in plain language, the IM employs a pragmatic Query DSL to make the query of the IM, or any health record store, easier to construct.

The query DSL can be seen as a "bridge" between the plain language of a user interface and the lower level queries in SQL or SPARQL that result from it. It is designed to hide much of the complexity involved in the underlying query constructs, which for health record query can result in hundreds of lines of obscure advanced SQL. It is designed to be easy for developers to use, and can be understood by informaticians by direct visualisation.

The DSL itself, in JSON form, can be submitted via the APIs to the IM service or to a data service that supports a model compliant data store.

As well as the DSL, the IM provides reference software for SQL and SPARQL conversion.
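
The "bridge" role can be pictured with a toy converter that turns a small JSON-like structure into SPARQL text. The query shape used here (`subtypeOf` and `where` keys) is invented purely for illustration and is not the IM Query schema. The `to_sparql` function is likewise hypothetical and is not the IM reference software, and the example.org IRIs are placeholders.

```python
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def to_sparql(query: dict) -> str:
    """Convert a toy query (invented shape) into a SPARQL SELECT string."""
    lines = ["SELECT ?entity WHERE {"]
    # Zero-or-more subClassOf steps, written as a SPARQL 1.1 property path.
    lines.append(f"  ?entity <{RDFS_SUBCLASS_OF}>* <{query['subtypeOf']}> .")
    # Each extra condition becomes one triple pattern.
    for property_iri, value_iri in query.get("where", {}).items():
        lines.append(f"  ?entity <{property_iri}> <{value_iri}> .")
    lines.append("}")
    return "\n".join(lines)


# Placeholder IRIs: "subtypes of NSAID whose dose form is oral".
oral_nsaids = {
    "subtypeOf": "http://example.org/NSAID",
    "where": {"http://example.org/hasDoseForm": "http://example.org/Oral"},
}
print(to_sparql(oral_nsaids))
```

Even in this toy form, the structured query stays short and readable while the generated text carries the syntax of the target language.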

The query DSL is specified more fully in the article on information model query.