Health Information modelling language - overview: Difference between revisions

From Endeavour Knowledge Base
(42 intermediate revisions by the same user not shown)
This article describes the languages used for creating and maintaining the information models and ontologies used in health records, as well as the means by which health record queries can be defined in a system independent manner.
N.B. Not to be confused with the [[Information model meta model|Information model meta model]], which specifies the classes that hold the information model data; those classes are described using the languages defined below.


The modelling language is an amalgam of, and small extension to, semantic web languages.
This article describes the languages used in the information model meta model: in other words, the underlying grammar and syntax used as the building blocks for the classes that make up the model. Instances of those classes are objects that conform to the class properties.
 
The language includes description logic, shape constraints, expression constraints, and a pragmatic approach to modelling query of real data.


Details on the W3C standard languages that make up the grammar are described below.


It should be noted that these are modelling languages, not the physical data schema or the actual query language. Physical schemas and queries are defined in the languages commensurate with the technology (e.g. SQL).
In addition,


The main purpose of a modelling language is to exchange data and information about [[Discovery health information model|information models]] in a way that machines can understand. It is not expected that clinicians would use the languages directly. The use of standard languages ensures that the models are interoperable across many domains including non health care domains.
If a system can consume RDF in its two main syntaxes (turtle and JSON-LD) then the model can be easily exchanged.


The languages cover the following areas:
The main advantage of RDF and the W3C standards is that types and properties are given internationally unique identifiers which are both human-readable and resolvable via World Wide Web protocols.


# An ontology, which is a vocabulary and definitions of the concepts used in healthcare, or more simply put, a vocabulary of health. The ontology is made up of the world's leading ontology Snomed-CT, with a London extension and supplemented with additional concepts for data modelling.
Thus, in the information model, all classes,  properties and value types (subjects and predicates and objects) are IRIs which are defined by ontological techniques.
# A data model, which is a set of classes and properties, using the vocabulary, that represent the data and relationships as published by live systems that have published data to a data service that uses these models. Note that this data model is NOT a standard model but a collated set of entities and relationships bound to the concepts based on real data, that are mapped to a single model.
# A library of business specific concept and value sets, which are expression constraints on the ontology for the purpose of query
# A catalogue of reference data such as geographical areas, organisations and people derived and updated from public resources.
# A library of Queries for querying and extracting instance data from reference data or health records.
# A set of maps creating mappings between published concepts and the core ontology as well as structural mappings between submitted data and the data model.


== Contributory languages ==
Health data can be conceptualised as a graph, and thus the model is a graph model.


As the information model is a graph, and both classes and properties are uniquely identified, [[wikipedia:Resource_Description_Framework|RDF]] is the language used. As the technical community uses JSON as the mainstream syntax for exchanging objects, the preferred syntax for the model classes and properties is [[wikipedia:JSON-LD|JSON-LD]], with instances in plain [[wikipedia:JSON|JSON]].
RDF itself has limited grammar, so the modelling language uses the mainstream semantic web grammars and vocabularies, these being RDFS, OWL and SHACL. Additional vocabularies are added to the IM to accommodate shortfalls in these vocabularies.
In addition, the IM accommodates some languages required to use the main health ontology, i.e. Expression Constraint Language (ECL) and Snomed compositional grammar. Within the IM, ECL is modelled as a query and Snomed-CT compositional grammar is modelled as a Concept class.
Finally, as a means of bridging the gap between user visualisation of query definitions and the underlying query languages such as SPARQL and SQL, the IM uses a set of classes to model query definitions, using a form that maps directly to SPARQL, SQL and GraphQL.
When exchanging models using the language grammar, both JSON-LD and Turtle are supported, as well as the more specialised syntaxes such as OWL functional syntax or Expression Constraint Language.


The modelling language is an amalgam of the following languages:


* [https://www.w3.org/TR/REC-rdf-syntax/ RDF.] An information model can be modelled as a graph, i.e. a set of nodes and edges (nodes and relationships, nodes and properties). Likewise, health data can be modelled as a graph conforming to the information model graph. RDF forms the statements describing the data. RDF in itself holds no semantics whatsoever, i.e. it is not practical to infer, validate or query based purely on an RDF structure. To use RDF it is necessary to provide semantic definitions for certain predicates and adopt certain conventions. In providing those semantic definitions, the predicates themselves can then be used to semantically define many other things. RDF can be represented using either TURTLE syntax or JSON-LD.
* [https://www.w3.org/TR/rdf-schema/ RDFS]. This is the first of the semantic languages. It is used for some of the ontology axioms such as subclasses, domains and ranges, as well as the standard annotation properties such as 'label'.
* [https://www.w3.org/TR/owl2-primer/ OWL2 DL.]  For the ontology. This brings with it more sophisticated description logic such as equivalent classes and existential quantifications and is used in the ontology and for defining things when an open world assumption is required.
* [https://www.w3.org/TR/shacl/ SHACL]. For the data models. Used for everything that defines the shape of data or logical entities and attributes. Although SHACL is designed for validation of RDF, because it describes what things 'should be' it can also be used as a data modelling language.
* [https://www.w3.org/TR/sparql11-query/ SPARQL] Used as the logical means of querying model conformant data (not to be confused with the actual query language used which may be SQL)
* RML, an extension of R2RML, used to map between RDF and other formats such as RDBMS, JSON or CSV.
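
As a rough illustration of the graph view described in the list above, the following Python sketch holds a few statements as subject, predicate, object triples and matches them against a pattern. The concept identifiers come from the chest pain example in this article; the <code>match</code> helper and the in-memory set are assumptions made for illustration and are not part of the IM.

<syntaxhighlight lang="python">
# Illustrative only: a tiny in-memory set of triples, not an RDF library.
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

graph: Set[Triple] = {
    ("sn:29857009", "rdfs:label", "Chest pain (finding)"),
    ("sn:29857009", "rdfs:subClassOf", "sn:301366005"),
    ("sn:29857009", "rdfs:subClassOf", "sn:298705000"),
}


def match(
    triples: Set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Direct super types of chest pain, read straight from the graph.
parents = [o for _, _, o in match(graph, s="sn:29857009", p="rdfs:subClassOf")]
</syntaxhighlight>

Nothing in this structure says what <code>rdfs:subClassOf</code> means; that meaning has to come from RDFS, OWL or SHACL, which is the point made in the RDF entry above.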


'''Example (OWL2)'''
*[https://www.w3.org/TR/shacl/ SHACL]. For the data models of types. Used for everything that defines the shape of data or logical entities and attributes. Although SHACL is designed for validation of RDF, because it describes what things 'should be' it can also be used as a data modelling language.


Consider a definition of a grandfather. In this first example, a grandfather is equivalent to a person who is male ''and'' has children who are people that must themselves have children.
*[https://www.w3.org/TR/owl2-primer/ OWL2 DL.] This is supported in the authoring phase, but is simplified within the model. It brings with it more sophisticated description logic such as equivalent classes and existential quantifications, and is used in the ontology and for defining things when an open world assumption is required. OWL has contributed to the design of the IM languages, but it is removed in the run time models, with class expressions being replaced by RDFS subclasses and role groups.
*[https://confluence.ihtsdotools.org/display/DOCECL#:~:text=The%20Expression%20Constraint%20Language%20is,either%20precoordinated%20or%20postcoordinated%20expressions. ECL.] This is a specialised query language created for Snomed-CT, used for simple concepts modelled as subtypes, role groups and roles. It is of great value in defining sets of concepts for the myriad of business purposes used in health.
*[https://confluence.ihtsdotools.org/display/DOCSCG/Compositional+Grammar+-+Specification+and+Guide SCG]. Snomed compositional grammar, created for Snomed-CT, which is a concise syntax for representing simple concepts modelled as subtypes, role groups and roles, and is a way of displaying concept definitions.


using the turtle language
 
 
'''Example  multiple syntaxes and grammars'''
 
Consider a definition of chest pain in several syntaxes. Note that the OWL definition is in a form prior to classification whereas the others use the post classified structure (so called inferred)
<div class="toccolours mw-collapsible mw-collapsed">
Chest pain in Manchester syntax, SCG, ECL, OWL FS, IM Json-LD:
<div class="mw-collapsible-content">
<syntaxhighlight lang="turtle" style="border:3px solid grey">
<syntaxhighlight lang="turtle" style="border:3px solid grey">
:Grandfather
# Definition of Chest pain in owl Manchester Syntax
  owl:equivalentClass [
equivalentTo  sn:298705000 and sn:301366005 and (sn:363698007 sn:51185008)
      owl:intersectionOf
 
              :Person,                           
#In RDF turtle
              [owl:onProperty :hasGender;
sn:29857009
              owl:someValuesFrom :Male],
  rdfs:subClassOf
              [owl:onProperty :hasChild;
        sn:301366005 ,  
              owl:someValuesFrom  [owl:intersectionOf
        sn:298705000;
                                          :Person,               
  im:roleGroup [im:groupNumber "1"^^xsd:integer;
                                        [owl:onProperty :hasChild;
  sn:363698007 sn:51185008];
                                          owl:someValuesFrom :Person] ) ])     
  rdfs:label "Chest pain (finding)" .
.
 
</syntaxhighlight>
 
# In Snomed compositional grammar
=== 298705000 |Finding of region of thorax (finding)| +
     301366005 |Pain of truncal structure (finding)| :
            { 363698007 |Finding site (attribute)| = 51185008 |Thoracic structure (body structure)| }


JSON is a popular syntax currently and thus this is used as an alternative.
# When using ECL to retrieve chest pain
<<298705000 |Finding of region of thorax (finding)| and  
    (<<301366005 |Pain of truncal structure (finding)| :
            { 363698007 |Finding site (attribute)| = 51185008 |Thoracic structure (body structure)| })


JSON represents subjects, predicates and objects as object names and values, with values being either literals or objects.


JSON itself has no inherent mechanism of differentiating between different types of entities and therefore JSON-LD is used. In JSON-LD identifiers resolve initially to @id and the use of @context enables prefixed IRIs and aliases.
# When used in OWL functional syntax
EquivalentClasses(
:29857009 |Chest pain (finding)|
ObjectIntersectionOf(
:22253000 |Pain (finding)|
ObjectSomeValuesFrom(
:609096000 |Role group (attribute)|
ObjectSomeValuesFrom(
:363698007 |Finding site (attribute)|
:51185008 |Thoracic structure (body structure)|
)
)
)
)
# In Json-LD


The above  Grandfather can be represented in JSON-LD (context not shown) as follows:<syntaxhighlight lang="json-ld" style="border:3px solid grey">
{
{"@id" : ":Grandfather",
  "@id" : "sct:29857009",
"owl:equivalentClass" : [
  "rdfs:label" : "Chest pain (finding)",
            {"owl:intersectionOf" :[
  "im:definitionalStatus" : {"@id" : "im:1251000252106","name" : "Concept definition is sufficient (equivalent status)"},
                    { "@id": ":Person"},
  "rdfs:subClassOf" : [ {
                    { "owl:onProperty" : ":hasGender",
    "@id" : "sct:301366005",
                      "owl:someValuesFrom": {"@id":":Male"}},
    "name" : "Pain of truncal structure (finding)"
                   
  }, {
                      { "owl:onProperty" : ":hasChild",
    "@id" : "sct:298705000",
                        "owl:someValuesFrom" : {
    "name" : "Finding of region of thorax (finding)"
                          "owl:intersectionOf": [
  } ],
                                { "@id":"Person"},
  "im:roleGroup" : [ {
                                {"owl:onProperty" : ":hasChild",
    "im:groupNumber" : 1,
                                  "owl:someValuesFrom" : {"@id":":Person"}}]]}}
    "sct:363698007" : [ {
      "@id" : "sct:51185008",
      "name" : "Thoracic structure (body structure)"
    } ]
  } ]
}
</syntaxhighlight>
</syntaxhighlight>
</div>
</div> <div class="mw-collapsible-content">&nbsp;</div>


== Sublanguages and syntaxes ==
== Internal IM languages for IMAPI usage ==
An implementation of the IM as a terminology server or query library exists.


=== Foundation grammars and syntaxes - RDF, TURTLE and JSON-LD ===
This implementation uses the following mainstream languages
Discovery language has its own Grammars built on the foundations of the W3C RDF grammars:
 
* Java, used for the main server-side business logic and for the services behind the REST APIs used to exchange information with the IM server
* JavaScript / TypeScript, used for business logic that provides UI-specific APIs to the web applications


* A terse abbreviated language, TURTLE
*[https://www.w3.org/TR/sparql11-query/ SPARQL]. Used as the logical means of querying model conformant data (not to be confused with the actual query language used, which may be SQL). It is the query language for the IM itself and is mapped from IM Query; health queries would generally use SQL.
*[https://opensearch.org/docs/latest/opensearch/query-dsl/index/ OpenSearch / Elastic.] Used for complex free text queries for finding concepts using the AWS OpenSearch DSL (a derivative of Lucene Query). Note that simple free text Lucene indexing is supported by the IM database engines and is used in combined graph/text queries.
*[[Meta model class specification#Query .2FSet definition|IM Query.]] Not strictly a language but a class definition representing a scheme independent  way of defining sets (query results) including all the main health queries used by clinicians and analysts. 


* SPARQL for query
== Grammars and syntaxes ==


* JSON-LD representation, which can be used by systems that prefer JSON, wish to use standard approaches, and are able to resolve identifiers via the JSON-LD context structure.
=== Foundation syntaxes - RDF, TURTLE and JSON-LD ===
Discovery language has its own Grammars built on the foundations of the W3C RDF grammars:


* A terse abbreviated language, TURTLE


* JSON-LD representation, which can be used by systems that prefer JSON (the majority) and are able to resolve identifiers via the JSON-LD context structure.
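
To make the role of the JSON-LD context more concrete, the following Python sketch expands prefixed identifiers into full IRIs using a prefix map. It is a simplification and not the JSON-LD expansion algorithm; the <code>sct</code> namespace IRI and the helper names <code>expand</code> and <code>expand_node</code> are assumptions for illustration.

<syntaxhighlight lang="python">
from typing import Any, Dict

# Assumed prefix map standing in for an @context; the real IM context may differ.
CONTEXT: Dict[str, str] = {
    "sct": "http://snomed.info/sct#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(term: str, context: Dict[str, str]) -> str:
    """Expand a prefixed name such as 'rdfs:label' into a full IRI."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return term  # already absolute, or not prefixed


def expand_node(node: Dict[str, Any], context: Dict[str, str]) -> Dict[str, Any]:
    """Expand the keys and @id values of a node, recursing into nested nodes."""
    out: Dict[str, Any] = {}
    for key, value in node.items():
        if key == "@id":
            out["@id"] = expand(value, context)
        elif isinstance(value, dict):
            out[expand(key, context)] = expand_node(value, context)
        elif isinstance(value, list):
            out[expand(key, context)] = [
                expand_node(v, context) if isinstance(v, dict) else v for v in value
            ]
        else:
            out[expand(key, context)] = value
    return out


node = {
    "@id": "sct:29857009",
    "rdfs:label": "Chest pain (finding)",
    "rdfs:subClassOf": [{"@id": "sct:301366005"}, {"@id": "sct:298705000"}],
}
expanded = expand_node(node, CONTEXT)
</syntaxhighlight>

A consumer that understands the context can therefore treat the short JSON keys as globally unique identifiers, while a consumer that ignores it still sees ordinary JSON.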


'''Identifiers, aliasing  prefixes and context'''
# MAPPING CONTEXT definitions for system level vocabularies. This provides sufficient context to uniquely identify a local code or term by including details such as the health care provider, the system and the table within a system. In essence a specialised class with the various property values making up the context.


=== Ontology - OWL2 DL ===
=== OWL2 and RDFS ===
 
For the purposes of authoring and reasoning, the semantic ontology axiom and class expression vocabulary uses the tokens and structure from the OWL2 profile [https://www.w3.org/TR/owl2-profiles/#OWL_2_EL OWL EL], which itself is a sublanguage of the [https://www.w3.org/TR/owl2-syntax/ OWL2 language].
 
In addition to the open world assumption of OWL, the RDFS constructs of domain and range (as in OWL DL) are used, but in a closed world manner, as in RDFS.
 
Within an information model instance itself, the data relationships are held in their post-inferred closed form, i.e. inferred properties and relationships are explicitly stated, using a normalisation process to eliminate duplications from super types. In other words, whereas an ontology may be authored using the open world assumption, classifications and inheritance are resolved prior to population of the live IM. This uses the same approach as followed by Snomed-CT, whereby the inferred relationships containing the inherited properties and the "isa" relationships are included explicitly.
 
In the live IM OWL Axioms are replaced with the RDFS standard terms and simplified. For example OWL existential quantifications are mapped to "role groups" in line with Snomed-CT.
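
The mapping of existential quantifications to role groups can be sketched as follows. This is a hypothetical restructuring step only: it assumes classification has already been done elsewhere, and the input shape (named classes as strings, restrictions as <code>("some", property, filler)</code> tuples) and the function name <code>to_live_form</code> are invented for illustration.

<syntaxhighlight lang="python">
from typing import Any, Dict, List, Tuple, Union

# A class expression operand is either a named class IRI or an
# existential restriction written as ("some", property, filler).
Operand = Union[str, Tuple[str, str, str]]


def to_live_form(operands: List[Operand]) -> Dict[str, Any]:
    """Split an intersection into rdfs:subClassOf targets and role groups."""
    supers = [op for op in operands if isinstance(op, str)]
    groups: List[Dict[str, Any]] = []
    for op in operands:
        if isinstance(op, tuple) and op[0] == "some":
            _, prop, filler = op
            # Each restriction becomes its own numbered role group here;
            # real grouping rules may combine several roles in one group.
            groups.append({"im:groupNumber": len(groups) + 1, prop: filler})
    result: Dict[str, Any] = {"rdfs:subClassOf": supers}
    if groups:
        result["im:roleGroup"] = groups
    return result


chest_pain = to_live_form(
    [
        "sn:301366005",  # Pain of truncal structure (finding)
        "sn:298705000",  # Finding of region of thorax (finding)
        ("some", "sn:363698007", "sn:51185008"),  # Finding site = Thoracic structure
    ]
)
</syntaxhighlight>

The result has the same outline as the chest pain entity shown in this article: two super types plus one role group carrying the finding site.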
 
'''Use of Annotation properties'''


For the purposes of reasoning the semantic ontology axiom and class expression vocabulary uses the tokens and structure from the OWL2 profile [https://www.w3.org/TR/owl2-profiles/#OWL_2_EL OWL EL], which itself is a sublanguage of the [https://www.w3.org/TR/owl2-syntax/ OWL2 language]
Annotation properties are the properties that provide information beyond that needed for reasoning. They form no part in the ontological reasoning, but without them, the information model would be impossible for most people to understand.


However, in addition some standard OWL2 DL axioms are used in order to provide a means of specifying additional relationships that are of value when defining relationships. The following table lists the main owl  types used and example for each.  Note that their aliases are used for brevity. Please refer to the OWL2 specification to describe their meanings
Typical annotation properties are names and descriptions.
{| class="wikitable"
|+
!OWL construct
!Usage examples
!IM live conversion
|-
|Class
|An entity that is a class concept e.g. a Snomed-CT concept or a general concept
|rdfs:Class
|-
|ObjectProperty
|'hasSubject' (an observation '''has a subject''' that is a patient)
|rdf:Property
|-
|DataProperty
|'dateOfBirth' (a patient record has a date of birth attribute)
|owl:DatatypeProperty
|-
|AnnotationProperty
|'description' (a concept has a description)
|
|-
|SubClassOf
|Patient is a subclass of a Person
|rdfs:subClassOf
|-
|Equivalent To
|Adverse reaction to Atenolol is equivalent to an adverse reaction to a drug AND has causative agent of Atenolol (substance)
|rdfs:subClassOf
|-
|Disjoint with
|Father is disjoint with Mother
|
|-
|Sub property of
|has responsible practitioner is a subproperty of has responsible agent
|rdfs:subPropertyOf
|-
|Property chain
|'is sibling of' / 'is parent of' / 'has parent' is a sub property chain of 'is first cousin of'
|owl:propertyChainAxiom
|-
|Inverse property
|is subject of is inverse of has subject
|
|-
|Transitive property
|is child of is transitive
|
|-
|Existential quantification (ObjectSomeValuesFrom)
|Chest pain and
Finding site of - {some} thoracic structure
|im:roleGroup
|-
|Object Intersection
|Chest pain is equivalent to pain of truncal structure AND finding in region of thorax AND finding site of thoracic structure
|rdfs:subClassOf + role groups
|-
|Individual
|All chest pain subclasses but not the specific ''instance of acute chest pain''
|
|-
|DataType definition
|Date time is a restriction on a string with a regex that allows approximate dates
|
|-
|Property domain
|A property domain of has causative agent is allergic reaction
|rdfs:domain
|-
|Property range
|A property range of has causative agent is a substance
|rdfs:range
|}
{| class="wikitable"
|+
!Annotation
!Meaning
|-
|rdfs:label
|The name or term for an entity
|-
|rdfs:comment
|The description of an entity
|}
'''Use of Annotation properties for original codes'''


Annotation properties are the properties that provide information beyond that needed for reasoning. They form no part in the ontological reasoning, but without them, the information model would be impossible for most people to understand. Annotation properties can also be used for implementation supporting properties such as release status, version control, authoring dates and times and so on.
=== SHACL shapes ===
SHACL is used as a means of specifying the "data model types" of health record entities and also the IM itself as described directly in the [[Information model meta model#Meta model class specification|meta model article]].


Typical annotation properties are names and descriptions. They are also used as meta data such as a status of a concept or the version of a document.
SHACL is used in its standard form and is not extended.


Many concepts are derived directly from source systems that used them as codes, or even free text.
=== OWL extension : data property expressions ===
Within health care, (and in common parlance), data properties are often used as syntactical short cuts to objects with qualifiers  and a literal value element.  


The concept indicates the source and original code or text (or combination) in the form actually entered into the source system. It should be noted that many systems do not record codes exactly as determined by an official classification, or provide codes via mappings from an internal id. It is the codes or text used from the publisher's perspective that is used as the source.
For example, the data property "Home telephone number" would be expected to simply contain a number. But a home telephone number also has a number of properties by implication, such as the fact that its usage is "home", and has a country and area code.


Thus in many cases, it is convenient to auto generate a code, which is then placed as the value of the “code” property in the concept, together with the scheme. From this, the provenance of the code can be inferred.  
OWL 2 has a known limitation (as described in the OWL specification itself) in respect of data property expressions. OWL2 can only define data property expressions as data property IRIs with annotations.  


Each code must have a scheme. A scheme may be an official scheme, a proprietary scheme, or a local scheme related to a particular sub system.
In many health care standards such as HL7 FHIR, these data properties are object properties, with the objects having the "value" as one of their properties.
 
For example, here are some scheme/ code combinations
{| class="wikitable"
!Scheme
!Original code/text/context
!Concept code / auto code
!Meaning
|-
|Snomed-CT
|47032000
|47032000
|Primary hydrocephaly
|-
|EMIS - Read
|H33-1
|H33-1
|Bronchial asthma
|-
|EMIS – EMIS
|EMISNQCO303
|EMLOC_EMISNQCO303
|Confirmed corona virus infection
|-
|Barts/Cerner
|Event/Order=687309281
|BC_687309281
|Tested for Coronavirus (misuse of code term in context)
|-
|Barts/Cerner
|Event/Order= 687309281/ResultTxt= SARS-CoV-2 RNA DETECTED
|BC_dsdsdsdx7
|Positive coronavirus result
|}
Note that in the last example, the original code is actually text and has been contextualised as being from the Cerner event table, the order field having a value of 687309281 and the result text having a value of "SARS-CoV-2 RNA DETECTED".

For example, in FHIR the patient's home telephone number is carried explicitly as the property contact { property = telecom -> value = { property use = home / property system = coding system / value = the actual number } }, i.e. three levels of nesting.


=== Data model - SHACL shapes ===
Whilst explicit modelling is vital for information exchanged between systems with different data models, queries would underperform if data were stored in this way, so actual systems usually store the home telephone number in, perhaps, a field "home telephone" in the patient table, or as a simple triple.
As with the semantic ontology, the shapes constraint language borrows its constructs from the W3C standard SHACL, which can also be represented in any of the RDF supporting syntaxes such as TURTLE or JSON-LD.


'''Example'''
To bridge the gap between a complex object definition and a simple data property, the information model supports data property expressions (but without introducing a new language construct) as follows:


SHACL for part of the Encounter record type data model. Note that it is both a class and a shape: it is classified as a subclass of an event, which means it inherits the properties of an event (such as effective date), but the superclass "has concept" property has its range constrained to a London extension class, which is the class of encounter types such as GP consultation.<syntaxhighlight lang="turtle">
# Simple data property against the class e.g. a "contact"
im:Encounter
# Patient's home telephone number modelled as a ''sub property'' "homeTelephoneNumber", which is a sub property of "telephone number", which is itself a sub property of "contact".
  a sh:NodeShape , owl:Class;
# A standard RDFS property of the homeTelephone property entity, "isDefinedBy", which points to a class expression that defines a home telephone number (itself a subclass of a class expression TelephoneNumber), thus allowing all property values to be "implicit but defined" as part of the ontology.
    rdfs:label "Encounter (record type)" ;
    im:isA im:Event ;
    im:status im:Active;
    rdfs:subClassOf im:PatientEvent;
   
    rdfs:comment "An interaction between a patient (or on behalf of the patient) and a health professional or health provider. It includes consultations as well as care processes such as admission, discharges. It also includes the noting of a filing of a document or report.";
   
    sh:property
          [sh:path im:additionalPractitioners;
          sh:class im:PractitionerInRole] ,
          [sh:path im:completionStatus;
          sh:class im:894281000252100] ,
          [sh:path im:duration;
          sh:minCount "1"^^xsd:integer;
          sh:class im:894281000252100] ,
          [sh:path im:linkedAppointment;
          sh:class im:Appointment] ,
          [sh:path im:concept;
          sh:maxCount "1"^^xsd:integer;
          sh:minCount "1"^^xsd:integer;
          sh:class im:1741000252102]
        ......
       
</syntaxhighlight>


=== Concept Set definitions - RDF/ ECL ===
By this technique, subsumption queries that look for home contacts or home telephone numbers, or that find numbers with US country codes, will find the relevant field and the relevant sub pattern of a data property.
In line with expression constraint language used to define sets of concepts, defining a set is a query over the ontology resulting in a set of concepts to use in a subsequent query.


The information model uses RDF predicates to hold the set's meta data (e.g. the name of the set, the fact that it is a set etc.) and uses SHACL predicates for modelling the boolean logic.
Implementations would still need to parse numbers to properties if they stored numbers as simple numbers but these would be part of a data model map against the IM models definition.
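
As a minimal sketch of that parsing step, the following Python fragment turns a simple stored number into the explicit properties implied by the data property. Everything here is an assumption for illustration: the <code>IMPLIED</code> table, the property names, and the requirement that numbers arrive in a space-separated international form such as "+44 20 79460000".

<syntaxhighlight lang="python">
import re
from typing import Dict, Optional

# Hypothetical implied qualifiers per data property; not taken from the IM.
IMPLIED: Dict[str, Dict[str, str]] = {
    "homeTelephoneNumber": {"use": "home", "system": "phone"},
}

# Assumes "+<country> <area> <local>"; real numbers need a proper parser.
PATTERN = re.compile(r"^\+(\d{1,3}) (\d+) (.+)$")


def expand_telephone(prop: str, value: str) -> Optional[Dict[str, str]]:
    """Return explicit properties for a stored number, or None if unparseable."""
    found = PATTERN.match(value)
    if prop not in IMPLIED or found is None:
        return None
    country, area, local = found.groups()
    return {
        **IMPLIED[prop],
        "countryCode": country,
        "areaCode": area,
        "number": local,
    }


result = expand_telephone("homeTelephoneNumber", "+44 20 79460000")
</syntaxhighlight>

Keeping the implied qualifiers in a lookup keyed by the data property mirrors the idea above: the stored value stays simple, and the richer object is derived on demand.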


'''Example - complex set'''
== Information model meta classes ==
See main article [[Information model meta model|Information model meta classes]]


Let's say a commissioner needs to know which patients have had Covid vaccines.
Using the above languages this defines the classes used to model all health data.


Covid vaccines are recorded either as immunisation records, or medication records, or both.  To query the medication records, a set of vaccine medication concepts are searched for, these being stored in medication order record entries. Covid vaccines change every few weeks as new brands or strengths are released.


A definition of a covid vaccine is helpful, thus a concept set is defined.
<br /><syntaxhighlight lang="turtle">
<<39330711000001103                            # is a Covid vaccine
OR                                             # or (
(<<10363601000001109 :                         # is a UK product
                                               # and
    <<10362601000001103 = 10362601000001103)   # has VMP Covid vaccine)


</syntaxhighlight><br />
<br />
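
The <code><<</code> (descendant or self) operator used in the expression constraints in this article can be sketched in Python as a walk over the subclass hierarchy. The hierarchy below contains only the chest pain relationships shown earlier, and the function name <code>descendants_or_self</code> is an assumption for illustration; a real terminology server would use indexed transitive closures instead.

<syntaxhighlight lang="python">
from collections import defaultdict
from typing import Dict, Set

# child -> direct super types, taken from the chest pain example.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "sn:29857009": {"sn:301366005", "sn:298705000"},
}


def descendants_or_self(concept: str, subclass_of: Dict[str, Set[str]]) -> Set[str]:
    """Return the concept plus everything that is transitively a subclass of it."""
    children: Dict[str, Set[str]] = defaultdict(set)
    for child, parents in subclass_of.items():
        for parent in parents:
            children[parent].add(child)
    result: Set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(children[current])
    return result


# <<298705000 AND <<301366005 evaluated as a set intersection.
thorax = descendants_or_self("sn:298705000", SUBCLASS_OF)
truncal_pain = descendants_or_self("sn:301366005", SUBCLASS_OF)
both = thorax & truncal_pain
</syntaxhighlight>

Because the live IM holds the inferred hierarchy explicitly, a set definition of this kind reduces to traversals and set operations, with no reasoner needed at query time.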

Latest revision as of 14:53, 5 January 2023

N.B. Not to be confused with the Information model meta model, which specifies the classes that hold the information model data; those classes are described using the languages defined below.

This article describes the languages used in the information model meta model. In other words, the underlying grammar and syntax used as the building bricks for the classes that make up the model, instances of those classes being objects that conform to the class properties.

Details on the W3C standard languages that make up the grammar are described below.

In addition,

If a system can consume RDF in its two main syntaxes (Turtle and JSON-LD), then the model can be easily exchanged.

The main advantage of RDF and the W3C standards is that types and properties are given internationally unique identifiers which are both human-readable and resolvable via the World Wide Web protocols.

Thus, in the information model, all classes, properties and value types (subjects, predicates and objects) are IRIs which are defined by ontological techniques.

Contributory languages

Health data can be conceptualised as a graph, and thus the model is a graph model.

As the information model is a graph, and both classes and properties are uniquely identified, RDF is the language used. As the technical community uses JSON as the mainstream syntax for exchanging objects, the preferred syntax for the model classes and properties is JSON-LD, with instances in plain JSON.

RDF itself has limited grammar, so the modelling language uses the mainstream semantic web grammars and vocabularies, these being RDFS, OWL and SHACL. Additional vocabularies are added to the IM to accommodate the shortfalls in these vocabularies.

In addition, the IM accommodates some languages required to use the main health ontology, i.e. Expression Constraint Language (ECL) and Snomed compositional grammar (SCG). Within the IM, ECL is modelled as a query and Snomed-CT compositional grammar is modelled as a Concept class.

Finally, as a means of bridging the gap between user visualisation of query definitions and the underlying query languages such as SPARQL and SQL, the IM uses a set of classes to model query definitions, using a form that maps directly to SPARQL, SQL and GraphQL.

When exchanging models using the language grammar, both JSON-LD and Turtle are supported, as well as the more specialised syntaxes such as OWL functional syntax or Expression Constraint Language.

The modelling language is an amalgam of the following languages:

  • RDF. An information model can be modelled as a graph, i.e. a set of nodes and edges (nodes and relationships, nodes and properties). Likewise, health data can be modelled as a graph conforming to the information model graph. RDF forms the statements describing the data. RDF in itself holds no semantics whatsoever, i.e. it is not practical to infer, validate or query based purely on an RDF structure. To use RDF it is necessary to provide semantic definitions for certain predicates and adopt certain conventions. Once those semantic definitions are provided, the predicates themselves can be used to semantically define many other things. RDF can be represented using either Turtle syntax or JSON-LD.
  • RDFS. This is the first of the semantic languages. It is used for some of the ontology axioms such as subclasses, domains and ranges, as well as the standard annotation properties such as 'label'.
  • SHACL. For the data models of types. Used for everything that defines the shape of data or logical entities and attributes. Although SHACL is designed for validation of RDF, because SHACL describes what things 'should be' it can also be used as a data modelling language.
  • OWL2 DL. This is supported in the authoring phase, but is simplified within the model. It brings more sophisticated description logic such as equivalent classes and existential quantifications, and is used in the ontology and for defining things when an open world assumption is required. OWL has contributed to the design of the IM languages but is removed in the run time models, with class expressions being replaced by RDFS subclass and role groups.
  • ECL. This is a specialised query language created for Snomed-CT, used for simple concepts modelled as subtypes, role groups and roles. It is of great value in defining sets of concepts for the myriad business purposes found in health.
  • SCG. Snomed compositional grammar, created for Snomed-CT, which is a concise syntax for representing simple concepts modelled as subtypes, role groups and roles, and is a way of displaying concept definitions.


Example multiple syntaxes and grammars

Consider a definition of chest pain in several syntaxes. Note that the OWL definition is in a form prior to classification, whereas the others use the post-classified (so-called inferred) structure.

Chest pain in Manchester syntax, RDF Turtle, SCG, ECL, OWL FS, IM JSON-LD:

# Definition of Chest pain in OWL Manchester syntax
 EquivalentTo: sn:298705000 and sn:301366005 and (sn:363698007 some sn:51185008)

# In RDF Turtle
sn:29857009
   rdfs:subClassOf 
         sn:301366005 , 
         sn:298705000;
   im:roleGroup [im:groupNumber "1"^^xsd:integer;
   sn:363698007 sn:51185008];
   rdfs:label "Chest pain (finding)" .


# In Snomed compositional grammar
=== 298705000 |Finding of region of thorax (finding)| + 
    301366005 |Pain of truncal structure (finding)| :
            { 363698007 |Finding site (attribute)| = 51185008 |Thoracic structure (body structure)| }

# When using ECL to retrieve chest pain
<<298705000 |Finding of region of thorax (finding)| and 
    (<<301366005 |Pain of truncal structure (finding)| :
            { 363698007 |Finding site (attribute)| = 51185008 |Thoracic structure (body structure)| })


# When used in OWL functional syntax
EquivalentClasses(
	:29857009 |Chest pain (finding)|
	ObjectIntersectionOf(
		:22253000 |Pain (finding)|
		ObjectSomeValuesFrom(
			:609096000 |Role group (attribute)|
			ObjectSomeValuesFrom(
				:363698007 |Finding site (attribute)|
				:51185008 |Thoracic structure (body structure)|
			)
		)
	)
)
# In JSON-LD

{
  "@id" : "sct:29857009",
  "rdfs:label" : "Chest pain (finding)",
  "im:definitionalStatus" : {"@id" : "im:1251000252106","name" : "Concept definition is sufficient (equivalent status)"},
  "rdfs:subClassOf" : [ {
    "@id" : "sct:301366005",
    "name" : "Pain of truncal structure (finding)"
  }, {
    "@id" : "sct:298705000",
    "name" : "Finding of region of thorax (finding)"
  } ],
  "im:roleGroup" : [ {
    "im:groupNumber" : 1,
    "sct:363698007" : [ {
      "@id" : "sct:51185008",
      "name" : "Thoracic structure (body structure)"
    } ]
  } ]
}
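
The ECL retrieval above can be read as a subsumption test followed by a role group test. The following is a minimal Python sketch of that reading, assuming the inferred structure has already been loaded into two small in-memory tables. The table names and function names are hypothetical and are not part of the IM API; only the concept identifiers are taken from the example above.

```python
# Inferred (post-classification) structure for the chest pain example.
# SUBCLASS_OF and ROLE_GROUPS are illustrative in-memory tables, not IM classes.
SUBCLASS_OF = {
    "sct:29857009": {"sct:301366005", "sct:298705000"},
}
ROLE_GROUPS = {
    "sct:29857009": [{"sct:363698007": "sct:51185008"}],
}


def descendants_or_self(concept):
    """Evaluate the ECL '<<' operator: the concept plus all of its subtypes."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parents in SUBCLASS_OF.items():
            if child not in found and parents & found:
                found.add(child)
                changed = True
    return found


def has_grouped_role(concept, attribute, value):
    """True if any role group of the concept holds attribute = value."""
    return any(group.get(attribute) == value for group in ROLE_GROUPS.get(concept, []))


def chest_pain_query():
    """<<298705000 and (<<301366005 : { 363698007 = 51185008 })"""
    candidates = descendants_or_self("sct:298705000") & descendants_or_self("sct:301366005")
    return {c for c in candidates if has_grouped_role(c, "sct:363698007", "sct:51185008")}
```

Because the run time model holds the inferred form, the sketch needs no reasoner: subsumption is a walk over rdfs:subClassOf and the refinement is a lookup in the role groups.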
 

Internal IM languages for IMAPI usage

An implementation of the IM as a terminology server or query library exists.

This implementation uses the following mainstream languages:

  • Java. Used for the main server-side business logic and to service the REST APIs used to exchange information with the IM server.
  • JavaScript / TypeScript. Used for business logic that provides UI-specific APIs to the web applications.
  • SPARQL. Used as the logical means of querying model-conformant data (not to be confused with the actual query language used, which may be SQL). It is the query language for the IM itself and is mapped from IM Query. Health queries would generally use SQL.
  • OpenSearch / Elastic. Used for complex free text query for finding concepts using the AWS OpenSearch DSL (a derivative of Lucene Query). Note that simple free text Lucene indexing is supported by the IM database engines and is used in combined graph/text query.
  • IM Query. Not strictly a language but a class definition representing a scheme independent way of defining sets (query results), including all the main health queries used by clinicians and analysts.

Grammars and syntaxes

Foundation syntaxes - RDF, TURTLE and JSON-LD

Discovery language has its own grammars built on the foundations of the W3C RDF grammars:

  • A terse abbreviated language, Turtle.
  • A JSON-LD representation, which can be used by systems that prefer JSON (the majority) and are able to resolve identifiers via the JSON-LD context structure.

Identifiers, aliasing prefixes and context

Concepts are identified and referenced by the use of Internationalized Resource Identifiers (IRIs).

Identifiers are universal and presented in one of the following forms:

  1. Full IRI, which is the fully resolved identifier enclosed in <>.
  2. Abbreviated IRI, a prefix followed by a ":" followed by the local name, which is resolved to a full IRI.
  3. Aliases. The core language tokens (that are themselves concepts) have aliases for ease of use. For example, rdfs:subClassOf is aliased to subClassOf.

There is of course nothing to stop applications using their own aliases. When used with JSON-LD, @context may be used to enable the use of aliases.
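
The three identifier forms can be resolved to a full IRI with a simple lookup. The following is a minimal Python sketch under stated assumptions: the prefix table, the alias table and the `expand` function are hypothetical, and the namespace IRIs given for `im` and `sct` are placeholders chosen for illustration rather than values taken from this article. Only the `rdfs` namespace is the standard W3C one.

```python
# Illustrative prefix and alias tables. The im: and sct: namespaces are assumptions.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "im": "http://example.org/im#",
    "sct": "http://example.org/sct#",
}
ALIASES = {
    "subClassOf": "rdfs:subClassOf",
    "label": "rdfs:label",
}


def expand(identifier: str) -> str:
    """Resolve a full IRI, an abbreviated IRI or an alias to a full IRI."""
    # Form 1: full IRI enclosed in angle brackets.
    if identifier.startswith("<") and identifier.endswith(">"):
        return identifier[1:-1]
    # Form 3: alias, first mapped to its abbreviated IRI.
    identifier = ALIASES.get(identifier, identifier)
    # Form 2: abbreviated IRI, prefix ":" local name.
    prefix, separator, local_name = identifier.partition(":")
    if separator and prefix in PREFIXES:
        return PREFIXES[prefix] + local_name
    raise ValueError(f"cannot resolve identifier: {identifier}")
```

For instance, `expand("subClassOf")` and `expand("rdfs:subClassOf")` both resolve to the same full IRI, which is the behaviour a JSON-LD @context provides declaratively.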

Data is considered to be linked across the world, which means that IRIs are the main identifiers. However, IRIs can be unwieldy to use and some languages, such as GraphQL, do not use them. Furthermore, when used in JSON (the main exchange syntax via APIs), they can cause significant bloat. Also, identifiers such as codes or terms have often been created for local use in single local systems and are ambiguous in isolation.

To create linked data from local identifiers or vocabulary, the concept of context is applied. The main forms of context in use are:

  1. PREFIX declaration for IRIs, which enables the use of abbreviated IRIs. This approach is used in OWL, RDF Turtle, SHACL and Discovery itself.
  2. VOCABULARY CONTEXT declaration for both IRIs and other tokens. This approach is used in JSON-LD, which converts local JSON properties and objects into linked data identifiers via the @context keyword. This enables applications that know their context to use simple identifiers such as aliases.
  3. MAPPING CONTEXT definitions for system level vocabularies. This provides sufficient context to uniquely identify a local code or term by including details such as the health care provider, the system and the table within a system. In essence, it is a specialised class with the various property values making up the context.
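
The mapping context in the third item can be pictured as a key that disambiguates a local code. The following is a minimal Python sketch, assuming a hypothetical `MappingContext` class with provider, system and table properties and a hypothetical in-memory map. The provider names, system names, local code and the `im:SomeOtherConcept` identifier are invented for illustration and do not come from the IM.

```python
from typing import NamedTuple, Optional


class MappingContext(NamedTuple):
    """Hypothetical context class: where a local code or term was recorded."""
    provider: str
    system: str
    table: str


# Illustrative map: the same local code "CP1" means different things in different contexts.
LOCAL_CODE_MAP = {
    (MappingContext("Provider A", "System X", "observation"), "CP1"): "sct:29857009",
    (MappingContext("Provider B", "System Y", "problem"), "CP1"): "im:SomeOtherConcept",
}


def resolve(context: MappingContext, local_code: str) -> Optional[str]:
    """Return the concept identifier for a local code in its context, or None if unmapped."""
    return LOCAL_CODE_MAP.get((context, local_code))
```

The design point is that the local code alone is never the key: only the pair of context and code identifies a concept.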

OWL2 and RDFS

For the purposes of authoring and reasoning, the semantic ontology axiom and class expression vocabulary uses the tokens and structure from the OWL2 profile OWL EL, which is itself a sublanguage of the OWL2 language.

In addition to the open world assumption of OWL, the RDFS constructs of domain and range (as in OWL DL) are used, but in a closed world manner as RDFS.

Within an information model instance itself, the data relationships are held in their post-inferred closed form, i.e. inferred properties and relationships are explicitly stated, using a normalisation process to eliminate duplications from super types. In other words, whereas an ontology may be authored using the open world assumption, classifications and inheritance are resolved prior to population of the live IM. This uses the same approach as followed by Snomed-CT, whereby the inferred relationship containing the inherited properties and the "isa" relationship are included explicitly.

In the live IM, OWL axioms are replaced with the RDFS standard terms and simplified. For example, OWL existential quantifications are mapped to "role groups" in line with Snomed-CT.
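
The simplification of a class expression into subclasses and role groups can be sketched as follows. This is a minimal Python illustration, assuming the classified definition is already available as a list of named super types plus groups of (property, value) pairs standing for existential quantifications. The function name `to_live_form` and its input shape are hypothetical; the output keys follow the JSON-LD chest pain example earlier in this article.

```python
def to_live_form(named_parents, existential_groups):
    """Map named super types and grouped existential restrictions to the run time structure.

    named_parents: list of concept identifiers (become rdfs:subClassOf entries).
    existential_groups: list of groups, each a list of (property, value) pairs
    (each group becomes one im:roleGroup entry).
    """
    entity = {"rdfs:subClassOf": [{"@id": parent} for parent in named_parents]}
    role_groups = []
    for number, group in enumerate(existential_groups, start=1):
        role_group = {"im:groupNumber": number}
        for prop, value in group:
            # A property may appear more than once in a group, so values are lists.
            role_group.setdefault(prop, []).append({"@id": value})
        role_groups.append(role_group)
    if role_groups:
        entity["im:roleGroup"] = role_groups
    return entity


# Chest pain: two inferred super types and one role group (finding site = thoracic structure).
chest_pain = to_live_form(
    ["sct:301366005", "sct:298705000"],
    [[("sct:363698007", "sct:51185008")]],
)
```

The sketch performs no classification itself; it assumes a reasoner has already produced the inferred super types during authoring.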

Use of Annotation properties

Annotation properties are the properties that provide information beyond that needed for reasoning. They play no part in the ontological reasoning, but without them the information model would be impossible for most people to understand.

Typical annotation properties are names and descriptions.

OWL construct | Usage examples | IM live conversion
Class | An entity that is a class concept, e.g. a Snomed-CT concept or a general concept | rdfs:Class
ObjectProperty | 'hasSubject' (an observation has a subject that is a patient) | rdf:Property
DataProperty | 'dateOfBirth' (a patient record has a date of birth attribute) | owl:DatatypeProperty
AnnotationProperty | 'description' (a concept has a description) |
SubClassOf | Patient is a subclass of a Person | rdfs:subClassOf
Equivalent To | Adverse reaction to Atenolol is equivalent to an adverse reaction to a drug AND has causative agent of Atenolol (substance) | rdfs:subClassOf
Sub property of | 'has responsible practitioner' is a subproperty of 'has responsible agent' | rdfs:subPropertyOf
Property chain | 'is sibling of' / 'is parent of' / 'has parent' is a sub property chain of 'is first cousin of' | owl:Property chain
Existential quantification (ObjectSomeValuesFrom) | Chest pain and finding site of - {some} thoracic structure | im:roleGroup
Object Intersection | Chest pain is equivalent to pain of truncal structure AND finding in region of thorax AND finding site of thoracic structure | rdfs:subClassOf + role groups
DataType definition | Date time is a restriction on a string with a regex that allows approximate dates |
Property domain | A property domain of has causative agent is allergic reaction | rdfs:domain
Property range | A property range of has causative agent is a substance | rdfs:range

Annotation | Meaning
rdfs:label | The name or term for an entity
rdfs:comment | The description of an entity

SHACL shapes

SHACL is used as a means of specifying the "data model types" of health record entities and also the IM itself as described directly in the meta model article.

SHACL is used in its standard form and is not extended.

OWL extension : data property expressions

Within health care, (and in common parlance), data properties are often used as syntactical short cuts to objects with qualifiers and a literal value element.

For example, the data property "Home telephone number" would be expected to simply contain a number. But a home telephone number also has a number of properties by implication, such as the fact that its usage is "home" and that it has a country code and an area code.

OWL 2 has a known limitation (as described in the OWL specification itself) in respect of data property expressions. OWL2 can only define data property expressions as data property IRIs with annotations.

In many health care standards such as HL7 FHIR, these data properties are object properties, with the objects having the "value" as one of their properties.

For example, in FHIR the patient's home telephone number is carried explicitly as the property contact { property = telecom -> value = { property use = home / property system = coding system / value = the actual number } }, i.e. three levels of nesting.

Whilst explicit modelling is vital for information exchanged between systems with different data models, queries would underperform if the data were stored in this way. Actual systems therefore usually store the home telephone number in something like a "home telephone" field in the patient table, or as a simple triple.

To bridge the gap between a complex object definition and a simple data property, the information model supports data property expressions (but without introducing a new language construct) as follows:

  1. A simple data property against the class, e.g. a "contact".
  2. The patient's home telephone number modelled as a sub property "homeTelephoneNumber", which is a sub property of "telephone number", which is itself a sub property of "contact".
  3. A standard RDFS property of the homeTelephoneNumber property entity -> "isDefinedBy", which points to a class expression that defines a home telephone number (itself a subclass of a class expression TelephoneNumber), thus allowing all property values to be "implicit but defined" as part of the ontology.

By this technique, subsumption queries that look for home contacts, look for home telephone numbers, or find numbers with US country codes will find the relevant field and the relevant sub pattern of a data property.

Implementations would still need to parse numbers into properties if they stored numbers as simple numbers, but these would be part of a data model map against the IM model's definition.
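
The subsumption behaviour described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the property identifiers, the `IMPLIED` table standing in for the class expression reached via "isDefinedBy", and the sample record (including the telephone number) are all invented for the example and are not taken from the IM.

```python
# Hypothetical sub property hierarchy: homeTelephoneNumber -> telephoneNumber -> contact.
SUB_PROPERTY_OF = {
    "im:homeTelephoneNumber": "im:telephoneNumber",
    "im:telephoneNumber": "im:contact",
}

# Stand-in for the class expression reached via isDefinedBy: the implicit property values.
IMPLIED = {
    "im:homeTelephoneNumber": {"im:use": "im:home"},
}


def is_sub_property(prop, ancestor):
    """True if prop is the ancestor itself or a (transitive) sub property of it."""
    while prop is not None:
        if prop == ancestor:
            return True
        prop = SUB_PROPERTY_OF.get(prop)
    return False


def find_values(record, ancestor):
    """Return the simple fields of a record that are subsumed by the queried property."""
    return {p: v for p, v in record.items() if is_sub_property(p, ancestor)}


def find_by_implied(record, key, value):
    """Return the simple fields whose definition implies key = value (e.g. use = home)."""
    return {p: v for p, v in record.items() if IMPLIED.get(p, {}).get(key) == value}


# A patient stored with a flat "home telephone" field, as most systems would hold it.
patient = {"im:homeTelephoneNumber": "01234 567890", "im:dateOfBirth": "1970-01-01"}
```

With this arrangement a query for any "contact", or for any property whose usage is "home", finds the flat field without the record having to carry the nested structure.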

Information model meta classes

See main article Information model meta classes

Using the above languages this defines the classes used to model all health data.