Health Information modelling language - overview: Difference between revisions

From Endeavour Knowledge Base
 
(209 intermediate revisions by the same user not shown)
N.B. Not to be confused with the [[Information model meta model|Information model meta model]], which specifies the classes that hold the information model data. Those classes are described using the languages defined below.


The Discovery Information Modelling Language is a [[wikipedia:Mixed_language|mixed language]]  that subsumes 5 standard [[wikipedia:Sublanguage|sublanguages]], brought together under a common grammar and an optional syntax.
This article describes the languages used in the information model meta model, in other words the underlying grammar and syntax used as the building blocks for the classes that make up the model. Instances of those classes are objects that conform to the class properties.


The purpose of the language is to define a health information model in a way that supports implementations of the model using different database technologies and query languages.
Details on the W3C standard languages that make up the grammar are described below.
[[File:Sublanguages.png|thumb]]
Each sublanguage is based on the grammar of a single recognised standards-based language, each having been selected as the closest fit to the information requirements that the Discovery health information model is designed to support. A single grammar and optional single syntax enables the model to operate in an integrated manner, but at the same time enables the sublanguages to be represented in their native standard languages.


Like [https://www.nlm.nih.gov/research/umls/index.html UMLS], the sublanguages use a common link for cross reference, a [[Discovery semantic ontology language|concept]], which is identified with a unique identifier (Internationalized Resource Identifier, IRI). A concept is usually named and defined semantically, and forms the means of traversing the model from different starting points to different end points, for different purposes.
In addition,


== Why another language? ==
If a system can consume RDF in its two main syntaxes (Turtle and JSON-LD), then the model can be easily exchanged.
There are hundreds of computer languages and thousands of natural languages, many of which are accepted as "standards" in many communities. Why have another?


Within the health informatics community, a historical separation has evolved between two modelling camps, those that model semantics of concepts via an ontology (aka terminologists), and those that model data structures for storing and transmitting data (aka structuralists). This separation reflects the difference in purpose, the difference in mindset, and the difference in skills required by the different disciplines. Furthermore, across many industry wide specialisms, similar fundamental requirements have been approached using specialised languages that appear to overlap and even conflict.  
The main advantage of RDF and the W3C standards is that types and properties are given internationally unique identifiers which are both human readable and resolvable via the World Wide Web protocols.


A problem with this separation occurs at the points of overlap. Different camps model their tokens and vocabularies in different ways, both from a grammar and syntax perspective.  
Thus, in the information model, all classes, properties and value types (subjects and predicates and objects) are IRIs which are defined by ontological techniques.


For example, in health care it is possible to model a surgical operation as a data structure with a body site attribute. [https://www.hl7.org/fhir/procedure.html FHIR R4 Procedure] does precisely this. It is equally possible to model a procedure by including the body site either as a qualifier of a type of procedure, or as part of the procedure definition itself. Snomed-CT does precisely this. Both approaches can use the same concept for the body site itself, but they would use separate property concepts for the "has body site" property. This separation of approach can lead to massive divergences. Taking the structuralist approach and extending it results in archetypes of the kind modelled by [https://www.openehr.org/ OpenEHR]. Taking the ontological approach further leads to complex nested expressions which are nigh on impenetrable.
== Contributory languages ==
Health data can be conceptualised as a graph, and thus the model is a graph model.


Health record query can be achieved via the use of a standard language such as SPARQL or a specialised form of query such as AQL. However, when querying the attributes of a user as part of an attribute based access control policy, a completely different way of representing query may be used.
As the information model is a graph, and both classes and properties are uniquely identified, [[wikipedia:Resource_Description_Framework|RDF]] is the language used. As the technical community uses JSON as the mainstream syntax for exchanging objects, the preferred syntax for the model classes and properties is [[wikipedia:JSON-LD|JSON-LD]], with instances in plain [[wikipedia:JSON|JSON]].


Having a grammar and syntax that encompasses both semantics and structure makes the use of the common overlapping concepts much easier to manage. Having a common syntax for query definition means that a rule in an ABAC policy can use the same syntax as a health record query. Having a common message format in line with an interoperability standard such as FHIR makes sure that the data is never locked in an information silo. A classic structural concept such as an encounter record, and its semantic definition, can be seamlessly integrated.
RDF itself has a limited grammar, so the modelling language uses the mainstream semantic web grammars and vocabularies, these being RDFS, OWL and SHACL. Additional vocabularies are added to the IM to accommodate the shortfalls in these vocabularies.


Selecting one language is not an option. For example, it is possible to model data in OWL2 DL by extensive use of complex OWL constructs including functional properties, property domains, ranges and precise cardinality. It is also possible to model query as OWL expressions, except for function parameters. However, the purpose of OWL is to support reasoning, and reasoners use the [[wikipedia:Open-world_assumption|open world assumption]]. Data models and data queries use a [[wikipedia:Closed-world_assumption|closed world assumption]], and query languages are declarative in nature, i.e. instructions as to what to do. Using OWL for purposes other than reasoning or classification is like using English to prove Pythagoras' theorem.
In addition, the IM accommodates some languages required to use the main health ontology, i.e. Expression Constraint Language and Snomed compositional grammar. Within the IM, ECL is modelled as query and Snomed-CT compositional grammar is modelled as a Concept class.


Bringing the languages together, at least as a temporary measure to solve a particular set of information requirements, seems worthwhile. Hence a new language.
Finally, as a means of bridging the gap between user visualisation of query definitions and the underlying query languages such as SPARQL and SQL, the IM uses a set of classes to model query definitions, using a form that maps directly to SPARQL, SQL and GraphQL.


== Ontology sublanguage ==
When exchanging models using the language grammar, both JSON-LD and Turtle are supported, as well as the more specialised syntaxes such as OWL functional syntax or Expression Constraint Language.


''Main article''  [[Discovery semantic ontology language]]
The modelling language is an amalgam of the following languages:


The semantic ontology language is part of the Discovery information modelling language.
* [https://www.w3.org/TR/REC-rdf-syntax/ RDF]. An information model can be modelled as a graph, i.e. a set of nodes and edges (nodes and relationships, nodes and properties). Likewise, health data can be modelled as a graph conforming to the information model graph. RDF forms the statements describing the data. RDF in itself holds no semantics whatsoever, i.e. it is not practical to infer, validate or query based purely on an RDF structure. To use RDF it is necessary to provide semantic definitions for certain predicates and adopt certain conventions. In providing those semantic definitions, the predicates themselves can then be used to semantically define many other things. RDF can be represented using either Turtle syntax or JSON-LD.
* [https://www.w3.org/TR/rdf-schema/ RDFS]. This is the first of the semantic languages. It is used for some of the ontology axioms such as subclasses, domains and ranges, as well as the standard annotation properties such as 'label'.


The grammar for the semantic ontology language used for the Discovery ontology is [https://www.w3.org/TR/owl2-profiles/#OWL_2_EL OWL EL], which is a limited profile of OWL DL. The language used for data modelling and [[Value sets|value set]] modelling is [https://www.w3.org/TR/owl2-syntax/ OWL2 DL], as the more expressive constructs are required.
*[https://www.w3.org/TR/shacl/ SHACL]. Used for the data models of types, i.e. for everything that defines the shape of data or logical entities and attributes. Although SHACL is designed for validation of RDF, because it describes what things 'should be' it can also be used as a data modelling language.


As such the ontology supports the OWL2 syntaxes such as the Functional syntax and Manchester syntax, but also supports the Discovery JSON based syntax, as part of the full information modelling language.  
*[https://www.w3.org/TR/owl2-primer/ OWL2 DL]. This is supported in the authoring phase, but is simplified within the model. It brings with it more sophisticated description logic such as equivalent classes and existential quantifications, and is used in the ontology and for defining things when an open world assumption is required. It has contributed to the design of the IM languages, but OWL is removed in the run time models, with class expressions being replaced by RDFS subclass and role groups.
*[https://confluence.ihtsdotools.org/display/DOCECL#:~:text=The%20Expression%20Constraint%20Language%20is,either%20precoordinated%20or%20postcoordinated%20expressions. ECL.] This is a specialised query language created for Snomed-CT, used  for simple concepts modelled as subtypes, role groups and roles, and is of great value in defining sets of concepts for the myriad of business purposes used in health.
*[https://confluence.ihtsdotools.org/display/DOCSCG/Compositional+Grammar+-+Specification+and+Guide SCG]. Snomed compositional grammar, created for Snomed-CT, which is a concise syntax for representing simple concepts modelled as subtypes, role groups and roles, and is a way of displaying concept definitions.


Together with the query language, OWL2 DL makes the language compatible also with [https://confluence.ihtsdotools.org/display/DOCECL/Expression+Constraint+Language+-+Specification+and+Guide Expression constraint language] which is used as the standard for specifying Snomed-CT expression query 


Ontology purists will notice that modelling a data model in OWL2 is in fact a breach of the fundamental [[wikipedia:Open-world_assumption|open world assumption]] view of the world taken in ontologies, and applies the [[wikipedia:Closed-world_assumption|closed world assumption]] view instead. Consequently, a data model would normally be used independently of DL.


Furthermore, as data models are modelled for business purposes, and semantic models are modelled for reasoning purposes, a style that connects the two via the use of an object property  "is type" is used.
'''Example: multiple syntaxes and grammars'''


== Data definition (query) sublanguage ==
Consider a definition of chest pain in several syntaxes. Note that the OWL definition is in a form prior to classification, whereas the others use the post-classified (so-called inferred) structure.
<div class="toccolours mw-collapsible mw-collapsed">
Chest pain in Manchester syntax, SCG, ECL, OWL FS, IM JSON-LD:
<div class="mw-collapsible-content">
<syntaxhighlight lang="turtle" style="border:3px solid grey">
# Definition of Chest pain in owl Manchester Syntax
equivalentTo  sn:298705000 and sn:301366005 and (sn:363698007 some sn:51185008)


Data models, and concept definitions and objects are modelled using the Graph paradigm. As a result, all content can be viewed as [[wikipedia:Semantic_triple|semantic triples]] consisting of subject predicate and object.  
#In RDF turtle
sn:29857009
  rdfs:subClassOf
        sn:301366005 ,  
        sn:298705000;
  im:roleGroup [im:groupNumber "1"^^xsd:integer;
  sn:363698007 sn:51185008];
  rdfs:label "Chest pain (finding)" .


A standard language for querying triples ( [https://www.w3.org/TR/sparql11-query/ SPARQL]) exists. This is a very extensive language, albeit less expressive than SQL. However the majority of interoperable health queries can be expressed in a fairly limited subset of SPARQL and therefore a subset of SPARQL is selected as the means of modelling data definitions and query in Discovery.


It should be noted, though, that actual query is likely to be implemented in SQL and thus an interpreter is needed. However, as a result of the data maps (accessed via the data mapping language) and the restricted subset of SPARQL in use, SQL can be auto-generated from the query language.
# In Snomed compositional grammar
=== 298705000 |Finding of region of thorax (finding)| +
    301366005 |Pain of truncal structure (finding)| :
            { 363698007 |Finding site (attribute)| = 51185008 |Thoracic structure (body structure)| }


== Data mapping sublanguage ==
# When using ECL to retrieve chest pain
<<298705000 |Finding of region of thorax (finding)| and
    (<<301366005 |Pain of truncal structure (finding)| :
            { 363698007 |Finding site (attribute)| = 51185008 |Thoracic structure (body structure)| })


This part of the language is used to define mappings between the data model and an actual schema, to enable queries and filters to automatically cope with the ever extending ontology and data properties.


The language can be used to auto generate starter schemas for implementation i.e. schemas that will then be optimised for real world use.
# When used in OWL functional syntax
EquivalentClasses(
:29857009 |Chest pain (finding)|
ObjectIntersectionOf(
:22253000 |Pain (finding)|
ObjectSomeValuesFrom(
:609096000 |Role group (attribute)|
ObjectSomeValuesFrom(
:363698007 |Finding site (attribute)|
:51185008 |Thoracic structure (body structure)|
)
)
)
)
# In Json-LD


The main use case for the mapping sublanguage is data transformation. This uses techniques such as [[wikipedia:Object-relational_mapping|object relational mapping]], and therefore the transform instructions, in the form of maps, follow this approach. There is no single standard for ORM maps, but best practice of the kind supported by open source utilities such as Hibernate is followed:
{
  "@id" : "sct:29857009",
  "rdfs:label" : "Chest pain (finding)",
  "im:definitionalStatus" : {"@id" : "im:1251000252106","name" : "Concept definition is sufficient (equivalent status)"},
  "rdfs:subClassOf" : [ {
    "@id" : "sct:301366005",
    "name" : "Pain of truncal structure (finding)"
  }, {
    "@id" : "sct:298705000",
    "name" : "Finding of region of thorax (finding)"
  } ],
  "im:roleGroup" : [ {
    "im:groupNumber" : 1,
    "sct:363698007" : [ {
      "@id" : "sct:51185008",
      "name" : "Thoracic structure (body structure)"
    } ]
  } ]
}
</syntaxhighlight>
</div>
</div> <div class="mw-collapsible-content">&nbsp;</div>


== Attribute based access control language ==
== Internal IM languages for IMAPI usage ==
The standard [[wikipedia:XACML|XACML]] specifies a language that may be used to implement ABAC. XACML includes a set of grammatical concepts such as policy sets, policies, rules, combination rules, targets, obligations, effects and so on, with many and variable sophisticated tokens and functions used to build the policy rules. XACML has its own XML syntax that can be used directly.
An implementation of the IM as a terminology server or query library exists.


This language is somewhat disconnected from the other standards in terms of syntax and approach to vocabulary. Consequently, Discovery uses a [[Discovery ABAC language|JSON profile of XACML]] as its ABAC language, which itself models the attributes as OWL properties and uses SPARQL as its rule representation.
This implementation uses the following mainstream languages:


* Java, used server side for the main business logic and to service the REST APIs used to exchange information with the IM server
* JavaScript / TypeScript, used for business logic that provides UI-specific APIs to the web applications
 
*[https://www.w3.org/TR/sparql11-query/ SPARQL]. Used as the logical means of querying model conformant data (not to be confused with the actual query language used, which may be SQL). Used as the query language for the IM and mapped from IM Query. Health queries would generally use SQL.
*[https://opensearch.org/docs/latest/opensearch/query-dsl/index/ OpenSearch / Elastic]. Used for complex free text query for finding concepts, using the AWS OpenSearch DSL (a derivative of Lucene Query). Note that simple free text Lucene indexing is supported by the IM database engines and is used in combined graph/text query.
*[[Meta model class specification#Query .2FSet definition|IM Query.]] Not strictly a language but a class definition representing a scheme independent  way of defining sets (query results) including all the main health queries used by clinicians and analysts. 
 
== Grammars and syntaxes ==
 
=== Foundation syntaxes - RDF, TURTLE and JSON-LD ===
Discovery language has its own Grammars built on the foundations of the W3C RDF grammars:
 
* A terse abbreviated language, TURTLE
 
* JSON-LD representation, which can be used by systems that prefer JSON (the majority) and are able to resolve identifiers via the JSON-LD context structure.
 
'''Identifiers, aliasing, prefixes and context'''
 
Concepts are identified and referenced by the use of Internationalized Resource Identifiers (IRIs).
 
Identifiers are universal and presented in one of the following forms:
 
# Full IRI (Internationalized Resource Identifier), which is the fully resolved identifier enclosed in <>
# Abbreviated IRI: a prefix followed by a ":" followed by the local name, which is resolved to a full IRI
# Aliases. The core language tokens (that are themselves concepts) have aliases for ease of use. For example, rdfs:subClassOf is aliased to subClassOf.
 
There is of course nothing to stop applications using their own aliases, and with JSON-LD the @context may be used to enable the use of aliases.
 
Data is considered to be linked across the world, which means that IRIs are the main identifiers. However, IRIs can be unwieldy to use and some languages, such as GraphQL, do not use them. Furthermore, when used in JSON (the main exchange syntax via APIs) they can cause significant bloat. Also, identifiers such as codes or terms have often been created for local use in single local systems, and in isolation are ambiguous.
 
To create linked data from local identifiers or vocabulary, the concept of context is applied. The main forms of context in use are:
 
# PREFIX declaration for IRIs, which enable the use of abbreviated IRIs. This approach is used in OWL, RDF turtle, SHACL and Discovery itself.
# VOCABULARY CONTEXT declaration for both IRIs and other tokens. This approach is used in JSON-LD, which converts local JSON properties and objects into linked data identifiers via the @context keyword. This enables applications that know their context to use simple identifiers such as aliases.
# MAPPING CONTEXT definitions for system level vocabularies. This provides sufficient context to uniquely identify a local code or term by including details such as the health care provider, the system and the table within a system. In essence a specialised class with the various property values making up the context.
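The three identifier forms and the PREFIX style of context can be illustrated with a minimal Python sketch. The prefix table, the alias table and the `expand` function are hypothetical names introduced here for illustration, and the `sn` and `im` namespace IRIs are assumptions rather than the values used by the IM itself; only the `rdfs` namespace is the published W3C one.

```python
# Hypothetical prefix and alias tables; only the rdfs namespace is the W3C one.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "sn": "http://snomed.info/sct#",   # assumed namespace for Snomed-CT concepts
    "im": "http://example.org/im#",    # placeholder namespace for IM tokens
}
ALIASES = {
    "subClassOf": "rdfs:subClassOf",
    "label": "rdfs:label",
}


def expand(identifier: str) -> str:
    """Resolve a full IRI in <>, an abbreviated IRI or an alias to a full IRI."""
    # Form 1: full IRI enclosed in angle brackets.
    if identifier.startswith("<") and identifier.endswith(">"):
        return identifier[1:-1]
    # Form 3: an alias is first rewritten to its abbreviated IRI.
    identifier = ALIASES.get(identifier, identifier)
    # Form 2: prefix, then ":", then local name.
    prefix, separator, local_name = identifier.partition(":")
    if separator and prefix in PREFIXES:
        return PREFIXES[prefix] + local_name
    raise ValueError(f"cannot resolve identifier: {identifier!r}")
```

A JSON-LD processor performs the equivalent resolution from the @context of a document, so an application that knows its context can exchange the short forms only.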
 
=== OWL2 and RDFS ===
 
For the purposes of authoring and reasoning, the semantic ontology axiom and class expression vocabulary uses the tokens and structure from the OWL2 profile [https://www.w3.org/TR/owl2-profiles/#OWL_2_EL OWL EL], which itself is a sublanguage of the [https://www.w3.org/TR/owl2-syntax/ OWL2 language].
 
In addition to the open world assumption of OWL, the RDFS constructs of domain and range (as found in OWL DL) are used, but in a closed world manner as in RDFS.
 
Within an information model instance itself, the data relationships are held in their post-inferred closed form, i.e. inferred properties and relationships are explicitly stated, using a normalisation process to eliminate duplications from super types. In other words, whereas an ontology may be authored using the open world assumption, classifications and inheritance are resolved prior to population of the live IM. This uses the same approach as followed by Snomed-CT, whereby the inferred relationship containing the inherited properties and the "isa" relationship are included explicitly.
 
In the live IM OWL Axioms are replaced with the RDFS standard terms and simplified. For example OWL existential quantifications are mapped to "role groups" in line with Snomed-CT.
 
'''Use of Annotation properties'''
 
Annotation properties are the properties that provide information beyond that needed for reasoning. They form no part in the ontological reasoning, but without them the information model would be impossible for most people to understand.
 
Typical annotation properties are names and descriptions.
{| class="wikitable"
|+
!OWL construct
!Usage examples
!IM live conversion
|-
|Class
|An entity that is a class concept e.g. A snomed-ct concept or a general concept
|rdfs:Class
|-
|ObjectProperty
|'hasSubject' (an observation '''has a subject''' that is a patient)
|rdf:Property
|-
|DataProperty
|'dateOfBirth' (a patient record has a date of birth attribute)
|owl:DatatypeProperty
|-
|annotationProperty
|'description'  (a concept has a description)
|
|-
|SubClassOf
|Patient is a subclass of a Person
|rdfs:subClassOf
|-
|Equivalent To
|Adverse reaction to Atenolol is equivalent to An adverse reaction to a drug AND has causative agent of Atenolol (substance)
|rdfs:subClassOf
|-
|Sub property of
|has responsible practitioner is a subproperty of has responsible agent
|rdfs:subPropertyOf
|-
|Property chain
|'is sibling of' / 'is parent of' / 'has parent' is a sub property chain of 'is first cousin of'
|owl:propertyChainAxiom
|-
|Existential quantification ( ObjectSomeValuesFrom)
|Chest pain and
Finding site of  - {some} thoracic structure
|im:roleGroup
|-
|Object Intersection
|Chest pain is equivalent to pain of truncal structure AND finding in region of thorax AND finding site of thoracic structure
|rdfs:subClassOf + role groups
|-
|DataType definition
|Date time  is a restriction on a string with a regex that allows approximate dates
|
|-
|Property domain
|a property domain of has causative agent is allergic reaction
|rdfs:domain
|-
|Property range
|A property range of has causative agent is a substance
|rdfs:range
|}
{| class="wikitable"
|+
!Annotation
!Meaning
|-
|rdfs:label
|The name or term for an entity
|-
|rdfs:comment
|the description of an entity
|}
 
=== SHACL shapes ===
SHACL is used as a means of specifying the "data model types" of health record entities and also the IM itself as described directly in the [[Information model meta model#Meta model class specification|meta model article]].
 
SHACL is used in its standard form and is not extended.
 
=== OWL extension : data property expressions ===
Within health care, (and in common parlance), data properties are often used as syntactical short cuts to objects with qualifiers  and a literal value element.
 
For example, the data property "Home telephone number" would be expected to simply contain a number. But a home telephone number also has a number of properties by implication, such as the fact that its usage is "home", and has a country and area code.
 
OWL 2 has a known limitation (as described in the OWL specification itself) in respect of data property expressions. OWL2 can only define data property expressions as data property IRIs with annotations.
 
In many health care standards such as HL7 FHIR, these data properties are object properties, with the objects having the "value" as one of their properties.
 
For example, in FHIR the patient's home telephone number is carried explicitly as the property contact {property = telecom -> value = {property use = home / property system = coding system / value = the actual number}}, i.e. 3 levels of nesting.
 
Whilst explicit modelling is vital for information exchanged between systems with different data models, if stored in this way, queries would underperform, so the actual systems usually store the home telephone number perhaps in  a field "home telephone"  in the patient table or a simple triple.
 
To bridge the gap between a complex object definition and a simple data property, the information model supports data property expressions (but without introducing a new language construct) as follows:
 
# Simple data property against the class e.g. a "contact"
# Patient's home telephone number modelled as a ''sub property'' "homeTelephoneNumber", which is a sub property of "telephone number", which is itself a sub property of "contact".
# A standard RDFS property of the homeTelephoneNumber property entity, "isDefinedBy", which points to a class expression that defines a home telephone number (itself a subclass of a class expression TelephoneNumber), thus allowing all property values to be "implicit but defined" as part of the ontology.
 
By this technique, subsumption queries that look for home contacts, home telephone numbers or numbers with US country codes will find the relevant field and the relevant sub pattern of a data property.
 
Implementations would still need to parse numbers into properties if they stored numbers as simple numbers, but these would be part of a data model map against the IM model's definition.
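The sub property technique can be sketched in a few lines of Python. This is an illustration only: the `SUB_PROPERTY_OF` table, the function names and the sample record are hypothetical, and a real implementation would read the hierarchy from the model rather than hard-code it.

```python
# Hypothetical sub property hierarchy taken from the example in the text:
# homeTelephoneNumber -> telephoneNumber -> contact.
SUB_PROPERTY_OF = {
    "homeTelephoneNumber": "telephoneNumber",
    "telephoneNumber": "contact",
}


def is_sub_property_of(prop: str, ancestor: str) -> bool:
    """True if prop equals ancestor or is a direct or indirect sub property of it."""
    current = prop
    while current is not None:
        if current == ancestor:
            return True
        current = SUB_PROPERTY_OF.get(current)
    return False


def values_for(record: dict, ancestor: str) -> dict:
    """Select the simple fields of a record that are subsumed by the given property."""
    return {
        name: value
        for name, value in record.items()
        if is_sub_property_of(name, ancestor)
    }


# A stored record keeps the number as a simple field, as described above.
patient = {"homeTelephoneNumber": "+44 20 7946 0000", "dateOfBirth": "1980-01-01"}
```

Here a query for "contact" finds the simple `homeTelephoneNumber` field without the record having to carry the nested object structure.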
 
== Information model meta classes ==
See main article [[Information model meta model|Information model meta classes]]
 
Using the above languages this defines the classes used to model all health data.
 
 
 
<br />

Latest revision as of 14:53, 5 January 2023

N.B. Not to be confused with the Information model meta model, which specifies the classes that hold the information model data. Those classes are described using the languages defined below.

This article describes the languages used in the information model meta model, in other words the underlying grammar and syntax used as the building blocks for the classes that make up the model. Instances of those classes are objects that conform to the class properties.

Details on the W3C standard languages that make up the grammar are described below.

In addition,

If a system can consume RDF in its two main syntaxes (Turtle and JSON-LD), then the model can be easily exchanged.

The main advantage of RDF and the W3C standards is that types and properties are given internationally unique identifiers which are both human readable and resolvable via the World Wide Web protocols.

Thus, in the information model, all classes, properties and value types (subjects and predicates and objects) are IRIs which are defined by ontological techniques.

Contributory languages

Health data can be conceptualised as a graph, and thus the model is a graph model.

As the information model is a graph, and both classes and properties are uniquely identified, RDF is the language used. As the technical community uses JSON as the mainstream syntax for exchanging objects, the preferred syntax for the model classes and properties is JSON-LD, with instances in plain JSON.

RDF itself has a limited grammar, so the modelling language uses the mainstream semantic web grammars and vocabularies, these being RDFS, OWL and SHACL. Additional vocabularies are added to the IM to accommodate the shortfalls in these vocabularies.

In addition, the IM accommodates some languages required to use the main health ontology, i.e. Expression Constraint Language and Snomed compositional grammar. Within the IM, ECL is modelled as query and Snomed-CT compositional grammar is modelled as a Concept class.

Finally, as a means of bridging the gap between user visualisation of query definitions and the underlying query languages such as SPARQL and SQL, the IM uses a set of classes to model query definitions, using a form that maps directly to SPARQL, SQL and GraphQL.
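As a rough illustration of how a class-based query definition can map directly to SPARQL, consider the Python sketch below. The `Where` and `QueryDefinition` classes and their fields are hypothetical and far simpler than the IM Query classes themselves; PREFIX declarations are omitted from the generated text for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Where:
    """One property/value constraint, both given as abbreviated IRIs."""
    predicate: str
    value: str


@dataclass
class QueryDefinition:
    """Hypothetical query definition: a type plus a list of constraints."""
    entity_type: str
    where: List[Where] = field(default_factory=list)

    def to_sparql(self) -> str:
        # Each constraint becomes one triple pattern on the same subject variable.
        lines = [
            "SELECT ?entity",
            "WHERE {",
            f"  ?entity rdf:type {self.entity_type} .",
        ]
        for clause in self.where:
            lines.append(f"  ?entity {clause.predicate} {clause.value} .")
        lines.append("}")
        return "\n".join(lines)


# Concepts that are direct subclasses of "Pain of truncal structure".
query = QueryDefinition("rdfs:Class", [Where("rdfs:subClassOf", "sn:301366005")])
```

Because the definition is held as data rather than as query text, the same object could equally be rendered to SQL or GraphQL by a different method.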

When exchanging models using the language grammar, both JSON-LD and Turtle are supported, as well as the more specialised syntaxes such as OWL functional syntax or Expression Constraint Language.

The modelling language is an amalgam of the following languages:

  • RDF. An information model can be modelled as a graph, i.e. a set of nodes and edges (nodes and relationships, nodes and properties). Likewise, health data can be modelled as a graph conforming to the information model graph. RDF forms the statements describing the data. RDF in itself holds no semantics whatsoever, i.e. it is not practical to infer, validate or query based purely on an RDF structure. To use RDF it is necessary to provide semantic definitions for certain predicates and adopt certain conventions. In providing those semantic definitions, the predicates themselves can then be used to semantically define many other things. RDF can be represented using either Turtle syntax or JSON-LD.
  • RDFS. This is the first of the semantic languages. It is used for some of the ontology axioms such as subclasses, domains and ranges, as well as the standard annotation properties such as 'label'.
  • SHACL. Used for the data models of types, i.e. for everything that defines the shape of data or logical entities and attributes. Although SHACL is designed for validation of RDF, because it describes what things 'should be' it can also be used as a data modelling language.
  • OWL2 DL. This is supported in the authoring phase, but is simplified within the model. It brings with it more sophisticated description logic such as equivalent classes and existential quantifications, and is used in the ontology and for defining things when an open world assumption is required. It has contributed to the design of the IM languages, but OWL is removed in the run time models, with class expressions being replaced by RDFS subclass and role groups.
  • ECL. This is a specialised query language created for Snomed-CT, used for simple concepts modelled as subtypes, role groups and roles, and is of great value in defining sets of concepts for the myriad of business purposes used in health.
  • SCG. Snomed compositional grammar, created for Snomed-CT, which is a concise syntax for representing simple concepts modelled as subtypes, role groups and roles, and is a way of displaying concept definitions.


Example: multiple syntaxes and grammars

Consider a definition of chest pain in several syntaxes. Note that the OWL definition is in a form prior to classification, whereas the others use the post-classified (so-called inferred) structure.

Chest pain in Manchester syntax, SCG, ECL, OWL FS, IM JSON-LD:

# Definition of Chest pain in owl Manchester Syntax
 equivalentTo  sn:298705000 and sn:301366005 and (sn:363698007 some sn:51185008)

#In RDF turtle
sn:29857009
   rdfs:subClassOf 
         sn:301366005 , 
         sn:298705000;
   im:roleGroup [im:groupNumber "1"^^xsd:integer;
   sn:363698007 sn:51185008];
   rdfs:label "Chest pain (finding)" .


# In Snomed compositional grammar
=== 298705000 |Finding of region of thorax (finding)| + 
    301366005 |Pain of truncal structure (finding)| :
            { 363698007 |Finding site (attribute)| = 51185008 |Thoracic structure (body structure)| }

# When using ECL to retrieve chest pain
<<298705000 |Finding of region of thorax (finding)| and 
    (<<301366005 |Pain of truncal structure (finding)| :
            { 363698007 |Finding site (attribute)| = 51185008 |Thoracic structure (body structure)| })


# When used in OWL functional syntax
EquivalentClasses(
	:29857009 |Chest pain (finding)|
	ObjectIntersectionOf(
		:22253000 |Pain (finding)|
		ObjectSomeValuesFrom(
			:609096000 |Role group (attribute)|
			ObjectSomeValuesFrom(
				:363698007 |Finding site (attribute)|
				:51185008 |Thoracic structure (body structure)|
			)
		)
	)
)
# In Json-LD

{
  "@id" : "sct:29857009",
  "rdfs:label" : "Chest pain (finding)",
  "im:definitionalStatus" : {"@id" : "im:1251000252106","name" : "Concept definition is sufficient (equivalent status)"},
  "rdfs:subClassOf" : [ {
    "@id" : "sct:301366005",
    "name" : "Pain of truncal structure (finding)"
  }, {
    "@id" : "sct:298705000",
    "name" : "Finding of region of thorax (finding)"
  } ],
  "im:roleGroup" : [ {
    "im:groupNumber" : 1,
    "sct:363698007" : [ {
      "@id" : "sct:51185008",
      "name" : "Thoracic structure (body structure)"
    } ]
  } ]
}
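To illustrate how closely the syntaxes above track one another, the following is a minimal Python sketch that renders the JSON-LD form as a Snomed compositional grammar expression. It is not part of the IM implementation: the function names (`local_id`, `term`, `to_scg`) are hypothetical, only the keys that appear in the example are handled, and the attribute name is omitted because the JSON-LD example does not carry it.

```python
# Identifier shown in the example for "Concept definition is sufficient (equivalent status)".
SUFFICIENT = "im:1251000252106"

# The JSON-LD example, as a Python dictionary.
CHEST_PAIN = {
    "@id": "sct:29857009",
    "rdfs:label": "Chest pain (finding)",
    "im:definitionalStatus": {"@id": SUFFICIENT},
    "rdfs:subClassOf": [
        {"@id": "sct:301366005", "name": "Pain of truncal structure (finding)"},
        {"@id": "sct:298705000", "name": "Finding of region of thorax (finding)"},
    ],
    "im:roleGroup": [
        {
            "im:groupNumber": 1,
            "sct:363698007": [
                {"@id": "sct:51185008", "name": "Thoracic structure (body structure)"}
            ],
        }
    ],
}


def local_id(abbreviated_iri: str) -> str:
    """Strip the prefix from an abbreviated IRI such as 'sct:51185008'."""
    return abbreviated_iri.split(":", 1)[1]


def term(reference: dict) -> str:
    """Render a concept reference as 'id |name|'."""
    return f'{local_id(reference["@id"])} |{reference["name"]}|'


def to_scg(concept: dict) -> str:
    """Render supertypes and role groups as a single SCG expression."""
    status = concept.get("im:definitionalStatus", {}).get("@id")
    operator = "===" if status == SUFFICIENT else "<<<"
    parents = " + ".join(term(parent) for parent in concept["rdfs:subClassOf"])
    groups = []
    for group in concept.get("im:roleGroup", []):
        roles = [
            f"{local_id(attribute)} = {term(value)}"
            for attribute, values in group.items()
            if attribute != "im:groupNumber"
            for value in values
        ]
        groups.append("{ " + ", ".join(roles) + " }")
    refinement = " : " + ", ".join(groups) if groups else ""
    return f"{operator} {parents}{refinement}"


print(to_scg(CHEST_PAIN))
```

The supertypes become the focus concepts joined by "+", and each role group becomes one braced attribute group in the refinement.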
 

Internal IM languages for IMAPI usage

An implementation of the IM as a terminology server or query library exists.

This implementation uses the following mainstream languages

  • Java. Used for the main server side business logic and for the services behind the REST APIs that exchange information with the IM server.
  • JavaScript / TypeScript. Used for business logic that provides UI specific APIs to the web applications.
  • SPARQL. Used as the logical means of querying model conformant data (not to be confused with the actual query language used, which may be SQL). It is the query language for the IM itself and is mapped from IM Query; health queries would generally use SQL.
  • OpenSearch / Elastic. Used for complex free text queries for finding concepts, using the AWS OpenSearch DSL (a derivative of Lucene Query). Note that simple free text Lucene indexing is supported by the IM database engines and is used in combined graph/text queries.
  • IM Query. Not strictly a language but a class definition representing a scheme independent way of defining sets (query results) including all the main health queries used by clinicians and analysts.

Grammars and syntaxes

Foundation syntaxes - RDF, TURTLE and JSON-LD

The Discovery language has its own grammars, built on the foundations of the W3C RDF grammars:

  • A terse abbreviated language, TURTLE
  • A JSON-LD representation, which can be used by systems that prefer JSON (the majority) and are able to resolve identifiers via the JSON-LD context structure.

Identifiers, aliasing prefixes and context

Concepts are identified and referenced by the use of Internationalised resource identifiers (IRIs).

Identifiers are universal and presented in one of the following forms:

  1. Full IRI (Internationalised resource identifier), which is the fully resolved identifier enclosed in <>.
  2. Abbreviated IRI: a prefix followed by a ":" followed by the local name, which is resolved to a full IRI.
  3. Aliases. The core language tokens (that are themselves concepts) have aliases for ease of use. For example, rdfs:subClassOf is aliased to subClassOf.

There is of course nothing to stop applications using their own aliases; with JSON-LD, @context may be used to enable them.
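The three identifier forms can be resolved to a full IRI with very little logic. The sketch below is illustrative only: the `expand` function, the `PREFIXES` and `ALIASES` tables and the namespace chosen for the `im` prefix are assumptions made for the example, not the IM's actual configuration.

```python
# Prefix table. The rdfs namespace is the W3C one; the sn and im namespaces
# are assumptions for illustration (im is a placeholder).
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "sn": "http://snomed.info/id/",
    "im": "http://example.org/im#",
}

# Alias table: only the alias named in the text is included.
ALIASES = {"subClassOf": "rdfs:subClassOf"}


def expand(identifier: str) -> str:
    """Resolve a full IRI, an abbreviated IRI or an alias to a full IRI."""
    # Form 1: full IRI enclosed in angle brackets.
    if identifier.startswith("<") and identifier.endswith(">"):
        return identifier[1:-1]
    # Form 3: an alias is first replaced by its abbreviated IRI.
    identifier = ALIASES.get(identifier, identifier)
    # Form 2: prefix, ":" and local name.
    prefix, separator, local_name = identifier.partition(":")
    if separator and prefix in PREFIXES:
        return PREFIXES[prefix] + local_name
    raise ValueError(f"cannot resolve identifier: {identifier!r}")


for form in ("<http://www.w3.org/2000/01/rdf-schema#subClassOf>",
             "rdfs:subClassOf",
             "subClassOf"):
    print(form, "->", expand(form))
```

All three inputs in the loop resolve to the same full IRI, which is the property that lets applications choose whichever form is most convenient.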

Data is considered to be linked across the world, which means that IRIs are the main identifiers. However, IRIs can be unwieldy to use and some languages, such as GraphQL, do not use them. Furthermore, when used in JSON (the main exchange syntax via APIs) they can cause significant bloat. Also, identifiers such as codes or terms have often been created for local use in single local systems, and in isolation are ambiguous.

To create linked data from local identifiers or vocabulary, the concept of Context is applied. The main forms of context in use are:

  1. PREFIX declaration for IRIs, which enables the use of abbreviated IRIs. This approach is used in OWL, RDF turtle, SHACL and Discovery itself.
  2. VOCABULARY CONTEXT declaration for both IRIs and other tokens. This approach is used in JSON-LD, which converts local JSON properties and objects into linked data identifiers via the @context keyword. This enables applications that know their context to use simple identifiers such as aliases.
  3. MAPPING CONTEXT definitions for system level vocabularies. This provides sufficient context to uniquely identify a local code or term by including details such as the health care provider, the system and the table within a system. In essence, this is a specialised class with the various property values making up the context.
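As a rough illustration of the mapping context idea, the sketch below models the context as a small immutable class and uses it, together with the local code, as the key to a concept. The class name, field names, provider and system names, local code and the particular mappings are all hypothetical and chosen only to show why the same local code needs its context to be unambiguous.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MappingContext:
    """Hypothetical context: where a local code or term was recorded."""
    provider: str
    system: str
    table: str


# Illustrative mappings only: the same local code "CP" means different
# things in two different source systems.
MAPPINGS = {
    (MappingContext("Hospital A", "System X", "diagnosis"), "CP"): "sn:29857009",
    (MappingContext("Practice B", "System Y", "problem"), "CP"): "sn:22253000",
}


def resolve(context: MappingContext, local_code: str) -> Optional[str]:
    """Return the concept identifier for a local code in a given context."""
    return MAPPINGS.get((context, local_code))


hospital = MappingContext("Hospital A", "System X", "diagnosis")
practice = MappingContext("Practice B", "System Y", "problem")
print(resolve(hospital, "CP"))
print(resolve(practice, "CP"))
```

Because the dataclass is frozen it is hashable, so the combination of context and local code can serve directly as a dictionary key.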

OWL2 and RDFS

For the purposes of authoring and reasoning, the semantic ontology axiom and class expression vocabulary uses the tokens and structure from the OWL2 profile OWL EL, which is itself a sublanguage of the OWL2 language.

In addition to the open world assumption of OWL, the RDFS constructs of domain and range (as in OWL DL) are supported, but they are used in a closed world manner as RDFS.

Within an information model instance itself, the data relationships are held in their post-inferred closed form, i.e. inferred properties and relationships are explicitly stated, using a normalisation process to eliminate duplications from super types. In other words, whereas an ontology may be authored using the open world assumption, classifications and inheritance are resolved prior to population of the live IM. This uses the same approach as followed by Snomed-CT, whereby the inferred relationships containing the inherited properties and the "isa" relationship are included explicitly.

In the live IM, OWL axioms are replaced with the standard RDFS terms and simplified. For example, OWL existential quantifications are mapped to "role groups" in line with Snomed-CT.
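The structural part of that replacement can be sketched as follows. This is a simplified illustration rather than the IM's conversion code: class expressions are represented as nested Python tuples (an assumption made for the example), the function names are hypothetical, and no classification is performed, so a reasoner would still be needed to compute the inferred supertypes.

```python
ROLE_GROUP = "sn:609096000"  # Role group (attribute)


def operands_of(expression):
    """Return the operands of an ("and", [...]) node, or the expression alone."""
    if isinstance(expression, tuple) and expression[0] == "and":
        return list(expression[1])
    return [expression]


def simplify(definition):
    """Map an intersection of named classes and existential restrictions
    to rdfs:subClassOf entries plus im:roleGroup entries."""
    supertypes, role_groups = [], []
    for operand in operands_of(definition):
        if isinstance(operand, str):
            # A named class becomes a plain supertype.
            supertypes.append(operand)
            continue
        _, prop, filler = operand  # ("some", property, filler)
        # Unwrap the Snomed role group wrapper when it is present.
        grouped = filler if prop == ROLE_GROUP else operand
        group = {"im:groupNumber": len(role_groups) + 1}
        for _, role, value in operands_of(grouped):
            group.setdefault(role, []).append(value)
        role_groups.append(group)
    return {"rdfs:subClassOf": supertypes, "im:roleGroup": role_groups}


# Pain AND (role group SOME (finding site SOME thoracic structure))
CHEST_PAIN_OWL = (
    "and",
    ["sn:22253000", ("some", ROLE_GROUP, ("some", "sn:363698007", "sn:51185008"))],
)
print(simplify(CHEST_PAIN_OWL))
```

Each top-level existential restriction becomes one numbered role group, and named classes in the intersection become `rdfs:subClassOf` entries.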

Use of Annotation properties

Annotation properties are the properties that provide information beyond that needed for reasoning. They play no part in the ontological reasoning, but without them the information model would be impossible for most people to understand.

Typical annotation properties are names and descriptions.

Owl construct | Usage examples | IM live conversion
Class | An entity that is a class concept e.g. a Snomed-CT concept or a general concept | rdfs:Class
ObjectProperty | 'hasSubject' (an observation has a subject that is a patient) | rdf:Property
DataProperty | 'dateOfBirth' (a patient record has a date of birth attribute) | owl:dataTypeProperty
annotationProperty | 'description' (a concept has a description) |
SubClassOf | Patient is a subclass of a Person | rdfs:subClassOf
Equivalent To | Adverse reaction to Atenolol is equivalent to An adverse reaction to a drug AND has causative agent of Atenolol (substance) | rdfs:subClassOf
Sub property of | has responsible practitioner is a subproperty of has responsible agent | rdfs:subPropertyOf
Property chain | 'is sibling of' / 'is parent of' / 'has parent' is a sub property chain of 'is first cousin of' | owl:Property chain
Existential quantification (ObjectSomeValuesFrom) | Chest pain and Finding site of - {some} thoracic structure | im:roleGroup
Object Intersection | Chest pain is equivalent to pain of truncal structure AND finding in region of thorax AND finding site of thoracic structure | rdfs:Subclass + role groups
DataType definition | Date time is a restriction on a string with a regex that allows approximate dates |
Property domain | A property domain of has causative agent is allergic reaction | rdfs:domain
Property range | A property range of has causative agent is a substance | rdfs:range

Annotation | Meaning
rdfs:label | The name or term for an entity
rdfs:comment | The description of an entity

SHACL shapes

SHACL is used as a means of specifying the "data model types" of health record entities and also the IM itself as described directly in the meta model article.

SHACL is used in its standard form and is not extended.

OWL extension : data property expressions

Within health care, (and in common parlance), data properties are often used as syntactical short cuts to objects with qualifiers and a literal value element.

For example, the data property "Home telephone number" would be expected to simply contain a number. But a home telephone number also has a number of properties by implication, such as the fact that its usage is "home", and has a country and area code.

OWL 2 has a known limitation (as described in the OWL specification itself) in respect of data property expressions. OWL2 can only define data property expressions as data property IRIs with annotations.

In many health care standards, such as HL7 FHIR, these data properties are object properties, with the objects having the "value" as one of their properties.

For example, in FHIR the patient's home telephone number is carried explicitly as the property contact { property = telecom -> value = { property use = Home, property system = coding system, value = the actual number } }, i.e. three levels of nesting.

Whilst explicit modelling is vital for information exchanged between systems with different data models, if stored in this way, queries would underperform, so the actual systems usually store the home telephone number perhaps in a field "home telephone" in the patient table or a simple triple.

To bridge the gap between a complex object definition and a simple data property, the information model supports data property expressions (but without introducing a new language construct) as follows:

  1. A simple data property against the class, e.g. a "contact".
  2. The patient's home telephone number modelled as a sub property "homeTelephoneNumber", which is a sub property of "telephone number", which is itself a sub property of "contact".
  3. A standard RDFS property of the homeTelephoneNumber property entity, "isDefinedBy", which points to a class expression that defines a home telephone number (itself a subclass of a class expression TelephoneNumber), thus allowing all property values to be "implicit but defined" as part of the ontology.

By this technique, subsumption queries that look for home contacts or home telephone numbers, or that find numbers with US country codes, will find the relevant field and the relevant sub pattern of a data property.

Implementations would still need to parse numbers into properties if they stored numbers as simple numbers, but this parsing would be part of a data model map against the IM model definitions.
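The technique can be sketched in a few lines. The example below is an assumption-laden illustration, not the IM's data structures: the dictionaries, the function names and the qualifier values ("use", "system") are hypothetical stand-ins for the sub property hierarchy, the "isDefinedBy" links and the class expressions described above.

```python
# Sub property hierarchy: child property -> parent property.
SUB_PROPERTY_OF = {
    "homeTelephoneNumber": "telephoneNumber",
    "telephoneNumber": "contact",
}

# "isDefinedBy": property -> class expression that defines its implicit values.
IS_DEFINED_BY = {
    "homeTelephoneNumber": "HomeTelephoneNumber",
    "telephoneNumber": "TelephoneNumber",
}

# Class hierarchy and the qualifiers each class expression contributes.
SUB_CLASS_OF = {"HomeTelephoneNumber": "TelephoneNumber"}
QUALIFIERS = {
    "TelephoneNumber": {"system": "phone"},
    "HomeTelephoneNumber": {"use": "home"},
}


def ancestors(name, hierarchy):
    """Walk a single-parent hierarchy upwards and return the ancestors in order."""
    chain = []
    while name in hierarchy:
        name = hierarchy[name]
        chain.append(name)
    return chain


def subsumed_properties(ancestor):
    """Return the property itself plus every property beneath it."""
    known = set(SUB_PROPERTY_OF) | set(SUB_PROPERTY_OF.values())
    return sorted(
        prop for prop in known
        if prop == ancestor or ancestor in ancestors(prop, SUB_PROPERTY_OF)
    )


def implied_qualifiers(prop):
    """Collect the implicit values defined for a data property."""
    defining_class = IS_DEFINED_BY.get(prop)
    if defining_class is None:
        return {}
    merged = {}
    # Apply the most general class first so that subclasses can refine it.
    for cls in reversed([defining_class] + ancestors(defining_class, SUB_CLASS_OF)):
        merged.update(QUALIFIERS.get(cls, {}))
    return merged


print(subsumed_properties("contact"))
print(implied_qualifiers("homeTelephoneNumber"))
```

A query for "contact" therefore reaches the stored homeTelephoneNumber field, while the qualifiers that FHIR carries explicitly remain available through the defining class expression.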

Information model meta classes

See main article Information model meta classes

Using the above languages this defines the classes used to model all health data.