Health Information modelling language - overview

N.B. Not to be confused with the [[Information model meta model|Information model meta model]], which specifies the classes that hold the information model data, those classes being described using the languages defined below.


This article describes the languages used in the information model meta model. In other words, it describes the underlying grammar and syntax used as the building bricks for the classes that make up the model, instances of those classes being objects that conform to the class properties.


Details on the W3C standard languages that make up the grammar are described below.


In addition, if a system can consume RDF in its two main syntaxes (Turtle and JSON-LD), then the model can be easily exchanged.


The main advantage of RDF and the W3C standards is that types and properties are given internationally unique identifiers which are both humanly readable and can be resolved via the world wide web protocols.


Thus, in the information model, all classes, properties and value types (subjects, predicates and objects) are IRIs, which are defined by ontological techniques.
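For illustration, here is a single statement written as an RDF triple in Turtle, in which the subject (a class), the predicate (a property) and the object (a value) are all IRIs. This is a minimal sketch only; the Snomed-CT namespace shown is illustrative rather than the IM's definitive scheme.
<syntaxhighlight lang="turtle" style="border:3px solid grey">
# subject: the class "Chest pain (finding)"
<http://snomed.info/sct#29857009>
    # predicate: the standard RDFS subclass-of property
    <http://www.w3.org/2000/01/rdf-schema#subClassOf>
        # object: the class "Pain of truncal structure (finding)"
        <http://snomed.info/sct#301366005> .
</syntaxhighlight>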


== Contributory languages ==
Health data can be conceptualised as a graph, and thus the model is a graph model.


As the information model is a graph, and both classes and properties are uniquely identified, [[wikipedia:Resource_Description_Framework|RDF]] is the language used. As the technical community uses JSON as the mainstream syntax for exchanging objects, the preferred syntax for the model classes and properties is [[wikipedia:JSON-LD|JSON-LD]], with instances in plain [[wikipedia:JSON|JSON]].


RDF itself has limited grammar, so the modelling language uses the mainstream semantic web grammars and vocabularies: RDFS, OWL and SHACL. Additional vocabularies are added to the IM to accommodate any shortfalls in these vocabularies.


In addition, the IM accommodates some languages required to use the main health ontology, i.e. Expression Constraint Language and Snomed compositional grammar. Within the IM, ECL is modelled as query and Snomed-CT compositional grammar is modelled as a Concept class.


Finally, as a means of bridging the gap between user visualisation of query definitions and the underlying query languages such as SPARQL and SQL, the IM uses a set of classes to model query definitions, using a form that maps directly to SPARQL, SQL, GRAPHQL.


When exchanging models using the language grammar, both JSON-LD and Turtle are supported, as well as more specialised syntaxes such as OWL functional syntax or expression constraint language.


The modelling language is an amalgam of the following languages:


* [https://www.w3.org/TR/REC-rdf-syntax/ RDF.] An information model can be modelled as a graph, i.e. a set of nodes and edges (nodes and relationships, nodes and properties). Likewise, health data can be modelled as a graph conforming to the information model graph. RDF forms the statements describing the data. RDF in itself holds no semantics whatsoever, i.e. it is not practical to infer, validate or query based purely on an RDF structure. To use RDF it is necessary to provide semantic definitions for certain predicates and adopt certain conventions. In providing those semantic definitions, the predicates themselves can then be used to semantically define many other things. RDF can be represented using either TURTLE syntax or JSON-LD.
* [https://www.w3.org/TR/rdf-schema/ RDFS]. This is the first of the semantic languages. It is used for some of the ontology axioms such as subclasses, domains and ranges, as well as the standard annotation properties such as 'label'.


*[https://www.w3.org/TR/shacl/ SHACL]. Used for the data models of types, i.e. everything that defines the shape of data or logical entities and attributes. Although SHACL is designed for validation of RDF, because it describes what things 'should be' it can also be used as a data modelling language.


*[https://www.w3.org/TR/owl2-primer/ OWL2 DL.] This is supported in the authoring phase, but is simplified within the model. It brings with it more sophisticated description logic such as equivalent classes and existential quantifications, and is used in the ontology for defining things when an open world assumption is required. It has contributed to the design of the IM languages, but OWL is removed in the run-time models, with class expressions being replaced by RDFS subclasses and role groups.
*[https://confluence.ihtsdotools.org/display/DOCECL#:~:text=The%20Expression%20Constraint%20Language%20is,either%20precoordinated%20or%20postcoordinated%20expressions. ECL.] This is a specialised query language created for Snomed-CT, used  for simple concepts modelled as subtypes, role groups and roles, and is of great value in defining sets of concepts for the myriad of business purposes used in health.
*[https://confluence.ihtsdotools.org/display/DOCSCG/Compositional+Grammar+-+Specification+and+Guide SCG]. Snomed compositional grammar, created for Snomed-CT, is a concise syntax for representing simple concepts modelled as subtypes, role groups and roles, and is a way of displaying concept definitions.






'''Example: multiple syntaxes and grammars'''


Consider a definition of chest pain in several syntaxes. Note that the OWL definition is in a form prior to classification, whereas the others use the post-classified structure (so-called inferred).
<div class="toccolours mw-collapsible mw-collapsed">
 
Chest pain in Manchester syntax, SCG, ECL, OWL FS, IM Json-LD:
A graph is considered as a set of nodes and edges. For a graph to be valid, a node must have at least one edge, and an edge must be connected to at least two nodes. Thus the smallest practical graph must have at least 3 entities.
<div class="mw-collapsible-content">
 
<syntaxhighlight lang="turtle" style="border:3px solid grey">
The language must be both human and machine readable in text form.
# Definition of Chest pain in owl Manchester Syntax
equivalentTo  sn:298705000 and sn:301366005 and (sn:363698007 sn:51185008)


The language must use the recognisable plain language characters in UTF-8. For human readability the characters read from left to right and for machine readability a graph is a character stream from beginning to end.
#In RDF turtle
sn:29857009
  rdfs:subClassOf
        sn:301366005 ,
        sn:298705000;
  im:roleGroup [im:groupNumber "1"^^xsd:integer;
  sn:363698007 sn:51185008];
  rdfs:label "Chest pain (finding)" .


It is hard to produce a machine readable language that humans can understand.


Consequently two grammars are developed, one for human legibility and the other for optimised machine processing. However, both must be human and machine readable and translators are a prerequisite.
# In Snomed compositional grammar
=== 298705000 |Finding of region of thorax (finding)| +
    301366005 |Pain of truncal structure (finding)| :
            { 363698007 |Finding site (attribute)| = 51185008 |Thoracic structure (body structure)| }


A model presented in the human legible grammar must be translatable directly to the machine representation without loss of semantics. In the ensuing paragraphs the human optimised grammar is illustrated but in the final language specification both are presented side by side to illustrate semantic translatability.
# When using ECL to retrieve chest pain
<<298705000 |Finding of region of thorax (finding)| and
    (<<301366005 |Pain of truncal structure (finding)| :
            { 363698007 |Finding site (attribute)| = 51185008 |Thoracic structure (body structure)| })


'''Semantic triples'''


Consider a health record about a person holding the date of birth.
#When used in OL functional syntax
EquivalentClasses(
:29857009 |Chest pain (finding)|
ObjectIntersectionOf(
:22253000 |Pain (finding)|
ObjectSomeValuesFrom(
:609096000 |Role group (attribute)|
ObjectSomeValuesFrom(
:363698007 |Finding site (attribute)|
:51185008 |Thoracic structure (body structure)|
)
)
)
)
# In Json-LD


<nowiki>http://...../dateOfBirth</nowiki> is a property that holds a value that is a date of birth. Subjects and objects may also have identifiers which may or may not be meaningful. As subjects and objects operate as nodes, and nodes require edges, predicates thus assume the role of edges in a graph.
{
 
  "@id" : "sct:29857009",
To make sense of the language, subjects require constructs that include predicate object lists, object lists, and objects which can themselves be subjects defined by predicates. Put together with the terse language requirement, the grammar used in the human oriented language is simplified form of TURTLE. TURTLE itself is directly compatible with all of the Discovery grammars. The following snip illustrates the main TURTLE structures together with the punctuation:
  "rdfs:label" : "Chest pain (finding)",
<pre style="background-color:#fcfaee">
  "im:definitionalStatus" : {"@id" : "im:1251000252106","name" : "Concept definition is sufficient (equivalent status)"},
Subject1
  "rdfs:subClassOf" : [ {
  Predicate1 Object1;
    "@id" : "sct:301366005",
  Predicate2 Object2;                # predicate object list separated by ';'
    "name" : "Pain of truncal structure (finding)"
  Predicate3 Object3,  
  }, {
              Object4,  
    "@id" : "sct:298705000",
              Object5;              #object list
    "name" : "Finding of region of thorax (finding)"
  Predicate4 [                        #anonymous object with predicates enclosed []
  } ],
                Predicate5 Object6;
  "im:roleGroup" : [ {
                Predicate6 Object7
    "im:groupNumber" : 1,
              ]
    "sct:363698007" : [ {
.                                      # end of sentence
      "@id" : "sct:51185008",
 
      "name" : "Thoracic structure (body structure)"
Subject2 Predicate1 Object1.            #Simple triple                   
    } ]
</pre>
  } ]
 
}
 
'''Semantic context'''
 
The interpretation of a structure is often dependent on the preceding predicate. Because the language is semantically constrained to the profiles of the sublanguages, certain punctuations can be semantically interpreted in context. For example, as the semantic axioms incorporates OWL EL but not OWL DL, Object Intersection (and) is supported but not Object union. In other words, in certain contexts there are ANDS but not ORS.
 
This allows for the use of a collection construct, for example in the following equivalent definitions of a grandfather, in the first example the grandfather is an equivalent to an intersection of a person and someone who is male and has children, and the second one is an intersection of a person, something that is male, and someone that has children. Both interpretations assume AND as the operator because OR is not supported at this point in  OWL EL.
<syntaxhighlight lang="turtle" style="border:3px solid grey">
Grandfather
  isEquivalentTo
              Person,                             #class 1
              [ hasGender Male;                  #class 2
                hasChild Person,                 #class 2.1
                          [hasChild Person] ]     #class 2.2 
.
</syntaxhighlight>
</syntaxhighlight>
</div>
</div> <div class="mw-collapsible-content">&nbsp;</div>


== Internal IM languages for IMAPI usage ==
An implementation of the IM as a terminology server or query library exists.


<syntaxhighlight lang="turtle" style="border:3px solid grey">
This implementation uses the following mainstream languages:


* Java, used for the main server-side business logic and to service the REST APIs used to exchange information with the IM server
* JavaScript / TypeScript, used for business logic that provides UI-specific APIs for the web applications
*[https://www.w3.org/TR/sparql11-query/ SPARQL.] Used as the logical means of querying model-conformant data (not to be confused with the actual query language used, which may be SQL). Used as the query language for the IM and mapped from IM Query; health queries would generally use SQL.
*[https://opensearch.org/docs/latest/opensearch/query-dsl/index/ OpenSearch / Elastic.] Used for complex free-text query for finding concepts using the AWS OpenSearch DSL (a derivative of Lucene Query). Note that simple free-text Lucene indexing is supported by the IM database engines and is used in combined graph/text query.
*[[Meta model class specification#Query .2FSet definition|IM Query.]] Not strictly a language but a class definition representing a scheme independent  way of defining sets (query results) including all the main health queries used by clinicians and analysts.


== Grammars and syntaxes ==


=== Foundation syntaxes - RDF, TURTLE and JSON-LD ===
 
Discovery language has its own grammars built on the foundations of the W3C RDF grammars:


* A terse abbreviated language, TURTLE


* JSON-LD representation, which can be used by systems that prefer JSON (the majority), and are able to resolve identifiers via the JSON-LD context structure.


'''Identifiers, aliasing, prefixes and context'''


Concepts are identified and referenced by the use of International resource identifiers (IRIs).


Identifiers are universal and presented in one of the following forms:

# Full IRI (International resource identifier), which is the fully resolved identifier enclosed by <>
# Abbreviated IRI: a prefix followed by a ":", followed by the local name, which is resolved to a full IRI
# Aliases. The core language tokens (that are themselves concepts) have aliases for ease of use. For example, rdfs:subClassOf is aliased to subClassOf.

There is of course nothing to stop applications using their own aliases, and when used with JSON-LD, @context may be used to enable the use of aliases.

Data is considered to be linked across the world, which means that IRIs are the main identifiers. However, IRIs can be unwieldy to use, and some languages such as GRAPHQL do not use them. Furthermore, when used in JSON (the main exchange syntax via APIs) they can cause significant bloat. Also, identifiers such as codes or terms have often been created for local use in single systems and are ambiguous in isolation.

To create linked data from local identifiers or vocabulary, the concept of Context is applied. The main forms of context in use are:

# PREFIX declarations for IRIs, which enable the use of abbreviated IRIs. This approach is used in OWL, RDF Turtle, SHACL and Discovery itself.
# VOCABULARY CONTEXT declarations for both IRIs and other tokens. This approach is used in JSON-LD, which converts local JSON properties and objects into linked data identifiers via the @context keyword. This enables applications that know their context to use simple identifiers such as aliases.
# MAPPING CONTEXT definitions for system level vocabularies. This provides sufficient context to uniquely identify a local code or term by including details such as the health care provider, the system and the table within a system. In essence a specialised class with the various property values making up the context.
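For illustration, a minimal sketch of PREFIX declarations in Turtle, followed by a triple written with the resulting abbreviated IRIs. The rdfs namespace is the standard W3C one; the sn: prefix is illustrative.
<syntaxhighlight lang="turtle" style="border:3px solid grey">
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sn:   <http://snomed.info/sct#> .            # illustrative Snomed-CT prefix

# sn:29857009 is the abbreviated form of <http://snomed.info/sct#29857009>
sn:29857009  rdfs:label  "Chest pain (finding)" .
</syntaxhighlight>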


=== OWL2 and RDFS ===


For the purposes of authoring and reasoning  the semantic ontology axiom and class expression vocabulary uses the tokens and structure from the OWL2 profile [https://www.w3.org/TR/owl2-profiles/#OWL_2_EL OWL EL] , which itself is a sublanguage of the [https://www.w3.org/TR/owl2-syntax/ OWL2 language]


In addition to the open world assumption of OWL, the RDFS constructs of domain and range (also present in OWL DL) are used, but in a closed world manner as RDFS.


Within an information model instance itself, the data relationships are held in their post-inferred, closed form, i.e. inferred properties and relationships are explicitly stated using a normalisation process to eliminate duplications from supertypes. In other words, whereas an ontology may be authored using the open world assumption, classifications and inheritance are resolved prior to population of the live IM. This uses the same approach as followed by Snomed-CT, whereby the inferred relationships containing the inherited properties and the "isa" relationship are included explicitly.


In the live IM OWL Axioms are replaced with the RDFS standard terms and simplified. For example OWL existential quantifications are mapped to "role groups" in line with Snomed-CT.
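As a schematic sketch of that conversion, using the chest pain example from earlier in this article (the authored form uses standard OWL terms; the exact predicates used by the live IM may differ from those shown):
<syntaxhighlight lang="turtle" style="border:3px solid grey">
# Authored form: an OWL equivalent class axiom containing an existential quantification
sn:29857009  owl:equivalentClass [
    a owl:Class ;
    owl:intersectionOf ( sn:301366005
                         sn:298705000
                         [ a owl:Restriction ;
                           owl:onProperty     sn:363698007 ;    # Finding site
                           owl:someValuesFrom sn:51185008 ] )   # Thoracic structure
] .

# Live IM form: flattened to rdfs:subClassOf plus a role group
sn:29857009
    rdfs:subClassOf  sn:301366005 , sn:298705000 ;
    im:roleGroup     [ im:groupNumber "1"^^xsd:integer ;
                       sn:363698007 sn:51185008 ] .
</syntaxhighlight>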


'''Use of Annotation properties'''


Annotation properties are the properties that provide information beyond that needed for reasoning. They form no part in the ontological reasoning, but without them, the information model would be impossible for most people to understand.


Typical annotation properties are names and descriptions.
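For example (a minimal sketch; the comment text is invented purely for illustration):
<syntaxhighlight lang="turtle" style="border:3px solid grey">
sn:29857009
    rdfs:label    "Chest pain (finding)" ;                       # the name or term for the entity
    rdfs:comment  "Pain felt in the region of the thorax." .     # the description of the entity
</syntaxhighlight>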
"@context" : {
 
    "alias" : {
      "@id" : "http://www.DiscoveryDataService.org/InformationModel/Ontology#alias",
      "@type" : "@id"
    },
    "@base" : "http://www.DiscoveryDataService.org/InformationModel/Ontology#",
    "" : "http://www.DiscoveryDataService.org/InformationModel/Ontology#",
    "rdf" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "sh" : "http://www.w3.org/ns/shacl#",
    "owl" : "http://www.w3.org/2002/07/owl#",
    "xsd" : "http://www.w3.org/2001/XMLSchema#",
    "rdfs" : "http://www.w3.org/2000/01/rdf-schema#"
  }
</syntaxhighlight>Resulting in the standard JSON-ld context based approach: <syntaxhighlight lang="jsonld">
{"@id" : "sn:12345",
    "@type" : "owl:Class",
    "subClassOf" : "sn:34568"
  }
</syntaxhighlight>
 
=== Ontology structures and vocabulary ===
 
For the purposes of reasoning  the semantic ontology axiom and class expression vocabulary uses the tokens and structure from the OWL2 profile [https://www.w3.org/TR/owl2-profiles/#OWL_2_EL OWL EL], which itself is a sublanguage of the [https://www.w3.org/TR/owl2-syntax/ OWL2 language]
 
However, in addition some standard OWL2 DL axioms are used in order to provide a means of specifying additional relationships that are of value when defining relationships. The following table lists the main owl  types used and example for each.  Note that their aliases are used for brevity. Please refer to the OWL2 specification to describe their meanings
{| class="wikitable"
{| class="wikitable"
|+
|+
!Owl construct
!Owl construct
!usage examples
!usage examples
!'''IM live conversion'''
|-
|-
|Class
|Class
|An entity that is a class concept e.g. A snomed-ct concept or a general concept
|An entity that is a class concept e.g. A snomed-ct concept or a general concept
|rdfs:Class
|-
|-
|ObjectProperty
|ObjectProperty
|'hasSubject' (an observation '''has a subject''' that is a patient)
|'hasSubject' (an observation '''has a subject''' that is a patient)
|rdf:Property
|-
|-
|DataProperty
|DataProperty
|'dateOfBirth'  (a patient record has a date of birth attribute
|'dateOfBirth'  (a patient record has a date of birth attribute
|owl:dataTypeProperty
|-
|-
|annotationProperty
|annotationProperty
|'description'  (a concept has a description)
|'description'  (a concept has a description)
|
|-
|-
|SubClassOf
|SubClassOf
|Patient is a subclass of a Person
|Patient is a subclass of a Person
|rdfs:subClassOf
|-
|-
|Equivalent To
|Equivalent To
|Adverse reaction to Atenolol is equivalent to An adverse reaction to a drug AND has causative agent of Atenolol (substance)
|Adverse reaction to Atenolol is equivalent to An adverse reaction to a drug AND has causative agent of Atenolol (substance)
|rdfs:subClassOf
<br />
|-
|-
|Disjoint with
|Sub property of
|Father is disjoint with Mother
|-
|Sub property of  
|has responsible practitioner is a subproperty of has responsible agent
|has responsible practitioner is a subproperty of has responsible agent
|rdfs:subPropertyOf
|-
|-
|Property chain  
|Property chain
|is sibling of'/ 'is parent of' / 'has parent' is a sub property chain of 'is first cousin of'
|is sibling of'/ 'is parent of' / 'has parent' is a sub property chain of 'is first cousin of'
|owl:Property chain
|-
|-
|Inverse property
|Existential quantification ( ObjectSomeValuesFrom)
|is subject of is inverse of has subject
|-
|Transitive property
|is child of is transitive
|-
|Existential quantification
|Chest pain and
|Chest pain and
Finding site of  - {some} thoracic structure
Finding site of  - {some} thoracic structure
|im:roleGroup
|-
|-
|Object Intersection
|Object Intersection
|Chest pain is equivalent to pain of truncal structure AND finding in region of thorax AND finding site of thoracic structure
|Chest pain is equivalent to pain of truncal structure AND finding in region of thorax AND finding site of thoracic structure
|-
|rdfs:Subclass
|Individual
 
|All chest pain subclasses but not the specific i''nstance of acute chest pain''
+
 
role groups
|-
|-
|DataType definition
|DataType definition
|Date time  is a restriction on a string with a regex that allows approximate dates
|Date time  is a restriction on a string with a regex that allows approximate dates
|
|-
|-
|Property domain
|Property domain
|a property domain of has causative agent is allergic reaction
|a property domain of has causative agent is allergic reaction
|rdfs:domain
|-
|-
|Property range
|Property range
|A property range of has causative agent is a substance
|A property range of has causative agent is a substance
|rdfs:range
|}
|}
{| class="wikitable"
|+
!Annotation
!Meaning
|-
|rdfs:label
|The name or term for an entity
|-
|rdfs:comment
|The description of an entity
|}
=== SHACL shapes ===
SHACL is used as a means of specifying the "data model types" of health record entities and also the IM itself as described directly in the [[Information model meta model#Meta model class specification|meta model article]].


SHACL is used in its standard form and is not extended.
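As an illustration, here is a minimal node shape written in standard SHACL Turtle. The shape, class and property IRIs shown are hypothetical and not the IM's actual data model:
<syntaxhighlight lang="turtle" style="border:3px solid grey">
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# A hypothetical shape stating that a patient record has exactly one date of birth
:PatientShape
    a sh:NodeShape ;
    sh:targetClass :Patient ;
    sh:property [
        sh:path     :dateOfBirth ;
        sh:datatype xsd:date ;
        sh:minCount 1 ;
        sh:maxCount 1 ] .
</syntaxhighlight>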


=== OWL extension : data property expressions ===
Within health care (and in common parlance), data properties are often used as syntactical shortcuts to objects with qualifiers and a literal value element.


For example, the data property "Home telephone number" would be expected to simply contain a number. But a home telephone number also has a number of properties by implication, such as the fact that its usage is "home", and has a country and area code.


OWL 2 has a known limitation (as described in the OWL specification itself) in respect of data property expressions. OWL2 can only define data property expressions as data property IRIs with annotations.  


In many health care standards such as HL7 FHIR, these data properties are object properties, with the objects having the "value" as one of their properties.


For example, in FHIR the patient's home telephone number is carried explicitly as the property contact {property= telecom -> value = {property use= Home, /property System= coding system,/ value = the actual number } }, i.e. 3 levels of nesting.


Whilst explicit modelling is vital for information exchanged between systems with different data models, if stored in this way, queries would underperform, so the actual systems usually store the home telephone number perhaps in  a field "home telephone"  in the patient table or a simple triple.


To bridge the gap between a complex object definition and a simple data property, the information model supports data property expressions (without introducing a new language construct) as follows:


# Simple data property against the class e.g. a "contact"
# Patient's home telephone number modelled as a ''sub property'' "homeTelephoneNumber", which is a sub property of "telephone number", which is itself a sub property of "contact".
# A standard RDFS property of the homeTelephone property entity, "isDefinedBy", which points to a class expression that defines a home telephone number (itself a subclass of a class expression TelephoneNumber), thus allowing all property values to be "implicit but defined" as part of the ontology.


By this technique, subsumption queries that look for home contacts or home telephone numbers, or find numbers with US country codes, will find the relevant field and the relevant sub-pattern of a data property.


Implementations would still need to parse numbers to properties if they stored numbers as simple numbers, but these would be part of a data model map against the IM model's definition.
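A sketch of points 2 and 3 above in Turtle; the property and class IRIs are illustrative, while rdfs:subPropertyOf and rdfs:isDefinedBy are the standard RDFS terms:
<syntaxhighlight lang="turtle" style="border:3px solid grey">
# Point 2: a sub-property hierarchy, so a subsumption query on :contact also finds home numbers
:homeTelephoneNumber  rdfs:subPropertyOf  :telephoneNumber .
:telephoneNumber      rdfs:subPropertyOf  :contact .

# Point 3: the property points to a class expression defining what a home telephone number is
:homeTelephoneNumber  rdfs:isDefinedBy    :HomeTelephoneNumber .
:HomeTelephoneNumber  rdfs:subClassOf     :TelephoneNumber .
</syntaxhighlight>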


== Information model meta classes ==
See main article [[Information model meta model|Information model meta classes]]


Using the above languages, the meta model defines the classes used to model all health data.


