Health Information modelling language - overview: Difference between revisions

From Endeavour Knowledge Base
Within Discovery, the semantic ontology language sits as a sublanguage of the [[Data modelling language|Discovery information modelling language]] as a whole.

Its purpose is to specify the meaning of things used in the information model, in particular for authoring concepts likely to be used in [[Subsumption test|subsumption testing]], a process pivotal to health query.[[File:Ontology.jpg|The 4 main constructs of the ontology|link=https://wiki.discoverydataservice.org/File:Ontology.jpg|alt=|thumb]]

Ontologies have been used in health information systems for many years and, more recently, the emergence of Snomed-CT as the de facto health terminology has illustrated the potential power of the description logic that underpins OWL. Snomed-CT itself uses OWL EL.

The ontology language describes four main structural types used in semantics: the concept itself, class expressions, property expressions, and the axioms which make statements about the concept that purport to be true.

These are illustrated to the right, and may be represented in any of the OWL syntaxes or in the Discovery information modelling language syntax. OWL itself does not use the abstract 'concept'; it expresses things as classes, object properties, data properties, annotation properties, data types and individuals. The concept is included here as a simplification.

=== Ontology axiom vocabulary ===
For the purposes of reasoning, the semantic ontology axiom and class expression vocabulary uses the tokens and structure from the OWL2 profile [https://www.w3.org/TR/owl2-profiles/#OWL_2_EL OWL EL], which is itself a sublanguage of the [https://www.w3.org/TR/owl2-syntax/ OWL2 language].

As such, the specification of the language itself is delegated to the relevant W3C standard specification and is not repeated here. However, some standard OWL2 DL axioms are also used in order to provide a means of specifying additional relationships that are of value when defining concepts. The following table lists the main OWL constructs used, with an example of each. Note that aliases are used for brevity; please refer to the OWL2 specification for their precise meanings.

{| class="wikitable"
|+
!OWL construct
!Usage example
|-
|Class
|An entity that is a class concept, e.g. a Snomed-CT concept or a general concept
|-
|ObjectProperty
|'hasSubject' (an observation '''has a subject''' that is a patient)
|-
|DataProperty
|'dateOfBirth' (a patient record has a date of birth attribute)
|-
|AnnotationProperty
|'description' (a concept has a description)
|-
|SubClassOf
|Patient is a subclass of Person
|-
|EquivalentTo
|Adverse reaction to Atenolol is equivalent to an adverse reaction to a drug AND has causative agent of Atenolol (substance)
|-
|DisjointWith
|Father is disjoint with Mother
|-
|SubPropertyOf
|'has responsible practitioner' is a subproperty of 'has responsible agent'
|-
|Property chain
|'is sibling of' / 'is parent of' / 'has parent' is a sub property chain of 'is first cousin of'
|-
|Inverse property
|'is subject of' is the inverse of 'has subject'
|-
|Transitive property
|'is child of' is transitive
|-
|Existential quantification
|Chest pain AND finding site of {some} thoracic structure
|-
|Object intersection
|Chest pain is equivalent to pain of truncal structure AND finding in region of thorax AND finding site of thoracic structure
|-
|Individual
|All chest pain subclasses but not the specific ''instance of acute chest pain''
|-
|DataType definition
|Date time is a restriction on a string with a regex that allows approximate dates
|-
|Property domain
|A property domain of 'has causative agent' is allergic reaction
|-
|Property range
|A property range of 'has causative agent' is a substance
|}

== Why not OWL DL? ==
There are several reasons why Discovery supports OWL EL but not OWL DL. To examine the reasons, a number of the constructs are highlighted and discussed.

==== Object Union reasoning and uncertainty issue ====
In an ontology that applies [[wikipedia:Occam's_razor|Occam's razor]] to its definitions, object unions are likely to be used. When things can be defined as either/or, it is usually convenient to do so. For example, a brother-in-law may be either the brother of someone's spouse, the husband of someone's sibling, or the husband of a sibling of a spouse.

One problem with the use of object union is that it can cause reasoners to struggle, due to 'OR' branching when producing inference hierarchies. Even a few object unions can cause a reasoner to slow.

Another problem with the use of union is that it can produce instances of the 'enquirer uncertainty' problem: a set of common problems that some enquirers come across when the queries they author do not seem to find the results they were expecting. Here is an example.

Let's say that an ontology has authored pneumonia as being caused by either a bacterium or a virus:<pre>
SubClassOf (Pneumonia
    ObjectIntersectionOf( Disease
          ObjectSomeValuesFrom (isCausedBy ObjectUnionOf(Virus Bacteria)))) </pre>
Patient Jack, who gets flu, develops pneumonia, but it is not clear whether it is a viral pneumonia or a secondary bacterial pneumonia. An entry is made into the record saying he has pneumonia:
<pre>
Observation 1 -> has subject -> Patient 1
Observation 1 -> has observed concept -> Pneumonia
</pre>

Dr Smith decides to examine cases of viral diseases and looks for patients with diseases caused by a virus. First a value set is created, with a member defined as "things that are diseases and caused by viruses":
<syntaxhighlight lang="JSON">
{"ValueSet": {
    "iri": ":VSET_ViralDiseases",
    "Member": [{
        "Intersection":[
            {"Class":":SN_Disease"},
            { "ObjectSome": {"Property":"isCausedBy","Value":":Virus"}}]}]}}
</syntaxhighlight>
Then a query uses the generated list of viral diseases:

<syntaxhighlight lang="JSON">
{"Query": {
    "iri": ":Q_ViralDiseases",
    "Match":"(:im__Patient)-[:im__isSubjectOf]->(ob:im__Observation)",
    "Where" :"ob.im__hasObservedConcept IN :VSET_ViralDiseases" }}
</syntaxhighlight>

Jack is the name of patient 1 (and therefore is a patient). He is the subject of observation 1, and observation 1 has an observed concept of pneumonia that may be caused by a virus or a bacterium. Dr Smith may therefore assume that his query would pick Jack up, as he ''may'' have a disease caused by a virus. However, it is not known whether Jack's pneumonia was caused by a virus, and he would therefore ''not'' be in the result set. A logician would see this clearly, but Dr Smith may believe that 'either/ors' should work in favour of inclusion (at least in this case).

To avoid this issue, the ontology could have been authored as follows:<pre>
SubClassOf (Pneumonia
  ObjectIntersectionOf (Disease
    ObjectSomeValuesFrom(isCausedBy InfectiousAgent)))
SubClassOf(Virus InfectiousAgent)
SubClassOf(Bacteria InfectiousAgent)</pre>
Now when Dr Smith runs the query for diseases caused by a virus he would still not find Jack. However, it is clear from the ontology that pneumonia is not specific enough and does not imply the inclusion of more specific concepts. In other words, Dr Smith cannot complain; there is no uncertainty.

Object union is not supported in the semantic ontology.
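The membership failure described above can be sketched in a few lines. This is an illustrative model (hypothetical concept names, not the Discovery engine): the value set expands to the concepts actually subsumed by "caused by a virus", and the record query is a simple membership test, so a concept authored only with an either/or cause never enters the expansion.

```python
# subClassOf edges: child -> parent (hypothetical fragment of a hierarchy)
sub_class_of = {
    "RSVPneumonia": "ViralPneumonia",
    "ViralPneumonia": "Pneumonia",
    "Pneumonia": "Disease",
    "Influenza": "Disease",
}

# Concepts explicitly defined with: isCausedBy some Virus
caused_by_virus = {"ViralPneumonia", "Influenza"}

def expand_value_set(seeds, hierarchy):
    """Return the seeds plus every concept subsumed by a seed."""
    members = set(seeds)
    changed = True
    while changed:
        changed = False
        for child, parent in hierarchy.items():
            if parent in members and child not in members:
                members.add(child)
                changed = True
    return members

viral_diseases = expand_value_set(caused_by_virus, sub_class_of)

# Jack's observation carries the plain concept "Pneumonia". It was authored
# with an either/or (union) cause, so it is not subsumed by "caused by a
# virus" and the membership test fails.
jack_matches = "Pneumonia" in viral_diseases
```

Note that "RSVPneumonia" does enter the value set, because it is subsumed by an explicitly viral concept; plain "Pneumonia" never does.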
 
====Property Chain problem====
Property chains are attractive when defining properties. For example, an aunt is the sister of a parent of someone. The property chain is modelled as a sub property of hasAunt.<pre>
SubObjectPropertyOf( ObjectPropertyChain( a:hasParent a:hasSister ) a:hasAunt )
</pre>Nice and neat. However, consider the modelling of a third cousin....<pre>
SubObjectPropertyOf(ObjectPropertyChain(a:hasGreatGrandParent a:isSiblingOf a:isGreatGrandParentOf) a:hasThirdCousin)
</pre>And as a great grandparent is a sub property of "ancestor", which is a transitive property, it is possible to model these to just about any level of granularity. The result is that reasoners die when faced with many property chains of this kind. And the question is, to what purpose?
 
The Discovery data modelling language also includes query. It is possible to model a query for an aunt as a query containing a property path.

<syntaxhighlight lang="json">
{
"Query":{ "iri": "qr:AllAunts",
      "Match": "(aunt:im__Person)-[:im__isSisterOf]->(:im__RelatedPerson)
          -[:im__isParentOf]->(:im__RelatedPerson)",
      "Select": "aunt"}
}
</syntaxhighlight>

And thus the same objective is achieved without recourse to a reasoner, as this is a simple graph traversal.

Property chains are therefore not supported in the ontology.
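The point that no property-chain reasoning is needed can be illustrated with a plain traversal. This sketch (hypothetical family data, not the Discovery query engine) finds aunts as a two-edge walk, hasParent followed by hasSister:

```python
# Adjacency maps standing in for the graph edges
has_parent = {"Jack": ["Mary"], "Jill": ["Mary"]}
has_sister = {"Mary": ["Susan"]}

def aunts(person):
    """Sister of a parent, computed by simple two-hop graph traversal."""
    return [aunt
            for parent in has_parent.get(person, [])
            for aunt in has_sister.get(parent, [])]
```

A third cousin is just a longer walk of the same kind; the cost grows with path length, not with reasoner complexity.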
<br />
<br />
==Additional language constructs to OWL==
Given the full specification of OWL as available elsewhere, this section is limited to the additional elements of the semantic ontology language that help bring together the ontology and other parts of the model.
===The Concept ===
The fundamental persistent atomic unit of the Discovery information model is the concept.
A concept is defined as an ‘abstract idea’ or ‘general understanding of something’ and this meaning is preserved in the modelling language. It is one of the few abstract classes in the information model. This means that there is no actual object of 'type concept' unless it is also a type of some subtype of a concept.
In order to use a concept, a concrete type is required and there are 4 main types: Class, Object Property, Data property, and Data type. There are a few ancillary types also such as annotation properties.  Thus the things which are described within the information model, if they are supposed to persist permanently, are either classes, object properties, data properties, or data types.
Discovery follows the OWL2 language convention when defining these types. The concept itself sits "above" the OWL "Top Thing" class as it encompasses all these types. For convenience  a concept has a number of annotation properties. These do not give the concept more meaning, but are used to make them understandable and to indicate where the concepts came from.
The main feature of a concept is that, like OWL2 entities, it has a unique identifier in the form of an '''IRI'''. Either the full IRI or an abbreviated IRI is used, more commonly the latter.
A concept also comes with a fixed set of annotation properties that can be relied on to be present or to have null values:
'''Status''': a status concept representing the activity status of the concept, e.g. active or inactive
'''Name''': the full name of the concept (the preferred term in Snomed-CT). In OWL2 this is a label annotation
'''Description''': a plain language meaning of the concept, and how it may be used
'''Original code''': if the concept has a code, the code assigned to this concept by the original creator, e.g. a Snomed-CT, READ2, ICD10, OPCS, local or auto-generated code
'''Code scheme''': if the concept has a code, the code scheme assigned to this code, the scheme itself being an IRI
=== Use of Annotation properties===
Annotation properties are properties that provide information beyond that needed for reasoning. They form no part in ontological reasoning but, without them, the information model would be impossible for most people to understand. Annotation properties can also be used for implementation-supporting properties such as release status, version control, authoring dates and times, and so on.


Typical annotation properties are names and descriptions. They are also used as metadata, such as the status of a concept or the version of a document.
===Use of Identifier&nbsp;IRIs and codes===
Each concept within the information model has a unique '''identifier''' which is immutable once published.
The identifier is in the form of an '''IRI'''.
The nature and use of the IRI aligns with the W3C OWL2 specification (which itself aligns with W3C SPARQL). The use of the IRI is quite flexible and can be presented in one of the following ways:
==== Absolute IRI====
This may be used in any document or message that defines or references the concept. For example, in OWL functional syntax:
<pre>Declaration(Class(<http://www.discoverydataservice.org/InformationModel#DM_Encounter>))</pre>
In Discovery JSON syntax
<syntaxhighlight lang="json">
{"Class": [{"iri": "http://www.discoverydataservice.org/InformationModel#DM_Encounter"}]}
</syntaxhighlight>
====Abbreviated IRI====
This may be used in any document that declares a prefix within the document in order to resolve the IRI. For example, in OWL functional syntax:
<pre>Prefix(im:=<http://www.discoverydataservice.org/InformationModel#>)
Declaration(Class(im:DM_Encounter))</pre>
In Discovery syntax
<syntaxhighlight lang="json">
{"prefix": ":",
  "iri": "http://www.discoverydataservice.org/InformationModel#"}
{"Class": {"iri": ":DM_Encounter"}}
</syntaxhighlight>
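Resolution of an abbreviated IRI can be sketched as a simple prefix-table lookup. This is illustrative code, not part of the Discovery toolset, and the "#" fragment separator in the base IRI is an assumption taken from the absolute-IRI example above:

```python
# Declared prefixes (":" is the default prefix, as in the Discovery example)
prefixes = {
    "im": "http://www.discoverydataservice.org/InformationModel#",
    "":   "http://www.discoverydataservice.org/InformationModel#",
}

def expand(abbreviated):
    """Resolve 'prefix:LocalName' to a full IRI via the prefix table."""
    prefix, _, local = abbreviated.partition(":")
    return prefixes[prefix] + local
```

So `expand(":DM_Encounter")` and `expand("im:DM_Encounter")` both yield the same full IRI, which is what makes the abbreviated form safe to use in documents.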
<br />
==== Identifier format and hints====
From a machine perspective, identifiers contain no useful information as to their meaning or purpose and cannot be parsed, except to resolve them by one of the above mechanisms.
However, as ontologies arrive with many codes and formats, some of which overlap, it is useful to have a convention to differentiate the source of the concept even within the bounds of the namespace.
N.B. the differentiation MUST NOT be relied on in business logic, as there are no semantics within an identifier IRI.
The following are examples of how a unique identifier could be formed within a baseline IRI:
*'''RM_hasSubject''': RM_ hints that it is a Discovery data model relationship
*'''R2_H33''': R2_ hints that the concept is a READ2 code
*'''SN_47032000''': SN_ hints that the concept is a Snomed-CT concept


====Original codes and text ====
Many concepts are derived directly from source systems that used them as codes, or even free text.

Revision as of 12:55, 2 January 2021

Please note: the information in this section represents a specification of work in progress. Actual implementations of the language implement only parts of the grammars and syntaxes described here.

Purpose background and rationale

Question: Yet another language? Surely not.

Answer: No, or at least, not quite.

The following sections first describe the purpose of a modelling language, the background to the Discovery approach and the rationale behind the approach adopted.

Subsequently the sections break down the various aspects of the language at ever increasing granularity, emphasising the relationship between the language fragments and the languages from which they are derived, resulting in the definition of the grammar of the language. The relationship between the language and a model store is also described.

Purpose of the language.

The main purpose of a modelling language is to exchange data and information about information models in a way that both machines and humans can understand. A language must be able to describe the types of structures that make up an information model as can be seen on the right. Diagrams and pictures are all very well, but they cannot be used by machines.

It is necessary to support both human and machine readability so that a model can be validated both by humans and computers. Humans can read diagrams or text. Machines can read data bits. The two forms can be brought together as a stream of characters forming a language.

A purely human based language would be ambiguous, as all human languages are. A language that is both human and machine readable can be used to promote a shared understanding of often complex structures whilst enabling machines to process data in a consistent way.

It is almost always the case that a very precise machine readable language is hard for humans to follow and that a human understandable language is hard to compute consistently. As a compromise, many languages are presented in a variety of grammars and syntaxes, each targeted at different readers. The languages in this article all adopt a multi-grammar approach in line with this dual purpose.

Contributory languages

An information model can be modelled as a Graph i.e. a set of nodes and edges (nodes and relationships, nodes and properties). Likewise, health data can be modelled as a graph conforming to the information model graph.

The world standard approach to a language that models graphs is RDF, which considers a graph to be a series of interconnected triples, a triple consisting of subject, predicate and object. Thus the Discovery modelling language uses RDF as its fundamental basis, and can therefore be presented in the RDF grammars. The common grammars used in this article include TURTLE (Terse RDF Triple Language) and the more machine friendly JSON-LD (JSON Linked Data), which enables simple JSON identifiers to be contextualised in a way that one set of terms can map directly to internationally defined terms or IRIs.

RDF in itself holds no semantics whatsoever, i.e. no one can infer, validate or query based purely on an RDF structure. To use it, it is necessary to provide semantic definitions for certain predicates and adopt certain conventions to imply usage of the constructs. In providing those semantic definitions, the predicates themselves can then be used to semantically define many other things.

The three aspects alluded to above are covered by the logical inclusion of fragments, or profiles of a set of W3C semantic based languages described further in the sublanguages section of this article.

The language requirements and solutions

This section describes the approach to the design of the grammars of the Discovery super set languages, including how the sublanguages are incorporated into a single whole, without loss of meaning.

The requirements are divided into paragraphs and subsections, followed by a set of solution decisions which start from the basic fundamental requirements and end with the final grammar specification referenced in a separate article. The decisions made are as follows:

Data as a Graph

Health data can be conceptualised as a graph and thus the model of health data must be a graph model.

A graph is considered as a set of nodes and edges. For a graph to be valid, a node must have at least one edge, and an edge must be connected to at least two nodes. Thus the smallest graph must have at least 3 entities.
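The node/edge arithmetic above can be sketched over a triple set: one (subject, predicate, object) triple supplies two nodes and one edge, i.e. the three-entity minimum. Data here is hypothetical:

```python
# A graph stored as a set of (subject, predicate, object) triples
triples = {("Observation1", "hasSubject", "Patient1")}

def nodes(graph):
    """Subjects and objects are the nodes of the graph."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}

def edges(graph):
    """Each triple contributes one predicate-labelled edge."""
    return [(s, p, o) for s, p, o in graph]
```

A single triple therefore already satisfies the validity rule: every node has an edge and the edge connects two nodes.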

Consequently, the language used must support graph concepts, and the selected languages are based on graph concepts.

Human and machine readable

The language must be both human and machine readable in text form, using recognisable plain language characters in UTF-8. For human readability the characters read from left to right; for machine readability a graph is a character stream from beginning to end.

Optimised human legibility and optimised machine readability

These two are impossible to reconcile in a single grammar. Consequently two grammars are developed, one for human legibility and the other for optimised machine processing. However, both must be human and machine readable and translators are a pre-requisite.

Semantic translatability

A model presented in the human legible grammar must be translatable directly to the machine representation without loss of semantics. In the ensuing paragraphs the human optimised grammar is illustrated but in the final language specification both are presented side by side to illustrate semantic translatability.

Human oriented grammar

A language based grammatical approach is taken, with the English language sentence being the basis, namely the modelling of data via sentences consisting of subject, predicate and object in that order. Legibility (and machine parsing) is assisted by punctuation.

A terse approach is taken to grammar i.e. avoiding ambiguous flowery language. Thus RDF triples form the basis of the model.

Semantic triples

Predicates form the basis of semantic interpretation and are used as atomic entities that have identifiers. Predicate identifiers are recognisably related to their meaning but are given further definition via prose for background. For example, a predicate <http://...../dateOfBirth> is a property that holds a value that is a date of birth. Subjects and objects may also have identifiers, which may or may not be meaningful. As subjects and objects operate as nodes, and nodes require edges, predicates thus assume the role of edges in a graph.

To make sense of the language, subjects require constructs that include predicate object lists, object lists, and objects which can themselves be subjects defined by predicates. Put together with the terse language requirement, the grammar used in the human oriented language is TURTLE. The following snip illustrates the main TURTLE structures together with the punctuation:

Subject1
   Predicate1 Object1;
   Predicate2 Object2;                 # predicate object list separated by ';'
   Predicate3 (Object3, 
               Object4, 
               Object5);               #object list enclosed by '()'
   Predicate4 [                        #anonymous object with predicates enclosed []
                Predicate5 Object6;
                Predicate6 Object7
              ]
.                                       # end of sentence

Subject2 Predicate1 Object1.            #Simple triple                     
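The punctuation above flattens mechanically into plain triples: ';' repeats the subject, '()' repeats the predicate over an object list, and '[]' introduces an anonymous (blank) node with its own predicates. A small sketch of that expansion (a hypothetical helper, not the Discovery parser):

```python
import itertools

_blank_ids = itertools.count(1)

def flatten(subject, predicate_objects):
    """Expand nested predicate-object structures into flat triples."""
    triples = []
    for predicate, obj in predicate_objects:
        if isinstance(obj, list) and obj and isinstance(obj[0], tuple):
            node = "_:b%d" % next(_blank_ids)      # anonymous object [ ... ]
            triples.append((subject, predicate, node))
            triples.extend(flatten(node, obj))
        elif isinstance(obj, list):                 # object list ( ... )
            triples.extend((subject, predicate, o) for o in obj)
        else:                                       # simple triple
            triples.append((subject, predicate, obj))
    return triples

triples = flatten("Subject1", [
    ("Predicate1", "Object1"),
    ("Predicate3", ["Object3", "Object4", "Object5"]),
    ("Predicate4", [("Predicate5", "Object6"), ("Predicate6", "Object7")]),
])
```

The result is the same graph as the snip: the anonymous object becomes a generated blank node that links Subject1 to Predicate5 and Predicate6.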


Semantic context

The interpretation of a structure is often dependent on the preceding predicate. Because the language is semantically constrained to the profiles of the sublanguages, certain punctuation can be semantically interpreted in context. For example, as the language incorporates OWL EL but not OWL DL, object intersection (AND) is supported but not object union. In other words, in certain contexts there are ANDs but not ORs.

This allows for the use of a collection construct, for example in the following equivalent definitions of a grandfather, in the first example the grandfather is an equivalent to an intersection of a person and someone who is male and has children, and the second one is an intersection of a person, something that is male, and someone that has children. Both interpretations assume AND as the operator because OR is not supported at this point in OWL EL.

Grandfather
   isEquivalentTo (
              Person,                             #class 1
              [ hasGender Male;                   #class 2
                hasChild (Person,                 #class 2.1
                          hasChild Person)        #class 2.2  ]).

Which can also be illustrated as:

Grandfather
  isEquivalentTo (Person,                            #class 1
                   hasGender Male,                   #class  2
                   hasChild (Person,                 #class 3
                             hasChild Person)        #class 4
  ).

Machine oriented grammar

JSON is currently a popular syntax and is therefore used as an alternative.

JSON represents subjects, predicates and objects as object names and values, with values being either literals or objects.

JSON itself has no inherent mechanism for differentiating between different types of entities and therefore JSON-LD is used. In JSON-LD identifiers resolve initially to @id, and the use of @context enables prefixed IRIs and aliases.

The above Grandfather can be represented in JSON-LD (context not shown) as follows:

{"@id" : "Grandfather",
 "EquivalentTo" :[{ "@id":"Person"},
                  {"hasGender": {"@id":"Male"}},
                  {"hasChild": [{"@id":"Person"},
                                {"hasChild" : {"@id":"Person"}}]}]}

This is equivalent to the verbose version 2 syntax, showing explicit intersections and classes defined only by object property:

{"iri" : "Grandfather",
 "EquivalentTo" :[
    {"Intersection":[
       {"Class": {"iri":"Person"}},
       {"ObjectPropertyValue": {
            "Property": {"iri":"hasGender"},
            "ValueType": {"iri":"Male"}}},
       {"ObjectPropertyValue": {
            "Property": {"iri":"hasChild"},
            "Expression": {
                "Intersection":[
                   {"Class": {"iri":"Person"}},
                   {"ObjectPropertyValue": {
                       "Property": {"iri":"hasChild"},
                       "ValueType": {"iri":"Person"}}}]}}}]}]}


Language sublanguage grammar and syntax

Sub languages

The Discovery language, as a mixed language, has its own grammars, but in addition a number of W3C recommended languages can be used in their respective grammars and syntaxes for the elements of interest. This enables multiple levels of interoperability. For example, the information model contains an OWL2 ontology and can therefore be accessed via any of the OWL2 syntaxes. The information model also contains data models in the form of shapes and can therefore be accessed via SHACL. Queries can be exchanged via SPARQL, and expression constraints via ECL.

For those who want a consistent syntax encompassing the entire model, the Discovery grammar can be used; this is supported via TURTLE and JSON-LD, as well as JSON serialised Java classes.

Many specialised sublanguages overlap with others in a way that makes them difficult to translate to each other. For example, in ECL the reserved word MINUS (used to exclude certain subclasses from a superclass) could be mapped to the much more obscure OWL2 syntax that requires class IRIs to be "punned" as individual IRIs in order to properly exclude instances when generating lists of concepts. Likewise it could be mapped to SPARQL. The Discovery information model helps resolve this problem by supporting the sublanguages directly.

The following sublanguages are supported in TURTLE, JSON-LD or OWL Functional syntax:

  • OWL2, which is used for semantic definition and inference. In line with convention, only OWL2 EL is used, and thus existential quantification and object intersection can be assumed in its treatment of class expressions and axioms. The open world assumption inherent in OWL means it is very powerful for subsumption testing but cannot be used for constraints without abuse of the grammar.
  • SHACL, which is used for data modelling constraint definitions. SHACL can also include OWL constructs but its main emphasis is on cardinality and value constraints. It is an ideal approach for defining logical schemas, and because SHACL uses IRIs and shares conventions with other W3C recommended languages it can be integrated with the other two aspects. Furthermore, as some validation rules require quite advanced processing, SHACL can also include query fragments.
  • SPARQL and GraphQL, which are both used for query. SPARQL forms the basis of interoperable query. GraphQL, when presented in JSON-LD, is a pragmatic approach to extracting graph results via APIs, as its type and directive systems enable properties to operate as functions or methods. SPARQL is a more standard W3C query language for graphs but suffers from its own in-built flexibility (and an ambiguous issue with subqueries), making it hard to produce consistent results. Consequently a pragmatic SPARQL profile is supported, to the extent that it can be easily interpreted into SQL or other query languages. SPARQL with entailment regimes is in effect SPARQL query with OWL support.
  • RDF itself. RDF triples can be used to hold objects themselves, and an information model will hold many objects which are instances of the classes defined above (e.g. value sets and other instances).

The information modelling services used by Discovery can interoperate using the above sub-languages, but Discovery also includes a language superset making it easy to integrate. For example it is easy to mix OWL axioms with data model shape constraints as well as value sets without forcing a misinterpretation of axioms.

Foundation grammars and syntaxes

Discovery language has its own Grammars built on the foundations of the W3C RDF grammars:

  • A terse abbreviated language, TURTLE
  • JSON-LD representation, which can be used by systems that prefer JSON, wish to use standard approaches, and are able to resolve identifiers via the JSON-LD context structure.
  • Proprietary JSON based object serializable grammar. This directly maps to the internal class structures used in Discovery and can be used by client applications that have a strong contract with a server.

Identifiers aliasing and context

Concepts are identified and referenced by the use of Internationalized Resource Identifiers (IRIs).

Identifiers are universal and presented in one of the following forms:

  1. Full IRI (Internationalized Resource Identifier), which is the fully resolved identifier enclosed in <>
  2. Abbreviated IRI: a prefix followed by a ":" followed by the local name, which is resolved to a full IRI
  3. Aliases. The core language tokens (which are themselves concepts) have aliases for ease of use. For example, rdfs:subClassOf is aliased to subClassOf.

There is of course nothing to stop applications using their own aliases; when used with JSON-LD, @context may be used to enable them.
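Application-level alias resolution amounts to a context map from friendly tokens to full IRIs, with full IRIs passing through untouched. A sketch with illustrative entries (these two mappings mirror the alias declarations shown later in this section):

```python
# Context map: alias -> full IRI
context = {
    "subClassOf": "http://www.w3.org/2000/01/rdf-schema#subClassOf",
    "Class": "http://www.w3.org/2002/07/owl#Class",
}

def resolve(token):
    """Return the full IRI for an alias; IRIs and unknown tokens unchanged."""
    if token.startswith(("http://", "https://")):
        return token
    return context.get(token, token)
```

Because unknown tokens pass through unchanged, an application can layer its own aliases on top without breaking documents that use full IRIs.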

Data is considered to be linked across the world, which means that IRIs are the main identifiers. However, IRIs can be unwieldy and some languages, such as GraphQL, do not use them. Furthermore, when used in JSON (the main exchange syntax via APIs) they can cause significant bloat. Also, identifiers such as codes or terms have often been created for local use in single systems and, in isolation, are ambiguous.

To create linked data from local identifiers or vocabulary, the concept of Context is applied. The main forms of context in use are:

  1. PREFIX declarations for IRIs, which enable the use of abbreviated IRIs. This approach is used in OWL, RDF Turtle, SHACL and Discovery itself.
  2. VOCABULARY CONTEXT declarations for both IRIs and other tokens. This approach is used in JSON-LD, which converts local JSON properties and objects into linked data identifiers via the @context keyword, enabling applications that know their context to use simple identifiers such as aliases.
  3. MAPPING CONTEXT definitions for system-level vocabularies. These provide sufficient context to uniquely identify a local code or term by including details such as the health care provider, the system and the table within the system; in essence, a specialised class whose property values make up the context.
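As an illustration of the third form, a mapping context can be expressed as an instance of a specialised class whose property values pin down the provenance of a local code. The class and property names here (:MappingContext, :sourceSystem and so on) are hypothetical, not part of a published vocabulary:

```turtle
# Hypothetical mapping context: enough properties to make a
# local Cerner code unambiguous (names are illustrative).
:BartsCernerEventContext
    a :MappingContext ;
    :sourceOrganisation "Barts Health NHS Trust" ;
    :sourceSystem       "Cerner Millennium" ;
    :sourceTable        "Event" ;
    :sourceField        "Order" .
```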

The following is an example of the use of prefix directives for IRIs, and of alias definitions for some of the owl and rdfs tokens.

@prefix :  <http://www.DiscoveryDataService.org/InformationModel/Ontology#>.
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#>.
@prefix sn: <http://snomed.info/sct#>.


rdfs:subClassOf :alias ("subClassOf").

owl:Class  :alias ("Class").

:alias
   a owl:DatatypeProperty;
   :alias ("alias").

rdfs:label
    a owl:AnnotationProperty;
    :alias ("name" "label") .

owl:DatatypeProperty
    a owl:DatatypeProperty;
    :alias ("dataProperty") .

Once the aliases are defined, they can be used in an abbreviated Turtle syntax as well as the standard Turtle syntax. For example, the following Snomed-CT code is defined as a class and a subclass of another Snomed-CT code.

sn:407708003 a Class; name "Sample appearance (observable entity)"; subClassOf sn:407708003.

A JSON-LD equivalent of the above uses context to cover prefixes and aliases in a simpler manner:

{
  "@context" : {
    "alias" : {
      "@id" : "http://www.DiscoveryDataService.org/InformationModel/Ontology#alias",
      "@type" : "@id"
    },
    "@base" : "http://www.DiscoveryDataService.org/InformationModel/Ontology#",
    "" : "http://www.DiscoveryDataService.org/InformationModel/Ontology#",
    "rdf" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "sh" : "http://www.w3.org/ns/shacl#",
    "owl" : "http://www.w3.org/2002/07/owl#",
    "xsd" : "http://www.w3.org/2001/XMLSchema#",
    "rdfs" : "http://www.w3.org/2000/01/rdf-schema#"
  }
}

Resulting in the standard JSON-LD context-based approach:

 {"@id" : "sn:12345",
    "@type" : "owl:Class",
    "subClassOf" : "sn:34568"
  }

Ontology axiom vocabulary

For the purposes of reasoning, the semantic ontology axiom and class expression vocabulary uses the tokens and structure of the OWL2 profile OWL EL, which is itself a sublanguage of the OWL2 language.

However, some standard OWL2 DL axioms are also used, in order to provide a means of specifying additional relationships that are of value when defining concepts. The following lists the main OWL constructs used, with an example of each. Note that their aliases are used for brevity; please refer to the OWL2 specification for their meanings.

  • Class: an entity that is a class concept, e.g. a Snomed-CT concept or a general concept
  • ObjectProperty: 'hasSubject' (an observation has a subject that is a patient)
  • DataProperty: 'dateOfBirth' (a patient record has a date of birth attribute)
  • AnnotationProperty: 'description' (a concept has a description)
  • SubClassOf: Patient is a subclass of Person
  • EquivalentTo: Adverse reaction to Atenolol is equivalent to an adverse reaction to a drug AND has causative agent of Atenolol (substance)
  • DisjointWith: Father is disjoint with Mother
  • SubPropertyOf: 'has responsible practitioner' is a subproperty of 'has responsible agent'
  • Property chain: 'is sibling of' / 'is parent of' / 'has parent' is a sub-property chain of 'is first cousin of'
  • Inverse property: 'is subject of' is the inverse of 'has subject'
  • Transitive property: 'is child of' is transitive
  • Existential quantification: Chest pain AND finding site of {some} thoracic structure
  • Object intersection: Chest pain is equivalent to pain of truncal structure AND finding in region of thorax AND finding site of thoracic structure
  • Individual: all chest pain subclasses but not the specific instance of acute chest pain
  • DataType definition: Date time is a restriction on a string with a regex that allows approximate dates
  • Property domain: a property domain of 'has causative agent' is allergic reaction
  • Property range: a property range of 'has causative agent' is a substance
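The 'Chest pain' example above combines object intersection with existential quantification. Written out in standard Turtle it might look as follows; the class and property identifiers here are illustrative readable names rather than real Snomed-CT codes:

```turtle
# Illustrative only: a real definition would use Snomed-CT identifiers.
:ChestPain
    a owl:Class ;
    owl:equivalentClass [
        a owl:Class ;
        owl:intersectionOf (
            :PainOfTruncalStructure
            :FindingInRegionOfThorax
            [ a owl:Restriction ;                 # existential quantification
              owl:onProperty :findingSite ;
              owl:someValuesFrom :ThoracicStructure ]
        )
    ] .
```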


Use of Annotation properties

Annotation properties are properties that provide information beyond that needed for reasoning. They play no part in ontological reasoning, but without them the information model would be impossible for most people to understand. Annotation properties can also be used for implementation-supporting properties such as release status, version control, authoring dates and times, and so on.

Typical annotation properties are names and descriptions. They are also used as metadata, such as the status of a concept or the version of a document.
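For example, a concept's annotations might be expressed as follows; apart from rdfs:label, the property names and values here are illustrative rather than taken from a published vocabulary:

```turtle
# Illustrative annotation properties on a Snomed-CT concept.
sn:407708003
    rdfs:label   "Sample appearance (observable entity)" ;
    :description "The visual appearance of a specimen" ;  # illustrative
    :status      :Active ;                                # illustrative
    :version     "1" .                                    # illustrative
```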

Original codes and text

Many concepts are derived directly from source systems that used them as codes, or even free text.

As noted above, the concept class also contains properties indicating the source or origin of the concept.

The concept should therefore indicate the source and the original code or text (or combination) in the form actually entered into the source system. It should be noted that many systems do not record codes exactly as determined by an official classification, or provide codes via mappings from an internal id. It is the codes or text as used from the publisher's perspective that are used as the source.

Thus in many cases, it is convenient to auto generate a code, which is then placed as the value of the “code” property in the concept, together with the scheme. From this, the provenance of the code can be inferred.

Each code must have a scheme. A scheme may be an official scheme, a proprietary scheme, or a local scheme related to a particular sub-system.

For example, here are some scheme/code combinations:

Scheme | Original code/text/context | Concept code (auto code) | Meaning
Snomed-CT | 47032000 | 47032000 | Primary hydrocephaly
EMIS Read | H33-1 | H33-1 | Bronchial asthma
EMIS – EMIS | EMISNQCO303 | EMLOC_EMISNQCO303 | Confirmed corona virus infection
Barts/Cerner | Event/Order=687309281 | BC_687309281 | Tested for Coronavirus (misuse of code term in context)
Barts/Cerner | Event/Order=687309281, ResultTxt=SARS-CoV-2 RNA DETECTED | BC_dsdsdsdx7 | Positive coronavirus result

Note that in the last example the original code is actually text, and has been contextualised as being from the Cerner Event table, with the Order field having a value of 687309281 and the result text having a value of SARS-CoV-2 RNA DETECTED.
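The auto-coding idea can be sketched as follows. This is a minimal illustration under assumed conventions; the scheme-to-prefix map and the hashing rule for long contextualised text are hypothetical, not the published Discovery algorithm:

```python
import hashlib

# Hypothetical scheme -> code-prefix map (illustrative only).
SCHEME_PREFIX = {
    "EMIS": "EMLOC_",
    "BartsCerner": "BC_",
}

def auto_code(scheme: str, original: str) -> str:
    """Derive a concept code from a scheme and the original code/text.

    Short codes are kept verbatim behind the scheme prefix; long
    contextualised text is hashed to keep the generated code compact.
    """
    prefix = SCHEME_PREFIX[scheme]
    if len(original) <= 20 and " " not in original:
        return prefix + original
    return prefix + hashlib.sha1(original.encode()).hexdigest()[:12]

print(auto_code("EMIS", "EMISNQCO303"))       # EMLOC_EMISNQCO303
print(auto_code("BartsCerner", "687309281"))  # BC_687309281
```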


OWL2, like Snomed-CT, forms the logical basis for semantic definitions and the axioms used for inferencing. The OWL2 subsets of Discovery are available in the Discovery syntaxes or the OWL2 syntaxes.

In its usual use, OWL2 EL is used for reasoning and classification via the open world assumption. In effect this means that OWL2 can be used to infer X from Y, which forms the basis of most subsumption or entailment queries in healthcare.

Note: in theory, OWL2 DL can also be used to model property domains and ranges so that they may be used as editorial policies. Where classic OWL2 DL normally models the domains of a property in order to infer the class of a given entity, the same grammar could be used for editorial policies, i.e. only certain properties are allowed for certain classes. However, this represents a misuse of the OWL grammar (use of the syntax to mean something else), so SHACL is used for editorial policies instead.

For example, where OWL2 may say that one of the domains of 'causative agent' is an allergy (i.e. an unknown class with a property of causative agent is likely to be an allergy), in the modelling the editorial policy states that an allergy can only have the properties allowed by the property domain. Thus the Snomed MRCM could be modelled in OWL2 DL; however, the SHACL constructs targetObjectsOf and targetSubjectsOf are used as constraints instead.
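For instance, a constraint that whatever a 'causative agent' property points at must be a substance could be sketched in SHACL as follows (the non-SHACL property and class IRIs are illustrative):

```turtle
# Illustrative SHACL shape constraining the values of :causativeAgent.
:CausativeAgentShape
    a sh:NodeShape ;
    sh:targetObjectsOf :causativeAgent ;  # applies to every value of the property
    sh:class :Substance .                 # each value must be a :Substance
```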

Thus only existential quantification and object intersections are used for reasoning. Cardinality is likewise not required.

The ontology in theory supports the OWL2 syntaxes, such as the functional syntax and Manchester syntax, but can also be represented in JSON-LD or the Discovery JSON-based syntax as part of the full information modelling language. Of particular value is the inverse property axiom, as this can be used when examining data model properties.

Together with the query language, OWL2 also makes the language compatible with the Expression Constraint Language, which is the standard for specifying Snomed-CT expression queries.

The ontologies that are modelled are considered modular ontologies. It is not expected that one "mega ontology" would be authored, but rather that there would be maximum sharing of concept definitions (known as axioms), resulting in a super-ontology of modular ontologies.

Data modelling as shapes

Data models model classes and properties according to business purposes. This is a different approach from the open world assumption of semantic ontologies.

To illustrate the difference, take the modelling of a human being or person.

From a semantic perspective, a person could be said to be equivalent to an animal with a certain set of DNA (nuclear or mitochondrial), perhaps including the means of growth, or perhaps being defined at some point before, at the start of, or some time after the embryonic phase. One would normally just state that a person is an instance of Homo sapiens and that Homo sapiens is a species of.... etc.

From a data model perspective, we may wish to model a record of a person. We could say that a certain shape is "a record of" a person; in SHACL this is referred to as the "targetClass". The shape will have one date of birth, one current gender, and perhaps a main residence. Cardinality is expected.
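Sketched in SHACL, the record-of-a-person shape with its expected cardinalities might look like this (the non-SHACL property IRIs are illustrative):

```turtle
# Illustrative SHACL shape for a record of a person.
:PersonRecordShape
    a sh:NodeShape ;
    sh:targetClass :Person ;               # the shape is "a record of" a person
    sh:property [
        sh:path :dateOfBirth ;
        sh:datatype xsd:date ;
        sh:minCount 1 ; sh:maxCount 1 ] ;  # exactly one date of birth
    sh:property [
        sh:path :currentGender ;
        sh:maxCount 1 ] ;                  # one current gender
    sh:property [
        sh:path :mainResidence ;
        sh:maxCount 1 ] .                  # perhaps a main residence
```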

SHACL is used inherently, although consideration is given to its community cousin ShEx.

The difference is between the open and closed worlds: the model of the person is a constraint on the possible (unlimited) properties of a person.

A particular data model is a particular business-oriented perspective on a set of concepts. As there are potentially thousands of different perspectives (e.g. a GP versus a geneticist), there is a potentially unlimited number of data models. All the data models modelled in Discovery share the same atomic concepts and the same semantic ontological definitions across ontologies where possible; where not, mapping relationships are used.

The binding of a data model to its property values is based on a business-specific model. For example, a standard FHIR resource will map directly to the equivalent data model class, property and value set, whose meaning is defined in the semantic ontology, but the same data may be carried in a non-FHIR resource without loss of interoperability.

A common approach to modelling and the use of a standard approach to ontology, together with modularisation, mean that any sending or receiving machine which uses concepts from the semantic ontology can achieve full semantic interoperability. If both machines use the same data model for the same business, the data may be presented in the same relationships; if the two machines use different data models for different businesses, they may present the data in different ways, but without any loss of meaning or query capability.

The integration between data model shapes and ontological concepts makes the information model very powerful and is the single most important contributor to semantic interoperability.

Data mapping

This part of the language is used to define mappings between the data model and an actual schema, enabling queries and filters to cope automatically with the ever-extending ontology and data properties.

This is part of the semantic ontology but uses the idea of context (described above).

Query

It is fair to say that data modelling and semantic ontology are useless without the means of query.

The current approach to the specification of query uses the GraphQL approach, with type extensions and directive extensions.

GraphQL (despite its name) is not in itself a query language, but a way of representing the graph-like structure of an underlying model that has been built using OWL. GraphQL has a very simple class/property representation, is ideal for REST APIs, and its results are JSON objects, in line with the approach taken by the Discovery syntax above.

Nevertheless, GraphQL considers properties to be functions (higher-order logic), and therefore properties can accept parameters. For example, a patient's average systolic blood pressure reading could be considered a property with a single parameter: a list of the last 3 blood pressure readings. Parameters are types, and types can be created and extended.

In addition, GraphQL supports the idea of directive extensions, which further extend the grammar.

Thus GraphQL's capability is extended by enabling property parameters as types, supporting such things as filtering, sorting and limiting in the same way as any other query language. Subqueries are supported in the same way.
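A hypothetical query using property parameters for filtering and limiting might look like this; the field and argument names are illustrative, not part of a published Discovery API:

```graphql
# Illustrative only: field and argument names are hypothetical.
query {
  patient(id: "12345") {
    averageSystolicBP(last: 3)                # a property taking a parameter
    observations(
      filter: { concept: "sn:271649006" }     # systolic BP readings
      orderBy: DATE_DESC
      limit: 3
    ) {
      effectiveDate
      value
    }
  }
}
```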

GraphQL itself is used when the enquirer is familiar with the local logical schema, i.e. understands the available types and fields. In order to support semantic web concepts, an extension to GraphQL, GraphQL-LD, is used, which is essentially GraphQL with a JSON-LD context.

GraphQL-LD has been chosen over SPARQL for reasons of simplicity, and many now consider GraphQL to be a de-facto standard. However, this is an ongoing consideration.

ABAC language

Main article : ABAC Language

The Discovery attribute-based access control language is presented as a pragmatic JSON-based profile of the XACML language, modified to use the information model query language (SPARQL) to define policy rules. ABAC attributes are defined in the semantic ontology in the same way as all other classes and properties.
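A policy rule in such a JSON profile might be shaped along these lines; all field names and values here are illustrative, not the actual grammar, for which see the ABAC Language article:

```json
{
  "policy": "OrganisationDataAccess",
  "rule": {
    "effect": "PERMIT",
    "target": { "attribute": "role", "value": ":SeniorClinician" },
    "condition": {
      "match": {
        "property": ":organisation",
        "value": { "var": "user.organisation" }
      }
    }
  }
}
```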

The language is used to support some of the data access authorisation processes as described in the specification - Identity, authentication and authorisation .

This article specifies the scope of the language, the grammar and the syntax, together with examples. Whilst presented as a JSON syntax, in line with other components of the information modelling language, the syntax can also be accessed via the ABAC XML schema, which includes the baseline information model XSD schema on the Endeavour GitHub; example content can be viewed in the information manager data files folder.