Health Information modelling language - overview

From Endeavour Knowledge Base
Revision as of 09:43, 25 May 2020 by DavidStables (talk | contribs)

The Discovery Data Modelling Language (or DIMOL for short) is a mixed language that subsumes 5 sublanguages, brought together under a common grammar and an optional syntax.

The purpose of the language is to define a health information model in a way that supports implementations of the model using different database technologies and query languages.

Sublanguages.png

Each sublanguage is either a single recognised standards-based language or an amalgam of other known languages or structural concepts, selected as the closest fit to the information requirements that the Discovery health information model is designed to support. A single grammar and an optional single syntax enable the model to operate in an integrated manner, while still allowing the sublanguages to be represented in their native standard languages.

Like UMLS, the sublanguages share a common cross-reference link: the concept, which is identified by a unique identifier (an Internationalised Resource Identifier, or IRI). A concept is usually named and defined semantically, and forms the means of traversing the model from different starting points to different end points, for different purposes.
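As a minimal sketch of this idea, the following Python shows a single IRI acting as the cross-reference key between artefacts written in different sublanguages. The `http://example.org/im#` namespace, the `Concept` and `ConceptIndex` names, and the artefact labels are all hypothetical and are not part of the Discovery language.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A concept identified by an IRI, with an optional human-readable name."""
    iri: str
    name: str = ""


class ConceptIndex:
    """Records which sublanguage artefacts refer to each declared concept."""

    def __init__(self) -> None:
        self._concepts: dict[str, Concept] = {}
        self._references: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def declare(self, concept: Concept) -> None:
        self._concepts[concept.iri] = concept

    def reference(self, iri: str, sublanguage: str, artefact: str) -> None:
        # Only declared concepts may be cross-referenced.
        if iri not in self._concepts:
            raise KeyError(f"Undeclared concept: {iri}")
        self._references[iri].add((sublanguage, artefact))

    def references_to(self, iri: str) -> list[tuple[str, str]]:
        return sorted(self._references[iri])


EX = "http://example.org/im#"  # hypothetical namespace

index = ConceptIndex()
index.declare(Concept(EX + "BodySite", "Body site"))
index.reference(EX + "BodySite", "ontology", "class definition")
index.reference(EX + "BodySite", "data model", "Procedure.bodySite range")
index.reference(EX + "BodySite", "value set", "ProcedureSites member")

links = index.references_to(EX + "BodySite")
```

Starting from the one IRI, `links` lists every place the concept is used, which is the traversal from different starting points that the paragraph above describes.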

Why another language?

There are hundreds of computer languages and thousands of natural languages, many of which are accepted as "standards" in many communities. Why have another?

Within the health informatics community, a historical separation has evolved between two modelling camps: those who model the semantics of concepts via an ontology (aka terminologists), and those who model data structures for storing and transmitting data (aka structuralists). This separation reflects the difference in purpose, the difference in mindset, and the difference in skills required by the two disciplines.

A problem with this separation occurs at the points of overlap between the two: e.g. properties and values. Both camps model their classes with properties and ranges (the allowable values of properties). In many cases, properties can be modelled either within the semantic space, or within the structural space, or both.

For example, it is possible to model a surgical operation as a data structure with a body site attribute; the FHIR R4 Procedure resource does precisely this. It is equally possible to model a procedure by including the body site either as a qualifier of a type of procedure, or as part of the procedure definition itself; Snomed-CT does precisely this. Both approaches can use the same concept for the body site itself, but they would use separate property concepts for the "has body site" property. This separation of approach can lead to massive divergences. Taking the structuralist approach and extending it results in archetypes of the kind modelled by OpenEHR. Taking the ontological approach further leads to complex nested expressions which are nigh on impenetrable.

Having a grammar and syntax that encompass both semantics and structure makes the common overlapping concepts much easier to manage. A classic structural concept, such as an encounter record, and its semantic definition can be seamlessly integrated. Using an existing language is one option.

For example, it is possible to model data in OWL2 DL by extensive use of complex OWL constructs, including functional properties, property domains, ranges and exact cardinality. It is also possible to model queries as OWL expressions, except for function parameters. However, the purpose of OWL is to support reasoning, and reasoners use the open world assumption. Data models and data queries use a closed world assumption, and query languages are declarative in nature, i.e. instructions as to what to do. Using OWL for purposes other than reasoning is like using English to prove Pythagoras' theorem.
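The difference between the two assumptions can be illustrated with a small Python sketch. The facts and the function names are hypothetical; the point is only that a query over stored data treats an absent fact as false, whereas an open world reading treats it as unknown.

```python
from typing import Optional

# Hypothetical asserted facts, held as (subject, property, value) triples.
FACTS = {
    ("patient1", "hasAllergy", "penicillin"),
}


def holds_closed_world(fact: tuple) -> bool:
    """Closed world: anything that is not asserted is taken to be false."""
    return fact in FACTS


def holds_open_world(fact: tuple, known_false: frozenset = frozenset()) -> Optional[bool]:
    """Open world: an unasserted fact is unknown (None) unless it is
    explicitly known to be false."""
    if fact in FACTS:
        return True
    if fact in known_false:
        return False
    return None


query = ("patient1", "hasAllergy", "latex")
closed_answer = holds_closed_world(query)  # absent, so treated as false
open_answer = holds_open_world(query)      # absent, so unknown
```

A data query asking "which patients have no latex allergy" needs the closed world answer, while a reasoner cannot conclude anything from the absence of the fact.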

Thus it seems sensible to keep these separate. However, in doing so the separation continues, and as time goes on there is some divergence. Bringing them together, at least as a temporary measure to solve a particular set of information requirements, seems worthwhile. Hence a new language.

Semantic ontology

The semantic ontology language is part of the Discovery information modelling language.

The language used for the Discovery ontology is an OWL2 DL profile with some limitations, making it OWL2 RL with the addition of exact cardinality (for use in the closed world data model) and object union in ranges (for use in value sets).
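To illustrate why exact cardinality matters for a closed world data model, the sketch below checks a record against exact cardinality constraints. The `ExactCardinality` class, the `violations` function and the `ex:` property names are hypothetical and are not taken from the Discovery implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ExactCardinality:
    """A property that must occur exactly `count` times on a record."""
    property_iri: str
    count: int


def violations(record: dict[str, list], constraints: list[ExactCardinality]) -> list[str]:
    """Closed world check: a missing or surplus value is reported as an error."""
    problems = []
    for constraint in constraints:
        actual = len(record.get(constraint.property_iri, []))
        if actual != constraint.count:
            problems.append(
                f"{constraint.property_iri}: expected exactly "
                f"{constraint.count}, found {actual}"
            )
    return problems


constraints = [
    ExactCardinality("ex:subject", 1),
    ExactCardinality("ex:effectiveDate", 1),
]
record = {"ex:subject": ["ex:patient1"], "ex:effectiveDate": []}

problems = violations(record, constraints)
```

Under the open world assumption the missing `ex:effectiveDate` would not be an inconsistency, since a reasoner would only conclude that some unstated value exists. Treating it as a validation error is the closed world reading used for data models.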

An ontology is concerned with the meaning of things, and defines that meaning in a way that allows inferences to be made about other things. An ontology designed with a language such as OWL2 allows machines to make the sophisticated inferences required to support query and decision support. OWL2 itself is based on a variant of Description Logic, a family of logics that has underpinned machine reasoning for decades.

Ontologies have been used in health information systems for many years and, more recently, the emergence of Snomed-CT as the de facto health terminology has illustrated the potential power of the description logic that underpins OWL.

The Discovery implementation supports the official OWL2 functional syntax but also provides a JSON-based syntax, “Discovery Syntax”, which absorbs language constructs for other parts of the model and can thus be used as the language for all of the IM components.

The Ontology language describes four main structural types which cut across the information model content parts as described above, which are:

Ontology.jpg

 

Concepts classes and properties

Main article : Concepts classes and properties

Concepts, classes and properties form the building blocks used by the language and are declared in an ontology.

Value sets

Main article : Value sets

The language supports three forms for representing value sets: OWL2 Discovery syntax, OWL2 functional syntax, and Expression Constraint Language.
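Whichever form is used, a value set of this kind typically resolves to a set of member concepts. As a rough sketch, the Python below expands a value set defined as a union of concepts together with their descendants, which corresponds to combining an object union with the descendant-or-self operator (`<<`) of Expression Constraint Language. The hierarchy, the `ex:` identifiers and the function names are hypothetical.

```python
# Hypothetical is-a hierarchy: each child maps to its direct parents.
PARENTS = {
    "ex:Asthma": {"ex:RespiratoryDisorder"},
    "ex:AllergicAsthma": {"ex:Asthma"},
    "ex:COPD": {"ex:RespiratoryDisorder"},
    "ex:Type2Diabetes": {"ex:Diabetes"},
}


def descendants_or_self(concept: str) -> set:
    """Return the concept and everything beneath it in the hierarchy."""
    result = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for child, parents in PARENTS.items():
            if current in parents and child not in result:
                result.add(child)
                frontier.append(child)
    return result


def expand_value_set(roots: list) -> set:
    """Union of the descendant-or-self sets of each root concept."""
    members = set()
    for root in roots:
        members |= descendants_or_self(root)
    return members


value_set = expand_value_set(["ex:Asthma", "ex:Diabetes"])
```

Here `ex:COPD` is excluded because it sits beside `ex:Asthma` in the hierarchy and not beneath it.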

Data model

This part of the language is used to define a data model. Data model content may be automatically generated from the OWL2 representation of the data model, with the result that a much simpler syntax can be used. The data modelling language relies on the OWL2 language for managing subclasses, sub properties, domains, ranges, cardinality. Thus in Discovery the language is really just a simplified output syntax from the more sophisticated description logic.
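One way to picture this generation step is as flattening: subclass axioms and property restrictions are resolved into a plain list of properties per entity. The sketch below assumes a hypothetical in-memory `AXIOMS` structure with single inheritance and `(range, min, max)` tuples; it is an illustration only and does not describe the actual Discovery generator.

```python
# Hypothetical ontology axioms: each class names its parent (or None) and
# its own properties as property -> (range, min cardinality, max cardinality).
AXIOMS = {
    "ex:Entry": {
        "parent": None,
        "properties": {"ex:subject": ("ex:Patient", 1, 1)},
    },
    "ex:Encounter": {
        "parent": "ex:Entry",
        "properties": {"ex:period": ("ex:Period", 0, 1)},
    },
}


def flatten(class_iri: str) -> dict:
    """Collect a class's own and inherited properties into one flat mapping."""
    chain = []
    current = class_iri
    while current is not None:
        chain.append(current)
        current = AXIOMS[current]["parent"]
    properties = {}
    # Apply the root first so that a subclass can refine an inherited property.
    for iri in reversed(chain):
        properties.update(AXIOMS[iri]["properties"])
    return properties


encounter_model = flatten("ex:Encounter")
```

The flattened result is the kind of simplified output a data model consumer would read, with no need to follow the subclass hierarchy.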

Ontology purists will notice that modelling a data model in OWL2 is in fact a breach of the fundamental open world assumption taken in ontologies, and applies the closed world assumption (https://en.wikipedia.org/wiki/Closed-world_assumption) instead. Consequently, a data model would normally be used independently of DL reasoners, and therefore a syntax that separates the two is provided.

Data set definition language

This language is designed to define the building of data sets from an underlying data model and semantic ontology, supplemented with value sets defined using the value set class, itself modelled in OWL2 syntax.

 

Schema implementation mapping

This part of the language is used to define mappings between the data model and an actual schema, to enable queries and filters to cope automatically with the ever-extending ontology and data properties.

The language can be used to auto generate starter schemas for implementation i.e. schemas that will then be optimised for real world use.
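As a minimal sketch of what generating a starter schema could look like, the Python below turns a simple entity description into a `CREATE TABLE` statement and checks it against an in-memory SQLite database. The `starter_ddl` function, the `SQL_TYPES` mapping and the entity description format are assumptions made for illustration, not the Discovery mapping language.

```python
import sqlite3

# Hypothetical mapping from data model datatypes to SQLite column types.
SQL_TYPES = {"string": "TEXT", "integer": "INTEGER", "date": "TEXT"}


def starter_ddl(entity: str, properties: dict) -> str:
    """Build a CREATE TABLE statement from property -> (datatype, required)."""
    columns = ["id INTEGER PRIMARY KEY"]
    for name, (datatype, required) in properties.items():
        null_clause = " NOT NULL" if required else ""
        columns.append(f"{name} {SQL_TYPES[datatype]}{null_clause}")
    return f"CREATE TABLE {entity} (\n  " + ",\n  ".join(columns) + "\n)"


ddl = starter_ddl(
    "encounter",
    {"patient_id": ("integer", True), "start_date": ("date", False)},
)

# Run the generated statement to confirm that it is valid SQL.
connection = sqlite3.connect(":memory:")
connection.execute(ddl)
column_names = [row[1] for row in connection.execute("PRAGMA table_info(encounter)")]
```

A generated schema like this is only a starting point; indexes, partitioning and denormalisation would be added when optimising for real world use.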

One significant mapping capability is the use of the entity subtype attribute which is the means by which a relational model with triple extensions can cope with any extension to a core data model entity via subtyping or subcomponent constraining.
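The sketch below shows one possible reading of this arrangement, using SQLite from the Python standard library: a core table carries an entity subtype column, and a companion triple table holds any properties that the subtype adds beyond the core columns. The table names, column names and `ex:` identifiers are hypothetical, and the real Discovery schemas may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE observation (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL,
    entity_subtype TEXT          -- IRI of the subtype, NULL for the core type
);
CREATE TABLE observation_triple (
    subject_id INTEGER NOT NULL REFERENCES observation(id),
    property_iri TEXT NOT NULL,  -- extension property not in the core table
    value TEXT NOT NULL
);
""")

conn.execute("INSERT INTO observation VALUES (1, 42, 'ex:BloodPressure')")
conn.executemany(
    "INSERT INTO observation_triple VALUES (?, ?, ?)",
    [(1, "ex:systolic", "120"), (1, "ex:diastolic", "80")],
)


def load(connection: sqlite3.Connection, observation_id: int) -> dict:
    """Merge the core row with its extension triples into one record."""
    row = connection.execute(
        "SELECT id, patient_id, entity_subtype FROM observation WHERE id = ?",
        (observation_id,),
    ).fetchone()
    record = {"id": row[0], "patient_id": row[1], "entity_subtype": row[2]}
    triples = connection.execute(
        "SELECT property_iri, value FROM observation_triple WHERE subject_id = ?",
        (observation_id,),
    )
    for property_iri, value in triples:
        record[property_iri] = value
    return record


blood_pressure = load(conn, 1)
```

With this layout a new subtype needs no change to the core table: it is recorded as a new value in `entity_subtype`, and its additional properties are stored as extra triples.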