Health Information modelling language - overview

From Endeavour Knowledge Base
Revision as of 08:40, 4 August 2020 by DavidStables

The Discovery Information Modelling Language is a mixed language that subsumes 5 standard sublanguages, brought together under a common grammar and an optional syntax.

The purpose of the language is to define a health information model in a way that supports implementations of the model using different database technologies and query languages.

[Figure: Sublanguages]

Each sublanguage is based on the grammar of a single recognised, standards-based language. Each of these languages was selected as the closest fit to the information requirements that the Discovery health information model is designed to support. A single grammar and an optional single syntax enable the model to operate in an integrated manner, while still allowing each sublanguage to be represented in its native standard language.

Like UMLS, the sublanguages share a common cross-reference link, the concept, which is identified by a unique Internationalized Resource Identifier (IRI). A concept is usually named and semantically defined. It is the means of traversing the model from different starting points to different end points, for different purposes.

Why another language?

There are hundreds of computer languages and thousands of natural languages, many of which are accepted as "standards" in many communities. Why have another?

Within the health informatics community, a historical separation has evolved between two modelling camps: those who model the semantics of concepts via an ontology (aka terminologists), and those who model data structures for storing and transmitting data (aka structuralists). This separation reflects differences in purpose, in mindset and in the skills each discipline requires. Furthermore, across many industry-wide specialisms, similar fundamental requirements have been approached using specialised languages that appear to overlap and even conflict.

The problem with this separation arises at the points of overlap, where the two camps model their tokens and vocabularies differently in terms of both grammar and syntax.

For example, in health care it is possible to model a surgical operation as a data structure with a body site attribute; the FHIR R4 Procedure resource does precisely this. It is equally possible to model a procedure by including the body site either as a qualifier of a type of procedure or as part of the procedure definition itself; Snomed-CT does precisely this. Both approaches can use the same concept for the body site itself, but each uses its own property concept for "has body site". This difference in approach can lead to massive divergence. Extending the structuralist approach results in archetypes of the kind modelled by OpenEHR, while taking the ontological approach further leads to complex nested expressions that are nigh on impenetrable.
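To make the divergence concrete, the Python sketch below writes the same appendix procedure both ways as triples. All IRIs under `http://example.org/` are hypothetical placeholders, not real FHIR or Snomed-CT identifiers. The point is only that the body site concept is shared while the two property concepts differ.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

APPENDIX = EX + "concept/AppendixStructure"  # body site concept shared by both views

# Structuralist view: a procedure record whose bodySite attribute holds the site.
structural = [
    (EX + "record/42", EX + "fhir/Procedure.code", EX + "concept/Appendicectomy"),
    (EX + "record/42", EX + "fhir/Procedure.bodySite", APPENDIX),
]

# Ontological view: the site is part of the procedure concept's own definition.
ontological = [
    (EX + "concept/Appendicectomy", RDFS_SUBCLASS_OF, EX + "concept/Procedure"),
    (EX + "concept/Appendicectomy", EX + "sct/procedureSite", APPENDIX),
]


def predicates_for(triples, obj):
    """Return the set of predicates whose object is the given concept."""
    return {p for (_, p, o) in triples if o == obj}


struct_props = predicates_for(structural, APPENDIX)
onto_props = predicates_for(ontological, APPENDIX)
print(struct_props)               # {'http://example.org/fhir/Procedure.bodySite'}
print(onto_props)                 # {'http://example.org/sct/procedureSite'}
print(struct_props & onto_props)  # set()
```

Both views reach the same body site concept, but nothing links the two property concepts unless a model states that they mean the same thing.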

Health record queries can be written in a standard language such as SPARQL, or in a specialised query language such as AQL (the openEHR Archetype Query Language). However, when querying the attributes of a user as part of an attribute-based access control (ABAC) policy, a completely different representation of query may be used.

A grammar and syntax that encompass both semantics and structure make the common, overlapping concepts much easier to manage. A common syntax for query definition means that a rule in an ABAC policy can use the same syntax as a health record query. A common message format, in line with interoperability standards such as FHIR, ensures that data is never locked in an information silo. A classic structural concept, such as an encounter record, can be seamlessly integrated with its semantic definition.

Selecting a single existing language is not an option. For example, it is possible to model data in OWL2 DL through extensive use of complex OWL constructs, including functional properties, property domains and ranges, and exact cardinalities. It is also possible to model queries as OWL expressions, except for function parameters. However, the purpose of OWL is to support reasoning, and reasoners use the open world assumption. Data models and data queries use a closed world assumption, and query languages are declarative in nature, i.e. they state what data is required. Using OWL for purposes other than reasoning and/or classification is like using English to prove Pythagoras' theorem.
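The difference between the two assumptions can be shown with a toy lookup. This Python sketch is a deliberate simplification: a real OWL reasoner derives entailments rather than merely looking facts up, and the patient facts are hypothetical. A closed world query treats an unrecorded allergy as absent, whereas under the open world assumption its status is simply unknown unless a negative fact has been asserted.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical asserted facts, plus explicitly asserted negative facts.
facts = {("patient1", "hasAllergy", "penicillin")}
negated = {("patient3", "hasAllergy", "penicillin")}


def ask_closed_world(triple):
    """Closed world (data query): anything not recorded is taken to be false."""
    return Answer.TRUE if triple in facts else Answer.FALSE


def ask_open_world(triple):
    """Open world (reasoning): absence of a fact does not make it false."""
    if triple in facts:
        return Answer.TRUE
    if triple in negated:
        return Answer.FALSE
    return Answer.UNKNOWN


q = ("patient2", "hasAllergy", "penicillin")
print(ask_closed_world(q))  # Answer.FALSE
print(ask_open_world(q))    # Answer.UNKNOWN
```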

Bringing the languages together, at least as a temporary measure to solve a particular set of information requirements, seems worthwhile. Hence a new language.

Ontology sublanguage

Main article: Discovery semantic ontology language

The semantic ontology language is part of the Discovery information modelling language.

The grammar of the semantic ontology language used for the Discovery ontology is OWL2 EL, which is a limited profile of OWL2 DL. Data modelling and value set modelling use OWL2 DL, because they require its more expressive constructs.

The ontology therefore supports the standard OWL2 syntaxes, such as the Functional syntax and the Manchester syntax, and also the Discovery JSON-based syntax that forms part of the full information modelling language.
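As an illustration of one axiom carried in several syntaxes, the Python sketch below renders a SubClassOf axiom with an existential restriction (a construct permitted in OWL2 EL) in OWL2 Functional and Manchester syntax. The class and property names are hypothetical. The JSON-like rendering is an invented shape for comparison only and is not the actual Discovery JSON syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExistentialSubClassAxiom:
    """SubClassOf(sub, ObjectSomeValuesFrom(prop, filler)), allowed in OWL2 EL."""

    sub: str
    prop: str
    filler: str

    def functional(self) -> str:
        # OWL2 Functional syntax; assumes the default ':' prefix has been declared.
        return f"SubClassOf(:{self.sub} ObjectSomeValuesFrom(:{self.prop} :{self.filler}))"

    def manchester(self) -> str:
        # OWL2 Manchester syntax class frame for the same axiom.
        return f"Class: {self.sub}\n    SubClassOf: {self.prop} some {self.filler}"

    def to_json_like(self) -> dict:
        # Hypothetical JSON shape for illustration; NOT the Discovery JSON syntax.
        return {"iri": self.sub,
                "subClassOf": {"property": self.prop, "some": self.filler}}


axiom = ExistentialSubClassAxiom("Appendicectomy", "hasBodySite", "AppendixStructure")
print(axiom.functional())
print(axiom.manchester())
print(axiom.to_json_like())
```

Because all three renderings are generated from one in-memory axiom, they cannot drift apart. That is the benefit of a common grammar with interchangeable syntaxes.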

Together with the query language, OWL2 DL also makes the language compatible with the Expression Constraint Language (ECL), the standard for specifying Snomed-CT expression queries.
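To show how ECL and a SPARQL-based query language can meet, the Python sketch below translates the simplest ECL constraints into SPARQL 1.1 property paths. The `<<` (descendantOrSelfOf) and `<` (descendantOf) operators and the `http://snomed.info/id/` namespace are standard. The rest is assumed: that subsumption is stored as `rdfs:subClassOf` triples, and that only a single focus concept is supported (no refinements, conjunctions or other operators).

```python
import re

SCT = "http://snomed.info/id/"
# Assumption: subsumption is materialised as rdfs:subClassOf triples.
RDFS_SUBCLASS = "<http://www.w3.org/2000/01/rdf-schema#subClassOf>"

# Optional constraint operator, a concept id, an optional |term|.
ECL_SIMPLE = re.compile(r"^\s*(<<|<)?\s*(\d+)\s*(\|[^|]*\|)?\s*$")


def ecl_to_sparql(ecl: str) -> str:
    """Translate one simple ECL constraint into a SPARQL SELECT (sketch only)."""
    match = ECL_SIMPLE.match(ecl)
    if match is None:
        raise ValueError(f"unsupported ECL: {ecl!r}")
    operator, concept_id, _term = match.groups()
    target = f"<{SCT}{concept_id}>"
    if operator == "<<":   # descendantOrSelfOf: zero or more subClassOf steps
        pattern = f"?concept {RDFS_SUBCLASS}* {target} ."
    elif operator == "<":  # descendantOf: one or more steps (excludes self if acyclic)
        pattern = f"?concept {RDFS_SUBCLASS}+ {target} ."
    else:                  # no operator: the concept itself
        return f"SELECT ?concept WHERE {{ VALUES ?concept {{ {target} }} }}"
    return f"SELECT ?concept WHERE {{ {pattern} }}"


print(ecl_to_sparql("<< 404684003 |Clinical finding|"))
```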

Ontology purists will notice that modelling a data model in OWL2 in fact breaches the fundamental open world assumption taken in ontologies, and applies the closed world assumption (https://en.wikipedia.org/wiki/Closed-world_assumption) instead. Consequently, a data model would normally be used independently of DL reasoning.

Furthermore, because data models are built for business purposes and semantic models are built for reasoning purposes, the two are connected in a particular style: via an object property, "is type".
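The Python sketch below shows how an "is type" link keeps the two models separate while still connecting them. Business data model classes point to ontology concepts, and a query for a concept also finds data classes typed by its subtypes. All names are hypothetical, and `ex:isType` merely stands in for the "is type" property.

```python
IS_TYPE = "ex:isType"  # hypothetical stand-in for the "is type" object property
SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    # Data model (business) classes, each linked to a semantic concept.
    ("ex:EncounterRecord", IS_TYPE, "ex:Encounter"),
    ("ex:GPConsultationRecord", IS_TYPE, "ex:GPConsultation"),
    # Semantic model (ontology) hierarchy, kept separate from the data model.
    ("ex:GPConsultation", SUBCLASS_OF, "ex:Encounter"),
}


def ancestors_or_self(concept):
    """The concept plus all its superclasses, following subClassOf transitively."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(o for (s, p, o) in triples if s == current and p == SUBCLASS_OF)
    return seen


def data_classes_of_type(concept):
    """Data model classes whose 'is type' concept is the given concept or a subtype."""
    return {s for (s, p, o) in triples if p == IS_TYPE and concept in ancestors_or_self(o)}


print(sorted(data_classes_of_type("ex:Encounter")))
# ['ex:EncounterRecord', 'ex:GPConsultationRecord']
```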

Data definition (query) sublanguage

Data models, concept definitions and objects are all modelled using the graph paradigm. As a result, all content can be viewed as semantic triples, each consisting of a subject, a predicate and an object.

A standard language for querying triples, SPARQL, already exists. It is a very extensive language, albeit less expressive than SQL. However, the majority of interoperable health queries can be expressed in a fairly limited subset of SPARQL, so a subset of SPARQL has been selected as the means of modelling data definitions and queries in Discovery.
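Whatever the exact subset, its core is likely to be the basic graph pattern: a conjunction of triple patterns whose variables must bind consistently. The stdlib-only Python sketch below evaluates one over in-memory triples. It is a teaching aid rather than a SPARQL engine (no FILTER, OPTIONAL or prefix handling), and the `ex:` data is hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):  # variable: bind it, or check the existing binding
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:      # constant: must match exactly
            return None
    return extended


def select(patterns: List[Triple], data: Iterable[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern by joining one triple pattern at a time."""
    data = list(data)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in data:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


data = [
    ("ex:patient1", "ex:hasObservation", "ex:obs1"),
    ("ex:obs1", "ex:code", "ex:SystolicBP"),
    ("ex:obs1", "ex:value", "150"),
    ("ex:patient2", "ex:hasObservation", "ex:obs2"),
    ("ex:obs2", "ex:code", "ex:HeartRate"),
]
# Roughly: SELECT ?patient ?value WHERE { ?patient ex:hasObservation ?obs .
#          ?obs ex:code ex:SystolicBP . ?obs ex:value ?value }
query = [
    ("?patient", "ex:hasObservation", "?obs"),
    ("?obs", "ex:code", "ex:SystolicBP"),
    ("?obs", "ex:value", "?value"),
]
for row in select(query, data):
    print(row["?patient"], row["?value"])  # ex:patient1 150
```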

In practice, however, queries are likely to be executed in SQL, so an interpreter is needed. Because of the data maps (accessed via the data mapping language) and the restricted subset of SPARQL in use, SQL can be auto-generated from the query language.
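A minimal sketch of that auto-generation, assuming a hypothetical data map from each predicate to a table and column, is shown below in Python. It only handles star-shaped queries (one subject variable, all predicates mapped to one table). Variables become projected columns and constants become bound SQL parameters. The generated SQL is then run against an in-memory SQLite table to check it.

```python
import sqlite3

# Hypothetical data map: predicate -> (table, column). The table's key column
# stands for the subject of every triple mapped to that table.
DATA_MAP = {
    "ex:code": ("observation", "concept_iri"),
    "ex:value": ("observation", "num_value"),
}
KEY_COLUMN = {"observation": "id"}


def star_query_to_sql(patterns):
    """Translate a star-shaped triple pattern (one subject variable, one table) to SQL."""
    subjects = {s for s, _, _ in patterns}
    tables = {DATA_MAP[p][0] for _, p, _ in patterns}
    if len(subjects) != 1 or len(tables) != 1:
        raise ValueError("sketch handles only one subject variable over one table")
    subject, table = subjects.pop(), tables.pop()
    columns = [f"{KEY_COLUMN[table]} AS {subject.lstrip('?')}"]
    conditions, params = [], []
    for _, predicate, obj in patterns:
        column = DATA_MAP[predicate][1]
        if obj.startswith("?"):  # variable: project the mapped column
            columns.append(f"{column} AS {obj.lstrip('?')}")
        else:                    # constant: filter on the mapped column
            conditions.append(f"{column} = ?")
            params.append(obj)
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


# Roughly: SELECT ?obs ?val WHERE { ?obs ex:code ex:SystolicBP . ?obs ex:value ?val }
query = [("?obs", "ex:code", "ex:SystolicBP"), ("?obs", "ex:value", "?val")]
sql, params = star_query_to_sql(query)
print(sql)  # SELECT id AS obs, num_value AS val FROM observation WHERE concept_iri = ?

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation (id TEXT, concept_iri TEXT, num_value REAL)")
conn.executemany("INSERT INTO observation VALUES (?, ?, ?)",
                 [("obs1", "ex:SystolicBP", 150.0), ("obs2", "ex:HeartRate", 72.0)])
rows = conn.execute(sql, params).fetchall()
print(rows)  # [('obs1', 150.0)]
```

Multi-table queries would additionally need joins derived from the map. That extension is exactly where a richer mapping language earns its keep.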

Data mapping sublanguage

This part of the language defines mappings between the data model and an actual database schema, enabling queries and filers to cope automatically with the ever-extending ontology and data properties.

The language can be used to auto-generate starter schemas for implementation, i.e. schemas that are then optimised for real-world use.
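As a rough illustration of starter-schema generation, the Python sketch below turns a hypothetical data model (classes with typed data properties) into `CREATE TABLE` statements. The `xsd:` names are standard XML Schema datatypes. The model, the snake_case naming rule and the datatype-to-SQL-type table are all assumptions, and real schemas would be tuned afterwards as described above.

```python
import re

# Hypothetical data model: class -> {data property: XML Schema datatype}.
DATA_MODEL = {
    "Encounter": {
        "ex:startDate": "xsd:dateTime",
        "ex:durationMinutes": "xsd:integer",
        "ex:location": "xsd:string",
    },
}

# Assumed datatype-to-SQL mapping for a first-cut schema.
SQL_TYPES = {"xsd:string": "VARCHAR(255)", "xsd:integer": "INTEGER",
             "xsd:dateTime": "TIMESTAMP", "xsd:boolean": "BOOLEAN"}


def column_name(name: str) -> str:
    """Drop any prefix and convert camelCase to snake_case."""
    local = name.split(":")[-1]
    return re.sub(r"(?<!^)(?=[A-Z])", "_", local).lower()


def starter_ddl(model: dict) -> str:
    """One table per class, with a surrogate key and one column per data property."""
    statements = []
    for cls, props in model.items():
        cols = ["id BIGINT PRIMARY KEY"]
        cols += [f"{column_name(p)} {SQL_TYPES[t]}" for p, t in props.items()]
        statements.append(f"CREATE TABLE {column_name(cls)} (\n    "
                          + ",\n    ".join(cols) + "\n);")
    return "\n\n".join(statements)


print(starter_ddl(DATA_MODEL))
```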

The main use case for the mapping sublanguage is data transformation. This uses techniques such as object-relational mapping (ORM), so the transform instructions, expressed as maps, follow this approach. There is no single standard for ORM maps, but best practice of the kind supported by open-source utilities such as Hibernate is followed.
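The Python sketch below illustrates the map-driven transform idea. A declarative map names the source key column and, for each source column, a target model property plus a value converter. A generic function then applies the map to each row to produce model triples. The map layout, property names and codes are hypothetical. The idea of mapping columns to properties with type conversion is borrowed from ORM practice, but this is not Hibernate's mapping format.

```python
from datetime import date

# Hypothetical transform map: source column -> (model property, converter).
PATIENT_MAP = {
    "subject_prefix": "ex:patient/",
    "key_column": "patient_id",
    "properties": {
        "dob": ("ex:dateOfBirth", date.fromisoformat),
        "sex": ("ex:gender",
                lambda code: {"M": "ex:Male", "F": "ex:Female"}.get(code, "ex:Unknown")),
    },
}


def transform(row: dict, mapping: dict) -> list:
    """Apply a column-to-property map to one source row, yielding model triples."""
    subject = mapping["subject_prefix"] + str(row[mapping["key_column"]])
    triples = []
    for column, (prop, convert) in mapping["properties"].items():
        if row.get(column) is not None:  # skip nulls rather than asserting empty values
            triples.append((subject, prop, convert(row[column])))
    return triples


print(transform({"patient_id": 17, "dob": "1960-02-29", "sex": "F"}, PATIENT_MAP))
```

Keeping the map as data rather than code means a new source column only needs a new map entry, not a new transform routine.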

Attribute-based access control language

The XACML standard specifies a language that may be used to implement ABAC. XACML includes a set of grammatical concepts such as policy sets, policies, rules, combining algorithms, targets, obligations and effects, together with a large and varied set of sophisticated tokens and functions used to build policy rules. XACML has its own XML syntax that can be used directly.

This language is somewhat disconnected from the other standards in its syntax and its approach to vocabulary. Consequently, Discovery uses a JSON profile of XACML as its ABAC language. The profile models attributes as OWL properties and uses SPARQL to represent rules.
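A minimal sketch of how such a profile might be evaluated is given below in Python. The policy field names are hypothetical and are taken neither from the XACML specification nor from the Discovery JSON profile. Each rule's condition is a list of triple patterns checked ASK-style against the requester's attribute triples, and rules are combined with a simplified form of XACML's deny-overrides algorithm. Targets, obligations and the Indeterminate decision are omitted.

```python
# Hypothetical JSON-style policy; field names are illustrative only.
POLICY = {
    "combiningAlgorithm": "deny-overrides",
    "rules": [
        {"effect": "Deny",
         "condition": [("?subject", "ex:hasStatus", "ex:Suspended")]},
        {"effect": "Permit",
         "condition": [("?subject", "ex:hasRole", "ex:Clinician"),
                       ("?subject", "ex:worksAt", "ex:PracticeA")]},
    ],
}


def ask(patterns, subject, attributes):
    """ASK-style check: every pattern, with ?subject bound, is an asserted triple."""
    return all((subject if s == "?subject" else s, p, o) in attributes
               for s, p, o in patterns)


def evaluate(policy, subject, attributes):
    """Simplified deny-overrides: any applicable Deny wins, then any Permit."""
    if policy["combiningAlgorithm"] != "deny-overrides":
        raise NotImplementedError("this sketch only implements deny-overrides")
    effects = {rule["effect"] for rule in policy["rules"]
               if ask(rule["condition"], subject, attributes)}
    if "Deny" in effects:
        return "Deny"
    if "Permit" in effects:
        return "Permit"
    return "NotApplicable"


# Hypothetical user attributes, expressed as triples over OWL-style properties.
attributes = {
    ("ex:user1", "ex:hasRole", "ex:Clinician"),
    ("ex:user1", "ex:worksAt", "ex:PracticeA"),
    ("ex:user2", "ex:hasRole", "ex:Clinician"),
    ("ex:user2", "ex:worksAt", "ex:PracticeA"),
    ("ex:user2", "ex:hasStatus", "ex:Suspended"),
}
print(evaluate(POLICY, "ex:user1", attributes))  # Permit
print(evaluate(POLICY, "ex:user2", attributes))  # Deny
print(evaluate(POLICY, "ex:user3", attributes))  # NotApplicable
```

Because the rule conditions are plain triple patterns, the same matcher could in principle serve both access control rules and health record queries, which is the motivation given earlier for a common query syntax.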