Health Information modelling language - overview

From Endeavour Knowledge Base
Revision as of 09:05, 8 December 2020 by DavidStables

Background and rationale

Yet another language?

No. The health information modelling languages described herein are in fact a small number of open standard languages, configured and represented to make the models easier to use and more useful in the real world.

It is not necessary to understand the languages used in order to understand the modelling. For those who have an interest and a technical aptitude, but do not want to get to grips with the fundamentals of logic, the best places to start are OWL2, GraphQL and ABAC. For those who do want to get to grips with the underlying logic, the best places to start are first-order logic, description logic and higher-order logic, together with an understanding of at least one programming language (such as C#, Java, JavaScript or Python) and a query language such as SQL.

The Discovery Information Modelling Language is a JSON-based syntax that incorporates the grammar of three or more standards-based languages.

The purpose of the language is to help build a health information model in a way that supports implementations of the model using different database technologies and query languages.

The underlying philosophy behind the language is described in the article: Information modelling language - philosophy

Figure: Sublanguages.png

This article focuses on the description of the language itself.

Each sublanguage is based on the grammar of a single recognised standards-based language, selected as the closest fit to the information requirements that the Discovery health information model is designed to support. A single grammar and an optional single syntax enable the model to operate in an integrated manner, while still allowing each sublanguage to be represented in its native standard language.

Like UMLS, the sublanguages cross-reference one another through a common link, the concept, which is identified by a unique identifier (an Internationalised Resource Identifier, or IRI). A concept is usually named and semantically defined, and provides the means of traversing the model from different starting points to different end points, for different purposes.

Ontology sublanguage

Main article: Discovery semantic ontology language

The semantic ontology language is part of the Discovery information modelling language.

The grammar of the semantic ontology language used for the Discovery ontology is OWL2 EL, which is a limited profile of OWL2 DL. The language used for data modelling and value set modelling is OWL2 DL, as the more expressive constructs are required.

As such, the ontology supports the OWL2 syntaxes such as the Functional syntax and Manchester syntax, but also supports the Discovery JSON-based syntax, as part of the full information modelling language.

Together with the query language, OWL2 DL also makes the language compatible with the Expression Constraint Language, which is used as the standard for specifying SNOMED CT expression constraint queries.

Ontology purists will notice that modelling a data model in OWL2 is in fact a breach of the fundamental open world assumption taken in ontologies, and instead applies the closed world assumption (https://en.wikipedia.org/wiki/Closed-world_assumption). Consequently, a data model would normally be used independently of DL.

Furthermore, as data models are modelled for business purposes and semantic models are modelled for reasoning purposes, the two are connected by means of an object property, "is type".
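The difference between the two assumptions can be illustrated with a minimal Python sketch. The facts, the property name and the two helper functions are hypothetical and are not part of the Discovery model. The sketch only shows that a statement which is absent reads as "unknown" under the open world assumption and as "false" under the closed world assumption.

```python
# Hypothetical facts: (subject, predicate, object) statements known to the system.
FACTS = {
    ("patient1", "hasAllergy", "penicillin"),
}


def holds_closed_world(fact):
    """Closed world: anything that is not stated is treated as false."""
    return fact in FACTS


def holds_open_world(fact):
    """Open world: anything that is not stated is unknown (None), not false."""
    return True if fact in FACTS else None


query = ("patient1", "hasAllergy", "latex")
print(holds_closed_world(query))  # False
print(holds_open_world(query))    # None
```

A data model that requires a value to be present (for example, a mandatory property) relies on the closed world reading, which is why it sits uneasily with description logic reasoning.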

Data definition (query) sublanguage

Data models, concept definitions and objects are modelled using the graph paradigm. As a result, all content can be viewed as semantic triples, each consisting of a subject, a predicate and an object.

A standard language for querying triples (SPARQL) exists. This is a very extensive language, albeit less expressive than SQL. However, the majority of interoperable health queries can be expressed in a fairly limited subset of SPARQL, and therefore a subset of SPARQL is selected as the means of modelling data definitions and queries in Discovery.

It should be noted, though, that actual queries are likely to be implemented in SQL, and thus an interpreter is needed. However, as a result of the data maps (accessed via the data mapping language) and the restricted subset of SPARQL in use, SQL can be auto-generated from the query language.
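As a rough illustration of the triple view, the following Python sketch holds a handful of triples in memory and matches them against a pattern. The prefixed names (such as `:Encounter` and `:hasProperty`) and the `match` helper are invented for the example and are not taken from the Discovery model.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical content: a class definition, a data model property and an instance.
GRAPH = [
    Triple(":Encounter", "rdfs:subClassOf", ":Event"),
    Triple(":Encounter", ":hasProperty", ":effectiveDate"),
    Triple(":enc1", "rdf:type", ":Encounter"),
]


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]


# Everything stated about :Encounter, whether ontological or data model content.
for triple in match(GRAPH, s=":Encounter"):
    print(triple)
```

Because definitions, data model properties and instances share the same shape, a single pattern-matching mechanism can traverse all three.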
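The idea of generating SQL from a restricted query plus a data map can be sketched as follows. This is a simplified illustration under stated assumptions, not the Discovery interpreter. The entity and property names, the table and column names, and the `to_sql` function are all hypothetical, and the query is limited to one entity type with equality filters.

```python
# Hypothetical data map: model entity -> table, (entity, property) -> column.
ENTITY_TABLE = {":Patient": "patient"}
PROPERTY_COLUMN = {
    (":Patient", ":dateOfBirth"): "date_of_birth",
    (":Patient", ":gender"): "gender",
}


def to_sql(entity, select, where):
    """Build a parameterised SQL query from a restricted query description.

    Roughly corresponds to a SPARQL pattern of the form
      SELECT ?dob WHERE { ?p a :Patient ; :gender "female" ; :dateOfBirth ?dob }
    """
    table = ENTITY_TABLE[entity]
    columns = ", ".join(PROPERTY_COLUMN[(entity, prop)] for prop in select)
    sql = f"SELECT {columns} FROM {table}"
    params = []
    if where:
        conditions = []
        for prop, value in where.items():
            conditions.append(f"{PROPERTY_COLUMN[(entity, prop)]} = ?")
            params.append(value)  # values are bound as parameters, not inlined
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


sql, params = to_sql(":Patient", [":dateOfBirth"], {":gender": "female"})
print(sql)     # SELECT date_of_birth FROM patient WHERE gender = ?
print(params)  # ['female']
```

Since the query refers only to model properties, changing the underlying schema would require a change to the map rather than to the query.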

Data mapping sublanguage

This part of the language is used to define mappings between the data model and an actual schema, enabling queries and filters to cope automatically with the ever-extending ontology and data properties.

The language can be used to auto-generate starter schemas for implementation, i.e. schemas that will then be optimised for real-world use.

The main use case for the mapping sublanguage is data transformation. This uses techniques such as object-relational mapping (ORM), and the transform instructions, in the form of maps, therefore follow this approach. There is no single standard for ORM maps, but best practice of the kind supported by open source utilities such as Hibernate is followed.
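A map-driven transformation of this kind might look like the following Python sketch. The map structure, the column names and the property names are assumptions made for illustration. They do not reproduce the Discovery mapping syntax or any Hibernate configuration format.

```python
# Hypothetical map from a source table row to a data model object.
PATIENT_MAP = {
    "target_type": ":Patient",
    "properties": {
        "pat_id": ":identifier",
        "dob": ":dateOfBirth",
    },
}


def transform(row, mapping):
    """Apply a map to one source row, producing a model object as a dict."""
    obj = {"rdf:type": mapping["target_type"]}
    for column, prop in mapping["properties"].items():
        if column in row:  # unmapped or missing columns are simply skipped
            obj[prop] = row[column]
    return obj


source_row = {"pat_id": "12345", "dob": "1980-02-29", "legacy_flag": "Y"}
print(transform(source_row, PATIENT_MAP))
```

The transformation logic stays generic, so supporting a new source schema would mean supplying a new map rather than writing new code.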

Attribute based access control language

Main article: Discovery ABAC language

The XACML standard specifies a language that may be used to implement ABAC. XACML includes a set of grammatical concepts such as policy sets, policies, rules, combination rules, targets, obligations and effects, with many and varied sophisticated tokens and functions used to build the policy rules. XACML has its own XML syntax that can be used directly.

This language is somewhat disconnected from the other standards in terms of syntax and approach to vocabulary. Consequently, Discovery uses a JSON profile of XACML as its ABAC language, which itself models the attributes as OWL properties and uses SPARQL as its rule representation.
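To give a feel for attribute based access control expressed in JSON, the Python sketch below evaluates a small policy against a set of request attributes. The JSON layout, the attribute names and the `evaluate` function are hypothetical and do not reproduce the Discovery ABAC syntax. For brevity, rules are expressed as plain attribute equality rather than SPARQL, and a deny-overrides combination is hard-coded.

```python
import json

# Hypothetical policy: each rule has an effect and the attribute values it matches.
POLICY_JSON = """
{
  "rules": [
    {"effect": "Permit", "match": {"role": "clinician", "purpose": "direct-care"}},
    {"effect": "Deny", "match": {"consent": "withdrawn"}}
  ]
}
"""


def evaluate(policy, attributes):
    """Return Permit, Deny or NotApplicable, with Deny overriding Permit."""
    effects = [
        rule["effect"]
        for rule in policy["rules"]
        if all(attributes.get(name) == value for name, value in rule["match"].items())
    ]
    if "Deny" in effects:
        return "Deny"
    if "Permit" in effects:
        return "Permit"
    return "NotApplicable"


policy = json.loads(POLICY_JSON)
request = {"role": "clinician", "purpose": "direct-care", "consent": "withdrawn"}
print(evaluate(policy, request))  # Deny
```

In this sketch the access decision depends only on the attributes of the request, which is the defining characteristic of ABAC, as opposed to a fixed list of users or roles.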