Health Information modelling language - overview: Difference between revisions

Revision as of 12:40, 16 May 2020

DRAFT : These pages are being built from the information modelling language specification version 4.

This article provides an outline of the Discovery information modelling language, used to provide a machine and human readable information model.

The language is a single language but can be categorised into the Discovery ontology language and a set of language elements that describe the data model, value sets and data sets.

The language is broadly divided along the lines of the information model itself, i.e. a language for ontology, value set, data model, data set and query. However, there is significant overlap between these areas, so the language is best seen as a whole.

Background to the language and standards

The modelling language draws heavily on three main standards: W3C OWL2 DL, W3C SPARQL and UML.

Language ideas have been supplemented by CYPHER (property graph) and SQL, as that is likely to be the run time query language commonly used.

The approach to syntax has been to offer a standard language where users prefer, but also to have a simple JSON based syntax (Discovery syntax) that many will find easier to follow. There are converters to standard languages available where relevant.

Semantic ontology

The semantic ontology language is part of the Discovery information modelling language.

The language used for the Discovery ontology is an OWL2 DL profile, limited to OWL2 RL with two additions: exact cardinality (for use in the closed world data model) and object union in ranges (for use in value sets).

An ontology is concerned with the meaning of things and defines that meaning in a way that allows inferences to be made about other things. An ontology designed with a language such as OWL2 allows machines to make the sophisticated inferences required to support query and decision support. OWL2 itself is based on a variant of Description Logic, a family of logics that has underpinned machine reasoning for decades.

Ontologies have been used in health information systems for many years. More recently, the emergence of SNOMED CT as the de facto health terminology has illustrated the potential power of description logic, which underpins OWL.

The Discovery implementation supports the official OWL2 functional syntax but also provides a JSON based syntax, “Discovery syntax”, which absorbs language constructs for other parts of the model and can therefore be used as the language for all of the IM components.

The Ontology language describes four main structural types which cut across the information model content parts as described above, which are:

Concepts classes and properties

Main article : Concepts classes and properties

Concepts, classes and properties form the building blocks used by the language and are declared in an ontology.

Value sets

Main article : Value sets

The language supports three forms for representing value sets: OWL2 Discovery syntax, OWL2 functional syntax, and Expression Constraint Language.
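
As a rough illustration of what a value set definition resolves to, the sketch below evaluates a "descendant or self" constraint (written `<< code` in Expression Constraint Language) over a tiny is-a hierarchy. The concept codes, the hierarchy and the function name `descendants_or_self` are invented for this example and are not taken from the Discovery model or from SNOMED CT.

```python
from collections import defaultdict

# Hypothetical (child, parent) "is a" pairs; the codes are made up for this sketch.
IS_A = [
    ("C2", "C1"),
    ("C3", "C1"),
    ("C4", "C2"),
    ("C5", "C9"),
]


def descendants_or_self(code, is_a_pairs):
    """Return the code plus every concept that is a direct or indirect child of it."""
    children = defaultdict(set)
    for child, parent in is_a_pairs:
        children[parent].add(child)

    members = set()
    stack = [code]
    while stack:
        current = stack.pop()
        if current in members:
            continue  # already visited, guards against cycles
        members.add(current)
        stack.extend(children[current])
    return members


value_set = descendants_or_self("C1", IS_A)
print(sorted(value_set))
```

A real implementation would resolve the constraint against the inferred ontology rather than a hand-written list, but the expansion step has the same shape.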

Data model

This part of the language is used to define a data model. Data model content may be automatically generated from the OWL2 representation of the data model, with the result that a much simpler syntax can be used. The data modelling language relies on the OWL2 language for managing subclasses, sub properties, domains, ranges and cardinality. Thus in Discovery the data modelling language is really just a simplified output syntax over the more sophisticated description logic.
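
The idea of generating a simpler data model view from the ontology can be sketched as follows. This is a minimal illustration only: it assumes the axioms have already been flattened into simple records, and the names `PropertyAxiom`, `build_entity` and the example classes (`Event`, `Observation`) are hypothetical rather than part of the Discovery syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyAxiom:
    """Hypothetical flattened axiom: a property with its domain, range and cardinality."""
    name: str
    domain: str
    range: str
    min_card: int
    max_card: Optional[int]  # None means unbounded


# Hypothetical subclass axioms: class -> direct superclass (None for a root class).
SUBCLASS_OF = {"Observation": "Event", "Event": None}

AXIOMS = [
    PropertyAxiom("effectiveDate", "Event", "DateTime", 1, 1),
    PropertyAxiom("code", "Observation", "Concept", 1, 1),
    PropertyAxiom("value", "Observation", "Quantity", 0, 1),
]


def build_entity(class_name, axioms, subclass_of):
    """Collect the properties of a class, including those inherited from superclasses."""
    lineage = []
    current = class_name
    while current is not None:
        lineage.append(current)
        current = subclass_of.get(current)

    fields = {}
    # Walk from the most general class down so that subclasses can override.
    for cls in reversed(lineage):
        for axiom in axioms:
            if axiom.domain == cls:
                upper = "*" if axiom.max_card is None else str(axiom.max_card)
                fields[axiom.name] = f"{axiom.range} [{axiom.min_card}..{upper}]"
    return {"entity": class_name, "fields": fields}


print(build_entity("Observation", AXIOMS, SUBCLASS_OF))
```

The output lists each property once with its range and cardinality, which is the kind of flat view a data model consumer would typically want.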

Ontology purists will notice that modelling a data model in OWL2 is in fact a breach of the fundamental open world assumption taken in ontologies, and instead applies the closed world assumption (https://en.wikipedia.org/wiki/Closed-world_assumption). Consequently, a data model would normally be used independently of DL reasoners, and therefore a syntax that separates the two is provided.
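
The difference between the two assumptions can be shown with a deliberately simplified sketch. The facts and function names are hypothetical, and the open world function ignores inference and explicit negation, both of which a real DL reasoner would take into account.

```python
# Hypothetical asserted facts, held as (subject, property, value) triples.
FACTS = {
    ("patient1", "hasAllergy", "penicillin"),
}


def holds_closed_world(fact, facts):
    """Closed world: anything that is not asserted is taken to be false."""
    return fact in facts


def holds_open_world(fact, facts):
    """Open world: anything that is not asserted is unknown (None), not false."""
    return True if fact in facts else None


query = ("patient1", "hasAllergy", "latex")
print(holds_closed_world(query, FACTS))  # False: the data model treats absence as "no"
print(holds_open_world(query, FACTS))    # None: the ontology cannot say either way
```

This is why cardinality checks such as "exactly one" only make sense on the data model side: under the open world assumption a missing value is not a violation.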

Data set definition language

This language is designed to define the building of data sets from an underlying data model and semantic ontology, supplemented with value sets defined using the value set class, itself modelled in OWL2 syntax.

Schema implementation mapping

This part of the language is used to define mappings between the data model and an actual schema, to enable query and filers to automatically cope with the ever extending ontology and data properties.

The language can be used to auto generate starter schemas for implementation i.e. schemas that will then be optimised for real world use.

One significant mapping capability is the entity subtype attribute, which is the means by which a relational model with triple extensions can cope with any extension to a core data model entity, whether by subtyping or by constraining subcomponents.
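
One possible reading of this pattern is sketched below using an in-memory SQLite database. The table names, column names and example values are assumptions made for illustration and do not describe the actual Discovery schema: a core table carries an `entity_subtype` column, and a companion triple table holds any properties that the subtype adds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE observation (
        id INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL,
        entity_subtype TEXT NOT NULL   -- names the data model subtype of this row
    );
    CREATE TABLE observation_triple (
        observation_id INTEGER NOT NULL REFERENCES observation(id),
        property TEXT NOT NULL,        -- extension property not present in the core table
        value TEXT NOT NULL
    );
    """
)

# A hypothetical subtype with two extension properties stored as triples.
conn.execute("INSERT INTO observation VALUES (1, 100, 'BloodPressure')")
conn.executemany(
    "INSERT INTO observation_triple VALUES (?, ?, ?)",
    [(1, "systolic", "120"), (1, "diastolic", "80")],
)


def load_entity(conn, observation_id):
    """Rebuild one entity from its core row plus its extension triples."""
    row = conn.execute(
        "SELECT id, patient_id, entity_subtype FROM observation WHERE id = ?",
        (observation_id,),
    ).fetchone()
    entity = {"id": row[0], "patient_id": row[1], "entity_subtype": row[2]}
    for prop, value in conn.execute(
        "SELECT property, value FROM observation_triple WHERE observation_id = ?",
        (observation_id,),
    ):
        entity[prop] = value
    return entity


print(load_entity(conn, 1))
```

With this layout a new subtype or property needs no change to the core table, at the cost of storing extension values as untyped text in this simplified version.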

@@ Line 8: / Line 8: @@
 The language is broadly divided along the lines of the information model itself i.e. a language for ontology, value set, data model, data set and query. However, there is significant overlap between the areas of the language, which can be seen to be a whole.
-<nowiki>=Background to the language and standards=</nowiki>
+== Background to the language and standards ==
+The modelling language draws heavily on 3 main standards; The W3C OWL2 DL, W3C SPARQL languages and UML.
-= Semantic ontology language =
+Language ideas have been supplemented by CYPHER (property graph) and SQL, as that is likely to be the run time query language commonly used.
+The approach to syntax has been to offer a standard language where users prefer, but also to have a simple JSON based syntax (Discovery syntax)  that many will find easier to follow. There are converters to standard languages available where relevant.
+== Semantic ontology ==
 The semantic ontology language is part of the Discovery information modelling language.
@@ Line 26: / Line 31: @@
 &nbsp;
-== Concepts classes and properties ==
+=== Concepts classes and properties ===
 ''main article&nbsp;:&nbsp;''[[Concepts_classes_and_properties|Concepts classes and properties]] form the building blocks used by the language and are declared in an ontology.
-= Data modelling language =
+== Value sets ==
+''Main article :'' [[Value sets]]
+The language supports three forms for representing value sets;  OWL2 Discovery syntax, [https://www.w3.org/TR/owl2-syntax/ OWL2 functional syntax], and [https://confluence.ihtsdotools.org/display/DOCECL/Expression+Constraint+Language+-+Specification+and+Guide Expression constraint language]
+== Data model ==
 This part of the language is used to define a data model. Data model content may be automatically generated from the OWL2 representation of the data model, with the result that a much simpler syntax can be used. The data modelling language relies on the OWL2 language for managing subclasses, sub properties, domains, ranges, cardinality. Thus in Discovery the language is really just a simplified output syntax from the more sophisticated description logic.
@@ Line 36: / Line 46: @@
 Ontology purists will notice that modelling a data model in OWL2 is in fact a breach of the fundamental &nbsp;[https://en.wikipedia.org/wiki/Open-world_assumption open world assumption]&nbsp;view of the world taken in ontologies and instead applies the&nbsp;[https://en.wikipedia.org/wiki/Closed-world_assumption https://en.wikipedia.org/wiki/Closed-world_assumption]&nbsp;view instead. Conseqently, a data model would normally be used independently of DL reasoners, and therefore a syntax that seperates the two is provided.
-= Data set definition language =
+== Data set definition language ==
 This language is designed to define the building of data sets from an underlying data model and semantic ontology, supplemented with value sets defined using the value set class, itself modelled in OWL2 syntax.
@@ Line 42: / Line 52: @@
 &nbsp;
-= Schema implementation mapping =
+== Schema implementation mapping ==
 This part of the language is used to define mappings between the data model and an actual schema to enable query and filers to automatically cope with the ever extending ontolofy and data properties.&nbsp;