Ontology services: Difference between revisions

From Endeavour Knowledge Base

Revision as of 09:27, 29 December 2021

https://wiki.endeavourhealth.org/index.php?title=Information_model_service

The Information model service is a set of services that enables the Discovery common information model and its content to be created, updated, distributed and accessed.

The service includes a set of web applications, a set of APIs to access the data within the information model and a set of distribution services to distribute the data to subscriber systems for their use.

All components of the service are open source and available in the Discovery and Endeavour GitHub organisations: https://github.com/endeavourhealth and https://github.com/endeavourhealth-discovery

The information service draws on the content of the Discovery Health Information Model and exchanges data using a set of open-standard, commonly used languages. A pragmatic integrated information modelling language brings these languages together to form a coherent whole.

The services are further described in this article.

Information model applications

The information model service supports a number of applications for use by people who need to maintain, or simply access, the information model content. The applications are a suite of modules that can operate separately or together. For example, the IM manager brings together a number of modules.

Each module contains components for use in other applications.

Information model Manager

The information model manager is the main application for maintaining the information model content. It provides viewing and authoring capabilities for the Discovery Health information model, which is the model of data that the information model services use, underpinned by a standards-based modelling language.

The manager consists of a number of modules varying from a viewer through to the authoring of advanced ontological concepts. Modules include:

  • Ontology. Provides the ability to view and author the ontology that the Discovery IM uses.
  • Data model. Provides the ability to view and author a common model defining the data held within the service, using the concepts defined in the ontology.
  • Sets. Provides the ability to view and maintain concept sets and value sets.
  • Data sets. Provides the ability to view and maintain data set or query definitions using the ontology, data model and sets.
  • Mappings. Provides the ability to view and manage mappings between source publisher data and the common model.
  • Workflow manager. Used to manage tasks associated with the above.

Ontology module

  • Views the ontology via the concept classification tree and views the concept axioms (the language that defines the concepts in the ontology).
  • Views some common 'legacy' code-based classifications used by source systems that have been mapped from the ontology.
  • Enables the authoring of concepts, including the ability to classify via reasoners.

Under the hood the ontology uses the OWL DL language to represent the structure, as well as properties required for mapping between core concepts and legacy concepts and codes.

The information model services include reasoners that operate on the semantic ontology subset of the information model. One component of a reasoner is a classifier, which uses subsumption testing on the ontological entities to generate inferred relationships. These relationships can then be used in run-time queries or to generate transitive closure tables.

The purposes of the reasoners are:

  • To help make sure that the ontologies are logically consistent. Most problems with the ontology result from faulty axioms authored by humans, and reasoners help to check that the axioms within an ontology are logically consistent.
  • To generate inferred relationships from stated axioms, in particular the generation of the "is a" relationships from subclass axioms and equivalent class axioms, as well as the "mapped to", "replaced by" and "replaced" relationships between active and inactive concepts or legacy concepts.

Reasoners are accessed via the Java OWL API, which itself supports a number of reasoners such as HermiT and ELK.

In addition a simple ontology classifier is used to generate inferred relationships from the stated axioms, so that subsumption testing of the kind used in standard query can operate easily.
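As a rough illustration of how inferred "is a" relationships can feed a transitive closure table, the following minimal sketch walks direct parent links to collect every ancestor of each concept. It is not the service's own classifier: the concept names and the functions `transitive_closure` and `is_subsumed_by` are hypothetical, and the input is assumed to be a list of already classified (child, parent) pairs.

```python
from collections import defaultdict


def transitive_closure(direct_is_a):
    """Map each concept to the set of all its ancestors.

    direct_is_a is an iterable of (child, parent) pairs, assumed to be the
    direct "is a" relationships produced by classification.
    """
    parents = defaultdict(set)
    concepts = set()
    for child, parent in direct_is_a:
        parents[child].add(parent)
        concepts.update((child, parent))

    closure = {}
    for concept in concepts:
        seen = set()
        stack = list(parents.get(concept, ()))
        while stack:
            node = stack.pop()
            if node in seen:  # already visited, also guards against cycles
                continue
            seen.add(node)
            stack.extend(parents.get(node, ()))
        closure[concept] = seen
    return closure


def is_subsumed_by(closure, concept, ancestor):
    """Subsumption test: a concept is subsumed by itself or any ancestor."""
    return concept == ancestor or ancestor in closure.get(concept, set())


# Hypothetical concept names, for illustration only.
example_is_a = [
    ("ViralPneumonia", "Pneumonia"),
    ("Pneumonia", "LungDisease"),
]
example_closure = transitive_closure(example_is_a)
```

With a closure table of this kind, a query can test subsumption with a single lookup rather than walking the hierarchy at run time.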

The ontology module supports expression constraint syntax for users who wish to use the SNOMED CT Expression Constraint Language (ECL) to identify concepts.
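To show what the simplest ECL expressions mean, the sketch below evaluates only three forms: a single concept, `<` (descendants of) and `<<` (descendants or self). It is an illustration rather than the module's parser: the function names are hypothetical, the hierarchy is a deliberately simplified fragment, and full ECL supports far more than this.

```python
import re

# Simplified, illustrative fragment of a hierarchy: child -> direct parents.
PARENTS = {
    "44054006": {"73211009"},  # Type 2 diabetes mellitus -> Diabetes mellitus
    "46635009": {"73211009"},  # Type 1 diabetes mellitus -> Diabetes mellitus
}

# Optional operator, a concept id, and an optional |term| that is ignored.
ECL_PATTERN = re.compile(r"^\s*(<<|<)?\s*(\d+)\s*(?:\|[^|]*\|)?\s*$")


def descendants(concept_id):
    """Collect every concept below concept_id in the illustrative hierarchy."""
    found = set()
    stack = [concept_id]
    while stack:
        current = stack.pop()
        for child, parents in PARENTS.items():
            if current in parents and child not in found:
                found.add(child)
                stack.append(child)
    return found


def evaluate_simple_ecl(expression):
    """Evaluate 'id', '< id' or '<< id'; anything else is rejected."""
    match = ECL_PATTERN.match(expression)
    if match is None:
        raise ValueError(f"Unsupported expression: {expression!r}")
    operator, concept_id = match.groups()
    if operator == "<<":
        return {concept_id} | descendants(concept_id)
    if operator == "<":
        return descendants(concept_id)
    return {concept_id}


diabetes_and_subtypes = evaluate_simple_ecl("<< 73211009 |Diabetes mellitus|")
```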

Data model module

  • Views the common data model that covers a common logical description of data held within Discovery.
  • Enables the maintenance of the data model.

The common model uses ontological constructs to represent classes, properties and property value types, thus binding the ontology and the data model into a single seamless continuum.

This should not be confused with a physical data model, such as a database schema or entity relationship diagram, which is implementation specific.

Under the hood, the model uses the SHACL language, because the logical model is a graph model.
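For readers unfamiliar with SHACL, the sketch below mimics in plain Python what a node shape with cardinality constraints expresses. The fields mirror the SHACL terms `sh:targetClass`, `sh:property`, `sh:path`, `sh:minCount` and `sh:maxCount`, but the class names, the `validate` function and the `Patient` example are assumptions made for illustration. They are not taken from the Discovery data model, and a real deployment would use a SHACL engine over RDF.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PropertyShape:
    path: str                        # analogous to sh:path
    min_count: int = 0               # analogous to sh:minCount
    max_count: Optional[int] = None  # analogous to sh:maxCount


@dataclass
class NodeShape:
    target_class: str                # analogous to sh:targetClass
    properties: List[PropertyShape] = field(default_factory=list)


def validate(shape: NodeShape, node: Dict[str, list]) -> List[str]:
    """Check a node, given as {property path: [values]}, against a shape."""
    violations = []
    for prop in shape.properties:
        count = len(node.get(prop.path, []))
        if count < prop.min_count:
            violations.append(
                f"{prop.path}: expected at least {prop.min_count}, found {count}"
            )
        if prop.max_count is not None and count > prop.max_count:
            violations.append(
                f"{prop.path}: expected at most {prop.max_count}, found {count}"
            )
    return violations


# Hypothetical shape: a Patient has exactly one date of birth
# and any number of addresses.
patient_shape = NodeShape(
    target_class="Patient",
    properties=[
        PropertyShape("dateOfBirth", min_count=1, max_count=1),
        PropertyShape("address"),
    ],
)
```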

Sets module

  • Views a library of concept sets for use in query and reporting
  • Views the value sets that are bound to the data model
  • Enables the authoring of concept sets and value sets.

Under the hood, sets are defined using the data set definition language. This language is based on the SHACL advanced grammar, which itself includes the semantics used in the SNOMED CT Expression Constraint Language (ECL). Sets can thus be viewed in ECL or SHACL.

Data set definitions

  • Views a library of Data set or 'query' definitions, which are logical descriptions of queries or data sets using data held in the common model.
  • Enables authoring of data sets.

Under the hood, data set definitions extend the set definitions with more advanced filtering, such as data property ranges, sub queries, ordering and limiting of results. As in the data model and set definition modules, the underlying language is W3C SHACL.
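The combination of set membership, property range, ordering and limiting can be pictured with a small sketch. This is only an in-memory illustration in Python, not the definition language itself: the `DataSetDefinition` class, its fields, the `run` function and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class DataSetDefinition:
    concept_set: FrozenSet[str]  # concepts a record must belong to
    range_property: str          # property tested against the range
    minimum: float
    maximum: float
    order_by: str
    descending: bool = False
    limit: Optional[int] = None


def run(definition: DataSetDefinition, records: List[dict]) -> List[dict]:
    """Filter by concept set and property range, then order and limit."""
    matched = [
        record
        for record in records
        if record["concept"] in definition.concept_set
        and definition.minimum <= record[definition.range_property] <= definition.maximum
    ]
    matched.sort(key=lambda record: record[definition.order_by],
                 reverse=definition.descending)
    return matched if definition.limit is None else matched[: definition.limit]


# Hypothetical records and definition: the latest raised systolic reading.
observations = [
    {"concept": "SystolicBP", "value": 150, "date": "2021-03-01"},
    {"concept": "SystolicBP", "value": 135, "date": "2021-06-01"},
    {"concept": "HeartRate", "value": 150, "date": "2021-07-01"},
    {"concept": "SystolicBP", "value": 160, "date": "2021-09-01"},
]
latest_raised_systolic = DataSetDefinition(
    concept_set=frozenset({"SystolicBP"}),
    range_property="value",
    minimum=140,
    maximum=200,
    order_by="date",
    descending=True,
    limit=1,
)
result = run(latest_raised_systolic, observations)
```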

Address matching and UPRN allocation application

Main article: UPRN address matching application

The information model service includes a web application that allows a user to match one or more addresses from a system's address file to an authoritative address, and to allocate a Unique Property Reference Number (UPRN) for the location of that address.

Main article: UPRN address matching algorithms

Address matching is surprisingly difficult, and the algorithms used to match addresses are described in more detail by the UPRN address matching algorithms article.

Information service APIs

As well as the information model manager and various modules, the service provides a suite of APIs to support the use of data held within the information model libraries.

Get expanded concept set 

The value set generator returns a list of concepts that are defined by a set definition for use in queries, thus supporting advanced subsumption testing against health care records.

An expanded set can also contain details of mappings between core ontological concepts and legacy code systems where mappings are automated. Expansions also take account of concept replacements or substitutions, thus enabling historical data to be retrieved.

The set generator API accepts the IRI of a value set either in full or relative to a baseline IRI, e.g. http://DiscoveryDataService/InformationModel#CSET_Covid1 or simply VSET_Covid1, and returns a list of concepts to be used in the query. The API supports both core concepts and original codes that have been mapped to the core concepts, depending on whether the database uses Discovery concepts or the actual original codes.
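The behaviour described above can be sketched as two steps: resolve the set reference to a full IRI, then return either core concepts or their mapped original codes. This is a minimal in-memory illustration and not the API itself. The functions `resolve_set_iri` and `expand_set`, the concept names and the legacy codes are hypothetical, and only the baseline IRI is taken from the example in the text.

```python
BASE_IRI = "http://DiscoveryDataService/InformationModel#"

# Hypothetical content standing in for the information model.
SET_MEMBERS = {
    BASE_IRI + "VSET_Covid1": ["CoreConceptA", "CoreConceptB"],
}
LEGACY_CODES = {
    "CoreConceptA": ["LEGACY-1", "LEGACY-2"],
    "CoreConceptB": ["LEGACY-3"],
}


def resolve_set_iri(reference, base_iri=BASE_IRI):
    """Accept a full IRI as-is, or resolve a short name against the base."""
    if reference.startswith(("http://", "https://")):
        return reference
    return base_iri + reference


def expand_set(reference, use_original_codes=False):
    """Return core concepts, or the original codes mapped to them."""
    members = SET_MEMBERS.get(resolve_set_iri(reference), [])
    if not use_original_codes:
        return list(members)
    codes = []
    for concept in members:
        codes.extend(LEGACY_CODES.get(concept, []))
    return codes


core_concepts = expand_set("VSET_Covid1")
original_codes = expand_set(BASE_IRI + "VSET_Covid1", use_original_codes=True)
```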

Get Map APIs

The information model server provides a number of APIs and utilities that support the mapping of original fields and values into the common information model.

The data mapping APIs article describes the use of the mapping server's mapping APIs to support inbound and outbound data transformation processes that involve a map between two data models. The map maker manager article describes how the map maker manager operates when authoring maps.

For example, there is a set of mapping hint algorithms, which are machine-assisted approaches to improving the speed and accuracy of mapping.

Get address and Get UPRN API

As well as providing an application, the service supports the UPRN REST API, which enables a system to call the UPRN address matching service with an address and receive a response containing the matched address and UPRN.

Distribution services

As well as being accessible through APIs and applications, the content of the IM can be distributed by the information model services for use in subscriber databases or subscriber applications. All content of the information model can be distributed in both bulk and delta form.

Set distributor

The concept and value set distributor maintains tables of value sets for databases that use local instances of the Discovery information model.

The distributor is part of the information model distribution service that runs on an application server. It is designed to detect changes to the content of the information model and regenerate the value sets from the value set definitions. The value sets are regenerated whenever a value set definition changes or whenever there is an update to the concepts within the information model.
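One way to picture the difference between bulk and delta distribution is to compare two generations of expanded value sets and send only the changes. The sketch below is an assumption about how such a delta could be computed, not a description of the distributor's implementation. The functions `compute_delta` and `apply_delta` and the set and concept names are hypothetical.

```python
def compute_delta(previous, current):
    """Compare two {set name: set of concepts} tables.

    Returns additions and removals for changed value sets only.
    """
    delta = {}
    for set_name in previous.keys() | current.keys():
        old = previous.get(set_name, set())
        new = current.get(set_name, set())
        added, removed = new - old, old - new
        if added or removed:
            delta[set_name] = {"added": added, "removed": removed}
    return delta


def apply_delta(table, delta):
    """Apply a delta to a subscriber's copy without mutating the original."""
    updated = {name: set(members) for name, members in table.items()}
    for set_name, change in delta.items():
        members = updated.setdefault(set_name, set())
        members -= change["removed"]
        members |= change["added"]
    return updated


# Hypothetical value sets before and after a regeneration.
previous_sets = {"VSET_A": {"c1", "c2"}, "VSET_B": {"c3"}}
current_sets = {"VSET_A": {"c1", "c4"}, "VSET_B": {"c3"}}

delta = compute_delta(previous_sets, current_sets)
subscriber_copy = apply_delta(previous_sets, delta)
```

A bulk distribution would send `current_sets` in full, whereas a delta distribution would send only `delta`, which the subscriber applies to its existing tables.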