Ontology services: Difference between revisions

Revision as of 12:24, 7 August 2020

The Information model service is a set of services that enables the Discovery common information model and its content to be created, updated, distributed and accessed.

The service includes a set of web applications, a set of APIs to access the data within the information model and a set of distribution services to distribute the data to subscriber systems for their use.

All components of the service are open source and available from the Discovery Endeavour GitHub organisations, https://github.com/endeavourhealth and https://github.com/endeavourhealth-discovery

The information model service uses the content of the Discovery common information model and exchanges data using a set of open-standard, commonly used languages. A pragmatic integrated information modelling language brings these languages together to form a coherent whole. The services are further described in this article.

Information model applications

The information model service supports a number of applications for use by people who need to maintain (or simply access) the information model content. The applications are a suite of modules that can operate separately or together. For example, the IM manager brings together a number of modules.

Each module contains components for use in other applications.

Information model Manager

The information model manager is the main application for viewing and authoring the content of the Discovery common information model. This is the model of data that the information model service uses, underpinned by a standards-based modelling language.

The manager consists of a number of modules varying from a viewer through to the authoring of advanced ontological concepts. Modules include:

IM viewer, which enables a view of the ontology and/or the data model and all artefacts created as part of the model, including a view of all the data maintained by the editors
Concept expression editor, which enables the creation of new concepts and expressions and the definition of their meaning
Data model editor, which maintains one or more data models.
Value set editor, which maintains any number of sets of concepts or value sets
Data set editor, often used by the data sharing manager or data project manager to create data set definitions
Map maker, used to maintain maps between database schemas and the common model or maps to message types
Workflow manager, used to manage tasks associated with the above

Value set editor

The editor supports the creation, maintenance and distribution of, and access to, value set definitions and value sets, sometimes referred to as reference sets or concept sets.

A value set definition is a definition of a collection of concept expressions that have been brought together for a particular business purpose. A value set definition is different from a standard concept definition because the meaning of some members of a value set may not be subsumed by the implied meaning of the value set. For example, a value set for gender that consists of male, female and other is different from the concept of gender, which may include many more specialised variations.

Value set definitions are described in more detail in the Discovery_Information_model_language specification. In summary:

A value set definition is a class that has member properties, and the value of each member is a class expression, i.e. a value set has members that are concept expressions.

A class expression may be a simple pre-coordinated concept, such as SN_1240751000000100 |Coronavirus disease 19 caused by severe acute respiratory syndrome coronavirus 2 (disorder)|, or may be a complex class such as:

 Covid 19:  EquivalentTo: Disease
                               and (causative_agent some coronavirus-2)
                               and (has_pathological_process some infections_process)

Value sets themselves are the collections of concepts that are defined by the definition i.e. a list of concepts.
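
The relationship between a value set definition and its member class expressions can be sketched as a small data structure. This is only an illustration under assumptions: the class names (Concept, Restriction, Intersection, ValueSetDefinition) and the render helper are hypothetical and are not taken from the Discovery information model language specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A simple pre-coordinated concept, identified by its code or IRI."""
    iri: str


@dataclass(frozen=True)
class Restriction:
    """An existential restriction of the form (property some filler)."""
    prop: str
    filler: str


@dataclass(frozen=True)
class Intersection:
    """A complex class: a base concept AND one or more restrictions."""
    base: Concept
    restrictions: tuple


@dataclass
class ValueSetDefinition:
    """A value set definition whose members are class expressions."""
    iri: str
    members: list = field(default_factory=list)


def render(expr) -> str:
    """Render a class expression in a Manchester-like syntax."""
    if isinstance(expr, Concept):
        return expr.iri
    parts = [expr.base.iri]
    parts += [f"({r.prop} some {r.filler})" for r in expr.restrictions]
    return " and ".join(parts)


# The two kinds of member described above: a simple concept and a complex class.
covid_simple = Concept("SN_1240751000000100")
covid_complex = Intersection(
    Concept("Disease"),
    (
        Restriction("causative_agent", "coronavirus-2"),
        Restriction("has_pathological_process", "infections_process"),
    ),
)
vset = ValueSetDefinition("VSET_Covid1", [covid_simple, covid_complex])

for member in vset.members:
    print(render(member))
```

Keeping the members as expressions, rather than as a flat list of codes, is what allows the resulting value set to be regenerated when the underlying concepts change.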

The value set editor enables people to create and maintain value set definitions which can then be downloaded, accessed via an API or distributed via the information model distribution service.

Ontology editor

Reasoners and classifiers

The information model services include reasoners that operate on the semantic ontology subset of the information model. A classifier is one part of a reasoner: it uses subsumption testing on the ontological entities to generate inferred relationships, which can then be used in run time queries or to generate transitive closure tables.

The purposes of the reasoners are:

To help make sure that the ontologies are logically consistent. Most problems with an ontology result from faulty axioms authored by humans, and reasoners help to make sure that the axioms within an ontology are logically consistent.

To generate inferred relationships from stated axioms, in particular the "is a" relationships from subclass axioms and equivalent class axioms, as well as the "mapped to", "replaced by" and "replaced" relationships between active and inactive concepts or legacy concepts.

Reasoners are accessed via the Java OWL API, which itself supports a number of reasoners such as HermiT and ELK.

In addition, a simple ontology classifier is used to generate inferred relationships from the stated axioms, so that subsumption testing of the kind used in standard query can operate easily.
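
How stated "is a" relationships become a transitive closure table that supports subsumption testing can be sketched as follows. This is a minimal illustration, not the Discovery classifier and not a description logic reasoner such as HermiT or ELK: it assumes the direct subclass pairs are already known and ignores equivalent class axioms. The concept names are made up for the example.

```python
def ancestors_of(concept, parents):
    """Collect every ancestor of a concept by walking the direct parents."""
    found = set()
    stack = list(parents.get(concept, ()))
    while stack:
        node = stack.pop()
        if node in found:
            continue  # already visited; this also guards against cycles
        found.add(node)
        stack.extend(parents.get(node, ()))
    return found


def transitive_closure(stated_is_a):
    """Build {concept: all ancestors} from direct (child, parent) pairs."""
    parents = {}
    for child, parent in stated_is_a:
        parents.setdefault(child, set()).add(parent)
    return {concept: ancestors_of(concept, parents) for concept in parents}


def subsumes(ancestor, descendant, closure):
    """True if 'descendant' is the same as, or a kind of, 'ancestor'."""
    return ancestor == descendant or ancestor in closure.get(descendant, set())


# Hypothetical stated axioms, given as (child, parent) pairs.
stated = [
    ("Covid19", "ViralDisease"),
    ("ViralDisease", "InfectiousDisease"),
    ("InfectiousDisease", "Disease"),
]
closure = transitive_closure(stated)
print(sorted(closure["Covid19"]))
print(subsumes("Disease", "Covid19", closure))
```

Precomputing the closure trades storage for speed: a query can test subsumption with a single lookup instead of walking the hierarchy at run time.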

Address matching and UPRN allocation application

The information model service includes a web application that allows a user to match one or more addresses from a system's address file to an authoritative address, and to allocate a Unique Property Reference Number (UPRN) for the location of that address.

A person's health is significantly affected by the environment in which they live and work.

People live in flats, houses, or are homeless. People work at home, outside, or in offices or factories.

In order to identify or rectify health issues relating to environments, it is necessary to evaluate the effect of location, type of property, and occupancy on the people who live and work in them.

In order to measure the effects on health it is necessary to link the health records of people to the properties in which they live and work.

In order to create that link it is necessary to identify the relevant property and the usual way in which a property is identified is via the person's home or work address.

There are problems with using addresses as a way of linking:

Addresses are recorded by provider organisations in inconsistent ways.
Addresses themselves change over time.
Many properties have more than one address that matches the property. For example the local authority may have one address, whilst the post office may have another.

One way of resolving this problem is to assign a property identifier to an actual location and link the various addresses to it. When this is done, then by linking a person's health record to this property identifier, issues relating to the property, location and occupancy can be studied, problems can be acted on, and lives will be saved.

Unique Property Reference Numbers (UPRNs) are property identifiers for every property in Great Britain. Ordnance Survey provides access to these in a number of products, of which AddressBase Premium is the most comprehensive database. It is derived from local government's NLPG (National Land and Property Gazetteer) as created and maintained by GeoPlace, Ordnance Survey’s OS MasterMap Address Layer 2 and the Royal Mail’s PAF (Postcode Address File). It adheres to the British Standard for addressing, BS7666, and every property has its Unique Property Reference Number (UPRN) and geographical co-ordinates. It is updated every 6 weeks.

Furthermore, there is an assured link between all addresses associated with a property over its life cycle, and from local authority and Royal Mail sources.

That being the case, if a service matches the address from someone's health record to at least one of the addresses supplied by AddressBase Premium, then the UPRN can be derived and linked to the person. This will save lives.
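
The idea of linking several address variants to one property identifier, and deriving the UPRN when a health record address matches any one of them, can be sketched as below. This is a deliberately naive illustration: the normalisation rule, the function names, the sample addresses and the UPRN value are all made up, and the real matching algorithms are considerably more involved.

```python
import re


def normalise(address):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(cleaned.split())


def build_index(authoritative):
    """Map every normalised authoritative address to its UPRN."""
    return {
        normalise(address): uprn
        for uprn, addresses in authoritative.items()
        for address in addresses
    }


def match_uprn(address, index):
    """Return the UPRN if the address matches any known variant, else None."""
    return index.get(normalise(address))


# Hypothetical data: one property known under two different addresses.
authoritative = {
    "100000000001": [
        "Flat 1, 10 High Street, London, E1 1AA",
        "Ground Floor Flat, 10 High Street, London, E1 1AA",
    ],
}
index = build_index(authoritative)
print(match_uprn("FLAT 1,  10 High Street, London E1 1AA", index))
```

Exact matching after normalisation only handles differences in case, punctuation and spacing; abbreviations, misspellings and reordered elements need the fuller algorithms.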

The Discovery information model supports the mapping of health related addresses to addresses provided by an authoritative organisation, those addresses being a gold standard for pointing to a UPRN. The matching service is provided in two forms:

A Web application that people can use to upload a list of addresses and obtain a list of matched addresses and UPRNs
A REST API that systems can use to request a matched address and a UPRN for that address

Address matching algorithms

Main article: UPRN address matching algorithms

Address matching is surprisingly difficult, and the algorithms used to match addresses are described in more detail in the UPRN address matching algorithms article.

Information model libraries

IM repositories hold the content of the information model. There are various categories of repository that align with the model manager modules and the modelling language. The types of repository include:

The Ontology library, which holds all of the concepts and their definitions from a multiplicity of taxonomies and classifications
Expression library, which holds a set of re-usable expressions that have been created from the concepts.
Value set library, which holds collections of concept definitions for use in query and reporting
Data model library, which holds data models.
Data set library, which holds data sets
Data map library, which holds collections of maps between data models, object models and related artefacts
Query library, which holds collections of queries designed to query data models in order to produce data sets for reports, or provide knowledge to aid decisions.

Information service APIs

As well as the information model manager and various modules, the service provides a suite of APIs to support the use of data held within the information model libraries.

Get run time value set

The value set generator returns a list of concepts that are defined by a value set definition for use in queries, thus supporting advanced subsumption testing against health care records. A run time value set is effectively the same as the output of descendants from a transitive closure table, but includes indicators as to the nature of the leaf concepts (e.g. whether they arise from "mapped to", "replaced" or "replaced by" relationships).
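
A run time value set of this kind can be sketched as an expansion over a precomputed descendants table, with an indicator attached to each entry. The function name, the indicator labels ("core", "mapped_to", "replaced_by") and the sample codes are assumptions made for this illustration and do not describe the actual output format of the service.

```python
def run_time_value_set(member_roots, descendants, legacy_links):
    """Expand value set members to descendants and linked legacy codes.

    descendants: {concept: set of all descendants} from a closure table.
    legacy_links: {legacy_code: (relationship, core_concept)}.
    Returns a sorted list of (code, indicator) pairs.
    """
    entries = {}
    for root in member_roots:
        for concept in {root} | descendants.get(root, set()):
            entries[concept] = "core"

    core_concepts = set(entries)
    for legacy_code, (relationship, core) in legacy_links.items():
        # Include a legacy code only if it points at a concept in the set.
        if core in core_concepts and legacy_code not in entries:
            entries[legacy_code] = relationship
    return sorted(entries.items())


# Hypothetical closure table and legacy links.
descendants = {"Covid19": {"Covid19Pneumonia"}}
legacy_links = {
    "OLD_1": ("replaced_by", "Covid19"),
    "LOCAL_9": ("mapped_to", "Covid19Pneumonia"),
    "OLD_2": ("replaced_by", "Asthma"),
}
value_set = run_time_value_set(["Covid19"], descendants, legacy_links)
for code, indicator in value_set:
    print(code, indicator)
```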

The value set generator API accepts the IRI of a value set either in full or relative to a baseline IRI, e.g. http://DiscoveryDataService/InformationModel#VSET_Covid1 or simply VSET_Covid1, and returns a list of concepts to be used in the query. The API supports both core concepts and original codes that have been mapped to the core concepts, depending on whether the database uses Discovery concepts or actual original codes.
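
Accepting either a full or a relative IRI amounts to a small resolution step before the lookup. The sketch below assumes that the baseline IRI is the prefix of the example above and that a full IRI can be recognised by its scheme separator; the function name resolve_iri is hypothetical.

```python
# Assumed baseline, taken from the example IRI in the text.
BASE_IRI = "http://DiscoveryDataService/InformationModel#"


def resolve_iri(value_set_id, base=BASE_IRI):
    """Return a full IRI, expanding a relative identifier against the base."""
    if "://" in value_set_id:
        return value_set_id  # already a full IRI
    return base + value_set_id


print(resolve_iri("VSET_Covid1"))
print(resolve_iri("http://DiscoveryDataService/InformationModel#VSET_Covid1"))
```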

Get Map APIs

The information model server provides a number of APIs and utilities that support the mapping of original fields and values into the common information model.

The data mapping APIs article describes the use of the mapping server's mapping APIs to support inbound and outbound data transformation processes that involve a map between two data models. The map maker manager article describes the way that the map maker manager operates when authoring maps.

For example, there is a set of mapping hint algorithms, which are machine-assisted approaches to improving the speed and accuracy of mapping.

Get address and Get UPRN API

As well as providing an application, the service supports the UPRN REST API, which enables a system to call the UPRN address matching service with an address and receive a response containing the matched address and UPRN.

Distribution services

As well as being accessible through APIs and applications, the information model services provide distribution facilities for the content of the IM, for use in subscriber databases or subscriber applications. All content of the information model can be distributed in both bulk and delta form.
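
The difference between bulk and delta distribution can be sketched as follows: a bulk release is the whole snapshot, while a delta carries only what was added, changed or removed since the previous release. The snapshot layout (a dictionary keyed by concept identifier) and the function names are assumptions made for this illustration.

```python
def compute_delta(previous, current):
    """Describe the changes needed to turn 'previous' into 'current'."""
    added = {k: v for k, v in current.items() if k not in previous}
    changed = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    removed = sorted(k for k in previous if k not in current)
    return {"added": added, "changed": changed, "removed": removed}


def apply_delta(snapshot, delta):
    """Apply a delta to a subscriber's copy and return the updated copy."""
    result = dict(snapshot)
    result.update(delta["added"])
    result.update(delta["changed"])
    for key in delta["removed"]:
        result.pop(key, None)
    return result


# Hypothetical content of two successive releases.
previous = {"C1": "Asthma", "C2": "Covid 19", "C3": "Obsolete term"}
current = {"C1": "Asthma", "C2": "COVID-19", "C4": "Long COVID"}

delta = compute_delta(previous, current)
print(delta)
print(apply_delta(previous, delta) == current)
```

A subscriber that has missed a release cannot safely apply a later delta, which is one reason for offering the bulk form as well.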

Value set distributor

The value set distributor maintains tables of value sets for databases that use local instances of the Discovery information model.

This is part of the information model distribution service that runs on an application server, and is designed to detect changes to the content of the information model and regenerate the value sets from the value set definitions. The value sets are regenerated whenever a value set definition changes or whenever there is an update to the concepts within the information model.
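
One way to decide when a value set needs regenerating is to store a fingerprint of its inputs and compare it on each run. The sketch below is an assumption about how such change detection could work, not a description of the distributor itself; the function names and the idea of a single concept version string are hypothetical.

```python
import hashlib
import json


def fingerprint(definition, concept_version):
    """Hash the value set definition together with the concept version."""
    payload = json.dumps(
        {"definition": definition, "concepts": concept_version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_regeneration(stored_fingerprint, definition, concept_version):
    """True if the definition or the concepts changed since the last run."""
    return stored_fingerprint != fingerprint(definition, concept_version)


# Hypothetical definition and concept release identifier.
definition = {"iri": "VSET_Covid1", "members": ["SN_1240751000000100"]}
stored = fingerprint(definition, "2020-08-07")

print(needs_regeneration(stored, definition, "2020-08-07"))
print(needs_regeneration(stored, definition, "2020-09-18"))
```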
