Information modelling

From Endeavour Knowledge Base
Revision as of 09:46, 19 March 2023 by DavidStables

Background

To make sense of the huge variation in health data, with thousands of data types and millions of codes from thousands of providers using scores of different systems, it is useful to create an information model covering a data model, an ontology of concepts, and value sets bound to the data model.

It is useful to visualise the information model via a publicly accessible web application and a set of APIs that enable users and systems to use the data within the model.

Having established such a model, it is then possible to construct logical definitions of queries and concept sets that can be used on the data published from the sources. The information model thus contains models of set definitions and queries.

Services that link and normalise the data can use the model and/or the ontologies within it, creating maps between source data and a common model.
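As an illustration of how a normalising service might use such maps, the following minimal Python sketch looks up a source code in a concept map and attaches a core concept to each record. The scheme names, codes, IRIs and the `normalise` function are all hypothetical and are not taken from the Endeavour IM.

```python
# Hypothetical map from (source scheme, local code) to a core concept IRI.
# All schemes, codes and IRIs here are made up for illustration.
CONCEPT_MAP = {
    ("supplier-a", "LOCAL-17"): "http://example.org/core#Asthma",
    ("supplier-b", "AST01"): "http://example.org/core#Asthma",
    ("supplier-b", "DM2"): "http://example.org/core#Type2Diabetes",
}


def normalise(record: dict) -> dict:
    """Return a copy of a source record with its code mapped to a core concept.

    Unmapped codes are kept and flagged rather than dropped, so that they
    can be reviewed and added to the map later.
    """
    key = (record["scheme"], record["code"])
    concept = CONCEPT_MAP.get(key)
    return {**record, "concept": concept, "mapped": concept is not None}


source_records = [
    {"patient": "p1", "scheme": "supplier-a", "code": "LOCAL-17"},
    {"patient": "p2", "scheme": "supplier-b", "code": "AST01"},
    {"patient": "p3", "scheme": "supplier-b", "code": "XYZ"},
]
normalised = [normalise(r) for r in source_records]
```

In this sketch, two different supplier codes resolve to the same core concept, which is what allows a single query to run across data from both sources.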

This article and the pages linked from it describe one approach to an information model based on linked data principles, as established as part of the idea of the Semantic Web.

Most models in healthcare either use bespoke health care languages, such as those used by HL7 or openEHR, or conventional entity diagrams with a separate terminology server. The approach used in the Endeavour information model is to adopt and adapt the mainstream Semantic Web languages, based on a view of health data as a graph with the nodes and edges modelled as RDF IRIs.
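The graph view can be sketched in a few lines of Python, with each statement held as a (subject, predicate, object) triple of IRI strings. This is only an illustration: the namespaces, property names and the `match` helper are assumptions, and real RDF also allows literal values as objects, which the sketch leaves out.

```python
EX = "http://example.org/im#"      # hypothetical namespace for model terms
DATA = "http://example.org/data/"  # hypothetical namespace for instance data

# Each triple is one edge: subject node, predicate (edge label), object node.
triples = {
    (DATA + "patient/1", EX + "hasObservation", DATA + "obs/42"),
    (DATA + "obs/42", EX + "concept", EX + "Asthma"),
    (DATA + "obs/42", EX + "recordedBy", DATA + "org/7"),
    (EX + "Asthma", EX + "isA", EX + "RespiratoryDisorder"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Find every observation node whose concept is Asthma.
asthma_observations = [s for s, _, _ in match(p=EX + "concept", o=EX + "Asthma")]
```

Note that instance data and ontology statements sit in the same graph here, which is what lets a single query pattern span both.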

The model is not a new standard or an invention of new concepts. Instead, the content of the Endeavour IM incorporates concepts from a number of recognised sources including:

a) The mainstream health ontology Snomed-CT, with extensions to accommodate the unmapped NHS data dictionary attributes, local codes, and code taxonomies such as OPCS and ICD10, as well as the legacy mappings to Read 2.

b) The mainstream messaging model resources such as FHIR, making the IM FHIR compatible via simple transforms.

c) The mainstream query definitions such as QOF rules and dataset definitions.

General approach

The IM is a representation of the meaning and structure of data held in the electronic records of the health and social care sector, together with libraries of queries, value sets, concept sets, data set definitions and mappings. These are computable abstract logical models, not physical schemas. "Computable" means that operational software operates directly from the model artefacts, as opposed to using the model for illustration purposes. As a logical model, it models data that may be physically held in a variety of different types of data stores, including relational or graph data stores. Because the model is independent of the physical schemas, the model itself has to be interoperable and free of any proprietary lock-in.
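One way to picture "computable" is software that reads a model artefact as data and acts on it, rather than having the rules hard-coded. The Python sketch below is a simplified illustration under that assumption; the shape, its property names and the `validate` function are hypothetical and do not reflect the actual IM artefacts.

```python
# Hypothetical logical shape for an "Observation" entity. Property names and
# rules are illustrative only and are not taken from the Endeavour IM.
OBSERVATION_SHAPE = {
    "patient": {"type": str, "required": True},
    "concept": {"type": str, "required": True},
    "value": {"type": float, "required": False},
}


def validate(record: dict, shape: dict) -> list:
    """Check a record against a shape and return a list of problems found."""
    problems = []
    for name, rule in shape.items():
        if name not in record:
            if rule["required"]:
                problems.append(f"missing required property: {name}")
            continue
        if not isinstance(record[name], rule["type"]):
            problems.append(f"wrong type for property: {name}")
    return problems


# The same shape checks data regardless of where it was physically stored.
row_from_relational_store = {"patient": "p1", "concept": "Asthma", "value": 1.5}
node_from_graph_store = {"patient": "p2"}

ok = validate(row_from_relational_store, OBSERVATION_SHAPE)
bad = validate(node_from_graph_store, OBSERVATION_SHAPE)
```

Changing the shape changes the behaviour of the validator without touching its code, which is the sense in which the software operates directly from the model.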

The IM is a broad model that integrates a set of different approaches to modelling using a common ontology. The components of the model are:

  1. A set of ontologies, which is a vocabulary and definitions of the concepts used in healthcare, or more simply put, a vocabulary of health. The ontologies are made up of Snomed-CT, the world's leading health ontology, with a London extension, and various code-based taxonomies (e.g. ICD10, Read, supplier codes and local codes).
  2. A common data model, which is a set of classes and properties, using the vocabulary, that represent the data and relationships published by live systems. Note that this data model is NOT a standard model but a collated set of entities and relationships, bound to the concepts and based on real data, that are mapped to a common model.
  3. A library of business-specific concept value sets (also known as reference sets), which are expression constraints on the ontology for the purpose of query.
  4. A catalogue of reference data such as geographical areas, organisations and people derived and updated from public resources.
  5. A library of data set (query) definitions for querying and extracting instance data from the information model, reference data, or health records.
  6. A set of maps between published concepts and the core ontology, as well as structural mappings between submitted data and the data model.
  7. An open source set of utilities that can be used to browse, search, or maintain the model.
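To show how components 1, 3 and 5 can fit together, the following Python sketch defines a small is-a hierarchy, expands a value set as "a concept plus all of its descendants", and uses the result in a simple data set definition. The concept names, the hierarchy and the `descendants_or_self` function are hypothetical, and the expansion rule is only one simple kind of expression constraint.

```python
# Hypothetical is-a hierarchy: child concept -> set of parent concepts.
IS_A = {
    "AllergicAsthma": {"Asthma"},
    "Asthma": {"RespiratoryDisorder"},
    "COPD": {"RespiratoryDisorder"},
    "RespiratoryDisorder": set(),
}


def descendants_or_self(concept: str) -> set:
    """Expand a concept to itself plus every concept that is-a it, transitively."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for child, parents in IS_A.items():
            # Add a child as soon as any of its parents is already included.
            if child not in result and parents & result:
                result.add(child)
                changed = True
    return result


# A value set defined by constraint on the ontology rather than by listing codes.
asthma_value_set = descendants_or_self("Asthma")

records = [
    {"patient": "p1", "concept": "AllergicAsthma"},
    {"patient": "p2", "concept": "COPD"},
    {"patient": "p3", "concept": "Asthma"},
]

# A simple data set definition: patients with a record in the value set.
matching_patients = sorted(
    r["patient"] for r in records if r["concept"] in asthma_value_set
)
```

Because the value set is defined against the hierarchy, a concept added later beneath Asthma would be picked up by the same definition without editing the query.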

Modelling languages

To build a model, it is necessary to use building blocks. In computing, this means the use of high level languages of some kind.

The approach taken in the Endeavour IM is to use the Semantic Web languages, thus ensuring compatibility with mainstream web based approaches.

The information model languages are therefore constrained forms of the Semantic Web languages, with vocabularies tailored to the IM requirements.

Information model languages

The Semantic Web languages used to build the various components of the information model.

Health query definition

Pages describing an approach to modelling query definitions with outputs in a machine readable form, covering the majority of health data query requirements.

Information model meta model

The class model (shapes model) of the classes used to hold the model content.

Mapping concepts and transforming published data

Introduces the approaches to matching and mapping concepts and the structural maps used in transforming published data.

Architectures

A high-level overview of the architectures that the technologies contribute to.

GitHub repositories

Descriptions and information relating to the application source code.