Information model meta model

From Discovery Data Service

Scope of the meta model

The meta model consists of a set of specialised classes, or 'shapes', made interoperable through semantic web languages. These use either RDF grammar and syntax (Turtle) or JSON-LD as the exchange format, with options for other input and output formats such as OWL functional syntax, Snomed compositional grammar or the Snomed expression constraint language.

The shapes cover the following areas:

  1. An ontology of terminology concepts: a vocabulary of the concepts used in healthcare, with their definitions, or more simply put, a vocabulary of health. The ontology is based on the world's leading healthcare ontology, Snomed-CT, with a London extension, and is supplemented with additional concepts for data modelling. Whether a concept is a Snomed-CT concept, a London extension concept or a concept derived from a legacy code (e.g. ICD10, EMIS local codes or Read codes), the class structure is the same.
  2. A data model: a set of classes and properties, using the vocabulary, that represent the data and relationships published by live systems to a data service that uses these models. The data model is part of the overall ontology, and there is a seamless boundary between the data model shapes and the terminology concepts, as both use RDF. The data model meta model uses SHACL shapes and thus conforms to the W3C SHACL recommendation.
  3. A library of business-specific concept sets and value sets, which are expression constraints on the ontology for the purpose of query. This uses a specialised "query" or "set definition" class and encompasses the Snomed-CT expression constraint language, with which it is compatible via a simple translation API.
  4. A catalogue of reference data such as geographical areas, organisations and people derived and updated from public resources.
  5. A library of queries for querying and extracting instance data from reference data or health records. This uses a more extended class model than item 3, but is fundamentally a set definition, which is mapped to mainstream query languages to retrieve the actual data.
  6. A set of maps between published concepts and the core ontology, as well as structural mappings between submitted data and the data model. This uses a context class.
  7. A set of form generators that are used by the IM application to create forms for the creation and editing of the IM entities which are instances of a meta model class.

General approach

The information model language uses RDF triples as its basic grammar, i.e. subject, predicate and object, with the graph as a fourth element making up the quadruples (quads) of an RDF dataset.

When viewed from the perspective of data modelling or an ontology, these could also be referred to as object, property and value, or as class, property and type.
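As a minimal illustration of this grammar, the Python sketch below holds an RDF dataset as a list of quads and matches them against a pattern in which None acts as a wildcard. The `Quad` and `match` names and the `ex:` prefixed names are hypothetical; a real deployment would use an RDF library or triple store rather than a plain list.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """One statement in an RDF dataset: a triple plus the graph it belongs to."""
    subject: str
    predicate: str
    obj: str
    graph: str


def match(dataset, s=None, p=None, o=None, g=None):
    """Return the quads that fit the pattern; None acts as a wildcard."""
    pattern = (s, p, o, g)
    return [q for q in dataset
            if all(want is None or want == got for want, got in zip(pattern, q))]


# Hypothetical prefixed names, used only for illustration
dataset = [
    Quad("ex:ChestPain", "rdfs:subClassOf", "ex:Pain", "ex:core"),
    Quad("ex:ChestPain", "rdfs:label", "chest pain", "ex:core"),
]
print(match(dataset, p="rdfs:subClassOf"))
```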

Thus RDF is used for the meta model, and the W3C language used to define the meta model is SHACL. In other words, SHACL is used to define all the meta classes, and the meta classes are used to hold the instances that form the content of the model.

The paradox of the information model content is that the instances of the meta model classes are themselves considered classes in the real world. For example, when noting that a patient has 'chest pain', this is recorded as a concept which is an instance of the concept model. But 'chest pain' is also the class of all chest pains, so the same identifier can be used for subsumption query. Likewise, a clinical event in a health record is an instance of a data model meta class, but the clinical event is also the class of all clinical events, such as observations or encounters.

The model itself can be exchanged in JSON-LD, but JSON-LD can be somewhat tedious, because RDF predicates are full IRIs and so cannot map directly to standard programming language class properties. The use of 'local names' is therefore supported for business-related APIs (e.g. query). In this case the JSON field names can use the local names without the full IRI, as long as the JSON complies with the model classes described here.
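A minimal sketch of how local names might be mapped to and from full predicate IRIs, assuming a simple hypothetical context dictionary. The `rdf:type` and `rdfs:label` IRIs are the real W3C ones; the `example.org` namespace, the `expand`/`compact` helpers and the code value are placeholders, and unlike full JSON-LD processing this only renames keys.

```python
# Hypothetical context: local name -> full IRI. The "im" namespace is a
# placeholder (example.org), not the real Information Model namespace.
CONTEXT = {
    "type": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
    "code": "http://example.org/im#code",
}


def _rename_keys(obj, mapping):
    """Recursively rename dict keys found in mapping; other keys are kept."""
    if isinstance(obj, dict):
        return {mapping.get(k, k): _rename_keys(v, mapping) for k, v in obj.items()}
    if isinstance(obj, list):
        return [_rename_keys(v, mapping) for v in obj]
    return obj


def expand(obj, context=CONTEXT):
    """Local-name JSON -> JSON keyed by full predicate IRIs."""
    return _rename_keys(obj, context)


def compact(obj, context=CONTEXT):
    """Full-IRI JSON -> local-name JSON (the inverse of expand)."""
    return _rename_keys(obj, {iri: name for name, iri in context.items()})


local = {"label": "Chest pain", "code": "C-001"}  # placeholder code value
print(expand(local))
```

Because the mapping is one-to-one, `compact(expand(x))` returns the original local-name object, which is what lets an API accept either form.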

Consequently, to make this documentation clearer, local names are also used here, with links to the IM viewer or the W3C equivalent.

This documentation is auto-generated from the information model folders and shapes themselves, so the shapes can also be viewed more fully in the Information Model Directory viewing application.

Types and shapes

Types (represented as the value of the rdf:type predicate) are used to indicate the class structure of an object in the model. Thus each meta model class is an RDF type.

Shapes are made up of SHACL node shapes and property shapes. A 'type' is defined in the form of a shape.

Shapes are also classes (i.e. each shape also has rdf:type rdfs:Class), so any shape that is a subclass of another shape inherits its properties, unless they are overridden by sub properties or by subclass values of the parent's properties.
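The inheritance rule can be sketched as follows, assuming single inheritance and a hypothetical `Shape` class whose properties map local names to value types. A subshape's own entry replaces the inherited one, which is how a narrower (subclass) value for a parent property is modelled in this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Shape:
    """Hypothetical stand-in for a SHACL node shape that is also a class."""
    iri: str
    properties: dict = field(default_factory=dict)  # local name -> value type
    parent: Optional["Shape"] = None

    def effective_properties(self) -> dict:
        """Inherited properties, with this shape's own entries overriding the parent's."""
        inherited = self.parent.effective_properties() if self.parent else {}
        return {**inherited, **self.properties}


event = Shape("Event", {"effectiveDate": "date", "concept": "Concept"})
observation = Shape("Observation",
                    {"concept": "ObservableEntity", "value": "decimal"},
                    parent=event)
print(observation.effective_properties())
# {'effectiveDate': 'date', 'concept': 'ObservableEntity', 'value': 'decimal'}
```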

Model categories

Given the RDF nature of the class models, it is also reasonable to consider each area of the model as a "language". For example, the W3C OWL, SHACL and SPARQL specifications are referred to as "languages" because their types and properties are arranged as a grammar with a vocabulary.

The information model treats the model as a set of class descriptions. Thus things like ontological concepts, data models and query definitions are modelled as SHACL RDF shapes, but are implemented in programming languages such as Java and JavaScript as classes and objects that map precisely to the triples.

Bridging RDF into programming languages requires some constraints on triple design. For example, a predicate in a triple would normally be represented as a property in JSON, C# or Java, using the local name part of the IRI as the property name. The predicates used by a class should therefore have local names that are unique within that class.
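One way to check this constraint is to derive each predicate's local name (here taken as the text after the last `#`, or after the last `/` when there is no `#`) and report any name that more than one IRI in the class would share. The helper names and the `example.org` IRIs are hypothetical.

```python
from collections import defaultdict


def local_name(iri: str) -> str:
    """Text after the last '#', or after the last '/' if there is no '#'."""
    if "#" in iri:
        return iri.rsplit("#", 1)[1]
    return iri.rstrip("/").rsplit("/", 1)[-1]


def local_name_clashes(predicates):
    """Local names shared by more than one predicate IRI used in a class."""
    by_name = defaultdict(set)
    for iri in predicates:
        by_name[local_name(iri)].add(iri)
    return {name: sorted(iris) for name, iris in by_name.items() if len(iris) > 1}


preds = [
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://example.org/im#label",  # hypothetical predicate that clashes
    "http://example.org/im#code",
]
print(local_name_clashes(preds))
```

A clash means the two predicates could not both become properties of the same Java or JSON class without renaming one of them.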

The IM meta model can be broadly divided into the following:

Ontology of concepts

These are classes that have an IRI, a term (name), a description, a code, one or more synonyms (which may also be coded), a set of subclass axioms, and one or more role groups containing roles (property-value pairs), which represent existential quantifications from Description Logic. OWL equivalent class axioms (which represent concepts whose definitions are both necessary and sufficient) are classified by a reasoner before ingestion into the IM. Transitive relationships such as subclass and sub property, together with replacement maps, are used to generate a transitive "is a" closure structure for efficient subsumption query.
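The closure can be sketched as a graph traversal that records, for each concept, itself plus every ancestor reachable through 'is a' links, so that a subsumption test becomes a set membership check. This assumes subclass, sub property and replacement links have already been flattened into one hypothetical `parents` mapping; the concept names are placeholders rather than Snomed-CT identifiers.

```python
def is_a_closure(parents):
    """Map each concept to itself plus all of its transitive ancestors."""
    closure = {}
    for concept in parents:
        seen = {concept}
        stack = [concept]
        while stack:
            for parent in parents.get(stack.pop(), ()):
                if parent not in seen:  # also guards against cycles
                    seen.add(parent)
                    stack.append(parent)
        closure[concept] = seen
    return closure


def subsumes(closure, ancestor, concept):
    """True if ancestor is the concept itself or one of its ancestors."""
    return ancestor in closure.get(concept, {concept})


# Hypothetical concept names
parents = {
    "CardiacChestPain": ["ChestPain"],
    "ChestPain": ["Pain"],
    "Pain": ["ClinicalFinding"],
}
closure = is_a_closure(parents)
print(subsumes(closure, "Pain", "CardiacChestPain"))  # True
print(subsumes(closure, "CardiacChestPain", "Pain"))  # False
```

Precomputing the closure trades storage for query speed: each subsumption check is a lookup instead of a walk up the hierarchy.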

External language support includes OWL functional syntax and Snomed-CT compositional grammar.

Data model shapes

These are entities that have an IRI, a name, a description, a subclass relationship with another entity, and one or more properties with range types and cardinality. In other words, this is a very straightforward entity definition: entity-to-entity relationships are modelled as properties that point to other entities (sh:node), and other properties have value types that are either literals (sh:datatype) or value set concepts (sh:class).
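A minimal sketch of such a shape, assuming a hypothetical `PropertyShape` that keeps only a property name, a Python type standing in for `sh:datatype`, and `sh:minCount`/`sh:maxCount`-style cardinality. It checks plain dictionaries, so it illustrates the idea rather than acting as a SHACL processor; `sh:node` and `sh:class` constraints are not modelled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PropertyShape:
    """Hypothetical, simplified stand-in for a SHACL property shape."""
    path: str                        # property local name
    datatype: type                   # plays the role of sh:datatype here
    min_count: int = 0               # like sh:minCount
    max_count: Optional[int] = None  # like sh:maxCount; None means unbounded


def validate(record, shapes):
    """Check a record (a dict) against property shapes; return error messages."""
    errors = []
    for shape in shapes:
        value = record.get(shape.path)
        if value is None:
            values = []
        elif isinstance(value, list):
            values = value
        else:
            values = [value]
        if len(values) < shape.min_count:
            errors.append(f"{shape.path}: expected at least {shape.min_count} value(s)")
        if shape.max_count is not None and len(values) > shape.max_count:
            errors.append(f"{shape.path}: expected at most {shape.max_count} value(s)")
        for v in values:
            if not isinstance(v, shape.datatype):
                errors.append(f"{shape.path}: {v!r} is not a {shape.datatype.__name__}")
    return errors


patient_shape = [
    PropertyShape("name", str, min_count=1, max_count=1),
    PropertyShape("dateOfBirth", date, max_count=1),
]
print(validate({"dateOfBirth": "1980-01-01"}, patient_shape))
```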

Set and Query definitions

This is a class model of a logical query expressed in the form of "from a thing, or a set of things, where the thing has characteristics, select properties of the thing and related things". In other words from, where, select. The model classes cover all the main query language constructs but arranged in a way that flows top to bottom in a series of steps. The effect of this is that the model supports multiple and nested sub queries as well as hierarchical results (of the kind described in GraphQL).
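The from, where, select flow can be sketched in memory as below. The `Query` class, its field names and the sample records are hypothetical; the example also feeds the result of one query into the where clause of another, as a simple form of nested sub query.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical from/where/select query definition, evaluated in memory."""
    from_type: str                              # "from": the type of thing
    where: list = field(default_factory=list)   # (property, test) pairs; all must pass
    select: list = field(default_factory=list)  # properties to return

    def run(self, records):
        # from: keep records of the requested type; where: every test must pass
        matches = [r for r in records
                   if r.get("type") == self.from_type
                   and all(test(r.get(prop)) for prop, test in self.where)]
        # select: project the requested properties
        return [{p: r.get(p) for p in self.select} for r in matches]


records = [
    {"type": "Patient", "id": 1, "age": 70},
    {"type": "Patient", "id": 2, "age": 80},
    {"type": "Patient", "id": 3, "age": 40},
    {"type": "Encounter", "id": 10, "patient": 1},
    {"type": "Encounter", "id": 11, "patient": 3},
]

# Sub query: patients who have at least one encounter
with_encounter = {row["patient"] for row in Query("Encounter", select=["patient"]).run(records)}

older_seen = Query("Patient",
                   where=[("id", lambda v: v in with_encounter),
                          ("age", lambda v: v is not None and v >= 65)],
                   select=["id"])
print(older_seen.run(records))  # [{'id': 1}]
```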

The set definition model supports translation from Snomed expression constraint language.

Internally, where a query definition is applied to the IM itself (e.g. concept sets or IMAPI) then instances of the classes are converted to SPARQL. When the query definition is applied to health records then the instances are converted to SQL using an entity table map.
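For the health record case, a minimal sketch of translating a query definition to SQL through an entity table map might look like this. The map, the table and column names and the `to_sql` helper are assumptions for illustration. Values are passed as bound parameters, and the generated statement is run against an in-memory SQLite database.

```python
import sqlite3

# Hypothetical entity table map: model type/property -> table/column
TABLE_MAP = {
    "Patient": {"table": "patient",
                "columns": {"id": "patient_id", "age": "age_years"}},
}
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


def to_sql(from_type, where, select, table_map=TABLE_MAP):
    """Build a parameterised SELECT; where is a list of (property, operator, value)."""
    entity = table_map[from_type]
    cols = entity["columns"]
    sql = f"SELECT {', '.join(cols[p] for p in select)} FROM {entity['table']}"
    clauses, params = [], []
    for prop, op, value in where:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"{cols[prop]} {op} ?")  # values are bound, never inlined
        params.append(value)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id INTEGER, age_years INTEGER)")
conn.executemany("INSERT INTO patient VALUES (?, ?)", [(1, 70), (2, 40)])
sql, params = to_sql("Patient", [("age", ">=", 65)], ["id"])
print(sql)                                   # SELECT patient_id FROM patient WHERE age_years >= ?
print(conn.execute(sql, params).fetchall())  # [(1,)]
```

Keeping table and column names in the map, rather than in the query definition, is what lets the same definition target different physical schemas.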

A query model definition generally falls into one of four patterns:

a) From one or more "focus" concepts, with role groups and roles (supporting and / or / minus, as well as optional subtypes), select the concepts entailed by the definition.

b) From a cohort of objects of a certain type with certain property values, ordered if required, as well as related objects with property values, select the objects entailed by the definition.

c) From the set defined by b), list properties of those objects (and related objects).

d) From the set defined by c), list groups of characteristics further filtered by property values (i.e. multi-group data sets).
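Pattern a) can be sketched as "descendants or self of the focus concept, filtered by required roles". This assumes hypothetical `children` and `roles` mappings, ignores role grouping, and compares role values exactly rather than by subsumption; the concept names are placeholders.

```python
def descendants_or_self(children, focus):
    """The focus concept plus every concept below it in the hierarchy."""
    result, stack = {focus}, [focus]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def pattern_a(children, roles, focus, required_roles):
    """Concepts subsumed by focus that carry every required (property, value) role."""
    return {c for c in descendants_or_self(children, focus)
            if required_roles <= roles.get(c, set())}


# Hypothetical hierarchy and roles
children = {"Fracture": ["FractureOfFemur", "FractureOfRadius"]}
roles = {
    "FractureOfFemur": {("findingSite", "Femur")},
    "FractureOfRadius": {("findingSite", "Radius")},
}
print(pattern_a(children, roles, "Fracture", {("findingSite", "Femur")}))  # {'FractureOfFemur'}
```

With Python sets, the and / or / minus combinations correspond to `&`, `|` and `-` applied to the result sets.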

Data sets are defined as a collection of set definitions, so a data set can include features of many entity types.

Concept mappings

Health data from systems that hold coded or text values should, where possible, be mapped to a core concept.

Linked health records store more than a simple concept reference, because the original meaning, including its context, should be retained in the record after any mapping.

Two structures may be needed when mapping a source code to a core concept. The result is an instance of a codeable concept, held as an entry in a health record.

Legacy concept to core concept match

A legacy concept is generated from a code scheme (e.g. local codes, OPCS, ICD10), and in many cases it has a direct 'matched to' property that points to the core concept. This match can be used directly when the source scheme and source code are known.

Source context

Often, more context is needed to match a source code to a core concept effectively, and a 'source context' is required to further disambiguate it. Source context includes one or more of the following:

  1. Source system
  2. Source organisation
  3. Source schema (e.g. a database schema or extract name)
  4. Source table (or other entity type such as a FHIR resource)
  5. Source field (or property depending on the source syntax)
  6. Dependent fields (other field values that affect the context of the value of the field)
  7. Source scheme. The scheme of the original code (same as the legacy concept code scheme)
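Combining the two structures, a sketch might look for the most specific source context rule that applies and otherwise fall back to the legacy concept's direct match, keyed by scheme and code. `SourceContext`, `map_code`, the "most fields specified wins" tie-break and all codes and concept names are hypothetical, and dependent fields are omitted.

```python
from dataclasses import dataclass
from typing import Optional

CONTEXT_FIELDS = ("system", "organisation", "schema", "table", "field", "scheme")


@dataclass(frozen=True)
class SourceContext:
    """Hypothetical subset of the source context listed above."""
    system: Optional[str] = None
    organisation: Optional[str] = None
    schema: Optional[str] = None
    table: Optional[str] = None
    field: Optional[str] = None
    scheme: Optional[str] = None


def applies(rule, actual):
    """A rule applies when every field it specifies equals the actual context."""
    return all(getattr(rule, f) is None or getattr(rule, f) == getattr(actual, f)
               for f in CONTEXT_FIELDS)


def specificity(rule):
    """Number of context fields the rule specifies."""
    return sum(getattr(rule, f) is not None for f in CONTEXT_FIELDS)


def map_code(code, context, context_maps, direct_matches):
    """Prefer the most specific applicable context map; else the direct 'matched to' link."""
    candidates = [(rule, concept) for (rule, rule_code), concept in context_maps.items()
                  if rule_code == code and applies(rule, context)]
    if candidates:
        return max(candidates, key=lambda rc: specificity(rc[0]))[1]
    return direct_matches.get((context.scheme, code))


CONTEXT_MAPS = {
    (SourceContext(system="SystemX"), "BP1"): "core:BloodPressure",
    (SourceContext(system="SystemX", field="sys_bp"), "BP1"): "core:SystolicBloodPressure",
}
DIRECT = {("LOCAL", "BP1"): "core:BloodPressureObservable"}

ctx = SourceContext(system="SystemX", field="sys_bp", scheme="LOCAL")
print(map_code("BP1", ctx, CONTEXT_MAPS, DIRECT))  # core:SystolicBloodPressure
```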

Codeable concept

This is the structure generated by a mapping exercise, which includes both the core matched concept and its original representation. Codeable concepts are instances of the codeable concept IM meta model class, and also appear as structures within health record classes. A codeable concept includes some of the following:

  1. Original code. Whatever code was present in the relevant field in the original resource. The code may be from a scheme, an enumerated type or some other table specific code.
  2. Original code scheme. In many cases the code is derived from a code scheme, which may be an international, national, system specific, or organisational specific code scheme.
  3. Original code term. This is the 'look up' display term for the code. This may or may not be the term that the user sees.
  4. Original qualifier term. This may be a text term for an entry which typically qualifies or modifies the meaning of the entry and results in a different concept map. Typical examples may be 'negative' or 'not present' etc.
  5. Original text. This (if present) is the term that the user would see in relation to this code. In many systems this is absent, as the term is included in the text entry.
  6. Matched to concept. The concept that this codeable concept matches to in this instance. Note that this is not the same as the simpler, more direct concept-to-concept match.
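The fields above could be carried by a simple record type such as the hypothetical `CodeableConcept` below. The field names and the `display_text` rule (prefer the original text, then the look-up term) are assumptions drawn from the descriptions, not the IM's actual class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodeableConcept:
    """Hypothetical rendering of the codeable concept fields listed above."""
    original_code: Optional[str] = None
    original_code_scheme: Optional[str] = None
    original_code_term: Optional[str] = None       # look-up display term
    original_qualifier_term: Optional[str] = None  # e.g. 'negative'
    original_text: Optional[str] = None            # what the user saw, if present
    matched_to: Optional[str] = None               # core concept for this instance

    def display_text(self) -> Optional[str]:
        """Prefer the text the user saw, then the look-up term."""
        return self.original_text or self.original_code_term


cc = CodeableConcept(original_code="BP1", original_code_scheme="LOCAL",
                     original_code_term="Blood pressure",
                     original_qualifier_term="not recorded",
                     matched_to="core:BloodPressureNotRecorded")  # placeholder concept
print(cc.display_text())  # Blood pressure
```

Keeping the original code, scheme and terms alongside `matched_to` is what allows a record to be re-mapped later without losing its original meaning.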

Health record maps

This is a class model of source to target maps for entities, fields and values, taking account of source context. This is used as the basis of data transformation from source messages or files into the target common data model.
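A minimal sketch of a field-level map, assuming a hypothetical dictionary keyed by source system, source entity and source field that yields a target entity, a target property and an optional value map. It produces one target object per source row; the names and codes are placeholders.

```python
# Hypothetical map keyed by (source system, source entity, source field):
# value = (target entity, target property, optional value map)
FIELD_MAP = {
    ("SystemX", "obs", "code"): ("Observation", "concept", {"BP1": "core:BloodPressure"}),
    ("SystemX", "obs", "val"): ("Observation", "value", None),
}


def transform(system, source_entity, row, field_map=FIELD_MAP):
    """Map one source row to one target object; unmapped fields are dropped."""
    target = {}
    for src_field, value in row.items():
        rule = field_map.get((system, source_entity, src_field))
        if rule is None:
            continue
        target_entity, target_prop, value_map = rule
        target["type"] = target_entity
        # With a value map, translate the value (unknown values pass through unchanged)
        target[target_prop] = value_map.get(value, value) if value_map else value
    return target


print(transform("SystemX", "obs", {"code": "BP1", "val": 120, "note": "x"}))
# {'type': 'Observation', 'concept': 'core:BloodPressure', 'value': 120}
```

Including the source system in the key is a simple way of taking source context into account, since the same field name can mean different things in different systems.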

Transactional classes

These classes are used to update the information model or record store, or to carry requests to and from it.

Meta model class specification

The meta model is specified in the article 'Meta model class specification'.