Information model query

From Endeavour Knowledge Base

Background to IMQ

Health data is generally held in relational, document, or graph databases, each of which come with a high level query language used to extract or create data. The best known of these languages is SQL, with other languages such as CYPHER, SPARQL, ELASTIC DSL, less well known.

Each of these languages are designed "close to the tin" so that the query engines can perform highly optimised query by transforming these high level languages to low level languages.

Is is very difficult to construct understandable user interfaces directly from SQL or SPARQL, and nearly all search and report systems create some form of intermediate representation such as DAX or M (MS Power BI) . In many cases, logical definitions of queries remain in free text, supported by spreadsheets.

IM Query is designed to operate as an intermediary between plain language and the underlying run time query languages. The model supports query of both the information model and any health records that map to a data model as defined within the IM.

As an RDF Graph knowledge base, the information model could (and can) be directly queried using SPARQL. The IM also holds text data which can be queried directly using open Search or elastic.

However, as Health records are likely to be stored as relational, or at least SQL compatible data bases, querying health records that are aligned with the model will require SQL to query them.

Consider the following issues with SQL/SPARQL:

  • Directly authoring SQL and SPARQL languages require a high degree of skill and health query in particular needs heavily nested subqueries , including some of the more advanced techniques such as correlated query or window functions.
  • Translating a user oriented intuitive query builder into SQL or SPARQL directly and in reverse is very difficult. Most query applications use an intermediate language from which the queries are then generated. Examples include GraphQL or Power BI DAX and M.
  • Enabling direct query via SPARQL end points or SQL APIs can result in crippling performance problems and are hard to police.

Consequently the IM provides a pragmatic JSON based domain specific grammar to represent query definitions and help bridge the gap between a plain language user interface representation and the run time query. This DSL, being in plain JSON, can be used to exchange query definitions across multiple instances via standard REST APIs.

The IM, as open source, also provides reference software showing how SQL or SPARQL or OpenSearch Query can be generated from the DSL and how a plain language or diagrammatic interpretation can be produced from the DSL. The reference software also shows the converse i.e. the generation of the DSL from plain language.

It is not expected that IMQ must be understood by clinicians, or even those familiar with SQL. Instead , as an object model of query constructs, it enables interoperability of query definitions across standard REST APIs. Together with reference transforms to SQL or SPARQL these can then be used against any target schema that holds tdata that is modelled in the IM.

The logic of a query definition precisely follows the logic of a plain language description of the output required, and the criteria to be applied. Thus IM query is one way of fully documenting the logic of health query, in a human and machine readable format.

IMQ overview

The structure of the model closely follows the standard patterns as illustrated in the main stream languages of SQL, SPARQL and CYPHER.

The schema specific elements of these main stream languages are replaced with graph pattern based constructs, and some of the more complex constructs commonly used in SQL (such as correlated subqueries and window functions) are simplified into short cut labels that can better illustrate what the query is trying to do and enabling query interpreters to adopt the most performant approaches at run time.

An IM query definition follows a straightforward set of predicates which can be summarised as:

From a set of things, where those things have certain properties and values, Select the properties of those things, where the property has certain values.

i.e. a familiar approach of from/where/select/where clauses, each type of clause being made up of the necessary predicates to enable most rule based health queries to be modelled.

IMQ Grammar

The approach to illustrating the grammar here is to omit the json quote overhead for clarity, but because the grammar is an object model, it retains the json meta data so that object, name and value json structures can carry the logic.

Overall structure

A simple overall structure with nestable elements . Q query may contain many queries enabling a package of queries such as a column group report or full data set

{
"from" : {
          "where" : {},
         "from" : {}
          },
"select" :{
           "where" : {}
          },
"subQuery" : {}
}

Simple example

Get me the full name of all patients aged >

Cohort definition

Get me patients registered as a GMS patient with a general practice on the reference date of the query

  1. things of type person
  2. Their GP registration entries that consist o
  3. a regular GMS patient type
  4. a start date before the reference date
  5. either no end date (still registered) or an end date after the reference date (were registered at the time)
IMQ Example
IMQ Result json
{"select" : {
  "entityType" : { "@id" :"Person"},
  "match" : [ {
    "pathTo" : [ {"@id": "isSubjectOf"}],
    "entityType" : { "@id" : "GPRegistration"},                        
    "property" : [ {
      "@id" : "patientType",
      "isConcept" : {"@id" : regular patient}},                           
      {    
      "@id" : "effectiveDate",
      "value" : { 
         "comparison" : "LESS_THAN_OR_EQUAL",          
          "valueData" : "$referenceDate"}}], 
    "orProperty" : [ {                                            
      "notExist" : true,
      "@id" : "endDate"},                                           
      {
      "@id" :"endDate",
      "value" : {                                            
         "comparison" : "GREATER_THAN",
         "valueData" : "$referenceDate"}}]}}
{[
  {id : 1},
  { id : 2}]

The date range 'latest from and test' pattern

Get me patients whose latest systolic blood pressure within the last 18 months was taken in the surgery and was > 140.

  1. Select things of type person
  2. Look for observations and check for
  3. a systolic blood pressure
  4. within 18 months prior to the reference date,
  5. Then sort and select the latest
  6. Test whether it is office based (and not therefore not home based)
  7. Test to see if it is over 140


N.B only the match clause is illustrated here. The outer select and boolean match clauses would be present in the real query.

IMQ Example
IMQ Result json
{"match" : {
  "pathTo": [ {"@id" : "isSubjectOf"}],
  "entityType" : {"@id": "Observation"},
  "property": [ {
    "@id" : "concept",
    "inSet" : {"@id" : "Systolic arterial measurements"},
    {
     "@id" : "effectiveDate",
     "function" : {
        "@id" :"TimeDifference", 
        "argument": { [{
          "first date":  "$this"}, 
        { "second date" :"referenceDate"},
        { "units" : "MONTHS"}]},
     "value" : {
        "comparison" : "greater or equal", 
        "valueData" : -18}],
     "orderBy" : { 
         "@id" :"effectiveDate",
         "direction" : "descending",
         "count 1"},
     "testProperty" [ {
        ":@id" :"concept",
        "inSet" : { "@id" : "Office based systolic blood pressures"}},
        { 
         "value" : { "comparison" : "greater", "valueData" : 140}}]}]}
{[
  {id : 1},
  { id : 2}]

Expression constraint ECL

Get me concepts that are oral none steroidal ant inflammatory products

ECL for which is

<<763158003 | medinal products  :

<<127489000 : has active ingredient = <<372665008 | none steroidal anti inflammatory agent,

<<411116001 : manufactured dose form = <<385268001 | oral

IMQ Example
IMQ
{"match" : [{
  "entityId" : { 
      "@id" : "medicinal product",
      "includeSubtypes": true},
   "property" : [{ 
     "@id" : "has active ingredient",
     "includeSubtypes" : true,
     "isConcept" : {
          "@id" : "non steroidal anti inflammatory agent",
          "includeSubtypes" : true}},
     {
       "@id" : "manufactured dose form",
       "includeSubtypes" : true,
        "isConcept" : {
             "@id" :"oral",
              "includeSubtypes" :  true} ]}

Data Sets

Get me a data set for a cohort of patients with 120 fields grouped into 50 groups

This is a line level report from a list of patients, with data items from the health record filtered by criteria.

(Only the first couple are shown for brevity)

  1. Select things of type person,.
  2. must be in the search result "Cardiovascular bleed search"
  3. Patient details  : Full name, age
  4. Aspirin details : date, medication for Aspirin

a) issued in the last 6 months,

b) still an active authorised medication.

c) Get the latest

IMQ example
IMQ
{"query": {
   "select" :{
      "entityType" : {"@id" : "Person"},
      "match" : [{ "entityInSet" : { "@id": "Cardiovascular bleed search"}}],
      "subselect" : [{
         "name" : "patient details",
          "property" : [ {
              "@id": "fullName"},
              {"@id" : "age"},
           {
            "name" : "Aspirin",
           "pathTo" : "hasEntry",
            "property" : [{
               "@id": "effectiveDate","alias" : "date of issue"},
               {"@id": "medication",
                "select" : {
                   "property" : [{"@id": "rdfs:label","alias" : "name"}]}}],
          "match" :[{ 
            "entityType": { "@id" : "medicationRequest"},
             "property": [{
	       "@id" : "medication",
               "isConcept" : { 
                  "@id" : "Aspirin products",
                  "includeSubtypes" : true},
              {
               "@id" : "effectiveDate",
                "value": {
                    "comparison": "LESS_THAN_OR_EQUAL",	"valueData": 6},
                 "function": {
                    "@id": "Time Difference",
		       "argument": [
                        {
                          "parameter": "units", 
                          "valueData": "MONTH"},
			{
                           "parameter": "firstDate", 
                          "valueVariable": "$this"},
			{
                          "parameter": "secondDate", 
                          "valueVariable": "$referenceDate"}]},
               {
               "pathTo": [{"@id": "authorisation"},
               "@id": "course Status",
	       "isConcept": {"@id" : "Active"}}],
	      "orderLimit": {
                 "orderBy": {"@id": "effective Date"}}]}}

Connections between entities

A common use case is to find out whether there is a connection between an entity and another entity.

For example, We wish to support the idea that an enquirer, when building a query,  will be able to enter a term which matches any concept in the information model. Once having selected the concept, they may have one of the following intentions:

a)      The concept is a thing of interest, and they want to know something about it and in what way it may be related to health record entries.

b)     They have an intention to report on some record entries that hold something relating to the concept but have not yet refined their query i.e. want the application to help.

We have decided to support these by helping the user construct a query of health records that are in someway related to the selected concept, whilst at the same time illustrating how the concept they have selected relates to record entries.

Examples of things they may enter  are “Atenolol”, “date of birth”, “post code”, “Angina pectoris”, “beta blockers”.

There are some basic objectives for the application when helping users build a query. These are:

1.       We wish to constrain query to a finite set of entity types, for example a patient or organisation. This is for performance reasons and based on the assumption that most business requirements involve the identification of properties relating to a certain entity type. We refer to this type as the “main entity” and we have decided, based on experience, that there may be around half a dozen of these.

2.       We wish to present to the user a pretty much preformed query definition as quickly as possible, allowing them to tweak the query a bit e..g. by expanding.  This query definition will also illustrate the relationship between the main entity and Atenolol.


The use case has some special features:

a)      We wish to include the data model entities and data model properties in the path

b)     The concept may be a property of a a data model entity, a concept that is a member of a value set against a property in the data model, or may be the value of a property of a concept.

c)      If the selected entity is a concept property concept, We DO NOT wish to include the concepts that have the properties as there may  be too many. For example, many drugs contain Atenolol so we don’t want to list them.

d)     We DO NOT need to include the value set identifiers in the response as the data model properties will suffice. In other words, a property “type of observation” points to the value set for observations, so including the value set would be pointless.

e)      We DO wish to include the properties that link a concept property value to the  concept who are in the value sets. For example we do may wish to look for types of medication that have active ingredients of atenolol.

f)       If the concept we are looking for is a value of a concept property and there is a super property in common only show the super property. For example, active ingredient is a super property of specific active ingredient and precise active ingredient.

Examples may be

Selected concept Data model relationships
Atenolol Patient -> has entry -> Prescription - >type of medication -> has active ingredient- Atenolol

Patient -> has entry -. Allergy -> type of allergy -> causative agent - >atenolol

Date of birth Patient

Practitioner in role

Post code Patient -> address- > Address

Patient - > provider -> Organisation - >address - >Address

Address Patient -> address


The UI can then use this information to offer clarifications to the user and auto construct a query and concept set.

The following examples considers how to get from one of the main entities (e.g. patient) to a constituent of a drug.

Health records record actual or virtual products so the system needs to link from a patient through to the types of record that record products to products that have active ingredients of atenolol.

IMQ example
IMQ
  {
  "query" : {
    "name" : "Path query- Main record types to Atenolol - substance",
    "select" : {
      "entityId" : {
        "@id" : "http://endhealth.info/im#MainRecordType"
      },
      "pathToTarget" : {
        "name" : "atenolol substance",
        "depth" : 3,
        "@id" : "http://snomed.info/sct#387506000"
      }
    }
  }
}

Language grammar

The query language grammar is described in the article Language grammar and syntax set in the context of the full IM grammar
































Grammar

This is the section on grammar