Information model query

From Endeavour Knowledge Base

Background to IMQ

It's all very well modelling data, value sets, and ontologies. But what about modelling the logical definitions of data sets or patient profiles (usually referred to as queries)?

IMQ is designed to facilitate the exchange of logical definitions of query via APIs.

IMQ is not a new query language or domain-specific language. Instead, it is simply an object model of a subset of the mainstream query language CYPHER, and as such can be easily interpreted into plain CYPHER, SQL or SPARQL. In addition, the classes include a set of simple predicates that map to complex query syntax at run time.

The main purpose of the object representation is to make it easier to build and maintain user interfaces and to interpret queries into system-specific query engine languages. Because IMQ is in object form and transportable as JSON-LD, it is ideal for APIs and interoperability via messages.

It is common practice in health care IT to model query definitions, intended for use in many different systems, in plain text documents, leaving the interpretation of the logic into run time query languages to the vendors' internal informatics teams. This process creates a bottleneck and is prone to human error, partly due to the ambiguity of plain language. An approach which uses machine readable definitions can reduce the time taken and remove much of the human error.

It is possible to model a query definition in SQL, but an SQL query brings with it the specific database schema. SQL as a language is huge, and developing interpreters that translate SQL written against one schema into SQL for another is hard. Also, it is very difficult to construct understandable user interfaces directly from SQL or SPARQL, or vice versa, and thus most search and report applications create some form of intermediate representation.

IM considers health data to be a conceptual graph, with types, properties, and values modelled as nodes and relationships. This means that for query, the more natural languages are CYPHER and SPARQL, the latter being the standard language used for RDF graph query. The information model uses IRIs for its types and properties, so SPARQL is a natural target. However, instance data in health records are best suited to a property graph model, and therefore the 'target' language of IMQ is CYPHER.

IMQ overview

The class structure of an IMQ query definition precisely follows the logic of a plain language description of the criteria to be applied to filter out sets from sets and to define the output required, and thus is ideal for data set definitions. It also uses CYPHER concepts in its construction so as to map precisely to CYPHER or other query languages.

IMQ simplifies certain complex syntactical constructs by providing some grammatical shortcuts covering the following areas:

  1. Subsumption query. Essential for expression constraints; flags identifiers as including descendants or ancestors.
  2. Sets, types and instances. Flagging identifiers as @set, @type, or @id differentiates 'members of a set' from 'instances of a type' and from individual instances.
  3. The latest/earliest problem. This problem is common in health query, as many queries are designed to infer state from events. In mainstream query languages these are variously modelled as subqueries, correlated subqueries, window functions and sub collections. In IMQ these are simplified in line with a plain language question: 'for things that have X within the last 6 months, get the latest X and test whether it is Y'.
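
The plain language question in the third item can be illustrated outside any query engine. The following Python sketch is not part of IMQ; the record layout (`patient`, `concept`, `date`, `value`), the sample data and the function name `latest_within` are all assumptions made purely for illustration.

```python
from datetime import date

# Hypothetical in-memory observations; the field names are illustrative only.
observations = [
    {"patient": "p1", "concept": "X", "date": date(2022, 9, 1), "value": 150},
    {"patient": "p1", "concept": "X", "date": date(2022, 12, 1), "value": 130},
    {"patient": "p2", "concept": "X", "date": date(2022, 3, 1), "value": 120},
    {"patient": "p2", "concept": "Z", "date": date(2022, 12, 15), "value": 1},
]


def latest_within(records, concept, earliest, reference):
    """Return the latest record per patient for `concept` dated in [earliest, reference]."""
    latest = {}
    for rec in records:
        if rec["concept"] != concept or not (earliest <= rec["date"] <= reference):
            continue
        current = latest.get(rec["patient"])
        if current is None or rec["date"] > current["date"]:
            latest[rec["patient"]] = rec
    return latest


# "Have X within the last 6 months, get the latest X and test whether it is Y"
reference = date(2023, 1, 1)
earliest = date(2022, 7, 1)  # six months before the reference date
latest = latest_within(observations, "X", earliest, reference)
matched = sorted(p for p, rec in latest.items() if rec["value"] < 140)
print(matched)
```

Here the test "is Y" is taken to be "value below 140", which is an arbitrary choice for the example.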

Query Structure

Query structure is a class model, normally serialized as JSON-LD. In the following sections, ABNF is used to illustrate the predicates (JSON names), alongside JSON-LD and CYPHER equivalent examples.

Query Request

An IM query consists of a query request, which includes the necessary components to define a query, as well as a set of arguments that can be passed into the query and used at run time.

A query request consists of an optional context (JSON-LD) object, optional arguments, and either a query or a path query.

In this example the query references a stored query via its IRI.

{
  "@context": {
    "query": "http://endhealth.info/query#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#"
  },
  "argument": [
    {
      "parameter": "$referenceDate",
      "value": "2023-01-01"
    }
  ],
  "query": {"@id": "query:GMSRegisteredPatients"}
}
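
As a rough sketch of how a client might assemble such a request before posting it to an API, the following Python builds the same structure from plain dictionaries. The helper name `build_query_request` is hypothetical and not part of any IM library; only the JSON names (`@context`, `argument`, `parameter`, `value`, `query`, `@id`) are taken from the example above.

```python
import json


def build_query_request(query_iri, arguments=None, context=None):
    """Assemble a query request that references a stored query by IRI."""
    request = {}
    if context:
        request["@context"] = context
    if arguments:
        # Each argument is a parameter/value pair.
        request["argument"] = [
            {"parameter": name, "value": value} for name, value in arguments.items()
        ]
    request["query"] = {"@id": query_iri}
    return request


request = build_query_request(
    "query:GMSRegisteredPatients",
    arguments={"$referenceDate": "2023-01-01"},
    context={"query": "http://endhealth.info/query#"},
)
print(json.dumps(request, indent=2))
```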

Context - prefix Map

The format for data exchange is JSON-LD and thus a context object is supported, consisting of a prefix to expansion map. Only simple maps are required.

In this example two prefixes are introduced:

{
"@context" : {
  "im" : "http://endhealth.info/im#",
  "rdfs" : "http://www.w3.org/2000/01/rdf-schema#"
}
}

Arguments

The arguments are a list of parameter/value pairs, i.e. the name of the parameter and its value as either a string, an IRI, or a list of IRIs.

In this example, the query is parameterised by the reference date, by an IMQ data model property of sh:class (i.e. a placeholder for a property in a query), and by a list of IRIs for ranges as placeholders for values of the property.

{
  "argument": [
    {
      "parameter": "$referenceDate",
      "value": "2023-01-01"
    },
    {
      "parameter": "aProperty",
      "valueIri": {
        "@id": "sh:class"
      }
    },
    {
      "parameter": "aRange",
      "valueIriList": [
        {
          "@id": "xsd:integer"
        },
        {
          "@id": "xsd:string"
        }
      ]
    }
  ]
}
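
A receiver has to turn this list into something it can substitute into the query. The following Python is a minimal sketch of that step, assuming the three value forms shown above (`value`, `valueIri`, `valueIriList`); the function name `resolve_arguments` is hypothetical.

```python
def resolve_arguments(request):
    """Map each parameter name to a plain string, IRI string, or list of IRI strings."""
    resolved = {}
    for arg in request.get("argument", []):
        name = arg["parameter"]
        if "value" in arg:
            resolved[name] = arg["value"]
        elif "valueIri" in arg:
            resolved[name] = arg["valueIri"]["@id"]
        elif "valueIriList" in arg:
            resolved[name] = [item["@id"] for item in arg["valueIriList"]]
        else:
            raise ValueError(f"argument {name!r} carries no value")
    return resolved


request = {
    "argument": [
        {"parameter": "$referenceDate", "value": "2023-01-01"},
        {"parameter": "aProperty", "valueIri": {"@id": "sh:class"}},
        {
            "parameter": "aRange",
            "valueIriList": [{"@id": "xsd:integer"}, {"@id": "xsd:string"}],
        },
    ]
}
resolved = resolve_arguments(request)
print(resolved)
```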

Query

A simple overall structure with nestable elements, providing an object form of input and output similar to GraphQL. A query may contain many queries, enabling a package of queries such as a column group report or a full data set.

The request may fully define the query (dynamic query) or more commonly reference a pre-existing query definition via an IRI (i.e. a preformed query definition with variables resolved to the arguments passed in at run time). The definition of a pre-existing query is obtained from the "has Definition" property of a stored query entity.

Predefined Query

A query may simply reference another query, in which case it produces the result object of that other query.

For example, the following query request gets the results of a pre-defined query for GMS registered patients with a reference date of January 2023:

{
  "argument": [
    {
      "parameter": "$referenceDate",
      "value": "2023-01-01"
    }
  ],
  "query": {"@id": "http://endhealth.info/query#GMSRegisteredPatients"}
}

Query Clauses

IMQ considers a query to be a set of steps, each step starting from a graph and resulting in a sub graph which is then the starting point of the next step. Sub queries within the steps are used to supplement the graph with results of other queries. Unions are used to merge sub graphs. Steps can reference results of other steps.

A query definition consists of a list of one or more match clauses and an optional return clause. Optionally, a query may have one or many queries acting as further queries on the instances identified by the first match clause.

In this simple example, a query request contains a query with a single match clause identifying all instances of type patient and returning their age in years. The Neo4j CYPHER equivalent is included, showing that the object model provides context for the grammatical constructs in a more succinct language.

IMQ, followed by the CYPHER equivalent:
{
  "@context" : {
    "im" : "http://endhealth.info/im#"
  },
  "argument": [{"parameter": "referenceDate","value" : "2023-01-01"}],
  "query" : {
    "match" : [ {
      "@type" : "im:Patient"
    } ],
    "return" : [ {
      "property" : [ {
        "@id" : "im:age",
        "as" : "age",
        "unit" : "years"
      } ]
    } ]
  }
}
:params 
{
  "referenceDate": "2023-01-01"
}
MATCH (p:Patient)
RETURN {
  id: p.id,
  age: duration.between(p.dateOfBirth, date($referenceDate)).years
}

The object result is a list of entities, as instances of patients with their age:

{
  "entities": [
    {
      "@id": "urn:uuid:232dfsdserw23",
      "age": 74
    },
    {
      "@id": "urn:uuid:232d34gerw23",
      "age": 76
    }
  ]
}

Match

Takes a graph and identifies a subset of the graph before returning results. The clause consists of nodes and relationships (paths), mapping out a graph traversal of any depth. Property values of nodes can be filtered using a where clause.

A match consists of a node identifier reference, an exclusion operator, a boolean and/or operator (for unions), an optional where clause and order by clause, and the sub matches for any union.

N.B. At run time, Boolean OR match clauses would be considered as subquery UNIONs, and a match clause with an order by would be considered a sub query applying the optimised syntax for the target database language (e.g. correlated subquery, window function, init/compare etc.).

In this example there is a simple match for a medicinal product or any of its descendants, i.e. searching the information model itself.

IMQ, followed by the equivalent in Expression Constraint Language (ECL):
{
  "match": [
    {
      "@id": "sn:763158003",
      "name": "MedicinalProduct",
      "descendantsOrSelfOf": true
    }
  ]
}
<<763158003|Medicinal Product (product)|
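
The correspondence between the IMQ subsumption flags and the ECL operators can be sketched in a few lines of Python. This is an illustration only: the function name `to_ecl` and the way the concept code is extracted from the identifier are assumptions, and the term is taken from the identifier's `name`, so the rendered text differs slightly from the hand-written ECL above.

```python
import re

# IMQ subsumption flags and the ECL operators they correspond to.
ECL_OPERATORS = {
    "descendantsOrSelfOf": "<<",
    "descendantsOf": "<",
    "ancestorsOf": ">>",
}


def to_ecl(identifier):
    """Render an IMQ identifier object as a simple ECL expression."""
    # Keep only the code after the last '#' or ':' (e.g. "sn:763158003").
    code = re.split(r"[#:]", identifier["@id"])[-1]
    operator = next(
        (symbol for flag, symbol in ECL_OPERATORS.items() if identifier.get(flag)),
        "",
    )
    expression = f"{operator} {code}".strip()
    if "name" in identifier:
        expression += f" |{identifier['name']}|"
    return expression


ecl = to_ecl(
    {"@id": "sn:763158003", "name": "MedicinalProduct", "descendantsOrSelfOf": True}
)
print(ecl)
```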

In this example one is looking for things that are either aged between 65 and 70, or in the query result set of Diabetics, or where they have an observation with a concept of pre-diabetes.

N.B. There is no return statement. By default the matched instances would be returned, equivalent to the CYPHER 'return p' command below.

IMQ, followed by the CYPHER equivalent:
 {
    "boolMatch" : "or",
    "match" : [ {
      "description" : "aged between 65 and 70",
      "where" : [ {
        "@id" : "http://endhealth.info/im#age",
        "range" : {
          "from" : {
            "operator" : ">=",
            "value" : "65"
          },
          "to" : {
            "operator" : "<",
            "value" : "70"
          }
        }
      } ]
    }, {
      "description" : "Diabetic",
      "@set" : "http://example/queries#Q_Diabetics"
    }, {
      "description" :" pre diabetes",
      "path" : {
        "@id" : "http://endhealth.info/im#observation",
        "node" : {
          "@type" : "Observation"
        }
      },
      "where" : [ {
        "@id" : "http://endhealth.info/im#concept",
        "in" : [ {
          "@id" : "http://snomed.info/sct#714628002",
          "descendantsOrSelfOf" : true
        } ]
      } ]
    } ]
  }
match (p:Patient) //aged between 65 and 70
      where p.age>=65 and p.age <70 return p 
union 
match(p:Patient)-[:memberOf]->(r:ResultSet) //diabetic 
       where r.id= 'http://example/queries#Q_Diabetics' return p 
union 
match(p:Patient)-[:observation]-> (O:Observation)-[:concept]->(c:Concept) // pre diabetes 
      where c.id='http://snomed.info/sct#714628002' return p
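
How a range test and a boolean OR might be rendered as CYPHER text can be sketched as follows. This Python is a simplified illustration under stated assumptions, not the IM interpreter: the function names `range_predicate` and `union_of` are hypothetical, and values are emitted unquoted because the example range is numeric.

```python
def range_predicate(variable, where):
    """Render an IMQ-style range as a CYPHER predicate on `variable`."""
    prop = where["@id"].rsplit("#", 1)[-1]  # local name of the property IRI
    parts = []
    for bound in ("from", "to"):
        limit = where["range"].get(bound)
        if limit is not None:
            parts.append(f"{variable}.{prop} {limit['operator']} {limit['value']}")
    return " and ".join(parts)


def union_of(branches):
    """Join independently matched branches, as a boolean OR would be at run time."""
    return "\nunion\n".join(branches)


age_where = {
    "@id": "http://endhealth.info/im#age",
    "range": {
        "from": {"operator": ">=", "value": "65"},
        "to": {"operator": "<", "value": "70"},
    },
}
predicate = range_predicate("p", age_where)
cypher = union_of(
    [
        f"match (p:Patient) where {predicate} return p",
        "match (p:Patient)-[:memberOf]->(r:ResultSet) "
        "where r.id = 'http://example/queries#Q_Diabetics' return p",
    ]
)
print(cypher)
```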

Identifiers

Nodes, paths, and where IN clauses all use identifiers, which may be full IRIs, prefixed IRIs, or local names (for local usage). They all share a set of basic predicates, as follows:

@id: The standard JSON-LD approach for an IRI; indicates that the match is on an instance identified by the IRI. Equivalent to SQL where ID=
name: The rdf label or main term representing the name of the IRI. Used for human readability.
variable: Used to declare a node variable to be used in a WHERE clause, RETURN clause, or subsequent MATCH clauses, i.e. it resolves to the set of instances found in the match clause.
parameter: The name of a parameter (conventionally preceded by $) to be resolved from the query arguments, e.g. "parameter" : "$referenceDate".
descendantsOrSelfOf (<<): Subtypes (or subclasses) are incorporated at run time. This can apply either in the from clause, the where property, or the value.
descendantsOf (<): Indicates that only subtypes are examined (ECL compliance).
ancestorsOf (>>): Enables the parent hierarchy to be transitively examined. Used in assessing allowable ranges and properties of concepts.

Node identifiers

Node identifiers are extended identifiers of nodes in a graph. In IMQ they offer a convenient way of differentiating instances from types and from sets.

@type is an IMQ convenience indicating that the match is against instances of a certain type.

Equivalent to CYPHER (:TheType) or SQL FROM TheType

@set is an IMQ convenience indicating that the match is on any member of a set.

Equivalent to SQL join Result.ID on SET where SET.ID= ID and SET.id= X

IRI format

An IRI may be a full IRI string such as "http://example.org/something#anything" or an abbreviated IRI, e.g. "ex:anything"; if being used locally with a default namespace, it can simply be "anything". If abbreviated, a context object must be provided in the query request document.
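
The three forms can be resolved to a full IRI with a simple prefix lookup. The following Python is a minimal sketch assuming a simple prefix to expansion map as the context; the function name `expand_iri` and the `default_namespace` parameter are hypothetical.

```python
def expand_iri(iri, context=None, default_namespace=None):
    """Expand a full, prefixed, or local-name identifier to a full IRI."""
    context = context or {}
    if "://" in iri or iri.startswith("urn:"):
        return iri  # already a full IRI
    prefix, separator, local = iri.partition(":")
    if separator:
        if prefix not in context:
            raise KeyError(f"prefix {prefix!r} is not declared in the context")
        return context[prefix] + local
    if default_namespace is None:
        raise ValueError(f"local name {iri!r} needs a default namespace")
    return default_namespace + iri


context = {"ex": "http://example.org/something#"}
print(expand_iri("ex:anything", context))
print(expand_iri("anything", default_namespace="http://example.org/something#"))
```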

Paths and Nodes- Relationships

A relationship (path) connects one node to another (or in data terms, one object to another) and is equivalent to an SQL foreign key. Match clauses enable navigation and capture of the graph to any level of depth and are particularly useful for querying properties of connected entities.

A path consists of a chain of path/node pairs from the match clause. The end node may be omitted by default if it serves no purpose, but can be useful to clarify the type of end node, or to declare a variable for binding in a where or return clause.

In the following example, a match clause is looking for the address of the GP practice with which the patient is currently registered.

Note that the second node does not have a type and the end node is omitted. As the data model knows that the GP registration property "organisation" points to an organisation, it is not necessary to include the node type.

IMQ, followed by the CYPHER equivalent:
{
  "match": {
    "@type": "im:Patient",
    "path": {
      "@id": "im:currentGPRegistration",
      "node": {
        "@type": "im:GPRegistration",
        "path": "im:organisation",
        "node": {
          "path": {
            "@id": "im:address"
          }
        }
      }
    }
  }
}
match (p:Patient)-[:gpRegistration]->
                      (reg:GPRegistration)-[:organisation]->
                                              ()-[:address]->()
return p
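
The chain of path/node pairs maps naturally onto a CYPHER pattern. The following Python sketch walks a nested structure and emits the pattern text. It assumes a regular shape in which every path is an object with an "@id" and an optional "node", and every node may carry "@type", "variable" and a further "path"; the function names are hypothetical and the sample input is written to that assumed shape.

```python
import re


def local_name(iri):
    """Return the part of an IRI or prefixed name after the last '#', ':' or '/'."""
    return re.split(r"[#:/]", iri)[-1]


def path_pattern(node):
    """Render a node and the chain of paths beneath it as a CYPHER pattern."""
    label = f":{local_name(node['@type'])}" if "@type" in node else ""
    pattern = f"({node.get('variable', '')}{label})"
    path = node.get("path")
    if path:
        # An omitted end node is rendered as an anonymous node "()".
        pattern += f"-[:{local_name(path['@id'])}]->" + path_pattern(path.get("node", {}))
    return pattern


match = {
    "@type": "im:Patient",
    "variable": "p",
    "path": {
        "@id": "im:gpRegistration",
        "node": {
            "@type": "im:GPRegistration",
            "path": {
                "@id": "im:organisation",
                "node": {"path": {"@id": "im:address"}},
            },
        },
    },
}
pattern = path_pattern(match)
print(f"match {pattern} return p")
```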

Path identifiers

Path identifiers consider the relationship as an instance. Sub properties may also be tested for, or indeed a variable may be used instead of the relationship. Thus @type and @set are not supported.

Where

A where clause filters nodes from a match path according to their property values. A where clause can reference nodes from within the match clause or nodes in previous match clauses by the use of node references.

A where clause consists of an optional description, an optional node reference, a property identifier, and a value, a range, or an IN list whose values are identified by IRIs. Ranges and values can carry qualifiers such as operators, along with various arguments for properties that are functions.

A where test of a value can be absolute or relative to another value already captured in the query.

For convenience, the "unit" parameter is included directly, as it is a common argument to time value functions.

Where clauses are also boolean i.e. where and/or where.

In this example a set of observations are filtered on having a systolic blood pressure or home systolic blood pressure within the last 6 months before the reference date. Note that the property IRIs are using local names as the requestor and receiver both know the data model being used and its namespace.

A value label is assigned for display purposes for the human user interface.

IMQ, followed by the CYPHER equivalent:
{
  "bool": "and",
  "where": [
    {
      "description": "Home or office based Systolic",
      "@id": "concept",
      "in": [
        {
          "@id": "http://snomed.info/sct#271649006",
          "name": "Systolic blood pressure",
          "descendantsOrSelfOf" : true
        },
        {
          "@id": "http://endhealth.info/emis#1994021000006104",
          "name": "Home systolic blood pressure",
          "descendantsOrSelfOf" : true
        }
      ],
      "valueLabel": "Office or home systolic blood pressure"
    },
    {
      "description": "Last 6 months",
      "@id": "effectiveDate",
      "operator": ">=",
      "value": "-6",
      "unit": "MONTHS",
      "relativeTo": {
        "parameter" : "$referenceDate"
      },
      "valueLabel": "last 6 months"
    }
  ]
}
match (o:Observation)-[:concept]->(c:Concept)
where c.id in ['sct:271649006', 'em:1994021000006104']
      and o.effectiveDate >= date($referenceDate) - duration({months: 6})
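
The "Last 6 months" test is relative to the reference date argument. As a sketch of how the cut-off date might be resolved from the where clause and the arguments, the following Python computes it using only the standard library. The function names `shift_months` and `cutoff_date` are hypothetical, and only the MONTHS and DAYS units are handled.

```python
import calendar
from datetime import date, timedelta


def shift_months(day, months):
    """Move a date by a whole number of months, clamping to the month's last day."""
    total = day.year * 12 + (day.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def cutoff_date(where, arguments):
    """Resolve a relative where test to the absolute date it compares against."""
    reference = date.fromisoformat(arguments[where["relativeTo"]["parameter"]])
    offset = int(where["value"])
    unit = where["unit"].upper()
    if unit == "MONTHS":
        return shift_months(reference, offset)
    if unit == "DAYS":
        return reference + timedelta(days=offset)
    raise ValueError(f"unsupported unit {unit!r}")


where = {
    "@id": "effectiveDate",
    "operator": ">=",
    "value": "-6",
    "unit": "MONTHS",
    "relativeTo": {"parameter": "$referenceDate"},
}
cutoff = cutoff_date(where, {"$referenceDate": "2023-01-01"})
print(cutoff)  # observations pass the test when effectiveDate >= cutoff
```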

Where property identifiers

A where property extends a property identifier by also referencing a node variable where necessary via a "nodeRef". This can be used when the where clause is operating on nodes at different levels, allowing a full match path to be filtered at different levels.

In this example the where clause is testing whether patients aged >100 have had an observation in the last day

IMQ, followed by the CYPHER equivalent:
{
  "match": {
    "@type": "im:Patient",
    "variable": "pat",
    "path": {
      "@id": "im:observation",
      "node": {
        "variable": "obs"
      }
    },
    "where": [
      {
        "bool": "and",
        "where": [
          {
            "nodeRef": "pat",
            "@id": "age",
            "value": {
              "operator": ">",
              "value": 100
            },
            "unit": "years"
          },
          {
            "nodeRef": "obs",
            "@id": "effectiveDate",
            "unit": "days",
            "value": {
              "operator": ">=",
              "value": -1,
              "relativeTo": {
                "parameter": "$referenceDate"
              }
            }
          }
        ]
      }
    ]
  }
}
match (pat:Patient)-[:observation]->(obs:Observation)
where duration(pat.age).years > 100
and obs.effectiveDate >= date($referenceDate) - duration({days: 1})
return obs

Query model specifications

Specifications of the query clauses are described in a set of pages.

IMQ classes are a subset of the IM meta model classes, i.e. a set of plain data classes.

Grammar

This is the section on grammar