Information model query: Difference between revisions

From Endeavour Knowledge Base
Line 374: Line 374:
|An alias that can be referred to elsewhere in the query e.g. for testing for relativeTo etc
|An alias that can be referred to elsewhere in the query e.g. for testing for relativeTo etc
|-
|-
|variable
|where
|0..1
|0..1
|String
|Where clause
|A variable name passed into the query e.g. "$this" or "$referenceDate"
|Further filters on the results by performing a filter on the property values.
|-
|-
|includeSubtypes
|select
|0..1
|0..*
|true/false
|Select clause
|Any sub properties are included
|Selects properties from the value object of the parent select property
|-
|includeSypertypes
|0..1
|true/false
|if true then super properties will be included
|-
|bool
|0..1
|enum
|"and" , "or" , "not" applying to nested where  clause
|-
|notExist
|0..1
|true/false
|whether the property exists or not i.e. a test for not null
|-
|with
|0..1
|With clause
|A common table expression i.e. a subquery that enables sorting and ordering as a further filter for the remaining where clause to operate from.
Typically used to get the latest X entries.
|-
|value
|0..1
|String or number
|Tests for the value being X, further modified by the operator
|-
|-
|operator
|orderBy
|0..1
|0..*
|enum
|Order Limt clause
|= , >=, <= >, <  , startsWith, contains operators acting on the value
|List of properties to order by and any limit to the count of rows when returned e.g. latest 1
|-
|-
|relativeTo
|relativeTo

Revision as of 14:59, 6 February 2023

Background to IMQ

Health data is generally held in relational, document, or graph databases, each of which come with a high level query language used to extract or create data. The best known of these languages is SQL, with other languages such as CYPHER, SPARQL, ELASTIC DSL, less well known.

Each of these languages are designed "close to the tin" so that the query engines can perform highly optimised query by transforming these high level languages to lower level languages such as C or Java.

Is is very difficult to construct understandable user interfaces directly from SQL or SPARQL,or vice versa, and nearly all search and report systems create some form of intermediate representation such as MS Power BI DAX . In many cases, logical definitions of queries remain in free text, supported by spreadsheets.

IM Query is designed to operate as an intermediary between plain language and the underlying run time query languages. The model supports query of both the information model and any health records that map to a data model as defined within the IM.

As an RDF Graph knowledge base, the information model could (and can) be directly queried using SPARQL. The IM also holds text data which can be queried directly using open Search or elastic.

However, as Health records are likely to be stored as relational, or at least SQL compatible data bases, querying health records that are aligned with the model will require SQL to query them.

Consider the following issues with SQL/SPARQL:

  • Directly authoring SQL and SPARQL languages require a high degree of skill and health query in particular needs heavily nested subqueries , including some of the more advanced techniques such as correlated query or window functions.
  • Translating a user oriented intuitive query builder into SQL or SPARQL directly and in reverse is very difficult. Most query applications use an intermediate language from which the queries are then generated. Examples include GraphQL or Power BI DAX and M.
  • Enabling direct query via SPARQL end points or SQL APIs can result in crippling performance problems and are hard to police.

Consequently the IM provides a pragmatic JSON based domain specific grammar to represent query definitions, which help bridge the gap between a plain language user interface representations and the run time query. This DSL, being in plain JSON, can be used to exchange query definitions across multiple instances via standard REST APIs.

The IM, as open source, also provides reference software showing how SQL or SPARQL or OpenSearch Query can be generated from the DSL and how a plain language or diagrammatic interpretation can be produced from the DSL. The reference software also shows the converse i.e. the generation of the DSL from plain language.

It is not expected that IMQ should be understood by clinicians, or even those familiar with SQL. Instead , as an object model of query constructs, it enables interoperability of query definitions across standard REST APIs. Together with reference transforms to SQL or SPARQL these can then be used against any target schema that holds data that is modelled in the IM.

The logic of an IMQ query definition precisely follows the logic of a plain language description of the output required, and the criteria to be applied. Thus IM query is one way of fully documenting the logic of health query, in a human and machine readable format.

IMQ overview

The structure of the model closely follows the standard patterns as illustrated in the main stream languages of SQL, SPARQL and CYPHER.

The schema specific elements of these main stream languages are replaced with graph pattern based constructs, and some of the more complex constructs commonly used in SQL (such as correlated subqueries and window functions) are simplified into short cut labels that can better illustrate what the query is trying to do and enabling query interpreters to adopt the most performant approaches at run time.

An IM query definition follows a straightforward set of predicates which can be summarised as:

From a set of things, where those things have certain properties and values, Select the properties of those things, where the property has certain values.

i.e. a familiar approach of from/where/select/where clauses, each type of clause being made up of the necessary predicates to enable most rule based health queries to be modelled.

IMQ Grammar

The approach to illustrating the grammar here is to omit the json quote overhead for clarity, but because the grammar is an object model, it retains the json meta data so that object, name and value json structures can carry the logic.

Overall structure

A simple overall structure with nestable elements providing an object form input and output similar to GRAPHQL . A query may contain many queries, enabling a package of queries such as a column group report or full data set .

A high level query structure is as follows:

{
"from" : {
          "where" : {
             "with" : {},
             "where" : {}},
         "from" : {}
          },
"select" :[{
           "where" : {},
           "select" :[ {}]
          }],
"subQuery" :[ {}]

}

Simple example

Get me the full name of all patients aged >= 18 years

{
"from" : {"id": "Patient",
          "sourceType" : "type",
           "where" :{"id" : "age",
                      "operator" : ">=",
                      "value" : 18,
                      "unit" : "YEARS"}
         },
"select" : [ {"id" :"fullName"}]
}

Boolean operators

The boolean operators 'and' , 'or' and 'not' can operate in either the from or where clause, using the "bool" predicate. 'and' is the same as an intersection (all must be true), or is a union (at least one true) and 'not' is a union (Any may be true). Further nesting can resolve boolean combinations (e.g. not/and)

{"where" : {
       "bool" : "and",
       "where" : [ { "id" : "age",
                     "operator" : ">=",
                     "value" : 18,
                     "unit" : "year"},
                    {"id" : "statedGender",
                     "in" : [ {"@id" : "im:905031000252103",
                               "name" : "Male (stated gender)" } ] } ] }

Subsumption query

A key differentiator of IMQ is the support for a variety of subsumption patterns in both the from and where clause. This makes IMQ compliant with expression constraint language when applied to concepts, but can also be used to incorporate subtypes of data model types.

The indicators are:

  • Include Subtype ? true or false as to whether subtypes (or subclasses) are incorporated at run time. The can apply either in the from clause, the where property, or the value.
  • Exclude self? true or false, rarely used but indicates only subtypes are examined (ECL compliance)
  • Include super types? True or false to enable the parent hierarchy to be transitively examined. Used in assessing allowable ranges and properties of concepts.
IMQ ECL
{
  "from": {
    "@id": "sn:763158003","name": "Medicinal product (product)","includeSubtypes" : true,
    "where": {
      "bool": "and",
      "where": [
        {
          "@id": "sn:127489000","name": "Has active ingredient (attribute)", "includeSubtypes": true,
          "in": [
            {
           "@id": "sn:372665008","name": "Non-steroidal anti-inflammatory agent","includeSubtypes" : true },
          "anyRoleGroup": true
        },
        {
          "@id": "sn:411116001","name": "Has manufactured dose form (attribute)","includeSubtypes": true,
          "in": [
            {"@id": "sn:385268001","name": "Oral dose form (dose form)","includeSubtypes": true}
          ],
          "anyRoleGroup": true}]}}}
<<763158003| Medicinal product (product)|
    : <<127489000 | Has active ingredient (attribute)| =  
                   <<372665008| Non-steroidal anti-inflammatory agent|, 
       << 41116001|Has manufactured dose form (attribute)| = 
                <<385268001|Oral dose form (dose form)|

From Clause

The from clause determines a set of objects whose properties can be further filtered by the where clause.

The from clause incorporates the from/join/subquery elements of SQL and the where pattern clause of SPARQL. From provides a number of syntactical short cuts by viewing types as a graph with inheritance.

Examples (using non quoted syntax)

{ from : {
          description : "Get concepts that are either Procedures or observations",
          bool : or,
          from : [{@id : "sn:71388002", includeSubtypes: true},

                  {@id : "sn:363787002", includeSubtypes: true} ]}

// Note that the source types are not needed as the default is from instance.
{ from : {
          description : "Get patients that are on the diabetic or hypertension register",
          bool : or,
          from : [{@id : "q:HypertensionRegister", name : "patients on hypertension register",
                    sourceType : "set"},
                  {@id : "q:DiabetesRegister", name : patients on diabetes register", 
                   sourceType : "set"}] }

// The source type is a set which is the set of patients in the query result
Predicate cardinality range explanation
description 0..1 String description of this clause, used in the user interface or to explain the purpose of a clause
id 0..1 string A short form for a type or instance identifier
iri 0..1 url The IRI of the type or instance
variable 0..1 String variable is resolved from an argument parameter passed into the query, enabling queries to operate as template e.g. from.. "$this"
sourceType 0..1 enum "type" means from instances of type (equivalent to FROM TABLE in SQL , rdf:type in RDF)

"set" means the members of a set, such as a base population or value set. A run time query would determine whether the set needs to be generated or use one previously generated

"instance" (default) means an actual instance e.g. a concept or single object, used mostly for subsumption or single object select

includeSubtypes 0..1 true/false Any subtypes or subclasses are included
excludeSelf 0..1 true/false If subtypes true then the actual instance is not included
includeSypertypes 0..1 true/false if true then all ancestors will be included
bool 0..1 enum "and" , "or" , "not" applying to nested from clause
where 0..* Where clause Filters on properties of the objects found by the from clause
from 0..* From clause Enables boolean from clauses

Where clause

The where clause is a filter on the presence or absence of properties and their values, or the presence of absence of relationships with their objects, that may be further filtered.

{ description : "patients with home addresses with post code starting with YO",

from : {
   id :Patient,
   sourceType : "type",
   where : {
         description : "home address with post code starting with YO",
         id :  "homeAddress",
         where : { id : "postCode",
                   operator : "startsWith",
                   value : "YO"
Predicate cardinality range explanation
description 0..1 String description of this clause, used in the user interface or to explain the purpose of a clause
id 0..1 string A short form for a property identifier
iri 0..1 url The iri of the property
alias 0..1 String An alias that can be referred to elsewhere in the query e.g. for testing for relativeTo etc
variable 0..1 String A variable name passed into the query e.g. "$this" or "$referenceDate"
includeSubtypes 0..1 true/false Any sub properties are included
includeSypertypes 0..1 true/false if true then super properties will be included
bool 0..1 enum "and" , "or" , "not" applying to nested where clause
notExist 0..1 true/false whether the property exists or not i.e. a test for not null
with 0..1 With clause A common table expression i.e. a subquery that enables sorting and ordering as a further filter for the remaining where clause to operate from.

Typically used to get the latest X entries.

value 0..1 String or number Tests for the value being X, further modified by the operator
operator 0..1 enum = , >=, <= >, < , startsWith, contains operators acting on the value
relativeTo 0..1 String When testing a value, points to a variable bound to a set of results from a where or with clause somewhere in the query.
unit 0..1 String unit qualifier with values for dates such as YEAR, MONTH etc
in 0..1 From clause One or more values that may be an instance, type, or set. Equivalent to "Where in (..)" or Where in ,,,(Select...)
notIn 0..1 From clause A negation test for a set of value objects
range 0..1 Range clause A from and To clause with values e.g. from X to Y relative to Z
where 0..* Where clause further Filters on properties of the objects pointed to by the property

Select clause

This instructs the engine on the properties to export from the objects identified by the from/where clause. Select clause is nestable i.e. where objects have properties that are objects, a nested select clause operates on the properties of the contained objects.

As in the from clause, a select clause can further manipulate and filter the properties according to their values.

Predicate cardinality range explanation
description 0..1 String description of this clause, used in the user interface or to explain the purpose of a clause
id 0..1 string A short form for a property identifier
iri 0..1 url The iri of the property
alias 0..1 String An alias that can be referred to elsewhere in the query e.g. for testing for relativeTo etc
where 0..1 Where clause Further filters on the results by performing a filter on the property values.
select 0..* Select clause Selects properties from the value object of the parent select property
orderBy 0..* Order Limt clause List of properties to order by and any limit to the count of rows when returned e.g. latest 1
relativeTo 0..1 String When testing a value, points to a variable bound to a set of results from a where or with clause somewhere in the query.
unit 0..1 String unit qualifier with values for dates such as YEAR, MONTH etc
in 0..1 From clause One or more values that may be an instance, type, or set. Equivalent to "Where in (..)" or Where in ,,,(Select...)
notIn 0..1 From clause A negation test for a set of value objects
range 0..1 Range clause A from and To clause with values e.g. from X to Y relative to Z
where 0..* Where clause further Filters on properties of the objects pointed to by the property























Grammar

This is the section on grammar