Information model query: Difference between revisions

From Endeavour Knowledge Base
No edit summary
Line 40: Line 40:
A string of text used to filter query entities whose terms  fuzzy match the text search string. Multi words are supported
A string of text used to filter query entities whose terms  fuzzy match the text search string. Multi words are supported


textSearch : "Angina"
div class="toccolours mw-collapsible mw-collapsed">
Example use of text search. Select Expand to show:
<div class="mw-collapsible-content">
<syntaxhighlight lang="json">
{
  "textSearch" : "Angina"
}
{
  "textSearch" : "ches pai"
}
</syntaxhighlight></div></div>


textSearch : "ches pai"


=== Query ===
=== Query ===

Revision as of 16:56, 14 February 2023

Background to IMQ

Health data is generally held in relational, document, or graph databases, each of which come with a high level query language used to extract or create data. The best known of these languages is SQL, with other languages such as CYPHER, SPARQL, ELASTIC DSL, less well known.

Each of these languages are designed "close to the tin" so that the query engines can perform highly optimised query by transforming these high level languages to lower level languages such as C or Java.

Is is very difficult to construct understandable user interfaces directly from SQL or SPARQL,or vice versa, and many search and report applications create some form of intermediate representation.

IM Query is designed to operate as an intermediary between plain language and the underlying run time query languages. IMQ is a constraint of query languages and is designed to enable both the query of the information model itself, and any health records that map to a data model as used by IMQ.

IMQ provides a pragmatic JSON based object model and a more succinct text to represent query definitions, and IM provides implementations in SQL and SPARQL via the use of interpreters.

The logic of an IMQ query definition precisely follows the logic of a plain language description of the output required, and the criteria to be applied.

IMQ overview

The structure of the model closely follows the standard patterns as illustrated in the main stream languages of SQL, SPARQL and CYPHER with object nesting supported to enable hierarchical object retrieval, akin to GRAPHQL.

Implementation syntax includes a JSON object format (IMQ-JSON) and a succinct text language equivalent (IMQ-text)

An IM query definition follows a straightforward set of predicates which can be summarised as:

"From a set of things, where those things have certain properties and values, Select the properties of those things, where the properties have certain values."

i.e. a familiar approach of from/where/select/where clauses, each type of clause being made up of the necessary predicates to enable most rule based health queries to be modelled.

IMQ Structure

An IM query consists of a query request, which includes the necessary components to define a query, as well as a set of arguments that can be passed into the query and used at run time.

Query Request

textSearch argument ?* ( query | update | pathQuery)

IMQ supports conventional query for extract, query based updates (deletion) and a special 'path query' for determining paths between two classes. In addition to rule based query, a free text search using Lucene indexing is supported providing a term filter on the query rules.

Queries and updates are initiated by a Query Request passed as a payload to the API.

A query request can contain a set of arguments or parameter variables passed into the query to be used at tun time.

text Search

A string of text used to filter query entities whose terms fuzzy match the text search string. Multi words are supported

div class="toccolours mw-collapsible mw-collapsed"> Example use of text search. Select Expand to show:

{
  "textSearch" : "Angina"
}
{
  "textSearch" : "ches pai"
}


Query

A simple overall structure with nestable elements providing an object form input and output similar to GRAPHQL . A query may contain many queries, enabling a package of queries such as a column group report or full data set .

The request may fully define the query (dynamic query) or more commonly reference a pre-existing query definition via an IRI (i.e. a preformed query definition with variables resolved to the arguments passed in at run time). The pre-existing query definition is obtained from the "hasDefinition" property of a stored query entity.

High level query structure is as follows. Select Expand to show:
{
"from" : {
          "where" : {
             "with" : {},
             "where" : {}},
         "from" : {}
          },
"select" :[{
           "where" : {},
           "select" :[ {}]
          }],
"subQuery" :[ {}]

}

Simple example

Get me the full name of all patients aged >= 18 years. Select Expand to show:

{
"from" : {"id": "Patient",
          "sourceType" : "type",
           "where" :{"id" : "age",
                      "operator" : ">=",
                      "value" : 18,
                      "unit" : "YEARS"}
         },
"select" : [ {"id" :"fullName"}]
}

Boolean operators

The boolean operators 'and' , 'or' and 'not' can operate in either the from or where clause, using the "bool" predicate. 'and' is the same as an intersection (all must be true), or is a union (at least one true) and 'not' is a union (Any may be true). Further nesting can resolve boolean combinations (e.g. not/and)

Example boolean operators. Select Expand to show:
{"where" : {
       "bool" : "and",
       "where" : [ { "id" : "age",
                     "operator" : ">=",
                     "value" : 18,
                     "unit" : "year"},
                    {"id" : "statedGender",
                     "in" : [ {"@id" : "im:905031000252103",
                               "name" : "Male (stated gender)" } ] } ] }


Subsumption query

A key differentiator of IMQ is the support for a variety of subsumption patterns in both the from and where clause. This makes IMQ compliant with expression constraint language when applied to concepts, but can also be used to incorporate subtypes of data model types.

The indicators are:

  • Include Subtype ? true or false as to whether subtypes (or subclasses) are incorporated at run time. The can apply either in the from clause, the where property, or the value.
  • Exclude self? true or false, rarely used but indicates only subtypes are examined (ECL compliance)
  • Include super types? True or false to enable the parent hierarchy to be transitively examined. Used in assessing allowable ranges and properties of concepts.
Example subsumption on the IM itself Json and ECL equivalent. Select Expand to show:
IMQ ECL
{
  "from": {
    "@id": "sn:763158003","name": "Medicinal product (product)","includeSubtypes" : true,
    "where": {
      "bool": "and",
      "where": [
        {
          "@id": "sn:127489000","name": "Has active ingredient (attribute)", "includeSubtypes": true,
          "in": [
            {
           "@id": "sn:372665008","name": "Non-steroidal anti-inflammatory agent","includeSubtypes" : true },
          "anyRoleGroup": true
        },
        {
          "@id": "sn:411116001","name": "Has manufactured dose form (attribute)","includeSubtypes": true,
          "in": [
            {"@id": "sn:385268001","name": "Oral dose form (dose form)","includeSubtypes": true}
          ],
          "anyRoleGroup": true}]}}}
<<763158003| Medicinal product (product)|
    : <<127489000 | Has active ingredient (attribute)| =  
                   <<372665008| Non-steroidal anti-inflammatory agent|, 
       << 41116001|Has manufactured dose form (attribute)| = 
                <<385268001|Oral dose form (dose form)|

From Clause

The from clause determines a set of objects whose properties can be further filtered by the where clause.

The from clause incorporates the from/join/subquery elements of SQL and the where pattern clause of SPARQL. From provides a number of syntactical short cuts by viewing types as a graph with inheritance.

Structure of a from clause is :

from description? id? @id? variable? sourceType? includeSubtypes? excludeSelf? includeSupertypes? (bool from*)? where?

Examples (using non quoted syntax). Select Expand to show:

{ from : {
          description : "Get concepts that are either Procedures or observations",
          bool : or,
          from : [{@id : "sn:71388002", includeSubtypes: true},

                  {@id : "sn:363787002", includeSubtypes: true} ]}

// Note that the source types are not needed as the default is from instance.
{ from : {
          description : "Get patients that are on the diabetic or hypertension register",
          bool : or,
          from : [{@id : "q:HypertensionRegister", name : "patients on hypertension register",
                    sourceType : "set"},
                  {@id : "q:DiabetesRegister", name : patients on diabetes register", 
                   sourceType : "set"}] }

// The source type is a set which is the set of patients in the query result

description

A text description of the clause, useful for display.

id

A short form identifier (iri local name) for a type or instance identifier, the purpose being to make the IRIs easier to read. Used in a local implementation when the namespace (base prefix) of a data model is known.

@id

Json-LD standard for the IRI of a resource. Should be used wherever possible for transportable queries

variable

A string variable name that is resolved against arguments passed into the query, or variables used elsewhere. Typical variable name "'$this" which is then passed in as an argument

source Type

An enumeration indicating whether the @id in the from clause is against a type , an instance or a set

"type" means from instances of type (equivalent to FROM TABLE in SQL , rdf:type in RDF)

"set" means the members of a set, such as a base population or value set. A run time query would determine whether the set needs to be generated or use one previously generated

"instance" (default) means an actual instance e.g. a concept or single object, used mostly for subsumption or single object select

include Subtypes

Whenever an IRI is used, subsumption of subclasses or subtypes can be entailed if this is set to true. It means "this and all its descendants" or "<<" in ECL

exclude Self

Enables a subsumption query but excluding the parent itself. Means "all descendants but not this" or "<" in ECL

include Supertypes

Enables a query to traverse up the superclasses and super types to find properties that are implicitly inherited but not explicitly modelled. Typical use would be in using the Snomed content model to determine allowable properties

bool

Boolean operators. Enables nesting of 'and' 'or' and 'not' in the from and where clauses. When used in a from clause this means that two or more from clauses are present. when used in a where clause, two or more properties are being filtered.

from

2 or more nested from clauses when used with the boolean operators.

where

The where clauses used to filter the objects identified by the from according to properties and values.

Where clause

The where clause is a filter on the presence or absence of properties and their values, or the presence of absence of relationships with their objects, that may be further filtered.

{ description : "patients with home addresses with post code starting with YO",

from : {
   id :Patient,
   sourceType : "type",
   where : {
         description : "home address with post code starting with YO",
         id :  "homeAddress",
         where : { id : "postCode",
                   operator : "startsWith",
                   value : "YO"
Predicate cardinality range explanation
description 0..1 String description of this clause, used in the user interface or to explain the purpose of a clause
id 0..1 string A short form for a property identifier
@id 0..1 url The iri of the property
alias 0..1 String An alias that can be referred to elsewhere in the query e.g. for testing for relativeTo etc
variable 0..1 String A variable name passed into the query e.g. "$this" or "$referenceDate"
includeSubtypes 0..1 true/false Any sub properties are included
includeSypertypes 0..1 true/false if true then super properties will be included
bool 0..1 enum "and" , "or" , "not" applying to nested where clause
notExist 0..1 true/false whether the property exists or not i.e. a test for not null
with 0..1 With clause A common table expression i.e. a subquery that enables sorting and ordering as a further filter for the remaining where clause to operate from.

Typically used to get the latest X entries.

value 0..1 String or number Tests for the value being X, further modified by the operator
operator 0..1 enum = , >=, <= >, < , startsWith, contains operators acting on the value
relativeTo 0..1 String When testing a value, points to a variable bound to a set of results from a where or with clause somewhere in the query.
unit 0..1 String unit qualifier with values for dates such as YEAR, MONTH etc
in 0..1 From clause One or more values that may be an instance, type, or set. Equivalent to "Where in (..)" or Where in ,,,(Select...)
notIn 0..1 From clause A negation test for a set of value objects
range 0..1 Range clause A from and To clause with values e.g. from X to Y relative to Z
where 0..* Where clause further Filters on properties of the objects pointed to by the property

With clause

Select clause

This instructs the engine on the properties to export from the objects identified by the from/where clause. Select clause is nestable i.e. where objects have properties that are objects, a nested select clause operates on the properties of the contained objects.

As in the from clause, a select clause can further manipulate and filter the properties according to their values and the properties of the values.

description : "From patients select nhs number, select numeric measure ( effective date and value) where concept = systolic blood pressure and value >150",

from: { @id : "im:Patient"},
  select: [
    { id: "nhsNumber"},
    {id: "numericMeasure",
     where: {
        bool: "and",
        where: [
          {id: "concept",
           in: [{@id: "sn:271649006",
                includeSubtypes: true }]
          },
          {id: "value",
            operator: ">",
            value: 150} ],
        select: [
          {id: "effectiveDate"},
          {id: "value"}]}}]}
Predicate cardinality type explanation
description 0..1 String description of this clause, used in the user interface or to explain the purpose of a clause
id 0..1 string A short form for a property identifier
@id 0..1 url The iri of the property (@id is json-ld for an iri)
alias 0..1 String An alias that can be referred to elsewhere in the query e.g. for testing for relativeTo etc
function 0..1 Function clause Function that operates on the value of the property, includes aggregate functions
where 0..1 Where clause Further filters on the results by performing a filter on the property values.
select 0..* Select clause Selects properties from the value object of the parent select property
orderBy 0..* Order Limt clause List of properties to order by and any limit to the count of rows when returned e.g. latest 1
groupBy 0..* Alias List of properties to group by at this select level
having 0..1 Having Where clause operating on results of groups or functions
case 0..* Case If then rules operating on the property values

Order limit clause

Group by clause

Having clause

Case clause

Function clause

Grammar

This is the section on grammar