Information model query

From Endeavour Knowledge Base
 
 
{{TOC left}}
== Background to IMQ ==
Information model query language is designed to operate as an intermediary between plain language and the underlying run time query languages. The language supports query of both the information model and any health records that map to a data model as defined within the IM.
It is all very well modelling data, value sets, and ontologies. What about modelling the logical definitions of data sets or patient profiles (these usually being referred to as queries)?


As an RDF graph knowledge base, the information model can be queried directly using SPARQL. The IM also holds text data, which can be queried directly using OpenSearch or Elasticsearch.
IMQ is designed to facilitate the exchange of logical definitions of query via APIs.


However, as health records are likely to be stored in relational, or at least SQL-compatible, databases, querying health records that are aligned with the model will require SQL.
IMQ is not a new query language or domain specific language. Instead, it is simply an object model of a subset of the mainstream query language CYPHER and as such can be easily interpreted into plain CYPHER, SQL or SPARQL. In addition, the classes include a set of simple predicates that map to complex query syntax at run time.


Consider the following issues with SQL/SPARQL:
The main purpose of the object representation is to enable easier building and maintenance of user interfaces, and interpretation into system-specific query engine languages. Because IMQ is in object form, and transportable as JSON-LD, it is ideal for APIs and interoperability via messages.


* Directly authoring SQL and SPARQL requires a high degree of skill, and health query in particular needs heavily nested subqueries, including some of the more advanced techniques such as correlated queries or window functions.
It is common practice in health care IT to model query definitions, intended for use in many different systems, in plain text documents, leaving the interpretation of the logic into run time query languages to the vendors' internal informatics teams. This process creates a bottleneck and is prone to human error, partly due to the ambiguity of plain language. An approach which uses machine readable definitions can reduce the time and remove much of the human error.
* Translating a user oriented intuitive query builder into SQL or SPARQL directly and in reverse is very difficult. Most query applications use an intermediate language from which the queries are then generated. Examples include GraphQL or Power BI DAX and M.
* Enabling direct query via SPARQL endpoints or SQL APIs can result in crippling performance problems and is hard to police.


Consequently the IM provides a pragmatic Query domain specific language (DSL)  to help bridge the gap between a plain language representation and the run time query. This DSL, being in plain JSON, can be used to exchange query definitions across multiple instances via standard REST APIs.  
It is possible to model a query definition in SQL, but an SQL query brings with it the specific database schema. SQL as a language is huge, and developing interpreters that translate SQL written for one schema into SQL for another is hard. Also, it is very difficult to construct understandable user interfaces directly from SQL or SPARQL, or vice versa, and thus most search and report applications create some form of intermediate representation.


The IM, as open source, also provides reference software showing how SQL or SPARQL or OpenSearch Query can be generated from the DSL and how a plain language or diagrammatic interpretation can be produced from the DSL. The reference software also shows the converse i.e. the generation of the DSL from plain language.
The IM considers health data to be a conceptual graph, with the modelling of types, properties, and values as nodes and relationships. This means that in query, the more natural languages are CYPHER and SPARQL, the latter being the standard language used for RDF graph query. The information model uses IRIs for its types and properties so SPARQL is a natural target. However, instance data in health records are best suited to a property graph model and therefore the 'target' language of IMQ is CYPHER.


'''It is not expected that the language must be understood by clinicians, or even those familiar with SQL.''' Instead the purpose of the language is to enable interoperability of query definition and use across standard REST APIs. However, the logic of a query definition precisely follows the logic of a plain language description of the output required and the criteria to be applied. Thus IM query is one way of fully documenting the logic of health query, in a human and machine readable format.
== IMQ overview ==
The class structure of an IMQ query definition precisely follows the logic of a plain language description of the criteria to be applied to filter out sets from sets, and of the output required, and thus is ideal for data set definitions. It also uses the CYPHER concepts in its construction so as to map precisely to CYPHER or other query languages.


The language is designed to meet the following requirements:
IMQ simplifies certain complex syntactical constructs by providing some grammatical shortcuts covering the following areas:
== Query language requirements ==
'''Requirement 1 -''' Should support the vast majority of query patterns for defining and producing data sets or patient profiles that are needed in the real world.


'''Requirement 2 -''' Should enable mapping to SQL via simple type-table and property-field maps, for health data held in relational forms, as long as the health data content conforms to an IM data model.
# Subsumption query. Essential for expression constraints, flags identifiers as including descendants or ancestors.
# Sets, types and instances. By flagging identifiers as @set, @type, @id differentiates 'members of a set' from 'instances of a type', from instances.
# The latest/earliest problem. This problem is common in health query as many queries are designed to infer state from events. In mainstream queries these are variously modelled as subqueries, correlated subqueries, window functions and sub collections. In IMQ these are simplified in line with a plain language question: 'for things that have X within the last 6 months, get the latest X and test whether it is Y'.
Examples in JSON format can be seen at https://github.com/endeavourhealth-discovery/IMAPI/tree/develop/TestQueries/Definitions


'''Requirement 3 -''' Should enable mapping to SPARQL, Elasticsearch and OpenSearch directly, for querying the IM itself.
With example results at https://github.com/endeavourhealth-discovery/IMAPI/tree/develop/TestQueries/Results


'''Requirement 4 -''' Should enable mapping directly from and to SNOMED Expression Constraint Language (ECL) for searching and set definitions.
and SPARQL equivalents at https://github.com/endeavourhealth-discovery/IMAPI/tree/develop/TestQueries/Sparql


'''Requirement 5 -''' Should enable a technical query to be built as JavaScript objects or POJOs, avoiding the need for a language-specific parser.
== General Structures ==


'''Requirement 6 -''' Should embed inference statements such as subtype, super type, or set inclusion as part of the query definition, thus avoiding the need for explicit modelling of the complex logic in the query itself.
=== IRI format ===
Within IMQ, as per RDF, an IRI may be a full IRI string such as "http://example.org/something#anything" or an abbreviated IRI, e.g. "ex:anything". If abbreviated, a context object must be provided in the query request document.
As a pragmatic approach for readability, and if the client and server know the default namespace, plain local names can be used without prefix.
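
The three IRI forms described above (full, prefixed, and plain local name) can be illustrated with a minimal Python sketch. It is not part of the IM reference software: the function name <code>expand_iri</code> and its parameters are assumptions made purely for illustration, and the default namespace is passed in explicitly because the text only says that client and server "know" it.

```python
def expand_iri(iri, context=None, default_namespace=None):
    """Expand a full, prefixed, or local-name IRI to its full form.

    Hypothetical helper: 'context' is a simple prefix-to-namespace map as in
    a JSON-LD @context; 'default_namespace' is agreed out of band.
    """
    context = context or {}
    # Full IRIs (and URNs) are returned unchanged.
    if "://" in iri or iri.startswith("urn:"):
        return iri
    # Prefixed IRIs such as "ex:anything" need the prefix in the context.
    if ":" in iri:
        prefix, local = iri.split(":", 1)
        if prefix not in context:
            raise KeyError(f"prefix '{prefix}' is not declared in @context")
        return context[prefix] + local
    # Plain local names rely on a default namespace known to both sides.
    if default_namespace is None:
        raise ValueError(f"local name '{iri}' needs a default namespace")
    return default_namespace + iri


context = {"ex": "http://example.org/something#"}
print(expand_iri("ex:anything", context))
print(expand_iri("age", default_namespace="http://endhealth.info/im#"))
```

Raising an error for an undeclared prefix, rather than guessing, mirrors the rule that a context object must be provided whenever abbreviated IRIs are used.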


'''Requirement 7 -''' Should support object result format as well as relational format, i.e. nested JSON object results akin to GraphQL as well as flat table results.
=== Identifiers ===
Nodes, paths, and where and IN clauses all use identifiers, which include IRIs, prefixed IRIs or local names (for local usage). They all share a set of basic predicates as follows:
{| class="wikitable"
|+
!@id
!Is the standard JSON-LD approach for an iri and indicates the match is on an instance identified by the IRI.
Equivalent to SQL where ID=
|-
|name
|The rdf label or main term representing the name of the IRI. Used for human readability
|-
|variable
|Used to declare a node variable to be used in a WHERE clause, RETURN clause or subsequent MATCH clauses, i.e. it resolves to the set of instances found in the match clause
|-
|parameter
|The name of a parameter (conventionally preceded by $) to be resolved from the query arguments, e.g. "parameter" : "$referenceDate"
|-
|descendantsOrSelfOf
|(<<) subtypes (or subclasses) are incorporated at run time. This can apply either in the from clause, the where property, or the value.
|-
|descendantsOf
|(<) indicates only subtypes are examined (ECL compliance)
|-
|ancestorsOf
|(>>) to enable the parent hierarchy to be transitively examined. Used in assessing allowable ranges and properties of concepts.
|}
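
As a rough illustration of how the subsumption flags in the table above line up with the ECL operators shown in brackets, the following Python sketch renders an IMQ identifier as an ECL focus concept. The function name <code>to_ecl</code> is an assumption for illustration only, and the operator map simply copies the symbols given in the table.

```python
# Flag-to-operator map, copied from the symbols listed in the table above.
ECL_OPERATORS = {
    "descendantsOrSelfOf": "<<",
    "descendantsOf": "<",
    "ancestorsOf": ">>",
}


def to_ecl(identifier):
    """Render an IMQ identifier object as an ECL focus concept (hypothetical helper)."""
    # Keep only the code: drop any namespace ("...#code") or prefix ("sn:code").
    code = identifier["@id"].split("#")[-1].split(":")[-1]
    # Use the operator of the first subsumption flag that is set, if any.
    operator = next(
        (symbol for flag, symbol in ECL_OPERATORS.items() if identifier.get(flag)),
        "",
    )
    term = f"|{identifier['name']}|" if "name" in identifier else ""
    return f"{operator}{code}{term}"


print(to_ecl({"@id": "sn:763158003", "name": "MedicinalProduct", "descendantsOrSelfOf": True}))
# prints <<763158003|MedicinalProduct|
```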


'''Requirement 8 -''' Should follow a plain but logical language definition of the output and criteria to be applied, but there is no requirement for the language itself to be understandable by non-technical users. Plain logical language is distinct from natural language, the latter being prone to ambiguity.
==== Node identifiers ====
Node identifiers are extended identifiers of nodes in a graph. Node identifiers in IMQ offer a convenient way of differentiating instances from types and from sets.
{| class="wikitable"
|+
|-
|@type
|Is an IMQ convenience indicating that the match is against instances of a certain type.
Equivalent to CYPHER (:TheType) or SQL FROM TheType
|-
|@set
|Is an IMQ convenience indicating the match is on any member of a set.
Equivalent to SQL join Result.ID on SET where
SET.ID=  ID and SET.id= X
|}


'''Requirement 9 -''' Should support at least one plain language and at least one diagrammatic representation of the query that can be understood by a lay user.
==== Path and property identifiers ====
Path identifiers consider the relationship as an instance. Sub properties may also be tested for, or indeed a variable may be used instead of the relationship. Thus @type and @set are not supported.


In other words the requirement is not only to create an intermediate query model but to include a translator to and from an understandable language or diagrammatic user interface.<br />
== Query Structure ==
== Language overview ==
Query structure is a class model, normally serialized as JSON-LD. In the following sections, ABNF is used to illustrate the predicates (json names) as well as JSON-LD and CYPHER equivalent examples.
The language follows the familiar pattern of most query languages with constraints. As well as incorporating the core concepts of SQL and SPARQL, it includes the nested structure approach used by GraphQL.


The syntax uses JSON and is therefore somewhat verbose compared with a succinct language, but in return there is less ambiguity and it conforms to a standard object representation.
== Query Request ==
An IM query consists of a query request, which includes the necessary components to define a query, as well as a set of arguments that can be passed into the query and used at run time.


A query consists of the following main clauses:
A query request consists of an optional context (JSON-LD) object, optional arguments and either a query or a path query.
<div class="toccolours mw-collapsible mw-collapsed">


'''QueryRequest :''' The query run time wrapper holding the run time parameters such as reference date, paging, and contains the query either as a reference IRI to a previously authored query or the inline query itself. A Query Request contains one query
In this example the query references a stored query via its IRI. Select expand/collapse to show/hide
<div class="mw-collapsible-content"><span style="color:#FF0000">
<syntaxhighlight lang="json-ld">
{
"@context" : {
  "query" : "http://endhealth.info/query#",
  "rdfs" : "http://www.w3.org/2000/01/rdf-schema#"
},
"argument": [
    {
      "parameter": "$referenceDate",
      "value": "2023-01-01"
    }],
"query" : {"@id" :"query:GMSRegisteredPatients"}
}
</syntaxhighlight></div></div>


'''Query :''' Includes the IRI, name, description, result format, use of prefixes, the main entity, select clause, and sub-select clause (where a query produces many column groups). A query contains one select, but as selects contain sub-selects, complex multiple queries against a single entity type can be constructed as a single select.
=== Context - prefix Map ===
The format for data exchange is JSON-LD and thus a context object is supported, consisting of a prefix to expansion map. Only simple maps are required.
<div class="toccolours mw-collapsible mw-collapsed">


'''Select''' : Equivalent to SQL and SPARQL SELECT with GraphQL nesting. Includes a list of properties and aliases as well as any nested properties (i.e. a select within a property), a match clause, and any ordering or limit. A select contains properties to select, and one or more match clauses (implied and) or one or more subset selects.
In this example two prefixes are introduced
<div class="mw-collapsible-content"><span style="color:#FF0000">
<syntaxhighlight lang="json-ld">
{
"@context" : {
  "im" : "http://endhealth.info/im#",
  "rdfs" : "http://www.w3.org/2000/01/rdf-schema#"
}
}
</syntaxhighlight></div></div>


'''Match''' : Equivalent to SQL FROM/WHERE and SPARQL WHERE/FILTER. Defines the graph patterns, filters or functions, including ordering and limit followed by a test, i.e. a subquery. A match clause contains match Boolean operators (and/or/not) and subqueries, with any number of levels.
=== Arguments ===
An argument consists of a list of parameter value pairs, i.e. the name of the parameter and its value as either a string, an IRI or a list of IRIs.


'''Order/Limit/Offset :''' Used in both select and match. Orders by a field and optionally limits the return to a number of entities. Often used for paging, but the commonest use is within a match clause to match the most recent entry of something that matches, in order to support another match on the result.
In this example, this query is parameterised by the reference date and an IMQ data model property of shacl class (i.e. a placeholder for a property in a query) and a list of IRIs for ranges as place holders for values of the property.
<div class="toccolours mw-collapsible mw-collapsed">
<div class="mw-collapsible-content"><span style="color:#FF0000">
<syntaxhighlight lang="json-ld">
{
  "argument": [
    {
      "parameter": "$referenceDate",
      "value": "2023-01-01"
    },
    {
      "parameter": "aProperty",
      "valueIri": {
        "@id": "sh:class"
      }
    },
    {
      "parameter": "aRange",
      "valueIriList": [
        {
          "@id": "xsd:integer"
        },
        {
          "@id": "xsd:string"
        }
      ]
    }
  ]
}
</syntaxhighlight></div></div>
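
The argument list above could be resolved into a simple name-to-value map before the query is run. The sketch below is one possible reading, not the reference implementation: the function name <code>resolve_arguments</code> is hypothetical, and stripping the conventional leading <code>$</code> from parameter names is an assumption.

```python
def resolve_arguments(request):
    """Turn the 'argument' list of a query request into a {name: value} map."""
    resolved = {}
    for argument in request.get("argument", []):
        # Assumption: "$referenceDate" and "referenceDate" name the same parameter.
        name = argument["parameter"].lstrip("$")
        if "value" in argument:
            resolved[name] = argument["value"]
        elif "valueIri" in argument:
            resolved[name] = argument["valueIri"]["@id"]
        elif "valueIriList" in argument:
            resolved[name] = [iri["@id"] for iri in argument["valueIriList"]]
        else:
            raise ValueError(f"argument '{name}' has no value")
    return resolved


request = {
    "argument": [
        {"parameter": "$referenceDate", "value": "2023-01-01"},
        {"parameter": "aProperty", "valueIri": {"@id": "sh:class"}},
        {"parameter": "aRange", "valueIriList": [{"@id": "xsd:integer"}, {"@id": "xsd:string"}]},
    ]
}
print(resolve_arguments(request))
```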


The language grammar and syntax can be represented as a formal grammar (ABNF) or JSON Schema. This article presents the grammar by plain language example and goes on to the more formal specification of the language.
== Query ==
A simple overall structure with nestable elements providing an object form input and output similar to GraphQL. A query may contain many queries, enabling a package of queries such as a column group report or full data set.


== IM Query API ==
The request may fully define the query (dynamic query) or more commonly reference a pre-existing query definition via an IRI (i.e. a preformed query definition with variables resolved to the arguments passed in at run time). The definition of a pre-existing query is obtained from the "has Definition" property of a stored query entity.
The information model manager provides a REST API for querying the IM or validating health queries. As the preferred approach to APIs is to exchange JSON, IMQ uses the JSON syntax, and because entities are represented as RDF IRIs, JSON-LD is the preferred format.


N.B. The difference between JSON-LD and plain JSON is the representation of ontology IRIs. The JSON-LD context object enables prefixed IRIs for ease of reading, and IRI references are presented as objects with the "@id" : "http://.." format.
== Predefined Query ==
A query request may simply reference another query, which produces the result object from the other query
<div class="toccolours mw-collapsible mw-collapsed">
For example, the following query request gets the results of a pre-defined query for GMS registered patients with a reference date of January 2023.
<div class="mw-collapsible-content"><span style="color:#FF0000"><syntaxhighlight lang="json-ld">
{
"argument" : [
{"parameter" : "$referenceDate",
  "value" : "2023-01-01"
} ],
"query" : {"@id" :"http://endhealth.info/query#GMSRegisteredPatients"}
}
</syntaxhighlight></div></div>


Within the body of a query the query predicates or "key words" are represented by their local names, making it easier to map to JS or Java objects.
== Query Clauses ==
IMQ considers a query to be a set of steps, each step starting from a graph and  resulting in a sub graph which is then the starting point of the next step. Sub queries within the steps are used to supplement the graph with results of other queries. Unions are used to merge sub graphs. Steps can reference results of other steps.  


== IMQ by example ==
A query definition consists of a list of 1 or more match clauses and an optional return clause. Optionally a query may have one or many  queries acting as further queries on the instances identified by the first match clause.  
This section takes a set of common snippets of plain language query and shows the DSL equivalent.  


Examples start with simple patterns and progress to complex temporal patterns of the kind used in real world query.
All match clauses must operate on the same node types as defined in the first match clause, thus avoiding cartesian explosions. This "main entity" forms the basis of the result objects being generated. If the first match clause is matching set members or instances, then subsequent match clauses must also match nodes of the same type.  


For convenience data IRIs are represented by their labels.
In this simple example, a query request contains a query with a single match clause identifying all instances of type patient and returning their age in years.


<div class="toccolours mw-collapsible mw-collapsed">
The Neo4j Cypher equivalent is included, showing that the object model provides context for the grammatical constructs of a more succinct language.
<div class="mw-collapsible-content"><span style="color:#FF0000">
{| class="wikitable"
|+
!IMQ
!CYPHER
|-
|<syntaxhighlight lang="json-ld">
{
  "@context" : {
    "im" : "http://endhealth.info/im#"
  },
  "argument": [{"parameter": "referenceDate","value" : "2023-01-01"}],
  "query" : {
    "match" : [ {
      "@type" : "im:Patient"
    } ],
    "return" : [ {
      "property" : [ {
        "@id" : "im:age",
        "as" : "age",
        "unit" : "years"
      } ]
    } ]
  }
}
</syntaxhighlight>
|<syntaxhighlight lang="cypher">
:params
{
  "referenceDate": "2023-01-01"
}

MATCH (p:Patient)
RETURN {
  id: p.id,
  age: duration.between(p.dateOfBirth, date($referenceDate)).years
}
</syntaxhighlight>
|}
</div></div>

<div class="toccolours mw-collapsible mw-collapsed">
With the object result being a list of entities as instances of patients with their age.
<div class="mw-collapsible-content"><span style="color:#FF0000">
<syntaxhighlight lang="json-ld">
{
  "entities": [
    {
      "@id": "urn:uuid:232dfsdserw23",
      "age": 74
    },
    {
      "@id": "urn:uuid:232d34gerw23",
      "age": 76
    }
  ]
}
</syntaxhighlight></div></div>

=== Simple queries ===
Two properties of persons.

Get me the NHS number and age in years of all patients.

# Select things of type person.
# Select their NHS number.
# Select their age (this being a "function property" that is defined in the IM as the time difference between the date of birth and a reference date, with the units of YEARS).

{|class="mw-collapsible" border="1" cellpadding="1" cellspacing="1" style="width: 500px;" summary="Summary"
|+ IMQ example
!IMQ
!Result (json)
|-
|<syntaxhighlight lang="json">
"select" : {
  "entityType" : {"@id" : "Person"},
  "property" : [
    { "@id" : {"@id" : "nhsNumber"} },
    { "@id" : {"@id" : "age"},
      "argument" : [ { "unit" : "YEARS" } ] }
  ]
}
</syntaxhighlight>
|<syntaxhighlight lang="json">
[
  { "nhsNumber" : "1234567890", "age" : 44 },
  { "nhsNumber" : "0987654321", "age" : 14 }
]
</syntaxhighlight>
|}

Two properties of persons, this time as CSV.

{| class="mw-collapsible" border="1" cellpadding="1" cellspacing="1" style="width: 500px;" summary="Summary"
|+ IMQ Example
!IMQ
!Result (csv)
|-
|<syntaxhighlight lang="json">
{"query" : {
  "resultFormat" : "CSV",
  "select" : {
    "entityType" : {"@id" : "Person"},
    "property" : [
      { "@id" : {"@id" : "nhsNumber"} },
      { "@id" : {"@id" : "age"},
        "argument" : [ { "unit" : "YEARS" } ] }
    ]
  }
}}
</syntaxhighlight>
|nhsNumber, age<br />
1234567890, 44<br />
0987654321, 14
|}
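
To make the relationship between a simple match/return query and its Cypher form more concrete, here is a minimal Python sketch that handles only a single typed match clause and plain property projections. It is an illustration under stated assumptions, not the IM reference translator: the names <code>local_name</code> and <code>to_cypher</code> are hypothetical, and function properties such as age, which the data model defines as a date difference, are projected here as ordinary properties.

```python
import re


def local_name(iri):
    """Return the part of an IRI after the last '#', '/' or ':'."""
    return re.split(r"[#/:]", iri)[-1]


def to_cypher(query, variable="n"):
    """Translate one typed match clause plus return properties to Cypher.

    Hypothetical helper: ignores where clauses, paths and function properties.
    """
    node_type = local_name(query["match"][0]["@type"])
    columns = []
    for clause in query.get("return", []):
        for prop in clause.get("property", []):
            name = local_name(prop["@id"])
            alias = prop.get("as", name)
            columns.append(f"{variable}.{name} AS {alias}")
    # With no return properties, return the matched node itself.
    returned = ", ".join(columns) if columns else variable
    return f"MATCH ({variable}:{node_type}) RETURN {returned}"


query = {
    "match": [{"@type": "im:Patient"}],
    "return": [{"property": [{"@id": "im:age", "as": "age"}]}],
}
print(to_cypher(query))
# prints MATCH (n:Patient) RETURN n.age AS age
```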
== Match ==
Takes a graph and identifies a subset of the graph before returning results. The clause consists of node and relationship (path) mapping out a graph traversal of any depth. Property values of nodes can be filtered using a where clause.


A match consists of: a node identifier reference, an exclusion operator, a boolean and/or operator (for unions), an optional where clause and order by clause, as well as the sub matches for any union.

N.B. At run time Boolean OR match clauses would be considered as subquery UNIONs, and a match clause with an order by would be considered a sub query applying the optimised syntax for the target database language (e.g. correlated subquery, window function, init/compare etc.).

If preceded by a match clause in a query, the match clauses must operate on the same node types as defined in the first match clause, thus avoiding cartesian explosions. This "main entity" forms the basis of the result objects being generated. If the first match clause is matching set members or instances, then subsequent match clauses must also match nodes of the same type.

<div class="toccolours mw-collapsible mw-collapsed">
In this example there is a simple match for a medicinal product or any of its descendants, i.e. searching the information model itself.
<div class="mw-collapsible-content"><span style="color:#FF0000">
{| class="wikitable"
|+
!IMQ
!Expression constraint language
|-
|<syntaxhighlight lang="json-ld">
{
  "match": [
    {
      "@id": "sn:763158003",
      "name": "MedicinalProduct",
      "descendantsOrSelfOf": true
    }
  ]
}
</syntaxhighlight>
|<syntaxhighlight lang="ecl">
<<763158003|Medicinal Product (product)|
</syntaxhighlight>
|}
</div></div>

The same DSL can be used to query the IM itself.

Get me the code and term of 'body structures' that match the phrase "ches wall":

* Input term "ches wall"
* get the first page of length 10
* that are subtypes of the concept "body structure"

{|class="mw-collapsible" border="1" cellpadding="1" cellspacing="1" style="width: 500px;" summary="Summary"
|+ IMQ Example
!IMQ
!Result json
|-
|<syntaxhighlight lang="json-ld">
{"queryRequest" : {
  "textSearch" : "ches wall",
  "page" : 1, "pageSize" : 10,
  "query" : {
    "select" : {
      "entityId" : {
        "@id" : "Body structure",
        "includeSubtypes" : true},
      "property" : [
        {"@id" : "code", "alias" : "code"},
        {"@id" : "label", "alias" : "term"}]}}}}
</syntaxhighlight>
|<syntaxhighlight lang="json">
[
  { "code" : "244237006", "term" : "Chest wall artery" },
  { "code" : "78904004", "term" : "chest wall structure" }
]
</syntaxhighlight>
|}
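
The note in this section that Boolean OR match clauses are treated as subquery UNIONs at run time can be sketched in Python as follows. This is an illustration only: <code>range_to_where</code> and <code>union</code> are hypothetical helper names, and the range shape (<code>from</code> and <code>to</code>, each with an <code>operator</code> and a <code>value</code>) is assumed from the IMQ examples in this article.

```python
def range_to_where(variable, prop, value_range):
    """Render an IMQ-style range as a Cypher WHERE condition (hypothetical helper)."""
    parts = []
    for bound in ("from", "to"):
        if bound in value_range:
            test = value_range[bound]
            parts.append(f"{variable}.{prop} {test['operator']} {test['value']}")
    return " AND ".join(parts)


def union(subqueries):
    """Join already translated match clauses of a boolean 'or' into one UNION query."""
    return "\nUNION\n".join(subqueries)


age_range = {"from": {"operator": ">=", "value": 65}, "to": {"operator": "<", "value": 70}}
aged = f"MATCH (p:Patient) WHERE {range_to_where('p', 'age', age_range)} RETURN p"
diabetic = (
    "MATCH (p:Patient)-[:memberOf]->(r:ResultSet) "
    "WHERE r.id = 'http://example/queries#Q_Diabetics' RETURN p"
)
print(union([aged, diabetic]))
```

Keeping each branch as a complete MATCH ... RETURN statement means every branch returns the same column, which Cypher requires of the parts of a UNION.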


<div class="toccolours mw-collapsible mw-collapsed">
In this example one is looking for things that are either aged between 65 and 70, or in the query result set of Diabetics, or where they have an observation with a concept of pre-diabetes. In the data model being used here, a patient has an observation and an observation has a concept. Consequently a property path is used (SQL join).
<div class="mw-collapsible-content"><span style="color:#FF0000">
{| class="wikitable"
|+
!IMQ
!CYPHER
|-
|<syntaxhighlight lang="json-ld">
{
  "bool": "or",
  "match": [
    {
      "description": "aged between 65 and 70",
      "property": [
        {
          "@id": "http://endhealth.info/im#age",
          "range": {
            "from": {
              "operator": ">=",
              "value": 65
            },
            "to": {
              "operator": "<",
              "value": 70
            }
          }
        }
      ]
    },
    {
      "description": "Diabetic",
      "@set": "http://example/queries#Q_Diabetics"
    },
    {
      "description": " pre diabetes",
      "property": [
        {
          "@id": "http://endhealth.info/im#observation",
          "match": {
            "@type": "Observation",
            "property": [
              {
                "@id": "http://endhealth.info/im#concept",
                "in": [
                  {
                    "@id": "http://snomed.info/sct#714628002",
                    "descendantsOrSelfOf": true
                  }
                ]
              }
            ]
          }
        }
      ]
    }
  ]
}
</syntaxhighlight>
|<syntaxhighlight lang="cypher">
MATCH (p:Patient)
// aged between 65 and 70
WHERE p.age >= 65 AND p.age < 70
RETURN p
UNION
MATCH (p:Patient)-[:memberOf]->(r:ResultSet)
// diabetic
WHERE r.id = 'http://example/queries#Q_Diabetics'
RETURN p
UNION
MATCH (p:Patient)-[:observation]->(o:Observation)-[:concept]->(c:Concept)
// pre diabetes
WHERE c.id = 'http://snomed.info/sct#714628002'
RETURN p
</syntaxhighlight>
|}
</div></div>

=== Cohort definition ===
Get me patients registered as a GMS patient with a general practice on the reference date of the query.

# things of type person
# their GP registration entries that consist of
## a regular GMS patient type
## a start date before the reference date
## either no end date (still registered) or an end date after the reference date (were registered at the time)

{|class="mw-collapsible" border="1" cellpadding="1" cellspacing="1" style="width: 500px;" summary="Summary"
|+ IMQ Example
!IMQ
!Result json
|-
|<syntaxhighlight lang="json-ld">
{"select" : {
  "entityType" : {"@id" : "Person"},
  "match" : [ {
    "pathTo" : [ {"@id" : "isSubjectOf"} ],
    "entityType" : {"@id" : "GPRegistration"},
    "property" : [
      {
        "@id" : "patientType",
        "isConcept" : {"@id" : "regular patient"}},
      {
        "@id" : "effectiveDate",
        "value" : {
          "comparison" : "LESS_THAN_OR_EQUAL",
          "valueData" : "$referenceDate"}}],
    "orProperty" : [
      {
        "notExist" : true,
        "@id" : "endDate"},
      {
        "@id" : "endDate",
        "value" : {
          "comparison" : "GREATER_THAN",
          "valueData" : "$referenceDate"}}]}]}}
</syntaxhighlight>
|<syntaxhighlight lang="json">
[
  { "id" : 1 },
  { "id" : 2 }
]
</syntaxhighlight>
|}

== Property ==
=== Properties and relationships ===
IMQ considers relationships and properties as edges. A relationship connects one node to another (or in data terms, one object to another) and is equivalent to an SQL foreign key. Match clauses enable navigation and capture of the graph to any level of depth and are particularly useful for querying properties of connected entities.
A path consists of a chain of relationship/ node pairs from the match clause. The end node may be omitted by default if it serves no purpose but can be useful to clarify the type of end node, or a variable for binding in a where or return clause.
 
Both relationships and data properties are referred to as "properties" and are differentiated by the subsequent predicates. Relationship paths are represented as chains of property/match/property clauses.
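
A chain of property/match/property clauses can be flattened into a single Cypher path pattern. The Python sketch below shows one way this might be done for a linear chain; it is not the reference implementation, the helper names <code>local_name</code> and <code>path_pattern</code> are hypothetical, and the sample data (patient, GP registration, organisation, address) is an assumed illustration of such a chain.

```python
import re


def local_name(iri):
    """Return the part of an IRI after the last '#', '/' or ':'."""
    return re.split(r"[#/:]", iri)[-1]


def path_pattern(match, variable="p"):
    """Flatten nested property/match clauses into one Cypher path pattern.

    Hypothetical helper: follows only the first relationship at each level,
    so it handles linear chains, not branching paths.
    """
    label = f":{local_name(match['@type'])}" if "@type" in match else ""
    name = match.get("variable", variable)
    pattern = f"({name}{label})"
    for prop in match.get("property", []):
        if "match" in prop:  # a relationship: the property leads to another node
            relationship = local_name(prop["@id"])
            pattern += f"-[:{relationship}]->" + path_pattern(prop["match"], variable="")
            break
    return pattern


match = {
    "@type": "im:Patient",
    "property": [{
        "@id": "im:currentGPRegistration",
        "match": {
            "@type": "im:GPRegistration",
            "property": [{
                "@id": "im:organisation",
                "match": {
                    "property": [{
                        "@id": "im:address",
                        "match": {"variable": "registeredAddress"},
                    }]
                },
            }],
        },
    }],
}
print("MATCH " + path_pattern(match) + " RETURN registeredAddress")
```

Nodes without a type or variable are emitted as anonymous <code>()</code> nodes, which matches the convention that an end node may be omitted when it serves no purpose.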


<div class="toccolours mw-collapsible mw-collapsed">
In this example, a match clause is looking for the address of a GP practice which the patient is currently registered with. In the data model, a patient has a current GP registration episode (functional property) which has an organisation with an address.
<div class="mw-collapsible-content"><span style="color:#FF0000">
{| class="wikitable"
|+
!IMQ
!Cypher
|-
|<syntaxhighlight lang="json-ld">
{
  "match": {
    "@type": "im:Patient",
    "property": [
      {
        "@id": "im:currentGPRegistration",
        "match": {
          "@type": "im:GPRegistration",
          "property": [
            {
              "@id": "im:organisation",
              "match": {
                "property": [
                  {
                    "@id": "im:address",
                    "match": {
                      "variable": "registeredAddress"
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    ]
  },
  "return": [
    {
      "nodeRef": "registeredAddress"
    }
  ]
}
</syntaxhighlight>
|<syntaxhighlight lang="cypher">
MATCH (p:Patient)-[:gpRegistration]->
      (reg:GPRegistration)-[:organisation]->
      ()-[:address]->()
RETURN p
</syntaxhighlight>
|}
</div></div>
Note that the second node does not have a type and the end node is omitted. As the data model knows that the GP registration property "organisation" points to an organisation, it is not necessary to include the node type.

=== The date range 'latest from and test' pattern ===
Get me patients whose latest systolic blood pressure within the last 18 months was taken in the surgery and was > 140.

# Select things of type person
# Look for observations and check for
## a systolic blood pressure
## within 18 months prior to the reference date
# Then sort and select the latest
# Test whether it is office based (and therefore not home based)
# Test to see if it is over 140

N.B. Only the match clause is illustrated here. The outer select and boolean match clauses would be present in the real query.

{|class="mw-collapsible" border="1" cellpadding="1" cellspacing="1" style="width: 500px;" summary="Summary"
|+ IMQ Example
!IMQ
!Result json
|-
|<syntaxhighlight lang="json-ld">
{"match" : {
  "pathTo" : [ {"@id" : "isSubjectOf"} ],
  "entityType" : {"@id" : "Observation"},
  "property" : [
    {
      "@id" : "concept",
      "inSet" : {"@id" : "Systolic arterial measurements"}
    },
    {
      "@id" : "effectiveDate",
      "function" : {
        "@id" : "TimeDifference",
        "argument" : [
          {"first date" : "$this"},
          {"second date" : "referenceDate"},
          {"units" : "MONTHS"}]},
      "value" : {
        "comparison" : "greater or equal",
        "valueData" : -18}
    }
  ],
  "orderBy" : {
    "@id" : "effectiveDate",
    "direction" : "descending",
    "count" : 1},
  "testProperty" : [
    {
      "@id" : "concept",
      "inSet" : {"@id" : "Office based systolic blood pressures"}
    },
    {
      "value" : {"comparison" : "greater", "valueData" : 140}
    }
  ]
}}
</syntaxhighlight>
|<syntaxhighlight lang="json">
[
  { "id" : 1 },
  { "id" : 2 }
]
</syntaxhighlight>
|}
=== Data properties ===
A property clause can also filter nodes from a match path according to their property values. A property clause can reference nodes from within the match clause or nodes in previous match clauses by the use of node references.
 
A where clause consists of an optional '''description''', an optional node reference, a property identifier, and a value, a range or an IN value where the values are identified by IRIs. It may also carry range and value qualifiers such as '''operators''', and various arguments for properties that are functions.
 
A where test of a value can be absolute or relative to another value already captured in the query .
 
For convenience, the parameter "unit" is included directly, as it is a common argument to functions that return a time value.
 
Where clauses are also boolean i.e. where and/or where.
 
In this example a set of observations are filtered on having a systolic blood pressure or home systolic blood pressure within the last 6 months before the reference date. Note that the property IRIs are using local names as the requestor and receiver both know the data model being used and its namespace.
<div class="toccolours mw-collapsible mw-collapsed">
A value label is assigned for display purposes for the human user interface.


=== Expression constraint ECL ===
Get me concepts that are oral none steroidal ant inflammatory products


ECL for which is
<div class="mw-collapsible-content"><span style="color:#FF0000">


<<763158003 | medinal products  :


<<127489000 : has active ingredient = <<372665008 | none steroidal anti inflammatory agent,


<<411116001 : manufactured dose form = <<385268001 | oral
{| class="wikitable"
{|class="mw-collapsible" border="1" cellpadding="1" cellspacing="1" style="width: 500px;" summary="Summary"
|+
|+ IMQ Example
!imq
!IMQ
!cypher
!Result json
|-
|-
|<syntaxhighlight lang="json-ld">
|<syntaxhighlight lang="json-ld">
{"match" : [{
{
   "entityId" : {  
  "bool": "and",
       "@id" : "medicinal product",
   "property": [
       "includeSubtypes": true},
    {
  "property" : [{  
       "description": "Home or office based Systolic",
    "@id" : "has active ingredient",
       "@id": "concept",
    "includeSubtypes" : true,
      "in": [
    "isConcept" : {
        {
           "@id" : "non steroidal anti inflammatory agent",
          "@id": "http://snomed.info/sct#271649006",
           "includeSubtypes" : true}},
          "name": "Systolic blood pressure",
    {
          "descendantsOrSelfOf" : true
      "@id" : "manufactured dose form",
        },
      "includeSubtypes" : true,
        {
        "isConcept" : {
           "@id": "http://endhealth.info/emis#1994021000006104",
            "@id" :"oral",
          "name": "Home systolic blood pressure",
              "includeSubtypes" : true} ]}
           "descendantsOrSelfOf" : true
        }
      ],
      "valueLabel": "Office or home systolic blood pressure"
    },
    {
      "description": "Last 6 months",
      "@id": "effectiveDate",
      "operator": ">=",
      "value": "-6",
      "unit": "MONTHS",
      "relativeTo": {
        "parameter" : "$referenceDate"
      },
      "valueLabel": "last 6 months"
    }
  ]
}
</syntaxhighlight>
</syntaxhighlight>
|
|<syntaxhighlight lang="cypher">
MATCH (o:Observation)-[:concept]->(c:Concept)
WHERE c.id in['sct:271649006','em:1994021000006104']
      and duration.between(o.effectiveDate,$referenceDate).months>=-6
</syntaxhighlight>
|}
</div></div>
 
=== Property identifiers ===
A property extends a property identifier by also referencing a node variable where necessary via a "nodeRef". This can be used when the property clause is operating on nodes at different levels, allowing a full match path to be filtered at different levels.
<div class="toccolours mw-collapsible mw-collapsed">
In this example the where clause finds patients that have had an observation within 14 days of their date of birth.


<div class="mw-collapsible-content"><span style="color:#FF0000">
{| class="wikitable"
|+
!imq
!cypher
|-
|<syntaxhighlight lang="json-ld">
{
  "match": {
    "@type": "im:Patient",
    "variable": "patient",
    "property": [
      {
        "@id": "im:observation",
        "match": {
          "property": [
            {
              "@id": "effectiveDate",
              "unit": "days",
              "value": {
                "operator": "<=",
                "value": 14,
                "relativeTo": {
                  "nodeRef": "patient",
                  "@id": "im:dateOfBirth"
                }
              }
            }
          ]
        }
      }
    ]
  }
}
</syntaxhighlight>
|<syntaxhighlight lang="cypher">
MATCH (pat:Patient)-[:observation]->(o:Observation)
WHERE duration.between(pat.dateOfBirth, o.effectiveDate).days <= 14
RETURN pat
</syntaxhighlight>
|}
|}
</div></div>


=== Data Sets ===
Get me a data set for a cohort of patients with 120 fields grouped into 50 groups


This is a line level  report from a list of patients, with data items from the health record filtered by criteria.


(Only the first couple are shown for brevity)
=== Filtering property values ===
Property clauses have filters (where equivalent) that can test for values, ranges and items in a list.


# Select things of type person,.
==== in - not in ====
# must be in the search result "Cardiovascular bleed search"
This refers to a list of one or more values, which may be a list of concepts or a set, and indicates that for a particular entry or object to be included, the value of the property must be in the list. Conversely, with "not in", the matched entries are those whose property values are not in the list.
# Patient details  : Full name, age
# Aspirin details : date, medication for Aspirin


a) issued in the last 6 months,
Note that this is not the same as match exclude (as below)


b) still an active authorised medication.
==== Absolute and relative values ====
 
A value consists of a numeric or string value and an operator, e.g. =, <=, >, >=, <, starts with, contains.
c) Get the latest
{|class="mw-collapsible" border="1" cellpadding="1" cellspacing="1" style="width: 500px;" summary="Summary"
|+ IMQ example
!IMQ
!Result json
|-
|<syntaxhighlight lang="json-ld">
{"query": {
  "select" :{
      "entityType" : {"@id" : "Person"},
      "match" : [{ "entityInSet" : { "@id": "Cardiovascular bleed search"}}],
      "subselect" : [{
        "name" : "patient details",
          "property" : [ {
              "@id": "fullName"},
              {"@id" : "age"},
          {
            "name" : "Aspirin",
          "pathTo" : "isSubjectOf",
            "property" : [{
              "@id": "effectiveDate","alias" : "date of issue"},
              {"@id": "medication",
                "select" : {
                  "property" : [{"@id": "rdfs:label","alias" : "name"}]}}],
          "match" :[{
            "entityType": { "@id" : "medicationRequest"},
            "property": [{
      "@id" : "medication",
              "isConcept" : {
                  "@id" : "Aspirin products",
                  "includeSubtypes" : true},
              {
              "@id" : "effectiveDate",
                "value": {
                    "comparison": "LESS_THAN_OR_EQUAL", "valueData": 6},
                "function": {
                    "@id": "Time Difference",
      "argument": [
                        {
                          "parameter": "units",  
                          "valueData": "MONTH"},
{
                          "parameter": "firstDate",
                          "valueVariable": "$this"},
{
                          "parameter": "secondDate",  
                          "valueVariable": "$referenceDate"}]},
              {
              "pathTo": [{"@id": "authorisation"},
              "@id": "course Status",
      "isConcept": {"@id" : "Active"}}],
      "orderLimit": {
                "orderBy": {"@id": "effective Date"}}]}}


     
A property clause may have a "relativeTo" predicate that points to another object's property or to a run time variable. The numeric value is then tested against the difference between the property's value and the value of the relative property.
</syntaxhighlight>
|


|}
==== Range tests ====
A range test is simply a test of "from" and "to" of a pair of values.


==== Units ====
A value may have a unit of measure, which tests the unit of measure as recorded in the record or, if the property is a functional property, acts as the parameter for time units such as years or days.


== Boolean query ==
Boolean operators and/or are supported at match and property levels and enable unlimited levels of nested boolean match or property tests.


When used in a match path, boolean 'OR' operator supports branching paths.


The "not" operator is disambiguated via the "exclude" predicate in a match clause, which excludes the matched object from the result of the parent match or query to which the match applies. Nesting of match clauses allows exclusions at a granular level, i.e. certain paths can be excluded without excluding the root object from the result.


== Return ==
The return clause returns the results derived from the subgraph defined in  the match clauses and the subqueries.


The return clause is optional, in which case the query will return a set of distinct identifiers of the main nodes of the match clauses.


The return clause is a powerful construct that  covers many of the commonly used approaches in SQL SELECT, CYPHER RETURN and GRAPHQL.


It supports tabular style outputs (flat tables), nested objects using objects and properties from the matched subgraphs, functional manipulation, and the construction of new objects, partially populated from the match graph.


A return clause consists of a single return node with any number of return properties, each of which has either a value or another return node.


=== Return node ===
A return node has a node reference that is bound to a match node variable. By default, if the match clauses omit the variable and the return node has no node reference then it is assumed that the return node binds to the main entity defined by the match clauses.


A return node may have an alias or 'as' predicate, which forces the run time query to assign these names to the columns. This is useful for debugging and maintenance, as otherwise the result binder uses generated variables.


A return node has zero or many return properties.


<div class="toccolours mw-collapsible mw-collapsed">
This example has matched a patient's observations and returns the numeric values from the matched subgraph.


<div class="mw-collapsible-content">
<syntaxhighlight lang="json-ld">
{
  "return": [
    {
      "nodeRef": "patient",
      "property": [
        {
          "@id": "observation",
          "node": {
            "property": [
              {
                "@id": "numericValue"
              }
            ]
          }
        }
      ]
    }
  ]
}
</syntaxhighlight>
<syntaxhighlight lang="cypher">
match(patient:Patient)-[:observation]->(ob)
where ob.value>'100'
with patient,collect({value:ob.value}) as obs
return {id:patient.id,name:patient.name,observation:obs}
</syntaxhighlight>
</div></div>


===Return Property===
A return node has zero or many return properties.


Return properties may simply state the property as a standard field. Alternatively, the property can operate as a relationship and contain a return node as a value, thus creating objects with arrays.


The interpreter would be expected to collect these nodes and sub nodes as arrays of objects (as indicated by the cypher 'collect' function above), the net result being something like:
<div class="toccolours mw-collapsible mw-collapsed">


<div class="mw-collapsible-content">
<syntaxhighlight lang="json">
{
  "name": "fred",
  "id": "1",
  "observation": [
    {
      "value": "150"
    },
    {
      "value": "140"
    },
    {
      "value": "180"
    }
  ]
}
</syntaxhighlight></div></div>


== Variables, parameters and node references ==
At various places in a query, variables can be declared and named, and referred to in future clauses. Queries can also be defined as templates whereby identifiers can be passed into the query in the argument list. The parameters are then authored as parameter predicates in the relevant place in the query.




Line 413: Line 659:









Latest revision as of 11:00, 5 July 2023

Background to IMQ

It's all very well modelling data, value sets, and ontologies. What about modelling the logical definitions of data sets or patient profiles (usually referred to as queries)?

IMQ is designed to facilitate the exchange of logical definitions of query via APIs.

IMQ is not a new query language or domain specific language. Instead, it is simply an object model of a subset of the mainstream query language CYPHER and as such can be easily interpreted into plain CYPHER, SQL or SPARQL. In addition, the classes include a set of simple predicates that map to complex query syntax at run time.

The main purpose of the object representation is to enable easier build and maintenance of user interfaces and interpretation to system specific query engine languages. Because IMQ is in object form, and transportable as JSON-LD, it is ideal for APIs and interoperability via messages.

It is common practice in health care IT to model query definitions, intended for use in many different systems, in plain text documents, leaving the interpretation of the logic into run time query languages to the vendors' internal informatics teams. This process creates a bottleneck and is prone to human error, partly due to the ambiguity of plain language. An approach which uses machine readable definitions can reduce the time and remove much of the human error.

It is possible to model a query definition in SQL, but an SQL query brings with it the specific database schema. SQL as a language is huge and developing SQL interpreters to interpret SQL to SQL is hard. Also, it is very difficult to construct understandable user interfaces directly from SQL or SPARQL, or vice versa, and thus most search and report applications create some form of intermediate representation.

IM considers health data to be a conceptual graph, with the modelling of types, properties, and values as nodes and relationships. This means that in query, the more natural languages are CYPHER and SPARQL, the latter being the standard language used for RDF graph query. The information model uses IRIs for its types and properties, so SPARQL is a natural target. However, instance data in health records are best suited to a property graph model and therefore the 'target' language of IMQ is CYPHER.

IMQ overview

The class structure of an IMQ query definition precisely follows the logic of a plain language description of the criteria to be applied to filter out sets from sets, and to define the output required, and thus is ideal for data set definitions. It also uses the CYPHER concepts in its construction so as to map precisely to CYPHER or other query languages.

IMQ simplifies certain complex syntactical constructs by providing some grammatical short cuts covering the following areas:

  1. Subsumption query. Essential for expression constraints, flags identifiers as including descendants or ancestors.
  2. Sets, types and instances. By flagging identifiers as @set, @type, @id differentiates 'members of a set' from 'instances of a type', from instances.
  3. The latest/earliest problem. This problem is common in health query as many queries are designed to infer state from events. In mainstream queries these are variously modelled as subqueries, correlated subqueries, window functions and sub collections. In IMQ these are simplified in line with a plain language question 'for things that have X within the last 6 months, get the latest X and test whether it is Y'.

Examples in json format can be seen at https://github.com/endeavourhealth-discovery/IMAPI/tree/develop/TestQueries/Definitions

With example results at https://github.com/endeavourhealth-discovery/IMAPI/tree/develop/TestQueries/Results

and Sparql equivalents at https://github.com/endeavourhealth-discovery/IMAPI/tree/develop/TestQueries/Sparql

General Structures

IRI format

Within IMQ, as per RDF, an IRI may be a full IRI string such as "http://example.org/something#anything" or an abbreviated IRI, e.g. "ex:anything". If abbreviated, a context object must be provided in the query request document.

As a pragmatic approach for readability, and if the client and server know the default namespace, plain local names can be used without prefix.
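
The three IRI forms described above can be told apart mechanically. The following is a minimal Python sketch, not part of IMQ itself: the function name `expand_iri` and the rule of leaving unprefixed local names untouched for the server to resolve are assumptions made for illustration.

```python
def expand_iri(iri: str, context: dict[str, str]) -> str:
    """Expand a prefixed IRI such as "im:Patient" using a prefix map.

    Hypothetical helper: full IRIs and plain local names are returned
    unchanged; only known prefixes are expanded.
    """
    # Full IRIs (and URNs) are already absolute.
    if "://" in iri or iri.startswith("urn:"):
        return iri
    prefix, sep, local_name = iri.partition(":")
    if sep and prefix in context:
        return context[prefix] + local_name
    # A plain local name: left for the receiver to resolve against the
    # default namespace that client and server have agreed on.
    return iri


context = {
    "im": "http://endhealth.info/im#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

print(expand_iri("im:Patient", context))
print(expand_iri("effectiveDate", context))
```

With this context, "im:Patient" expands to the full IM namespace IRI, while "effectiveDate" is passed through as a local name.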

Identifiers

Nodes, paths and where/IN clauses all use identifiers, which include IRIs, prefixed IRIs or local names (for local usage). They all share a set of basic predicates as follows:

@id: The standard JSON-LD approach for an IRI; indicates the match is on an instance identified by the IRI.

Equivalent to SQL where ID=

name: The RDF label or main term representing the name of the IRI. Used for human readability.
variable: Declares a node variable to be used in a WHERE clause, RETURN clause or subsequent MATCH clauses, i.e. it resolves to the set of instances found in the match clause.
parameter: The name of a parameter (conventionally preceded by $) to be resolved from the query arguments, e.g. "parameter" : "$referenceDate".
descendantsOrSelfOf (<<): Subtypes (or subclasses) are incorporated at run time. This can apply in the from clause, the where property, or the value.
descendantsOf (<): Indicates that only subtypes are examined (ECL compliance).
ancestorsOf (>>): Enables the parent hierarchy to be transitively examined. Used in assessing allowable ranges and properties of concepts.
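
To make the difference between descendantsOrSelfOf (<<) and descendantsOf (<) concrete, here is a small Python sketch over a toy hierarchy. The concept names "A" to "D", the `children` map and both function names are hypothetical; a real implementation would query the information model rather than an in-memory dictionary.

```python
def descendants_of(concept: str, children: dict[str, list[str]]) -> set[str]:
    """Return all transitive subtypes of `concept`, excluding the concept itself."""
    found: set[str] = set()
    stack = list(children.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(children.get(node, []))
    return found


def descendants_or_self_of(concept: str, children: dict[str, list[str]]) -> set[str]:
    """Return the concept together with all of its transitive subtypes."""
    return {concept} | descendants_of(concept, children)


# Toy hierarchy (hypothetical): A has subtypes B and C; B has subtype D.
children = {"A": ["B", "C"], "B": ["D"]}

print(sorted(descendants_of("A", children)))
print(sorted(descendants_or_self_of("A", children)))
```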

Node identifiers

Node identifiers are extended identifiers of nodes in a graph. Node identifiers in IMQ offer a convenient way of differentiating instances from types and from sets.

@type is an IMQ convenience indicating that the match is against instances of a certain type.

Equivalent to CYPHER (:TheType) or SQL FROM TheType

@set is an imq convenience indicating the match is on any member of a set.

Equivalent to SQL join Result.ID on SET where SET.ID= ID and SET.id= X

Path and property identifiers

Path identifiers consider the relationship as an instance. Sub properties may also be tested for, or indeed a variable may be used instead of the relationship. Thus @type and @set are not supported.

Query Structure

Query structure is a class model, normally serialized as JSON-LD. In the following sections, ABNF is used to illustrate the predicates (json names) as well as JSON-LD and CYPHER equivalent examples.

Query Request

An IM query consists of a query request, which includes the necessary components to define a query, as well as a set of arguments that can be passed into the query and used at run time.

A query request consists of an optional context (JSON-LD) object, optional arguments and either a query or a path query.

In this example the query references a stored query via its IRI.

{
"@context" : {
  "query" : "http://endhealth.info/query#",
  "rdfs" : "http://www.w3.org/2000/01/rdf-schema#"
},
"argument": [
    {
      "parameter": "$referenceDate",
       "value": "2023-01-01"
    }],
"query" : {"@id" :"query:GMSRegisteredPatients"}
 
}

Context - prefix Map

The format for data exchange is JSON-LD and thus a context object is supported, consisting of a prefix to expansion map. Only simple maps are required.

In this example two prefixes are introduced

{
"@context" : {
  "im" : "http://endhealth.info/im#",
  "rdfs" : "http://www.w3.org/2000/01/rdf-schema#"
}
}

Arguments

An argument consists of a list of parameter value pairs, i.e. the name of the parameter and its value as either a string, an IRI or a list of IRIs.

In this example, this query is parameterised by the reference date and an IMQ data model property of shacl class (i.e. a placeholder for a property in a query) and a list of IRIs for ranges as place holders for values of the property.

{
  "argument": [
    {
      "parameter": "$referenceDate",
      "value": "2023-01-01"
    },
    {
      "parameter": "aProperty",
      "valueIri": {
        "@id": "sh:class"
      }
    },
    {
      "parameter": "aRange",
      "valueIriList": [
        {
          "@id": "xsd:integer"
        },
        {
          "@id": "xsd:string"
        }
      ]
    }
  ]
}
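
As a rough illustration of how a receiver might read such an argument list, the Python sketch below flattens it into a name-to-value map. The function name `resolve_arguments` and the choice to strip the leading "$" from parameter names are assumptions for the sake of the example, not documented IMQ behaviour.

```python
def resolve_arguments(request: dict) -> dict:
    """Flatten the "argument" list of a query request into {name: value}.

    Hypothetical helper: handles the three value forms shown in the
    example ("value", "valueIri", "valueIriList").
    """
    params = {}
    for arg in request.get("argument", []):
        name = arg["parameter"].lstrip("$")  # assumption: "$" is optional
        if "valueIriList" in arg:
            params[name] = [item["@id"] for item in arg["valueIriList"]]
        elif "valueIri" in arg:
            params[name] = arg["valueIri"]["@id"]
        else:
            params[name] = arg.get("value")
    return params


request = {
    "argument": [
        {"parameter": "$referenceDate", "value": "2023-01-01"},
        {"parameter": "aProperty", "valueIri": {"@id": "sh:class"}},
        {
            "parameter": "aRange",
            "valueIriList": [{"@id": "xsd:integer"}, {"@id": "xsd:string"}],
        },
    ]
}

print(resolve_arguments(request))
```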

Query

A simple overall structure with nestable elements providing an object form input and output similar to GRAPHQL. A query may contain many queries, enabling a package of queries such as a column group report or full data set.

The request may fully define the query (dynamic query) or more commonly reference a pre-existing query definition via an IRI (i.e. a preformed query definition with variables resolved to the arguments passed in at run time). The definition of a pre-existing query is obtained from the "has Definition" property of a stored query entity.

Predefined Query

A query request may simply reference another query, which produces the result object from the other query

For example, the following query request gets the results of a pre-defined query for GMS registered patients with a reference date of January 2023.

{
"argument" : [
 {"parameter" : "$referenceDate",
  "value" : "2023-01-01"
} ],
"query" : {"@id" :"http://endhealth.info/query#GMSRegisteredPatients"}
}

Query Clauses

IMQ considers a query to be a set of steps, each step starting from a graph and resulting in a sub graph which is then the starting point of the next step. Sub queries within the steps are used to supplement the graph with results of other queries. Unions are used to merge sub graphs. Steps can reference results of other steps.

A query definition consists of a list of 1 or more match clauses and an optional return clause. Optionally a query may have one or many queries acting as further queries on the instances identified by the first match clause.

All match clauses must operate on the same node types as defined in the first match clause, thus avoiding cartesian explosions. This "main entity" forms the basis of the result objects being generated. If the first match clause is matching set members or instances, then subsequent match clauses must also match nodes of the same type.

In this simple example, a query request contains a query with a single match clause identifying all instances of type patient and returning their age in years.

The neo4j cypher equivalent is included showing that the object model provides context for the grammatical constructs in a more succinct language.

IMQ CYPHER
{
  "@context" : {
    "im" : "http://endhealth.info/im#"
  },
  "argument": [{"parameter": "referenceDate","value" : "2023-01-01"}],
  "query" : {
    "match" : [ {
      "@type" : "im:Patient"
    } ],
    "return" : [ {
      "property" : [ {
        "@id" : "im:age",
        "as" : "age",
        "unit" : "years"
      } ]
    } ]
  }
}
:params 
{
  "referenceDate": "2023-01-01"
}
MATCH (p:Patient)
RETURN {
          id: p.id,
          age :duration.between(p.dateOfBirth, date($referenceDate)).years
            }


With the object result being a list of entities as instances of patients with their age.

{
  "entities": [
    {
      "@id": "urn:uuid:232dfsdserw23",
      "age": 74
    },
    {
      "@id": "urn:uuid:232d34gerw23",
      "age": 76
    }
  ]
}

Match

Takes a graph and identifies a subset of the graph before returning results. The clause consists of node and relationship (path) mapping out a graph traversal of any depth. Property values of nodes can be filtered using a where clause.

A match consists of: a node identifier reference, an exclusion operator, a boolean and/or operator (for unions), an optional where clause and order by clause, as well as the sub matches for any union.

N.B. At run time, Boolean OR match clauses would be considered as subquery UNIONs, and a match clause with an order by would be considered a sub query applying the optimised syntax for the target database language (e.g. correlated subquery, window function, init/compare etc.).

If preceded by a match clause in a query, the match clauses must operate on the same node types as defined in the first match clause, thus avoiding cartesian explosions. This "main entity" forms the basis of the result objects being generated. If the first match clause is matching set members or instances, then subsequent match clauses must also match nodes of the same type.

In this example there is a simple match for a medicinal product or any of its descendants. i.e. searching the information model itself.

IMQ Expression constraint language
{
  "match": [
    {
      "@id": "sn:763158003",
      "name": "MedicinalProduct",
      "descendantsOrSelfOf": true
    }
  ]
}
<<763158003|Medicinal Product (product)|

In this example one is looking for things that are either aged between 65 and 70, or in the query result set of Diabetics, or where they have an observation with a concept of pre-diabetes. In the data model being used here, a patient has an observation and an observation has a concept. Consequently a property path is used (SQL join)

IMQ CYPHER
{
	 "bool": "or",
	 "match": [
		{
			 "description": "aged between 65 and 70",
			 "property": [
				{
					 "@id": "http://endhealth.info/im#age",
					 "range": {
						 "from": {
							 "operator": ">=",
							 "value": 65
						},
						 "to": {
							 "operator": "<",
							 "value": 70
						}
					}
				}
			]
		},
		{
			 "description": "Diabetic",
			 "@set": "http://example/queries#Q_Diabetics"
		},
		{
			 "description": " pre diabetes",
			 "property": [
				{
					 "@id": "http://endhealth.info/im#observation",
					 "match": {
						 "@type": "Observation",
						 "property": [
							{
								 "@id": "http://endhealth.info/im#concept",
								 "in": [
									{
										 "@id": "http://snomed.info/sct#714628002",
										 "descendantsOrSelfOf": true
									}
								]
							}
						]
					}
				}
			]
		}
	]
}
MATCH (p:Patient) 
//aged between 65 and 70
     WHERE p.age>=65 and p.age <70 return p 
UNION 
MATCH (p:Patient)-[:memberOf]->(r:ResultSet) 
//diabetic 
       WHERE r.id= 'http://example/queries#Q_Diabetics' return p 
UNION 
MATCH(p:Patient)-[:observation]-> (O:Observation)-[:concept]->(c:Concept)
 // pre diabetes 
      WHERE c.id='http://snomed.info/sct#714628002' return p

Property

Properties and relationships

IMQ considers relationships and properties as edges. A relationship connects one node to another (or in data terms, one object to another) and is equivalent to an SQL foreign key. Match clauses enable navigation and capture of the graph to any level of depth and are particularly useful for querying properties of connected entities.

A path consists of a chain of relationship/ node pairs from the match clause. The end node may be omitted by default if it serves no purpose but can be useful to clarify the type of end node, or a variable for binding in a where or return clause.

Both relationships and data properties are referred to as "properties" and are differentiated by the subsequent predicates. Relationship paths are represented as chains of property/match/property clauses.

In this example, a match clause is looking for the address of the GP practice with which the patient is currently registered. In the data model, a patient has a current GP registration episode (functional property) which has an organisation with an address.


IMQ Cypher
{
  "match": {
    "@type": "im:Patient",
    "property": [
      {
        "@id": "im:currentGPRegistration",
        "match": {
          "@type": "im:GPRegistration",
          "property": [
            {
              "@id": "im:organisation",
              "match": {
                "property": [
                  {
                    "@id": "im:address",
                    "match": {
                      "variable": "registeredAddress"
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    ]
  },
  "return": [
    {
      "nodeRef": "registeredAddress"
    }
  ]
}
match (p:Patient)-[:gpRegistration]->
                      (reg:GPRegistration)-[:organisation]->
                                              ()-[:address]->() 
RETURN p


Note that the second node does not have a type and the end node is omitted. As the data model knows that the GP registration property "organisation", points to an organisation, it is not necessary to include the node type.

Data properties

A property clause can also filter nodes from a match path according to their property values. A property clause can reference nodes from within the match clause or nodes in previous match clauses by the use of node references.

A where clause consists of an optional description, an optional node reference, a property identifier, and a value, a range, or an IN list whose values are identified by IRIs. It may also carry range and value qualifiers such as operators, and arguments for properties that are functions.

A where test of a value can be absolute or relative to another value already captured in the query.

For convenience, the "unit" parameter, a common argument to time-valued functions, is included directly in the clause.

Where clauses can also be combined with boolean operators, i.e. where and/or where.

In this example a set of observations is filtered on having a systolic blood pressure or home systolic blood pressure within the last 6 months before the reference date. Note that the property IRIs use local names, as the requestor and receiver both know the data model being used and its namespace.

A value label is assigned for display purposes for the human user interface.



imq cypher
{
  "bool": "and",
  "property": [
    {
      "description": "Home or office based Systolic",
      "@id": "concept",
      "in": [
        {
          "@id": "http://snomed.info/sct#271649006",
          "name": "Systolic blood pressure",
          "descendantsOrSelfOf" : true
        },
        {
          "@id": "http://endhealth.info/emis#1994021000006104",
          "name": "Home systolic blood pressure",
          "descendantsOrSelfOf" : true
        }
      ],
      "valueLabel": "Office or home systolic blood pressure"
    },
    {
      "description": "Last 6 months",
      "@id": "effectiveDate",
      "operator": ">=",
      "value": "-6",
      "unit": "MONTHS",
      "relativeTo": {
        "parameter" : "$referenceDate"
      },
      "valueLabel": "last 6 months"
    }
  ]
}
MATCH (o:Observation)-[:concept]->(c:Concept)
WHERE c.id in['sct:271649006','em:1994021000006104'] 
      and duration.between(o.effectiveDate,$referenceDate).months>=-6

Property identifiers

A property extends a property identifier by also referencing a node variable where necessary via a "nodeRef". This can be used when the property clause is operating on nodes at different levels, allowing a full match path to be filtered at different levels.

In this example the where clause finds patients that have had an observation within 14 days of their date of birth.

imq cypher
{
  "match": {
    "@type": "im:Patient",
    "variable": "patient",
    "property": [
      {
        "@id": "im:observation",
        "match": {
          "property": [
            {
              "@id": "effectiveDate",
              "unit": "days",
              "value": {
                "operator": "<=",
                "value": 14,
                "relativeTo": {
                  "nodeRef": "patient",
                  "@id": "im:dateOfBirth"
                }
              }
            }
          ]
        }
      }
    ]
  }
}
MATCH (pat:Patient)-[:observation]->(o:Observation)
WHERE duration.between(pat.dateOfBirth, o.effectiveDate).days <= 14
RETURN pat


Filtering property values

Property clauses have filters (where equivalent) that can test for values, ranges and items in a list.

in - not in

This refers to a list of one or more values, which may be a list of concepts or a set, and indicates that for a particular entry or object to be included, the value of the property must be in the list. Conversely, with "not in", the matched entries are those whose property values are not in the list.

Note that this is not the same as match exclude (see below).

Absolute and relative values

A value consists of a numeric or string value and an operator, e.g. =, <=, >, >=, <, starts with, contains.

A property clause may have a "relativeTo" predicate that points to another object's property or to a run time variable. The numeric value is then tested against the difference between the property's value and the value of the relative property.
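
A relative date test of this kind can be sketched in Python as follows. This is an illustration only: the helper names, the choice of whole months truncated toward zero, and the operator table are assumptions, and a real interpreter would delegate the arithmetic to the target query engine.

```python
import operator
from datetime import date

# Assumed mapping from IMQ operator strings to Python comparisons.
OPERATORS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def months_between(start: date, end: date) -> int:
    """Whole months from start to end (negative when end precedes start)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # Drop a final partial month so the result is truncated toward zero.
    if months > 0 and end.day < start.day:
        months -= 1
    elif months < 0 and end.day > start.day:
        months += 1
    return months


def relative_test(value: date, relative_to: date, op: str, amount: int) -> bool:
    """Test (value - relative_to), measured in months, against amount."""
    return OPERATORS[op](months_between(relative_to, value), amount)


reference_date = date(2023, 1, 1)
# "effectiveDate >= -6 MONTHS relative to $referenceDate"
print(relative_test(date(2022, 9, 15), reference_date, ">=", -6))
print(relative_test(date(2022, 5, 1), reference_date, ">=", -6))
```

Here an observation dated 15 September 2022 falls within the six months before the reference date, whereas one dated 1 May 2022 does not.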

Range tests

A range test is simply a test of "from" and "to" of a pair of values.
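
As a sketch of how a range clause could be rendered for a target engine, the Python function below turns a "from"/"to" pair into a Cypher-style WHERE fragment. The function name and the string-building approach are illustrative assumptions; values are interpolated directly, which would not be safe for untrusted input.

```python
def range_to_cypher(variable: str, prop: str, range_clause: dict) -> str:
    """Render an IMQ-style range as a Cypher WHERE fragment (hypothetical helper)."""
    parts = []
    for bound in ("from", "to"):
        test = range_clause.get(bound)
        if test is not None:
            parts.append(f"{variable}.{prop} {test['operator']} {test['value']}")
    return " AND ".join(parts)


age_range = {
    "from": {"operator": ">=", "value": 65},
    "to": {"operator": "<", "value": 70},
}

print(range_to_cypher("p", "age", age_range))
```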

Units

A value may have a unit of measure. This either tests the unit of measure as recorded in the record or, if the property is a functional property, acts as the time unit parameter, such as years or days.

Boolean query

Boolean operators and / or are supported at match and property levels, enabling unlimited nesting of boolean match or property tests.

When used in a match path, the boolean 'OR' operator supports branching paths.

The "not" operator is disambiguated via the "exclude" predicate in a match clause, which excludes the matched object from the result of the parent match or query to which the match applies. Nesting of match clauses allows exclusions at a granular level, i.e. certain paths can be excluded without excluding the root object from the result.
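
The effect of an exclude at the level of the root object can be sketched in Python. The helper names `has_observation` and `match_patients`, the sample patients and the concept names are all hypothetical, and the sketch does not reflect how an IMQ interpreter is implemented.

```python
def has_observation(concept):
    """Build a test that is true when a patient has an observation of `concept`."""
    return lambda patient: any(
        obs["concept"] == concept for obs in patient["observations"]
    )


def match_patients(patients, include, exclude):
    """Return ids of patients that satisfy `include` and are not removed by `exclude`."""
    return [p["id"] for p in patients if include(p) and not exclude(p)]


patients = [
    {"id": 1, "observations": [{"concept": "diabetes"}]},
    {"id": 2, "observations": [{"concept": "diabetes"}, {"concept": "hypertension"}]},
    {"id": 3, "observations": [{"concept": "hypertension"}]},
]

# Patients with diabetes, excluding any that also have hypertension.
result = match_patients(
    patients, has_observation("diabetes"), has_observation("hypertension")
)
```

Patient 2 is removed as a whole object, which is different from a "not in" filter that only discards individual entries.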

Return

The return clause returns the results derived from the subgraph defined in the match clauses and the subqueries.

The return clause is optional. If it is omitted, the query returns a set of distinct identifiers of the main nodes of the match clauses.

The return clause is a powerful construct that covers many of the commonly used approaches in SQL SELECT, CYPHER RETURN and GraphQL.

It supports tabular style outputs (flat tables), nested objects using objects and properties from the matched subgraphs, functional manipulation, and the construction of new objects, partially populated from the match graph.

A return clause consists of a single return node with any number of return properties, each of which has either a value or another return node.

Return node

A return node has a node reference that is bound to a match node variable. By default, if the match clauses omit the variable and the return node has no node reference then it is assumed that the return node binds to the main entity defined by the match clauses.

A return node may have an alias or 'as' predicate, which forces the run time query to assign these names to the columns. This is useful for debugging and maintenance, as otherwise the result binder uses generated variables.

A return node has zero or many return properties.

This example has matched a patient's observations and returns the numeric values from the matched subgraph.

{ 
  "return": [
    {
      "nodeRef": "patient",
      "property": [
        {
          "@id": "observation",
          "node": {
            "property": [
              {
                "@id": "numericValue"
              }
            ]
          }
        }
      ]
    }
  ]
}
MATCH (patient:Patient)-[:observation]->(ob)
WHERE ob.value > '100'
WITH patient, collect({value: ob.value}) AS obs
RETURN {id: patient.id, name: patient.name, observation: obs}

Return Property

Return properties may simply state the property as a standard field. Alternatively, the property can operate as a relationship and contain a return node as its value, thus creating objects with arrays.

The interpreter would be expected to collect these nodes and sub nodes as arrays of objects (as indicated by the cypher 'collect' function above). The net result would be something like:

{
  "name": "fred",
  "id": "1",
  "observation": [
    {
      "value": "150"
    },
    {
      "value": "140"
    },
    {
      "value": "180"
    }
  ]
}
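
The collection step can be illustrated with a short Python sketch that groups flat result rows into nested objects. The function name `collect_observations` and the flat row layout are assumptions for the example. An actual interpreter would work against its own result binder.

```python
def collect_observations(rows):
    """Group flat (patient, observation value) rows into one object per patient."""
    results = {}
    for row in rows:
        # Create the patient object on first sight, then reuse it.
        patient = results.setdefault(
            row["id"], {"id": row["id"], "name": row["name"], "observation": []}
        )
        patient["observation"].append({"value": row["value"]})
    return list(results.values())


# Flat rows, as a tabular query engine might return them (hypothetical data).
rows = [
    {"id": "1", "name": "fred", "value": "150"},
    {"id": "1", "name": "fred", "value": "140"},
    {"id": "1", "name": "fred", "value": "180"},
]

nested = collect_observations(rows)
```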

Variables, parameters and node references

At various places in a query, variables can be declared and named, and then referred to in later clauses. Queries can also be defined as templates, whereby identifiers are passed into the query in the argument list. The parameters are then authored as parameter predicates in the relevant place in the query.
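
The idea of a query template can be sketched in Python as a substitution of arguments into placeholders. The placeholder shape `{"parameter": name}`, the "in" key, the function `bind_parameters` and the identifier `im:ExampleSet` are all hypothetical, chosen only to show the mechanism. They are not the actual IMQ syntax.

```python
def bind_parameters(node, arguments):
    """Return a copy of `node` with parameter placeholders replaced by arguments."""
    if isinstance(node, dict):
        # A dict holding only a "parameter" key is treated as a placeholder.
        if set(node) == {"parameter"}:
            return arguments[node["parameter"]]
        return {key: bind_parameters(value, arguments) for key, value in node.items()}
    if isinstance(node, list):
        return [bind_parameters(item, arguments) for item in node]
    return node


# Hypothetical template: the concept set is supplied at run time.
template = {
    "match": {
        "@type": "im:Patient",
        "property": [
            {"@id": "im:observation", "in": [{"parameter": "conceptSet"}]}
        ],
    }
}

query = bind_parameters(template, {"conceptSet": "im:ExampleSet"})
```

The template itself is left unchanged, so the same definition can be reused with different argument lists.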

Grammar

This is the section on grammar