DavidStables (talk | contribs) |
DavidStables (talk | contribs) |
||
Line 68: | Line 68: | ||
Examples start with simple patterns and progress to complex temporal patterns of the kind used in real world query. | Examples start with simple patterns and progress to complex temporal patterns of the kind used in real world query. | ||
=== Simple queries === | |||
{| class="wikitable" | {| class="wikitable" | ||
|+ | |+ | ||
!Phrase | !Phrase and explanation. | ||
!IMQ | !IMQ | ||
!Result (json) | !Result (json) | ||
|- | |- | ||
|Get me the nhs number and age in years of all patients | |Get me the nhs number and age in years of all patients. | ||
Selecting things of type person, | |||
select their NHS number,. | |||
Select their age, this being a "function property" that is defined in the IM as the time difference between the date of birth | |||
and a reference date, with the units of YEARS | |||
|<syntaxhighlight> | |<syntaxhighlight> | ||
select :{ | select :{ | ||
Line 104: | Line 115: | ||
|Get me the nhs number and age in years of all patients | |Get me the nhs number and age in years of all patients | ||
in csv format | in csv format | ||
Selecting things of type person, | |||
select their NHS number,. | |||
Select their age, this being a "function property" that is defined in the IM as the time difference between the date of birth | |||
and a reference date, with the units of YEARS | |||
|<syntaxhighlight lang="json"> | |<syntaxhighlight lang="json"> | ||
query : { | query : { | ||
Line 121: | Line 141: | ||
|} | |} | ||
The same DSL | Same as the first but with a CSV output format | ||
The same DSL can be used to query the IM itself | |||
{| class="wikitable" | {| class="wikitable" | ||
|+ | |+ | ||
Line 151: | Line 173: | ||
|} | |} | ||
In this case the query is for page 1 of results for the term "ches wall" that are subtypes of the concept "body structure" | In this case the query is for page 1 of results for the term "ches wall" that are subtypes of the concept "body structure" | ||
Revision as of 10:29, 17 July 2022
Background to IMQ
Information model query language is designed to query both the information model and any health records that map to an IM data model.
As an RDF Graph knowledge base the information model could (and can) be directly queried using SPARQL. The IM also holds text data which can be queried directly using open Search or elastic.
However, as Health records are likely to be stored as relational, or at least SQL compatible data bases, querying health records that are aligned with the model will require SQL to query them.
Consider the following issues with SQL/SPARQL:
- Directly authoring SQL and SPARQL languages require a high degree of skill and health query in particular needs heavily nested subqueries , including some of the more advanced techniques such as correlated query or window functions.
- Translating a user oriented intuitive query builder into SQL or SPARQL directly and in reverse is very difficult. Most query applications use an intermediate language from which the queries are then generated. Examples include GraphQL or Power BI DAX and M.
- Enabling direct query via SPARQL end points or SQL APIs can result in crippling performance problems.
Consequently the IM provides a pragmatic Query domain specific language (DSL) to help bridge the gap between a plain language representation and the run time query. This DSL can be used to exchange query definitions across multiple instances.
The IM also provides reference software showing how SQL or SPARQL or OpenSearch Query can be generated from the DSL and how a plain language or diagrammatic interpretation can be produced from the DSL. The reference software also shows the converse i.e. the generation of the DSL from plain language.
It is not expected that the language must be understood by clinicians, or even those familiar with SQL. Instead the purpose of the language is to enable interoperability of query definition and use across standard REST APIs. However, the logic of a query definition precisely follows the logic of a plain language description of the output required and the criteria to be applied.
The language is designed to meet the following requirements
Query language requirements
Requirement 1 - Should support the vast majority of query patterns for defining and producing data sets or patient profiles that are needed in the real world.
Requirement 2 - Should enable mapping to SQL via simple type-table, property- field maps, for health data held in relational forms, as long as the health data content conforms to an IM data model
Requirement 3 - Should enable mapping to SPARQL directly for querying the IM itself
Requirement 4 - Should enable mapping directly from and to Expression constraint language (ECL) for searching and set definitions.
Requirement 5 - Should enable a technical query to be built as Java Script or POJO objects avoiding the need for a language specific parser.
Requirement 6- Should embed inference statements such as subtype, super type, or set inclusion as part of the query definition, thus avoiding the need for explicit modelling of the complex logic in the query itself
Requirement 7 - Should support object result format as well as relational format i.e. nested json object results as well as flat table results.
Requirement 8 - Should follow a logical plain language definition of the output and criteria to be applied but there is no requirement for the language itself to be understandable by non technical users.
Requirement 9 - Should support at least one plain language and at least one diagrammatic representation of the query that can be understood by lay user.
In other words the requirement is not only to create an intermediate query model but to include a translator to and from an understandable language or diagram.
Language overview
The language follows the familiar pattern of most query languages with constraints. As well as incorporating the core concepts of SQL and SPARQL it includes the nested structure approach as used by GRAPHQL.
The syntax uses json and therefore is very verbose compared with a succinct language. In return for verbosity there is less ambiguity and conforms to a standard object representation.
A query consists of the following main clauses:
QueryRequest : The query run time wrapper holding the run time parameters such as reference date, and contains the query either as a reference IRI or the query itself. Query Request contains one query
Query : Includes the iri, name, description, result format, use of prefixes, the main entity, Select clause and sub-select clause (where a query produces many column groups). Query contains one select
Select : Equivalent to SQL and SPARQL SELECT with GraphQL nesting. Includes a list of properties and aliases as well as any nested properties i.e. select property , select and a match clause and any ordering or limit
Select contains properties to select, one or more match clauses (implied and) or one or more subset selects.
Match : Equivalent to SQL FROM/WHERE and SPARQL WHERE/FILTER. Defines the graph patterns filters or functions including ordering and limit followed by a test i.e. subquery. Match clause contains match oinformation boolean operators, and sub query
Order/ Limit/Offset : used in both select and match, Orders by a field and optionally limits return to a number. Often used for paging.
The language grammar and syntax can be represented as a formal grammar (ABNF) or Json schema. this article presents the grammar by plain language example.
IM Query API
The information model manager provides a REST API for querying the IM or validating health queries. As the preferred approach to API is to exchange JSON, IMQ uses the Json syntax and because entities are represented as RDF URIs, Json-LD is the preferred format.
IMQ by example
This section takes a set of common snippets of plain language query and shows the DSL equivalent. For convenience, as JSON is somewhat bloated, quotes are ommitted, and the Json-LD iri format {"@id" : "http/snomed.info/sct#12345"} is replaced with the term of the IRI.
Examples start with simple patterns and progress to complex temporal patterns of the kind used in real world query.
Simple queries
Phrase and explanation. | IMQ | Result (json) |
---|---|---|
Get me the nhs number and age in years of all patients.
select their NHS number,. Select their age, this being a "function property" that is defined in the IM as the time difference between the date of birth and a reference date, with the units of YEARS |
select :{
property [ {
id : nhsNumber } ,
{id : age,
argument: [ { unit : YEARS }}]
entityType :Person} |
[ {
nhsNumber : 1234567890,
age : 44},
[ {
nhsNumber : 0987654321,
age : 14}]] |
The select clause lists 2 properties of the entity type person, the age property being a function property in the IM with units as a parameter. Default output format is json.
Phrase | IMQ | Result csv) |
---|---|---|
Get me the nhs number and age in years of all patients
in csv format
select their NHS number,. Select their age, this being a "function property" that is defined in the IM as the time difference between the date of birth and a reference date, with the units of YEARS |
query : {
resultFormat= CSV
select :{
property [ {
id : nhsNumber } ,
{id : age,
argument: [ { unit : YEARS }}]
entityType :Person,
|
nhsNumber , age
1234567890, 44 0987654321,14 |
Same as the first but with a CSV output format
The same DSL can be used to query the IM itself
Phrase | IMQ | Result json |
---|---|---|
Get me the code and term of body structures that match the phrase "ches wall" | query : {
textSearch : "ches wall",
page : 1, page size 10,
select :{
entityId : { id: Body structure,
include Subtypes : true}
property [ {
id : im:code, alias : code
iri rdf:label,alias : term}] |
{[
code : 244237006
term : Chest wall artery },
{
code : 78904004
term : chest wall structure , |
In this case the query is for page 1 of results for the term "ches wall" that are subtypes of the concept "body structure"
Example query for age over 65
Query to calculate whether a person is 65 years or over
{
"@id" : "http://endhealth.info/im#Q_RegisteredGMS",
"name" : "Patients registered for GMS services on the reference date",
"description" : "For any registration period,a registration start date before the reference date and no end date,or an end date after the reference date.",
"mainEntity" : {
"@id" : "http://endhealth.info/im#Person"
},
"select" : {
"entityType" : {
"@id" : "http://endhealth.info/im#Person",
"name" : "Person"
},
"match" : {
"property" : {
"@id" : "http://endhealth.info/im#isSubjectOf",
"name" : "has GP registration"
},
"match" : {
"entityType" : {
"@id" : "http://endhealth.info/im#GPRegistration"
},
"and" : [ {
"name" : "patient type is regular GMS Patient",
"property" : {
"@id" : "http://endhealth.info/im#gpPatientType"
},
"isConcept" : [ {
"@id" : "http://endhealth.info/im#2751000252106",
"name" : "Regular GMS patient"
} ]
}, {
"name" : "start of registration is before the reference date",
"property" : {
"@id" : "http://endhealth.info/im#effectiveDate"
},
"value" : {
"comparison" : "LESS_THAN_OR_EQUAL",
"valueData" : "$ReferenceDate"
}
} ],
"or" : [ {
"name" : "the registration has not ended ",
"notExist" : true,
"property" : {
"@id" : "http://endhealth.info/im#endDate"
}
}, {
"name" : "the end of registration is after the reference date",
"property" : {
"@id" : "http://endhealth.info/im#endDate"
},
"value" : {
"comparison" : "GREATER_THAN",
"valueData" : "$ReferenceDate"
}
} ]
}
}
}
}
The above query in verbose language says "select the full name of persons whose property "age" in "units" of "years is greater or equal to 65"
More complex example for patients registered as GMS regular patients on the reference date of the query
Query to calculate whether a patient is registered with a general practice on the reference date
{
"@id" : "http://endhealth.info/im#Q_RegisteredGMS",
"name" : "Patients registered for GMS services on the reference date",
"description" : "For any registration period,a registration start date before the reference date and no end date,or an end date after the reference date.",
"mainEntity" : {
"@id" : "http://endhealth.info/im#Person"
},
"select" : {
"entityType" : {
"@id" : "http://endhealth.info/im#Person",
"name" : "Person"
},
"match" : {
"property" : {
"@id" : "http://endhealth.info/im#isSubjectOf",
"name" : "has GP registration"
},
"match" : {
"entityType" : {
"@id" : "http://endhealth.info/im#GPRegistration"
},
"and" : [ {
"name" : "patient type is regular GMS Patient",
"property" : {
"@id" : "http://endhealth.info/im#gpPatientType"
},
"isConcept" : [ {
"@id" : "http://endhealth.info/im#2751000252106",
"name" : "Regular GMS patient"
} ]
}, {
"name" : "start of registration is before the reference date",
"property" : {
"@id" : "http://endhealth.info/im#effectiveDate"
},
"value" : {
"comparison" : "LESS_THAN_OR_EQUAL",
"valueData" : "$ReferenceDate"
}
} ],
"or" : [ {
"name" : "the registration has not ended ",
"notExist" : true,
"property" : {
"@id" : "http://endhealth.info/im#endDate"
}
}, {
"name" : "the end of registration is after the reference date",
"property" : {
"@id" : "http://endhealth.info/im#endDate"
},
"value" : {
"comparison" : "GREATER_THAN",
"valueData" : "$ReferenceDate"
}
} ]
}
}
}
}
The verbose logical description of this is to select patients who are subjects of GP registration entries and there is an entry for registration of a regular patient before the reference date and that either there is no end date for the registration (i.e. the patient is still registered) or the end date is after the reference date).
Boolean clauses and functions
The language supports unlimited nesting of Boolean clauses such as AND (all must be true), OR (at least one must be true) and OPTIONAL (may be present but if present the subsequent clause must be true) as well as negation. Booleans can apply at the level of the entity (equivalent to a JOIN in SQL) at the level of a property (NULL or outer join) or at the level of the value.
There are two patterns of functions.
- Function properties. These are properties in the data model whose values are generated by a function. in this case the query simply passes in the parameters to the function that are user definable. For Example "age" is modelled as a property with a function that calculates the age at a reference date but the query definition may pass in the age units.
- Functions operating on values. Typical example would be a time difference function between two values from the same or other query.
The language supports functions which are themselves queries
Inheritance and sets
The language supports the labelling of entities with the inclusion of subtypes, the inclusion of supertypes and the inclusion of the concept in a set. In effect this means that subsumption queries are easily performed but also there is the ability to query upward to include supertypes.
For example, when querying for allowable properties of a snomed-ct concept when creating an expression of set definition, the Snomed MRCM is examined to find the properties that have the supertypes of concepts as their domain and to return properties and their sub properties.
ECL example
<<39330711000001103 # is a Covid vaccine
OR #or (
<<10363601000001109: # is a uk product
#and
<<s10362601000001103 = 10362601000001103} } #has vmp Covd vaccine)
This is the section on grammar