Information model query

From Endeavour Knowledge Base


Revision as of 21:01, 5 April 2023

Background to IMQ

It's all very well modelling data, value sets, and ontologies. But what about modelling the logical definitions of data sets or profiles (usually referred to as queries)?

IMQ is designed to facilitate the exchange of logical definitions of query via APIs.

IMQ is not a new query language or domain specific language. Instead, it is simply an object representation of the mainstream query language CYPHER and, as such, can be easily interpreted into plain CYPHER, SQL or SPARQL.
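As a rough illustration of what "object representation interpreted into a run time language" can mean, the sketch below holds a tiny query as plain data objects and renders it as CYPHER or SQL text. The class names (`Query`, `Where`), their fields and the two functions are hypothetical and are not taken from the IMQ class specification; they only show the general idea.

```python
from dataclasses import dataclass


@dataclass
class Where:
    property: str   # property to test, e.g. "age"
    operator: str   # comparison operator, e.g. ">="
    value: int      # value to compare against


@dataclass
class Query:
    type_name: str      # type of thing to match (node label or table name)
    where: Where        # a single filter, to keep the sketch small
    select: list[str]   # properties to return


def to_cypher(q: Query) -> str:
    """Render the query object as a CYPHER string (values inlined for illustration only)."""
    returns = ", ".join(f"n.{p}" for p in q.select)
    return (f"MATCH (n:{q.type_name}) "
            f"WHERE n.{q.where.property} {q.where.operator} {q.where.value} "
            f"RETURN {returns}")


def to_sql(q: Query) -> str:
    """Render the same object as SQL, assuming one table per type with matching column names."""
    columns = ", ".join(q.select)
    return (f"SELECT {columns} FROM {q.type_name} "
            f"WHERE {q.where.property} {q.where.operator} {q.where.value}")


adults = Query("Patient", Where("age", ">=", 18), ["fullName"])
print(to_cypher(adults))  # MATCH (n:Patient) WHERE n.age >= 18 RETURN n.fullName
print(to_sql(adults))     # SELECT fullName FROM Patient WHERE age >= 18
```

A real interpreter would pass values as query parameters rather than inlining them, and would map types and properties to whatever schema the target database actually uses.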

The main purpose of the object representation is to make user interfaces easier to build and maintain. Because IMQ is in object form and transportable as JSON-LD, it is well suited to APIs and to interoperability via messages.

It is common practice in health care IT to model query definitions, intended for use in many different systems, in plain text documents, leaving the interpretation of the logic into run-time query languages to the vendors' internal informatics teams. This process creates a bottleneck and is prone to human error, partly due to the ambiguity of plain language. An approach which uses machine readable definitions can reduce the time taken and remove much of the human error.

It is possible to model a query definition in SQL, but an SQL query brings with it the specific database schema. SQL as a language is huge, and developing interpreters that translate SQL written for one schema into SQL for another is hard. Also, it is very difficult to construct understandable user interfaces directly from SQL or SPARQL, or vice versa, and thus most search and report applications create some form of intermediate representation.

IM considers health data to be a conceptual graph, with types, properties, and values modelled as nodes and relationships. This means that, for query, the more natural languages are CYPHER and SPARQL, the latter being the standard language used for RDF graph query. However, SPARQL is hard to understand visually and quite difficult to write interpreters for. Given that most systems use SQL, there seems little point in adopting a language that is specific to RDF.

IMQ overview

The class structure of an IMQ query definition precisely follows the logic of a plain language description of the criteria to be applied to filter out sets from sets and of the output required, and is thus ideal for data set definitions. It also uses CYPHER concepts in its construction so as to map precisely to CYPHER or other query languages.

In addition to standard query support, IMQ supports subsumption query by the use of entailment qualifiers of the kind used in Expression Constraint Language. Thus it is also ideal for querying the information model ontologies and data models themselves.

IMQ considers a query to be a set of steps, each step resulting in a graph which is then the starting point of the next step. Subqueries within the steps are used to supplement the graph with the results of other queries. Unions are used to merge subgraphs. Correlated subqueries can use variables from outside the scope of the subquery.

A step consists of:

  • Match a set of things (things of certain types, members of a set, or instances) that have relationships to other things, i.e. a graph pattern,
  • where those things have certain properties and values (e.g. observations, concepts and values), or are compared with values from other steps.
  • Select or return the properties of those things (e.g. age or date of birth),
  • and optionally ordered and limited (e.g. most recent), and optionally then further filtered so that the next step processes a smaller subset.
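The parts of a step described above (match, where, select, order and limit) can be sketched as a plain data class applied to an in-memory list of records. This is only an illustration under stated assumptions: the `Step` class, its field names, the `run_step` function and the sample records are all hypothetical and do not come from the IMQ class specification.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

Record = dict[str, Any]


@dataclass
class Step:
    match_type: str                             # match: type of thing to start from
    where: Callable[[Record], bool]             # where: filter on properties and values
    select: list[str] = field(default_factory=list)  # select: properties to return
    order_by: Optional[str] = None              # optional ordering property
    descending: bool = False
    limit: Optional[int] = None                 # optional limit, e.g. 1 for "most recent"


def run_step(step: Step, records: list[Record]) -> list[Record]:
    """Apply match, where, order, limit and select, in that order."""
    matched = [r for r in records
               if r["type"] == step.match_type and step.where(r)]
    if step.order_by is not None:
        matched.sort(key=lambda r: r[step.order_by], reverse=step.descending)
    if step.limit is not None:
        matched = matched[:step.limit]
    return [{p: r[p] for p in step.select} for r in matched]


# Hypothetical sample data; ISO dates sort correctly as strings.
records = [
    {"type": "Observation", "concept": "systolic", "value": 150, "date": "2023-01-10"},
    {"type": "Observation", "concept": "systolic", "value": 138, "date": "2023-03-02"},
    {"type": "Observation", "concept": "weight", "value": 80, "date": "2023-03-05"},
    {"type": "Patient", "fullName": "A. Example"},
]

latest_systolic = Step(
    match_type="Observation",
    where=lambda r: r["concept"] == "systolic",
    select=["value", "date"],
    order_by="date",
    descending=True,
    limit=1,
)
latest = run_step(latest_systolic, records)
print(latest)  # [{'value': 138, 'date': '2023-03-02'}]
```

The output of one step is itself a list of records, so a following step could take it as its input, which mirrors the idea of each step producing the starting point for the next.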

In line with the rest of the IM languages, IMQ uses an RDF approach to identifiers, thus enabling global identifiers for types, properties and value sets.

Query Structure

An IM query consists of a query request, which includes the necessary components to define a query, as well as a set of arguments that can be passed into the query and used at run time.

The overall structure is simple, with nestable elements providing an object form of input and output similar to GraphQL. A query may contain many queries, enabling a package of queries such as a column group report or a full data set.

The request may fully define the query (dynamic query) or more commonly reference a pre-existing query definition via an IRI (i.e. a preformed query definition with variables resolved to the arguments passed in at run time). The pre-existing query definition is obtained from the "has Definition" property of a stored query entity.
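The two request styles can be sketched as follows: a request either carries its own definition or names a stored definition by IRI, and the arguments are then substituted for variables. Everything here is an assumption made for illustration, including the `QueryRequest` class, the `$name` variable convention, the example IRI and the shape of the stored definition; none of it is taken from the IM API.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical store of predefined query definitions, keyed by IRI.
STORED_QUERIES: dict[str, dict] = {
    "http://example.org/query#AdultPatients": {
        "match": {"type": "Patient"},
        "where": {"property": "age", "operator": ">=", "value": "$minAge"},
        "return": ["fullName"],
    }
}


@dataclass
class QueryRequest:
    query: Optional[dict] = None       # inline definition (dynamic query)
    query_iri: Optional[str] = None    # reference to a stored definition
    arguments: dict[str, Any] = field(default_factory=dict)


def substitute(node: Any, arguments: dict[str, Any]) -> Any:
    """Return a copy of the definition with "$name" strings replaced by argument values."""
    if isinstance(node, dict):
        return {k: substitute(v, arguments) for k, v in node.items()}
    if isinstance(node, list):
        return [substitute(v, arguments) for v in node]
    if isinstance(node, str) and node.startswith("$"):
        return arguments[node[1:]]
    return node


def resolve(request: QueryRequest) -> dict:
    """Pick the inline definition if present, else look up the IRI, then bind arguments."""
    if request.query is not None:
        definition = request.query
    else:
        definition = STORED_QUERIES[request.query_iri]
    return substitute(definition, request.arguments)


request = QueryRequest(
    query_iri="http://example.org/query#AdultPatients",
    arguments={"minAge": 18},
)
resolved = resolve(request)
print(resolved["where"])  # {'property': 'age', 'operator': '>=', 'value': 18}
```

Because `substitute` builds a new structure, the stored definition keeps its variables and can be reused with different arguments.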

Subsumption query

A key differentiator of IMQ from standard SQL is its support for a variety of subsumption (entailment) qualifiers on the identifiers in both the from and where clauses. This makes IMQ compliant with Expression Constraint Language when applied to concepts, but the qualifiers can also be used to incorporate subtypes of data model types.

The qualifiers are:

  • Descendants Or Self Of (<<) subtypes (or subclasses) are incorporated at run time. This can apply in the from clause, the where property, or the value.
  • Descendants Of (<) indicates only subtypes are examined (ECL compliance)
  • Ancestors Of (>>) to enable the parent hierarchy to be transitively examined. Used in assessing allowable ranges and properties of concepts.
  • Member of (^) to use the instance members of a set in the From clause
  • Type (@) to use instances of a certain type in a from clause (e.g. patients) or when navigating the graph to illustrate node types (e.g. Hospital Admission)
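To make the first two qualifiers in the list above concrete, the sketch below expands a concept into the set it stands for over a toy hierarchy. The hierarchy, the concept names and the `expand` function are hypothetical; a real implementation would resolve subtypes against the information model ontology rather than an in-memory dictionary.

```python
# Toy "is a" hierarchy: each concept maps to its direct parents (hypothetical data).
PARENTS: dict[str, list[str]] = {
    "Type1Diabetes": ["Diabetes"],
    "Type2Diabetes": ["Diabetes"],
    "Diabetes": ["Disorder"],
    "Disorder": [],
}


def ancestors_of(concept: str) -> set[str]:
    """All concepts reachable by following parent links transitively (excluding the concept itself)."""
    result: set[str] = set()
    stack = list(PARENTS.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(PARENTS.get(current, []))
    return result


def descendants_of(concept: str) -> set[str]:
    """All concepts that have the given concept among their ancestors."""
    return {c for c in PARENTS if concept in ancestors_of(c)}


def expand(qualifier: str, concept: str) -> set[str]:
    """Expand a qualified identifier into the set of concepts it covers."""
    if qualifier == "<<":   # Descendants Or Self Of
        return descendants_of(concept) | {concept}
    if qualifier == "<":    # Descendants Of (subtypes only)
        return descendants_of(concept)
    raise ValueError(f"qualifier not covered by this sketch: {qualifier}")


print(sorted(expand("<<", "Diabetes")))  # ['Diabetes', 'Type1Diabetes', 'Type2Diabetes']
print(sorted(expand("<", "Diabetes")))   # ['Type1Diabetes', 'Type2Diabetes']
```

The remaining qualifiers (ancestors, set membership and type) are left out of the sketch; they would follow the same pattern of turning a qualified identifier into a set before the where clause is evaluated.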

ECL support

Expression Constraint Language is supported by IMQ, as the from/where logic maps precisely to concept refinements, attributes and attribute groups.

Query Request

IMQ supports conventional query for extract, query-based updates (deletion) and a special 'path query' for determining paths between two classes. In addition to rule-based query, a free text search using Lucene indexing is supported, providing a term filter on the query rules.

Queries and updates are initiated by a Query Request passed as a payload to the API.

A query request can contain a set of arguments or parameter variables passed into the query to be used at run time.

Query model specifications

Specifications of query clauses are described in a set of pages.

IMQ classes are a subset of the IM meta model classes, i.e. a set of plain data classes.























Grammar

This is the section on grammar