Information model query: Difference between revisions

From Endeavour Knowledge Base

Revision as of 09:01, 11 March 2023

Background to IMQ

It's all very well modelling data, value sets, and ontologies. But what about modelling the logical definitions of data sets or profiles, usually referred to as queries?

It is common practice in health care IT to model query definitions that are intended for use in many different systems as plain text documents, leaving each vendor's internal informatics team to interpret the logic and translate it into a run time query language. This process creates a bottleneck and is prone to human error, partly because plain language is ambiguous. An approach that uses machine readable definitions can reduce the time taken and remove much of the human error.

Each healthcare management system has its own database technologies and database schemas. Each uses its own approach to query, usually a combination of application code and SQL.

It is possible to model a query definition in SQL, but an SQL query brings with it the specific database schema. Using SQL as a means of modelling query also brings the risk of poorly performing SQL, as SQL performs differently at different scales on different schemas. For example, a correlated subquery on a huge database may perform poorly and is sometimes better expressed as a window function. Most applications use optimisation techniques to make sure their SQL performs at its best. One system's good SQL is another system's bad SQL.
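To make the correlated subquery point concrete, the sketch below expresses the same "most recent observation per patient" logic in both forms, using Python's built-in sqlite3 module. The observation table and its columns are hypothetical and are not part of the IM schema, and nothing here measures performance; it only shows that one piece of query logic has several SQL renderings, which is why SQL makes a poor modelling language.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE observation (patient_id INTEGER, effective_date TEXT, value REAL);
    INSERT INTO observation VALUES
        (1, '2023-01-10', 120.0),
        (1, '2023-03-01', 135.0),
        (2, '2022-12-05', 110.0);
""")

# Form 1: correlated subquery. The inner query refers to the outer row.
correlated = """
    SELECT o.patient_id, o.effective_date, o.value
    FROM observation o
    WHERE o.effective_date = (
        SELECT MAX(i.effective_date)
        FROM observation i
        WHERE i.patient_id = o.patient_id)
    ORDER BY o.patient_id
"""

# Form 2: window function (needs SQLite 3.25 or later).
windowed = """
    SELECT patient_id, effective_date, value
    FROM (
        SELECT patient_id, effective_date, value,
               ROW_NUMBER() OVER (
                   PARTITION BY patient_id
                   ORDER BY effective_date DESC) AS rn
        FROM observation) AS ranked
    WHERE rn = 1
    ORDER BY patient_id
"""

latest_correlated = conn.execute(correlated).fetchall()
latest_windowed = conn.execute(windowed).fetchall()
```

Both statements return the same rows for this data. Which one a given database engine executes more efficiently depends on its optimiser, the schema and the data volume.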

It is very difficult to construct understandable user interfaces directly from SQL or SPARQL, or vice versa, and thus most search and report applications create some form of intermediate representation, which is their own model.

IM considers health data to be a conceptual graph, with the types, properties, and values modelled as an RDF graph. This means that, for query, the more natural language is SPARQL, which is the standard language for RDF graph query. However, SPARQL is hard to understand visually and quite difficult to write interpreters for. Given that most systems use SQL, there seems little point in adopting a language that is specific to RDF and would need interpreting anyway.

IM Query is a health query domain specific approach, designed to operate as an intermediary between plain language and the underlying run time query languages. The grammar is a constrained form of mainstream query languages and also constrains the query logic itself, so that it matches most health queries used in practice.

In particular, IM Query uses a set of syntactic shortcuts to encapsulate complex query syntax in a few simple statements.

IMQ overview

IMQ can be considered either as a DSL (a language with grammar rules) or as an object model in which query definitions are plain data objects that are instances of plain data classes. Both forms are supported. IMQ provides a pragmatic JSON based object model and a more succinct text form to represent query definitions, and the IM services provide implementations in SQL and SPARQL via the use of interpreters.

The grammar or "structure" of an IMQ query definition precisely follows the logic of a plain language description of the criteria to be applied to filter sets out of sets, and of the output required, and thus is ideal for data set definitions.

As health data is seen as a graph, the language follows the nodes and relationships in a directed acyclic manner, making graph traversal easier than with the complex join syntax of SQL. Furthermore, as most data sets include one to many relationships, IMQ supports an object form of output such as that used in GraphQL.

Implementation syntax includes a JSON object format (IMQ-JSON) and a succinct text language equivalent (IMQ-text).

An IM query definition follows a straightforward set of predicates which can be summarised as:

"From a set of things (e.g. patients),

where those things have certain properties and values (e.g. observations, concepts and values), optionally ordered and limited (e.g. most recent),

Select the properties of those things,

where the properties have certain values, those values themselves being objects that may be further filtered."

In line with the rest of the IM languages, IMQ uses an RDF approach to identifiers, thus enabling global identifiers for types, properties and value sets.

IMQ Structure

An IM query consists of a query request, which includes the necessary components to define a query, as well as a set of arguments that can be passed into the query and used at run time.

Query

A simple overall structure with nestable elements, providing an object form of input and output similar to GraphQL. A query may contain many queries, enabling a package of queries such as a column group report or a full data set.

The request may fully define the query (dynamic query) or more commonly reference a pre-existing query definition via an IRI (i.e. a preformed query definition with variables resolved to the arguments passed in at run time). The pre-existing query definition is obtained from the "has Definition" property of a stored query entity.

High level query structure is as follows:

Plain text form:

query {
    from {
        where {
            with {
                then {}
            }
        }
    }
    select {
        where {}
        select {}
    }
}

JSON form:

{"query" : {
    "from" : {
        "where" : [ {
            "with" : {
                "then" : {}
            }
        } ]
    },
    "select" : [ {
        "where" : {},
        "select" : [ {} ]
    } ],
    "subQuery" : [ {} ]
  }
}


Simple example

Get me the full name of all patients aged >= 18 years.

Plain text form:

query {
    from {
        @:Patient
        where {
            :age >= 18 units: years
        }
    }
    select { :fullName }
}

JSON form:

{
  "from" : {
    "@type" : ":Patient",
    "where" : {
      "id" : ":age",
      "operator" : ">=",
      "value" : 18,
      "unit" : "YEARS"
    }
  },
  "select" : [ {"id" : ":fullName"} ]
}
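As a rough illustration of how the from/where/select clauses of the simple example read as logic, the following sketch evaluates the JSON form against a small in-memory list of records. It is not the IM interpreter: the function run_query, the record layout and the sample patients are all hypothetical, the "unit" field is ignored on the assumption that ages are already held in years, and only a single where condition is handled.

```python
import operator

# Comparison operators as they might appear in the "operator" field.
OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "=": operator.eq,
}


def run_query(query, records):
    """Apply from/where/select to a list of dict records (hypothetical helper)."""
    source = query["from"]
    where = source.get("where")
    results = []
    for record in records:
        # from: keep only records of the requested type.
        if record.get("@type") != source["@type"]:
            continue
        # where: test one property against one value.
        if where is not None:
            compare = OPERATORS[where["operator"]]
            actual = record.get(where["id"])
            if actual is None or not compare(actual, where["value"]):
                continue
        # select: return only the requested properties.
        results.append({s["id"]: record.get(s["id"]) for s in query["select"]})
    return results


# The JSON form of the simple example, as a Python dict.
adult_names = {
    "from": {
        "@type": ":Patient",
        "where": {"id": ":age", "operator": ">=", "value": 18, "unit": "YEARS"},
    },
    "select": [{"id": ":fullName"}],
}

# Hypothetical sample data.
patients = [
    {"@type": ":Patient", ":fullName": "Ada Example", ":age": 42},
    {"@type": ":Patient", ":fullName": "Ben Example", ":age": 12},
    {"@type": ":Organisation", ":fullName": "Example Surgery", ":age": 30},
]

adults = run_query(adult_names, patients)
```

With this sample data only the first record passes both the type test and the age test, so adults holds a single object containing that patient's :fullName.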

Clause specification

The specification of query clauses is described in the article IMQueryClauses.

Subsumption query

A key differentiator of IMQ is its support for a variety of subsumption patterns in both the from and where clauses. This makes IMQ compliant with Expression Constraint Language (ECL) when applied to concepts, but the patterns can also be used to incorporate subtypes of data model types.

The indicators are:

  • DescendantsOrSelfOf: the type itself and its subtypes (or subclasses) are incorporated at run time. This can apply in the from clause, the where property, or the value.
  • DescendantsOf: only the subtypes are examined (ECL compliance).
  • AncestorsOf: the parent hierarchy is transitively examined. Used in assessing allowable ranges and properties of concepts.
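The three indicators can be pictured as set operations over a subtype hierarchy. The sketch below uses a small, hypothetical is-a map (the concept names are invented for illustration and are not IM identifiers) and computes each set with a plain transitive walk. It illustrates the semantics only and does not show how the IM services evaluate subsumption.

```python
# Hypothetical is-a hierarchy: each concept maps to its direct parents.
IS_A = {
    ":Type1Diabetes": [":DiabetesMellitus"],
    ":Type2Diabetes": [":DiabetesMellitus"],
    ":DiabetesMellitus": [":EndocrineDisorder"],
    ":EndocrineDisorder": [],
}


def ancestors_of(concept, is_a=IS_A):
    """All parents of a concept, followed transitively (self excluded)."""
    seen = set()
    stack = list(is_a.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def descendants_of(concept, is_a=IS_A):
    """All subtypes of a concept, followed transitively (self excluded)."""
    return {c for c in is_a if concept in ancestors_of(c, is_a)}


def descendants_or_self_of(concept, is_a=IS_A):
    """The concept itself plus all of its subtypes."""
    return descendants_of(concept, is_a) | {concept}
```

A where clause that matches on DescendantsOrSelfOf :DiabetesMellitus would, under this reading, accept a record coded with any member of descendants_or_self_of(":DiabetesMellitus").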

ECL support

Expression Constraint Language is supported by IMQ, as the from/where logic maps precisely to concept refinements, attributes and attribute groups.

Query Request

IMQ supports conventional query for extract, query based updates (deletion) and a special 'path query' for determining paths between two classes. In addition to rule based query, a free text search using Lucene indexing is supported providing a term filter on the query rules.

Queries and updates are initiated by a Query Request passed as a payload to the API.

A query request can contain a set of arguments or parameter variables passed into the query to be used at run time.
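The idea of resolving variables in a stored definition against the arguments of a request can be sketched as below. The "$name" placeholder convention, the resolve_arguments function and the minAge argument are assumptions made purely for illustration; this article does not define how IMQ marks variables or names the request fields.

```python
def resolve_arguments(definition, arguments):
    """Return a copy of a definition with "$name" strings replaced by values.

    The "$name" placeholder convention is hypothetical, not IMQ syntax.
    """
    if isinstance(definition, dict):
        return {k: resolve_arguments(v, arguments) for k, v in definition.items()}
    if isinstance(definition, list):
        return [resolve_arguments(v, arguments) for v in definition]
    if isinstance(definition, str) and definition.startswith("$"):
        name = definition[1:]
        if name not in arguments:
            raise KeyError(f"missing argument: {name}")
        return arguments[name]
    return definition


# A stored definition with one variable (hypothetical).
stored_definition = {
    "from": {
        "@type": ":Patient",
        "where": {"id": ":age", "operator": ">=", "value": "$minAge", "unit": "YEARS"},
    },
    "select": [{"id": ":fullName"}],
}

# Arguments as they might arrive with a request (hypothetical).
resolved = resolve_arguments(stored_definition, {"minAge": 18})
```

The stored definition is left unchanged and a new, fully resolved definition is returned, so one stored query can serve many requests with different arguments.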

Grammar

This is the section on grammar