Information model query

From Endeavour Knowledge Base

Background

Having an information model is one thing. Querying to extract data is another.

As an RDF graph knowledge base, the information model (IM) can be queried directly using SPARQL. The IM also holds text data, which can be queried directly using OpenSearch or Elasticsearch.

However, because health records are likely to be stored in relational, or at least SQL-compatible, databases, querying health records that are aligned with the model will require SQL.

There are problems with using SQL, SPARQL or OpenSearch/Elasticsearch directly to query the IM, or the health records that conform to it:

  • Authoring SQL or SPARQL directly requires a high degree of skill. Health queries in particular need heavily nested subqueries and some of the more advanced techniques, such as correlated subqueries or window functions.
  • Translating an intuitive, user-oriented query builder directly into SQL or SPARQL, and back again, is very difficult. Most query applications therefore use an intermediate language from which the queries are generated. Examples include GraphQL, and DAX and M in Power BI.
  • Enabling direct query via SPARQL endpoints or SQL APIs can result in crippling performance problems.

Consequently, the IM provides a pragmatic query domain-specific language (DSL) that bridges the gap between a plain-language representation of a query and the run-time query. The DSL can also be used to exchange query definitions across multiple instances.

The IM also provides reference software showing how SQL, SPARQL or OpenSearch queries can be generated from the DSL, and how a plain-language or diagrammatic interpretation can be produced from it. The reference software also shows the converse, i.e. the generation of the DSL from plain language.
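As an illustration of producing a plain-language interpretation, here is a minimal sketch that walks a match tree and prints one indented line per clause. It assumes the JSON shape used in the example query on this page ("match", "and", "or", "property", "name", "notExist"); the functions `clause_label` and `describe` are hypothetical and are not the reference software's API.

```python
from typing import Any, Optional


def clause_label(clause: dict[str, Any]) -> Optional[str]:
    """Prefer the author's "name", then the property's name, then the property IRI."""
    if clause.get("name"):
        return clause["name"].strip()
    prop = clause.get("property", {})
    label = prop.get("name") or prop.get("@id")
    if label and clause.get("notExist"):
        # Unnamed "notExist" clauses get a generic negation.
        return "no " + label
    return label


def describe(clause: dict[str, Any], depth: int = 0) -> list[str]:
    """Render a match clause and all of its nested clauses as indented plain text."""
    pad = "  " * depth
    lines: list[str] = []
    label = clause_label(clause)
    if label:
        lines.append(pad + label)
    # "and" lists must all hold; "or" lists need at least one to hold.
    for key, heading in (("and", "all of:"), ("or", "any of:")):
        if clause.get(key):
            lines.append(pad + heading)
            for child in clause[key]:
                lines.extend(describe(child, depth + 1))
    # A nested "match" describes the entity reached through the property.
    if "match" in clause:
        lines.extend(describe(clause["match"], depth + 1))
    return lines
```

For the example query, `print("\n".join(describe(query["select"])))` would list "has GP registration" followed by the indented "all of" and "any of" groups. Relying on author-supplied names keeps the rendering faithful to the query author's wording.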

The language is designed to meet the following requirements.

Query language requirements

Requirement 1 - Should support the vast majority of the query patterns needed in the real world for defining and producing data sets or patient profiles.

Requirement 2 - Should enable mapping to SQL via simple type-to-table and property-to-field maps for health data held in relational form, as long as the health data content conforms to an IM data model.

Requirement 3 - Should enable mapping directly to SPARQL for querying the IM itself.

Requirement 4 - Should enable mapping directly from and to Expression Constraint Language (ECL) for searching and set definitions.

Requirement 5 - Should enable a query to be built as JavaScript objects or POJOs (plain old Java objects), avoiding the need for a language-specific parser.

Requirement 6 - Should embed inference statements such as subtype, supertype or set inclusion as part of the query definition, thus avoiding the need to model the complex logic explicitly in the query itself.

Requirement 7 - Should support object result formats as well as relational formats, i.e. nested JSON object results as well as flat table results.
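To make Requirement 2 concrete, the sketch below translates a flat match clause into parameterised SQL using a type-to-table map and a property-to-field map. The table name `gp_registration`, the column names and the handling of "$" parameters are assumptions made for illustration; only the property IRIs and comparison names come from the example query on this page.

```python
import sqlite3
from typing import Any

IM = "http://endhealth.info/im#"

# Hypothetical type-to-table and property-to-field maps for one relational schema.
TYPE_TABLE = {IM + "GPRegistration": "gp_registration"}
PROPERTY_FIELD = {
    IM + "gpPatientType": "patient_type",
    IM + "effectiveDate": "effective_date",
    IM + "endDate": "end_date",
}
# Only the comparisons used in the example query on this page.
OPERATORS = {"LESS_THAN_OR_EQUAL": "<=", "GREATER_THAN": ">"}


def condition(clause: dict[str, Any], args: list[Any], params: dict[str, Any]) -> str:
    """Translate one property clause into a SQL predicate, appending bind values to args."""
    field = PROPERTY_FIELD[clause["property"]["@id"]]
    if clause.get("notExist"):
        return f"{field} IS NULL"
    if "isConcept" in clause:
        ids = [c["@id"] for c in clause["isConcept"]]
        args.extend(ids)
        return f"{field} IN ({', '.join('?' * len(ids))})"
    value = clause["value"]
    data = value["valueData"]
    # "$Name" values are run-time parameters and are bound, never inlined.
    args.append(params[data[1:]] if data.startswith("$") else data)
    return f"{field} {OPERATORS[value['comparison']]} ?"


def to_sql(match: dict[str, Any], params: dict[str, Any]) -> tuple[str, list[Any]]:
    """Build a parameterised SELECT for a match with "and" and "or" clause lists."""
    args: list[Any] = []
    table = TYPE_TABLE[match["entityType"]["@id"]]
    parts = [condition(c, args, params) for c in match.get("and", [])]
    if match.get("or"):
        alternatives = [condition(c, args, params) for c in match["or"]]
        parts.append("(" + " OR ".join(alternatives) + ")")
    where = " AND ".join(parts) or "1 = 1"
    return f"SELECT * FROM {table} WHERE {where}", args


def run(conn: sqlite3.Connection, match: dict[str, Any], params: dict[str, Any]) -> list[tuple]:
    """Generate the SQL for a match and execute it against an open connection."""
    sql, args = to_sql(match, params)
    return conn.execute(sql, args).fetchall()
```

Because the maps are plain dictionaries, pointing the same query at a different schema only means swapping the maps, which is the point of keeping the mapping "simple". Nested property paths would need joins, which this sketch does not attempt.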

Language overview

The language follows the familiar pattern of most query languages, with some variations and simplifications. As well as incorporating the core concepts of SQL and SPARQL, it adopts the nested-structure approach used by GraphQL.
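Because results can be nested objects (as in GraphQL) or flat tables (Requirement 7), a consumer sometimes needs to turn one into the other. The following is a minimal sketch, assuming dotted column names and one output row per combination of list elements; the function name `flatten` and the outer-join treatment of empty lists are illustrative choices, not part of the DSL.

```python
from typing import Any


def flatten(obj: dict[str, Any], prefix: str = "") -> list[dict[str, Any]]:
    """Expand a nested JSON-like result into flat rows with dotted column names."""
    rows: list[dict[str, Any]] = [{}]
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            sub = flatten(value, name + ".")
        elif isinstance(value, list):
            sub = []
            for item in value:
                sub.extend(flatten(item, name + ".") if isinstance(item, dict) else [{name: item}])
            # Keep the parent row when the list is empty (outer-join style).
            sub = sub or [{}]
        else:
            sub = [{name: value}]
        # Cross join the rows built so far with the rows produced for this key.
        rows = [{**left, **right} for left in rows for right in sub]
    return rows
```

A person with two registrations therefore becomes two table rows that repeat the person's own columns, which is the usual shape of a relational result.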

A query consists of the following main clauses:

Query : Includes the IRI, name, description, result format, prefixes used, the main entity, the Select clause and any sub-select clauses (where a query produces multiple column groups).

Select : Equivalent to SQL and SPARQL SELECT, with GraphQL-style nesting. Includes a list of properties and their aliases, any nested properties (i.e. a selected property with its own Select), a Match clause, and any ordering or limit.

Match : Equivalent to SQL FROM/WHERE and SPARQL WHERE/FILTER. Defines the graph patterns, filters or functions, including ordering and limit, followed by a test, i.e. a subquery.

Order (limit) : Used in both Select and Match. Orders by a field and optionally limits the number of results returned.
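A typical use of Order (limit) inside a Match is "the latest record per patient", which in SQL usually needs a window function. This page does not show the JSON syntax for Order, so the spec keys `field`, `descending` and `limit` below are hypothetical; the sketch only shows the intended semantics.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any


def order_limit(rows: list[dict[str, Any]], spec: dict[str, Any]) -> list[dict[str, Any]]:
    """Sort rows by spec["field"] (descending if requested), then keep spec["limit"] rows."""
    ordered = sorted(rows, key=itemgetter(spec["field"]), reverse=spec.get("descending", False))
    limit = spec.get("limit")
    return ordered if limit is None else ordered[:limit]


def order_limit_per(rows: list[dict[str, Any]], key: str,
                    spec: dict[str, Any]) -> dict[Any, list[dict[str, Any]]]:
    """Apply order/limit within each group, like filtering on
    ROW_NUMBER() OVER (PARTITION BY key ORDER BY field) in SQL."""
    grouped = sorted(rows, key=itemgetter(key))
    return {k: order_limit(list(g), spec) for k, g in groupby(grouped, key=itemgetter(key))}
```

For example, `order_limit_per(rows, "patient", {"field": "date", "descending": True, "limit": 1})` keeps only each patient's most recent row.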

Example query

Query to calculate whether a patient is registered with a general practice on the reference date

{
  "@id" : "http://endhealth.info/im#Q_RegisteredGMS",
  "name" : "Patients registered for GMS services on the reference date",
  "description" : "For any registration period, a registration start date before the reference date and either no end date or an end date after the reference date.",
  "mainEntity" : {
    "@id" : "http://endhealth.info/im#Person"
  },
  "select" : {
    "entityType" : {
      "@id" : "http://endhealth.info/im#Person",
      "name" : "Person"
    },
    "match" : {
      "property" : {
        "@id" : "http://endhealth.info/im#isSubjectOf",
        "name" : "has GP registration"
      },
      "match" : {
        "entityType" : {
          "@id" : "http://endhealth.info/im#GPRegistration"
        },
        "and" : [ {
          "name" : "patient type is regular GMS Patient",
          "property" : {
            "@id" : "http://endhealth.info/im#gpPatientType"
          },
          "isConcept" : [ {
            "@id" : "http://endhealth.info/im#2751000252106",
            "name" : "Regular GMS patient"
          } ]
        }, {
          "name" : "start of registration is before the reference date",
          "property" : {
            "@id" : "http://endhealth.info/im#effectiveDate"
          },
          "value" : {
            "comparison" : "LESS_THAN_OR_EQUAL",
            "valueData" : "$ReferenceDate"
          }
        } ],
        "or" : [ {
          "name" : "the registration has not ended ",
          "notExist" : true,
          "property" : {
            "@id" : "http://endhealth.info/im#endDate"
          }
        }, {
          "name" : "the end of registration is after the reference date",
          "property" : {
            "@id" : "http://endhealth.info/im#endDate"
          },
          "value" : {
            "comparison" : "GREATER_THAN",
            "valueData" : "$ReferenceDate"
          }
        } ]
      }
    }
  }
}
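To show what the example query means, here is a minimal in-memory interpreter for its match clauses: "and" requires every clause, "or" requires at least one, "notExist" requires the property to be absent, and a property with a nested "match" is followed to the related records. It assumes records are plain dictionaries keyed by the local name after "#" and carrying a "type" field; these conventions are hypothetical, and `isConcept` is tested by exact membership only, without any subtype inference.

```python
from typing import Any, Callable

# Only the comparisons used in the example query above.
COMPARE: dict[str, Callable[[Any, Any], bool]] = {
    "LESS_THAN_OR_EQUAL": lambda a, b: a <= b,
    "GREATER_THAN": lambda a, b: a > b,
}


def local(iri: str) -> str:
    """Assumption: record keys and types use the local name after '#'."""
    return iri.split("#")[-1]


def matches(clause: dict[str, Any], record: dict[str, Any], params: dict[str, Any]) -> bool:
    """True if the record satisfies the clause and all of its nested clauses."""
    entity_type = clause.get("entityType", {}).get("@id")
    if entity_type and record.get("type") != local(entity_type):
        return False
    if not all(matches(c, record, params) for c in clause.get("and", [])):
        return False
    if clause.get("or") and not any(matches(c, record, params) for c in clause["or"]):
        return False
    if "property" not in clause:
        return True
    value = record.get(local(clause["property"]["@id"]))
    if clause.get("notExist"):
        return value is None
    if value is None:
        return False
    if "match" in clause:
        # Follow the property to related records; any one may satisfy the nested match.
        related = value if isinstance(value, list) else [value]
        return any(matches(clause["match"], r, params) for r in related)
    if "isConcept" in clause:
        # Exact membership only; no subtype expansion in this sketch.
        return value in {local(c["@id"]) for c in clause["isConcept"]}
    if "value" in clause:
        spec = clause["value"]
        data = spec["valueData"]
        # "$Name" values are looked up in the run-time parameters.
        target = params[data[1:]] if isinstance(data, str) and data.startswith("$") else data
        return COMPARE[spec["comparison"]](value, target)
    return True
```

With the JSON above loaded via `json.loads`, `matches(query["select"]["match"], person, {"ReferenceDate": "2023-04-01"})` tests one person whose `isSubjectOf` list holds registration records. ISO date strings are used so that string comparison matches date order.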


ECL example


<< 39330711000001103                               /* is a Covid vaccine */
OR
( << 10363601000001109 :                           /* is a UK product, and */
    << 10362601000001103 = << 39330711000001103 )  /* has VMP that is a Covid vaccine */
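The ECL above relies on `<<` (descendant-or-self), i.e. on the kind of subtype inference that Requirement 6 wants embedded in queries rather than spelled out. The sketch below evaluates that shape of expression over a toy is-a hierarchy. The identifiers `covid_vaccine`, `uk_product`, `has_vmp` and so on are hypothetical stand-ins, not SNOMED CT or dm+d data, and the attribute is matched exactly rather than with its own `<<`.

```python
from collections import deque


def descendants_or_self(concept: str, children: dict[str, list[str]]) -> set[str]:
    """ECL `<< concept`: the concept itself plus all transitive is-a descendants."""
    found = {concept}
    queue = deque([concept])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def refine(focus: set[str], values: dict[tuple[str, str], set[str]],
           attribute: str, allowed: set[str]) -> set[str]:
    """ECL `focus : attribute = allowed`: keep concepts with at least one value in allowed."""
    return {c for c in focus if values.get((c, attribute), set()) & allowed}
```

The whole expression is then `covid | refine(uk, values, "has_vmp", covid)`, where `covid` and `uk` are the two `<<` sets. In practice the IM would answer `<<` from a precomputed transitive closure rather than a breadth-first walk.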

Grammar

This is the section on grammar