Remote Subscriber Database (RSD) SQL guide (Compass 2): Difference between revisions

From Endeavour Knowledge Base
m (Formatting of images)
No edit summary
 
(5 intermediate revisions by 2 users not shown)
Line 1: Line 1:
This article describes the steps needed to create, and analyse, patient data selections from the Discovery database.
This article describes the steps needed to create, and analyse, patient data selections from the Discovery database.


= 1. Introduction =
= Introduction =
This outlines the steps we take to create the patient selection for analysis from the Discovery database.
This outlines the steps we take to create the patient selection for analysis from the Discovery database.


Line 10: Line 10:
[[File:RSD SQL Guide C2 Fig 1.1.jpg|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Fig%201.1.jpg|frameless|633x633px]]
[[File:RSD SQL Guide C2 Fig 1.1.jpg|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Fig%201.1.jpg|frameless|633x633px]]


= 2. Creating the Patient Cohort =
= Creating the Patient Cohort =
This is the first step when we receive a list of patients marked for observational study.
This is the first step when we receive a list of patients marked for observational study.


Line 16: Line 16:
'''Fig 2.1 Entity diagram showing the relationships between patient and organization'''
'''Fig 2.1 Entity diagram showing the relationships between patient and organization'''


[[File:RSD SQL Guide C2 Fig 2.1.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Fig%202.1.png|frameless|532x532px]]
[[File:RSD SQL Guide C2 Fig 2.1.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Fig%202.1.png|frameless|494x494px]]


We filter patients by their organization ODS codes to ensure only patients of the required GP practices are selected.
We filter patients by their organization ODS codes to ensure only patients of the required GP practices are selected.


'''Example 1.1 Building patient cohort SQL with NHS numbers'''
'''Example 1.1 Building patient cohort SQL with NHS numbers'''<syntaxhighlight lang="sql">
 
DROP TABLE IF EXISTS cohort_patient;
DROP TABLE IF EXISTS cohort_patient;


CREATE TABLE cohort_patient AS
CREATE TABLE cohort_patient AS
SELECT
SELECT
 
      p.id AS patient_id,
      p.id AS patient_id,
      p.nhs_number,
 
      p.person_id AS group_by,
      p.nhs_number,
      p.organization_id,
 
      p.date_of_birth -- add additional columns if required   
      p.person_id AS group_by,
 
      p.organization_id,
 
      p.date_of_birth -- add additional columns if required   
 
FROM patient p JOIN organization org ON p.organization_id = org.id
FROM patient p JOIN organization org ON p.organization_id = org.id
WHERE org.ods_code IN ('EMIS99','5PC64');  
WHERE org.ods_code IN ('EMIS99','5PC64');  


-- add ods codes inside the brackets e.g. EMIS99, 5PC64
-- add ods codes inside the brackets e.g. EMIS99, 5PC64
</syntaxhighlight>


===== Cohort with Filtered Observations =====
===== Cohort with Filtered Observations =====
Line 59: Line 51:
The observation table does not store these codes directly but store them as concept ids. On querying the data then, the concept table will be queried first to get the list of concept ids that you are interested in for this observation.
The observation table does not store these codes directly but store them as concept ids. On querying the data then, the concept table will be queried first to get the list of concept ids that you are interested in for this observation.


If you intend to filter observations by SNOMEDS then add in the concept map table which will map any legacy concept ids, that is, READ, CTV3 etc, to the SNOMED concept ids, then you can filter by SNOMED codes as shown:
If you intend to filter observations by SNOMEDS then add in the concept map table which will map any legacy concept ids, that is, READ, CTV3 etc, to the SNOMED concept ids, then you can filter by SNOMED codes as shown: <syntaxhighlight lang="sql">
 
 
 
SELECT
SELECT
  ...
  ...


  FROM patient p
FROM patient p
 
JOIN observation o on o.patient_id = p.id
     JOIN observation o on o.patient_id = p.id
JOIN concept_map cm on cm.legacy = o.non_core_concept_id
 
-- use the non core concept id for link to the concept table
     JOIN concept_map cm on cm.legacy = o.non_core_concept_id  
JOIN concept c on c.dbid = cm.core
 
WHERE c.code IN (46635009, 44054006) -- e.g. diabetes snomed codes
     -- use the non core concept id for link to the concept table
</syntaxhighlight>
 
     JOIN concept c on c.dbid = cm.core
 
     WHERE c.code IN (46635009, 44054006) -- e.g. diabetes snomed codes
 
From the episode of care, we can obtain whether the patient is currently registered and has the appropriate registration type for this study:
 
  SELECT


From the episode of care, we can obtain whether the patient is currently registered and has the appropriate registration type for this study:<syntaxhighlight lang="sql">
SELECT
  ...
  ...


  FROM patient p
FROM patient p
 
JOIN episode_of_care e ON e.patient_id = p.id
      JOIN episode_of_care e ON e.patient_id = p.id
JOIN concept c ON c.dbid = e.registration_type_concept_id
 
WHERE c.code = 'R'  -- registered
      JOIN concept c ON c.dbid = e.registration_type_concept_id
</syntaxhighlight>Some patients prefer to withhold consent and therefore are excluded from the cohort. We can check this in the observation table. Again we need to get a list of concept ids that you are interested in for this observation.
 
  WHERE c.code = 'R'  -- registered
 
Some patients prefer to withhold consent and therefore are excluded from the cohort. We can check this in the observation table. Again we need to get a list of concept ids that you are interested in for this observation.


For example, you could approach it in this way:
For example, you could approach it in this way:
Line 101: Line 79:
[[File:RSD_SQL_Guide_C2_Consent_Codes.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Consent%20Codes.png|alt=|frameless|448x448px]]
[[File:RSD_SQL_Guide_C2_Consent_Codes.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Consent%20Codes.png|alt=|frameless|448x448px]]


2.        Create a view to filter out the concept ids on the observation table with these codes:
2.        Create a view to filter out the concept ids on the observation table with these codes:<syntaxhighlight lang="sql">
 
-- filter out the observation table with the relevant consent codes
-- filter out the observation table with the relevant consent codes


DROP VIEW IF EXISTS obs_optout_view;
DROP VIEW IF EXISTS obs_optout_view;
       


CREATE VIEW obs_optout_view AS
CREATE VIEW obs_optout_view AS
SELECT
SELECT
       ob.id,
       ob.id,
       ob.patient_id,
       ob.patient_id,
       ob.organization_id,
       ob.organization_id,
       ob.non_core_concept_id,
       ob.non_core_concept_id,
       ob.clinical_effective_date,
       ob.clinical_effective_date,
       cpt.dbid,
       cpt.dbid,
       cpt.code,
       cpt.code,
       cpt.name,
       cpt.name,
       csc.status
       csc.status
FROM observation ob
FROM observation ob
JOIN concept cpt ON cpt.dbid = ob.non_core_concept_id
JOIN concept cpt ON cpt.dbid = ob.non_core_concept_id
JOIN consent_code csc ON cpt.id = csc.code
JOIN consent_code csc ON cpt.id = csc.code
AND cpt.id = BINARY csc.code;
AND cpt.id = BINARY csc.code;
</syntaxhighlight>


3.        Using the view to create a helper function to get the latest opt-out status of the patient:
3.        Using the view to create a helper function to get the latest opt-out status of the patient:<syntaxhighlight lang="sql">
 
-- function to retrieve the patient's latest opt out status
-- function to retrieve the patient's latest opt out status


DROP FUNCTION IF EXISTS get_optout_status;
DROP FUNCTION IF EXISTS get_optout_status;
DELIMITER //
DELIMITER //
CREATE FUNCTION get_optout_status(p_patient_id BIGINT,
CREATE FUNCTION get_optout_status(p_patient_id BIGINT,
 
                                  p_organization_id BIGINT)
                                p_organization_id BIGINT)
 
RETURNS VARCHAR(1)
RETURNS VARCHAR(1)
NOT DETERMINISTIC READS SQL DATA
NOT DETERMINISTIC READS SQL DATA


Line 158: Line 114:


DECLARE l_status          VARCHAR(1);
DECLARE l_status          VARCHAR(1);
DECLARE l_patient_id      BIGINT;
DECLARE l_patient_id      BIGINT;
DECLARE l_organization_id BIGINT;
DECLARE l_organization_id BIGINT;


  SELECT
  SELECT
 
        ob.patient_id,
      ob.patient_id,
        ob.organization_id,
 
        IF (ob.status = 'Opt-Out', 1, IF(ob.status = 'Opt-In',null,null))
      ob.organization_id,
        INTO l_patient_id, l_organization_id, l_status
 
      IF (ob.status = 'Opt-Out', 1, IF(ob.status = 'Opt-In',null,null))
 
      INTO l_patient_id, l_organization_id, l_status
 
  FROM obs_optout_view ob
  FROM obs_optout_view ob
  LEFT JOIN obs_optout_view ob2 ON ob2.patient_id = ob.patient_id
  LEFT JOIN obs_optout_view ob2 ON ob2.patient_id = ob.patient_id
  AND (ob.clinical_effective_date < ob2.clinical_effective_date
  AND (ob.clinical_effective_date < ob2.clinical_effective_date
  OR (ob.clinical_effective_date = ob2.clinical_effective_date
  OR (ob.clinical_effective_date = ob2.clinical_effective_date
       AND ob.id < ob2.id))
       AND ob.id < ob2.id))
 
  WHERE ob2.patient_id IS NULL
  WHERE ob2.patient_id IS NULL  
 
  AND ob.patient_id = p_patient_id
  AND ob.patient_id = p_patient_id
  AND ob.organization_id = p_organization_id LIMIT 1;
  AND ob.organization_id = p_organization_id LIMIT 1;


RETURN l_status;
RETURN l_status;
END//
END//
DELIMITER ;
DELIMITER ;
</syntaxhighlight>Where 1 = opt out, null = opt in or not recorded derived from the latest observation from the consent codes.


Where 1 = opt out, null = opt in or not recorded derived from the latest observation from the consent codes.
Putting all of the above together, patient cohort query becomes this:<syntaxhighlight lang="sql">
CREATE TABLE cohort_patient AS
  SELECT DISTINCT
         p.id         AS patient_id,
         p.person_id  AS group_by,
         get_optout_status (p.id, p.registered_practice_organization_id) AS optout_status,
         IF (p.date_of_death IS NULL,
              TIMESTAMPDIFF(YEAR, p.date_of_birth, CURDATE()),
              TIMESTAMPDIFF(YEAR, p.date_of_birth, p.date_of_death)) age_years
  FROM patient p
  JOIN observation o ON p.id = o.patient_id
  JOIN organization org ON org.id = o.organization_id
  JOIN episode_of_care e ON e.patient_id = o.patient_id
  JOIN concept c ON c.dbid = e.registration_type_concept_id
  JOIN (SELECT o.patient_id
        FROM observation o
        JOIN concept_map cm ON cm.legacy = o.non_core_concept_id
        JOIN concept c ON c.dbid = cm.core
        WHERE c.code IN (46635009, 44054006) -- e.g. snomed codes
        ) ob ON ob.patient_id = p.id
  WHERE org.ods_code IN ('EMIS99','5PC64') -- e.g. ods codes
  AND p.date_of_death IS NULL
  AND c.code = 'R'  -- registered patient
  AND e.date_registered <= now()
  AND (e.date_registered_end > now() OR
       e.date_registered_end IS NULL) -- active
  AND IF (p.date_of_death IS NULL,
         TIMESTAMPDIFF(YEAR, p.date_of_birth, CURDATE()),
         TIMESTAMPDIFF(YEAR, p.date_of_birth, p.date_of_death)) >= 12 -- age 12 and over
  AND get_optout_status(p.id, p.registered_practice_organization_id) IS NULL; -- opt in status
</syntaxhighlight>


Putting all of the above together, patient cohort query becomes this:
= Creating the Observations Cohort =
 
We create a separate observations cohort for query performance. Instead of scanning through the entire observation table, we create a sub-set of the observation table based on the patient cohort.
  CREATE TABLE cohort_patient AS


'''Example 2.1 Building the Observations Cohort SQL'''<syntaxhighlight lang="sql">
CREATE TABLE cohort_observations AS
  SELECT DISTINCT
  SELECT DISTINCT
         o.id,
         o.person_id AS group_by,
         o.patient_id,
         o.person_id,
         o.clinical_effective_date,
         o.non_core_concept_id,
         o.result_value,
         o.result_value_units,
         o.result_date,
         o.result_text,
         cr.age_year
  FROM observation o JOIN cohort_patient cr
  ON o.person_id = cr.person_id
  AND o.organization_id = cr.organization_id
  AND o.patient_id = cr.patient_id;


  p.id        AS patient_id,
CREATE INDEX obs_cpt_ix ON cohort_observation(non_core_concept_id);
CREATE INDEX obs_pat_ix ON cohort_observation(patient_id);
</syntaxhighlight>


  p.person_id AS group_by,
= Creating the Medications Cohort =
Again, we create the medications cohort for query performance. We create a sub-set of the medication statement/order table based on the patient cohort.


  get_optout_status (p.id, p.registered_practice_organization_id)
'''Fig 4.1 Entity diagram showing the relationships between patient, medication statement and organization'''


  AS optout_status,
[[File:Fig_4.1.png|link=https://wiki.discoverydataservice.org/File:Fig%204.1.png|alt=|frameless|633x633px]]


  IF (p.date_of_death IS NULL,
'''Example 4.1 Building the Medications Cohort SQL'''<syntaxhighlight lang="sql">
-- step 1: cohort of the medication statement


       TIMESTAMPDIFF(YEAR, p.date_of_birth, CURDATE()),
DROP TABLE IF EXISTS cohort_medications_stmt;


       TIMESTAMPDIFF(YEAR, p.date_of_birth, p.date_of_death)) age_years
CREATE TABLE cohort_medications_stmt AS
     SELECT DISTINCT
            m.id,
            m.non_core_concept_id,
            m.person_id,
            m.patient_id,
            m.clinical_effective_date,
            m.cancellation_date
     FROM medication_statement m
     JOIN cohort_patient cr
     ON m.organization_id = cr.organization_id
     AND m.person_id = cr.person_id
     AND m.patient_id = cr.patient_id;


  FROM patient p
CREATE INDEX med_stmt_ix ON cohort_medications_stmt(id);


  JOIN observation o ON p.id = o.patient_id
-- step 2: cohort of the medication oder


  JOIN organization org ON org.id = o.organization_id
DROP TABLE IF EXISTS cohort_medications_ord;


  JOIN episode_of_care e ON e.patient_id = o.patient_id
CREATE TABLE cohort_medications_ord AS  
     SELECT DISTINCT
            mo.id,
            mo.medication_statement_id
            mo.person_id,
            mo.patient_id,
            mo.clinical_effective_date
     FROM medication_order mo
     JOIN cohort_patient cr
     ON mo.organization_id = cr.organization_id
     AND mo.person_id = cr.person_id
     AND mo.patient_id = cr.patient_id;


  JOIN concept c ON c.dbid = e.registration_type_concept_id
CREATE INDEX med_ord_ix ON cohort_medications_ord(medication_statement_id);    


  JOIN (SELECT o.patient_id
-- step 3: cohort of the medication - statement plus order


       FROM observation o
DROP TABLE IF EXISTS cohort_medications;


       JOIN concept_map cm ON cm.legacy = o.non_core_concept_id
CREATE TABLE cohort_medications AS
     SELECT DISTINCT
            ms.id,
            ms.non_core_concept_id,
            ms.person_id,
            ms.patient_id,
            mo.clinical_effective_date,
            ms.cancellation_date
      FROM cohort_medications_stmt ms JOIN cohort_medications_ord mo
      ON ms.id = mo.medication_statement_id;


       JOIN concept c ON c.dbid = cm.core
CREATE INDEX med_ix ON cohort_medications(non_core_concept_id);
CREATE INDEX med_pat_ix ON cohort_medications(patient_id);        
</syntaxhighlight>


       WHERE c.code IN (46635009, 44054006) -- e.g. snomed codes
= Building Datasets =
This involves creating a number of temporary tables in the database.


       ) ob ON ob.patient_id = p.id
===== Patient Demographic Dataset =====
A typical patient demographic dataset may consist of the following information:


  WHERE org.ods_code IN ('EMIS99','5PC64') -- e.g. ods codes
[[File:RSD_SQL_Guide_C2_Example_Patient_Demographic_Dataset.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Example%20Patient%20Demographic%20Dataset.png|alt=|frameless|1294x1294px]]


  AND p.date_of_death IS NULL
For this example, we create the dataset to hold the patient demographic information as follows:<syntaxhighlight lang="sql">
 
CREATE TABLE demographicsDataset (
  AND c.code = 'R'  -- registered patient
  ExtractDate                    DATETIME     NULL,
 
  Pseudo_id                      VARCHAR(255) NULL,   -- unique identifier
  AND e.date_registered <= now()
  Pseudo_NHSNumber               VARCHAR(255) NULL,
  Gender                         VARCHAR(50)  NULL,
  Age                            VARCHAR(50)  NULL,
  DateOfBirth                    DATE         NULL,
  EthnicityLCode                 VARCHAR(50)  NULL,
  EthnicityLTerm                 VARCHAR(200) NULL,
  BirthCountryLCode              VARCHAR(50)  NULL,
  BirthCountryLTerm              VARCHAR(200) NULL,
  RegistrationStart              DATE         NULL,
  RegistrationEnd                DATE         NULL,
  IMD2010                        VARCHAR(50)  NULL,
  LSOA2011                       VARCHAR(50)  NULL,
  PracticeODSCode                VARCHAR(50)  NULL,
  PracticeODSName                VARCHAR(255) NULL,
  CCGName                        VARCHAR(100) NULL,
  YearOfDeath                    INT(4)       NULL,
  F2fVisits_Total                INT(11)      DEFAULT 0,
  F2fVisits_1year                INT(11)      DEFAULT 0,
  F2fVisits_5years               INT(11)      DEFAULT 0
);


  AND (e.date_registered_end > now() OR
ALTER TABLE demographicsDataset ADD INDEX demo_pseudoid_idx (pseudo_id);


      e.date_registered_end IS NULL) -- active
-- the unique identifier is usually the pseudo id
-- but to adapt for this schema we could use the person_id
</syntaxhighlight>


  AND IF (p.date_of_death IS NULL,
We pre-populate the dataset with the unique identifiers from the patient cohort:<syntaxhighlight lang="sql">
INSERT INTO demographicsDataset (pseudo_id, extractdate) SELECT DISTINCT group_by, now() FROM cohort_patient;
</syntaxhighlight>


  TIMESTAMPDIFF(YEAR, p.date_of_birth, CURDATE()),
Then we run separate update SQL queries to populate the columns of the dataset table.


  TIMESTAMPDIFF(YEAR, p.date_of_birth, p.date_of_death)) >= 12
For example, add registration start date, registration end date, practice ods code, practice ods name:<syntaxhighlight lang="sql">
DROP TEMPORARY TABLE IF EXISTS reg_sort;
DROP TEMPORARY TABLE IF EXISTS qry_reg;


  -- age 12 and over
CREATE TEMPORARY TABLE qry_reg AS
SELECT c.group_by,
       e.date_regirstered,
       e.date_registered_end,
       o.name,
       o.ods_code
FROM cohort_patient c
JOIN episode_of_care e ON e.person_id = c.group_by
JOIN organization o ON o.id = e.organization_id
WHERE o.ods_code IN ('EMIS99','5PC64'); -- e.g. ods codes


  AND get_optout_status(p.id, p.registered_practice_organization_id) IS NULL;


  -- opt in status
CREATE TEMPORARY TABLE reg_sort (
 
      row_id               INT,
= 3. Creating the Observations Cohort =
      group_by             BIGINT,
We create a separate observations cohort for query performance. Instead of scanning through the entire observation table, we create a sub-set of the observation table based on the patient cohort.
      date_registered      DATE,
 
      date_regiostered_end DATE,
'''Example 2.1 Building the Observations Cohort SQL'''
      ods_code             VARCHAR(50),
 
      name                 VARCHAR(255),
CREATE TABLE cohort_observations AS
      rnk                  INT, PRIMARY KEY(row_id)) AS
SELECT (@row_no := @row_no + 1) AS row_id,
       a.group_by,
       a.date_registered,
       a.date_registered_end,
       a.ods_code,      
       a.name,
       a.rnk
FROM (SELECT q.group_by,
            q.date_registered,
            q.date_registered_end,
            q.name,
            q.ods_code,
            @currank := IF(@curperson = q.group_by, @currank + 1, 1) AS rnk,
            @curperson := q.group_by AS cur_person
      FROM qry_reg q, (SELECT @currank := 0, @curperson := 0) r
      ORDER BY q.group_by, q.date_registered DESC ) a,
      (SELECT @row_no := 0) t
WHERE a.rnk = 1;


      SELECT DISTINCT
SET @row_id = 0;


             o.id,
WHILE EXISTS (SELECT row_id from reg_sort
             WHERE row_id > @row_id AND row_id <= @row_id + 1000) DO


             o.person_id AS group_by,
       UPDATE demographicsDataset d
       JOIN reg_sort reg ON d.pseudo_id = reg.group_by
       SET d.RegistrationStart = reg.date_registered,
           d.RegistrationEnd = reg.date_registered_end,
           d.PracticeODSCode = reg.ods_code,
           d.PracticeODSName = reg.name
       WHERE reg.row_id > @row_id
       AND reg.row_id <= @row_id + 1000;


             o.patient_id,
       SET @row_id = @row_id + 1000;  


             o.person_id,
END WHILE;
</syntaxhighlight>In this example, we created 2 temporary tables. The first to hold the registration data. The second to select from the first the records which contain the latest registration date.


             o.clinical_effective_date,
If the update is large in terms of the number of records to update then it might be possible to optimise the update process by using batches as demonstrated in the code example above.


             o.non_core_concept_id,
===== Patient Observations Dataset =====
A typical patient observations dataset may ask for the following information:


             o.result_value,
[[File:RSD_SQL_Guide_C2_Example_Patient_Observations_dataset.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Example%20Patient%20Observations%20dataset.png|alt=|frameless|1122x1122px]]


             o.result_value_units,
A typical dataset can have over 100 to 200 fields. Usually, there will be several datasets to deliver. But this illustrates the kind of layout we use for constructing an observational dataset.


             o.result_date,
For this example, we create the dataset to hold the observational information as follows:<syntaxhighlight lang="sql">
DROP TABLE IF EXISTS diagnoses2dataset;


             o.result_text,
CREATE TABLE diagnoses2dataset (
 
  Pseudo_id                                        VARCHAR(255) NULL,
             cr.age_year
  Pseudo_NHSNumber                                 VARCHAR(255) NULL,
 
  AsthmaECode                                      VARCHAR(50) NULL,
     FROM observation o JOIN cohort_patient cr
  AsthmaETerm                                      VARCHAR(200) NULL,
 
  AsthmaEDate                                      VARCHAR(50) NULL,
     ON o.person_id = cr.person_id
  AsthmaEmergeECode                                VARCHAR(50) NULL,
 
  AsthmaEmergeETerm                                VARCHAR(200) NULL,
     AND o.organization_id = cr.organization_id
  AsthmaEmergeEDate                                VARCHAR(50) NULL,
 
  AsthmaResolvedECode                              VARCHAR(50) NULL,
     AND o.patient_id = cr.patient_id;
  AsthmaResolvedETerm                              VARCHAR(200) NULL,
 
  AsthmaResolvedEDate                              VARCHAR(50) NULL,
     
  COPDECode                                        VARCHAR(50) NULL,
 
  COPDETerm                                        VARCHAR(200) NULL,
CREATE INDEX obs_cpt_ix ON cohort_observation(non_core_concept_id);
  COPDEDate                                        VARCHAR(50) NULL,
 
  PulmonaryFibrosisECode                           VARCHAR(50) NULL,
CREATE INDEX obs_pat_ix ON cohort_observation(patient_id);
  PulmonaryFibrosisETerm                           VARCHAR(200) NULL,
 
  PulmonaryFibrosisEDate                           VARCHAR(50) NULL,
= 4. Creating the Medications Cohort =
  InterstitialLungDiseaseECode                     VARCHAR(50) NULL,
Again, we create the medications cohort for query performance. We create a sub-set of the medication statement/order table based on the patient cohort.
  InterstitialLungDiseaseETerm                     VARCHAR(200) NULL,
 
  InterstitialLungDiseaseEDate                     VARCHAR(50) NULL,
'''Fig 4.1 Entity diagram showing the relationships between patient, medication statement and organization'''
  AgeRelatedMuscularDegenerationECode              VARCHAR(50) NULL,
 
  AgeRelatedMuscularDegenerationETerm              VARCHAR(200) NULL,
[[File:Fig_4.1.png|link=https://wiki.discoverydataservice.org/File:Fig%204.1.png|alt=|frameless|633x633px]]
  AgeRelatedMuscularDegenerationEDate              VARCHAR(50) NULL,
 
  GlaucomaECode                                    VARCHAR(50) NULL,
'''Example 4.1 Building the Medications Cohort SQL'''
  GlaucomaETerm                                    VARCHAR(200) NULL,
 
  GlaucomaEDate                                    VARCHAR(50) NULL,
-- step 1: cohort of the medication statement
  RheumatoidArthritisECode                         VARCHAR(50) NULL,
 
  RheumatoidArthritisETerm                         VARCHAR(200) NULL,
DROP TABLE IF EXISTS cohort_medications_stmt;
  RheumatoidArthritisEDate                         VARCHAR(50) NULL,
 
  SystemicLupusECode                               VARCHAR(50) NULL,
CREATE TABLE cohort_medications_stmt AS
  SystemicLupusETerm                               VARCHAR(200) NULL,
 
  SystemicLupusEDate                               VARCHAR(50) NULL,
     SELECT DISTINCT
  InflammatoryBowelDiseaseECode                    VARCHAR(50) NULL,
 
  InflammatoryBowelDiseaseETerm                    VARCHAR(200) NULL,
            m.id,
  InflammatoryBowelDiseaseEDate                    VARCHAR(50) NULL,
 
  CrohnsDiseaseECode                               VARCHAR(50) NULL,
            m.non_core_concept_id,
  CrohnsDiseaseETerm                               VARCHAR(200) NULL,
 
  CrohnsDiseaseEDate                               VARCHAR(50) NULL,
            m.person_id,
  UlcerativeColitisCodeECode                       VARCHAR(50) NULL,
 
  UlcerativeColitisCodeETerm                       VARCHAR(200) NULL,
            m.patient_id,
  UlcerativeColitisCodeEDate                       VARCHAR(50) NULL,
 
  AtopicDermatitisECode                            VARCHAR(50) NULL,
            m.clinical_effective_date,
  AtopicDermatitisETerm                            VARCHAR(200) NULL,
 
  AtopicDermatitisEDate                            VARCHAR(50) NULL,
            m.cancellation_date
  InheritedMucociliaryClearanceECode               VARCHAR(50) NULL,
 
  InheritedMucociliaryClearanceETerm               VARCHAR(200) NULL,
     FROM medication_statement m
  InheritedMucociliaryClearanceEDate               VARCHAR(50) NULL,
 
  PrimaryCiliaryDyskinesiaECode                    VARCHAR(50) NULL,
     JOIN cohort_patient cr
  PrimaryCiliaryDyskinesiaETerm                    VARCHAR(200) NULL,
 
  PrimaryCiliaryDyskinesiaEDate                    VARCHAR(50) NULL,
     ON m.organization_id = cr.organization_id
  MelanomaECode                                    VARCHAR(50) NULL,
 
  MelanomaETerm                                    VARCHAR(200) NULL,
     AND m.person_id = cr.person_id
  MelanomaEDate                                    VARCHAR(50) NULL,
  ProstateCancerECode                              VARCHAR(50) NULL,
  ProstateCancerETerm                              VARCHAR(200) NULL,
  ProstateCancerEDate                              VARCHAR(50) NULL,
  LungCancerECode                                  VARCHAR(50) NULL,
  LungCancerETerm                                  VARCHAR(200) NULL,
  LungCancerEDate                                  VARCHAR(50) NULL,
  SmallBowelCancerECode                            VARCHAR(50) NULL,
  SmallBowelCancerETerm                            VARCHAR(200) NULL,
  SmallBowelCancerEDate                            VARCHAR(50) NULL,
  ColorectalCancerECode                            VARCHAR(50) NULL,
  ColorectalCancerETerm                            VARCHAR(200) NULL,
  ColorectalCancerEDate                            VARCHAR(50) NULL,
  BreastCancerECode                                VARCHAR(50) NULL,
  BreastCancerETerm                                VARCHAR(200) NULL,
  BreastCancerEDate                                VARCHAR(50) NULL,
  MiscarriageECode                                 VARCHAR(50) NULL,
  MiscarriageETerm                                 VARCHAR(200) NULL,
  MiscarriageEDate                                 VARCHAR(50) NULL
);


     AND m.patient_id = cr.patient_id;
ALTER TABLE diagnoses2dataset ADD INDEX pseudoid_idx (pseudo_id);


CREATE INDEX med_stmt_ix ON cohort_medications_stmt(id);  
INSERT INTO diagnoses2dataset (pseudo_id)
SELECT DISTINCT group_by FROM cohort_patient;
</syntaxhighlight>Each column of the dataset is named by joining pieces of identifiable information together by combining the observational type, the timeframe for the analysis and the requested field type.


-- step 2: cohort of the medication oder
'''Fig 5.1 Shows the naming convention of a dataset column'''


DROP TABLE IF EXISTS cohort_medications_ord;
[[File:RSD_SQL_Guide_C2_Fig_5.1.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Fig%205.1.png|alt=|frameless|744x744px]]


CREATE TABLE cohort_medications_ord AS  
Again, we pre-populate the dataset with the unique identifiers from the patient cohort.


     SELECT DISTINCT
Then we run separate update SQL queries to populate the dataset table by calling multiple stored procedures.


            mo.id,
===== Patient Medications Dataset =====
A typical medication request would look like this:


            mo.medication_statement_id
[[File:RSD_SQL_Guide_C2_Medication_Request.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Medication%20Request.png|alt=|frameless|1235x1235px]]


            mo.person_id,
For this example, we create the dataset to hold the medication information as follows:<syntaxhighlight lang="sql">
CREATE TABLE medications (
  pseudo_id            VARCHAR(255) DEFAULT NULL,
  nhsnumber            VARCHAR(10)  DEFAULT NULL,
  codedate             VARCHAR(20)  DEFAULT NULL,
  codeterm             VARCHAR(255) DEFAULT NULL,
  code                 VARCHAR(100) DEFAULT NULL,
  codevalue            VARCHAR(100) DEFAULT NULL,
  codeunit             VARCHAR(100) DEFAULT NULL
);
</syntaxhighlight>Again, we pre-populate the dataset with the unique identifiers from the patient cohort.


            mo.patient_id,
Then we run separate update SQL queries to populate the dataset table by calling multiple stored procedures.


            mo.clinical_effective_date
= Calling Stored Procedures =
Once we have generated the observation and medication datasets and cohorts, the next step is to populate the datasets using stored procedures.


     FROM medication_order mo
We construct a series of populate stored procedures which in turn call on other stored procedures to complete the update process.


     JOIN cohort_patient cr
'''Fig 6.1 Shows the datasets update process using stored procedures'''


     ON mo.organization_id = cr.organization_id
[[File:RSD_SQL_Guide_C2_Fig_6.1.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Fig%206.1.png|alt=|frameless|590x590px]]<br />
 
=== Calling the Populate Stored Procedures ===
     AND mo.person_id = cr.person_id
Create a SQL script to call the stored procedure as follows:<syntaxhighlight lang="sql">
 
CALL populateCodeDate
     AND mo.patient_id = cr.patient_id;
(0,'StillbirthE','diagnoses2dataset',0,'237364002,161743003',null,null,null,'N');
 
CALL populateCodeDate
   
(0,'PregnancyInducedHypertensionE','diagnoses2dataset',0,null,'48194001',null,null,'N');
 
CALL populateCodeDate
CREATE INDEX med_ord_ix ON cohort_medications_ord(medication_statement_id);    
(0,'PreEclampsiaE','diagnoses2dataset',0,'398254007',null,null,null,'N');
 
CALL populateCodeDate
-- step 3: cohort of the medication - statement plus order
(0,'CholestasisE','diagnoses2dataset',0,'235888006',null,null,null,'N');
 
CALL populateCodeDate
DROP TABLE IF EXISTS cohort_medications;
(0,'GallstonesE','diagnoses2dataset',0,'266474003,407637009',null,null,null,'N');
 
CALL populateCodeDate
CREATE TABLE cohort_medications AS
(0,'GoutE','diagnoses2dataset',0,'90560007,161451004',null,null,null,'N');
 
CALL populateCodeDate
     SELECT DISTINCT
(0,'AnkylosingSpondylitisE','diagnoses2dataset',0,'9631008',null,null,null,'N');
 
CALL populateCodeDate
            ms.id,
(0,'JaundiceE','diagnoses2dataset',0,'18165001,161536006',null,null,null,'N');
 
CALL populateCodeDate
            ms.non_core_concept_id,
(0,'PsoriasisE','diagnoses2dataset',0,'9014002',null,null,null,'N');
 
CALL populateCodeDate
            ms.person_id,
(0,'DeafnessE','diagnoses2dataset',0,'15188001',null,null,null,'N');
 
CALL populateCodeDate
            ms.patient_id,
(0,'HearingAidE','diagnoses2dataset',0,'365240006',null,null,null,'N');
 
</syntaxhighlight>The procedure accepts a list of input parameters (this may vary depending on requirements):
            mo.clinical_effective_date,
{| class="wikitable"
 
|Parameter
            ms.cancellation_date
|Data Type
 
|Description
      FROM cohort_medications_stmt ms JOIN cohort_medications_ord mo
|-
 
|filterType
      ON ms.id = mo.medication_statement_id;
|INT
 
|1 latest, 0 earliest, 2 ever, 3 pivot
CREATE INDEX med_ix ON cohort_medications(non_core_concept_id);
|-
 
|col
CREATE INDEX med_pat_ix ON cohort_medications(patient_id);        
|VARCHAR(100)
 
|the root of the column name
= 5. Building Datasets =
|-
This involves creating a number of temporary tables in the database.
|datasetTable
 
|VARCHAR(100)
===== Patient Demographic Dataset =====
|table name of dataset
A typical patient demographic dataset may consist of the following information:
|-
 
|reset
[[File:RSD_SQL_Guide_C2_Example_Patient_Demographic_Dataset.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Example%20Patient%20Demographic%20Dataset.png|alt=|frameless|1294x1294px]]
|BIT
 
|1 reset, 0 no reset of the dataset table
For this example, we create the dataset to hold the patient demographic information as follows:
|-
 
|codesToAdd1
CREATE TABLE demographicsDataset (
|VARCHAR(5000)
 
|include all snomed parents and their  children
  ExtractDate                    DATETIME     NULL,
|-
|codesToAdd2
|VARCHAR(5000)
|include snomed parents only
|-
|codesToRemove3
|VARCHAR(5000)
|exclude snomed parents and their children
|-
|codesToRemove4
|VARCHAR(5000)
|exclude only snomed parents
|-
|ignorenulls
|VARCHAR(1)
|Y or N - whether to ignore null result  value
|}
A typical populate procedure is outlined below. (Please note this is only an example and may need modifications to it depending on your requirements).<syntaxhighlight lang="sql">
DROP PROCEDURE IF EXISTS populateCodeDate;


  Pseudo_id                      VARCHAR(255) NULL,   -- unique identifier
DELIMITER //
CREATE PROCEDURE populateCodeDate (
IN filterType     INT, -- 1 latest, 0 earliest, 2 ever, 3 pivot
IN col            VARCHAR(100), -- the root of the column name
IN datasetTable   VARCHAR(100), -- table name of dataset
IN reset          BIT, -- 1 reset, 0 no reset
IN codesToAdd1    VARCHAR(5000), -- all parents and their children
IN codesToAdd2    VARCHAR(5000), -- parents only
IN codesToRemove3 VARCHAR(5000), -- parents and their children to be excluded
IN codesToRemove4 VARCHAR(5000), -- just parents to be excluded
IN ignorenulls    VARCHAR(1)
)


  Pseudo_NHSNumber               VARCHAR(255) NULL,
BEGIN


  Gender                         VARCHAR(50)  NULL,
-- reset snomeds table
DELETE FROM snomeds WHERE cat_id IN (1, 2, 3, 4);
-- reset store table
DELETE FROM store WHERE id IN (1, 2, 3, 4);


  Age                            VARCHAR(50)  NULL,
-- get snomed concept ids


  DateOfBirth                    DATE         NULL,
  IF codesToAdd1 IS NOT NULL THEN
      CALL storeSnomedString(codesToAdd1, 1);
      CALL getAllSnomedsConceptIds (1);
  END IF;


  EthnicityLCode                 VARCHAR(50NULL,
  IF codesToAdd2 IS NOT NULL THEN
      CALL storeSnomedString (codesToAdd2, 2);
      CALL getAllSnomedsConceptIds (2);
  END IF;


  EthnicityLTerm                 VARCHAR(200NULL,
  IF codesToRemove3 IS NOT NULL THEN
      CALL storeSnomedString (codesToRemove3, 3);
      CALL getAllSnomedsConceptIds (3);
  END IF;


  BirthCountryLCode              VARCHAR(50NULL,
  IF codesToRemove4 IS NOT NULL THEN
      CALL storeSnomedString (codesToRemove4, 4);
      CALL getAllSnomedsConceptIds (4);
  END IF;


  BirthCountryLTerm              VARCHAR(200)  NULL,
-- filter observation based on the snomed codes


  RegistrationStart              DATE  NULL,
CALL filterObservations(filterType, 1, ignorenulls);


  RegistrationEnd                DATE  NULL,
-- reset columns


  IMD2010                        VARCHAR(50)  NULL,
IF (reset = 1) THEN


  LSOA2011                       VARCHAR(50)  NULL,
  SET @reset_sql = CONCAT('UPDATE ', datasetTable, ' SET ',
  col, "Code = null, ",
  col, "Term = null, ",
  col, "Date = null");


  PracticeODSCode                VARCHAR(50)  NULL,
  PREPARE resetStmt FROM @reset_sql;
  EXECUTE resetStmt;
  DEALLOCATE PREPARE resetStmt;


  PracticeODSName                VARCHAR(255)  NULL,
END IF;


  CCGName                        VARCHAR(100) NULL,
  DROP TEMPORARY TABLE IF EXISTS qry_tmp;


  YearOfDeath                    INT(4NULL,
  CREATE TEMPORARY TABLE qry_tmp (
      row_id                  INT,
      group_by                VARCHAR(255),
      original_code           VARCHAR(20),
      original_term           VARCHAR(200),
      clinical_effective_date DATE, PRIMARY KEY(row_id)
  ) AS


  F2fVisits_Total                INT(11) DEFAULT 0,
  SELECT (@row_no := @row_no+1) AS row_id,
         f.group_by,
         f.original_code,
         f.original_term,
         f.clinical_effective_date
  FROM filteredObservations f, (SELECT @row_no := 0) t;


  F2fVisits_1year                INT(11) DEFAULT 0,
SET @row_id = 0;


  F2fVisits_5years               INT(11) DEFAULT 0
WHILE EXISTS (SELECT row_id from qry_tmp
              WHERE row_id > @row_id AND row_id <= @row_id + 1000) DO


);
  SET @sql = CONCAT('UPDATE ', datasetTable,
 
             ' d JOIN qry_tmp f ON d.pseudo_id = f.group_by SET ',
ALTER TABLE demographicsDataset ADD INDEX demo_pseudoid_idx (pseudo_id);
             col, "Code = f.original_code, ",
             col, "Term = f.original_term, ",
             col, "Date = date_format(f.clinical_effective_date, '%d/%m/%Y')
             WHERE f.row_id > @row_id AND f.row_id <= @row_id + 1000");
   
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;


-- the unique identifier is usually the pseudo id
  SET @row_id = @row_id + 1000;


-- but to adapt for this schema we could use the person_id
END WHILE;
END//
DELIMITER ;
</syntaxhighlight>


We pre-populate the dataset with the unique identifiers from the patient cohort:
==== Retrieving the SNOMED Concept Ids ====
The procedure accepts a list of SNOMED codes as input parameter.


INSERT INTO demographicsDataset (pseudo_id, extractdate) SELECT DISTINCT group_by, now() FROM cohort_patient;
'''Fig 6.2 Shows how the SNOMED codes are being retrieved using stored procedures'''


Then we run separate update SQL queries to populate the columns of the dataset table.
[[File:RSD_SQL_Guide_C2_Fig_6.2.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Fig%206.2.png|alt=|frameless|818x818px]]<br />
===== Calling the storeSnomedString procedure =====
We call the storeSnomedString procedure to separate the list of SNOMED codes into their individual values and store them into a temporary table:<syntaxhighlight lang="sql">
DROP TABLE IF EXISTS store;


For example, add registration start date, registration end date, practice ods code, practice ods name:
CREATE TABLE store (
  id            INT,
  org_snomed_id BIGINT
);


DROP TEMPORARY TABLE IF EXISTS reg_sort;
ALTER TABLE store ADD INDEX store_idx (org_snomed_id);


DROP TEMPORARY TABLE IF EXISTS qry_reg;
DROP PROCEDURE IF EXISTS storeSnomedString;


CREATE TEMPORARY TABLE qry_reg AS
DELIMITER //


SELECT c.group_by,
CREATE PROCEDURE storeSnomedString (
  IN stringValue VARCHAR(5000),
  IN cat_id      INT
)


      e.date_regirstered,
BEGIN


      e.date_registered_end,
  DECLARE front       VARCHAR(5000) DEFAULT NULL;
  DECLARE frontlen    INT           DEFAULT NULL;
  DECLARE TempValue   VARCHAR(5000) DEFAULT NULL;


      o.name,
  processloop:
  LOOP


      o.ods_code
      IF LENGTH(TRIM(stringValue)) = 0
      OR stringValue IS NULL THEN
        LEAVE process loop;
      END IF;
   
  SET front = SUBSTRING_INDEX(stringValue, ',', 1);
  SET frontlen = LENGTH(front);
  SET TempValue = TRIM(front);


FROM cohort_patient c
  INSERT INTO store (id, org_snomed_id)
  VALUES (cat_id, CAST(TempValue AS SIGNED));


  JOIN episode_of_care e ON e.person_id = c.group_by
  SET stringValue = INSERT(stringValue, 1, frontlen + 1, '');


  JOIN organization o ON o.id = e.organization_id
  END LOOP;


WHERE o.ods_code IN ('EMIS99','5PC64'); -- e.g. ods codes
END//
DELIMITER ;
</syntaxhighlight>


===== Calling the getAllSnomedsConceptIds procedure =====
We call another procedure getAllSnomedsConceptIdsFromSnomedCodes to loop through the temporary table. For each SNOMED code, we retrieve the corresponding SNOMED concept id.


CREATE TEMPORARY TABLE reg_sort (
We store the resultant set of concept ids into another temporary table where it will be used to query observational data.<syntaxhighlight lang="sql">
DROP TABLE IF EXISTS snomeds;


      row_id               INT,
CREATE TABLE snomeds (
 
  cat_id               INT,
      group_by             BIGINT,
  snomed_concept_id    BIGINT,
  core_concept         INT,
  snomed_code          BIGINT,
  term                 VARCHAR(255)
);


      date_registered      DATE,
ALTER TABLE snomeds ADD INDEX sno_cpt_idx(snomed_concept_id);
ALTER TABLE snomeds ADD INDEX sno_code_idx(snomed_code);


      date_regiostered_end DATE,
DROP PROCEDURE IF EXISTS getAllSnomedsConceptIds;


      ods_code             VARCHAR(50),
DELIMITER //
CREATE PROCEDURE getAllSnomedsConceptIds(p_cat_id INT)


      name                 VARCHAR(255),
BEGIN


      rnk                  INT, PRIMARY KEY(row_id)) AS
  DECLARE done           INT;
  DECLARE l_parent_id    BIGINT;
  DECLARE c_get_snomeds  CURSOR FOR SELECT org_snomed_id
                         FROM store WHERE id = p_cat_id;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;


SELECT (@row_no := @row_no + 1) AS row_id,
  SET done = 0;


      a.group_by,
       OPEN c_get_snomeds;


      a.date_registered,
       processloop:
       WHILE (done = 0) DO


      a.date_registered_end,
        FETCH c_get_snomeds INTO l_parent_id;


      a.ods_code,      
        IF done = 1 THEN
           LEAVE processloop;
        END IF;


      a.name,
        IF p_cat_id IN (1, 3) THEN


      a.rnk
          -- get parent snomed concept id based on the snomed code


FROM (SELECT q.group_by,
          INSERT INTO snomeds
          SELECT
                p_cat_id,
                cm.legacy,
                cm.core,
                c.code,
                c.name
           FROM concept_map cm JOIN concept c ON c.dbid = cm.core
           WHERE c.code = l_parent_id;


       q.date_registered,
          -- get all child snomed concept ids based on the parent snomed code


       q.date_registered_end,
           INSERT INTO snomeds
           SELECT
                 p_cat_id,
                 cptm.legacy,
                 cptm.core,
                 cpt.code,
                 cpt.name
            FROM concept_tct tc
            JOIN snomeds sn ON sn.core_concept = tc.target
            AND tc.property = '92842'
            JOIN concept_map cptm on cptm.core = tc.source
            AND tc.property = '92842'
            JOIN concept cpt on cpt.dbid = cptm.core
            WHERE sn.snomed_code = l_parent_id;


       q.name,
        ELSE


       q.ods_code,
        -- include or exclude parent snomed


       @currank := IF(@curperson = q.group_by, @currank + 1, 1) AS rnk,
            INSERT INTO snomeds
 
            SELECT
       @curperson := q.group_by AS cur_person
                  p_cat_id,
 
                  cm.legacy,
     FROM qry_reg q, (SELECT @currank := 0, @curperson := 0) r
                  cm.core,
 
                  c.code,
     ORDER BY q.group_by, q.date_registered DESC ) a,
                  c.name
             FROM concept_map cm JOIN concept c ON c.dbid = cm.core
             WHERE c.code = l_parent_id;


     (SELECT @row_no := 0) t
        END IF;


WHERE a.rnk = 1;
       END WHILE processloop;
       CLOSE c_get_snomeds;


SET @row_id = 0;
       SET done = 0;  


WHILE EXISTS (SELECT row_id from reg_sort
END //
DELIMITER ;
</syntaxhighlight>Using the concept_tct table, which is a transitive closure table, we are able to find the corresponding child concept ids based on the parent concept id.


             WHERE row_id > @row_id AND row_id <= @row_id + 1000) DO
Match the parent against the value in the target column. This will return all the children in the source column.


           
[[File:RSD_SQL_Guide_C2_Concepts_tct.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Concepts%20tct.png|alt=|frameless]]


       UPDATE demographicsDataset d
==== Retrieving filtered Observations ====
Using the prepared observations cohort, the next step is to filter out this cohort based on the SNOMED code list and the timeframe, which have been passed in as input parameters from the populate procedure.


       JOIN reg_sort reg ON d.pseudo_id = reg.group_by
'''Fig 6.3 Shows the processing of the FilterObservations procedure'''


       SET d.RegistrationStart = reg.date_registered,
[[File:RSD_SQL_Guide_C2_Fig_6.3.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Fig%206.3.png|alt=|frameless|596x596px]]<br />
 
===== Calling the filterObservations procedure =====
           d.RegistrationEnd = reg.date_registered_end,
By calling the filterObservations procedure, we pass in the following input parameters:
 
{| class="wikitable"
           d.PracticeODSCode = reg.ods_code,
|Parameter
|Data Type
|Description
|-
|filterType
|INT
|Timeframe of the analysis. 0 = earliest, 1  = latest, 2 = ever, 3 = pivot around cohort pivot date (6 months), 4 = All  since
|-
|toCreate
|INT
|1 is the default value - to call  createObservationsFromCohort
|-
|ignorenulls
|VARCHAR(1)
|Y or N - whether to ignore null result  value
|}
<syntaxhighlight lang="sql">
DROP PROCEDURE IF EXISTS filterObservations;


           d.PracticeODSName = reg.name
DELIMITER //


       WHERE reg.row_id > @row_id
CREATE PROCEDURE filterObservations (
     IN filterType INT,
-- 0 earliest, 1 latest, 2 ever,
-- 3 pivot around cohort pivot date (6 months),
-- 4 allSince
  IN toCreate   INT, -- 1 to createObservationsFromCohortv2
  IN ignorenulls VARCHAR(1) -- Y or N
  )


       AND reg.row_id <= @row_id + 1000;
BEGIN


     
IF (toCreate = 1) THEN
  CALL createObservationsFromCohort (filterType);
END IF;


       SET @row_id = @row_id + 1000;  
DROP TABLE IF EXISTS filteredObservations;


     
IF (filterType = 0) THEN -- earliest


END WHILE;
  IF (ignorenulls = 'Y') THEN  


In this example, we created 2 temporary tables. The first to hold the registration data. The second to select from the first the records which contain the latest registration date.
  CREATE TABLE filteredObservations AS
 
  SELECT
If the update is large in terms of the number of records to update then it might be possible to optimise the update process by using batches as demonstrated in the code example above.
          ob.id,
 
          ob.group_by,
===== Patient Observations Dataset =====
          ob.patient_id,
A typical patient observations dataset may ask for the following information:
          ob.person_id,
 
          ob.original_code,
[[File:RSD_SQL_Guide_C2_Example_Patient_Observations_dataset.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Example%20Patient%20Observations%20dataset.png|alt=|frameless|1122x1122px]]
          ob.original_term,
 
          ob.result_value,
A typical dataset can have over 100 to 200 fields. Usually, there will be several datasets to deliver. But this illustrates the kind of layout we use for constructing an observational dataset.
          ob.clinical_effective_date,
 
          ob.result_value_units,
For this example, we create the dataset to hold the observational information as follows:
          ob.age_years,
 
          ob.rnk
DROP TABLE IF EXISTS diagnoses2dataset;
  FROM (
          SELECT
            o.id,
            o.group_by,
            o.patient_id,
            o.person_id,
            o.original_code,
            o.original_term,
            o.result_value,
            o.clinical_effective_date,
            o.result_value_units,
            o.age_years,
            @currank :=
                 IF(@curperson = o.person_id, @currank + 1, 1) AS rnk,
            @curperson := o.person_id AS cur_person
          FROM observationsFromCohort o,
          (SELECT @currank := 0, @curperson := 0) r
          WHERE o.result_value IS NOT NULL
          ORDER BY o.person_id,
                   o.clinical_effective_date ASC,
                   o.id ASC -- earliest
        ) ob
  WHERE ob.rnk = 1;


CREATE TABLE diagnoses2dataset (
  ELSE


  Pseudo_id                                       VARCHAR(255) NULL,
  CREATE TABLE filteredObservations AS
 
  SELECT
  Pseudo_NHSNumber                                 VARCHAR(255) NULL,
          ob.id,
 
          ob.group_by,
  AsthmaECode                                     VARCHAR(50) NULL,
          ob.patient_id,
          ob.person_id,
          ob.original_code,
          ob.original_term,
          ob.result_value,
          ob.clinical_effective_date,
          ob.result_value_units,
          ob.age_years,
          ob.rnk
  FROM (
          SELECT
            o.id,
            o.group_by,
            o.patient_id,
            o.person_id,
            o.original_code,
            o.original_term,
            o.result_value,
            o.clinical_effective_date,
            o.result_value_units,
            o.age_years,
            @currank :=
                  IF(@curperson = o.person_id, @currank + 1, 1) AS rnk,
            @curperson := o.person_id AS cur_person
          FROM observationsFromCohort o,
          (SELECT @currank := 0, @curperson := 0) r
          ORDER BY o.person_id,
                   o.clinical_effective_date ASC,
                   o.id ASC -- earliest
        ) ob
  WHERE ob.rnk = 1;


  AsthmaETerm                                      VARCHAR(200) NULL,
  END IF;


  AsthmaEDate                                     VARCHAR(50) NULL,
ELSEIF (filterType = 2 OR filterType = 4) THEN -- ever or allSince


  AsthmaEmergeECode                                VARCHAR(50) NULL,
  IF (ignorenulls = 'Y') THEN


  AsthmaEmergeETerm                                VARCHAR(200) NULL,
  CREATE TABLE filteredObservations AS
  SELECT DISTINCT
          mc.group_by,
          mc.patient_id,
          mc.person_id,
          mc.original_code,
          mc.original_term,
          mc.result_value,
          mc.clinical_effective_date,
          mc.result_value_units,
          mc.age_years
  FROM observationsFromCohort mc
  WHERE mc.result_value IS NOT NULL;


  AsthmaEmergeEDate                                VARCHAR(50) NULL,
  ELSE


  AsthmaResolvedECode                              VARCHAR(50) NULL,
  CREATE TABLE filteredObservations AS
  SELECT DISTINCT
          mc.group_by,
          mc.patient_id,
          mc.person_id,
          mc.original_code,
          mc.original_term,
         mc.result_value,
          mc.clinical_effective_date,
          mc.result_value_units,
          mc.age_years
  FROM observationsFromCohort mc;


  AsthmaResolvedETerm                              VARCHAR(200) NULL,
  END IF;


  AsthmaResolvedEDate                              VARCHAR(50) NULL,


  COPDECode                                       VARCHAR(50) NULL,
ELSEIF (filterType = 1 or filterType = 3) THEN -- latest or pivot


  COPDETerm                                        VARCHAR(200) NULL,
IF (ignorenulls = 'Y') THEN


  COPDEDate                                       VARCHAR(50) NULL,
CREATE TABLE filteredObservations as
 
  SELECT
  PulmonaryFibrosisECode                           VARCHAR(50) NULL,
           ob.id,
 
           ob.group_by,
  PulmonaryFibrosisETerm                           VARCHAR(200) NULL,
           ob.patient_id,
 
           ob.person_id,
  PulmonaryFibrosisEDate                           VARCHAR(50) NULL,
           ob.original_code,
 
           ob.original_term,
  InterstitialLungDiseaseECode                     VARCHAR(50) NULL,
           ob.result_value,
 
           ob.clinical_effective_date,
  InterstitialLungDiseaseETerm                     VARCHAR(200) NULL,
           ob.result_value_units,
 
           ob.age_years,
  InterstitialLungDiseaseEDate                     VARCHAR(50) NULL,
           ob.rnk
  FROM (
          SELECT
            o.id,
            o.group_by,
            o.patient_id,
            o.person_id,
            o.original_code,
            o.original_term,
            o.result_value,
            o.clinical_effective_date,
            o.result_value_units,
            o.age_years,
            @currank :=
                IF(@curperson = o.person_id, @currank + 1, 1) AS rnk,
            @curperson := o.person_id AS cur_person
          FROM observationsFromCohort o,
          (SELECT @currank := 0, @curperson := 0) r
          WHERE o.result_value IS NOT NULL
          ORDER BY o.person_id,
                   o.clinical_effective_date DESC,
                   o.id DESC -- latest
        ) ob
  WHERE ob.rnk = 1;


  AgeRelatedMuscularDegenerationECode              VARCHAR(50) NULL,
ELSE


  AgeRelatedMuscularDegenerationETerm              VARCHAR(200) NULL,
  CREATE TABLE filteredObservationsV2 as
 
  SELECT
  AgeRelatedMuscularDegenerationEDate              VARCHAR(50) NULL,
           ob.id,
           ob.group_by,
           ob.patient_id,
           ob.person_id,
           ob.original_code,
           ob.original_term,
           ob.result_value,
           ob.clinical_effective_date,
           ob.result_value_units,
           ob.age_years,
           ob.rnk
  FROM (
          SELECT
             o.id,
             o.group_by,
             o.patient_id,
             o.person_id,
             o.original_code,
             o.original_term,
             o.result_value,
             o.clinical_effective_date,
             o.result_value_units,
             o.age_years,
             @currank :=
                  IF(@curperson = o.person_id, @currank + 1, 1) AS rnk,
             @curperson := o.person_id AS cur_person
          FROM observationsFromCohort o,
          (SELECT @currank := 0, @curperson := 0) r
          ORDER BY o.person_id,
                   o.clinical_effective_date DESC,
                   o.id DESC -- latest
         ) ob
  WHERE ob.rnk = 1;


  GlaucomaECode                                    VARCHAR(50) NULL,
END IF;


  GlaucomaETerm                                   VARCHAR(200) NULL,
ELSE


  GlaucomaEDate                                    VARCHAR(50) NULL,
  SIGNAL SQLSTATE '45000'


  RheumatoidArthritisECode                         VARCHAR(50) NULL,
  SET MESSAGE_TEXT = 'filterType not recognised';


  RheumatoidArthritisETerm                         VARCHAR(200) NULL,
END IF;


  RheumatoidArthritisEDate                         VARCHAR(50) NULL,
END//
DELIMITER ;
</syntaxhighlight>


  SystemicLupusECode                               VARCHAR(50) NULL,
===== Calling the createObservationsFromCohort procedure =====
From the filterObservations procedure, we call the createobservationsFromCohort procedure, passing in the filter type as the input parameter:<syntaxhighlight lang="sql">
CALL createObservationsFromCohort (filterType);
</syntaxhighlight><syntaxhighlight lang="sql">
DROP PROCEDURE IF EXISTS createObservationsFromCohort;


  SystemicLupusETerm                               VARCHAR(200) NULL,
DELIMITER //


  SystemicLupusEDate                               VARCHAR(50) NULL,
CREATE PROCEDURE createObservationsFromCohort (
  IN filterType INT
  -- 0 earliest, 1 latest, 2 ever,
  -- 3 pivot around cohort_patient pivot date (6 months)
)


  InflammatoryBowelDiseaseECode                    VARCHAR(50) NULL,
BEGIN


  InflammatoryBowelDiseaseETerm                    VARCHAR(200) NULL,
DROP TEMPORARY TABLE IF EXISTS snomeds_tmp;


  InflammatoryBowelDiseaseEDate                    VARCHAR(50) NULL,
CREATE TEMPORARY TABLE snomeds_tmp AS
 
SELECT cat_id,
  CrohnsDiseaseECode                               VARCHAR(50) NULL,
      snomed_concept_id
FROM snomeds
WHERE cat_id IN (3, 4);


  CrohnsDiseaseETerm                               VARCHAR(200) NULL,
-- delete from snomeds all snomed_concept ids to be excluded


  CrohnsDiseaseEDate                               VARCHAR(50) NULL,
DELETE t1 FROM snomeds t1 JOIN snomeds_tmp t2
ON t1.snomed_concept_id = t2.snomed_concept_id
WHERE t1.cat_id IN (1, 2);


  UlcerativeColitisCodeECode                       VARCHAR(50) NULL,
DROP TABLE IF EXISTS observationsFromCohort;


  UlcerativeColitisCodeETerm                       VARCHAR(200) NULL,
IF (filterType = 3) THEN


  UlcerativeColitisCodeEDate                       VARCHAR(50) NULL,
-- pivot over 6 months from pivot date (already set in cohort)


  AtopicDermatitisECode                            VARCHAR(50) NULL,
CREATE TABLE observationsFromCohort AS
 
      SELECT DISTINCT
  AtopicDermatitisETerm                            VARCHAR(200) NULL,
             o.id,
 
             o.patient_id,
  AtopicDermatitisEDate                            VARCHAR(50) NULL,
             o.person_id,
 
             o.group_by,
  InheritedMucociliaryClearanceECode               VARCHAR(50) NULL,
             o.clinical_effective_date,
             o.non_core_concept_id AS original_code,
             SUBSTRING(s.term, 1, 200) AS original_term,
             o.result_value,
             o.result_value_units,
             o.age_years
     FROM cohort_observations o
          JOIN snomeds s ON s.snomed_concept_id = o.non_core_concept_id
     WHERE o.clinical_effective_date IS NOT NULL
     AND o.clinical_effective_date
     BETWEEN DATE_SUB(c.pivot_date, INTERVAL 6 MONTH)
     AND DATE_SUB(c.pivot_date, INTERVAL -6 MONTH);


  InheritedMucociliaryClearanceETerm               VARCHAR(200) NULL,
ELSE


  InheritedMucociliaryClearanceEDate               VARCHAR(50) NULL,
-- latest or earliest or ever (so all observations)


  PrimaryCiliaryDyskinesiaECode                    VARCHAR(50) NULL,
CREATE TABLE observationsFromCohort AS
     SELECT DISTINCT
            o.id,
            o.patient_id,
            o.person_id,
            o.group_by,
            o.clinical_effective_date,
            o.non_core_concept_id AS original_code,
            SUBSTRING(s.term, 1, 200) AS original_term,
            o.result_value,
            o.result_value_units,
            o.age_years
     FROM cohort_observations o
          JOIN snomeds s ON s.snomed_concept_id = o.non_core_concept_id
     WHERE o.clinical_effective_date IS NOT NULL;


  PrimaryCiliaryDyskinesiaETerm                    VARCHAR(200) NULL,
END IF;


  PrimaryCiliaryDyskinesiaEDate                    VARCHAR(50) NULL,
ALTER TABLE observationsFromCohort ADD INDEX obv_pat_idx(patient_id);


  MelanomaECode                                    VARCHAR(50) NULL,
END//


  MelanomaETerm                                   VARCHAR(200) NULL,
DELIMITER ;
</syntaxhighlight>This returns a new observations cohort based on the SNOMED code list.


   MelanomaEDate                                    VARCHAR(50) NULL,
===== Filter Observations by Timeframe =====
 
Using the new observations cohort, we pass in the filter type to retrieve the observations based on the timeframe for the analysis, and using the clinical effective date to determine the time intervals. We store the result set into a temporary table. This is illustrated in the code snippet of the filterObservations procedure below: <syntaxhighlight lang="sql">
  ProstateCancerECode                              VARCHAR(50) NULL,
CREATE TABLE filteredObservationsV2 as
 
  SELECT
  ProstateCancerETerm                              VARCHAR(200) NULL,
          ob.id,
 
          ob.group_by,
  ProstateCancerEDate                              VARCHAR(50) NULL,
          ob.patient_id,
 
          ob.person_id,
  LungCancerECode                                  VARCHAR(50) NULL,
          ob.original_code,
 
          ob.original_term,
  LungCancerETerm                                  VARCHAR(200) NULL,
          ob.result_value,
 
          ob.clinical_effective_date,
  LungCancerEDate                                  VARCHAR(50) NULL,
          ob.result_value_units,
          ob.age_years,
          ob.rnk
  FROM (
          SELECT o.id,
                 o.group_by,
                 o.patient_id,
                 o.person_id,
                 o.original_code,
                 o.original_term,
                 o.result_value,
                 o.clinical_effective_date,
                 o.result_value_units,
                 o.age_years,
                 @currank :=
                    IF(@curperson = o.person_id, @currank + 1, 1) AS rnk,
                 @curperson := o.person_id AS cur_person
          FROM observationsFromCohort o,
          (SELECT @currank := 0, @curperson := 0) r
          WHERE o.result_value IS NOT NULL
          ORDER BY o.person_id,
                   o.clinical_effective_date ASC,
                   o.id ASC  -- earliest
        ) ob


  SmallBowelCancerECode                            VARCHAR(50) NULL,
  WHERE ob.rnk = 1;
</syntaxhighlight>


  SmallBowelCancerETerm                            VARCHAR(200) NULL,
==== Updating Dataset Table Columns ====
Using the result set derived from the above step, we can now update the dataset table columns with the appropriate observational values. This is illustrated in the code snippet of the populateCodeDate procedure:<syntaxhighlight lang="sql">
DROP TEMPORARY TABLE IF EXISTS qry_tmp;


  SmallBowelCancerEDate                            VARCHAR(50) NULL,
  CREATE TEMPORARY TABLE qry_tmp (
 
      row_id                  INT,
  ColorectalCancerECode                            VARCHAR(50) NULL,
      group_by                VARCHAR(255),
      original_code           VARCHAR(20),
      original_term           VARCHAR(200),
      clinical_effective_date DATE, PRIMARY KEY(row_id)
  ) AS
  SELECT (@row_no := @row_no+1) AS row_id,
         f.group_by,
         f.original_code,
         f.original_term,
         f.clinical_effective_date
  FROM filteredObservationsV2 f, (SELECT @row_no := 0) t;


  ColorectalCancerETerm                            VARCHAR(200) NULL,
SET @row_id = 0;


  ColorectalCancerEDate                            VARCHAR(50) NULL,
WHILE EXISTS (SELECT row_id from qry_tmp
             WHERE row_id > @row_id AND row_id <= @row_id + 1000) DO


  BreastCancerECode                                VARCHAR(50) NULL,
  SET @sql = CONCAT('UPDATE ', datasetTable,
              ' d JOIN qry_tmp f ON d.pseudo_id = f.group_by SET ',
              col, "Code = f.original_code, ",
              col, "Term = f.original_term, ",
              col, "Date = date_format(f.clinical_effective_date, '%d/%m/%Y')
              WHERE f.row_id > @row_id AND f.row_id <= @row_id + 1000");
 
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;


  BreastCancerETerm                                VARCHAR(200) NULL,
  SET @row_id = @row_id + 1000;


  BreastCancerEDate                                VARCHAR(50) NULL,
END WHILE;
</syntaxhighlight>To populate medication datasets, we use a similar approach as above.


  MiscarriageECode                                 VARCHAR(50) NULL,
= APPENDIX =
Some supplementary material that maybe helpful when developing SQL query, in particular, working with concept ids.


  MiscarriageETerm                                 VARCHAR(200) NULL,
=== Helper Functions ===
         Function to retrieve the description field on the concept table based on the concept id:<syntaxhighlight lang="sql">
DELIMITER //


  MiscarriageEDate                                 VARCHAR(50) NULL
CREATE FUNCTION get_concept_desc(p_concept_id int)
RETURNS VARCHAR(400)
NOT DETERMINISTIC READS SQL DATA
BEGIN


);
DECLARE l_description VARCHAR(400);


ALTER TABLE diagnoses2dataset ADD INDEX pseudoid_idx (pseudo_id);
SELECT description INTO l_description
FROM concept cpt
WHERE cpt.dbid = p_concept_id;


INSERT INTO diagnoses2dataset (pseudo_id)
RETURN l_description;


SELECT DISTINCT group_by FROM cohort_patient;
END//
DELIMITER ;


Each column of the dataset is named by joining pieces of identifiable information together by combining the observational type, the timeframe for the analysis and the requested field type.
Usage: SELECT get_concept_desc(pat.gender_concept_id) FROM patient pat;
</syntaxhighlight>•          Function to retrieve the code field on the concept table based on the concept id:<syntaxhighlight lang="sql">
DROP FUNCTION IF EXISTS get_concept_code;


'''Fig 5.1 Shows the naming convention of a dataset column'''
DELIMITER //


[[File:RSD_SQL_Guide_C2_Fig_5.1.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Fig%205.1.png|alt=|frameless|744x744px]]
CREATE FUNCTION get_concept_code(p_concept_id int)
RETURNS VARCHAR(20)
NOT DETERMINISTIC READS SQL DATA
BEGIN


Again, we pre-populate the dataset with the unique identifiers from the patient cohort.
DECLARE l_code VARCHAR(20);


Then we run separate update SQL queries to populate the dataset table by calling multiple stored procedures.
SELECT code INTO l_code
FROM concept cpt
WHERE cpt.dbid = p_concept_id;


===== Patient Medications Dataset =====
RETURN l_code;
A typical medication request would look like this:


[[File:RSD_SQL_Guide_C2_Medication_Request.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Medication%20Request.png|alt=|frameless|1235x1235px]]
END//
DELIMITER ;


For this example, we create the dataset to hold the medication information as follows:
Usage: SELECT get_concept_code(epoc.registration_type_concept_id)
       FROM episode_of_care epoc;
</syntaxhighlight>•          Function to retrieve the scheme field on the concept table based on the concept id:<syntaxhighlight lang="sql">
DROP FUNCTION IF EXISTS get_concept_scheme;


CREATE TABLE medications (
DELIMITER //


  pseudo_id            VARCHAR(255) DEFAULT NULL,
CREATE FUNCTION get_concept_scheme(p_concept_id int)
RETURNS VARCHAR(50)
NOT DETERMINISTIC READS SQL DATA
BEGIN


  nhsnumber            VARCHAR(10)  DEFAULT NULL,
DECLARE l_scheme VARCHAR(50);


  codedate             VARCHAR(20)  DEFAULT NULL,
SELECT scheme INTO l_scheme
FROM concept cpt
WHERE cpt.dbid = p_concept_id;


  codeterm             VARCHAR(255) DEFAULT NULL,
RETURN l_scheme;


  code                 VARCHAR(100) DEFAULT NULL,
END//
DELIMITER ;


codevalue            VARCHAR(100) DEFAULT NULL,
Usage: SELECT get_concept_scheme(ob.non_core_concept_id) FROM observation ob;
</syntaxhighlight>•          Function to retrieve the ods code based on the organization id:<syntaxhighlight lang="sql">
DROP FUNCTION IF EXISTS get_ods;


  codeunit             VARCHAR(100) DEFAULT NULL
DELIMITER //


);
CREATE FUNCTION get_ods(p_org_id bigint)
RETURNS VARCHAR(50)
NOT DETERMINISTIC READS SQL DATA
BEGIN


Again, we pre-populate the dataset with the unique identifiers from the patient cohort.
DECLARE l_ods_code VARCHAR(50);


Then we run separate update SQL queries to populate the dataset table by calling multiple stored procedures.
SELECT ods_code INTO l_ods_code
FROM organization org
WHERE org.id = p_org_id;


= 6. Calling Stored Procedures =
RETURN l_ods_code;
Once we have generated the observation and medication datasets and cohorts, the next step is to populate the datasets using stored procedures.


We construct a series of populate stored procedures which in turn call on other stored procedures to complete the update process.
END//
DELIMITER ;


'''Fig 6.1 Shows the datasets update process using stored procedures'''
Usage: SELECT get_ods(pat.registered_practice_organization_id) FROM patient pat;
</syntaxhighlight>•          Function to retrieve the organization name based on the organization id:<syntaxhighlight lang="sql">
DROP FUNCTION IF EXISTS get_org_name;


[[File:RSD_SQL_Guide_C2_Fig_6.1.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Fig%206.1.png|alt=|frameless|653x653px]]<br />
DELIMITER //
=== Calling the Populate Stored Procedures ===
Create a SQL script to call the stored procedure as follows:


CALL populateCodeDate
CREATE FUNCTION get_org_name(p_org_id bigint)
RETURNS VARCHAR(255)
NOT DETERMINISTIC READS SQL DATA


(0,'StillbirthE','diagnoses2dataset',0,'237364002,161743003',null,null,null,'N');
BEGIN


CALL populateCodeDate
DECLARE l_org_name VARCHAR(255);


(0,'PregnancyInducedHypertensionE','diagnoses2dataset',0,null,'48194001',null,null,'N');
SELECT name INTO l_org_name
FROM organization org
WHERE org.id = p_org_id;


CALL populateCodeDate
RETURN l_org_name;


(0,'PreEclampsiaE','diagnoses2dataset',0,'398254007',null,null,null,'N');
END//
DELIMITER ;


CALL populateCodeDate
Usage: SELECT get_ods(pat.registered_practice_organization_id) FROM patient pat;
</syntaxhighlight>•          Function to retrieve the event type from the description of the concept table:<syntaxhighlight lang="sql">
DROP FUNCTION IF EXISTS get_event_type;


(0,'CholestasisE','diagnoses2dataset',0,'235888006',null,null,null,'N');
DELIMITER //


CALL populateCodeDate
CREATE FUNCTION get_event_type(p_description VARCHAR(400))
RETURNS VARCHAR(50)
NOT DETERMINISTIC NO SQL
BEGIN


(0,'GallstonesE','diagnoses2dataset',0,'266474003,407637009',null,null,null,'N');
DECLARE l_event VARCHAR(50);


CALL populateCodeDate
SELECT substr(p_description,instr(p_description,"(") + 1,
       instr(p_description,")") - instr(p_description,"(") - 1)
       INTO l_event;


(0,'GoutE','diagnoses2dataset',0,'90560007,161451004',null,null,null,'N');
RETURN l_event;


CALL populateCodeDate
END//
DELIMITER ;


(0,'AnkylosingSpondylitisE','diagnoses2dataset',0,'9631008',null,null,null,'N');
Usage: SELECT get_event_type(get_concept_desc(ob.core_concept_id)) event_type
       FROM observation ob;
</syntaxhighlight>•          Function to retrieve the latest medication order issue date based on the medication statement id:<syntaxhighlight lang="sql">
DROP FUNCTION IF EXISTS get_lastest_med_issue_date;


CALL populateCodeDate
DELIMITER //


(0,'JaundiceE','diagnoses2dataset',0,'18165001,161536006',null,null,null,'N');
CREATE FUNCTION get_lastest_med_issue_date (p_medication_stmt_id bigint)
RETURNS DATE
NOT DETERMINISTIC READS SQL DATA
BEGIN


CALL populateCodeDate
DECLARE l_date DATE;
DECLARE l_rnk  INT;


(0,'PsoriasisE','diagnoses2dataset',0,'9014002',null,null,null,'N');
-- get latest issue date


CALL populateCodeDate
  SELECT
          med.clinical_effective_date, med.rnk INTO l_date, l_rnk
  FROM (
          SELECT
            mo.id,
            mo.medication_statement_id,
            mo.clinical_effective_date,
            @currank :=
            IF(@curmedstmt = mo.medication_statement_id,@currank + 1,1) AS rnk,
            @curmedstmt := mo.medication_statement_id AS cur_med_stmt
          FROM medication_order mo,(SELECT @currank := 0, @curmedstmt := 0) r
          WHERE mo.medication_statement_id = p_medication_stmt_id
          ORDER BY mo.medication_statement_id DESC,
                   mo.clinical_effective_date DESC,
                   mo.id DESC -- latest
        ) med
  WHERE med.rnk = 1;


(0,'DeafnessE','diagnoses2dataset',0,'15188001',null,null,null,'N');
RETURN l_date;
 
END//
DELIMITER ;
 
Usage: SELECT DATE_FORMAT(get_lastest_med_issue_date(med.id),"%Y%m%d") date_of_last_issue
       FROM medication_statement med;
</syntaxhighlight>•          Function to retrieve the earliest medication order issue date based on the medication statement id:<syntaxhighlight lang="sql">
DROP FUNCTION IF EXISTS get_earliest_med_issue_date;
 
DELIMITER //
 
CREATE FUNCTION get_earliest_med_issue_date (p_medication_stmt_id bigint)
RETURNS DATE
NOT DETERMINISTIC READS SQL DATA


CALL populateCodeDate
BEGIN


(0,'HearingAidE','diagnoses2dataset',0,'365240006',null,null,null,'N');
DECLARE l_date DATE;
DECLARE l_rnk  INT;


The procedure accepts a list of input parameters (this may vary depending on requirements):
-- get earliest issue date
{| class="wikitable"
 
|Parameter
  SELECT
|Data Type
          med.clinical_effective_date, med.rnk INTO l_date, l_rnk
|Description
  FROM (
|-
          SELECT
|filterType
            mo.id,
|INT
            mo.medication_statement_id,
|1 latest, 0 earliest, 2 ever, 3 pivot
            mo.clinical_effective_date,
|-
            @currank :=
|col
            IF(@curmedstmt = mo.medication_statement_id,@currank + 1,1) AS rnk,
|VARCHAR(100)
            @curmedstmt := mo.medication_statement_id AS cur_med_stmt
|the root of the column name
          FROM medication_order mo,(SELECT @currank := 0, @curmedstmt := 0) r
|-
          WHERE mo.medication_statement_id = p_medication_stmt_id
|datasetTable
          ORDER BY mo.medication_statement_id ASC,
|VARCHAR(100)
                   mo.clinical_effective_date ASC,
|table name of dataset
                   mo.id ASC -- earliest
|-
        ) med
|reset
  WHERE med.rnk = 1;
|BIT
|1 reset, 0 no reset of the dataset table
|-
|codesToAdd1
|VARCHAR(5000)
|include all snomed parents and their  children
|-
|codesToAdd2
|VARCHAR(5000)
|include snomed parents only
|-
|codesToRemove3
|VARCHAR(5000)
|exclude snomed parents and their children
|-
|codesToRemove4
|VARCHAR(5000)
|exclude only snomed parents
|-
|ignorenulls
|VARCHAR(1)
|Y or N - whether to ignore null result  value
|}
A typical populate procedure is outlined below. (Please note this is only an example and may need modifications to it depending on your requirements).


DROP PROCEDURE IF EXISTS populateCodeDate;
RETURN l_date;


DELIMITER //
END//
DELIMITER ;


CREATE PROCEDURE populateCodeDate (
Usage: SELECT DATE_FORMAT(get_earliest_med_issue_date(med.id),"%Y%m%d") issue_date
       FROM medication_statement med;


IN filterType     INT, -- 1 latest, 0 earliest, 2 ever, 3 pivot
</syntaxhighlight>•          Function to retrieve the event log last changed date based on the record id:<syntaxhighlight lang="sql">
DROP FUNCTION IF EXISTS get_event_maxdate;


IN col            VARCHAR(100), -- the root of the column name
DELIMITER //


IN datasetTable   VARCHAR(100), -- table name of dataset
CREATE FUNCTION get_event_maxdate (p_table_id tinyint, p_record_id bigint)
RETURNS DATETIME
READS SQL DATA
BEGIN


IN reset          BIT, -- 1 reset, 0 no reset
DECLARE l_max_date DATETIME;


IN codesToAdd1    VARCHAR(5000), -- all parents and their children
SELECT  MAX(etlg.dt_change) modified_date INTO l_max_date
FROM event_log etlg
WHERE etlg.table_id = p_table_id AND record_id = p_record_id;


IN codesToAdd2    VARCHAR(5000), -- parents only
RETURN l_max_date;


IN codesToRemove3 VARCHAR(5000), -- parents and their children to be excluded
END//
DELIMITER ;


IN codesToRemove4 VARCHAR(5000), -- just parents to be excluded
Usage: SELECT get_event_maxdate(2,pat.id) FROM patient pat;
      Where 2 is the table id for the patient table.
</syntaxhighlight>•          Function to retrieve the event log last changed date for medication order based on the medication statement id:<syntaxhighlight lang="sql">
DROP FUNCTION IF EXISTS get_event_maxdate_medo;


IN ignorenulls    VARCHAR(1)
DELIMITER //
 
)


CREATE FUNCTION get_event_maxdate_medo (p_table_id tinyint, p_record_id bigint)
RETURNS datetime
READS SQL DATA
BEGIN
BEGIN


-- reset snomeds table
DECLARE l_max_date datetime;


DELETE FROM snomeds WHERE cat_id IN (1, 2, 3, 4);
SELECT MAX(etlg.dt_change) INTO l_max_date
FROM event_log etlg JOIN medication_order mo
ON mo.medication_statement_id = p_record_id
AND mo.id = etlg.record_id
WHERE etlg.table_id = p_table_id;


-- reset store table
RETURN l_max_date;


DELETE FROM store WHERE id IN (1, 2, 3, 4);
END//
DELIMITER ;


-- get snomed concept ids
Usage: SELECT get_event_maxdate_medo (9,med.id) FROM medication_statement med  
 
</syntaxhighlight>
  IF codesToAdd1 IS NOT NULL THEN
 
      CALL storeSnomedString(codesToAdd1, 1);
 
      CALL getAllSnomedsConceptIds (1);
 
  END IF;
 
  IF codesToAdd2 IS NOT NULL THEN
 
      CALL storeSnomedString (codesToAdd2, 2);
 
      CALL getAllSnomedsConceptIds (2);
 
  END IF;
 
  IF codesToRemove3 IS NOT NULL THEN
 
      CALL storeSnomedString (codesToRemove3, 3);
 
      CALL getAllSnomedsConceptIds (3);
 
  END IF;
 
  IF codesToRemove4 IS NOT NULL THEN
 
      CALL storeSnomedString (codesToRemove4, 4);
 
      CALL getAllSnomedsConceptIds (4);
 
  END IF;
 
 
 
-- filter observation based on the snomed codes
 
CALL filterObservations(filterType, 1, ignorenulls);
 
-- reset columns
 
IF (reset = 1) THEN
 
  SET @reset_sql = CONCAT('UPDATE ', datasetTable, ' SET ',
 
  col, "Code = null, ",
 
  col, "Term = null, ",
 
  col, "Date = null");
 
 
  PREPARE resetStmt FROM @reset_sql;
 
  EXECUTE resetStmt;
 
  DEALLOCATE PREPARE resetStmt;
 
 
END IF;
 
  DROP TEMPORARY TABLE IF EXISTS qry_tmp;
 
 
  CREATE TEMPORARY TABLE qry_tmp (
 
      row_id                  INT,
 
      group_by                VARCHAR(255),
 
      original_code           VARCHAR(20),
 
      original_term           VARCHAR(200),
 
      clinical_effective_date DATE, PRIMARY KEY(row_id)
 
  ) AS
 
  SELECT (@row_no := @row_no+1) AS row_id,
 
         f.group_by,
 
         f.original_code,
 
         f.original_term,
 
         f.clinical_effective_date
 
  FROM filteredObservations f, (SELECT @row_no := 0) t;
 
 
SET @row_id = 0;
 
WHILE EXISTS (SELECT row_id from qry_tmp
 
             WHERE row_id > @row_id AND row_id <= @row_id + 1000) DO
 
  SET @sql = CONCAT('UPDATE ', datasetTable,
 
             ' d JOIN qry_tmp f ON d.pseudo_id = f.group_by SET ',
 
             col, "Code = f.original_code, ",
 
             col, "Term = f.original_term, ",
 
             col, "Date = date_format(f.clinical_effective_date, '%d/%m/%Y')
 
             WHERE f.row_id > @row_id AND f.row_id <= @row_id + 1000");
 
                   
 
  PREPARE stmt FROM @sql;
 
  EXECUTE stmt;
 
  DEALLOCATE PREPARE stmt;
 
 
  SET @row_id = @row_id + 1000;
 
 
END WHILE;
 
END//
 
DELIMITER ;
 
==== '''Retrieving the SNOMED Concept Ids''' ====
The procedure accepts a list of SNOMED codes as input parameter.
 
'''Fig 6.2 Shows how the SNOMED codes are being retrieved using stored procedures'''
 
[[File:RSD_SQL_Guide_C2_Fig_6.2.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Fig%206.2.png|alt=|frameless|871x871px]]<br />
===== Calling the storeSnomedString procedure =====
We call the storeSnomedString procedure to separate the list of SNOMED codes into their individual values and store them into a temporary table:
 
DROP TABLE IF EXISTS store;
 
CREATE TABLE store (
 
  id           INT,
 
  org_snomed_id BIGINT
 
);
 
ALTER TABLE store ADD INDEX store_idx (org_snomed_id);
 
DROP PROCEDURE IF EXISTS storeSnomedString;
 
DELIMITER //
 
CREATE PROCEDURE storeSnomedString (
 
  IN stringValue VARCHAR(5000),
 
  IN cat_id      INT
 
)
 
BEGIN
 
      DECLARE front       VARCHAR(5000) DEFAULT NULL;
 
      DECLARE frontlen    INT           DEFAULT NULL;
 
      DECLARE TempValue   VARCHAR(5000) DEFAULT NULL;
 
     
 
  processloop:
 
  LOOP
 
      IF LENGTH(TRIM(stringValue)) = 0
 
      OR stringValue IS NULL THEN
 
        LEAVE process loop;
 
      END IF;
 
     
 
  SET front = SUBSTRING_INDEX(stringValue, ',', 1);
 
  SET frontlen = LENGTH(front);
 
  SET TempValue = TRIM(front);
 
 
 
  INSERT INTO store (id, org_snomed_id)
 
  VALUES (cat_id, CAST(TempValue AS SIGNED));
 
  SET stringValue = INSERT(stringValue, 1, frontlen + 1, <nowiki>''</nowiki>);
 
  END LOOP;
 
 
 
END//
 
DELIMITER ;
 
===== Calling the getAllSnomedsConceptIds procedure =====
We call another procedure getAllSnomedsConceptIdsFromSnomedCodes to loop through the temporary table. For each SNOMED code, we retrieve the corresponding SNOMED concept id.
 
We store the resultant set of concept ids into another temporary table where it will be used to query observational data.
 
DROP TABLE IF EXISTS snomeds;
 
CREATE TABLE snomeds (
 
  cat_id               INT,
 
  snomed_concept_id    BIGINT,
 
  core_concept         INT,
 
  snomed_code          BIGINT,
 
  term                 VARCHAR(255)
 
);
 
ALTER TABLE snomeds ADD INDEX sno_cpt_idx(snomed_concept_id);
 
ALTER TABLE snomeds ADD INDEX sno_code_idx(snomed_code);
 
DROP PROCEDURE IF EXISTS getAllSnomedsConceptIds;
 
DELIMITER //
 
CREATE PROCEDURE getAllSnomedsConceptIds(p_cat_id INT)
 
BEGIN
 
  DECLARE done           INT;
 
  DECLARE l_parent_id    BIGINT;
 
  DECLARE c_get_snomeds  CURSOR FOR SELECT org_snomed_id
 
                        FROM store WHERE id = p_cat_id;
 
 
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
 
 
  SET done = 0;
 
 
       OPEN c_get_snomeds;
 
       processloop:
 
       WHILE (done = 0) DO
 
     
 
        FETCH c_get_snomeds INTO l_parent_id;
 
        IF done = 1 THEN
 
           LEAVE processloop;
 
        END IF;
 
           
 
        IF p_cat_id IN (1, 3) THEN
 
           
 
    -- get parent snomed concept id based on the snomed code
 
          INSERT INTO snomeds
 
          SELECT
 
                p_cat_id,
 
                cm.legacy,
 
                cm.core,
 
                c.code,
 
                c.name
 
           FROM concept_map cm JOIN concept c ON c.dbid = cm.core
 
           WHERE c.code = l_parent_id;
 
    -- get all child snomed concept ids based on the parent snomed code
 
           INSERT INTO snomeds
 
           SELECT
 
                 p_cat_id,
 
                 cptm.legacy,
 
                 cptm.core,
 
                 cpt.code,
 
                 cpt.name
 
            FROM concept_tct tc
 
            JOIN snomeds sn ON sn.core_concept = tc.target
 
            AND tc.property = '92842'
 
            JOIN concept_map cptm on cptm.core = tc.source
 
            AND tc.property = '92842'
 
            JOIN concept cpt on cpt.dbid = cptm.core
 
               WHERE sn.snomed_code = l_parent_id;
 
        ELSE
 
               
 
        -- include or exclude parent snomed
 
            INSERT INTO snomeds
 
            SELECT
 
                  p_cat_id,
 
                  cm.legacy,
 
                  cm.core,
 
                  c.code,
 
                  c.name
 
             FROM concept_map cm JOIN concept c ON c.dbid = cm.core
 
             WHERE c.code = l_parent_id;
 
             
 
        END IF;
 
       END WHILE processloop;
 
       CLOSE c_get_snomeds;
 
       SET done = 0;  
 
END //
 
DELIMITER ;
 
Using the concept_tct table, which is a transitive closure table, we are able to find the corresponding child concept ids based on the parent concept id.
 
Match the parent against the value in the target column. This will return all the children in the source column.
 
[[File:RSD_SQL_Guide_C2_Concepts_tct.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Concepts%20tct.png|alt=|frameless]]
 
==== Retrieving filtered Observations ====
Using the prepared observations cohort, the next step is to filter out this cohort based on the SNOMED code list and the timeframe, which have been passed in as input parameters from the populate procedure.
 
'''Fig 6.3 Shows the processing of the FilterObservations procedure'''
 
[[File:RSD_SQL_Guide_C2_Fig_6.3.png|link=https://wiki.discoverydataservice.org/File:RSD%20SQL%20Guide%20C2%20Fig%206.3.png|alt=|frameless|729x729px]]<br />
===== Calling the filterObservations procedure =====
By calling the filterObservations procedure, we pass in the following input parameters:
{| class="wikitable"
|Parameter
|Data Type
|Description
|-
|filterType
|INT
|Timeframe of the analysis. 0 = earliest, 1  = latest, 2 = ever, 3 = pivot around cohort pivot date (6 months), 4 = All  since
|-
|toCreate
|INT
|1 is the default value - to call  createObservationsFromCohort
|-
|ignorenulls
|VARCHAR(1)
|Y or N - whether to ignore null result  value
|}
DROP PROCEDURE IF EXISTS filterObservations;
 
DELIMITER //
 
CREATE PROCEDURE filterObservations (
 
     IN filterType INT,
 
-- 0 earliest, 1 latest, 2 ever,
 
-- 3 pivot around cohort pivot date (6 months),
 
-- 4 allSince
 
  IN toCreate   INT, -- 1 to createObservationsFromCohortv2
 
  IN ignorenulls VARCHAR(1) -- Y or N
 
  )
 
BEGIN
 
IF (toCreate = 1) THEN
 
  CALL createObservationsFromCohort (filterType);
 
END IF;
 
DROP TABLE IF EXISTS filteredObservations;
 
IF (filterType = 0) THEN -- earliest
 
  IF (ignorenulls = 'Y') THEN  
 
  CREATE TABLE filteredObservations AS
 
  SELECT
 
          ob.id,
 
          ob.group_by,
 
          ob.patient_id,
 
          ob.person_id,
 
          ob.original_code,
 
          ob.original_term,
 
          ob.result_value,
 
          ob.clinical_effective_date,
 
          ob.result_value_units,
 
          ob.age_years,
 
          ob.rnk
 
  FROM (
 
          SELECT
 
            o.id,
 
            o.group_by,
 
            o.patient_id,
 
            o.person_id,
 
            o.original_code,
 
            o.original_term,
 
           o.result_value,
 
            o.clinical_effective_date,
 
            o.result_value_units,
 
            o.age_years,
 
            @currank :=
 
                 IF(@curperson = o.person_id, @currank + 1, 1) AS rnk,
 
            @curperson := o.person_id AS cur_person
 
          FROM observationsFromCohort o,
 
          (SELECT @currank := 0, @curperson := 0) r
 
          WHERE o.result_value IS NOT NULL
 
          ORDER BY o.person_id,
 
                   o.clinical_effective_date ASC,
 
                   o.id ASC -- earliest
 
        ) ob
 
  WHERE ob.rnk = 1;
 
 
 
  ELSE
 
 
  CREATE TABLE filteredObservations AS
 
  SELECT
 
          ob.id,
 
          ob.group_by,
 
          ob.patient_id,
 
          ob.person_id,
 
          ob.original_code,
 
          ob.original_term,
 
          ob.result_value,
 
          ob.clinical_effective_date,
 
          ob.result_value_units,
 
          ob.age_years,
 
          ob.rnk
 
  FROM (
 
          SELECT
 
            o.id,
 
            o.group_by,
 
            o.patient_id,
 
            o.person_id,
 
            o.original_code,
 
            o.original_term,
 
            o.result_value,
 
            o.clinical_effective_date,
 
            o.result_value_units,
 
            o.age_years,
 
            @currank :=
 
                  IF(@curperson = o.person_id, @currank + 1, 1) AS rnk,
 
            @curperson := o.person_id AS cur_person
 
          FROM observationsFromCohort o,
 
          (SELECT @currank := 0, @curperson := 0) r
 
          ORDER BY o.person_id,
 
                   o.clinical_effective_date ASC,
 
                   o.id ASC -- earliest
 
        ) ob
 
  WHERE ob.rnk = 1;
 
 
 
  END IF;
 
 
ELSEIF (filterType = 2 OR filterType = 4) THEN -- ever or allSince
 
 
  IF (ignorenulls = 'Y') THEN
 
 
  CREATE TABLE filteredObservations AS
 
  SELECT DISTINCT
 
          mc.group_by,
 
          mc.patient_id,
 
          mc.person_id,
 
          mc.original_code,
 
          mc.original_term,
 
          mc.result_value,
 
          mc.clinical_effective_date,
 
          mc.result_value_units,
 
          mc.age_years
 
  FROM observationsFromCohort mc
 
  WHERE mc.result_value IS NOT NULL;
 
 
  ELSE
 
 
  CREATE TABLE filteredObservations AS
 
  SELECT DISTINCT
 
          mc.group_by,
 
          mc.patient_id,
 
          mc.person_id,
 
          mc.original_code,
 
          mc.original_term,
 
          mc.result_value,
 
          mc.clinical_effective_date,
 
          mc.result_value_units,
 
          mc.age_years
 
  FROM observationsFromCohort mc;
 
 
 
  END IF;
 
 
ELSEIF (filterType = 1 or filterType = 3) THEN -- latest or pivot
 
IF (ignorenulls = 'Y') THEN
 
CREATE TABLE filteredObservations as
 
  SELECT
 
           ob.id,
 
           ob.group_by,
 
           ob.patient_id,
 
           ob.person_id,
 
           ob.original_code,
 
           ob.original_term,
 
           ob.result_value,
 
           ob.clinical_effective_date,
 
           ob.result_value_units,
 
           ob.age_years,
 
           ob.rnk
 
  FROM (
 
          SELECT
 
            o.id,
 
            o.group_by,
 
            o.patient_id,
 
            o.person_id,
 
            o.original_code,
 
            o.original_term,
 
            o.result_value,
 
            o.clinical_effective_date,
 
            o.result_value_units,
 
            o.age_years,
 
            @currank :=
 
                IF(@curperson = o.person_id, @currank + 1, 1) AS rnk,
 
            @curperson := o.person_id AS cur_person
 
          FROM observationsFromCohort o,
 
          (SELECT @currank := 0, @curperson := 0) r
 
          WHERE o.result_value IS NOT NULL
 
          ORDER BY o.person_id,
 
                   o.clinical_effective_date DESC,
 
                   o.id DESC -- latest
 
        ) ob
 
  WHERE ob.rnk = 1;
 
 
 
ELSE
 
  CREATE TABLE filteredObservationsV2 as
 
  SELECT
 
           ob.id,
 
           ob.group_by,
 
           ob.patient_id,
 
           ob.person_id,
 
           ob.original_code,
 
           ob.original_term,
 
           ob.result_value,
 
           ob.clinical_effective_date,
 
           ob.result_value_units,
 
           ob.age_years,
 
           ob.rnk
 
  FROM (
 
          SELECT
 
             o.id,
 
             o.group_by,
 
             o.patient_id,
 
             o.person_id,
 
             o.original_code,
 
             o.original_term,
 
             o.result_value,
 
             o.clinical_effective_date,
 
             o.result_value_units,
 
             o.age_years,
 
             @currank :=
 
                  IF(@curperson = o.person_id, @currank + 1, 1) AS rnk,
 
             @curperson := o.person_id AS cur_person
 
          FROM observationsFromCohort o,
 
          (SELECT @currank := 0, @curperson := 0) r
 
          ORDER BY o.person_id,
 
                   o.clinical_effective_date DESC,
 
                   o.id DESC -- latest
 
         ) ob
 
  WHERE ob.rnk = 1;
 
 
 
END IF;
 
ELSE
 
  SIGNAL SQLSTATE '45000'
 
  SET MESSAGE_TEXT = 'filterType not recognised';
 
END IF;
 
END//
 
DELIMITER ;
 
===== Calling the createObservationsFromCohort procedure =====
From the filterObservations procedure, we call the createobservationsFromCohort procedure, passing in the filter type as the input parameter:
 
CALL createObservationsFromCohort (filterType);
 
DROP PROCEDURE IF EXISTS createObservationsFromCohort;
 
DELIMITER //
 
CREATE PROCEDURE createObservationsFromCohort (
 
  IN filterType INT
 
  -- 0 earliest, 1 latest, 2 ever,
 
  -- 3 pivot around cohort_patient pivot date (6 months)
 
)
 
BEGIN
 
DROP TEMPORARY TABLE IF EXISTS snomeds_tmp;
 
CREATE TEMPORARY TABLE snomeds_tmp AS
 
SELECT cat_id,
 
      snomed_concept_id
 
FROM snomeds
 
WHERE cat_id IN (3, 4);
 
-- delete from snomeds all snomed_concept ids to be excluded
 
DELETE t1 FROM snomeds t1 JOIN snomeds_tmp t2
 
ON t1.snomed_concept_id = t2.snomed_concept_id
 
WHERE t1.cat_id IN (1, 2);
 
DROP TABLE IF EXISTS observationsFromCohort;
 
IF (filterType = 3) THEN
 
-- pivot over 6 months from pivot date (already set in cohort)
 
CREATE TABLE observationsFromCohort AS
 
      SELECT DISTINCT
 
             o.id,
 
             o.patient_id,
 
             o.person_id,
 
             o.group_by,
 
             o.clinical_effective_date,
 
             o.non_core_concept_id AS original_code,
 
             SUBSTRING(s.term, 1, 200) AS original_term,
 
             o.result_value,
 
             o.result_value_units,
 
             o.age_years
 
     FROM cohort_observations o
 
          JOIN snomeds s ON s.snomed_concept_id = o.non_core_concept_id
 
     WHERE o.clinical_effective_date IS NOT NULL
 
     AND o.clinical_effective_date
 
     BETWEEN DATE_SUB(c.pivot_date, INTERVAL 6 MONTH)
 
     AND DATE_SUB(c.pivot_date, INTERVAL -6 MONTH);
 
ELSE
 
-- latest or earliest or ever (so all observations)
 
CREATE TABLE observationsFromCohort AS
 
     SELECT DISTINCT
 
            o.id,
 
            o.patient_id,
 
            o.person_id,
 
            o.group_by,
 
            o.clinical_effective_date,
 
            o.non_core_concept_id AS original_code,
 
            SUBSTRING(s.term, 1, 200) AS original_term,
 
            o.result_value,
 
            o.result_value_units,
 
            o.age_years
 
     FROM cohort_observations o
 
          JOIN snomeds s ON s.snomed_concept_id = o.non_core_concept_id
 
     WHERE o.clinical_effective_date IS NOT NULL;
 
END IF;
 
ALTER TABLE observationsFromCohort ADD INDEX obv_pat_idx(patient_id);
 
END//
 
DELIMITER ;
 
This returns a new observations cohort based on the SNOMED code list.
 
===== Filter Observations by Timeframe =====
Using the new observations cohort, we pass in the filter type to retrieve the observations based on the timeframe for the analysis, and using the clinical effective date to determine the time intervals. We store the result set into a temporary table. This is illustrated in the code snippet of the filterObservations procedure below:
 
  CREATE TABLE filteredObservationsV2 as
 
  SELECT
 
          ob.id,
 
          ob.group_by,
 
          ob.patient_id,
 
          ob.person_id,
 
          ob.original_code,
 
          ob.original_term,
 
          ob.result_value,
 
          ob.clinical_effective_date,
 
          ob.result_value_units,
 
          ob.age_years,
 
          ob.rnk
 
  FROM (
 
          SELECT o.id,
 
                 o.group_by,
 
                 o.patient_id,
 
                 o.person_id,
 
                 o.original_code,
 
                 o.original_term,
 
                 o.result_value,
 
                 o.clinical_effective_date,
 
                 o.result_value_units,
 
                 o.age_years,
 
                 @currank :=
 
                    IF(@curperson = o.person_id, @currank + 1, 1) AS rnk,
 
                 @curperson := o.person_id AS cur_person
 
          FROM observationsFromCohort o,
 
          (SELECT @currank := 0, @curperson := 0) r
 
          WHERE o.result_value IS NOT NULL
 
          ORDER BY o.person_id,
 
                   o.clinical_effective_date ASC,
 
                   o.id ASC  -- earliest
 
        ) ob
 
  WHERE ob.rnk = 1;
 
==== Updating Dataset Table Columns ====
Using the result set derived from the above step, we can now update the dataset table columns with the appropriate observational values. This is illustrated in the code snippet of the populateCodeDate procedure:
 
  DROP TEMPORARY TABLE IF EXISTS qry_tmp;
 
 
  CREATE TEMPORARY TABLE qry_tmp (
 
      row_id                  INT,
 
      group_by                VARCHAR(255),
 
      original_code           VARCHAR(20),
 
      original_term           VARCHAR(200),
 
      clinical_effective_date DATE, PRIMARY KEY(row_id)
 
  ) AS
 
  SELECT (@row_no := @row_no+1) AS row_id,
 
         f.group_by,
 
         f.original_code,
 
         f.original_term,
 
         f.clinical_effective_date
 
  FROM filteredObservationsV2 f, (SELECT @row_no := 0) t;
 
 
SET @row_id = 0;
 
WHILE EXISTS (SELECT row_id from qry_tmp
 
             WHERE row_id > @row_id AND row_id <= @row_id + 1000) DO
 
           
 
  SET @sql = CONCAT('UPDATE ', datasetTable,
 
              ' d JOIN qry_tmp f ON d.pseudo_id = f.group_by SET ',
 
              col, "Code = f.original_code, ",
 
              col, "Term = f.original_term, ",
 
              col, "Date = date_format(f.clinical_effective_date, '%d/%m/%Y')
 
              WHERE f.row_id > @row_id AND f.row_id <= @row_id + 1000");
 
             
 
  PREPARE stmt FROM @sql;
 
  EXECUTE stmt;
 
  DEALLOCATE PREPARE stmt;
 
  SET @row_id = @row_id + 1000;
 
 
END WHILE;
 
To populate medication datasets, we use a similar approach as above.
 
= APPENDIX =
Some supplementary material that maybe helpful when developing SQL query, in particular, working with concept ids.
 
=== Helper Functions ===
•          Function to retrieve the description field on the concept table based on the concept id:
 
DELIMITER //
 
CREATE FUNCTION get_concept_desc(p_concept_id int)
 
RETURNS VARCHAR(400)
 
NOT DETERMINISTIC READS SQL DATA
 
BEGIN
 
DECLARE l_description VARCHAR(400);
 
SELECT description INTO l_description
 
FROM concept cpt
 
WHERE cpt.dbid = p_concept_id;
 
RETURN l_description;
 
END//
 
DELIMITER ;
 
Usage: SELECT get_concept_desc(pat.gender_concept_id) FROM patient pat;
 
•          Function to retrieve the code field on the concept table based on the concept id:
 
DROP FUNCTION IF EXISTS get_concept_code;
 
DELIMITER //
 
CREATE FUNCTION get_concept_code(p_concept_id int)
 
RETURNS VARCHAR(20)
 
NOT DETERMINISTIC READS SQL DATA
 
BEGIN
 
DECLARE l_code VARCHAR(20);
 
SELECT code INTO l_code
 
FROM concept cpt
 
WHERE cpt.dbid = p_concept_id;
 
RETURN l_code;
 
END//
 
DELIMITER ;
 
Usage: SELECT get_concept_code(epoc.registration_type_concept_id)
 
      FROM episode_of_care epoc;
 
•          Function to retrieve the scheme field on the concept table based on the concept id:
 
DROP FUNCTION IF EXISTS get_concept_scheme;
 
DELIMITER //
 
CREATE FUNCTION get_concept_scheme(p_concept_id int)
 
RETURNS VARCHAR(50)
 
NOT DETERMINISTIC READS SQL DATA
 
BEGIN
 
DECLARE l_scheme VARCHAR(50);
 
SELECT scheme INTO l_scheme
 
FROM concept cpt
 
WHERE cpt.dbid = p_concept_id;
 
RETURN l_scheme;
 
END//
 
DELIMITER ;
 
Usage: SELECT get_concept_scheme(ob.non_core_concept_id)
 
      FROM observation ob;
 
•          Function to retrieve the ods code based on the organization id:
 
DROP FUNCTION IF EXISTS get_ods;
 
DELIMITER //
 
CREATE FUNCTION get_ods(p_org_id bigint)
 
RETURNS VARCHAR(50)
 
NOT DETERMINISTIC READS SQL DATA
 
BEGIN
 
DECLARE l_ods_code VARCHAR(50);
 
SELECT ods_code INTO l_ods_code
 
FROM organization org
 
WHERE org.id = p_org_id;
 
RETURN l_ods_code;
 
END//
 
DELIMITER ;
 
Usage: SELECT get_ods(pat.registered_practice_organization_id) FROM patient pat;
 
•          Function to retrieve the organization name based on the organization id:
 
DROP FUNCTION IF EXISTS get_org_name;
 
DELIMITER //
 
CREATE FUNCTION get_org_name(p_org_id bigint)
 
RETURNS VARCHAR(255)
 
NOT DETERMINISTIC READS SQL DATA
 
BEGIN
 
DECLARE l_org_name VARCHAR(255);
 
SELECT name INTO l_org_name
 
FROM organization org
 
WHERE org.id = p_org_id;
 
RETURN l_org_name;
 
END//
 
DELIMITER ;
 
Usage: SELECT get_ods(pat.registered_practice_organization_id)
 
      FROM patient pat;
 
•          Function to retrieve the event type from the description of the concept table:
 
DROP FUNCTION IF EXISTS get_event_type;
 
DELIMITER //
 
CREATE FUNCTION get_event_type(p_description VARCHAR(400))
 
RETURNS VARCHAR(50)
 
NOT DETERMINISTIC NO SQL
 
BEGIN
 
DECLARE l_event VARCHAR(50);
 
SELECT substr(p_description,instr(p_description,"(") + 1,
 
            instr(p_description,")") - instr(p_description,"(") - 1)
 
             INTO l_event;
 
RETURN l_event;
 
END//
 
DELIMITER ;
 
Usage: SELECT get_event_type(get_concept_desc(ob.core_concept_id)) event_type
 
      FROM observation ob;
 
•          Function to retrieve the latest medication order issue date based on the medication statement id:
 
DROP FUNCTION IF EXISTS get_lastest_med_issue_date;
 
DELIMITER //
 
CREATE FUNCTION get_lastest_med_issue_date (p_medication_stmt_id bigint)
 
RETURNS DATE
 
NOT DETERMINISTIC READS SQL DATA
 
BEGIN
 
DECLARE l_date DATE;
 
DECLARE l_rnk  INT;
 
-- get latest issue date
 
  SELECT
 
          med.clinical_effective_date, med.rnk INTO l_date, l_rnk
 
  FROM (
 
          SELECT
 
            mo.id,
 
            mo.medication_statement_id,
 
            mo.clinical_effective_date,
 
            @currank :=
 
            IF(@curmedstmt = mo.medication_statement_id,@currank + 1,1) AS rnk,
 
            @curmedstmt := mo.medication_statement_id AS cur_med_stmt
 
          FROM medication_order mo,(SELECT @currank := 0, @curmedstmt := 0) r
 
          WHERE mo.medication_statement_id = p_medication_stmt_id
 
          ORDER BY mo.medication_statement_id DESC,
 
                   mo.clinical_effective_date DESC,
 
                   mo.id DESC -- latest
 
        ) med
 
  WHERE med.rnk = 1;
 
RETURN l_date;
 
END//
 
DELIMITER ;
 
Usage: SELECT DATE_FORMAT(get_lastest_med_issue_date(med.id),"%Y%m%d") date_of_last_issue
 
      FROM medication_statement med;
 
•          Function to retrieve the earliest medication order issue date based on the medication statement id:
 
DROP FUNCTION IF EXISTS get_earliest_med_issue_date;
 
DELIMITER //
 
CREATE FUNCTION get_earliest_med_issue_date (p_medication_stmt_id bigint)
 
RETURNS DATE
 
NOT DETERMINISTIC READS SQL DATA
 
BEGIN
 
DECLARE l_date DATE;
 
DECLARE l_rnk  INT;
 
-- get earliest issue date
 
  SELECT
 
          med.clinical_effective_date, med.rnk INTO l_date, l_rnk
 
  FROM (
 
          SELECT
 
            mo.id,
 
            mo.medication_statement_id,
 
            mo.clinical_effective_date,
 
            @currank :=
 
            IF(@curmedstmt = mo.medication_statement_id,@currank + 1,1) AS rnk,
 
            @curmedstmt := mo.medication_statement_id AS cur_med_stmt
 
          FROM medication_order mo,(SELECT @currank := 0, @curmedstmt := 0) r
 
          WHERE mo.medication_statement_id = p_medication_stmt_id
 
          ORDER BY mo.medication_statement_id ASC,
 
                   mo.clinical_effective_date ASC,
 
                   mo.id ASC -- earliest
 
        ) med
 
  WHERE med.rnk = 1;
 
RETURN l_date;
 
END//
 
DELIMITER ;
 
Usage: SELECT DATE_FORMAT(get_earliest_med_issue_date(med.id),"%Y%m%d") issue_date
 
      FROM medication_statement med;
 
•          Function to retrieve the event log last changed date based on the record id:
 
DROP FUNCTION IF EXISTS get_event_maxdate;
 
DELIMITER //
 
CREATE FUNCTION get_event_maxdate (p_table_id tinyint, p_record_id bigint)
 
RETURNS DATETIME
 
READS SQL DATA
 
BEGIN
 
DECLARE l_max_date DATETIME;
 
SELECT  MAX(etlg.dt_change) modified_date INTO l_max_date
 
FROM event_log etlg
 
WHERE etlg.table_id = p_table_id AND record_id = p_record_id;
 
RETURN l_max_date;
 
END//
 
DELIMITER ;
 
Usage: SELECT get_event_maxdate(2,pat.id) FROM patient pat;
 
Where 2 is the table id for the patient table.
 
•          Function to retrieve the event log last changed date for medication order based on the medication statement id:
 
DROP FUNCTION IF EXISTS get_event_maxdate_medo;
 
DELIMITER //
 
CREATE FUNCTION get_event_maxdate_medo (p_table_id tinyint, p_record_id bigint)
 
RETURNS datetime
 
READS SQL DATA
 
BEGIN
 
DECLARE l_max_date datetime;
 
SELECT MAX(etlg.dt_change) INTO l_max_date
 
FROM event_log etlg JOIN medication_order mo
 
ON mo.medication_statement_id = p_record_id
 
AND mo.id = etlg.record_id
 
WHERE etlg.table_id = p_table_id;
 
RETURN l_max_date;
 
END//
 
DELIMITER ;
 
=== SQL Statement Examples ===
'''Patient Information'''
 
SELECT
 
     pat.id patient_id,
 
    get_ods(pat.registered_practice_organization_id) source_organization,
 
     IF (pat.date_of_death IS NULL,
 
        TIMESTAMPDIFF(YEAR,pat.date_of_birth,CURDATE()),
 
        TIMESTAMPDIFF(YEAR,pat.date_of_birth,pat.date_of_death)) age,
 
     get_concept_desc(pat.gender_concept_id) sex,
 
    DATE_FORMAT(epoc.startregdate,"%Y%m%d") start_regdate,
 
    get_concept_code(epoc.registration_type_concept_id)  reg_status,
 
     pat.nhs_number nhsnumber,
 
     pad.postcode,
 
     DATE_FORMAT(pat.date_of_birth,"%Y%m%d") date_of_birth,
 
     epoc.usualgp,
 
     pat.first_names forename,
 
     pat.last_name surname,
 
    get_concept_desc(pat.ethnic_code_concept_id) ethnic_category,
 
    DATE_FORMAT(pat.date_of_death,"%Y%m%d") date_of_death,
 
    DATE_FORMAT(epoc.endregdate,"%Y%m%d") end_reg_date,
 
     get_optout_status(pat.id, pat.registered_practice_organization_id) optout_status,
 
     pad.address_line_1,
 
     pad.address_line_2,
 
     pad.address_line_3,
 
     pad.address_line_4,
 
    pad.city,
 
    NULLIF(GREATEST(COALESCE(get_event_maxdate(2,pat.id),0),
 
                    COALESCE(get_event_maxdate(20,pad.id),0),
 
                    COALESCE(get_event_maxdate(6,epoc.id),0)),0) last_modified_date
 
FROM patient pat
 
LEFT JOIN patient_address pad ON pat.current_address_id = pad.id AND pat.id = pad.patient_id
 
LEFT JOIN
 
  (SELECT
 
          epc.id,
 
          epc.organization_id,
 
          epc.patient_id,
 
          epc.registration_type_concept_id,
 
          epc.registration_status_concept_id,
 
          epc.date_registered startregdate,
 
          epc.date_registered_end endregdate,
 
          epc.usual_gp_practitioner_id,
 
          prac.name usualgp
 
  FROM episode_of_care epc LEFT JOIN practitioner prac
 
  ON prac.id = epc.usual_gp_practitioner_id
 
  AND prac.organization_id = epc.organization_id) epoc
 
  ON epoc.patient_id = pat.id
 
  AND epoc.organization_id = pat.registered_practice_organization_id;
 
'''Prescription Information'''
 
SELECT
 
     med.id event_id,
 
     get_ods(med.organization_id) source_organization,
 
     med.patient_id patient_id,
 
     pat.nhs_number nhs_number,
 
    get_concept_desc(med.authorisation_type_concept_id)  issue_type,
 
    DATE_FORMAT(get_earliest_med_issue_date(med.id),"%Y%m%d") issue_date,
 
    DATE_FORMAT(get_lastest_med_issue_date(med.id),"%Y%m%d") date_of_last_issue,
 
    DATE_FORMAT(med.clinical_effective_date,"%Y%m%d") recorded_date,
 
     get_concept_code(med.core_concept_id) dmdcode,
 
     med.dose dosage,
 
    med.quantity_value quantity,
 
     med.quantity_unit quantity_unit,
 
     get_concept_desc(med.core_concept_id) dmd_description,
 
    NULLIF(GREATEST(COALESCE(get_event_maxdate(10,med.id),0),
 
                    COALESCE(get_event_maxdate_medo(9,med.id),0)),0) last_modified_date
 
FROM medication_statement med JOIN patient pat ON med.patient_id = pat.id;
 
'''Clinical Event Information'''
 
SELECT
 
     ob.id event_id,
 
     get_ods(ob.organization_id) source_organization,
 
     ob.patient_id,
 
     pat.nhs_number nhs_number,
 
    DATE_FORMAT(ob.clinical_effective_date,"%Y%m%d") effective_date,
 
     get_concept_code(ob.non_core_concept_id) original_code,
 
     get_concept_desc(ob.non_core_concept_id) original_term,
 
    get_concept_desc(get_concept_scheme(ob.non_core_concept_id)) original_code_scheme,
 
     ob.result_value result_value,
 
     CASE
 
     WHEN ob.is_problem = 1 AND ob.is_review = 0 THEN 'N - New'
 
    WHEN ob.is_problem = 1 AND ob.is_review = 1 THEN 'F - Follow Up'
 
     ELSE 'O - Other'
 
     END AS episode_type,
 
     prac.role_desc hcp_role,
 
     enc.type encounter_type,
 
     get_concept_code(ob.core_concept_id) snomed_concept_id,
 
     get_concept_desc(ob.core_concept_id) snomed_term,
 
     ob.parent_observation_id parent_event,
 
     get_ods(enc.organization_id) event_location,
 
     get_org_name(enc.organization_id) location_name,
 
     enc.appointment_id appointment_slot_id,
 
     get_event_type(get_concept_desc(ob.core_concept_id)) event_type,
 
    NULLIF(GREATEST(COALESCE(get_event_maxdate(11,ob.id),0),
 
                    COALESCE(get_event_maxdate(5,enc.id),0),
 
                    COALESCE(get_event_maxdate(13,prac.id),0)),0) last_modified_date
 
FROM observation ob JOIN patient pat ON ob.patient_id = pat.id
 
LEFT JOIN practitioner prac ON prac.id = ob.practitioner_id
 
LEFT JOIN encounter enc ON enc.id = ob.encounter_id;
 
'''Patient Registration History Information'''
 
SELECT
 
     get_ods(epoc.organization_id) source_origanization,
 
     pat.nhs_number,
 
    get_concept_code(epoc.registration_status_concept_id) reg_status,
 
    get_concept_desc(epoc.registration_status_concept_id) reg_status_desc,
 
     get_concept_code(epoc.registration_type_concept_id) registration_type,
 
    get_concept_desc(epoc.registration_type_concept_id) registration_type_desc,
 
    DATE_FORMAT(epoc.date_registered,"%Y%m%d") date_registered,
 
    DATE_FORMAT(epoc.date_registered_end,"%Y%m%d") date_registered_end,
 
     epoc.id sequence_number,
 
    NULLIF(COALESCE(get_event_maxdate(6,epoc.id),0),0)  last_modified_date
 
FROM episode_of_care epoc JOIN patient pat ON epoc.patient_id = pat.id;
 
'''Patient Address Information'''
 
SELECT
 
     padr.id address_id,
 
     padr.address_line_1 address1,
 
     padr.address_line_2 address2,
 
     padr.address_line_3 address3,
 
     padr.address_line_4 address4,
 
     padr.city address5,
 
     padr.postcode postcode,
 
     parn.uprn,
 
     DATE_FORMAT(padr.start_date,"%Y%m%d") start_date,
 
    DATE_FORMAT(padr.end_date,"%Y%m%d") end_date,
 
     padr.lsoa_2001_code,
 
     padr.lsoa_2011_code,
 
     padr.msoa_2001_code,
 
     padr.msoa_2011_code,
 
     pat.nhs_number,
 
     parn.abp_address_number matched_address_1,
 
     parn.abp_address_street matched_address_2,
 
     parn.abp_address_locality matched_address_3,
 
     parn.abp_address_town matched_address_4,
 
     parn.abp_address_postcode matched_postcode,
 
     parn.classification,
 
     parn.abp_address_organization business_name,
 
     parn.match_date uprn_match_date,
 
     parn.status uprn_status,
 
     parn.latitude,
 
     parn.longitude,
 
    NULLIF(COALESCE(get_event_maxdate(20,padr.id),0),0)  last_modified_date
 
FROM patient_address padr JOIN patient pat ON padr.patient_id = pat.id
 
LEFT JOIN patient_address_match parn ON padr.id = parn.id;
 
'''Appointment Slot Information'''
 
SELECT
 
     appt.id slot_id,
 
     get_ods(appt.organization_id) source_origanization,
 
     appt.schedule_id session_id,
 
     DATE_FORMAT(appt.start_date,"%Y%m%d %T") start_time,
 
     appt.planned_duration planned_duration,
 
     appt.actual_duration actual_duration,
 
    get_concept_desc(appt.appointment_status_concept_id) status,
 
    DATE_FORMAT(appt.date_time_sent_in,"%Y%m%d %T") actual_start_time,
 
     appt.patient_wait wait_time,
 
    DATE_FORMAT(appt.cancelled_date,"%Y%m%d %T") date_cancelled,
 
    DATE_FORMAT(appt.date_time_left,"%Y%m%d %T") time_left,
 
     pat.nhs_number,
 
     NULLIF(COALESCE(get_event_maxdate(18,appt.id),0),0)  last_modified_date
 
FROM appointment appt JOIN patient pat ON appt.patient_id = pat.id;
 
'''Encounter Information'''
 
SELECT
 
     enc.id encounter_id,
 
     get_ods(enc.organization_id) source_organization,
 
     pat.nhs_number,
 
      DATE_FORMAT(enc.clinical_effective_date,"%Y%m%d %T") encounter_startdate,
 
     DATE_FORMAT(enc.end_date,"%Y%m%d %T") encounter_enddate,
 
     get_concept_desc(enc.non_core_concept_id) original_type,
 
     prac.role_desc hcp_role,
 
     enc.type encounter_type,
 
    get_ods(enc.service_provider_organization_id) event_organization,
 
     enc.appointment_id appointment_slot_id,
 
    NULLIF(GREATEST(COALESCE(get_event_maxdate(5,enc.id),0),
 
                     COALESCE(get_event_maxdate(13,prac.id),0)),0) last_modified_date
 
FROM encounter enc JOIN patient pat ON enc.patient_id = pat.id


=== SQL Statement Examples ===
'''Patient Information'''<syntaxhighlight lang="sql">
SELECT
     pat.id patient_id,
     get_ods(pat.registered_practice_organization_id) source_organization,
     IF (pat.date_of_death IS NULL,
        TIMESTAMPDIFF(YEAR,pat.date_of_birth,CURDATE()),
        TIMESTAMPDIFF(YEAR,pat.date_of_birth,pat.date_of_death)) age,
     get_concept_desc(pat.gender_concept_id) sex,
     DATE_FORMAT(epoc.startregdate,"%Y%m%d") start_regdate,
     get_concept_code(epoc.registration_type_concept_id)  reg_status,
     pat.nhs_number nhsnumber,
     pad.postcode,
     DATE_FORMAT(pat.date_of_birth,"%Y%m%d") date_of_birth,
     epoc.usualgp,
     pat.first_names forename,
     pat.last_name surname,
     get_concept_desc(pat.ethnic_code_concept_id) ethnic_category,
     DATE_FORMAT(pat.date_of_death,"%Y%m%d") date_of_death,
     DATE_FORMAT(epoc.endregdate,"%Y%m%d") end_reg_date,
     get_optout_status(pat.id, pat.registered_practice_organization_id) optout_status,
     pad.address_line_1,
     pad.address_line_2,
     pad.address_line_3,
     pad.address_line_4,
     pad.city,
     NULLIF(GREATEST(COALESCE(get_event_maxdate(2,pat.id),0),
                     COALESCE(get_event_maxdate(20,pad.id),0),
                     COALESCE(get_event_maxdate(6,epoc.id),0)),0) last_modified_date
FROM patient pat
LEFT JOIN patient_address pad ON pat.current_address_id = pad.id AND pat.id = pad.patient_id
LEFT JOIN
  (SELECT
          epc.id,
          epc.organization_id,
          epc.patient_id,
          epc.registration_type_concept_id,
          epc.registration_status_concept_id,
          epc.date_registered startregdate,
          epc.date_registered_end endregdate,
          epc.usual_gp_practitioner_id,
          prac.name usualgp
  FROM episode_of_care epc LEFT JOIN practitioner prac
  ON prac.id = epc.usual_gp_practitioner_id
  AND prac.organization_id = epc.organization_id) epoc
  ON epoc.patient_id = pat.id
  AND epoc.organization_id = pat.registered_practice_organization_id;
</syntaxhighlight>'''Prescription Information'''<syntaxhighlight lang="sql">
SELECT
     med.id event_id,
     get_ods(med.organization_id) source_organization,
     med.patient_id patient_id,
     pat.nhs_number nhs_number,
     get_concept_desc(med.authorisation_type_concept_id)  issue_type,
     DATE_FORMAT(get_earliest_med_issue_date(med.id),"%Y%m%d") issue_date,
     DATE_FORMAT(get_lastest_med_issue_date(med.id),"%Y%m%d") date_of_last_issue,
     DATE_FORMAT(med.clinical_effective_date,"%Y%m%d") recorded_date,
     get_concept_code(med.core_concept_id) dmdcode,
     med.dose dosage,
     med.quantity_value quantity,
     med.quantity_unit quantity_unit,
     get_concept_desc(med.core_concept_id) dmd_description,
     NULLIF(GREATEST(COALESCE(get_event_maxdate(10,med.id),0),
                     COALESCE(get_event_maxdate_medo(9,med.id),0)),0) last_modified_date
FROM medication_statement med JOIN patient pat ON med.patient_id = pat.id;
</syntaxhighlight>'''Clinical Event Information'''<syntaxhighlight lang="sql">
SELECT
     ob.id event_id,
     get_ods(ob.organization_id) source_organization,
     ob.patient_id,
     pat.nhs_number nhs_number,
     DATE_FORMAT(ob.clinical_effective_date,"%Y%m%d") effective_date,
     get_concept_code(ob.non_core_concept_id) original_code,
     get_concept_desc(ob.non_core_concept_id) original_term,
     get_concept_desc(get_concept_scheme(ob.non_core_concept_id)) original_code_scheme,
     ob.result_value result_value,
     CASE
     WHEN ob.is_problem = 1 AND ob.is_review = 0 THEN 'N - New'
     WHEN ob.is_problem = 1 AND ob.is_review = 1 THEN 'F - Follow Up'
     ELSE 'O - Other'
     END AS episode_type,
     prac.role_desc hcp_role,
     enc.type encounter_type,
     get_concept_code(ob.core_concept_id) snomed_concept_id,
     get_concept_desc(ob.core_concept_id) snomed_term,
     ob.parent_observation_id parent_event,
     get_ods(enc.organization_id) event_location,
     get_org_name(enc.organization_id) location_name,
     enc.appointment_id appointment_slot_id,
     get_event_type(get_concept_desc(ob.core_concept_id)) event_type,
    NULLIF(GREATEST(COALESCE(get_event_maxdate(11,ob.id),0),
                    COALESCE(get_event_maxdate(5,enc.id),0),
                    COALESCE(get_event_maxdate(13,prac.id),0)),0) last_modified_date
FROM observation ob JOIN patient pat ON ob.patient_id = pat.id
LEFT JOIN practitioner prac ON prac.id = ob.practitioner_id
LEFT JOIN encounter enc ON enc.id = ob.encounter_id;
</syntaxhighlight>'''Patient Registration History Information'''<syntaxhighlight lang="sql">
SELECT
     get_ods(epoc.organization_id) source_origanization,
     pat.nhs_number,
     get_concept_code(epoc.registration_status_concept_id) reg_status,
     get_concept_desc(epoc.registration_status_concept_id) reg_status_desc,
     get_concept_code(epoc.registration_type_concept_id) registration_type,
     get_concept_desc(epoc.registration_type_concept_id) registration_type_desc,
     DATE_FORMAT(epoc.date_registered,"%Y%m%d") date_registered,
     DATE_FORMAT(epoc.date_registered_end,"%Y%m%d") date_registered_end,
     epoc.id sequence_number,
     NULLIF(COALESCE(get_event_maxdate(6,epoc.id),0),0)  last_modified_date
FROM episode_of_care epoc JOIN patient pat ON epoc.patient_id = pat.id;
</syntaxhighlight>'''Patient Address Information'''<syntaxhighlight lang="sql">
SELECT
     padr.id address_id,
     padr.address_line_1 address1,
     padr.address_line_2 address2,
     padr.address_line_3 address3,
     padr.address_line_4 address4,
     padr.city address5,
     padr.postcode postcode,
     parn.uprn,
     DATE_FORMAT(padr.start_date,"%Y%m%d") start_date,
     DATE_FORMAT(padr.end_date,"%Y%m%d") end_date,
     padr.lsoa_2001_code,
     padr.lsoa_2011_code,
     padr.msoa_2001_code,
     padr.msoa_2011_code,
     pat.nhs_number,
     parn.abp_address_number matched_address_1,
     parn.abp_address_street matched_address_2,
     parn.abp_address_locality matched_address_3,
     parn.abp_address_town matched_address_4,
     parn.abp_address_postcode matched_postcode,
     parn.classification,
     parn.abp_address_organization business_name,
     parn.match_date uprn_match_date,
     parn.status uprn_status,
     parn.latitude,
     parn.longitude,
     NULLIF(COALESCE(get_event_maxdate(20,padr.id),0),0)  last_modified_date
FROM patient_address padr JOIN patient pat ON padr.patient_id = pat.id
LEFT JOIN patient_address_match parn ON padr.id = parn.id;
</syntaxhighlight>'''Appointment Slot Information'''<syntaxhighlight lang="sql">
SELECT
     appt.id slot_id,
     get_ods(appt.organization_id) source_origanization,
     appt.schedule_id session_id,
     DATE_FORMAT(appt.start_date,"%Y%m%d %T") start_time,
     appt.planned_duration planned_duration,
     appt.actual_duration actual_duration,
     get_concept_desc(appt.appointment_status_concept_id) status,
     DATE_FORMAT(appt.date_time_sent_in,"%Y%m%d %T") actual_start_time,
     appt.patient_wait wait_time,
     DATE_FORMAT(appt.cancelled_date,"%Y%m%d %T") date_cancelled,
     DATE_FORMAT(appt.date_time_left,"%Y%m%d %T") time_left,
     pat.nhs_number,
     NULLIF(COALESCE(get_event_maxdate(18,appt.id),0),0)  last_modified_date
FROM appointment appt JOIN patient pat ON appt.patient_id = pat.id;
</syntaxhighlight>'''Encounter Information'''<syntaxhighlight lang="sql">
SELECT
     enc.id encounter_id,
     get_ods(enc.organization_id) source_organization,
     pat.nhs_number,
     DATE_FORMAT(enc.clinical_effective_date,"%Y%m%d %T") encounter_startdate,
     DATE_FORMAT(enc.end_date,"%Y%m%d %T") encounter_enddate,
     get_concept_desc(enc.non_core_concept_id) original_type,
     prac.role_desc hcp_role,
     enc.type encounter_type,
     get_ods(enc.service_provider_organization_id) event_organization,
     enc.appointment_id appointment_slot_id,
    NULLIF(GREATEST(COALESCE(get_event_maxdate(5,enc.id),0),
                    COALESCE(get_event_maxdate(13,prac.id),0)),0) last_modified_date
FROM encounter enc JOIN patient pat ON enc.patient_id = pat.id
LEFT JOIN practitioner prac ON prac.id = enc.practitioner_id;
LEFT JOIN practitioner prac ON prac.id = enc.practitioner_id;
 
</syntaxhighlight>'''Organization Information'''<syntaxhighlight lang="sql">
'''Organization Information'''
SELECT
 
     org.id organization_id,
     org.ods_code,
     org.name organization_name,
     org.type_code,
     org.type_desc,
     org.postcode,
     org.parent_organization_id,
     NULLIF(COALESCE(get_event_maxdate(12,org.id),0),0)  last_modified_date
FROM organization org;
</syntaxhighlight>'''Appointment Session Information'''<syntaxhighlight lang="sql">
SELECT
SELECT
 
     sch.id session_id,
     org.id organization_id,
 
     org.ods_code,
 
     org.name organization_name,
 
     org.type_code,
 
     org.type_desc,
 
     org.postcode,
 
     org.parent_organization_id,
 
    NULLIF(COALESCE(get_event_maxdate(12,org.id),0),0)  last_modified_date
 
FROM organization org;
 
'''Appointment Session Information'''
 
SELECT
 
     sch.id session_id,  
 
     sch.name session_name,
     sch.name session_name,
     get_ods(sch.organization_id) source_organization,
     get_ods(sch.organization_id) source_organization,
     prac.role_desc hpc_role,
     prac.role_desc hpc_role,
     DATE_FORMAT(sch.start_date,"%Y%m%d %T:%f") start_time,
     DATE_FORMAT(sch.start_date,"%Y%m%d %T:%f") start_time,
 
     sch.type session_type,
     sch.type session_type,  
 
     sch.location location,
     sch.location location,
 
    NULLIF(GREATEST(COALESCE(get_event_maxdate(17,sch.id),0),
    NULLIF(GREATEST(COALESCE(get_event_maxdate(17,sch.id),0),  
                    COALESCE(get_event_maxdate(13,prac.id),0)),0) last_modified_date
 
                    COALESCE(get_event_maxdate(13,prac.id),0)),0)
 
last_modified_date
 
FROM schedule sch JOIN practitioner prac ON prac.id = sch.practitioner_id;
FROM schedule sch JOIN practitioner prac ON prac.id = sch.practitioner_id;
</syntaxhighlight>

Latest revision as of 09:38, 18 June 2020

This article describes the steps needed to create, and analyse, patient data selections from the Discovery database.

Introduction

This outlines the steps we take to create the patient selection for analysis from the Discovery database.

To add this data to a report, we create datasets. Each dataset represents the result set from running a combination of SQL commands and stored procedures on a data source.

Fig 1.1 shows the algorithm we follow to generate these datasets

RSD SQL Guide C2 Fig 1.1.jpg

Creating the Patient Cohort

This is the first step when we receive a list of patients marked for observational study.

General Patient Cohort

Fig 2.1 Entity diagram showing the relationships between patient and organization

RSD SQL Guide C2 Fig 2.1.png

We filter patients by their organization ODS codes to ensure only patients of the required GP practices are selected.

Example 1.1 Building patient cohort SQL with NHS numbers

DROP TABLE IF EXISTS cohort_patient;

CREATE TABLE cohort_patient AS
SELECT
       p.id AS patient_id,
       p.nhs_number,
       p.person_id AS group_by,
       p.organization_id,
       p.date_of_birth -- add additional columns if required   
FROM patient p JOIN organization org ON p.organization_id = org.id
WHERE org.ods_code IN ('EMIS99','5PC64');  

-- add ods codes inside the brackets e.g. EMIS99, 5PC64
Cohort with Filtered Observations

Sometimes, the cohort is created with filtered observations.

In the example below, we create a cohort where the patients are registered, alive and have diabetes. They belong to age 12 and over.

Here we have introduced additional tables for the query: observation, concept and the episode of care.

Fig 2.2 Entity diagram showing the relationships between patient, observation and episode of care

RSD SQL Guide C2 Fig 2.2.png

By joining the observation table we can obtain the observational information about the patient. If specific information is required, like in this case we are looking for patients with diabetes, we would need to filter the observation based on that specific clinical term. That could be in the form of a stand alone code or a list of codes: SNOMED, READ, CTV3, etc.

The observation table does not store these codes directly but store them as concept ids. On querying the data then, the concept table will be queried first to get the list of concept ids that you are interested in for this observation.

If you intend to filter observations by SNOMEDS then add in the concept map table which will map any legacy concept ids, that is, READ, CTV3 etc, to the SNOMED concept ids, then you can filter by SNOMED codes as shown: 

SELECT
  ...

FROM patient p
JOIN observation o on o.patient_id = p.id
JOIN concept_map cm on cm.legacy = o.non_core_concept_id
-- use the non core concept id for link to the concept table
JOIN concept c on c.dbid = cm.core
WHERE c.code IN (46635009, 44054006) -- e.g. diabetes snomed codes

From the episode of care, we can obtain whether the patient is currently registered and has the appropriate registration type for this study:

SELECT
  ...

FROM patient p
JOIN episode_of_care e ON e.patient_id = p.id
JOIN concept c ON c.dbid = e.registration_type_concept_id
WHERE c.code = 'R'  -- registered

Some patients prefer to withhold consent and therefore are excluded from the cohort. We can check this in the observation table. Again we need to get a list of concept ids that you are interested in for this observation.

For example, you could approach it in this way:

1.        Create a lookup table to hold the consent codes:

2.        Create a view to filter out the concept ids on the observation table with these codes:

-- filter out the observation table with the relevant consent codes

DROP VIEW IF EXISTS obs_optout_view;

CREATE VIEW obs_optout_view AS
SELECT
       ob.id,
       ob.patient_id,
       ob.organization_id,
       ob.non_core_concept_id,
       ob.clinical_effective_date,
       cpt.dbid,
       cpt.code,
       cpt.name,
       csc.status
FROM observation ob
JOIN concept cpt ON cpt.dbid = ob.non_core_concept_id
JOIN consent_code csc ON cpt.id = csc.code
AND cpt.id = BINARY csc.code;

3.        Using the view to create a helper function to get the latest opt-out status of the patient:

-- function to retrieve the patient's latest opt out status

DROP FUNCTION IF EXISTS get_optout_status;
DELIMITER //
CREATE FUNCTION get_optout_status(p_patient_id BIGINT,
                                  p_organization_id BIGINT)
RETURNS VARCHAR(1)
NOT DETERMINISTIC READS SQL DATA

BEGIN

DECLARE l_status          VARCHAR(1);
DECLARE l_patient_id      BIGINT;
DECLARE l_organization_id BIGINT;

  SELECT
         ob.patient_id,
         ob.organization_id,
         IF (ob.status = 'Opt-Out', 1, IF(ob.status = 'Opt-In',null,null))
         INTO l_patient_id, l_organization_id, l_status
  FROM obs_optout_view ob
  LEFT JOIN obs_optout_view ob2 ON ob2.patient_id = ob.patient_id
  AND (ob.clinical_effective_date < ob2.clinical_effective_date
  OR (ob.clinical_effective_date = ob2.clinical_effective_date
       AND ob.id < ob2.id))
  WHERE ob2.patient_id IS NULL
  AND ob.patient_id = p_patient_id
  AND ob.organization_id = p_organization_id LIMIT 1;

RETURN l_status;
END//
DELIMITER ;

Where 1 = opt out, null = opt in or not recorded derived from the latest observation from the consent codes. Putting all of the above together, patient cohort query becomes this:

CREATE TABLE cohort_patient AS
  SELECT DISTINCT
         p.id         AS patient_id,
         p.person_id  AS group_by,
         get_optout_status (p.id, p.registered_practice_organization_id) AS optout_status,
         IF (p.date_of_death IS NULL,
              TIMESTAMPDIFF(YEAR, p.date_of_birth, CURDATE()),
              TIMESTAMPDIFF(YEAR, p.date_of_birth, p.date_of_death)) age_years
  FROM patient p
  JOIN observation o ON p.id = o.patient_id
  JOIN organization org ON org.id = o.organization_id
  JOIN episode_of_care e ON e.patient_id = o.patient_id
  JOIN concept c ON c.dbid = e.registration_type_concept_id
  JOIN (SELECT o.patient_id
        FROM observation o
        JOIN concept_map cm ON cm.legacy = o.non_core_concept_id
        JOIN concept c ON c.dbid = cm.core
        WHERE c.code IN (46635009, 44054006) -- e.g. snomed codes
        ) ob ON ob.patient_id = p.id
  WHERE org.ods_code IN ('EMIS99','5PC64') -- e.g. ods codes
  AND p.date_of_death IS NULL
  AND c.code = 'R'  -- registered patient
  AND e.date_registered <= now()
  AND (e.date_registered_end > now() OR
       e.date_registered_end IS NULL) -- active
  AND IF (p.date_of_death IS NULL,
          TIMESTAMPDIFF(YEAR, p.date_of_birth, CURDATE()),
          TIMESTAMPDIFF(YEAR, p.date_of_birth, p.date_of_death)) >= 12 -- age 12 and over
  AND get_optout_status(p.id, p.registered_practice_organization_id) IS NULL; -- opt in status

Creating the Observations Cohort

We create a separate observations cohort for query performance. Instead of scanning through the entire observation table, we create a sub-set of the observation table based on the patient cohort.

Example 2.1 Building the Observations Cohort SQL

CREATE TABLE cohort_observations AS
  SELECT DISTINCT
         o.id,
         o.person_id AS group_by,
         o.patient_id,
         o.person_id,
         o.clinical_effective_date,
         o.non_core_concept_id,
         o.result_value,
         o.result_value_units,
         o.result_date,
         o.result_text,
         cr.age_year
  FROM observation o JOIN cohort_patient cr
  ON o.person_id = cr.person_id
  AND o.organization_id = cr.organization_id
  AND o.patient_id = cr.patient_id;

CREATE INDEX obs_cpt_ix ON cohort_observation(non_core_concept_id);
CREATE INDEX obs_pat_ix ON cohort_observation(patient_id);

Creating the Medications Cohort

Again, we create the medications cohort for query performance. We create a sub-set of the medication statement/order table based on the patient cohort.

Fig 4.1 Entity diagram showing the relationships between patient, medication statement and organization

Example 4.1 Building the Medications Cohort SQL

-- step 1: cohort of the medication statement

DROP TABLE IF EXISTS cohort_medications_stmt;

CREATE TABLE cohort_medications_stmt AS
     SELECT DISTINCT
            m.id,
            m.non_core_concept_id,
            m.person_id,
            m.patient_id,
            m.clinical_effective_date,
            m.cancellation_date
     FROM medication_statement m
     JOIN cohort_patient cr
     ON m.organization_id = cr.organization_id
     AND m.person_id = cr.person_id
     AND m.patient_id = cr.patient_id;

CREATE INDEX med_stmt_ix ON cohort_medications_stmt(id);

-- step 2: cohort of the medication oder

DROP TABLE IF EXISTS cohort_medications_ord;

CREATE TABLE cohort_medications_ord AS  
     SELECT DISTINCT
            mo.id,
            mo.medication_statement_id
            mo.person_id,
            mo.patient_id,
            mo.clinical_effective_date
     FROM medication_order mo
     JOIN cohort_patient cr
     ON mo.organization_id = cr.organization_id
     AND mo.person_id = cr.person_id
     AND mo.patient_id = cr.patient_id;

CREATE INDEX med_ord_ix ON cohort_medications_ord(medication_statement_id);    

-- step 3: cohort of the medication - statement plus order

DROP TABLE IF EXISTS cohort_medications;

CREATE TABLE cohort_medications AS
     SELECT DISTINCT
            ms.id,
            ms.non_core_concept_id,
            ms.person_id,
            ms.patient_id,
            mo.clinical_effective_date,
            ms.cancellation_date
      FROM cohort_medications_stmt ms JOIN cohort_medications_ord mo
      ON ms.id = mo.medication_statement_id;

CREATE INDEX med_ix ON cohort_medications(non_core_concept_id);
CREATE INDEX med_pat_ix ON cohort_medications(patient_id);        

Building Datasets

This involves creating a number of temporary tables in the database.

Patient Demographic Dataset

A typical patient demographic dataset may consist of the following information:

For this example, we create the dataset to hold the patient demographic information as follows:

CREATE TABLE demographicsDataset (
  ExtractDate                    DATETIME     NULL,
  Pseudo_id                      VARCHAR(255) NULL,   -- unique identifier
  Pseudo_NHSNumber               VARCHAR(255) NULL,
  Gender                         VARCHAR(50)  NULL,
  Age                            VARCHAR(50)  NULL,
  DateOfBirth                    DATE         NULL,
  EthnicityLCode                 VARCHAR(50)  NULL,
  EthnicityLTerm                 VARCHAR(200) NULL,
  BirthCountryLCode              VARCHAR(50)  NULL,
  BirthCountryLTerm              VARCHAR(200) NULL,
  RegistrationStart              DATE         NULL,
  RegistrationEnd                DATE         NULL,
  IMD2010                        VARCHAR(50)  NULL,
  LSOA2011                       VARCHAR(50)  NULL,
  PracticeODSCode                VARCHAR(50)  NULL,
  PracticeODSName                VARCHAR(255) NULL,
  CCGName                        VARCHAR(100) NULL,
  YearOfDeath                    INT(4)       NULL,
  F2fVisits_Total                INT(11)      DEFAULT 0,
  F2fVisits_1year                INT(11)      DEFAULT 0,
  F2fVisits_5years               INT(11)      DEFAULT 0
);

ALTER TABLE demographicsDataset ADD INDEX demo_pseudoid_idx (pseudo_id);

-- the unique identifier is usually the pseudo id
-- but to adapt for this schema we could use the person_id

We pre-populate the dataset with the unique identifiers from the patient cohort:

INSERT INTO demographicsDataset (pseudo_id, extractdate) SELECT DISTINCT group_by, now() FROM cohort_patient;

Then we run separate update SQL queries to populate the columns of the dataset table.

For example, add registration start date, registration end date, practice ods code, practice ods name:

DROP TEMPORARY TABLE IF EXISTS reg_sort;
DROP TEMPORARY TABLE IF EXISTS qry_reg;

CREATE TEMPORARY TABLE qry_reg AS
SELECT c.group_by,
       e.date_regirstered,
       e.date_registered_end,
       o.name,
       o.ods_code
FROM cohort_patient c
JOIN episode_of_care e ON e.person_id = c.group_by
JOIN organization o ON o.id = e.organization_id
WHERE o.ods_code IN ('EMIS99','5PC64'); -- e.g. ods codes


CREATE TEMPORARY TABLE reg_sort (
      row_id               INT,
      group_by             BIGINT,
      date_registered      DATE,
      date_regiostered_end DATE,
      ods_code             VARCHAR(50),
      name                 VARCHAR(255),
      rnk                  INT, PRIMARY KEY(row_id)) AS
SELECT (@row_no := @row_no + 1) AS row_id,
        a.group_by,
        a.date_registered,
        a.date_registered_end,
        a.ods_code,      
        a.name,
        a.rnk
FROM (SELECT q.group_by,
             q.date_registered,
             q.date_registered_end,
             q.name,
             q.ods_code,
             @currank := IF(@curperson = q.group_by, @currank + 1, 1) AS rnk,
             @curperson := q.group_by AS cur_person
      FROM qry_reg q, (SELECT @currank := 0, @curperson := 0) r
      ORDER BY q.group_by, q.date_registered DESC ) a,
      (SELECT @row_no := 0) t
WHERE a.rnk = 1;

SET @row_id = 0;

WHILE EXISTS (SELECT row_id from reg_sort
              WHERE row_id > @row_id AND row_id <= @row_id + 1000) DO

       UPDATE demographicsDataset d
       JOIN reg_sort reg ON d.pseudo_id = reg.group_by
       SET d.RegistrationStart = reg.date_registered,
           d.RegistrationEnd = reg.date_registered_end,
           d.PracticeODSCode = reg.ods_code,
           d.PracticeODSName = reg.name
       WHERE reg.row_id > @row_id
       AND reg.row_id <= @row_id + 1000;

       SET @row_id = @row_id + 1000;  

END WHILE;

In this example, we created 2 temporary tables. The first to hold the registration data. The second to select from the first the records which contain the latest registration date.

If the update is large in terms of the number of records to update then it might be possible to optimise the update process by using batches as demonstrated in the code example above.

Patient Observations Dataset

A typical patient observations dataset may ask for the following information:

A typical dataset can have over 100 to 200 fields. Usually, there will be several datasets to deliver. But this illustrates the kind of layout we use for constructing an observational dataset.

For this example, we create the dataset to hold the observational information as follows:

DROP TABLE IF EXISTS diagnoses2dataset;

CREATE TABLE diagnoses2dataset (
  Pseudo_id                                        VARCHAR(255) NULL,
  Pseudo_NHSNumber                                 VARCHAR(255) NULL,
  AsthmaECode                                      VARCHAR(50) NULL,
  AsthmaETerm                                      VARCHAR(200) NULL,
  AsthmaEDate                                      VARCHAR(50) NULL,
  AsthmaEmergeECode                                VARCHAR(50) NULL,
  AsthmaEmergeETerm                                VARCHAR(200) NULL,
  AsthmaEmergeEDate                                VARCHAR(50) NULL,
  AsthmaResolvedECode                              VARCHAR(50) NULL,
  AsthmaResolvedETerm                              VARCHAR(200) NULL,
  AsthmaResolvedEDate                              VARCHAR(50) NULL,
  COPDECode                                        VARCHAR(50) NULL,
  COPDETerm                                        VARCHAR(200) NULL,
  COPDEDate                                        VARCHAR(50) NULL,
  PulmonaryFibrosisECode                           VARCHAR(50) NULL,
  PulmonaryFibrosisETerm                           VARCHAR(200) NULL,
  PulmonaryFibrosisEDate                           VARCHAR(50) NULL,
  InterstitialLungDiseaseECode                     VARCHAR(50) NULL,
  InterstitialLungDiseaseETerm                     VARCHAR(200) NULL,
  InterstitialLungDiseaseEDate                     VARCHAR(50) NULL,
  AgeRelatedMuscularDegenerationECode              VARCHAR(50) NULL,
  AgeRelatedMuscularDegenerationETerm              VARCHAR(200) NULL,
  AgeRelatedMuscularDegenerationEDate              VARCHAR(50) NULL,
  GlaucomaECode                                    VARCHAR(50) NULL,
  GlaucomaETerm                                    VARCHAR(200) NULL,
  GlaucomaEDate                                    VARCHAR(50) NULL,
  RheumatoidArthritisECode                         VARCHAR(50) NULL,
  RheumatoidArthritisETerm                         VARCHAR(200) NULL,
  RheumatoidArthritisEDate                         VARCHAR(50) NULL,
  SystemicLupusECode                               VARCHAR(50) NULL,
  SystemicLupusETerm                               VARCHAR(200) NULL,
  SystemicLupusEDate                               VARCHAR(50) NULL,
  InflammatoryBowelDiseaseECode                    VARCHAR(50) NULL,
  InflammatoryBowelDiseaseETerm                    VARCHAR(200) NULL,
  InflammatoryBowelDiseaseEDate                    VARCHAR(50) NULL,
  CrohnsDiseaseECode                               VARCHAR(50) NULL,
  CrohnsDiseaseETerm                               VARCHAR(200) NULL,
  CrohnsDiseaseEDate                               VARCHAR(50) NULL,
  UlcerativeColitisCodeECode                       VARCHAR(50) NULL,
  UlcerativeColitisCodeETerm                       VARCHAR(200) NULL,
  UlcerativeColitisCodeEDate                       VARCHAR(50) NULL,
  AtopicDermatitisECode                            VARCHAR(50) NULL,
  AtopicDermatitisETerm                            VARCHAR(200) NULL,
  AtopicDermatitisEDate                            VARCHAR(50) NULL,
  InheritedMucociliaryClearanceECode               VARCHAR(50) NULL,
  InheritedMucociliaryClearanceETerm               VARCHAR(200) NULL,
  InheritedMucociliaryClearanceEDate               VARCHAR(50) NULL,
  PrimaryCiliaryDyskinesiaECode                    VARCHAR(50) NULL,
  PrimaryCiliaryDyskinesiaETerm                    VARCHAR(200) NULL,
  PrimaryCiliaryDyskinesiaEDate                    VARCHAR(50) NULL,
  MelanomaECode                                    VARCHAR(50) NULL,
  MelanomaETerm                                    VARCHAR(200) NULL,
  MelanomaEDate                                    VARCHAR(50) NULL,
  ProstateCancerECode                              VARCHAR(50) NULL,
  ProstateCancerETerm                              VARCHAR(200) NULL,
  ProstateCancerEDate                              VARCHAR(50) NULL,
  LungCancerECode                                  VARCHAR(50) NULL,
  LungCancerETerm                                  VARCHAR(200) NULL,
  LungCancerEDate                                  VARCHAR(50) NULL,
  SmallBowelCancerECode                            VARCHAR(50) NULL,
  SmallBowelCancerETerm                            VARCHAR(200) NULL,
  SmallBowelCancerEDate                            VARCHAR(50) NULL,
  ColorectalCancerECode                            VARCHAR(50) NULL,
  ColorectalCancerETerm                            VARCHAR(200) NULL,
  ColorectalCancerEDate                            VARCHAR(50) NULL,
  BreastCancerECode                                VARCHAR(50) NULL,
  BreastCancerETerm                                VARCHAR(200) NULL,
  BreastCancerEDate                                VARCHAR(50) NULL,
  MiscarriageECode                                 VARCHAR(50) NULL,
  MiscarriageETerm                                 VARCHAR(200) NULL,
  MiscarriageEDate                                 VARCHAR(50) NULL
);

ALTER TABLE diagnoses2dataset ADD INDEX pseudoid_idx (pseudo_id);

INSERT INTO diagnoses2dataset (pseudo_id)
SELECT DISTINCT group_by FROM cohort_patient;

Each column of the dataset is named by joining pieces of identifiable information together by combining the observational type, the timeframe for the analysis and the requested field type.

Fig 5.1 Shows the naming convention of a dataset column

Again, we pre-populate the dataset with the unique identifiers from the patient cohort.

Then we run separate update SQL queries to populate the dataset table by calling multiple stored procedures.

Patient Medications Dataset

A typical medication request would look like this:

For this example, we create the dataset to hold the medication information as follows:

CREATE TABLE medications (
  pseudo_id            VARCHAR(255) DEFAULT NULL,
  nhsnumber            VARCHAR(10)  DEFAULT NULL,
  codedate             VARCHAR(20)  DEFAULT NULL,
  codeterm             VARCHAR(255) DEFAULT NULL,
  code                 VARCHAR(100) DEFAULT NULL,
  codevalue            VARCHAR(100) DEFAULT NULL,
  codeunit             VARCHAR(100) DEFAULT NULL
);

Again, we pre-populate the dataset with the unique identifiers from the patient cohort.

Then we run separate update SQL queries to populate the dataset table by calling multiple stored procedures.

Calling Stored Procedures

Once we have generated the observation and medication datasets and cohorts, the next step is to populate the datasets using stored procedures.

We construct a series of populate stored procedures which in turn call on other stored procedures to complete the update process.

Fig 6.1 Shows the datasets update process using stored procedures


Calling the Populate Stored Procedures

Create a SQL script to call the stored procedure as follows:

CALL populateCodeDate
(0,'StillbirthE','diagnoses2dataset',0,'237364002,161743003',null,null,null,'N');
CALL populateCodeDate
(0,'PregnancyInducedHypertensionE','diagnoses2dataset',0,null,'48194001',null,null,'N');
CALL populateCodeDate
(0,'PreEclampsiaE','diagnoses2dataset',0,'398254007',null,null,null,'N');
CALL populateCodeDate
(0,'CholestasisE','diagnoses2dataset',0,'235888006',null,null,null,'N');
CALL populateCodeDate
(0,'GallstonesE','diagnoses2dataset',0,'266474003,407637009',null,null,null,'N');
CALL populateCodeDate
(0,'GoutE','diagnoses2dataset',0,'90560007,161451004',null,null,null,'N');
CALL populateCodeDate
(0,'AnkylosingSpondylitisE','diagnoses2dataset',0,'9631008',null,null,null,'N');
CALL populateCodeDate
(0,'JaundiceE','diagnoses2dataset',0,'18165001,161536006',null,null,null,'N');
CALL populateCodeDate
(0,'PsoriasisE','diagnoses2dataset',0,'9014002',null,null,null,'N');
CALL populateCodeDate
(0,'DeafnessE','diagnoses2dataset',0,'15188001',null,null,null,'N');
CALL populateCodeDate
(0,'HearingAidE','diagnoses2dataset',0,'365240006',null,null,null,'N');

The procedure accepts a list of input parameters (this may vary depending on requirements):

Parameter Data Type Description
filterType INT 1 latest, 0 earliest, 2 ever, 3 pivot
col VARCHAR(100) the root of the column name
datasetTable VARCHAR(100) table name of dataset
reset BIT 1 reset, 0 no reset of the dataset table
codesToAdd1 VARCHAR(5000) include all snomed parents and their children
codesToAdd2 VARCHAR(5000) include snomed parents only
codesToRemove3 VARCHAR(5000) exclude snomed parents and their children
codesToRemove4 VARCHAR(5000) exclude only snomed parents
ignorenulls VARCHAR(1) Y or N - whether to ignore null result value

A typical populate procedure is outlined below. (Please note this is only an example and may need modifications to it depending on your requirements).

DROP PROCEDURE IF EXISTS populateCodeDate;

DELIMITER //
CREATE PROCEDURE populateCodeDate (
IN filterType     INT, -- 1 latest, 0 earliest, 2 ever, 3 pivot
IN col            VARCHAR(100), -- the root of the column name
IN datasetTable   VARCHAR(100), -- table name of dataset
IN reset          BIT, -- 1 reset, 0 no reset
IN codesToAdd1    VARCHAR(5000), -- all parents and their children
IN codesToAdd2    VARCHAR(5000), -- parents only
IN codesToRemove3 VARCHAR(5000), -- parents and their children to be excluded
IN codesToRemove4 VARCHAR(5000), -- just parents to be excluded
IN ignorenulls    VARCHAR(1)
)

BEGIN

-- reset snomeds table
DELETE FROM snomeds WHERE cat_id IN (1, 2, 3, 4);
-- reset store table
DELETE FROM store WHERE id IN (1, 2, 3, 4);

-- get snomed concept ids

  IF codesToAdd1 IS NOT NULL THEN
      CALL storeSnomedString(codesToAdd1, 1);
      CALL getAllSnomedsConceptIds (1);
  END IF;

  IF codesToAdd2 IS NOT NULL THEN
      CALL storeSnomedString (codesToAdd2, 2);
      CALL getAllSnomedsConceptIds (2);
  END IF;

  IF codesToRemove3 IS NOT NULL THEN
      CALL storeSnomedString (codesToRemove3, 3);
      CALL getAllSnomedsConceptIds (3);
  END IF;

  IF codesToRemove4 IS NOT NULL THEN
      CALL storeSnomedString (codesToRemove4, 4);
      CALL getAllSnomedsConceptIds (4);
  END IF;

-- filter observation based on the snomed codes

CALL filterObservations(filterType, 1, ignorenulls);

-- reset columns

IF (reset = 1) THEN

  SET @reset_sql = CONCAT('UPDATE ', datasetTable, ' SET ',
  col, "Code = null, ",
  col, "Term = null, ",
  col, "Date = null");

  PREPARE resetStmt FROM @reset_sql;
  EXECUTE resetStmt;
  DEALLOCATE PREPARE resetStmt;

END IF;

  DROP TEMPORARY TABLE IF EXISTS qry_tmp;

  CREATE TEMPORARY TABLE qry_tmp (
      row_id                  INT,
      group_by                VARCHAR(255),
      original_code           VARCHAR(20),
      original_term           VARCHAR(200),
      clinical_effective_date DATE, PRIMARY KEY(row_id)
  ) AS

  SELECT (@row_no := @row_no+1) AS row_id,
         f.group_by,
         f.original_code,
         f.original_term,
         f.clinical_effective_date
  FROM filteredObservations f, (SELECT @row_no := 0) t;

SET @row_id = 0;

WHILE EXISTS (SELECT row_id from qry_tmp
              WHERE row_id > @row_id AND row_id <= @row_id + 1000) DO

  SET @sql = CONCAT('UPDATE ', datasetTable,
             ' d JOIN qry_tmp f ON d.pseudo_id = f.group_by SET ',
             col, "Code = f.original_code, ",
             col, "Term = f.original_term, ",
             col, "Date = date_format(f.clinical_effective_date, '%d/%m/%Y')
             WHERE f.row_id > @row_id AND f.row_id <= @row_id + 1000");
    
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;

  SET @row_id = @row_id + 1000;

END WHILE;
END//
DELIMITER ;

Retrieving the SNOMED Concept Ids

The procedure accepts a list of SNOMED codes as input parameter.

Fig 6.2 Shows how the SNOMED codes are being retrieved using stored procedures


Calling the storeSnomedString procedure

We call the storeSnomedString procedure to separate the list of SNOMED codes into their individual values and store them into a temporary table:

DROP TABLE IF EXISTS store;

CREATE TABLE store (
  id            INT,
  org_snomed_id BIGINT
);

ALTER TABLE store ADD INDEX store_idx (org_snomed_id);

DROP PROCEDURE IF EXISTS storeSnomedString;

DELIMITER //

CREATE PROCEDURE storeSnomedString (
  IN stringValue VARCHAR(5000),
  IN cat_id      INT
)

BEGIN

  DECLARE front       VARCHAR(5000) DEFAULT NULL;
  DECLARE frontlen    INT           DEFAULT NULL;
  DECLARE TempValue   VARCHAR(5000) DEFAULT NULL;

  processloop:
  LOOP

      IF LENGTH(TRIM(stringValue)) = 0
      OR stringValue IS NULL THEN
        LEAVE process loop;
      END IF;
   
  SET front = SUBSTRING_INDEX(stringValue, ',', 1);
  SET frontlen = LENGTH(front);
  SET TempValue = TRIM(front);

  INSERT INTO store (id, org_snomed_id)
  VALUES (cat_id, CAST(TempValue AS SIGNED));

  SET stringValue = INSERT(stringValue, 1, frontlen + 1, '');

  END LOOP;

END//
DELIMITER ;
Calling the getAllSnomedsConceptIds procedure

We call another procedure getAllSnomedsConceptIdsFromSnomedCodes to loop through the temporary table. For each SNOMED code, we retrieve the corresponding SNOMED concept id.

We store the resultant set of concept ids into another temporary table where it will be used to query observational data.

DROP TABLE IF EXISTS snomeds;

CREATE TABLE snomeds (
  cat_id               INT,
  snomed_concept_id    BIGINT,
  core_concept         INT,
  snomed_code          BIGINT,
  term                 VARCHAR(255)
);

ALTER TABLE snomeds ADD INDEX sno_cpt_idx(snomed_concept_id);
ALTER TABLE snomeds ADD INDEX sno_code_idx(snomed_code);

DROP PROCEDURE IF EXISTS getAllSnomedsConceptIds;

DELIMITER //
CREATE PROCEDURE getAllSnomedsConceptIds(p_cat_id INT)

BEGIN

  DECLARE done           INT;
  DECLARE l_parent_id    BIGINT;
  DECLARE c_get_snomeds  CURSOR FOR SELECT org_snomed_id
                         FROM store WHERE id = p_cat_id;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  SET done = 0;

       OPEN c_get_snomeds;

       processloop:
       WHILE (done = 0) DO

        FETCH c_get_snomeds INTO l_parent_id;

        IF done = 1 THEN
           LEAVE processloop;
        END IF;

        IF p_cat_id IN (1, 3) THEN

          -- get parent snomed concept id based on the snomed code

          INSERT INTO snomeds
          SELECT
                p_cat_id,
                cm.legacy,
                cm.core,
                c.code,
                c.name
           FROM concept_map cm JOIN concept c ON c.dbid = cm.core
           WHERE c.code = l_parent_id;

           -- get all child snomed concept ids based on the parent snomed code

           INSERT INTO snomeds
           SELECT
                 p_cat_id,
                 cptm.legacy,
                 cptm.core,
                 cpt.code,
                 cpt.name
            FROM concept_tct tc
            JOIN snomeds sn ON sn.core_concept = tc.target
            AND tc.property = '92842'
            JOIN concept_map cptm on cptm.core = tc.source
            AND tc.property = '92842'
            JOIN concept cpt on cpt.dbid = cptm.core
            WHERE sn.snomed_code = l_parent_id;

        ELSE

        -- include or exclude parent snomed

            INSERT INTO snomeds
            SELECT
                  p_cat_id,
                  cm.legacy,
                  cm.core,
                  c.code,
                  c.name
             FROM concept_map cm JOIN concept c ON c.dbid = cm.core
             WHERE c.code = l_parent_id;

        END IF;

       END WHILE processloop;
       CLOSE c_get_snomeds;

       SET done = 0;  

END //
DELIMITER ;

Using the concept_tct table, which is a transitive closure table, we are able to find the corresponding child concept ids based on the parent concept id.

Match the parent against the value in the target column. This will return all the children in the source column.

Retrieving filtered Observations

Using the prepared observations cohort, the next step is to filter out this cohort based on the SNOMED code list and the timeframe, which have been passed in as input parameters from the populate procedure.

Fig 6.3 Shows the processing of the FilterObservations procedure


Calling the filterObservations procedure

By calling the filterObservations procedure, we pass in the following input parameters:

Parameter Data Type Description
filterType INT Timeframe of the analysis. 0 = earliest, 1 = latest, 2 = ever, 3 = pivot around cohort pivot date (6 months), 4 = All since
toCreate INT 1 is the default value - to call createObservationsFromCohort
ignorenulls VARCHAR(1) Y or N - whether to ignore null result value
DROP PROCEDURE IF EXISTS filterObservations;

DELIMITER //

CREATE PROCEDURE filterObservations (
     IN filterType INT,
-- 0 earliest, 1 latest, 2 ever,
-- 3 pivot around cohort pivot date (6 months),
-- 4 allSince
  IN toCreate   INT, -- 1 to createObservationsFromCohortv2
  IN ignorenulls VARCHAR(1) -- Y or N
  )

BEGIN

IF (toCreate = 1) THEN
  CALL createObservationsFromCohort (filterType);
END IF;

DROP TABLE IF EXISTS filteredObservations;

IF (filterType = 0) THEN -- earliest

  IF (ignorenulls = 'Y') THEN  

  CREATE TABLE filteredObservations AS
  SELECT
          ob.id,
          ob.group_by,
          ob.patient_id,
          ob.person_id,
          ob.original_code,
          ob.original_term,
          ob.result_value,
          ob.clinical_effective_date,
          ob.result_value_units,
          ob.age_years,
          ob.rnk
  FROM (
          SELECT
            o.id,
            o.group_by,
            o.patient_id,
            o.person_id,
            o.original_code,
            o.original_term,
            o.result_value,
            o.clinical_effective_date,
            o.result_value_units,
            o.age_years,
            @currank :=
                 IF(@curperson = o.person_id, @currank + 1, 1) AS rnk,
            @curperson := o.person_id AS cur_person
          FROM observationsFromCohort o,
          (SELECT @currank := 0, @curperson := 0) r
          WHERE o.result_value IS NOT NULL
          ORDER BY o.person_id,
                   o.clinical_effective_date ASC,
                   o.id ASC -- earliest
        ) ob
  WHERE ob.rnk = 1;

  ELSE

  CREATE TABLE filteredObservations AS
  SELECT
          ob.id,
          ob.group_by,
          ob.patient_id,
          ob.person_id,
          ob.original_code,
          ob.original_term,
          ob.result_value,
          ob.clinical_effective_date,
          ob.result_value_units,
          ob.age_years,
          ob.rnk
  FROM (
          SELECT
            o.id,
            o.group_by,
            o.patient_id,
            o.person_id,
            o.original_code,
            o.original_term,
            o.result_value,
            o.clinical_effective_date,
            o.result_value_units,
            o.age_years,
            @currank :=
                  IF(@curperson = o.person_id, @currank + 1, 1) AS rnk,
            @curperson := o.person_id AS cur_person
          FROM observationsFromCohort o,
          (SELECT @currank := 0, @curperson := 0) r
          ORDER BY o.person_id,
                   o.clinical_effective_date ASC,
                   o.id ASC -- earliest
        ) ob
  WHERE ob.rnk = 1;

  END IF;

ELSEIF (filterType = 2 OR filterType = 4) THEN -- ever or allSince

  IF (ignorenulls = 'Y') THEN

  CREATE TABLE filteredObservations AS
  SELECT DISTINCT
          mc.group_by,
          mc.patient_id,
          mc.person_id,
          mc.original_code,
          mc.original_term,
          mc.result_value,
          mc.clinical_effective_date,
          mc.result_value_units,
          mc.age_years
  FROM observationsFromCohort mc
  WHERE mc.result_value IS NOT NULL;

  ELSE

  CREATE TABLE filteredObservations AS
  SELECT DISTINCT
          mc.group_by,
          mc.patient_id,
          mc.person_id,
          mc.original_code,
          mc.original_term,
          mc.result_value,
          mc.clinical_effective_date,
          mc.result_value_units,
          mc.age_years
  FROM observationsFromCohort mc;

  END IF;


ELSEIF (filterType = 1 or filterType = 3) THEN -- latest or pivot

IF (ignorenulls = 'Y') THEN

CREATE TABLE filteredObservations as
  SELECT
           ob.id,
           ob.group_by,
           ob.patient_id,
           ob.person_id,
           ob.original_code,
           ob.original_term,
           ob.result_value,
           ob.clinical_effective_date,
           ob.result_value_units,
           ob.age_years,
           ob.rnk
  FROM (
          SELECT
            o.id,
            o.group_by,
            o.patient_id,
            o.person_id,
            o.original_code,
            o.original_term,
            o.result_value,
            o.clinical_effective_date,
            o.result_value_units,
            o.age_years,
            @currank :=
                IF(@curperson = o.person_id, @currank + 1, 1) AS rnk,
            @curperson := o.person_id AS cur_person
          FROM observationsFromCohort o,
          (SELECT @currank := 0, @curperson := 0) r
          WHERE o.result_value IS NOT NULL
          ORDER BY o.person_id,
                   o.clinical_effective_date DESC,
                   o.id DESC -- latest
        ) ob
  WHERE ob.rnk = 1;

ELSE

  CREATE TABLE filteredObservationsV2 as
  SELECT
           ob.id,
           ob.group_by,
           ob.patient_id,
           ob.person_id,
           ob.original_code,
           ob.original_term,
           ob.result_value,
           ob.clinical_effective_date,
           ob.result_value_units,
           ob.age_years,
           ob.rnk
  FROM (
          SELECT
             o.id,
             o.group_by,
             o.patient_id,
             o.person_id,
             o.original_code,
             o.original_term,
             o.result_value,
             o.clinical_effective_date,
             o.result_value_units,
             o.age_years,
             @currank :=
                  IF(@curperson = o.person_id, @currank + 1, 1) AS rnk,
             @curperson := o.person_id AS cur_person
          FROM observationsFromCohort o,
          (SELECT @currank := 0, @curperson := 0) r
          ORDER BY o.person_id,
                   o.clinical_effective_date DESC,
                   o.id DESC -- latest
         ) ob
  WHERE ob.rnk = 1;

END IF;

ELSE

  SIGNAL SQLSTATE '45000'

  SET MESSAGE_TEXT = 'filterType not recognised';

END IF;

END//
DELIMITER ;
Calling the createObservationsFromCohort procedure

From the filterObservations procedure, we call the createobservationsFromCohort procedure, passing in the filter type as the input parameter:

CALL createObservationsFromCohort (filterType);
DROP PROCEDURE IF EXISTS createObservationsFromCohort;

DELIMITER //

CREATE PROCEDURE createObservationsFromCohort (
  IN filterType INT
  -- 0 earliest, 1 latest, 2 ever,
  -- 3 pivot around cohort_patient pivot date (6 months)
)

BEGIN

DROP TEMPORARY TABLE IF EXISTS snomeds_tmp;

CREATE TEMPORARY TABLE snomeds_tmp AS
SELECT cat_id,
      snomed_concept_id
FROM snomeds
WHERE cat_id IN (3, 4);

-- delete from snomeds all snomed_concept ids to be excluded

DELETE t1 FROM snomeds t1 JOIN snomeds_tmp t2
ON t1.snomed_concept_id = t2.snomed_concept_id
WHERE t1.cat_id IN (1, 2);

DROP TABLE IF EXISTS observationsFromCohort;

IF (filterType = 3) THEN

-- pivot over 6 months from pivot date (already set in cohort)

CREATE TABLE observationsFromCohort AS
      SELECT DISTINCT
             o.id,
             o.patient_id,
             o.person_id,
             o.group_by,
             o.clinical_effective_date,
             o.non_core_concept_id AS original_code,
             SUBSTRING(s.term, 1, 200) AS original_term,
             o.result_value,
             o.result_value_units,
             o.age_years
     FROM cohort_observations o
          JOIN snomeds s ON s.snomed_concept_id = o.non_core_concept_id
     WHERE o.clinical_effective_date IS NOT NULL
     AND o.clinical_effective_date
     BETWEEN DATE_SUB(c.pivot_date, INTERVAL 6 MONTH)
     AND DATE_SUB(c.pivot_date, INTERVAL -6 MONTH);

ELSE

-- latest or earliest or ever (so all observations)

CREATE TABLE observationsFromCohort AS
     SELECT DISTINCT
            o.id,
            o.patient_id,
            o.person_id,
            o.group_by,
            o.clinical_effective_date,
            o.non_core_concept_id AS original_code,
            SUBSTRING(s.term, 1, 200) AS original_term,
            o.result_value,
            o.result_value_units,
            o.age_years
     FROM cohort_observations o
          JOIN snomeds s ON s.snomed_concept_id = o.non_core_concept_id
     WHERE o.clinical_effective_date IS NOT NULL;

END IF;

ALTER TABLE observationsFromCohort ADD INDEX obv_pat_idx(patient_id);

END//

DELIMITER ;

This returns a new observations cohort based on the SNOMED code list.

Filter Observations by Timeframe

Using the new observations cohort, we pass in the filter type to retrieve the observations based on the timeframe for the analysis, and using the clinical effective date to determine the time intervals. We store the result set into a temporary table. This is illustrated in the code snippet of the filterObservations procedure below: 

CREATE TABLE filteredObservationsV2 as
  SELECT
          ob.id,
          ob.group_by,
          ob.patient_id,
          ob.person_id,
          ob.original_code,
          ob.original_term,
          ob.result_value,
          ob.clinical_effective_date,
          ob.result_value_units,
          ob.age_years,
          ob.rnk
  FROM (
          SELECT o.id,
                 o.group_by,
                 o.patient_id,
                 o.person_id,
                 o.original_code,
                 o.original_term,
                 o.result_value,
                 o.clinical_effective_date,
                 o.result_value_units,
                 o.age_years,
                 @currank :=
                    IF(@curperson = o.person_id, @currank + 1, 1) AS rnk,
                 @curperson := o.person_id AS cur_person
          FROM observationsFromCohort o,
          (SELECT @currank := 0, @curperson := 0) r
          WHERE o.result_value IS NOT NULL
          ORDER BY o.person_id,
                   o.clinical_effective_date ASC,
                   o.id ASC  -- earliest
        ) ob

  WHERE ob.rnk = 1;

Updating Dataset Table Columns

Using the result set derived from the above step, we can now update the dataset table columns with the appropriate observational values. This is illustrated in the code snippet of the populateCodeDate procedure:

DROP TEMPORARY TABLE IF EXISTS qry_tmp;

  CREATE TEMPORARY TABLE qry_tmp (
      row_id                  INT,
      group_by                VARCHAR(255),
      original_code           VARCHAR(20),
      original_term           VARCHAR(200),
      clinical_effective_date DATE, PRIMARY KEY(row_id)
  ) AS
  SELECT (@row_no := @row_no+1) AS row_id,
         f.group_by,
         f.original_code,
         f.original_term,
         f.clinical_effective_date
  FROM filteredObservationsV2 f, (SELECT @row_no := 0) t;

SET @row_id = 0;

WHILE EXISTS (SELECT row_id from qry_tmp
              WHERE row_id > @row_id AND row_id <= @row_id + 1000) DO

  SET @sql = CONCAT('UPDATE ', datasetTable,
              ' d JOIN qry_tmp f ON d.pseudo_id = f.group_by SET ',
              col, "Code = f.original_code, ",
              col, "Term = f.original_term, ",
              col, "Date = date_format(f.clinical_effective_date, '%d/%m/%Y')
              WHERE f.row_id > @row_id AND f.row_id <= @row_id + 1000");
 
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;

  SET @row_id = @row_id + 1000;

END WHILE;

To populate medication datasets, we use a similar approach as above.

APPENDIX

Some supplementary material that maybe helpful when developing SQL query, in particular, working with concept ids.

Helper Functions

•          Function to retrieve the description field on the concept table based on the concept id:

DELIMITER //

CREATE FUNCTION get_concept_desc(p_concept_id int)
RETURNS VARCHAR(400)
NOT DETERMINISTIC READS SQL DATA
BEGIN

DECLARE l_description VARCHAR(400);

SELECT description INTO l_description
FROM concept cpt
WHERE cpt.dbid = p_concept_id;

RETURN l_description;

END//
DELIMITER ;

Usage: SELECT get_concept_desc(pat.gender_concept_id) FROM patient pat;

•          Function to retrieve the code field on the concept table based on the concept id:

DROP FUNCTION IF EXISTS get_concept_code;

DELIMITER //

CREATE FUNCTION get_concept_code(p_concept_id int)
RETURNS VARCHAR(20)
NOT DETERMINISTIC READS SQL DATA
BEGIN

DECLARE l_code VARCHAR(20);

SELECT code INTO l_code
FROM concept cpt
WHERE cpt.dbid = p_concept_id;

RETURN l_code;

END//
DELIMITER ;

Usage: SELECT get_concept_code(epoc.registration_type_concept_id)
       FROM episode_of_care epoc;

•          Function to retrieve the scheme field on the concept table based on the concept id:

DROP FUNCTION IF EXISTS get_concept_scheme;

DELIMITER //

CREATE FUNCTION get_concept_scheme(p_concept_id int)
RETURNS VARCHAR(50)
NOT DETERMINISTIC READS SQL DATA
BEGIN

DECLARE l_scheme VARCHAR(50);

SELECT scheme INTO l_scheme
FROM concept cpt
WHERE cpt.dbid = p_concept_id;

RETURN l_scheme;

END//
DELIMITER ;

Usage: SELECT get_concept_scheme(ob.non_core_concept_id) FROM observation ob;

•          Function to retrieve the ods code based on the organization id:

DROP FUNCTION IF EXISTS get_ods;

DELIMITER //

CREATE FUNCTION get_ods(p_org_id bigint)
RETURNS VARCHAR(50)
NOT DETERMINISTIC READS SQL DATA
BEGIN

DECLARE l_ods_code VARCHAR(50);

SELECT ods_code INTO l_ods_code
FROM organization org
WHERE org.id = p_org_id;

RETURN l_ods_code;

END//
DELIMITER ;

Usage: SELECT get_ods(pat.registered_practice_organization_id) FROM patient pat;

•          Function to retrieve the organization name based on the organization id:

DROP FUNCTION IF EXISTS get_org_name;

DELIMITER //

CREATE FUNCTION get_org_name(p_org_id bigint)
RETURNS VARCHAR(255)
NOT DETERMINISTIC READS SQL DATA

BEGIN

DECLARE l_org_name VARCHAR(255);

SELECT name INTO l_org_name
FROM organization org
WHERE org.id = p_org_id;

RETURN l_org_name;

END//
DELIMITER ;

Usage: SELECT get_ods(pat.registered_practice_organization_id) FROM patient pat;

•          Function to retrieve the event type from the description of the concept table:

DROP FUNCTION IF EXISTS get_event_type;

DELIMITER //

CREATE FUNCTION get_event_type(p_description VARCHAR(400))
RETURNS VARCHAR(50)
NOT DETERMINISTIC NO SQL
BEGIN

DECLARE l_event VARCHAR(50);

SELECT substr(p_description,instr(p_description,"(") + 1,
       instr(p_description,")") - instr(p_description,"(") - 1)
       INTO l_event;

RETURN l_event;

END//
DELIMITER ;

Usage: SELECT get_event_type(get_concept_desc(ob.core_concept_id)) event_type
       FROM observation ob;

•          Function to retrieve the latest medication order issue date based on the medication statement id:

DROP FUNCTION IF EXISTS get_lastest_med_issue_date;

DELIMITER //

CREATE FUNCTION get_lastest_med_issue_date (p_medication_stmt_id bigint)
RETURNS DATE
NOT DETERMINISTIC READS SQL DATA
BEGIN

DECLARE l_date DATE;
DECLARE l_rnk  INT;

-- get latest issue date

  SELECT
          med.clinical_effective_date, med.rnk INTO l_date, l_rnk
  FROM (
          SELECT
            mo.id,
            mo.medication_statement_id,
            mo.clinical_effective_date,
            @currank :=
            IF(@curmedstmt = mo.medication_statement_id,@currank + 1,1) AS rnk,
            @curmedstmt := mo.medication_statement_id AS cur_med_stmt
          FROM medication_order mo,(SELECT @currank := 0, @curmedstmt := 0) r
          WHERE mo.medication_statement_id = p_medication_stmt_id
          ORDER BY mo.medication_statement_id DESC,
                   mo.clinical_effective_date DESC,
                   mo.id DESC -- latest
        ) med
  WHERE med.rnk = 1;

RETURN l_date;

END//
DELIMITER ;

Usage: SELECT DATE_FORMAT(get_lastest_med_issue_date(med.id),"%Y%m%d") date_of_last_issue
       FROM medication_statement med;

•          Function to retrieve the earliest medication order issue date based on the medication statement id:

DROP FUNCTION IF EXISTS get_earliest_med_issue_date;

DELIMITER //

CREATE FUNCTION get_earliest_med_issue_date (p_medication_stmt_id bigint)
RETURNS DATE
NOT DETERMINISTIC READS SQL DATA

BEGIN

DECLARE l_date DATE;
DECLARE l_rnk  INT;

-- get earliest issue date

  SELECT
          med.clinical_effective_date, med.rnk INTO l_date, l_rnk
  FROM (
          SELECT
            mo.id,
            mo.medication_statement_id,
            mo.clinical_effective_date,
            @currank :=
            IF(@curmedstmt = mo.medication_statement_id,@currank + 1,1) AS rnk,
            @curmedstmt := mo.medication_statement_id AS cur_med_stmt
          FROM medication_order mo,(SELECT @currank := 0, @curmedstmt := 0) r
          WHERE mo.medication_statement_id = p_medication_stmt_id
          ORDER BY mo.medication_statement_id ASC,
                   mo.clinical_effective_date ASC,
                   mo.id ASC -- earliest
        ) med
  WHERE med.rnk = 1;

RETURN l_date;

END//
DELIMITER ;

Usage: SELECT DATE_FORMAT(get_earliest_med_issue_date(med.id),"%Y%m%d") issue_date
       FROM medication_statement med;

•          Function to retrieve the event log last changed date based on the record id:

DROP FUNCTION IF EXISTS get_event_maxdate;

DELIMITER //

CREATE FUNCTION get_event_maxdate (p_table_id tinyint, p_record_id bigint)
RETURNS DATETIME
READS SQL DATA
BEGIN

DECLARE l_max_date DATETIME;

SELECT  MAX(etlg.dt_change) modified_date INTO l_max_date
FROM event_log etlg
WHERE etlg.table_id = p_table_id AND record_id = p_record_id;

RETURN l_max_date;

END//
DELIMITER ;

Usage: SELECT get_event_maxdate(2,pat.id) FROM patient pat;
       Where 2 is the table id for the patient table.

•          Function to retrieve the event log last changed date for medication order based on the medication statement id:

DROP FUNCTION IF EXISTS get_event_maxdate_medo;

DELIMITER //

CREATE FUNCTION get_event_maxdate_medo (p_table_id tinyint, p_record_id bigint)
RETURNS datetime
READS SQL DATA
BEGIN

DECLARE l_max_date datetime;

SELECT MAX(etlg.dt_change) INTO l_max_date
FROM event_log etlg JOIN medication_order mo
ON mo.medication_statement_id = p_record_id
AND mo.id = etlg.record_id
WHERE etlg.table_id = p_table_id;

RETURN l_max_date;

END//
DELIMITER ;

Usage: SELECT get_event_maxdate_medo (9,med.id) FROM medication_statement med

SQL Statement Examples

Patient Information

SELECT
     pat.id patient_id,
     get_ods(pat.registered_practice_organization_id) source_organization,
     IF (pat.date_of_death IS NULL,
        TIMESTAMPDIFF(YEAR,pat.date_of_birth,CURDATE()),
        TIMESTAMPDIFF(YEAR,pat.date_of_birth,pat.date_of_death)) age,
     get_concept_desc(pat.gender_concept_id) sex,
     DATE_FORMAT(epoc.startregdate,"%Y%m%d") start_regdate,
     get_concept_code(epoc.registration_type_concept_id)  reg_status,
     pat.nhs_number nhsnumber,
     pad.postcode,
     DATE_FORMAT(pat.date_of_birth,"%Y%m%d") date_of_birth,
     epoc.usualgp,
     pat.first_names forename,
     pat.last_name surname,
     get_concept_desc(pat.ethnic_code_concept_id) ethnic_category,
     DATE_FORMAT(pat.date_of_death,"%Y%m%d") date_of_death,
     DATE_FORMAT(epoc.endregdate,"%Y%m%d") end_reg_date,
     get_optout_status(pat.id, pat.registered_practice_organization_id) optout_status,
     pad.address_line_1,
     pad.address_line_2,
     pad.address_line_3,
     pad.address_line_4,
     pad.city,
     NULLIF(GREATEST(COALESCE(get_event_maxdate(2,pat.id),0),
                     COALESCE(get_event_maxdate(20,pad.id),0),
                     COALESCE(get_event_maxdate(6,epoc.id),0)),0) last_modified_date
FROM patient pat
LEFT JOIN patient_address pad ON pat.current_address_id = pad.id AND pat.id = pad.patient_id
LEFT JOIN
  (SELECT
          epc.id,
          epc.organization_id,
          epc.patient_id,
          epc.registration_type_concept_id,
          epc.registration_status_concept_id,
          epc.date_registered startregdate,
          epc.date_registered_end endregdate,
          epc.usual_gp_practitioner_id,
          prac.name usualgp
  FROM episode_of_care epc LEFT JOIN practitioner prac
  ON prac.id = epc.usual_gp_practitioner_id
  AND prac.organization_id = epc.organization_id) epoc
  ON epoc.patient_id = pat.id
  AND epoc.organization_id = pat.registered_practice_organization_id;

Prescription Information

SELECT
     med.id event_id,
     get_ods(med.organization_id) source_organization,
     med.patient_id patient_id,
     pat.nhs_number nhs_number,
     get_concept_desc(med.authorisation_type_concept_id)  issue_type,
     DATE_FORMAT(get_earliest_med_issue_date(med.id),"%Y%m%d") issue_date,
     DATE_FORMAT(get_lastest_med_issue_date(med.id),"%Y%m%d") date_of_last_issue,
     DATE_FORMAT(med.clinical_effective_date,"%Y%m%d") recorded_date,
     get_concept_code(med.core_concept_id) dmdcode,
     med.dose dosage,
     med.quantity_value quantity,
     med.quantity_unit quantity_unit,
     get_concept_desc(med.core_concept_id) dmd_description,
     NULLIF(GREATEST(COALESCE(get_event_maxdate(10,med.id),0),
                     COALESCE(get_event_maxdate_medo(9,med.id),0)),0) last_modified_date
FROM medication_statement med JOIN patient pat ON med.patient_id = pat.id;

Clinical Event Information

SELECT
     ob.id event_id,
     get_ods(ob.organization_id) source_organization,
     ob.patient_id,
     pat.nhs_number nhs_number,
     DATE_FORMAT(ob.clinical_effective_date,"%Y%m%d") effective_date,
     get_concept_code(ob.non_core_concept_id) original_code,
     get_concept_desc(ob.non_core_concept_id) original_term,
     get_concept_desc(get_concept_scheme(ob.non_core_concept_id)) original_code_scheme,
     ob.result_value result_value,
     CASE
     WHEN ob.is_problem = 1 AND ob.is_review = 0 THEN 'N - New'
     WHEN ob.is_problem = 1 AND ob.is_review = 1 THEN 'F - Follow Up'
     ELSE 'O - Other'
     END AS episode_type,
     prac.role_desc hcp_role,
     enc.type encounter_type,
     get_concept_code(ob.core_concept_id) snomed_concept_id,
     get_concept_desc(ob.core_concept_id) snomed_term,
     ob.parent_observation_id parent_event,
     get_ods(enc.organization_id) event_location,
     get_org_name(enc.organization_id) location_name,
     enc.appointment_id appointment_slot_id,
     get_event_type(get_concept_desc(ob.core_concept_id)) event_type,
    NULLIF(GREATEST(COALESCE(get_event_maxdate(11,ob.id),0),
                    COALESCE(get_event_maxdate(5,enc.id),0),
                    COALESCE(get_event_maxdate(13,prac.id),0)),0) last_modified_date
FROM observation ob JOIN patient pat ON ob.patient_id = pat.id
LEFT JOIN practitioner prac ON prac.id = ob.practitioner_id
LEFT JOIN encounter enc ON enc.id = ob.encounter_id;

Patient Registration History Information

SELECT
     get_ods(epoc.organization_id) source_origanization,
     pat.nhs_number,
     get_concept_code(epoc.registration_status_concept_id) reg_status,
     get_concept_desc(epoc.registration_status_concept_id) reg_status_desc,
     get_concept_code(epoc.registration_type_concept_id) registration_type,
     get_concept_desc(epoc.registration_type_concept_id) registration_type_desc,
     DATE_FORMAT(epoc.date_registered,"%Y%m%d") date_registered,
     DATE_FORMAT(epoc.date_registered_end,"%Y%m%d") date_registered_end,
     epoc.id sequence_number,
     NULLIF(COALESCE(get_event_maxdate(6,epoc.id),0),0)  last_modified_date
FROM episode_of_care epoc JOIN patient pat ON epoc.patient_id = pat.id;

Patient Address Information

SELECT
     padr.id address_id,
     padr.address_line_1 address1,
     padr.address_line_2 address2,
     padr.address_line_3 address3,
     padr.address_line_4 address4,
     padr.city address5,
     padr.postcode postcode,
     parn.uprn,
     DATE_FORMAT(padr.start_date,"%Y%m%d") start_date,
     DATE_FORMAT(padr.end_date,"%Y%m%d") end_date,
     padr.lsoa_2001_code,
     padr.lsoa_2011_code,
     padr.msoa_2001_code,
     padr.msoa_2011_code,
     pat.nhs_number,
     parn.abp_address_number matched_address_1,
     parn.abp_address_street matched_address_2,
     parn.abp_address_locality matched_address_3,
     parn.abp_address_town matched_address_4,
     parn.abp_address_postcode matched_postcode,
     parn.classification,
     parn.abp_address_organization business_name,
     parn.match_date uprn_match_date,
     parn.status uprn_status,
     parn.latitude,
     parn.longitude,
     NULLIF(COALESCE(get_event_maxdate(20,padr.id),0),0)  last_modified_date
FROM patient_address padr JOIN patient pat ON padr.patient_id = pat.id
LEFT JOIN patient_address_match parn ON padr.id = parn.id;

Appointment Slot Information

SELECT
     appt.id slot_id,
     get_ods(appt.organization_id) source_origanization,
     appt.schedule_id session_id,
     DATE_FORMAT(appt.start_date,"%Y%m%d %T") start_time,
     appt.planned_duration planned_duration,
     appt.actual_duration actual_duration,
     get_concept_desc(appt.appointment_status_concept_id) status,
     DATE_FORMAT(appt.date_time_sent_in,"%Y%m%d %T") actual_start_time,
     appt.patient_wait wait_time,
     DATE_FORMAT(appt.cancelled_date,"%Y%m%d %T") date_cancelled,
     DATE_FORMAT(appt.date_time_left,"%Y%m%d %T") time_left,
     pat.nhs_number,
     NULLIF(COALESCE(get_event_maxdate(18,appt.id),0),0)  last_modified_date
FROM appointment appt JOIN patient pat ON appt.patient_id = pat.id;

Encounter Information

SELECT
     enc.id encounter_id,
     get_ods(enc.organization_id) source_organization,
     pat.nhs_number,
     DATE_FORMAT(enc.clinical_effective_date,"%Y%m%d %T") encounter_startdate,
     DATE_FORMAT(enc.end_date,"%Y%m%d %T") encounter_enddate,
     get_concept_desc(enc.non_core_concept_id) original_type,
     prac.role_desc hcp_role,
     enc.type encounter_type,
     get_ods(enc.service_provider_organization_id) event_organization,
     enc.appointment_id appointment_slot_id,
    NULLIF(GREATEST(COALESCE(get_event_maxdate(5,enc.id),0),
                    COALESCE(get_event_maxdate(13,prac.id),0)),0) last_modified_date
FROM encounter enc JOIN patient pat ON enc.patient_id = pat.id
LEFT JOIN practitioner prac ON prac.id = enc.practitioner_id;

Organization Information

SELECT
     org.id organization_id,
     org.ods_code,
     org.name organization_name,
     org.type_code,
     org.type_desc,
     org.postcode,
     org.parent_organization_id,
     NULLIF(COALESCE(get_event_maxdate(12,org.id),0),0)  last_modified_date
FROM organization org;

Appointment Session Information

SELECT
     sch.id session_id,
     sch.name session_name,
     get_ods(sch.organization_id) source_organization,
     prac.role_desc hpc_role,
     DATE_FORMAT(sch.start_date,"%Y%m%d %T:%f") start_time,
     sch.type session_type,
     sch.location location,
     NULLIF(GREATEST(COALESCE(get_event_maxdate(17,sch.id),0),
                     COALESCE(get_event_maxdate(13,prac.id),0)),0) last_modified_date
FROM schedule sch JOIN practitioner prac ON prac.id = sch.practitioner_id;