Source code
The source code, available under the Apache 2.0 open source license is hosted on github at
Background and Purpose of ASSIGN
In partnership with researchers at Queen Mary University of London’s Clinical Effectiveness Group, Endeavour Health has developed an address-matching algorithm to link patient health records to geospatial information. Linking people to places can help researchers understand how health is impacted by social and environmental factors, like the characteristics of a household, green space or air pollution. But patient addresses are entered into GP records as free text so the same address can be written in different ways, making data linkage very difficult.
The algorithm, known as ASSIGN (AddreSS MatchInG to Unique Property Reference Numbers), allocates a Unique Property Reference Number (UPRN) to patient records. Every property in the UK already has a UPRN. They are allocated by local authorities and made nationally available by Ordnance Survey. A UPRN gives every address a standardised format, enabling pseudonymised linkage to other sources of data.
ASSIGN compares addresses in the NHS record with the Ordnance Survey's "Address Base Premium" UPRN database, one element at a time, and decides whether there is a match. The algorithm mirrors human pattern recognition, so it allows for certain character swaps, spelling mistakes and abbreviations. After rigorous testing and adjustments, ASSIGN correctly matches 98.6% of patient addresses at 38,000 records per minute. It also includes patients’ past addresses, making it possible to study addresses across the life span.
Researchers at Queen Mary are using the pseudonymised UPRNs to study:
· how the health of household members impacts childhood obesity
· whether overcrowded or multi-generational households are at greater risk from Covid-19
· how to support GPs to identify people living in care homes so they can provide more effective care.
As ASSIGN is open source, it is hoped the algorithm will also be used by other researchers to link data, inform policy and improve population health across England.
A paper is published at https://ijpds.org/article/view/1674 which describes the use of the algorithm in a project to allocate UPRNs to GP patient addresses.
Component services
ASSIGN supports two main services
- A Web application that people can use to upload a list of addresses and obtain a list of matched addresses and UPRNs (this article)
- A REST API that systems can use to request a matched address and a UPRN for an address.
UPRN data source
Unique Property Reference Numbers (UPRNs) are property identifiers for every property in Great Britain. Ordnance Survey provides access to these in a number of products, but their AddressBase Premium product is the most comprehensive database. It is derived from local government's NLPG (National Land and Property Gazetteer) as created and maintained by GeoPlace, Ordnance Survey’s OS MasterMap Address Layer 2 and the Royal Mail’s PAF (Postcode Address File). It adheres to British Standard for addressing BS7666, and every property has its Unique Property Reference Number (UPRN) and geographical co-ordinates. It is updated every 6 weeks.
Furthermore, there is an assured link between all addresses associated with a property over its life cycle, and from local authority and Royal Mail sources.
That being the case, if there was a service or application that matched the address from someone's health record to at least ONE of the addresses supplied by Address Base Premium then the UPRN can be derived and linked to the person.
Application and API Functionality
To access the application the user must either be a user of Discovery, or must create their own account in line with level 1 authentication process. An authenticated user has authority to access the UPRN match user interface. In addition to address matching, if the user wishes to have additional data, the user must have rights to access the OS ABP data, usually via the open Government license.
Instructions of the use of the API from for example, Python is available at https://github.com/endeavourhealth-discovery/ASSIGN/blob/master/documentation/ASSIGN_Api.md
Inputting an address
There are two main functions available to the user via the App or the API:
- Enter an address of some kind and attempt to get a match
- Upload a list of addresses (between 1 and 100,000) and attempt to get a list of matches
An additional flag entered by the user indicates whether to match only on residential properties or include commercial matches.
A match is presented in one of three forms and the user can select either/ or:
- Plain English address : Simply displaying the address fields and their value,together with information
- JSON structure. This is a machine readable structure which includes the same information and can be processed by a computer more easily than plain English
- CSV file. Suitable for importing into Excel (with a note on converting UPRNs to text) or to a database.
The matching details for an address are described in more detail below
Address match response
The following table explains the information returned following a match attempt.
The term 'Candidate address' is the address entered by the user or submitted in the file. The 'authority address' is an address provided by the Ordnance survery address based premium and may be a post office address or a local authority address.
No match responses
No match message | Value | Explanation |
---|---|---|
Address format | Null address lines | No address lines submitted |
Insufficient characters | The address format does not appear to contain enough characters to attempt to match (<9) | |
Post code quality | invalid post code | The post code is an invalid format |
missing post code | Post code is missing. The address may be matched without a post code but normally a post code is necessary even if incorrect in order to simply narrow down the potential matches | |
Out of area | The post code is valid but unrecognised and appears to be out of area It should be noted that there will still be an attempt to match the address without a post code or a partial match on a post code. | |
Matched | false | No match |
Match responses
Whenever there is a positive match, the response includes the qualifier of the match and the pattern of match,
The following information is provided when there is a match.
Field | Value | Description |
---|---|---|
Address format | good | Sufficient characters and if there is a post code it is valid |
See unmatched | The address quality may still be poor (post code etc) even though there is a match | |
Matched | True | There was a match. Note that this may not be an exact match |
UPRN | The unique property reference number provided by the Ordnance survey via the Address Based Premium service | |
Match qualifier | See values in match algorithm article | The qualifier of the match i.e. whether exact or close etc |
Classification | The classification code of the property | |
ClassTerm | The term of the code of the property | |
Algorithm | the code of the algorithm that ended up with the best match. This is a combination of rule order number and sub matching algorithm when applied to a preformatted address. For example 1-match1 indicates the first rule and match1 algorithm. EAn exact case independent match. | |
ABPAddress | Address components as below: | The Address Base Premium address that match to the candidate address |
Flat | flat number | |
Building | name of building | |
Building number | number of building in street | |
Dependent thoroughfare | A sub-street or something that is dependent on the street to get to the address | |
Street | The street the building is on | |
Dependent locality | An area smaller than a localitu | |
Locality | A locality or area within a town or village | |
Town | town | |
Post code | post code | |
Organisation | name of organisation if recorded by ABP | |
Match pattern | see values in article on match algorithm | For the purposes of match pattern reporting the 9 main address fields are rationalised to 5 and the match pattern indicates the approximation of the match and any field manipulation that took polace |
Property classification
Properties are classified by assigning classification codes. Each code has a term and indicates the type of property that the database thinks it is. Some of these properties are obviously residential (e.g. detached house) and some are obviously commercial (e.g. a slaughter house). Some may be both residential or commercial (e.g. a public house). Some properties cannot be either (e.g. advertising hoarding)
Assign flags the property types it thinks are usually residential and this information is used in the algorithm
ASSIGN algorithm
main article UPRN address matching algorithms
the algorithms used to match addresses are described in more detail by the UPRN address matching algorithms article.
Example use of UPRN data
Place of residence information
In this example, the UPRN match to an address has been used to determine a place of residence for a patient at a particular point in time.
Implementation has consisted of creating a user defined function, in SQL a "table valued function" which returns a wealth of information about the place the person is likely to have resided at at a date of event (e.g. the date of onset of a disease or symptom)
In this case the data analyst is seeking the following information, including the quality of the match between the address and the UPRN
Field | Description | Values |
Person | A set of organisational patients with the same NHS number considered to be one person | Identifier |
Patient | A uniquely identified person (identified by NHS number) for a particular organisation | Identifier |
Alive and GMS registered | Whether GMS registered and alive at the event date | GMS registered and alive
Not GMS registered Unclear whether GMS registered/ alive or not |
Address candidate count | The number of addresses that would appear to indicate the patient’s home address at the date of event | Number
Normally 1, but in some cases, there may be more than one that appear as possibilities (see below for optional inclusion) |
Address start date | The address start date according to the GP record. | Date or null |
Address end date | The address end date or null according to the GP record | Date or null |
UPRN | The most valid UPRN for each address recorded in records | https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm
|
RALF | The most valid RALF for each address derived from the UPRN | Pseudonymised number or null |
Address line 1 | Address line as recorded in source system | Text or null |
Address line 2 | Address line as recorded in source system | Text or null |
Address line 3 | Address line as recorded in source system | Text or null |
Address line 4 | Address line as recorded in source system | Text or null |
Town or city | As recorded in source system | Text or null |
Post code | As recorded in source system | Text or null |
LSOA | Lower super output area linked to the post code linked to the post code in the matched UPRN | Text or null |
MSOA | Middle super output area linked to the post code in the matched UPRN address | Text or null |
UPRN match date | The date of match for the UPRN | Date |
UPRN match qualifier | The qualifier for the address match in relation to the UPRN | equivalent, parent, sibling |
EPOCH | AddressBase premium epoch number that generated the match | Number |
Property Classification | The classification of the property according to ABP at the time of match | Classification concept e.g. commercial, residential |
UPRN x coordinate | The X geo coordinate from ABP | Number |
UPTN y coordinate | The Y geo coordinate from ABP | Number |
Match pattern flat | The nature of the match to the flat inferred from the candidate provider address | https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm#Match_patterns |
Match pattern flat | The nature of the match to the flat or house name inferred from the candidate provider address | https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm#Match_patterns |
Match pattern number | The nature of the match to the house number inferred from the candidate provider address | https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm#Match_patterns |
Match pattern street | The nature of the match to the street inferred from the candidate provider address | https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm#Match_patterns |
Match pattern post code | The nature of the match to the post code inferred from the candidate provider address | https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm#Match_patterns |
Use in SQL query
In order to use this function in query, being in Microsoft SQL, one could use a cross apply.
Example usage of function in SQL
SELECT
covid.patient_id
,covid. nhs_number
,covid.effectiveDate
,por.gms
,por.addressCount
,por.uprn
,por.x
,por.y
FROM
(
SELECT
patient_id
,clinical_effective_date
FROM
observation] o
WHERE
o.core_concept_ IN ('covid-19')
) AS covid
CROSS apply (
SELECT
*
FROM
PlaceAtEventTime (
covid.patient_id
,covid.clinical_effective_date,default
) AS paet
) AS por
Table value function
This is an example implementation of the property as a user defined table value function
Example definition of function in sql
USE [db_lookup]
GO
/****** Object: UserDefinedFunction [gpp].[PlaceAtEventTime_V2_1] Script Date: 14/09/2022 11:45:26 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [gpp].[PlaceAtEventTime_V2_1](@patient_id as bigint, @event_date date, @multiple_addresses int = 0)
RETURNS
@tUPRN TABLE
(
multiple_addresses int,
id bigint,
person_id bigint,
patient_id bigint,
patient_address_id bigint,
gms int,
address_candidate_count int,
address_start_date date,
address_end_date date,
uprn varchar(255),
ralf varchar(255),
address_line_1 varchar(255),
address_line_2 varchar(255),
address_line_3 varchar(255),
address_line_4 varchar(255),
towncity varchar(100),
postcode varchar(50),
lsoa varchar(200),
msoa varchar(200),
uprn_match_date datetime,
uprn_match_qualifier varchar(100),
epoch int,
property_classification varchar(50),
uprn_x float,
uprn_y float,
match_pattern_flat varchar(200),
match_pattern_number varchar(200),
match_pattern_street varchar(200),
match_pattern_post_code varchar(200)
)
AS
BEGIN
DECLARE @gms as varchar(1)
DECLARE @address_count as int
DECLARE @place_of_residence_count as int
DECLARE @person_id as bigint
if (@multiple_addresses <> 0 and @multiple_addresses <> 1) RETURN
select @gms = [gpp].GMS_V2(@patient_id, @event_date)
-- must be currently registered for event date
if @gms <> 1 begin
INSERT INTO @tUPRN(id, person_id, patient_id, patient_address_id, gms, address_candidate_count)
VALUES (null, null, @patient_id, null, @gms, null)
RETURN
end
INSERT INTO @tUPRN (multiple_addresses, id, person_id, patient_id, patient_address_id, gms, address_candidate_count, address_start_date, address_end_date, uprn, ralf,
address_line_1, address_line_2, address_line_3, address_line_4, towncity, postcode, lsoa, msoa, uprn_match_date,
uprn_match_qualifier, epoch, property_classification, uprn_x, uprn_y, match_pattern_flat, match_pattern_number, match_pattern_street, match_pattern_post_code)
SELECT @multiple_addresses, uprn.id, address.person_id, address.patient_id, address.id, @gms, 0, address.start_date, address.end_date, uprn.uprn, uprn.uprn_ralf00, address.address_line_1, address.address_line_2,
address.address_line_3, address.address_line_4, address.city, address.postcode, address.lsoa_2011_code, address.msoa_2011_code, uprn.match_date, uprn.qualifier, uprn.epoch,
uprn.uprn_property_classification, uprn.uprn_xcoordinate, uprn.uprn_ycoordinate, uprn.match_pattern_flat, uprn.match_pattern_number, uprn.match_pattern_street, uprn.match_pattern_postcode
FROM [compass_gp].[dbo].[patient_address] address
LEFT JOIN [compass_gp].[dbo].[patient_address_match] uprn
ON uprn.patient_address_id= address.id
WHERE address.patient_id = @patient_id AND
(address.start_date <= @event_date or address.start_date is null)
AND (address.end_date >= @event_date or address.end_date is null)
AND
uprn.id = (SELECT top 1 id
FROM [compass_gp].[dbo].[patient_address_match] uprn
WHERE uprn.patient_address_id = address.id
ORDER BY uprn.match_date DESC)
if (@multiple_addresses = 0)
begin
select @address_count = count(1) from @tUPRN
if (@address_count > 1)
begin
select @person_id = person_id from @tUPRN
delete from @tUPRN
INSERT INTO @tUPRN(id, person_id, patient_id, patient_address_id, gms, address_candidate_count)
VALUES (null, @person_id, @patient_id, null, @gms, @address_count)
end
end
RETURN
END
gms function
This determines whether a patient is registered for gms services on a particular date
Example definition of gms registered function in sql
USE [db_lookup]
GO
/****** Object: UserDefinedFunction [gpp].[GMS_V2] Script Date: 12/09/2022 21:50:17 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [gpp].[GMS_V2](@nor bigint, @event_date date)
RETURNS VARCHAR
BEGIN
-- declare @event_date as date
-- declare @nor as bigint
-- set @event_date = '2013-12-01'
-- set @nor = 7
declare @var_date_registered as date
declare @var_date_registered_end as date
declare @b as varchar(1)
-- declare @var_date date, @var_dat2 as date = (
Select @var_date_registered=latestreg.date_registered, @var_date_registered_end=latestreg.date_registered_end
From(
Select top 1 patient_id, date_registered, date_registered_end, registration_type_concept_id
From [compass_gp].[dbo].[episode_of_care] e
join [compass_gp].[dbo].[patient]p on p.id=e.patient_id
Where (date_registered <= @event_date and e.patient_id = @nor and (p.date_of_death > @event_date or p.date_of_death is null))
Order by e.date_registered desc) as latestreg
Where (latestreg.registration_type_concept_id=1335267) and (latestreg.date_registered_end is null or (latestreg.date_registered_end > @event_date))
-- print @var_date_registered
-- print @var_date_registered_end
-- 1=Registered and alive, 2=not registered, 3=Unclear
set @b = case
when ((@var_date_registered_end < @var_date_registered)) then 3
when @var_date_registered is not null then 1
else 2
end
return @b
END