ASSIGN- UPRN address match application: Difference between revisions

From Endeavour Knowledge Base
 
(25 intermediate revisions by the same user not shown)
Line 1: Line 1:
The UPRN address matching  application enables a user to match an address to an "official" address, and in the process also provides the Unique Property Reference Number (UPRN) which has been assigned to the official address.
== Source code ==
The source code, available under the Apache 2.0 open source license is hosted on github at


== Background and purpose ==
https://github.com/endeavourhealth-discovery/ASSIGN
It is well established that a person's health is significantly affected by their environment in which they live and work.


People live in flats, houses, or are homeless. People work at home, outside, or in offices or factories.
== Background and Purpose of ASSIGN ==
In partnership with researchers at Queen Mary University of London’s Clinical Effectiveness Group, Endeavour Health has developed an address-matching algorithm to link patient health records to geospatial information. Linking people to places can help researchers understand how health is impacted by social and environmental factors, like the characteristics of a household, green space or air pollution. But patient addresses are entered into GP records as free text so the same address can be written in different ways, making data linkage very difficult.


In order to identify or rectify health issues relating to an environment it is often useful to evaluate the affect of location, type of property, and occupancy, on the people who live and work in them.
The algorithm, known as ASSIGN ('''A'''ddre'''SS''' Match'''I'''n'''G''' to Unique Property Reference '''N'''umbers), allocates a Unique Property Reference Number (UPRN) to patient records. Every property in the UK already has a UPRN. They are allocated by local authorities and made nationally available by Ordnance Survey. A UPRN gives every address a standardised format, enabling pseudonymised linkage to other sources of data.


To perform that evaluation it is of course necessary to know the characteristics of the people who live or work in the property or at a location. The usual way of recording where someone lives is to enter their address. Within the NHS it is common practice to record addresses as provided by the citizen. In doing so, the address is often recorded in a way that is slightly different from another address entry for the same location in another persons record.
ASSIGN compares addresses in the NHS record with the Ordnance Survey's "Address Base Premium" UPRN database, one element at a time, and decides whether there is a match. The algorithm mirrors human pattern recognition, so it allows for certain character swaps, spelling mistakes and abbreviations. After rigorous testing and adjustments, ASSIGN correctly matches 98.6% of patient addresses at 38,000 records per minute. It also includes patients’ past addresses, making it possible to study addresses across the life span.


This leaves the first problem of working out who lives in the same household. They may have slightly different address entries, and whilst a human can usually see they are the same, a computer cannot. Even when the different addresses are matched, this still leaves a second problem of finding out exactly where the address is.
Researchers at Queen Mary are using the pseudonymised UPRNs to study:


If various hand entered addresses were to be matched to the one property it would be possible to determine precisely the characteristics of the people living or working at a particular location and property. This matching is labour intensive and problematic.
·        how the health of household members impacts childhood obesity


There are problems with using addresses as a way of linking:
·        whether overcrowded or multi-generational households are at greater risk from Covid-19


# Addresses are recorded by provider organisations in inconsistent ways.
·        how to support GPs to identify people living in care homes so they can provide more effective care.
# Addresses themselves change over time.
# Many properties have more than one address that matches the property. For example the local authority may have one address, whilst the post office may have another.


Luckily, the Ordnance Survey have helped to solve this problem.
As ASSIGN is open source, it is hoped the algorithm will also be used by other researchers to link data, inform policy and improve population health across England.  


A paper is published at  https://ijpds.org/article/view/1674 which describes the use of the algorithm in a project to allocate UPRNs to GP patient addresses.
== Component services ==
ASSIGN supports two main services
# A Web application that people can use to upload a list of addresses and obtain a list of matched addresses and UPRNs (this article)
#[[UPRN address matching API|A REST API]] that systems can use to request a matched address and a UPRN for an address.
== UPRN data source ==
Unique Property Reference Numbers (UPRNs) are property identifiers for every property in Great Britain. Ordnance Survey provides access to these in a number of products, but their AddressBase Premium product is the most comprehensive database. It is derived from local government's NLPG (National Land and Property Gazetteer) as created and maintained by GeoPlace, Ordnance Survey’s OS MasterMap Address Layer 2 and the Royal Mail’s PAF (Postcode Address File). It adheres to British Standard for addressing BS7666, and every property has its Unique Property Reference Number (UPRN) and geographical co-ordinates. It is updated every 6 weeks.
Unique Property Reference Numbers (UPRNs) are property identifiers for every property in Great Britain. Ordnance Survey provides access to these in a number of products, but their AddressBase Premium product is the most comprehensive database. It is derived from local government's NLPG (National Land and Property Gazetteer) as created and maintained by GeoPlace, Ordnance Survey’s OS MasterMap Address Layer 2 and the Royal Mail’s PAF (Postcode Address File). It adheres to British Standard for addressing BS7666, and every property has its Unique Property Reference Number (UPRN) and geographical co-ordinates. It is updated every 6 weeks.


Line 28: Line 36:
That being the case, if there was a service or application that matched the address from someone's health record to at least ONE of the addresses supplied by Address Base Premium then the UPRN can be derived and linked to the person.
That being the case, if there was a service or application that matched the address from someone's health record to at least ONE of the addresses supplied by Address Base Premium then the UPRN can be derived and linked to the person.


Discovery information service supports the mapping of health related addresses to addresses provided by an authoritative organisation, those addresses being a gold standard for pointing to a UPRN. The matching service provides two subtypes of service:
== Application Functionality ==
 
# A Web application that people can use to upload a list of addresses and obtain a list of matched addresses and UPRNs (this article)
#[[UPRN address matching API|A REST API]] that systems can use to request a matched address and a UPRN for that address
 
The remainder of this article discusses the web application
 
== Functionality ==
To access the application the user must either be a user of Discovery, or must create their own account in line with [[Identity Authentication Authorisation#Authentication levels|level 1 authentication]] process. An authenticated user has authority to access the UPRN match user interface. In addition to address matching, if the user wishes to have additional data, the user must have rights to access the OS ABP data, usually via the open Government license.
To access the application the user must either be a user of Discovery, or must create their own account in line with [[Identity Authentication Authorisation#Authentication levels|level 1 authentication]] process. An authenticated user has authority to access the UPRN match user interface. In addition to address matching, if the user wishes to have additional data, the user must have rights to access the OS ABP data, usually via the open Government license.


Line 92: Line 93:


==== Match responses ====
==== Match responses ====
Whenever there is a positive match, the response includes the qualifier of the match and the pattern of match,
The following information is provided when there is a match.
The following information is provided when there is a match.
{| class="wikitable"
{| class="wikitable"
Line 116: Line 120:
|-
|-
|Match qualifier
|Match qualifier
|See values below
|See values in match algorithm article
|The qualifier of the match i.e. whether exact or close etc
|The qualifier of the match i.e. whether exact or close etc
|-
|-
|
|
|Equivalent
|Deprecated. Now Best match. This means that the algorithm thinks that it has found the equivalent entry as being the correct location. This does not mean it is an exact match, only that it thinks that the user 'candidate address' is the same location as the one that is listed below
|-
|
|Best match
|This means that the algorithm thinks that it has found an entry as being the best match and the correct location. This does not mean it is an exact match, only that it thinks that the user 'candidate address' is the same location as the one that is listed below and thinks it is a better match than others.
It may not be the case that it is the best match. Algorithms explain the "best fit" approach which differentiates what the machine thinks is the best match from what a human might think
|-
|
|Best (residential) match
|indicates that the user has attempted to match only on residential properties or those that may be residential or dual use.
|-
|
|Best (+commercial) match
|Indicates that the user has included commercial properties in the match algorithm
|-
|
|Child
|The candidate address is likely to be a child of the authoritative address listed below. For example it might be a flat within the building
|-
|
|Parent
|The candidate address is not specific enough e.g. is a building whereas the authority's address is more detailed
|-
|
|Sibling
|The candidate address is likely to be close to the address listed below e.g next door. i.e. it nearby but not necessarily the exact property. This is useful when using UPRNs for geo location, but if using household addresses it cannot be guaranteed to group households. Care needs to be taken when using this in household grouping.
|-
|-
|Classification
|Classification
Line 205: Line 182:
|-
|-
|Match pattern
|Match pattern
|see values below
|see values in article on match algorithm
|For the purposes of match pattern reporting the 9 main address fields are rationalised to 5. Dependent thoroughfare, dependent locality and locality are merged to street. Whilst town is used as a guide, as it is of little value for matching, it is not included.
|For the purposes of match pattern reporting the 9 main address fields are rationalised to 5 and the match pattern indicates the approximation of the match and any field manipulation that took polace
|}


==== Commercial property vs Residential property matches ====
Properties are classified by assigning classification codes. Each code has a term and indicates the type of property that the database thinks it is. Some of these properties are obviously residential (e.g. detached house) and some are obviously commercial (e.g. a slaughter house). Some may be both residential or commercial (e.g. a public house)


For each of the main 5 fields (flat, building, number, street, postcode) the pattern indicates the degree to which each field is matched and indicates the degree of manipulation or field swapping. A match pattern is built up by one or more of the phrases below i.e. may be more than one manipulation per field.
Assign flags the property types it thinks are usually residential and this information is used in the algorithm in the following way.


Match pattern indicators can be conceptualised as a language grammar with the fields being the subjects, the manipulation of the field being the predicate and the qualifier as the object.
In summary the match algorithm prefers a residential match to a commercial match via the following rules


There are around 12 match terms with around 50 or so theoretical combinations of those terms. For example a candidate field may be dropped to match, and matched as a sibling (ds). Applying these to 5 fields results in the potential of 300 million or so different combinations.
# If there is an exact or equivalent match to a commercial property, and no match to a residential property, the commercial match is returned.
# If there is an approximate match to a commercial property and an approximate match to a residential property, the residential property is returned.
# If there is a residential match and no match to a commercial property, a residential match is returned.
# If there is an approximate match to a commercial property and no match to a residential property the commercial match is returned.
# Addresses that are matched as potential "child matches". e.g. a flat within a property that has a UPRN but where no UPRN exists for the flat, then it must be a residential match i.e. commercial "child" matches are not returned.


However, with the algorithms being determined by plausibility, not mathematics, only a number end up being used, usually around 200 or so across 100,000 addresses. further restrictions on the combination occur due to implausibility of some field swaps. For example, post codes are never swapped with streets. Streets are not moved to numbers (as this would have occurred during the initial address formatting algorithm)
== ASSIGN algorithm ==
''main article'' [[UPRN address matching algorithm|UPRN address matching algorithms]]
 
the algorithms used to match addresses are described in more detail by the [[UPRN address matching algorithm|UPRN address matching algorithms]] article.
 
== Example use of UPRN data ==
 
=== Place of residence information ===
In this example, the UPRN match to an address has been used to determine a place of residence for a patient at a particular point in time.
 
Implementation has consisted of creating a user defined function, in SQL a "table valued function" which returns a wealth of information about the place the person is likely to have resided at at a date of event (e.g. the date of onset of a disease or symptom)
 
In this case the data analyst is seeking the following information, including the quality of the match between the address and the UPRN
 
 
{| class="wikitable"
|Field
|Description
|Values
|-
|Person
|A  set of organisational patients with the same NHS number considered to be one  person
|Identifier
|-
|Patient
|A  uniquely identified person (identified by NHS number) for a particular  organisation
|Identifier
|-
|Alive  and GMS registered
|Whether  GMS registered and alive at the event date
|GMS  registered and alive
 
Not  GMS registered
 
Unclear  whether GMS registered/ alive or not
|-
|Address  candidate count
|The  number of addresses that would appear to indicate the patient’s home address  at the date of event
|Number
 
Normally  1, but in some cases, there may be more than one that appear as possibilities  (see below for optional inclusion)
|-
|Address  start date
|The  address start date according to the GP record.
|Date  or null
|-
|Address  end date
|The  address end date or null according to the GP record
|Date  or null
|-
|UPRN
|The  most valid UPRN for each address recorded in records
|[[UPRN address matching algorithm|https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm]]
<br />
|-
|RALF
|The  most valid RALF for each address derived from the UPRN
|Pseudonymised  number or null
|-
|Address  line 1
|Address  line as recorded in source system
|Text  or null
|-
|Address  line 2
|Address  line as recorded in source system
|Text  or null
|-
|Address  line 3
|Address  line as recorded in source system
|Text  or null
|-
|Address  line 4
|Address  line as recorded in source system
|Text  or null
|-
|-
|&
|Town  or city
|mapped also to
|As  recorded in source system
|indicates a match using more than one candidate field
|Text  or null
|-
|-
|>
|Post  code
|moved to
|As  recorded in source system
|Means that the candidate field was moved to another field to match e.g. number moved to flat
|Text  or null
|-
|-
|<
|LSOA
|moved from
|Lower  super output area linked to the post code linked to the post code in the  matched UPRN
|Means that the candidate field was moved from another field to match on this field
|Text or null
|-
|-
|f
|MSOA
|field merged
|Middle  super output area linked to the post code in the matched UPRN address
|when moved from and to, the fields are then merged to match
|Text  or null
|-
|-
|i
|UPRN  match date
|ABF field ignored
|The  date of match for the UPRN
|ABP field was ignored in order to match i.e. the ABP address contained more precise detail than the candidate but was unnecessary in order to match. This usually means that the candidate field is null
|Date
|-
|-
|d
|UPRN  match qualifier
|Candidate field dropped
|The qualifier for the address match in relation to the UPRN
|The candidate field was dropped in order to match i.. the candidate address has more precise detail than the authority address . The ABP address would probably be null
|equivalent,  parent, sibling
|-
|-
|a
|EPOCH
|Matched as parent
|AddressBase  premium epoch number that generated the match
|The candidate field matched as being at a higher level than the ABP field, for example  flat 6 matching to flat 6a 
|Number
|-
|-
|c
|Property  Classification
|Matched as child
|The classification of the property according to ABP at the time of match
|The candidate field matched as being at a lower level than the ABP field, for example candidate flat 6a, ABP flat 6
|Classification  concept e.g. commercial, residential
|-
|-
|p
|UPRN  x coordinate
|Partial match
|The  X geo coordinate from ABP
|he candidate field was partially matched to the ABP field or vice versa) typically 2 out of 3 words
|Number
|-
|-
|l
|UPTN  y coordinate
|Possible spelling error
|The  Y geo coordinate from ABP
|The candidate field and ABP field were matched using the Levenshtein distance algorithm taking account of mispellings
|Number
|-
|-
|v
|Match  pattern flat
|Level based match
|The nature of the match to the flat inferred from the candidate provider address
|The level of a flat in a building (vertical from the street) was used to create the match e.g. 2b for second floor b
|[[UPRN address matching algorithm#Match%20patterns|https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm#Match_patterns]]
|-
|-
|e
|Match  pattern flat
|Equivalent
|The  nature of the match to the flat or house name inferred from the candidate  provider address
|The fields are equivalent, albeit not necessarily spelt the same, using various equivalence lists, word swaps, word drops etc
|[[UPRN address matching algorithm#Match%20patterns|https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm#Match_patterns]]
 
|-
|Match  pattern number
|The  nature of the match to the house number inferred from the candidate provider  address
|[[UPRN address matching algorithm#Match%20patterns|https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm#Match_patterns]]
|-
|Match  pattern street
|The  nature of the match to the street inferred from the candidate provider  address
|[[UPRN address matching algorithm#Match%20patterns|https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm#Match_patterns]]
|-
|Match  pattern post code
|The nature of the match to the post code inferred from the candidate provider  address
|[[UPRN address matching algorithm#Match%20patterns|https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm#Match_patterns]]
 
|}
|}
==== Use in SQL query ====
In order to use this function in query, being in Microsoft SQL, one could use a cross apply.
<div class="toccolours mw-collapsible mw-collapsed">
Example usage of function in SQL
<div class="mw-collapsible-content">
<syntaxhighlight lang="sql">
SELECT
        covid.patient_id
        ,covid. nhs_number
        ,covid.effectiveDate
        ,por.gms
        ,por.addressCount
        ,por.uprn
        ,por.x
        ,por.y
    FROM
        (
            SELECT
                    patient_id
                    ,clinical_effective_date
                FROM
                    observation] o
                WHERE
                    o.core_concept_ IN ('covid-19')
        ) AS covid
        CROSS apply (
            SELECT
                    *
                FROM
                    PlaceAtEventTime (
                        covid.patient_id
                        ,covid.clinical_effective_date,default
                    ) AS paet
        ) AS por
</syntaxhighlight>
</div>
</div>
==== Table value function ====
This is an example implementation of the property as a user defined table value function
<div class="toccolours mw-collapsible mw-collapsed">
Example definition of function in sql
<div class="mw-collapsible-content"><syntaxhighlight lang="sql">
USE [db_lookup]
GO
/****** Object:  UserDefinedFunction [gpp].[PlaceAtEventTime_V2_1]    Script Date: 14/09/2022 11:45:26 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [gpp].[PlaceAtEventTime_V2_1](@patient_id as bigint, @event_date date, @multiple_addresses int = 0)
RETURNS
@tUPRN TABLE
(
multiple_addresses int,
id bigint,
person_id bigint,
patient_id bigint,
patient_address_id bigint,
gms int,
address_candidate_count int,
address_start_date date,
address_end_date date,
uprn varchar(255),
ralf varchar(255),
address_line_1 varchar(255),
address_line_2 varchar(255),
address_line_3 varchar(255),
address_line_4 varchar(255),
towncity varchar(100),
postcode varchar(50),
lsoa varchar(200),
msoa varchar(200),
uprn_match_date datetime,
uprn_match_qualifier varchar(100),
epoch int,
property_classification varchar(50),
uprn_x float,
uprn_y float,
match_pattern_flat varchar(200),
match_pattern_number varchar(200),
match_pattern_street varchar(200),
match_pattern_post_code varchar(200)
)
AS
BEGIN
DECLARE @gms as varchar(1)
DECLARE @address_count as int
DECLARE @place_of_residence_count as int
DECLARE @person_id as bigint
if (@multiple_addresses <> 0 and @multiple_addresses <> 1) RETURN
select @gms = [gpp].GMS_V2(@patient_id, @event_date)
-- must be currently registered for event date
if @gms <> 1 begin
INSERT INTO @tUPRN(id, person_id, patient_id, patient_address_id, gms, address_candidate_count)
VALUES (null, null, @patient_id, null, @gms, null)
RETURN
end
INSERT INTO @tUPRN (multiple_addresses, id, person_id, patient_id, patient_address_id, gms, address_candidate_count, address_start_date, address_end_date, uprn, ralf,
address_line_1, address_line_2, address_line_3, address_line_4, towncity, postcode, lsoa, msoa, uprn_match_date,
uprn_match_qualifier, epoch, property_classification, uprn_x, uprn_y, match_pattern_flat, match_pattern_number, match_pattern_street, match_pattern_post_code)
SELECT @multiple_addresses, uprn.id, address.person_id, address.patient_id, address.id, @gms, 0, address.start_date, address.end_date, uprn.uprn, uprn.uprn_ralf00, address.address_line_1, address.address_line_2,
address.address_line_3, address.address_line_4, address.city, address.postcode, address.lsoa_2011_code, address.msoa_2011_code, uprn.match_date, uprn.qualifier, uprn.epoch,
uprn.uprn_property_classification, uprn.uprn_xcoordinate, uprn.uprn_ycoordinate, uprn.match_pattern_flat, uprn.match_pattern_number, uprn.match_pattern_street, uprn.match_pattern_postcode
       
FROM [compass_gp].[dbo].[patient_address] address
LEFT JOIN [compass_gp].[dbo].[patient_address_match] uprn
ON uprn.patient_address_id= address.id
WHERE address.patient_id = @patient_id AND
(address.start_date <= @event_date or address.start_date is null)
AND (address.end_date >= @event_date or address.end_date is null)
AND
uprn.id = (SELECT top 1 id
FROM [compass_gp].[dbo].[patient_address_match] uprn
WHERE uprn.patient_address_id = address.id
ORDER BY uprn.match_date DESC)
if (@multiple_addresses = 0)
begin
select  @address_count = count(1) from @tUPRN
if (@address_count > 1)
begin
select @person_id = person_id from @tUPRN
delete from @tUPRN
INSERT INTO @tUPRN(id, person_id, patient_id, patient_address_id, gms, address_candidate_count)
VALUES (null, @person_id, @patient_id, null, @gms, @address_count)
end
end
RETURN
END
</syntaxhighlight>
</div>
</div>
<br />
<br />


=== Address matching algorithms ===
==== gms function ====
''main article'' [[UPRN address matching algorithm|UPRN address matching algorithms]]
 
This determines whether a patient is registered for gms services on a particular date
 
 
<div class="toccolours mw-collapsible mw-collapsed">
Example definition of gms registered function in sql
 
<div class="mw-collapsible-content"><syntaxhighlight lang="sql">
 
USE [db_lookup]
GO
/****** Object:  UserDefinedFunction [gpp].[GMS_V2]    Script Date: 12/09/2022 21:50:17 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
 
ALTER FUNCTION [gpp].[GMS_V2](@nor bigint, @event_date date)
RETURNS VARCHAR
 
BEGIN
-- declare @event_date as date
-- declare @nor as bigint
-- set @event_date = '2013-12-01'
-- set @nor = 7
 
declare @var_date_registered as date
declare @var_date_registered_end as date
declare @b as varchar(1)
 
-- declare @var_date date, @var_dat2 as date = (
Select @var_date_registered=latestreg.date_registered, @var_date_registered_end=latestreg.date_registered_end
From(
Select top 1  patient_id, date_registered, date_registered_end, registration_type_concept_id
From [compass_gp].[dbo].[episode_of_care] e
join [compass_gp].[dbo].[patient]p on p.id=e.patient_id
Where (date_registered <= @event_date and e.patient_id = @nor and (p.date_of_death > @event_date or p.date_of_death is null))
Order by e.date_registered desc) as latestreg
Where (latestreg.registration_type_concept_id=1335267) and (latestreg.date_registered_end is null or (latestreg.date_registered_end > @event_date))
 
-- print @var_date_registered
-- print @var_date_registered_end
 
-- 1=Registered and alive, 2=not registered, 3=Unclear
set @b = case
when ((@var_date_registered_end < @var_date_registered)) then 3
when @var_date_registered is not null then 1
else 2
end
 
return @b
END
 


Address matching is surprisingly difficult, and the algorithms used to match addresses are described in more detail by the [[UPRN address matching algorithm|UPRN address matching algorithms]] article.
</syntaxhighlight>
</div>
</div>

Latest revision as of 15:43, 1 August 2023

Source code

The source code, available under the Apache 2.0 open source license is hosted on github at

https://github.com/endeavourhealth-discovery/ASSIGN

Background and Purpose of ASSIGN

In partnership with researchers at Queen Mary University of London’s Clinical Effectiveness Group, Endeavour Health has developed an address-matching algorithm to link patient health records to geospatial information. Linking people to places can help researchers understand how health is impacted by social and environmental factors, like the characteristics of a household, green space or air pollution. But patient addresses are entered into GP records as free text so the same address can be written in different ways, making data linkage very difficult.

The algorithm, known as ASSIGN (AddreSS MatchInG to Unique Property Reference Numbers), allocates a Unique Property Reference Number (UPRN) to patient records. Every property in the UK already has a UPRN. They are allocated by local authorities and made nationally available by Ordnance Survey. A UPRN gives every address a standardised format, enabling pseudonymised linkage to other sources of data.

ASSIGN compares addresses in the NHS record with the Ordnance Survey's "Address Base Premium" UPRN database, one element at a time, and decides whether there is a match. The algorithm mirrors human pattern recognition, so it allows for certain character swaps, spelling mistakes and abbreviations. After rigorous testing and adjustments, ASSIGN correctly matches 98.6% of patient addresses at 38,000 records per minute. It also includes patients’ past addresses, making it possible to study addresses across the life span.

Researchers at Queen Mary are using the pseudonymised UPRNs to study:

·        how the health of household members impacts childhood obesity

·        whether overcrowded or multi-generational households are at greater risk from Covid-19

·        how to support GPs to identify people living in care homes so they can provide more effective care.

As ASSIGN is open source, it is hoped the algorithm will also be used by other researchers to link data, inform policy and improve population health across England.

A paper is published at https://ijpds.org/article/view/1674 which describes the use of the algorithm in a project to allocate UPRNs to GP patient addresses.

Component services

ASSIGN supports two main services

  1. A Web application that people can use to upload a list of addresses and obtain a list of matched addresses and UPRNs (this article)
  2. A REST API that systems can use to request a matched address and a UPRN for an address.

UPRN data source

Unique Property Reference Numbers (UPRNs) are property identifiers for every property in Great Britain. Ordnance Survey provides access to these in a number of products, but their AddressBase Premium product is the most comprehensive database. It is derived from local government's NLPG (National Land and Property Gazetteer) as created and maintained by GeoPlace, Ordnance Survey’s OS MasterMap Address Layer 2 and the Royal Mail’s PAF (Postcode Address File). It adheres to British Standard for addressing BS7666, and every property has its Unique Property Reference Number (UPRN) and geographical co-ordinates. It is updated every 6 weeks.

Furthermore, there is an assured link between all addresses associated with a property over its life cycle, and from local authority and Royal Mail sources.

That being the case, if there was a service or application that matched the address from someone's health record to at least ONE of the addresses supplied by Address Base Premium then the UPRN can be derived and linked to the person.

Application Functionality

To access the application the user must either be a user of Discovery, or must create their own account in line with level 1 authentication process. An authenticated user has authority to access the UPRN match user interface. In addition to address matching, if the user wishes to have additional data, the user must have rights to access the OS ABP data, usually via the open Government license.

Inputting an address

There are two main functions available to the user:

  1. Enter an address of some kind and attempt to get a match
  2. Upload a list of addresses (between 1 and 100,000) and attempt to get a list of matches

An additional flag entered by the user indicates whether to match only on residential properties or include commercial matches.

A match is presented in one of three forms and the user can select either/ or:

  • Plain English address : Simply displaying the address fields and their value,together with information
  • JSON structure. This is a machine readable structure which includes the same information and can be processed by a computer more easily than plain English
  • CSV file. Suitable for importing into Excel (with a note on converting UPRNs to text) or to a database.

The matching details for an address are described in more detail below

Address match response

The following table explains the information returned following a match attempt.

The term 'Candidate address' is the address entered by the user or submitted in the file. The 'authority address' is an address provided by the Ordnance survery address based premium and may be a post office address or a local authority address.

No match responses

No match message Value Explanation
Address format Null address lines No address lines submitted
Insufficient characters The address format does not appear to contain enough characters to attempt to match (<9)
Post code quality invalid post code The post code is an invalid format
missing post code Post code is missing. The address may be matched without a post code but normally a post code is necessary even if incorrect in order to simply narrow down the potential matches
Out of area The post code is valid but unrecognised and appears to be out of area It should be noted that there will still be an attempt to match the address without a post code or a partial match on a post code.
Matched false No match

Match responses

Whenever there is a positive match, the response includes the qualifier of the match and the pattern of match,


The following information is provided when there is a match.

Field Value Description
Address format good Sufficient characters and if there is a post code it is valid
See unmatched The address quality may still be poor (post code etc) even though there is a match
Matched True There was a match. Note that this may not be an exact match
UPRN The unique property reference number provided by the Ordnance survey via the Address Based Premium service
Match qualifier See values in match algorithm article The qualifier of the match i.e. whether exact or close etc
Classification The classification code of the property
ClassTerm The term of the code of the property
Algorithm the code of the algorithm that ended up with the best match. For example 1-match1 indicates an exact match
ABPAddress Address components as below: The Address Base Premium address that match to the candidate address
Flat flat number
Building name of building
Building number number of building in street
Dependent thoroughfare A sub-street or something that is dependent on the street to get to the address
Street The street the building is on
Dependent locality An area smaller than a localitu
Locality A locality or area within a town or village
Town town
Post code post code
Organisation name of organisation if recorded by ABP
Match pattern see values in article on match algorithm For the purposes of match pattern reporting the 9 main address fields are rationalised to 5 and the match pattern indicates the approximation of the match and any field manipulation that took polace

Commercial property vs Residential property matches

Properties are classified by assigning classification codes. Each code has a term and indicates the type of property that the database thinks it is. Some of these properties are obviously residential (e.g. detached house) and some are obviously commercial (e.g. a slaughter house). Some may be both residential or commercial (e.g. a public house)

Assign flags the property types it thinks are usually residential and this information is used in the algorithm in the following way.

In summary the match algorithm prefers a residential match to a commercial match via the following rules

  1. If there is an exact or equivalent match to a commercial property, and no match to a residential property, the commercial match is returned.
  2. If there is an approximate match to a commercial property and an approximate match to a residential property, the residential property is returned.
  3. If there is a residential match and no match to a commercial property, a residential match is returned.
  4. If there is an approximate match to a commercial property and no match to a residential property the commercial match is returned.
  5. Addresses that are matched as potential "child matches". e.g. a flat within a property that has a UPRN but where no UPRN exists for the flat, then it must be a residential match i.e. commercial "child" matches are not returned.

ASSIGN algorithm

main article UPRN address matching algorithms

the algorithms used to match addresses are described in more detail by the UPRN address matching algorithms article.

Example use of UPRN data

Place of residence information

In this example, the UPRN match to an address has been used to determine a place of residence for a patient at a particular point in time.

Implementation has consisted of creating a user defined function, in SQL a "table valued function" which returns a wealth of information about the place the person is likely to have resided at at a date of event (e.g. the date of onset of a disease or symptom)

In this case the data analyst is seeking the following information, including the quality of the match between the address and the UPRN


Field Description Values
Person A set of organisational patients with the same NHS number considered to be one person Identifier
Patient A uniquely identified person (identified by NHS number) for a particular organisation Identifier
Alive and GMS registered Whether GMS registered and alive at the event date GMS registered and alive

Not GMS registered

Unclear whether GMS registered/ alive or not

Address candidate count The number of addresses that would appear to indicate the patient’s home address at the date of event Number

Normally 1, but in some cases, there may be more than one that appear as possibilities (see below for optional inclusion)

Address start date The address start date according to the GP record. Date or null
Address end date The address end date or null according to the GP record Date or null
UPRN The most valid UPRN for each address recorded in records https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm


RALF The most valid RALF for each address derived from the UPRN Pseudonymised number or null
Address line 1 Address line as recorded in source system Text or null
Address line 2 Address line as recorded in source system Text or null
Address line 3 Address line as recorded in source system Text or null
Address line 4 Address line as recorded in source system Text or null
Town or city As recorded in source system Text or null
Post code As recorded in source system Text or null
LSOA Lower super output area linked to the post code linked to the post code in the matched UPRN Text or null
MSOA Middle super output area linked to the post code in the matched UPRN address Text or null
UPRN match date The date of match for the UPRN Date
UPRN match qualifier The qualifier for the address match in relation to the UPRN equivalent, parent, sibling
EPOCH AddressBase premium epoch number that generated the match Number
Property Classification The classification of the property according to ABP at the time of match Classification concept e.g. commercial, residential
UPRN x coordinate The X geo coordinate from ABP Number
UPTN y coordinate The Y geo coordinate from ABP Number
Match pattern flat The nature of the match to the flat inferred from the candidate provider address https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm#Match_patterns
Match pattern flat The nature of the match to the flat or house name inferred from the candidate provider address https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm#Match_patterns
Match pattern number The nature of the match to the house number inferred from the candidate provider address https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm#Match_patterns
Match pattern street The nature of the match to the street inferred from the candidate provider address https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm#Match_patterns
Match pattern post code The nature of the match to the post code inferred from the candidate provider address https://wiki.endeavourhealth.org/index.php?title=UPRN_address_matching_algorithm#Match_patterns

Use in SQL query

In order to use this function in query, being in Microsoft SQL, one could use a cross apply.

Example usage of function in SQL

SELECT
        covid.patient_id
        ,covid. nhs_number
        ,covid.effectiveDate
        ,por.gms
        ,por.addressCount
        ,por.uprn
        ,por.x
        ,por.y
    FROM
        (
            SELECT
                    patient_id
                    ,clinical_effective_date
                FROM
                    observation] o
                WHERE
                    o.core_concept_ IN ('covid-19')
        ) AS covid 
        CROSS apply (
            SELECT
                    *
                FROM
                    PlaceAtEventTime (
                        covid.patient_id
                        ,covid.clinical_effective_date,default
                    ) AS paet
        ) AS por

Table value function

This is an example implementation of the property as a user defined table value function

Example definition of function in sql

USE [db_lookup]
GO
/****** Object:  UserDefinedFunction [gpp].[PlaceAtEventTime_V2_1]    Script Date: 14/09/2022 11:45:26 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER FUNCTION [gpp].[PlaceAtEventTime_V2_1](@patient_id as bigint, @event_date date, @multiple_addresses int = 0)
RETURNS
@tUPRN TABLE
(
	multiple_addresses int,
	id bigint,
	person_id bigint,
	patient_id bigint,
	patient_address_id bigint,
	gms int,
	address_candidate_count int,
	address_start_date date,
	address_end_date date,
	uprn varchar(255),
	ralf varchar(255),
	address_line_1 varchar(255),
	address_line_2 varchar(255),
	address_line_3 varchar(255),
	address_line_4 varchar(255),
	towncity varchar(100),
	postcode varchar(50),
	lsoa varchar(200),
	msoa varchar(200),
	uprn_match_date datetime,
	uprn_match_qualifier varchar(100),
	epoch int,
	property_classification varchar(50),
	uprn_x float,
	uprn_y float,
	match_pattern_flat varchar(200),
	match_pattern_number varchar(200),
	match_pattern_street varchar(200),
	match_pattern_post_code varchar(200)
)
AS
BEGIN
	DECLARE @gms as varchar(1)
	DECLARE @address_count as int
	DECLARE @place_of_residence_count as int
	DECLARE @person_id as bigint

	if (@multiple_addresses <> 0 and @multiple_addresses <> 1) RETURN

	select @gms = [gpp].GMS_V2(@patient_id, @event_date)
	-- must be currently registered for event date
	if @gms <> 1 begin
		INSERT INTO @tUPRN(id, person_id, patient_id, patient_address_id, gms, address_candidate_count)
			VALUES (null, null, @patient_id, null, @gms, null)
		RETURN
	end

	INSERT INTO @tUPRN (multiple_addresses, id, person_id, patient_id, patient_address_id, gms, address_candidate_count, address_start_date, address_end_date, uprn, ralf,
		address_line_1, address_line_2, address_line_3, address_line_4, towncity, postcode, lsoa, msoa, uprn_match_date,
		uprn_match_qualifier, epoch, property_classification, uprn_x, uprn_y, match_pattern_flat, match_pattern_number, match_pattern_street, match_pattern_post_code)

	SELECT @multiple_addresses, uprn.id, address.person_id, address.patient_id, address.id, @gms, 0, address.start_date, address.end_date, uprn.uprn, uprn.uprn_ralf00, address.address_line_1, address.address_line_2,
		address.address_line_3, address.address_line_4, address.city, address.postcode, address.lsoa_2011_code, address.msoa_2011_code, uprn.match_date, uprn.qualifier, uprn.epoch,
		uprn.uprn_property_classification, uprn.uprn_xcoordinate, uprn.uprn_ycoordinate, uprn.match_pattern_flat, uprn.match_pattern_number, uprn.match_pattern_street, uprn.match_pattern_postcode
        
	FROM [compass_gp].[dbo].[patient_address] address

	LEFT JOIN [compass_gp].[dbo].[patient_address_match] uprn
	ON uprn.patient_address_id= address.id

	WHERE address.patient_id = @patient_id AND 
	(address.start_date <= @event_date or address.start_date is null)
	AND (address.end_date >= @event_date or address.end_date is null)
	AND
	uprn.id = (SELECT top 1 id 
		FROM [compass_gp].[dbo].[patient_address_match] uprn
	WHERE uprn.patient_address_id = address.id
	ORDER BY uprn.match_date DESC)

	if (@multiple_addresses = 0)
	begin
		select  @address_count = count(1) from @tUPRN
		if (@address_count > 1)
		begin
			select @person_id = person_id from @tUPRN
			delete from @tUPRN
			INSERT INTO @tUPRN(id, person_id, patient_id, patient_address_id, gms, address_candidate_count)
				VALUES (null, @person_id, @patient_id, null, @gms, @address_count)
		end
	end

	RETURN
END


gms function

This determines whether a patient is registered for gms services on a particular date


Example definition of gms registered function in sql

USE [db_lookup]
GO
/****** Object:  UserDefinedFunction [gpp].[GMS_V2]    Script Date: 12/09/2022 21:50:17 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER FUNCTION [gpp].[GMS_V2](@nor bigint, @event_date date)
RETURNS VARCHAR

BEGIN
	-- declare @event_date as date
	-- declare @nor as bigint
	
	-- set @event_date = '2013-12-01'
	-- set @nor = 7

	declare @var_date_registered as date
	declare @var_date_registered_end as date
	declare @b as varchar(1)

	-- declare @var_date date, @var_dat2 as date = (
	Select @var_date_registered=latestreg.date_registered, @var_date_registered_end=latestreg.date_registered_end
	From(
	Select top 1  patient_id, date_registered, date_registered_end, registration_type_concept_id
	From [compass_gp].[dbo].[episode_of_care] e
	join [compass_gp].[dbo].[patient]p on p.id=e.patient_id
	Where (date_registered <= @event_date and e.patient_id = @nor and (p.date_of_death > @event_date or p.date_of_death is null))
	Order by e.date_registered desc) as latestreg
	Where (latestreg.registration_type_concept_id=1335267) and (latestreg.date_registered_end is null or (latestreg.date_registered_end > @event_date))

	-- print @var_date_registered
	-- print @var_date_registered_end

	-- 1=Registered and alive, 2=not registered, 3=Unclear
	set @b = case
		when ((@var_date_registered_end < @var_date_registered)) then 3
		when @var_date_registered is not null then 1
		else 2
	end

	return @b
END