ASSIGN- UPRN address match application


Latest revision as of 09:44, 10 November 2021

Background and Purpose of ASSIGN

In partnership with researchers at Queen Mary University of London’s Clinical Effectiveness Group, Endeavour Health has developed an address-matching algorithm to link patient health records to geospatial information. Linking people to places can help researchers understand how health is impacted by social and environmental factors, like the characteristics of a household, green space or air pollution. But patient addresses are entered into GP records as free text so the same address can be written in different ways, making data linkage very difficult.

The algorithm, known as ASSIGN (AddreSS MatchInG to Unique Property Reference Numbers), allocates a Unique Property Reference Number (UPRN) to patient records. Every property in the UK already has a UPRN. They are allocated by local authorities and made nationally available by Ordnance Survey. A UPRN gives every address a standardised format, enabling pseudonymised linkage to other sources of data.

ASSIGN compares addresses in the NHS record with the Ordnance Survey's AddressBase Premium UPRN database, one element at a time, and decides whether there is a match. The algorithm mirrors human pattern recognition, so it allows for certain character swaps, spelling mistakes and abbreviations. After rigorous testing and adjustments, ASSIGN correctly matches 98.6% of patient addresses at 38,000 records per minute. It also includes patients’ past addresses, making it possible to study addresses across the life span.

Researchers at Queen Mary are using the pseudonymised UPRNs to study:

  • how the health of household members impacts childhood obesity
  • whether overcrowded or multi-generational households are at greater risk from Covid-19
  • how to support GPs to identify people living in care homes so they can provide more effective care.

As ASSIGN is open source, it is hoped the algorithm will also be used by other researchers to link data, inform policy and improve population health across England.

Component services

ASSIGN supports two main services:

  1. A Web application that people can use to upload a list of addresses and obtain a list of matched addresses and UPRNs (this article)
  2. A REST API that systems can use to request a matched address and a UPRN for an address.

UPRN data source

Unique Property Reference Numbers (UPRNs) are property identifiers for every property in Great Britain. Ordnance Survey provides access to these in a number of products, but their AddressBase Premium product is the most comprehensive database. It is derived from local government's NLPG (National Land and Property Gazetteer) as created and maintained by GeoPlace, Ordnance Survey’s OS MasterMap Address Layer 2 and the Royal Mail’s PAF (Postcode Address File). It adheres to British Standard for addressing BS7666, and every property has its Unique Property Reference Number (UPRN) and geographical co-ordinates. It is updated every 6 weeks.

Furthermore, there is an assured link between all addresses associated with a property over its life cycle, and from local authority and Royal Mail sources.

That being the case, if a service or application matches the address from someone's health record to at least one of the addresses supplied by AddressBase Premium, then the UPRN can be derived and linked to the person.

Application Functionality

To access the application, the user must either be a user of Discovery or create their own account in line with the level 1 authentication process. An authenticated user has authority to access the UPRN match user interface. A user who wants additional data beyond address matching must also have rights to access the OS AddressBase Premium (ABP) data, usually via the Open Government Licence.

Inputting an address

There are two main functions available to the user:

  1. Enter an address of some kind and attempt to get a match
  2. Upload a list of addresses (between 1 and 100,000) and attempt to get a list of matches

An additional flag entered by the user indicates whether to match only on residential properties or include commercial matches.
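
Before uploading, a list can be checked against the batch limits stated above. This is a minimal sketch under stated assumptions: the one-address-per-line layout, the function name `load_candidates`, and the `MatchJob` container with its `residential_only` field are hypothetical and do not describe the application's actual upload format.

```python
from dataclasses import dataclass

MAX_BATCH = 100_000  # upper limit stated above; the lower limit is 1


@dataclass
class MatchJob:
    """Hypothetical container for a batch and the residential-only flag."""
    addresses: list
    residential_only: bool = True


def load_candidates(text):
    """Split text into one candidate address per non-blank line and check the batch size."""
    addresses = [line.strip() for line in text.splitlines() if line.strip()]
    if not 1 <= len(addresses) <= MAX_BATCH:
        raise ValueError(f"expected 1 to {MAX_BATCH} addresses, got {len(addresses)}")
    return addresses


job = MatchJob(load_candidates("10 High Street, London, E1 6AN\n"), residential_only=False)
```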

A match is presented in one of three forms, and the user selects which one:

  • Plain English address. This simply displays the address fields and their values, together with the match information
  • JSON structure. This is a machine readable structure which includes the same information and can be processed by a computer more easily than plain English
  • CSV file. Suitable for importing into Excel (with a note on converting UPRNs to text) or to a database.
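
The note on converting UPRNs to text matters because a long numeric identifier treated as a number may be displayed in scientific notation. The sketch below reads a CSV with Python's standard `csv` module, which keeps every field as a string. The column names and the sample UPRN are made up for illustration and are not the application's actual output layout.

```python
import csv
import io

# Made-up sample output; column names are assumptions.
sample_csv = 'candidate_address,uprn\n"10 High Street, London",123456789012\n'


def read_matches(text):
    """Parse CSV text into dictionaries; csv.DictReader leaves all values as strings."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


rows = read_matches(sample_csv)
uprn_as_text = rows[0]["uprn"]

# What a numeric interpretation can look like when shown with limited precision.
uprn_as_number_display = f"{float(uprn_as_text):.5e}"
```

Treating the UPRN as an identifier rather than a quantity keeps every digit intact when the file is passed between tools.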

The matching details for an address are described in more detail below.

Address match response

The following table explains the information returned following a match attempt.

The 'candidate address' is the address entered by the user or submitted in the file. The 'authority address' is an address provided by Ordnance Survey AddressBase Premium and may be a post office address or a local authority address.

No match responses

| No match message | Value | Explanation |
|---|---|---|
| Address format | Null address lines | No address lines submitted |
| | Insufficient characters | The address does not appear to contain enough characters to attempt a match (<9) |
| Post code quality | invalid post code | The post code is in an invalid format |
| | missing post code | The post code is missing. The address may be matched without a post code, but normally a post code is necessary, even if incorrect, simply to narrow down the potential matches |
| | Out of area | The post code is valid but unrecognised and appears to be out of area. An attempt is still made to match the address without a post code or on a partial post code match |
| Matched | false | No match |
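
The input checks in the table above can be approximated before an address is submitted. The following is a minimal sketch only: the function name `precheck`, the simplified post code pattern, and the choice to count characters across the joined address lines are assumptions, not the checks ASSIGN itself performs. The "Out of area" case is omitted because it needs a lookup against the reference data.

```python
import re

# Threshold quoted in the table: fewer than 9 characters is insufficient.
MIN_ADDRESS_CHARS = 9

# Simplified UK post code shape (an assumption, not ASSIGN's own validation).
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")


def precheck(address_lines, post_code):
    """Return (message, value) pairs mirroring the no match table; empty if none apply."""
    issues = []
    text = " ".join(line.strip() for line in address_lines if line and line.strip())
    if not text:
        issues.append(("Address format", "Null address lines"))
    elif len(text) < MIN_ADDRESS_CHARS:
        issues.append(("Address format", "Insufficient characters"))

    pc = (post_code or "").strip().upper()
    if not pc:
        issues.append(("Post code quality", "missing post code"))
    elif not POSTCODE_RE.match(pc):
        issues.append(("Post code quality", "invalid post code"))
    return issues
```

An empty list means none of these checks failed; it does not mean the address will match.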

Match responses

Whenever there is a positive match, the response includes the qualifier of the match and the match pattern. The following information is provided when there is a match.

| Field | Value | Description |
|---|---|---|
| Address format | good | Sufficient characters and, if there is a post code, it is valid |
| | See unmatched | The address quality may still be poor (post code etc.) even though there is a match |
| Matched | True | There was a match. Note that this may not be an exact match |
| UPRN | | The Unique Property Reference Number provided by Ordnance Survey via AddressBase Premium |
| Match qualifier | See values in match algorithm article | The qualifier of the match, i.e. whether exact, close, etc. |
| Classification | | The classification code of the property |
| ClassTerm | | The term for the classification code of the property |
| Algorithm | | The code of the algorithm that produced the best match. For example, 1-match1 indicates an exact match |
| ABPAddress | Address components as below: | The AddressBase Premium address that matches the candidate address |
| | Flat | Flat number |
| | Building | Name of building |
| | Building number | Number of building in street |
| | Dependent thoroughfare | A sub-street or something that is dependent on the street to get to the address |
| | Street | The street the building is on |
| | Dependent locality | An area smaller than a locality |
| | Locality | A locality or area within a town or village |
| | Town | Town |
| | Post code | Post code |
| | Organisation | Name of organisation if recorded by ABP |
| Match pattern | See values in article on match algorithm | For the purposes of match pattern reporting, the 9 main address fields are rationalised to 5, and the match pattern indicates the approximation of the match and any field manipulation that took place |
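
As an illustration of how the JSON form of a match response might be consumed, the sketch below joins the ABPAddress components into a single line. The key names are taken from the field labels in the table above and are assumptions; the actual JSON keys, and all sample values except the algorithm code 1-match1, are hypothetical.

```python
import json

# Component order follows the table above; key names are assumed, not confirmed.
COMPONENTS = [
    "Organisation", "Flat", "Building", "Building number",
    "Dependent thoroughfare", "Street", "Dependent locality",
    "Locality", "Town", "Post code",
]


def format_abp_address(response):
    """Return the matched authority address as one line, or None if unmatched."""
    if not response.get("Matched"):
        return None
    parts = response.get("ABPAddress", {})
    return ", ".join(parts[key] for key in COMPONENTS if parts.get(key))


# Made-up sample response for illustration only.
sample = json.loads("""
{
  "Matched": true,
  "UPRN": "123456789012",
  "Algorithm": "1-match1",
  "ABPAddress": {
    "Flat": "Flat 2",
    "Building number": "10",
    "Street": "High Street",
    "Town": "London",
    "Post code": "E1 6AN"
  }
}
""")

one_line = format_abp_address(sample)
```

Keeping the UPRN as a JSON string rather than a number avoids any reformatting by downstream tools.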


ASSIGN algorithm

Main article: UPRN address matching algorithms

The algorithms used to match addresses are described in more detail in the UPRN address matching algorithms article.
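
To give a feel for the kind of element-by-element comparison that tolerates abbreviations and spelling mistakes, here is a generic sketch using Python's standard `difflib`. It is not the ASSIGN algorithm: the abbreviation table, the 0.85 similarity threshold, and the function names are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Small illustrative abbreviation table (an assumption, not ASSIGN's rules).
# Note that a naive rule such as "st" -> "street" would mishandle "St Mary's".
ABBREVIATIONS = {"rd": "road", "st": "street", "ave": "avenue", "flt": "flat"}


def normalise(element):
    """Lower-case, strip simple punctuation, and expand known abbreviations."""
    words = element.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def elements_match(candidate, authority, threshold=0.85):
    """Compare one candidate address element with one authority address element."""
    a, b = normalise(candidate), normalise(authority)
    return a == b or SequenceMatcher(None, a, b).ratio() >= threshold
```

A real matcher would apply different rules to different elements, for example requiring building numbers to agree exactly while allowing looser comparison of street names.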