ASSIGN- UPRN address match application: Difference between revisions

From Endeavour Knowledge Base
Revision as of 10:06, 11 August 2020

The UPRN address matching application enables a user to match an address to an "official" address, and in the process also provides the Unique Property Reference Number (UPRN) which has been assigned to the official address.

Background and purpose

It is well established that a person's health is significantly affected by the environment in which they live and work.

People live in flats, houses, or are homeless. People work at home, outside, or in offices or factories.

In order to identify or rectify health issues relating to an environment it is often useful to evaluate the effect of location, type of property, and occupancy on the people who live and work there.

To perform that evaluation it is necessary to know the characteristics of the people who live or work in the property or at a location. The usual way of recording where someone lives is to enter their address. Within the NHS it is common practice to record addresses as provided by the citizen. As a result, the address is often recorded in a way that is slightly different from another address entry for the same location in another person's record.

This leaves the first problem of working out who lives in the same household. They may have slightly different address entries, and whilst a human can usually see they are the same, a computer cannot. Even when the different addresses are matched, this still leaves a second problem of finding out exactly where the address is.

If the various hand-entered addresses could be matched to a single property, it would be possible to determine precisely the characteristics of the people living or working at a particular location and property. Doing this matching by hand is labour intensive and problematic.

There are problems with using addresses as a way of linking:

  1. Addresses are recorded by provider organisations in inconsistent ways.
  2. Addresses themselves change over time.
  3. Many properties have more than one address that matches the property. For example the local authority may have one address, whilst the post office may have another.

Luckily, the Ordnance Survey have helped to solve this problem.

Unique Property Reference Numbers (UPRNs) are property identifiers for every property in Great Britain. Ordnance Survey provides access to these in a number of products, but its AddressBase Premium product is the most comprehensive database. It is derived from local government's NLPG (National Land and Property Gazetteer) as created and maintained by GeoPlace, Ordnance Survey’s OS MasterMap Address Layer 2 and the Royal Mail’s PAF (Postcode Address File). It adheres to the British Standard for addressing, BS7666, and every property has its Unique Property Reference Number (UPRN) and geographical co-ordinates. It is updated every 6 weeks.

Furthermore, there is an assured link between all addresses associated with a property over its life cycle, and from local authority and Royal Mail sources.

That being the case, if a service or application can match the address from someone's health record to at least one of the addresses supplied by AddressBase Premium, then the UPRN can be derived and linked to the person.

The Discovery information service supports the mapping of health-related addresses to addresses provided by an authoritative organisation, those addresses being a gold standard for pointing to a UPRN. The matching service is provided in two forms:

  1. A Web application that people can use to upload a list of addresses and obtain a list of matched addresses and UPRNs (this article)
  2. A REST API that systems can use to request a matched address and a UPRN for that address

The remainder of this article discusses the web application.

Functionality

To access the application the user must either be a user of Discovery, or must create their own account in line with the level 1 authentication process. An authenticated user has authority to access the UPRN match user interface. If the user wishes to receive additional data beyond the address match, the user must also have rights to access the OS ABP (AddressBase Premium) data, usually via the Open Government Licence.

Inputting an address

There are two main functions available to the user:

  1. Enter an address of some kind and attempt to get a match
  2. Upload a list of addresses (between 1 and 100,000) and attempt to get a list of matches

An additional flag entered by the user indicates whether to match only on residential properties or include commercial matches.

A match is presented in one of three forms, and the user can select which one:

  • Plain English address. This simply displays the address fields and their values, together with the match information.
  • JSON structure. This is a machine-readable structure which includes the same information and can be processed by a computer more easily than plain English.
  • CSV file. Suitable for importing into Excel (with a note on converting UPRNs to text) or to a database.
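
As a rough illustration of the three output forms, the sketch below renders one match record as plain text, JSON and CSV. The field names and the UPRN value are made up for the example and are not the application's actual output schema. The UPRN is held as a string throughout, on the assumption that this is the safest way to stop downstream tools from treating it as a number.

```python
import csv
import io
import json

# Hypothetical match record: field names and values are illustrative only.
match = {
    "Matched": True,
    "UPRN": "123456789012",  # made-up value, kept as text rather than a number
    "Qualifier": "Best (residential) match",
    "Postcode": "AB1 2CD",
}

def as_plain_text(record):
    """One 'field: value' line per field."""
    return "\n".join(f"{field}: {value}" for field, value in record.items())

def as_json(record):
    """Machine-readable form carrying the same information."""
    return json.dumps(record, indent=2)

def as_csv(record):
    """Header row plus one data row, suitable for a spreadsheet or database load."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(record))
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()

print(as_plain_text(match))
print(as_json(match))
print(as_csv(match))
```

When the CSV form is opened in a spreadsheet, the UPRN column may still need to be imported as text, which is what the note on converting UPRNs refers to.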

The matching details for an address are described in more detail below.

Address match response

The following table explains the information returned following a match attempt.

The term 'candidate address' means the address entered by the user or submitted in the file. The 'authority address' is an address provided by Ordnance Survey AddressBase Premium and may be a post office address or a local authority address.

No match responses

No match message | Value | Explanation
Address format | Null address lines | No address lines submitted
Address format | Insufficient characters | The address does not appear to contain enough characters to attempt a match (<9)
Post code quality | invalid post code | The post code is in an invalid format
Post code quality | missing post code | Post code is missing. The address may be matched without a post code, but normally a post code is necessary, even if incorrect, in order to narrow down the potential matches
Post code quality | Out of area | The post code is valid but unrecognised and appears to be out of area. It should be noted that there will still be an attempt to match the address without a post code or with a partial match on a post code
Matched | false | No match
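
The address format and post code checks in the table above could be sketched as follows. This is an illustration under stated assumptions, not the application's implementation: the post code pattern is a simplified shape check, the character count is assumed to include spaces, and the function names are hypothetical. Only the threshold of 9 characters and the message values come from the table. The 'Out of area' check is omitted because it needs a post code lookup.

```python
import re

# Simplified UK post code shape (an assumption, not the application's real rule).
POSTCODE_PATTERN = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")
MIN_ADDRESS_CHARS = 9  # the table lists "<9" as insufficient

def address_format_problem(address_lines):
    """Return the 'Address format' no-match value, or None if no problem is found."""
    text = " ".join(line.strip() for line in address_lines if line and line.strip())
    if not text:
        return "Null address lines"
    if len(text) < MIN_ADDRESS_CHARS:
        return "Insufficient characters"
    return None

def post_code_problem(post_code):
    """Return the 'Post code quality' no-match value, or None if no problem is found."""
    if post_code is None or not post_code.strip():
        return "missing post code"
    if not POSTCODE_PATTERN.match(post_code.strip().upper()):
        return "invalid post code"
    return None

print(address_format_problem(["Flat 1"]))
print(post_code_problem("ABC"))
```

A missing or invalid post code is reported here but, as the table notes, a match attempt can still go ahead without it.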

Match responses

The following information is provided when there is a match.

Field | Value | Description
Address format | good | Sufficient characters and, if there is a post code, it is valid
Address format | See unmatched | The address quality may still be poor (post code etc.) even though there is a match
Matched | True | There was a match. Note that this may not be an exact match
UPRN | | The unique property reference number provided by Ordnance Survey via the AddressBase Premium service
Match qualifier | See values below | The qualifier of the match, i.e. whether exact, close, etc.
Match qualifier | Equivalent | Deprecated. Now Best match. This means that the algorithm thinks the entry it has found is the correct location. This does not mean it is an exact match, only that it thinks that the user's 'candidate address' is the same location as the one that is listed below
Match qualifier | Best match | This means that the algorithm thinks the entry it has found is the best match and the correct location. This does not mean it is an exact match, only that it thinks that the user's 'candidate address' is the same location as the one that is listed below, and that it is a better match than the others. It may not actually be the best match. The algorithms article explains the "best fit" approach, which differentiates what the machine thinks is the best match from what a human might think
Match qualifier | Best (residential) match | Indicates that the user has attempted to match only on residential properties or those that may be residential or dual use
Match qualifier | Best (+commercial) match | Indicates that the user has included commercial properties in the match algorithm
Match qualifier | Child | The candidate address is likely to be a child of the authority address listed below. For example, it might be a flat within the building
Match qualifier | Parent | The candidate address is not specific enough, e.g. it is a building whereas the authority address is more detailed
Match qualifier | Sibling | The candidate address is likely to be close to the address listed below, e.g. next door, i.e. it is nearby but not necessarily the exact property. This is useful when using UPRNs for geolocation, but it cannot be guaranteed to group households correctly, so care needs to be taken when using it in household grouping
Classification | | The classification code of the property
ClassTerm | | The term for the classification code of the property
Algorithm | | The code of the algorithm that ended up with the best match. For example, 1-match1 indicates an exact match
ABPAddress | Address components as below | The AddressBase Premium address that matches the candidate address
ABPAddress | Flat | Flat number
ABPAddress | Building | Name of building
ABPAddress | Building number | Number of building in street
ABPAddress | Dependent thoroughfare | A sub-street or something that is dependent on the street to get to the address
ABPAddress | Street | The street the building is on
ABPAddress | Dependent locality | An area smaller than a locality
ABPAddress | Locality | A locality or area within a town or village
ABPAddress | Town | Town
ABPAddress | Post code | Post code
ABPAddress | Organisation | Name of organisation if recorded by ABP
Match pattern | See values below | For each of the 5 main fields (flat, building, number, street, postcode), indicates the degree to which the field is matched and the degree of manipulation or field swapping. A match pattern is built up from one or more of the phrases below, i.e. there may be more than one manipulation per field
Match pattern | mapped also to | Indicates a match using more than one candidate field
Match pattern | moved to | The candidate field was moved to another field to match, e.g. number moved to flat
Match pattern | moved from | The candidate field was moved from another field to match on this field
Match pattern | field merged | When fields are moved from and to, they are then merged to match
Match pattern | ABP field ignored | The ABP field was ignored in order to match, i.e. the ABP address contained more precise detail than the candidate but it was unnecessary in order to match. This usually means that the candidate field is null
Match pattern | Candidate field dropped | The candidate field was dropped in order to match, i.e. the candidate address has more precise detail than the authority address. The ABP field would probably be null
Match pattern | Matched as parent | The candidate field matched as being at a higher level than the ABP field, for example flat 6 matching to flat 6a
Match pattern | Matched as child | The candidate field matched as being at a lower level than the ABP field, for example candidate flat 6a, ABP flat 6
Match pattern | Partial match | The candidate field was partially matched to the ABP field (or vice versa), typically 2 out of 3 words
Match pattern | Possible spelling error | The candidate field and ABP field were matched using the Levenshtein distance algorithm, taking account of misspellings
Match pattern | Level based match | The level of a flat in a building (vertical from the street) was used to create the match, e.g. 2b for second floor b
Match pattern | Equivalent | The fields are equivalent, albeit not necessarily spelt the same, using various equivalence lists, word swaps, word drops etc.
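
The 'Possible spelling error' pattern refers to the Levenshtein distance, which counts the single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is a standard textbook implementation rather than the application's own code. The helper possible_spelling_error and its threshold of one edit are assumptions made for the example.

```python
def levenshtein(a, b):
    """Number of single-character edits needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]

def possible_spelling_error(candidate_field, abp_field, max_distance=1):
    """Hypothetical rule: fields differ, but by no more than max_distance edits."""
    distance = levenshtein(candidate_field.lower(), abp_field.lower())
    return 0 < distance <= max_distance

print(levenshtein("kitten", "sitting"))
print(possible_spelling_error("Hig Street", "High Street"))
```

A small threshold keeps the check conservative: a dropped letter in a street name is tolerated, while two unrelated street names are not treated as misspellings of each other.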


Address matching algorithms

Main article: UPRN address matching algorithms

Address matching is surprisingly difficult, and the algorithms used to match addresses are described in more detail by the UPRN address matching algorithms article.