[Talk-GB] Request for UK address lists for postcode extraction
Andy Robinson (blackadder-lists)
ajrlists at googlemail.com
Mon Dec 1 15:35:34 GMT 2008
David Earl wrote:
>Sent: 01 December 2008 3:10 PM
>To: talk-gb at openstreetmap.org
>Subject: Re: [Talk-GB] Request for UK address lists for postcode extraction
>On 01/12/2008 14:11, Brian Quinion wrote:
>> I'm currently doing some work trying to generate postcode location
>> data for the UK using address lists and address lookup using OSM data
>> to supplement NPE. So far it seems to work quite well with the
>> address lists that I have available to me (and coping quite well with
>> ambiguous road names) but I'm limited in my data sources and most of
>> the address data is fairly consistent in both format and quality.
>> So, before I open the interface to the public, I'd like to test the
>> code with some lists provided by other people.
>> Does anyone have, or know of, any address lists that I would be able
>> to use for this purpose? Obviously it needs to be license compatible
>> with OSM (so please no lists generated from royal mail postcode data!)
>> and ideally I'm after data sets containing at least:
>> street address (house name / number optional)
>> town / city
>> formatted as CSV or TSV. I'm specifically not after data containing
>> the names of individuals.
>> Has anyone got any suggestions, or is willing to offer any data? Even
>> personal address books would be useful for testing...
>Why not do it the other way round?
>You know all the 2,500 or so prefixes, and there are only 26 x 26 x 100
>combinations for the second part for each - about 200 million in all. If
>you feed these potential postcodes in quotes into Google UK over a long
>period with appropriate pauses so as not to get locked out, and look at
>the result for recognizable addresses (that's the tricky bit) as I'm
>doing in the Namefinder, you'd probably cover 75% of UK postcodes.
>Yes, it's slow, but it's probably the biggest source there is. At one a
>second it would take about 6 years, but by enlisting 100 friends you'd
>do it in a month - less if it's possible to be more intelligent about it
>- for example, for the number part if there's no 14XX or 15XX I doubt
>there would be any 16s or above either, except for a few special cases.
I'm curious about this. Isn't data scraped via Google still subject to the
terms of the original page it references?
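
For anyone wanting to check the arithmetic in the quoted estimate, here is a rough Python sketch. It takes the figures exactly as given in the thread (about 2,500 prefixes, 26 x 26 x 100 second parts per prefix, one query a second) and is not based on any authoritative postcode specification. The function names and the "two digits plus two letters" layout of the second part are my own assumptions for illustration. With those inputs the product comes to roughly 169 million candidates, a little over five years for one worker, and about 20 days split across 100.

```python
from itertools import product
from string import ascii_uppercase

# Figures as quoted in the thread; treat them as rough assumptions.
PREFIX_COUNT = 2_500
SUFFIXES_PER_PREFIX = 26 * 26 * 100
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def total_candidates(prefixes=PREFIX_COUNT, per_prefix=SUFFIXES_PER_PREFIX):
    """Number of candidate postcodes if every prefix/second-part pair is tried."""
    return prefixes * per_prefix


def years_to_query(candidates, queries_per_second=1.0, workers=1):
    """Wall-clock years needed to issue one query per candidate."""
    seconds = candidates / (queries_per_second * workers)
    return seconds / SECONDS_PER_YEAR


def candidate_suffixes():
    """Yield hypothetical second parts: a number 00-99 followed by two letters.

    This layout only mirrors the thread's 100 x 26 x 26 count.
    """
    for number in range(100):
        for first, second in product(ascii_uppercase, repeat=2):
            yield f"{number:02d}{first}{second}"


if __name__ == "__main__":
    n = total_candidates()
    print(f"candidates: {n:,}")
    print(f"one worker: {years_to_query(n):.1f} years")
    print(f"100 workers: {years_to_query(n, workers=100) * 365.25:.0f} days")
```

The pruning idea in the quoted message (skipping higher numbers once a run of lower ones turns up nothing) would shrink the candidate count further, but it depends on real query results, so it is left out of this sketch.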