[OSM-talk] Expanding the postcode database

Robert (Jamie) Munro rjmunro at arjam.net
Wed Jan 23 00:49:41 GMT 2008


Gervase Markham wrote:
| Ray Booysen wrote:
|> I actually have a database lying around somewhere with all
|> possibilities.  Quite a high number.
|
| Any chance of digging it out and doing SELECT COUNT(*)?


It's just the number of prefixes * 10 * 26 * 26 (one digit and two
letters for the second part)
= 2907 * 6760 = 19,651,320.
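
As a quick check of that arithmetic, here is a minimal Python sketch that
enumerates every digit + two-letter suffix and multiplies by the prefix
count. The 2907 figure is the one quoted above; the function name
inward_codes is made up for illustration, and the sketch assumes all 26
letters are allowed in both positions, as the formula does.

```python
from itertools import product
from string import ascii_uppercase, digits

NUM_PREFIXES = 2907  # prefix count quoted in the message


def inward_codes():
    """Yield every digit + two-letter suffix: '0AA', '0AB', ..., '9ZZ'."""
    for digit, first, second in product(digits, ascii_uppercase, ascii_uppercase):
        yield digit + first + second


suffixes_per_prefix = sum(1 for _ in inward_codes())
total_candidates = NUM_PREFIXES * suffixes_per_prefix

print(suffixes_per_prefix)  # 6760
print(total_candidates)     # 19651320
```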

As someone else said, only 1.8 million of those are actually issued,
but we can't look at that list.

Google's APIs limit you to 1000 searches / day, so that's ... a long
time :-)
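
To put a rough number on it, a small Python sketch, assuming one search
per candidate postcode and the 1000/day limit mentioned above (the
helper name days_needed is just for illustration):

```python
import math

TOTAL_CANDIDATES = 2907 * 10 * 26 * 26  # every possible postcode, as above
ISSUED = 1_800_000                      # postcodes actually issued
SEARCHES_PER_DAY = 1000                 # daily limit quoted in the message


def days_needed(n_queries, per_day=SEARCHES_PER_DAY):
    """Whole days needed to run n_queries at per_day searches a day."""
    return math.ceil(n_queries / per_day)


for label, n in (("all candidates", TOTAL_CANDIDATES), ("issued only", ISSUED)):
    days = days_needed(n)
    print(f"{label}: {days} days, about {days / 365.25:.1f} years")
# all candidates: 19652 days, about 53.8 years
# issued only: 1800 days, about 4.9 years
```

The second figure is only a lower bound, since without the issued list
there is no way to search just those 1.8 million.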

We could ask our friend Ed at Google if he can get us a more liberal
API limit, or we could look at Yahoo's API, and there may be others. If
we spread the searches across several, we may get somewhere. Another,
albeit dodgier, possibility is to hand off the searching to Mechanical
Turk style agents, whose role would be to search for the postcode, copy
the address, and paste it into OSM search. Perhaps some sort of JS hack
could do this.
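
A minimal Python sketch of spreading one day's searches across several
services. The provider names and the quota numbers in the usage example
are placeholders, not real limits (only the 1000/day Google figure comes
from above), and assign_daily_batches is a made-up helper that only
splits the list; it does not call any API.

```python
from itertools import islice


def assign_daily_batches(candidates, quotas):
    """Give each provider the next `quota` candidates for today.

    candidates: iterable of postcodes still to be searched.
    quotas: dict mapping provider name -> searches allowed per day.
    """
    remaining = iter(candidates)
    return {name: list(islice(remaining, quota)) for name, quota in quotas.items()}


# Hypothetical quotas, for illustration only.
batches = assign_daily_batches(
    ["AB1 0AA", "AB1 0AB", "AB1 0AC", "AB1 0AD", "AB1 0AE", "AB1 0AF"],
    {"provider_a": 3, "provider_b": 2},
)
print(batches)
```

Anything left over after every quota is used up simply waits for the
next day's run.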

Another possibility is to spider websites that turn up frequently in
the Google results, or that we know carry a lot of addresses, but this
is likely to be close to the bone as far as those sites' copyright is
concerned.

The other option is to try to guess the addresses of places, then do a
forward search for those addresses, and pull the postcodes out of the
results. In my very limited testing this works, but it usually gets
only one postcode per street, whereas most streets have at least 2
postcodes, often more.
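
Pulling the postcodes out of the result text could look something like
this Python sketch. The regular expression is a deliberately loose
approximation of the postcode shape (1-2 letters, a digit, an optional
digit or letter, then digit + two letters), not an official validation
pattern, and the sample addresses are invented.

```python
import re

# Loose approximation of a UK postcode; the space is optional.
POSTCODE_RE = re.compile(
    r"\b([A-Z]{1,2}[0-9][0-9A-Z]?)\s*([0-9][A-Z]{2})\b", re.IGNORECASE
)


def extract_postcodes(text):
    """Return the distinct postcodes in text, normalised, in order of appearance."""
    found = (
        f"{outward.upper()} {inward.upper()}"
        for outward, inward in POSTCODE_RE.findall(text)
    )
    return list(dict.fromkeys(found))  # de-duplicate while keeping order


sample = (
    "1 High Street, Anytown AB1 2CD; "
    "99 High Street, Anytown ab12ce; "
    "1 High Street AB1 2CD"
)
print(extract_postcodes(sample))  # ['AB1 2CD', 'AB1 2CE']
```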

Of course, if we are evaluating all postcodes, it would be useful to
keep a list of the road names that are not found in OSM, along with
roughly where they are (from the first part of the postcode + first
digit of the second part). This would provide a useful way to find
roads that need to be mapped.
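
A rough Python sketch of that bookkeeping: reduce each postcode to its
first part plus the first digit of the second part, and group the
unmatched road names under it. The function names and the sample data
are invented for illustration.

```python
from collections import defaultdict


def postcode_sector(postcode):
    """'SW1A 2AA' -> 'SW1A 2' (first part + first digit of second part)."""
    compact = postcode.replace(" ", "").upper()
    outward, inward = compact[:-3], compact[-3:]
    return f"{outward} {inward[0]}"


def group_missing_roads(missing):
    """Group (road_name, postcode) pairs for unmatched roads by rough area."""
    by_sector = defaultdict(set)
    for road, postcode in missing:
        by_sector[postcode_sector(postcode)].add(road)
    return {sector: sorted(roads) for sector, roads in by_sector.items()}


# Made-up examples of roads that a search failed to find.
missing = [
    ("Example Road", "AB1 2CD"),
    ("Sample Lane", "ab12ce"),
    ("Example Road", "AB1 2CF"),
    ("Other Street", "XY9 8ZZ"),
]
print(group_missing_roads(missing))
# {'AB1 2': ['Example Road', 'Sample Lane'], 'XY9 8': ['Other Street']}
```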

Robert (Jamie) Munro