[Talk-GB] Ordnance Survey data matching

Sun Apr 4 14:28:29 BST 2010

Please slap me if I'm either jumping the gun or duplicating something
here, but I don't think anyone has covered this publicly yet.

I have had a quick poke around, and the meridian2 data seems to use a
UID called OSODR (Ordnance Survey Oscar Database Reference). After
some further poking around, it seems that this reference will be
consistent across all of their data releases, though this is partly
based on assumptions. (Does anyone have more detail on this?)

Now, it seems like a very worthwhile exercise to attempt some detailed
matching of the ways in the OS data against the ways in the OSM data.
This is a completely non-intrusive process and can even be done
offline, so there's no problem doing it now.

I'm not well positioned to do this myself due to a lack of SQL
experience, but here are my suggestions:

Pick a county that's a manageable size and has some well-mapped
areas, some poorly mapped areas, and some unmapped areas.

Ignore everything that isn't a road.

Then run a bunch of searches on the 2 datasets to find ways that match
between them.

If the start and end coordinates match (within ~5 meters or so), the
ways are likely the same.
If the start and end coordinates match but are swapped, the ways are
likely the same, just drawn in the opposite direction.
Ways matched by either test can then be removed from further searches.
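Here is a minimal Python sketch of the endpoint test. It assumes both datasets have already been reprojected into British National Grid, so coordinates are (easting, northing) in metres and plain Euclidean distance is reasonable. It also assumes each dataset is loaded into a dict mapping a way ID to its list of coordinate tuples. The names `endpoint_match` and `match_by_endpoints` are made up for illustration, and the 5 m tolerance is just the figure suggested above.

```python
import math
from typing import Dict, Hashable, List, Optional, Tuple

# Assumption: coordinates are (easting, northing) in metres, e.g. British National Grid.
Point = Tuple[float, float]
Way = List[Point]


def endpoint_match(os_way: Way, osm_way: Way, tol_m: float = 5.0) -> Optional[str]:
    """Compare only the first and last nodes of two ways.

    Returns "same" if start matches start and end matches end,
    "reversed" if they match the other way round, otherwise None.
    """
    s1, e1 = os_way[0], os_way[-1]
    s2, e2 = osm_way[0], osm_way[-1]
    if math.dist(s1, s2) <= tol_m and math.dist(e1, e2) <= tol_m:
        return "same"
    if math.dist(s1, e2) <= tol_m and math.dist(e1, s2) <= tol_m:
        return "reversed"
    return None


def match_by_endpoints(
    os_ways: Dict[Hashable, Way], osm_ways: Dict[Hashable, Way], tol_m: float = 5.0
) -> List[Tuple[Hashable, Hashable, str]]:
    """Greedily pair each OS way with the first unused OSM way whose endpoints match."""
    matches = []
    used = set()  # OSM ways already matched are excluded from further searches
    for os_id, os_way in os_ways.items():
        for osm_id, osm_way in osm_ways.items():
            if osm_id in used:
                continue
            kind = endpoint_match(os_way, osm_way, tol_m)
            if kind is not None:
                matches.append((os_id, osm_id, kind))
                used.add(osm_id)
                break
    return matches
```

The nested loop is quadratic. For a whole county you would want a spatial index, for example a PostGIS query using ST_DWithin, rather than comparing every pair.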

Take a look at the matches, and remove any that don't in fact follow
the same (or a close) course: for each node in each dataset, check
its proximity to the closest way segment in the other. Not perfect,
but good enough, I reckon.
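The course check can be sketched the same way. For every node of each way, this measures the distance to the nearest segment of the other way and takes the worst case in both directions, which amounts to a discrete, Hausdorff-style deviation. It makes the same assumption of metric grid coordinates. The 20 m default tolerance is only a placeholder, not a tested value; it would need tuning against real data, since the meridian2 geometry is fairly poor.

```python
import math
from typing import List, Tuple

# Assumption: (easting, northing) in metres.
Point = Tuple[float, float]
Way = List[Point]


def point_segment_dist(p: Point, a: Point, b: Point) -> float:
    """Planar distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate segment: a and b coincide
        return math.dist(p, a)
    # Project p onto the infinite line through a and b, then clamp to the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def point_way_dist(p: Point, way: Way) -> float:
    """Distance from p to the closest segment of a polyline."""
    if len(way) == 1:
        return math.dist(p, way[0])
    return min(point_segment_dist(p, a, b) for a, b in zip(way, way[1:]))


def max_deviation(way_a: Way, way_b: Way) -> float:
    """Worst node-to-way distance, checked in both directions."""
    a_to_b = max(point_way_dist(p, way_b) for p in way_a)
    b_to_a = max(point_way_dist(p, way_a) for p in way_b)
    return max(a_to_b, b_to_a)


def follows_same_course(way_a: Way, way_b: Way, tol_m: float = 20.0) -> bool:
    # tol_m is a placeholder; it would need tuning against real data.
    return max_deviation(way_a, way_b) <= tol_m
```

Checking both directions matters. A short way lying along part of a long one passes a one-sided check, but the long way's outlying nodes make it fail the reverse check.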

Take a look at the data that's left, and work out where to go next. I
suspect there will be roads that exist as two end-to-end ways in one
set (where a road name changes) but as a single way in the other, or
places where a road name changes but the position of the change
differs between the datasets.
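Sometimes a road is two end-to-end ways in one dataset but a single way in the other. One option for that case, an untested assumption rather than anything established, is to join the two pieces and rerun the endpoint and course checks on the result. A small helper that joins two polylines meeting end-to-end, reversing one of them if necessary, might look like this. `join_ways` and its 1 m junction tolerance are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # assumed (easting, northing) in metres
Way = List[Point]


def join_ways(w1: Way, w2: Way, tol_m: float = 1.0) -> Optional[Way]:
    """Join two polylines that meet end-to-end, reversing one if needed.

    Returns the combined node list, or None if no pair of endpoints coincides.
    """
    orientations = [
        (w1, w2),          # end of w1 meets start of w2
        (w1, w2[::-1]),    # end of w1 meets end of w2
        (w1[::-1], w2),    # start of w1 meets start of w2
        (w2, w1),          # end of w2 meets start of w1
    ]
    for first, second in orientations:
        if math.dist(first[-1], second[0]) <= tol_m:
            # Drop the duplicated junction node when concatenating.
            return list(first) + list(second[1:])
    return None
```

The joined way can then go back through the endpoint and course checks against the single way in the other dataset. Trying every pair of leftover ways would be expensive, so in practice you would only try pieces whose endpoints are already close to that single way's endpoints.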

There will be areas that just straight up don't match. These will be
numerous, and would be best filtered carefully and flagged for human
checking (OpenStreetBugs?).

Subtleties that need further investigation include: split
carriageways; roads that only partially exist in our data (country
lanes that have poorly defined ends or have not been fully surveyed);
and anything in our data tagged position=approximate.

The results of this process could lead to some really useful data. Our
geometry seems, in general, to be better than the meridian2 data, but
there are areas where we are missing data such as names, and some
rural areas where we have no data at all.

The general idea would be to do an import that takes the best from
both datasets and preserves all of our data except where it is
identified as inferior.

If we can generate a list of ways that exist in meridian2 but are
totally absent from our data, I'd say it would be worth importing them
(carefully). Their geometry is fairly poor, but it's well within
usable parameters, and it's complete.

If the import is done sensibly, it would be a fairly simple process to
reimport any ways that have had no further work done on them if better
data becomes available from OS (someone mentioned that this might
happen), using a filter on the last user to update each way and its
OSODR reference. (This relies on the same assumption as above.)
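The reimport filter might look something like the following. It hypothetically assumes that imported ways carry their OSODR in a tag (`os:osodr` here, a made-up key) and were uploaded from a dedicated account (`meridian2_import`, also made up). `OsmWay` and `reimportable` are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

IMPORT_USER = "meridian2_import"  # hypothetical import account name
OSODR_KEY = "os:osodr"            # hypothetical tag key holding the OSODR

Point = Tuple[float, float]


@dataclass
class OsmWay:
    id: int
    user: str  # last user to edit the way object itself
    tags: Dict[str, str] = field(default_factory=dict)


def reimportable(ways: List[OsmWay], new_release: Dict[str, List[Point]]) -> Dict[int, List[Point]]:
    """Map way id -> new geometry for imported ways nobody has edited since."""
    out = {}
    for w in ways:
        osodr = w.tags.get(OSODR_KEY)
        # Only touch ways still last edited by the import account whose
        # OSODR appears in the newer OS release.
        if w.user == IMPORT_USER and osodr is not None and osodr in new_release:
            out[w.id] = new_release[osodr]
    return out
```

One caveat: in OSM, moving a node creates a new version of the node but not of the way. A way whose last user is still the import account may therefore have had its geometry edited, so a real filter would need to check the nodes too.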

Other, moderately related points: their coastline data is way ahead
of ours, at least in areas where no one has updated it, even if it is
offset by a fixed distance from what I've seen (I'm not sure which side
the error is on; email me if you're interested in a reference, and I'll
try to find the data I was looking at again). Unfortunately, coastline
ways are quite long (though from what I've seen, not unmanageably so)
and may have been updated only in part or only very slightly, so
checking the versions of their nodes may be worthwhile in this case.
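For long coastline ways, one way to use node versions is to find the stretches of a way whose nodes are all still at version 1, i.e. never edited since they were created, and consider only those stretches for replacement. This is a minimal sketch. It assumes the node list has already been fetched as (node_id, version) pairs in way order. `untouched_runs` and its `min_len` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def untouched_runs(nodes: Sequence[Tuple[int, int]], min_len: int = 2) -> List[List[int]]:
    """Split a way's node list into maximal runs of never-edited nodes.

    nodes: (node_id, version) pairs in way order.
    Returns lists of consecutive node ids whose version is still 1,
    keeping only runs of at least min_len nodes.
    """
    runs: List[List[int]] = []
    current: List[int] = []
    for node_id, version in nodes:
        if version == 1:
            current.append(node_id)
        else:
            # An edited node ends the current run.
            if len(current) >= min_len:
                runs.append(current)
            current = []
    if len(current) >= min_len:
        runs.append(current)
    return runs
```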

So, am I onto something, or has this already been discussed to death
on some other list?

Thanks,
JR