[Imports] Proposal for proper OSM import solution (OpenMetaMap)

Serge Wroclawski emacsen at gmail.com
Thu Aug 18 12:22:53 UTC 2011


On Thu, Aug 18, 2011 at 6:23 AM, Jaak Laineste <jaak at nutiteq.com> wrote:
> Hello,
>
> Based on my own long-time thinking and small talk in WhereCamp Berlin
> I created request for comments on kind of different approach to
> imports called meta-mapping.

Since this proposal is nearly (exactly) identical to a thought I had
about a year ago, I feel pretty qualified to speak about it.

The objective of a tool like this would be to allow someone to run a
database of geographic data and isolate it from other datasets: by
keeping the databases separate, you gain more flexibility to change
the data in one of the non-OSM datasets.

An example: if a city government's dataset added or removed library
listings, conflating those changes directly into OSM would be harder
than if the information simply lived in its own database and were
linked to OSM. Simple, right?
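
To make the appeal concrete, here's a minimal sketch (my own
illustration, not anything from the proposal) of the linking idea in
Python: the city keeps its library records, OSM keeps its objects, and
a small link table ties the city's IDs to OSM IDs. All the IDs and
field names below are made up.

    # Hypothetical link table: external library ID -> (OSM element type, OSM ID)
    links = {
        "LIB-0042": ("node", 123456789),
        "LIB-0043": ("way", 987654321),
    }

    def merged_library(ext_id, city_db, osm_db):
        """Combine city-maintained attributes with OSM-maintained geometry."""
        element_type, osm_id = links[ext_id]
        attributes = city_db[ext_id]               # data the city updates
        geometry = osm_db[(element_type, osm_id)]  # data OSM mappers update
        return {**attributes, "geometry": geometry}

The city can add or drop records on its side without touching OSM;
only the link table has to be maintained.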

Sadly, the solution has flaws when the rubber meets the road.

1. By moving objects out of the OSM database, you move the complexity
out of the OSM database and into the conflation database

Moving the problem doesn't solve it. It just hides it (and you'll see
why in the next few points).

2. This approach implies that external data sets are correct.

Underlying this approach is an assumption that we can rely on other
datasets' accuracy. Sadly, this is not the case. As I work with more
datasets and compare them to on-the-ground surveying, I find that many
government datasets are either wrong or out of date.

Take TIGER as an example. I'm going through TIGER 2010 as we speak.
Most of what I've found indicates that when OSM is active in an area,
our maps are more accurate than TIGER, even TIGER 2010, which is
itself more accurate than TIGER 2005 (the version that was imported in
the US).

We therefore need to encourage more mappers to map, not to rely on
these external datasets. This project would do the opposite.

3. Data in the aggregated map won't be collected by on-the-ground mappers.

Some data, like road data, will appear in both OSM and external
datasets, but other data may simply never get collected by the
community if the map already appears to be complete.

And then, since there's less on-the-ground mapping, the problems I
mentioned earlier with flawed external datasets don't get noticed and
corrected.

4. It assumes OSM object IDs remain constant.

OSM object IDs change. They don't change a lot, but they do change,
and you can't force users to jump through hoops to preserve them (as
we've seen people propose).
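
Just to illustrate what that means in practice, here's a rough sketch
(assuming the standard API 0.6 element endpoints) of the check a
meta-map would have to run continuously over every link it holds; the
API answers 410 Gone for deleted elements and 404 for IDs that never
existed.

    import requests

    API = "https://api.openstreetmap.org/api/0.6"

    def link_is_stale(element_type, osm_id):
        """True if the linked OSM element no longer resolves."""
        response = requests.get(f"{API}/{element_type}/{osm_id}")
        return response.status_code in (404, 410)

And a stale link only tells you something broke; a human still has to
figure out which object, if any, replaced the old one.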

5. It assumes external datasets' IDs remain constant

One of the main points of this project seems to be keeping up to date
with external datasets, such as those put out by local governments
every quarter.

Since most of these external datasets will be given in Shapefile
format, there will need to be a conversion process.

You can't be assured that the ID numbers on objects will remain
constant from Q1 to Q2. Heck, I bet you'd find that even their own
internal IDs don't remain constant, at least not for every single ID
on every single object in every single external database, of which
there may be dozens or more.

So you're constantly in a race to conflate changing object IDs.
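
Here's a sketch of what that race looks like, assuming the dataset
ships as shapefiles with some stable-looking ID attribute (the file
names and the field name below are made up):

    import fiona

    def id_set(path, id_field="FACILITYID"):  # hypothetical ID field
        with fiona.open(path) as source:
            return {feature["properties"][id_field] for feature in source}

    q1 = id_set("libraries_2011_Q1.shp")
    q2 = id_set("libraries_2011_Q2.shp")

    print("links that silently break:", len(q1 - q2))
    print("new objects needing fresh conflation:", len(q2 - q1))

Every ID in the first release but not the second is a link that has to
be re-conflated by hand, for every dataset, every quarter.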

6. License nightmare

This is a powder-keg ready to explode, but I'll just say this:
Incompatible licenses will not allow this.

7. Tremendous work.

The conflation process would be very hard to do and, frankly, not a
lot of fun. You'll end up writing programs to do most of it, I'm sure,
but no program will be perfect.

So people will have to do the rest by hand, and it's not fun work.


These are the reasons I never went forward with this project.

- Serge


