[Imports] CLC meadow cleanup

Ilpo Järvinen ilpo.jarvinen at helsinki.fi
Sat Mar 22 08:21:35 UTC 2014


On Sat, 22 Mar 2014, Paul Norman wrote:

> I was doing some CLC cleanup tonight, removing landuse=meadow polygons 
> that didn't remotely match more recent imagery. Of all the meadow 
> polygons, not one was worth keeping. I found small woods, roads, farms, 
> residential areas, and basically anything but good data. After going at 
> it piece-meal I'm wondering if we need to go after it in a systematic 
> manner with a mechanical edit.
> 
> There are 19k ways and 1.2k relations with CLC:id, landuse=meadow, and 
> version=1. About the same number of both have version>1. Based on the 
> sampling I did, if any are accurate, it is purely by chance.
> 
> What I'm wondering is 
> 
> 1. I did the editing in Poitou-Charentes, France. Is the CLC data here 
>   representative of other data? 
> 
> 2. Are there other CLC classifications which are just as bad? 

I don't know about France, but CLC data in general seems to be just as
bad as you describe. Only where there is a really large, continuous body
of something might CLC have guessed almost right, and even then the
boundary accuracy is just as bad. In areas with lots of discontinuities
or small features, the result is pretty much random for any small
feature.

> If the area I looked at is representative, I am contemplating proposing 
> a mechanical edit to remove the bad data. What are people's thoughts on 
> this? 
> 
> I'm not getting into specific details at this point, as I'm just 
> evaluating the concept. Before actually doing a mechanical edit, I'd 
> provide technical details for review, and raise the question with a 
> wider audience. 

Usually when CLC comes up in discussions (with those drawing by hand or 
surveying), I never hear anything positive about it. That is no wonder:
those people are the ones who encounter all the garbage and are unsure
what should be done with it (remove it or what).

From those who imported it, I kept hearing that CLC can be fixed once
it is in the DB, but obviously nobody ever stepped in to do that. Even
those discussions have now died down (at least here in Finland). The
opening up of some other datasets might be a partial reason for that,
but I personally doubt the fixers would have appeared even without the
other datasets.

IMHO, CLC is a good example of the wrong import approach, i.e., somebody
imports garbage data into the DB first so that people can then fix it.
In practice, fixing data that is already in the DB hardly ever happens
for any significant number of the imported geometries.

The correct approach is to fix things prior to import or immediately 
after putting something into the DB. Delayed fixing is not going to work
in practice. I can well understand why dumping into the DB first looks
appealing: it requires a lot less effort up front. But in the end it
becomes a big pain, as CLC now is, for everyone doing much more useful
mapping (than CLC data ever is/was).

Sadly, I expect there to be some resistance from those who are/were for 
this "dumping approach" if you start removing the garbage. At least here
I always hear that "removal should not be done before the actual replacing 
with the other, better dataset occurs".


-- 
 i.


