Thanks all for the fruitful discussion and the helpful comments!

We agree on the need for local, manual validation.

As a trial, we could start with a 50 by 50 km low-density area
(something like http://osm.org/go/0CkJV_Q- ) and have it fully validated
by the local community, hence only manual edits.
The process could be as follows:
1. Extract built-up areas and building density from imagery (automated)
2. For each polygon (semi-automated):
2a. In the area covered by the polygon and in the neighbourhood (say a
0.5 to 1 km buffer), analyse all tags related to landuse. Continue
with next step 2b only if there is absolutely no relevant information in
OSM (to avoid duplicates or overlapping polygons).
2b. Have an algorithm choose the best combination of tags for this
polygon (without uploading). This could be decided on the basis of 2a
(the polygons already coded in 2a are our training set): e.g., among our
polygons, let's take those that are very similar to polygons already
tagged in OSM as “place=village” (we have dozens of textural,
morphological, etc. measures for this); for those, we propose the tag
“place=village”. We are thinking of doing some kind of statistical
analysis (PCA/supervised classification) here. After this step, we may
even go back to step 1 (maybe iteratively) to fine-tune our detection
algorithm (the algorithm produces a continuous value that we have to
discretize with a threshold; step 2b can help us choose the threshold
that best fits existing OSM data).
2c. For polygons ruled out in step 2a, compute one or two statistics
(such as building density, and maybe average building size) for the
existing OSM polygon.
3. Share the polygons by USB key or email with fellow local mappers
(no posting to a web site). I have a few names in mind: In the past few
weeks, as a hobby, I've been training friends of the Italian Alpine Club
(especially the mountain bikers) on OSM mapping, and altogether they
have an excellent knowledge of the area. I have also posted an announcement
on http://wiki.openstreetmap.org/wiki/Lombardia_Community
(you will also find my two login names there ;-)  ). I also have contacts
with several active mappers in other institutes of our research center.
4. Have local mappers (including myself) manually validate or reject
each polygon, and manually upload the validated polygons, with
appropriate tags (e.g. landuse=residential, landuse=commercial).
5. Report on the above.
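
To make the threshold idea in step 2b more concrete, here is a minimal sketch in Python. It assumes each candidate polygon has a continuous detection score and a flag saying whether it matches a polygon already mapped in OSM. The F1 score as the measure of "best fit", the function name and the sample values are all illustrative assumptions, not our actual method or data. Since OSM is incomplete in the test area, a "false positive" here may simply be an unmapped village, so the result should only be read as a starting point.

```python
def best_threshold(scores, in_osm):
    """Return (threshold, f1) that best agrees with existing OSM polygons.

    scores : continuous detector outputs, one per candidate polygon
    in_osm : True if the candidate matches a polygon already in OSM
    F1 is an assumed agreement measure; any other could be swapped in.
    """
    best = (None, -1.0)
    for t in sorted(set(scores)):
        # A candidate is "detected" when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, in_osm) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, in_osm) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, in_osm) if s < t and y)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best[1]:  # strict '>' keeps the lowest threshold on ties
            best = (t, f1)
    return best


# Made-up values, for illustration only.
scores = [0.10, 0.35, 0.40, 0.62, 0.80, 0.91]
in_osm = [False, False, True, True, True, True]
threshold, f1 = best_threshold(scores, in_osm)
print(threshold, f1)
```

The chosen threshold could then be fed back into step 1 before the polygons are handed over for manual validation.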

The area that we suggest ( http://osm.org/go/0CkJV_Q- ) seems to be a good
place to test this, as there are already a few village polygons
digitized, but many are missing, and the majority of building footprints
are missing.

Jorge wrote: “But you can contribute by calculating, for example, the
OSM building coverage in some countries or cities, and then let the
community knows how bad/good is the current OSM coverage.”
We like this idea, but how can we share it? (legal and technical
constraints). This is strongly related to the next point: The only thing
that does not fit in the above 5-step process is how to share the result
of step 2c (it is not convenient to share id/density pairs by email).
Maybe this could be the (semi-) automatically uploaded part.
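
For reference, the id/density pairs from step 2c could look roughly like the output of the sketch below. It assumes polygon and building outlines are simple rings in a projected coordinate system (metres) and that all buildings lie inside the polygon. The function names, the CSV layout and the id 123456 are hypothetical, chosen only for illustration.

```python
import csv
import io


def ring_area(ring):
    """Area of a simple ring of (x, y) points in metres (shoelace formula)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def building_density(polygon, buildings):
    """Share of the polygon area covered by building footprints (0 to 1)."""
    return sum(ring_area(b) for b in buildings) / ring_area(polygon)


def to_csv(rows):
    """Serialise (osm_id, density) pairs as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["osm_id", "building_density"])
    for osm_id, density in rows:
        writer.writerow([osm_id, f"{density:.3f}"])
    return buf.getvalue()


# Made-up geometry: a 100 m x 100 m village polygon with two buildings.
village = [(0, 0), (100, 0), (100, 100), (0, 100)]
buildings = [
    [(10, 10), (30, 10), (30, 20), (10, 20)],  # 20 m x 10 m
    [(50, 50), (60, 50), (60, 60), (50, 60)],  # 10 m x 10 m
]
density = building_density(village, buildings)
csv_text = to_csv([(123456, density)])  # 123456 is a hypothetical OSM id
print(csv_text)
```

Whether such a table ends up as a file, a wiki page or a tag on the OSM object is exactly the open question above.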

Thanks in advance for any comment.


