[OSM-dev] Generalisation

Marco Boeringa marco at boeringa.demon.nl
Wed May 2 16:33:51 UTC 2018

Hi Tomas,

The generalization I wrote about was just a crude, basic generalization 
of vector (building) data from OSM using a default tool in ESRI's ArcGIS. 
The specific tool used (Simplify Polygon) has more advanced settings 
than standard Douglas-Peucker, but by itself does nothing really special 
other than weeding out vertices / nodes. I simply tried it with 
different tolerances to see what the results would be, and concluded that 
the resulting defects in building topology were not worth the reduction 
in file size.
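
If you want to reproduce that kind of crude vertex weeding outside 
ArcGIS, here is a minimal sketch using Python with Shapely (my choice as 
an open-source stand-in for Simplify Polygon; the 0.5 m tolerance and 
the sample polygon are purely illustrative):

from shapely.geometry import Polygon

# A building footprint with one slightly off-line vertex on the top edge.
building = Polygon([(0, 0), (10, 0), (10, 10), (5, 10.2), (0, 10)])

# With preserve_topology=False Shapely uses plain Douglas-Peucker; the
# default True uses a slower variant that at least keeps each ring valid.
# Neither protects shared walls between adjacent buildings, which is
# where the topology defects mentioned above come from.
simplified = building.simplify(0.5, preserve_topology=True)

print(len(building.exterior.coords), "->", len(simplified.exterior.coords))
# 6 -> 5 coordinates (the ring's closing point repeats the first)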

Of course, if your city's buildings are far more detailed than the 
average building in OSM, e.g. an import of official government data 
measured to centimeter level using land surveying techniques, with 
rather large vertex counts on average, I can imagine that even simple 
generalization techniques may reduce vertex counts more than I achieved. 
It also depends a lot on how many artefacts you personally tolerate... 
(my tolerance is low for buildings).

However, as for interesting stuff to read: our national Kadaster of the 
Netherlands is actually the very first and, so far, only national mapping 
organization worldwide that has successfully managed to implement a 
fully automated generalization workflow for generating 1:50k maps from 
1:10k maps, covering land use, waterways and highways, but also 
generalizing built-up areas and buildings. They used a range of 
cartographic generalization tools from ArcGIS (which I didn't use...).

The results achieved by the Dutch Kadaster closely mimic what Imagico 
describes as a more holistic approach to generalization, and are largely 
true to how cartographers traditionally generalized maps by hand. In 
fact, one of the key aspects of the workflow developed by the Dutch 
Kadaster was to mimic as closely as possible the inherent "rules" their 
cartographers developed and applied over decades, or even more than a 
century, of "generalizing" maps to smaller scales.

However, if you read about the level of effort needed to achieve this 
(years of development by a small team of employees / researchers, and a 
huge tool chain built up along the way), and the sheer processing power 
needed for such sophisticated generalization, it is utterly clear you 
cannot do this in real time. It is only worth the effort in 
organizations like the national mapping agencies, where the ultimate 
gain of automation (fully replacing the manual conversion of topographic 
maps from one scale to another, and no longer having to keep separate 
workflows alive for the different scale map series: 1:10k, 1:25k, 1:50k, 
1:100k, 1:250k etc.) far outweighs the long-run effort of developing 
such a generalization tool chain and workflow.

They now call this workflow "AUTOgen".

The Dutch Kadaster was actually awarded a prize by ESRI for this work. 
See also this ArcNews bulletin (pages 19-21): 

Some links to this work by the Dutch Kadaster:

Links to English pages of the Dutch Kadaster:

Other interesting information regarding building LODs, from research 
involving one of the people also involved in the Kadaster work:

(Note: I wasn't involved in any of this by the way, just know of this work)


On 16-4-2018 at 19:23, Tomas Straupis wrote:
> 2018-04-16 19:34 GMT+03:00 Marco Boeringa wrote:
>> No, buildings are not the most interesting. I once generalized all buildings
>> in Denmark. It only reduced the storage by maybe 5%, at the high cost of
>> heavily distorting a large number of them. Most buildings in OSM are in fact
>> already in their most generalized state: just 4 nodes. Unless you think
>> triangles are a suitable representation ;-)
>    Interesting, what algorithm did you use?
>    I'm playing around in Vilnius which has urban houses, big block
> houses, industrial zones and old town with lots of connected buildings
> of very irregular shapes.
>    In Vilnius there are 54267 buildings tagged with 366979 vertexes.
>    Clustering them with distance of 5m gets 45810 objects (of course
> with the same number of vertexes).
>    Removing buildings with area < 100 m² that have neighbours within 500
> meters, I'm left with 28974 buildings with 299224 vertexes.
>    Simplification (amalgamating buildings in the cluster and trying to
> remove edges < 20m) reduces the number of vertexes to 117108.
>    So this is much more than 5%.
>    There are still a lot of problems (no triangles:), but I do not
> expect the number of vertexes to rise considerably.
>    Even "dumb" generalisation (st_buffer+- with join=mitter) reduces
> vertex count by ~25%.
>    Reducing storage/tile size is not the only/main purpose of generalisation.
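
For anyone who wants to experiment with the buffer-based amalgamation 
Tomas describes (the st_buffer +/- trick with mitred joins), here is a 
minimal sketch using Shapely as a stand-in for PostGIS; the 2 m growth 
distance and the two sample footprints are purely illustrative, not 
Tomas's actual values:

from shapely.geometry import Polygon
from shapely.ops import unary_union

# Two buildings 3 m apart; growing each by 2 m closes the gap.
a = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
b = Polygon([(13, 0), (23, 0), (23, 10), (13, 10)])
d = 2.0  # illustrative amalgamation distance

# Buffer outwards, merge whatever now overlaps, then buffer back in.
# join_style=2 (mitre) keeps corners square instead of rounding them,
# the Shapely equivalent of PostGIS's join=mitre buffer style.
grown = unary_union([g.buffer(d, join_style=2) for g in (a, b)])
amalgamated = grown.buffer(-d, join_style=2)

print(amalgamated.geom_type)  # Polygon: the pair merged into one shape

The same out-then-in buffering also swallows small notches and inner 
courtyards, which is why it cuts vertex counts even on single buildings.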
