[OSM-dev] Binary OSM; the first pass encoder

Stefan de Konink stefan at konink.de
Sun Nov 9 06:14:48 GMT 2008


Stefan de Konink wrote:
> As discussed before, it is possible to do a second-pass binary encoding
> with all strings in a distinct table. The linked list of strings can then
> be recovered from storage into an array. This would make a significant
> difference for the tag keys alone.
> 
> In this case all string fields can be converted to unsigned long fields;
> for now, room for 4G distinct strings seems enough :)
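A minimal sketch of such a second-pass string table, assuming each distinct string is stored once as a newline-terminated entry and every occurrence in the binary file is replaced by a fixed-width 32-bit index (the "unsigned long" above, which gives the ~4G limit). All function names are hypothetical and not taken from any existing OSM tool.

```python
# Hypothetical sketch of a second-pass string table (illustrative names only).
from typing import Dict, List, Tuple


def build_string_table(strings: List[str]) -> Tuple[List[str], List[int]]:
    """Deduplicate strings; return (table, refs) with table[refs[i]] == strings[i]."""
    table: List[str] = []
    index_of: Dict[str, int] = {}
    refs: List[int] = []
    for s in strings:
        idx = index_of.get(s)
        if idx is None:  # first occurrence: append it to the table
            idx = len(table)
            index_of[s] = idx
            table.append(s)
        refs.append(idx)
    return table, refs


def pack_refs(refs: List[int], width: int = 4) -> bytes:
    """Store each reference as a fixed-width little-endian unsigned integer.
    width=4 (32 bits) allows roughly 4G distinct strings."""
    return b"".join(r.to_bytes(width, "little") for r in refs)


def serialize_table(table: List[str]) -> bytes:
    """Write the table as newline-terminated UTF-8 strings.
    Assumes no string itself contains a newline."""
    return "".join(s + "\n" for s in table).encode("utf-8")


def load_table(blob: bytes) -> List[str]:
    """Recover the table from storage into an array (list) for O(1) index lookups."""
    return blob.decode("utf-8").split("\n")[:-1]


def unpack_strings(table: List[str], packed: bytes, width: int = 4) -> List[str]:
    """Resolve packed references back into the original strings."""
    return [table[int.from_bytes(packed[o:o + width], "little")]
            for o in range(0, len(packed), width)]


# Usage: repeated tag keys are stored once and referenced by index.
tags = ["highway", "residential", "name", "highway", "primary", "name"]
table, refs = build_string_table(tags)
packed = pack_refs(refs)
restored = unpack_strings(load_table(serialize_table(table)), packed)
```

Here the table holds the four distinct strings and `refs` is `[0, 1, 2, 0, 3, 2]`; the saving grows with how often keys such as `highway` or `name` repeat.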

Since then I have gathered some more statistics:

The binary file: 418MB
The strings within the binary file: 224MB (\n-terminated)
Number of lines: 29688795
This list deduplicated: 19MB
Number of lines: 2087179


So with some quick calculations:

418 - 224 + 90 + 19 =~ 303MB
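The post does not say where the 90MB term comes from. One reading that fits the numbers: 2087179 distinct strings need 21 bits per index, i.e. 3 bytes, and 29688795 references at 3 bytes each come to about 89MB. The sketch below reproduces the estimate under that assumption, also assuming "MB" means 10^6 bytes; the helper names are hypothetical.

```python
# Back-of-the-envelope size estimate from the statistics above.
# Assumptions: "MB" = 10**6 bytes, and each string reference uses the
# smallest whole number of bytes that can index every distinct string.
MB = 10**6


def ref_width_bytes(num_distinct: int) -> int:
    """Bytes needed to store an index in the range [0, num_distinct)."""
    bits = max(1, (num_distinct - 1).bit_length())
    return (bits + 7) // 8


def estimate_mb(binary_mb: float, strings_mb: float, num_strings: int,
                table_mb: float, width: int) -> float:
    """Binary file minus inline strings, plus fixed-width references, plus the table."""
    refs_mb = num_strings * width / MB
    return binary_mb - strings_mb + refs_mb + table_mb


width = ref_width_bytes(2_087_179)  # 21 bits, rounded up to whole bytes
print(width, round(estimate_mb(418, 224, 29_688_795, 19, width), 1))
```

With 3-byte references this gives about 302MB, in line with the ~303MB above; with the 4-byte unsigned long references from the quoted proposal it would be roughly 332MB.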


...now it would be nice to see how these values work out on the full
planet :) Nevertheless, 300MB of binary data directly usable in
any application, plus an on-demand generated index, doesn't sound bad
for an entire country.


Stefan



