[OSM-dev] Defining the area of a country / What country is a LatLon in.

Brett Henderson brett at bretth.com
Wed Nov 21 22:10:55 GMT 2007


Jon Burgess wrote:
> Any borders you get from the OSM data are going to have many thousands
> of nodes with way more detail than you probably require. From what I
> recall, Brett said previously that the osmosis bounding box task was not
> optimised for bounding boxes with this many nodes and would probably be
> really slow. 
>
> 	Jon
>   
Just as an FYI, osmosis should work fine in this case with a little 
command line tweaking.  The main problem with these tasks is the memory 
consumption, which is mainly caused by tracking node ids within a box.  
The osmosis bounding area tasks (--bounding-box and --bounding-polygon) 
by default use sorted lists of ids to track which entities are inside 
the box.  This means that every node inside the box results in 
approximately 4 bytes of data being consumed (plus a small amount of 
overhead).
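
To make the per-node cost concrete, here is a minimal Python sketch of a 
sorted-id tracker.  It is not the osmosis code (osmosis is written in 
Java); the class name IdListTracker and its methods are hypothetical, 
and it assumes a C int is 4 bytes wide on your platform.

```python
from array import array
from bisect import bisect_left


class IdListTracker:
    """Hypothetical sketch of a sorted id list, not osmosis code."""

    def __init__(self):
        # 'i' is a signed C int, 4 bytes on common platforms (assumption).
        self._ids = array("i")
        self._sorted = True

    def add(self, node_id):
        # Ids arriving in ascending order keep the list sorted for free.
        if self._ids and node_id < self._ids[-1]:
            self._sorted = False
        self._ids.append(node_id)

    def contains(self, node_id):
        # Sort lazily, then binary-search for the id.
        if not self._sorted:
            self._ids = array("i", sorted(self._ids))
            self._sorted = True
        pos = bisect_left(self._ids, node_id)
        return pos < len(self._ids) and self._ids[pos] == node_id

    def payload_bytes(self):
        # Memory used by the ids themselves, ignoring container overhead.
        return len(self._ids) * self._ids.itemsize
```

Memory here grows only with the number of ids added, which is why this 
style of tracker suits boxes that contain a small share of the planet.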

However, the id tracking mechanism is configurable.  Both tasks 
mentioned above accept an "idTrackerType" argument, which takes the 
values "BitSet" or "IdList".  The default is IdList, which consumes 32 
bits per id and is the most efficient for smallish bounding boxes.  
However, if you wish to create a much larger bounding box (containing 
more than roughly 1/32 of the planet), it will be more efficient to 
switch to the "BitSet" implementation, where every id in the database 
consumes 1 bit of memory regardless of whether it lies within the 
bounding box.  The IdList implementation consumes memory proportional 
to the bounding box size; the BitSet implementation consumes memory 
proportional to the planet size.
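
For contrast with the sorted list, a bit-per-id tracker can be sketched 
in Python as below.  Again this is only an illustration, not the osmosis 
code: the class name BitSetTracker is hypothetical, and sizing the bit 
array up front from a known highest id is an assumption made to keep the 
example short.

```python
class BitSetTracker:
    """Hypothetical sketch of a bit-per-id tracker, not osmosis code."""

    def __init__(self, max_id):
        # One bit for every id from 0 to max_id, whether or not it is used.
        self._bits = bytearray(max_id // 8 + 1)

    def add(self, node_id):
        # Set the bit that corresponds to this id.
        self._bits[node_id >> 3] |= 1 << (node_id & 7)

    def contains(self, node_id):
        # Test the bit that corresponds to this id.
        return bool(self._bits[node_id >> 3] & (1 << (node_id & 7)))

    def payload_bytes(self):
        # Fixed by max_id, independent of how many ids were added.
        return len(self._bits)
```

The footprint is set by the highest id alone, so it is the same for a 
box holding ten nodes as for one holding the whole planet.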

In other words, for small bounding boxes use the default "IdList"; for 
large bounding boxes use the "BitSet" implementation.  If you're only 
creating a single bounding box, and you aren't severely memory 
constrained, you can always specify idTrackerType=BitSet.
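
The rule of thumb above can be written as a small Python helper.  The 
function name pick_id_tracker and the node counts are made up for 
illustration; only the 32-bit and 1-bit costs and the idTrackerType 
values come from the description above.

```python
def pick_id_tracker(nodes_in_box, highest_node_id):
    """Return the idTrackerType value with the smaller estimated footprint."""
    id_list_bytes = nodes_in_box * 4      # 32 bits per tracked id
    bit_set_bytes = highest_node_id / 8   # 1 bit per id in the planet
    return "BitSet" if bit_set_bytes < id_list_bytes else "IdList"


# Illustrative numbers only, not real planet statistics.
print("idTrackerType=" + pick_id_tracker(1_000_000, 200_000_000))
print("idTrackerType=" + pick_id_tracker(50_000_000, 200_000_000))
```

This estimate ignores per-structure overhead, so treat the result as a 
starting point rather than a measurement.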

Cheers,
Brett

PS.  I forgot to document this command line option on the osmosis wiki 
page, but it's fixed now.

PPS.  The crossover point at which BitSet becomes more efficient is for 
boxes containing roughly 1/32 of the planet data, but deleted nodes skew 
that figure by making the BitSet consume memory for non-existent nodes.  
The real number will be somewhere between 1/32 and 1/16 of the planet.
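
As a rough check on that range, the Python sketch below models the 
crossover when some share of the id space belongs to deleted nodes.  The 
model and the function name crossover_fraction are my own assumptions: 
IdList costs 32 bits per live node in the box, BitSet costs 1 bit per id 
ever allocated.  Under that model the 1/16 figure corresponds to half of 
all ids being deleted; the actual share is not stated here.

```python
from fractions import Fraction


def crossover_fraction(deleted_share):
    """Share of live nodes in the box at which both trackers cost the same."""
    live_share = 1 - Fraction(deleted_share)
    # IdList: 32 bits * box_share * live ids.  BitSet: 1 bit * all ids.
    # Setting the two equal gives box_share = 1 / (32 * live_share).
    return 1 / (32 * live_share)


print(crossover_fraction(0))               # 1/32
print(crossover_fraction(Fraction(1, 2)))  # 1/16
```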




