[OSM-dev] Defining the area of a country / What country is a LatLon in.
brett at bretth.com
Wed Nov 21 22:10:55 GMT 2007
Jon Burgess wrote:
> Any borders you get from the OSM data are going to have many thousands
> of nodes with way more detail than you probably require. From what I
> recall, Brett said previously that the osmosis bounding box task was not
> optimised for bounding boxes with this many nodes and would probably be
> really slow.
Just as an FYI, osmosis should work fine in this case with a little
command line tweaking. The main problem with these tasks is the memory
consumption, which is mainly caused by tracking node ids within a box.
The osmosis bounding area tasks (--bounding-box and --bounding-polygon)
by default use sorted lists of ids to track which entities are inside
the box. This means that every node inside the box results in
approximately 4 bytes of data being consumed (plus a small amount of
overhead).
However, the id tracking mechanism is configurable. Both of the tasks
mentioned above accept an "idTrackerType" argument, which takes the values
"BitSet" or "IdList". The default is IdList, which consumes 32 bits per
id and is the most efficient for smallish bounding boxes. However, if
you wish to create a much larger bounding box (containing more than
approximately 1/32 of the planet) it will be more efficient to switch to the
"BitSet" implementation where every id in the database consumes 1 bit of
memory regardless of whether it lies within the bounding box. The
IdList implementation consumes memory proportional to the bounding box
size, whereas the BitSet implementation consumes memory proportional to the
total number of ids in the database.
In other words, for small bounding boxes use the default "IdList", for
large bounding boxes use the "BitSet" implementation. If you're only
creating a single bounding box, and you aren't severely memory
constrained, you can always specify idTrackerType=BitSet.
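
To make the trade-off concrete, here is a rough Python sketch of the two
tracking strategies. It is not the osmosis code (osmosis is written in
Java); the class names IdListTracker and BitSetTracker and their set/get
methods are made up for illustration, and the 4 bytes per id assumes the
platform stores an array("I") item in 32 bits.

```python
from array import array
from bisect import bisect_left


class IdListTracker:
    """Hypothetical sorted-list tracker: stores only the ids that were marked."""

    def __init__(self):
        # Unsigned ints, 4 bytes each on common platforms (an assumption).
        self._ids = array("I")
        self._sorted = True

    def set(self, entity_id):
        # Appending out of order means the list must be re-sorted before lookup.
        if self._ids and entity_id < self._ids[-1]:
            self._sorted = False
        self._ids.append(entity_id)

    def get(self, entity_id):
        if not self._sorted:
            self._ids = array("I", sorted(set(self._ids)))
            self._sorted = True
        # Binary search over the sorted ids.
        pos = bisect_left(self._ids, entity_id)
        return pos < len(self._ids) and self._ids[pos] == entity_id


class BitSetTracker:
    """Hypothetical bit-set tracker: one bit per possible id, marked or not."""

    def __init__(self, max_id):
        # Memory is fixed by the highest id, not by how many ids are marked.
        self._bits = bytearray(max_id // 8 + 1)

    def set(self, entity_id):
        self._bits[entity_id >> 3] |= 1 << (entity_id & 7)

    def get(self, entity_id):
        index = entity_id >> 3
        if index >= len(self._bits):
            return False
        return bool(self._bits[index] & (1 << (entity_id & 7)))
```

The list grows with the number of ids marked as inside the box, while the
bit set is sized once from the highest id and never grows, which is the
behaviour described above.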
PS. I forgot to document this command line option on the osmosis wiki
page, but it's fixed now.
PPS. The crossover point at which BitSet becomes more efficient is for
boxes containing roughly 1/32 of the planet data, but deleted nodes skew
that figure by making the BitSet consume memory for non-existent nodes.
The real number will be somewhere between 1/32 and 1/16 of the planet.
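
The crossover arithmetic can be checked with a few lines of Python. This
is only a back-of-envelope sketch: the function name crossover_fraction
and the live_fraction parameter are mine, and the value 0.5 (half of all
allocated ids deleted) is an assumption chosen because it reproduces the
1/16 upper bound, not a measured figure.

```python
ID_LIST_BITS_PER_ID = 32  # IdList: 32 bits for each id inside the box
BITSET_BITS_PER_ID = 1    # BitSet: 1 bit for every allocated id, deleted or not


def crossover_fraction(live_fraction):
    """Share of the live planet data at which both trackers use equal memory.

    live_fraction is the share of allocated ids that still exist
    (1.0 means no ids have been deleted). Hypothetical parameter.

    With M allocated ids and N = live_fraction * M live ids, a box holding
    a fraction f of the live ids costs 32 * f * N bits as an IdList and
    M bits as a BitSet, so they are equal at f = M / (32 * N).
    """
    return BITSET_BITS_PER_ID / (ID_LIST_BITS_PER_ID * live_fraction)


print(crossover_fraction(1.0))  # no deleted ids
print(crossover_fraction(0.5))  # assumed: half of all ids deleted
```

With no deletions the result is 1/32 (0.03125), and with the assumed 50%
deletion rate it is 1/16 (0.0625), matching the range given above.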