[OSM-dev] New OSM binary fileformat implementation.

Thu May 6 17:13:50 BST 2010

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sorry for the delay in responding; crazy life, and I've been fixing
existing bugs in my project rather than thinking about breaking new ground.

On 05/02/2010 12:35 AM, Scott Crosby wrote:
> 
> By pruning out metadata, judiciously filtering uninteresting tags,
> and increasing the granularity to 10 microdegrees (about 1 m resolution),
> I've fit the whole planet in 3.7 GB.
> 
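For a rough sense of what that granularity means: if each coordinate is stored as an integer count of 10-microdegree (1e-5 degree) steps, one step is about 1e-5 × 111,320 m ≈ 1.1 m of latitude, and the worst-case rounding error is half that. Along a parallel the step shrinks by cos(latitude). The sketch below is a minimal illustration under that assumption, not the encoder from Scott's implementation; `FixedPointCoord`, `encode`, `decode` and `maxErrorMetres` are hypothetical names, and 111,320 m per degree is an approximate figure.

```java
/**
 * Hypothetical fixed-point coordinate codec at 10-microdegree granularity.
 * +/-180 degrees is +/-18,000,000 steps, which fits easily in an int.
 */
final class FixedPointCoord {
    /** One integer step: 10 microdegrees = 1e-5 degree. */
    static final double DEGREES_PER_STEP = 1e-5;
    /** Approximate length of one degree of latitude, in metres (assumed constant). */
    static final double METRES_PER_DEGREE = 111_320.0;

    private FixedPointCoord() {}

    /** Rounds a latitude or longitude in degrees to the nearest step. */
    static int encode(double degrees) {
        return (int) Math.round(degrees / DEGREES_PER_STEP);
    }

    /** Converts a stored step count back to degrees. */
    static double decode(int steps) {
        return steps * DEGREES_PER_STEP;
    }

    /** Worst-case rounding error along a meridian (half a step), in metres. */
    static double maxErrorMetres() {
        return 0.5 * DEGREES_PER_STEP * METRES_PER_DEGREE;
    }
}
```

With these numbers the worst-case error comes out at roughly 0.56 m, consistent with the "about 1m resolution" quoted above.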
Sweet. I hope this format works for my use case.

> I have no code for pulling entities out by ID, but that would be
> straightforward to add, if there was a demand for it.
> 
I would definitely need that. I'm coding against the Traveling Salesman API's
IDataSet interface, which includes retrieval by ID.
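Since Scott mentioned that ID lookups would be straightforward to add, here is one plausible shape for them, assuming the file stores entities in ascending ID order, grouped into blocks (an assumption about the layout, not something stated in this thread). Only the first ID of each block is kept in RAM; a binary search over those picks the single block that could contain the ID, so at most one block has to be read and decompressed. `BlockIdIndex`, `blockFor` and `contains` are hypothetical names, and the `long[][]` arrays stand in for decoded blocks.

```java
import java.util.Arrays;

/**
 * Hypothetical ID index: entities are assumed to be sorted by ID and split into
 * blocks, and only the first ID of each block is kept in memory.
 */
final class BlockIdIndex {
    private final long[] firstIds; // firstIds[b] = smallest ID stored in block b
    private final long[][] blocks; // stand-in for decoded blocks: sorted IDs per block

    BlockIdIndex(long[][] blocks) {
        this.blocks = blocks;
        this.firstIds = new long[blocks.length];
        for (int b = 0; b < blocks.length; b++) {
            firstIds[b] = blocks[b][0]; // assumes every block is non-empty and sorted
        }
    }

    /** Returns the only block that could hold {@code id}, or -1 if it precedes all blocks. */
    int blockFor(long id) {
        int pos = Arrays.binarySearch(firstIds, id);
        // On a miss binarySearch returns -(insertionPoint) - 1; the candidate
        // block is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** True if the entity exists; a real reader would decode block b here. */
    boolean contains(long id) {
        int b = blockFor(id);
        if (b < 0) {
            return false;
        }
        return Arrays.binarySearch(blocks[b], id) >= 0;
    }
}
```

The in-memory cost is one `long` per block, so even a planet-sized file would need only a small index.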

> have to pay a disk seek whether it is in my format or not. My format, being
> very dense, might let RAM hold the working set and avoid the disk seek. 1 ms
> to decompress is already far faster than a hard drive, though not an SSD.
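The working-set argument can be illustrated with a small least-recently-used cache of decompressed blocks: repeated lookups in the same area are served from RAM, and only a miss pays the disk read plus the quoted ~1 ms decompression. This is a generic sketch built on `java.util.LinkedHashMap` in access order, not code from either project; `BlockCache` is a hypothetical name, and its loader function stands in for whatever reads and decompresses a block.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

/**
 * Hypothetical LRU cache of decompressed blocks keyed by block number.
 * A miss calls the loader (disk read + decompress); a hit is served from RAM.
 */
final class BlockCache<B> {
    private final IntFunction<B> loader; // assumed to return a non-null block
    private final Map<Integer, B> cache;
    private int loads = 0;

    BlockCache(int capacity, IntFunction<B> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently
        // accessed, so the eldest entry is the one to evict.
        this.cache = new LinkedHashMap<Integer, B>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, B> eldest) {
                return size() > capacity;
            }
        };
    }

    B get(int blockNo) {
        B block = cache.get(blockNo); // a hit also marks the block as recently used
        if (block == null) {
            block = loader.apply(blockNo);
            loads++;
            cache.put(blockNo, block);
        }
        return block;
    }

    /** Number of times the loader (the expensive path) was called. */
    int loadCount() {
        return loads;
    }
}
```

Sizing `capacity` to the working set is what decides whether the disk seek is avoided; a denser format simply lets more blocks fit in the same RAM.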

Keeping everything in RAM is probably workable. At the very least, going
global with a format like this looks like a matter of starting with a
mid-level VPS that keeps everything on disk and eventually upgrading to a
high-RAM, low-disk EC2 or GoGrid instance. Without such a format, I'm
looking at half a TB of storage and possibly a significant chunk of RAM,
and even then I don't think my current data-set implementation could
handle that.

In other words, I like the option of keeping everything in RAM far
better than what I'm doing right now. :)

> 
> Could you tell me more about the kinds of lookups your application will do?
> 

Sure. You can see the interface I've implemented here:

http://travelingsales.svn.sourceforge.net/viewvc/travelingsales/trunk/libosm/src/org/openstreetmap/osm/data/IDataSet.java?view=markup

In short, there are four broad kinds of lookups:

Entity by ID, as mentioned earlier.

Entities intersecting a bounding box. This is currently done by the
somewhat inaccurate method of finding all nodes inside the box, then
returning any ways/relations that reference them. It would be great if I
could also locate ways that cross the box without having a node inside
it, but even if not, it'd be no worse than what's there now. :)

Entities by presence of certain tags, in some cases combined with a
bounding box condition (e.g. all amenity=fuel nodes, or all such nodes
within given bounds).

Nearest entity to a given point, expanding outward. For instance, I can
roughly find the nearest way by finding the node nearest to a pair of
coordinates, checking whether it belongs to any ways, then finding the
next nearest node and continuing outward until the conditions are met.
The conditions check is done externally, so the search need only return
the nearest entity, then the next nearest, and so on.
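On the bounding-box wish: a way can cross the box with no node inside it, so a node-only test misses it. One common fix, sketched here assuming a way's node coordinates are available as planar (lon, lat) pairs, is to test each segment of the way against the box with Liang-Barsky segment clipping. `WayBoxTest` and its methods are hypothetical names; in practice you would first prefilter candidate ways by their bounding envelopes and run this exact test only on those.

```java
/**
 * Hypothetical helper: does a way (a polyline of planar lon/lat points) touch or
 * cross an axis-aligned box? Liang-Barsky clipping also catches ways that cross
 * the box without having any node inside it.
 */
final class WayBoxTest {
    private WayBoxTest() {}

    /** True if the segment (x0,y0)-(x1,y1) intersects the closed box. */
    static boolean segmentIntersectsBox(double x0, double y0, double x1, double y1,
                                        double minX, double minY, double maxX, double maxY) {
        double dx = x1 - x0, dy = y1 - y0;
        // A point x0 + t*dx, y0 + t*dy is inside iff p[i] * t <= q[i] for all four
        // edges (left, right, bottom, top), with t restricted to [0, 1].
        double[] p = {-dx, dx, -dy, dy};
        double[] q = {x0 - minX, maxX - x0, y0 - minY, maxY - y0};
        double tEnter = 0.0, tLeave = 1.0;
        for (int i = 0; i < 4; i++) {
            if (p[i] == 0.0) {
                if (q[i] < 0.0) return false;  // parallel to this edge and outside it
            } else {
                double t = q[i] / p[i];
                if (p[i] < 0.0) {              // entering across this edge
                    if (t > tLeave) return false;
                    tEnter = Math.max(tEnter, t);
                } else {                       // leaving across this edge
                    if (t < tEnter) return false;
                    tLeave = Math.min(tLeave, t);
                }
            }
        }
        return true;
    }

    /** True if any segment of the way (or its only node) intersects the box. */
    static boolean wayIntersectsBox(double[][] coords,
                                    double minX, double minY, double maxX, double maxY) {
        if (coords.length == 1) {              // degenerate way: a single point
            return segmentIntersectsBox(coords[0][0], coords[0][1], coords[0][0], coords[0][1],
                    minX, minY, maxX, maxY);
        }
        for (int i = 0; i + 1 < coords.length; i++) {
            if (segmentIntersectsBox(coords[i][0], coords[i][1], coords[i + 1][0], coords[i + 1][1],
                    minX, minY, maxX, maxY)) {
                return true;
            }
        }
        return false;
    }
}
```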
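For tag lookups, optionally limited to a box, an inverted index from tag to nodes answers "all amenity=fuel nodes" directly and lets the box check run only over that short list. This is a sketch assuming the nodes fit in memory; `TagIndex` and its methods are hypothetical names, and a file-backed version would keep the posting lists on disk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical inverted index: key -> value -> nodes carrying that tag,
 * with an optional bounding-box filter over the matching nodes.
 */
final class TagIndex {
    static final class Node {
        final long id;
        final double lon, lat;
        final Map<String, String> tags;

        Node(long id, double lon, double lat, Map<String, String> tags) {
            this.id = id; this.lon = lon; this.lat = lat; this.tags = tags;
        }
    }

    private final Map<String, Map<String, List<Node>>> postings = new HashMap<>();

    /** Adds the node to the posting list of every tag it carries. */
    void add(Node n) {
        for (Map.Entry<String, String> t : n.tags.entrySet()) {
            postings.computeIfAbsent(t.getKey(), k -> new HashMap<>())
                    .computeIfAbsent(t.getValue(), v -> new ArrayList<>())
                    .add(n);
        }
    }

    /** All nodes tagged key=value (read-only view). */
    List<Node> withTag(String key, String value) {
        return Collections.unmodifiableList(
                postings.getOrDefault(key, Collections.emptyMap())
                        .getOrDefault(value, Collections.emptyList()));
    }

    /** Nodes tagged key=value whose position lies inside the closed box. */
    List<Node> withTagInBox(String key, String value,
                            double minLon, double minLat, double maxLon, double maxLat) {
        List<Node> out = new ArrayList<>();
        for (Node n : withTag(key, value)) {
            if (n.lon >= minLon && n.lon <= maxLon && n.lat >= minLat && n.lat <= maxLat) {
                out.add(n);
            }
        }
        return out;
    }
}
```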
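For the nearest-entity case, one way to get "nearest, then next nearest, and so on" without sorting every node is a uniform grid searched ring by ring outward, with a priority queue holding the candidates seen so far. A candidate is returned only once its distance is within `ring * cellSize`, because no unvisited cell can hold anything closer, so the external conditions check can simply stop iterating when it is satisfied. This is a sketch under stated assumptions: `GridNearest` is a hypothetical name, coordinates are treated as planar (x = lon, y = lat), which is only a rough approximation outside small areas, and a real index would load cells from the file rather than from an in-memory list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical nearest-first search over a uniform grid of square cells. */
final class GridNearest {
    static final class Point {
        final long id;
        final double x, y;
        Point(long id, double x, double y) { this.id = id; this.x = x; this.y = y; }
    }

    private final double cell;
    private final int total;
    private final Map<Long, List<Point>> cells = new HashMap<>();
    private int minCx = Integer.MAX_VALUE, maxCx = Integer.MIN_VALUE;
    private int minCy = Integer.MAX_VALUE, maxCy = Integer.MIN_VALUE;

    GridNearest(double cellSize, List<Point> points) {
        this.cell = cellSize;
        this.total = points.size();
        for (Point p : points) {
            int cx = cellOf(p.x), cy = cellOf(p.y);
            cells.computeIfAbsent(key(cx, cy), k -> new ArrayList<>()).add(p);
            minCx = Math.min(minCx, cx); maxCx = Math.max(maxCx, cx);
            minCy = Math.min(minCy, cy); maxCy = Math.max(maxCy, cy);
        }
    }

    private int cellOf(double v) { return (int) Math.floor(v / cell); }

    private static long key(int cx, int cy) { return ((long) cx << 32) | (cy & 0xffffffffL); }

    /** Lazily yields every point in order of increasing distance from (qx, qy). */
    Iterator<Point> nearestFirst(double qx, double qy) {
        final int qcx = cellOf(qx), qcy = cellOf(qy);
        // No occupied cell lies beyond this Chebyshev ring around the query cell.
        final int maxRing = total == 0 ? -1 : Math.max(
                Math.max(Math.abs(minCx - qcx), Math.abs(maxCx - qcx)),
                Math.max(Math.abs(minCy - qcy), Math.abs(maxCy - qcy)));
        final PriorityQueue<Point> heap = new PriorityQueue<>((a, b) -> Double.compare(
                Math.hypot(a.x - qx, a.y - qy), Math.hypot(b.x - qx, b.y - qy)));
        return new Iterator<Point>() {
            private int ring = -1;   // last ring whose cells were pushed onto the heap
            private int emitted = 0;

            @Override public boolean hasNext() { return emitted < total; }

            @Override public Point next() {
                if (!hasNext()) throw new NoSuchElementException();
                // Points in unvisited cells are at least ring * cell away, so the heap
                // top is safe to emit only once its distance is within that bound.
                while (ring < maxRing && (heap.isEmpty()
                        || Math.hypot(heap.peek().x - qx, heap.peek().y - qy) > ring * cell)) {
                    ring++;
                    for (int dx = -ring; dx <= ring; dx++) {
                        for (int dy = -ring; dy <= ring; dy++) {
                            if (Math.max(Math.abs(dx), Math.abs(dy)) != ring) continue; // border only
                            List<Point> bucket = cells.get(key(qcx + dx, qcy + dy));
                            if (bucket != null) heap.addAll(bucket);
                        }
                    }
                }
                emitted++;
                return heap.poll();
            }
        };
    }
}
```

A caller looking for the nearest way could iterate `nearestFirst(lon, lat)` and stop at the first node that belongs to a way; nodes far from the query point are never pushed onto the heap.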

I know you've said elsewhere that you don't want this format to replace a
database, and I respect that; I just don't quite know where that line is.
Even so, I clearly don't need all of my database's functionality for the
OSM-facing aspects of this app, and I hope these limited uses are in
scope.

Thanks for thinking about and working on these issues. :)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvi6r4ACgkQIaMjFWMehWJvigCfV6d+2UY/5Mm1HCHquTMOG5Ru
h50An0DeN8y+ADCBsVLw1V4w0xt+nql1
=wJIc
-----END PGP SIGNATURE-----