[OSM-dev] New OSM binary fileformat implementation.

Nolan Darilek nolan at thewordnerd.info
Sat May 1 16:25:01 BST 2010


On 05/01/2010 09:47 AM, Scott Crosby wrote:
> I agree wholeheartedly with letting it evolve, and I'm interested in
> hearing your (and others') thoughts on what additional features to
> include or exclude.
> 

Some of these questions may be a bit premature, but I don't know how far
along your design is, and asking them now might influence it in ways that
work for me.

I'm developing an accessible map-browsing and GPS navigation app. You can
read my initial blog post on the project here:

http://thewordnerd.info/2010/03/introducing-hermes/

At the moment, it uses LibOSM from travelingsalesman and an
as-yet-unreleased dataset that relies on MongoDB for the geospatial
queries. I don't understand enough higher-level math to roll my own
geospatial code, especially since I can't visually verify the results, so
it's easier to use LibOSM and build a dataset I can run on a production
site than it would be to reinvent the wheel.
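
For context, the sort of query I run today looks roughly like this. It's
a minimal sketch against the 2010-era MongoDB Java driver; the "osm"
database, "nodes" collection, and "loc" field are just my schema, not
anything standard:

    import java.util.Arrays;

    import com.mongodb.BasicDBList;
    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.DBCursor;
    import com.mongodb.Mongo;

    public class BBoxQuery {
        public static void main(String[] args) throws Exception {
            // Assumes a 2d index on nodes.loc, stored as [lon, lat].
            DBCollection nodes =
                new Mongo("localhost").getDB("osm").getCollection("nodes");

            // Bounding box around Austin, TX: lower-left, upper-right.
            BasicDBList box = new BasicDBList();
            box.add(Arrays.asList(-97.80, 30.20));
            box.add(Arrays.asList(-97.60, 30.35));

            BasicDBObject query = new BasicDBObject("loc",
                new BasicDBObject("$within", new BasicDBObject("$box", box)));

            // Every document inside the box; this is the whole workload.
            DBCursor cursor = nodes.find(query);
            while (cursor.hasNext()) {
                System.out.println(cursor.next());
            }
        }
    }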

Unfortunately, this approach introduces a variety of complications. The
database for TX alone is 10 GB, and ballpark estimates suggest I'd need
half a terabyte or more to store the entire planet. I'd also need
substantial RAM to hold the working set for the DB index. All of this
means that, to launch the project on a global scale, I'd need far more
funding than I, as an individual, am likely to find.

I'm really excited by your compression numbers, because at first glance
they would seem to take this project from something requiring substantial
EC2 infrastructure to something I can run on a mid-level VPS, slashing
costs from $1000+/month to $50/month or so. So, my questions:

Is there a performance or size penalty to ordering the data
geographically rather than by ID? I understand this won't be the default,
but I'm wondering whether there would be any major performance issues in
situations where you mostly want bounding-box access rather than pulling
entities out by ID.
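
To make that concrete, what I have in mind is something like sorting
entities on an interleaved-bit (Morton/Z-order) key, so nearby nodes land
in nearby blocks. This is purely illustrative, not a claim about your
format:

    public final class MortonKey {
        // Quantize a coordinate in [min, max] to 16 bits.
        static long quantize(double coord, double min, double max) {
            return (long) ((coord - min) / (max - min) * 65535.0) & 0xFFFFL;
        }

        // Spread 16 bits out so a zero bit sits between each pair.
        static long spread(long x) {
            x = (x | (x << 8)) & 0x00FF00FFL;
            x = (x | (x << 4)) & 0x0F0F0F0FL;
            x = (x | (x << 2)) & 0x33333333L;
            x = (x | (x << 1)) & 0x55555555L;
            return x;
        }

        // 32-bit Z-order key: sorting nodes on this groups them spatially,
        // so a bounding box touches a fairly small set of blocks.
        static long key(double lon, double lat) {
            return (spread(quantize(lon, -180.0, 180.0)) << 1)
                 | spread(quantize(lat, -90.0, 90.0));
        }
    }

The worry behind the question is whether sorting on a key like this,
rather than by ID, would hurt the delta-coding that presumably drives
your compression numbers.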

Also, is there any reason this format wouldn't be suitable for a site
with many active users performing geographic, read-only queries of the
data? Again, I'd guess not, since the data isn't compressed as one
monolithic stream, but seeking several gigabytes into a file to locate
nearby entities might be a factor; or it might work fine for single-user
access but not so well with distinct seeks for different users in widely
separated locations.
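
What I'm picturing for the multi-user case is each request doing an
independent positioned read against a shared planet file, driven by an
in-memory tile-to-offset index. The index and block layout here are my
invention, not part of your format:

    import java.io.RandomAccessFile;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class BlockReader {
        static class Extent {
            final long offset; final int length;
            Extent(long offset, int length) {
                this.offset = offset; this.length = length;
            }
        }

        // Maps a tile key (e.g. the Z-order key above) to where that
        // tile's block lives in the planet file.
        private final Map<Long, Extent> index =
            new ConcurrentHashMap<Long, Extent>();
        private final String path;

        public BlockReader(String path) { this.path = path; }

        // Each caller opens its own handle, so seeks from users in widely
        // separated locations don't step on one another.
        public byte[] readBlock(long tileKey) throws Exception {
            Extent e = index.get(tileKey);
            if (e == null) return null;
            RandomAccessFile f = new RandomAccessFile(path, "r");
            try {
                f.seek(e.offset);
                byte[] buf = new byte[e.length];
                f.readFully(buf);
                return buf;
            } finally {
                f.close();
            }
        }
    }

If that pattern is sane, the per-user cost is one seek plus one short
sequential read, which even a modest VPS should handle.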

Anyhow, I realize these questions may be naive at such an early stage,
but the idea that I might be able to pull this off without infrastructure
beyond my budget is appealing. Is there any reason your binary format
couldn't accommodate this situation, or be optimized to do so?

Thanks.