[OSM-dev] Block sizes in PBF Format

Jochen Topf jochen at remote.org
Tue Nov 30 20:09:18 GMT 2010


On Tue, Nov 30, 2010 at 10:30:46AM -0600, Scott Crosby wrote:
> On Tue, Nov 30, 2010 at 3:28 AM, Jochen Topf <jochen at remote.org> wrote:
> > The PBF_Format wiki page states: "The length of a Blob *should* be less than 16
> > megabytes and *must* be less than 32 megabytes." But further down it says "I
> > collect 8k entities to form a PrimitiveBlock, which is serialized into the
> > Blob..."
> >
> > So what happens if the 8k entities take up more than 32 megabytes? That's 4k
> > per entity, which could be reached with large relations. Admittedly, a block
> > would need quite a few of those large relations, but it's good to know where
> > the limits of the format are, and they should be clearly documented.
> 
> I chose those limits so that software could reject bad files without
> crashing due to running out of RAM. However, if the limits cause
> problems with storing those big relations, that is a limitation in my
> osmosis implementation, not in the design of the format.
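A reader can enforce these limits before it allocates anything, which is what lets it reject a bad file instead of running out of RAM. The sketch below is a hypothetical helper (check_blob_size is not part of osmosis or any library); it assumes the caller already has the Blob's declared uncompressed size and treats "megabyte" as 2**20 bytes.

```python
# Limits as quoted from the PBF_Format wiki page; "megabyte" is assumed
# here to mean 2**20 bytes.
SOFT_LIMIT = 16 * 1024 * 1024   # a Blob *should* be smaller than this
HARD_LIMIT = 32 * 1024 * 1024   # a Blob *must* be smaller than this


def check_blob_size(declared_size: int) -> str:
    """Classify a Blob's declared (uncompressed) size before allocating memory for it.

    Hypothetical helper: returns "ok" or "warn", raises ValueError for sizes
    that make the file invalid.
    """
    if declared_size < 0:
        raise ValueError("negative blob size: file is corrupt")
    if declared_size >= HARD_LIMIT:
        # Reject up front instead of trying to allocate an arbitrarily large buffer.
        raise ValueError(f"blob of {declared_size} bytes exceeds the 32 MB hard limit")
    if declared_size >= SOFT_LIMIT:
        return "warn"  # legal, but larger than writers are asked to produce
    return "ok"
```

Because the check only looks at a number, a corrupt length field costs nothing; the expensive allocation happens only after it passes.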

And it allows us to work with a fixed-size buffer, which also makes things
easier. So I am all for it.
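With a hard upper bound, a reader can allocate one buffer up front and reuse it for every blob. A minimal sketch, assuming the payload length is already known (for example from the header that precedes each blob); FixedBufferReader is a hypothetical name, not an osmosis class.

```python
import io
from typing import BinaryIO

# Blobs must be smaller than 32 MB (assumed 2**20-byte megabytes),
# so a buffer of this size can hold any valid blob.
HARD_LIMIT = 32 * 1024 * 1024


class FixedBufferReader:
    """Reads blob payloads into one preallocated buffer instead of allocating per blob."""

    def __init__(self, stream: BinaryIO, capacity: int = HARD_LIMIT):
        self._stream = stream
        self._buf = bytearray(capacity)   # allocated once, reused for every payload
        self._view = memoryview(self._buf)

    def read_payload(self, length: int) -> memoryview:
        """Return a view of the next `length` bytes; it is overwritten by the next call."""
        if not 0 <= length <= len(self._buf):
            raise ValueError(f"payload length {length} does not fit the fixed buffer")
        filled = 0
        while filled < length:
            # readinto may return fewer bytes than requested, so loop until full.
            n = self._stream.readinto(self._view[filled:length])
            if not n:
                raise EOFError("stream ended in the middle of a payload")
            filled += n
        return self._view[:length]
```

For example, `FixedBufferReader(open("planet.osm.pbf", "rb"))` would allocate 32 MB once; callers must copy any payload they want to keep, since the next read reuses the same memory.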

I am not seeing a problem with big relations at the moment; we just have to
keep that in mind. With the 32 MB block limit we can't have a single relation
larger than 32 MB, but that would probably break a lot of other software, too,
so I am fine with that. :-)

> For simplicity in my implementation, I had the osmosis serializer use
> the same number of entities in each block, and made that a command
> line option (for testing purposes). Nothing in the format requires a
> fixed number of entities in each block, and a better implementation
> could operate with a variable number of entities in a block, starting
> a new block whenever it estimates that the current one is 'too big'.
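A writer that varies the number of entities per block could look roughly like this. It is a sketch under two assumptions: entities arrive already encoded as bytes, and summing their lengths is a good enough size estimate. The real serialized size of a PrimitiveBlock also depends on its shared string table and delta coding, so a real writer would want some headroom. The function name and defaults are hypothetical.

```python
from typing import Iterable, Iterator, List

TARGET_BLOCK_SIZE = 16 * 1024 * 1024  # aim below the 16 MB "should" limit
MAX_ENTITIES = 8000                   # osmosis-style cap; the format does not require one


def variable_blocks(encoded_entities: Iterable[bytes],
                    target_size: int = TARGET_BLOCK_SIZE,
                    max_entities: int = MAX_ENTITIES) -> Iterator[List[bytes]]:
    """Group encoded entities into blocks, starting a new block before one gets 'too big'."""
    block: List[bytes] = []
    block_size = 0
    for entity in encoded_entities:
        if len(entity) > target_size:
            # This sketch refuses entities that cannot fit in any block on their own.
            raise ValueError("a single entity is larger than the target block size")
        # Close the current block if adding this entity would overshoot either limit.
        if block and (block_size + len(entity) > target_size or len(block) >= max_entities):
            yield block
            block, block_size = [], 0
        block.append(entity)
        block_size += len(entity)
    if block:
        yield block
```

Blocks of small nodes still fill up to the entity cap, while a run of large relations simply produces blocks with fewer entities.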
> 
> A short-term workaround might be to store only 2k relations in a block.
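The workaround keeps fixed-size blocks but picks a smaller count for relations. A minimal sketch, assuming entities arrive grouped by type (nodes, then ways, then relations, as in a sorted file); the type names and the BATCH_SIZE table are hypothetical, not osmosis options.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# 8k entities per block as in osmosis, but only 2k for relations,
# which can be much larger individually. Type names are hypothetical.
BATCH_SIZE = {"node": 8000, "way": 8000, "relation": 2000}


def fixed_batches(entities: Iterable[T], entity_type: str) -> Iterator[List[T]]:
    """Split a stream of same-typed entities into fixed-size blocks."""
    size = BATCH_SIZE[entity_type]
    it = iter(entities)
    # islice takes at most `size` items; an empty batch means the stream is done.
    while batch := list(islice(it, size)):
        yield batch
```

Quartering the relation count quarters the worst case for a given average relation size, at the cost of somewhat more per-block overhead.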
> 
> Thanks for the question, I have changed the wiki to note that the 8k
> entities in a block is an implementation decision, and at the same
> time to note that the size limits on a blob are uncompressed sizes.
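Since the limits apply to uncompressed sizes, checking the compressed length alone is not enough; a small compressed blob can inflate far past 32 MB. A sketch, assuming the blob carries zlib data plus a declared uncompressed size (as I understand it, the zlib_data and raw_size fields of the Blob message); inflate_blob itself is hypothetical.

```python
import zlib

HARD_LIMIT = 32 * 1024 * 1024  # applies to the *uncompressed* size (2**20-byte MB assumed)


def inflate_blob(zlib_data: bytes, raw_size: int, limit: int = HARD_LIMIT) -> bytes:
    """Decompress a zlib-compressed blob, never producing more than raw_size + 1 bytes."""
    if not 0 <= raw_size < limit:
        raise ValueError(f"declared uncompressed size {raw_size} violates the limit")
    d = zlib.decompressobj()
    # max_length caps the output (and is never 0, which would mean "unlimited"),
    # so a lying raw_size cannot make us inflate without bound.
    out = d.decompress(zlib_data, raw_size + 1)
    if len(out) != raw_size or not d.eof:
        raise ValueError("blob does not decompress to its declared size")
    return out
```

Asking for one byte more than declared is what detects a blob that would inflate past its raw_size, without ever holding more than that in memory.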

Great!

Jochen
-- 
Jochen Topf  jochen at remote.org  http://www.remote.org/jochen/  +49-721-388298



