[OSM-dev] Simpler binary OSM formats

Sun Feb 7 19:10:54 UTC 2016

>
> As for fixed sized blocks in vex, I did consider that option but couldn’t
> come up with a compelling reason for it. I can see the case for a maximum
> block size (so we know what the maximum size of allocation will be), but
> can you give a concrete example of how fixed-size blocks would be
> advantageous in practice? I would be very hesitant to split any entity
> across multiple blocks.
>

When you need relations-ways-nodes read order, blocks will save you  from
unnecessary read-through the whole file (yes, you can skip decompression
for nodes/ways but still you must read the whole file).

Second example: find something by id, if you have blocks it's easy to map
whole block into memory and do a binary search for logN block reads instead
of seeing through a file all the time.

2016-02-07 23:59 GMT+05:00 Andrew Byrd <andrew at fastmail.net>:

> Hi Dmitry,
>
> Yes, there are similarities and I did study the o5m format before I began
> work on vex. The last section of my original article compares the two and
> gives my impressions of o5m:
> http://conveyal.com/blog/2015/04/27/osm-formats#comparisons-with-o5m
>
> In summary: o5m uses string tables with a fixed size and an LRU eviction
> policy. Producers and consumers must keep their string tables exactly in
> sync. Strings are then referenced by integers indicating how recently they
> were used (1 to 15000). This adds quite a bit of complexity to o5m
> implementations, especially considering that this eviction strategy can
> backfire on certain inputs leading to files that are actually bigger than a
> basic gzipped text representation of the same data. According to
> http://wiki.openstreetmap.org/wiki/Talk:O5m#Compression_Algorithms o5m
> uses string tables specifically to avoid relying on general purpose
> compression. I find this unnecessary considering that zlib compression is
> quite effective, resource efficient (with adjustable compression level),
> and available practically everywhere.
>
> There are a few other unusual design decisions documented and discussed at
> http://wiki.openstreetmap.org/wiki/O5m and
> http://wiki.openstreetmap.org/wiki/Talk:O5m. For example, strings are
> both introduced and terminated by a null byte, and are often stored in
> pairs (i.e. three null bytes per string pair, one of which is at the
> beginning of the string).
>
> Of course I recognize o5m's contribution to the dialog on binary formats,
> and we can of course learn from the o5m concept, but my conclusion is that
> it does not have the combination of extreme simplicity and compactness
> necessary to complement the existing formats.
>
> As for fixed sized blocks in vex, I did consider that option but couldn’t
> come up with a compelling reason for it. I can see the case for a maximum
> block size (so we know what the maximum size of allocation will be), but
> can you give a concrete example of how fixed-size blocks would be
> advantageous in practice? I would be very hesitant to split any entity
> across multiple blocks.
>
> -Andrew
>
> On 07 Feb 2016, at 09:06, Дмитрий Киселев <dmitry.v.kiselev at gmail.com>
> wrote:
>
> Looks pretty similar to o5m, except tags key=value are not round-buffered.
>
> As a further extension, it would be nice to have the ability to have
> blocks of fixed size.
> Just write nodes one by one while you haven't full-fill byte buffer.
> For extremely big relations (which are larger than one block) it's
> possible to mark two adjacent blocks as connected, but there should be a
> few of them.
>
> It would help to read write and seek over files.
>
> 2016-02-07 3:47 GMT+05:00 Stadin, Benjamin <
> Benjamin.Stadin at heidelberg-mobil.com>:
>
>> Hi Andrew,
>>
>> Cap'n Proto (successor of ProtoBuffer from the guy who invented
>> ProtoBuffer) and FlatBuffers (another ProtoBuffer succesor, by Google) have
>> gained a lot of traction since last year. Both eliminate many if the
>> shortcomings of the original ProtoBuffer (allow for random access,
>> streaming,...), and improve on performance also.
>>
>> https://github.com/google/flatbuffers
>>
>> Here is a comparison between ProtoBuffer competitors:
>> https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-sbe.html
>>
>> In my opinion FlatBuffers is the most interesting. It seems to have very
>> good language and platform support, and has quite a high adoption rate
>> already.
>>
>> I think that it's well worth to reconsider creating an own file format
>> and parser for several reasons. Your concept looks well thought, it should
>> be possible to implement a lighweight parser using FlatBuffers for your
>> data scheme.
>>
>> Regards
>> Ben
>>
>> Von meinem iPad gesendet
>>
>> Am 06.02.2016 um 22:37 schrieb Andrew Byrd <andrew at fastmail.net>:
>>
>> Hello OSM developers,
>>
>> Last spring I posted an article discussing some shortcomings of the PBF
>> format and proposing a simpler binary OSM interchange format called VEX.
>> There was a generally positive response at the time, including helpful
>> feedback from other developers. Since then I have revised the VEX
>> specification as well as our implementation, and Conveyal has been using
>> this format in our own day-to-day work.
>>
>> I have written a new article describing of the revised format:
>> http://conveyal.com/blog/2016/02/06/vex-format-part-two
>>
>> The main differences are 1) it is more regular and even simpler to parse;
>> and 2) file blocks are compressed individually, allowing parallel
>> processing and seeking to specific entity types. It is no longer smaller
>> than PBF, but still comparable in size.
>>
>> Again, I would welcome any comments you may have on the revised format
>> and the potential for a shift to simpler binary OSM formats.
>>
>> Regards,
>> Andrew Byrd
>>
>>
>> On 29 Apr 2015, at 01:35, andrew byrd <andrew at fastmail.net> wrote:
>>
>> Hello OSM developers,
>>
>> Over the last few years I have worked on several pieces of software that
>> consume and produce the PBF format. I have always appreciated the
>> advantages of PBF over XML for our use cases, but over time it became
>> apparent to me that PBF is significantly more complex than would be
>> necessary to meet its objectives of speed and compactness.
>>
>> Based on my observations about the effectiveness of various techniques
>> used in PBF and other formats, I devised an alternative OSM representation
>> that is consistently about 8% smaller than PBF but substantially simpler to
>> encode and decode. This work is presented in an article at
>> http://conveyal.com/blog/2015/04/27/osm-formats/. I welcome any comments
>> you may have on this article or on the potential for a shift to simpler
>> binary OSM formats.
>>
>> Regards,
>> Andrew Byrd
>> _______________________________________________
>> dev mailing list
>> dev at openstreetmap.org
>> https://lists.openstreetmap.org/listinfo/dev
>>
>>
>> _______________________________________________
>> dev mailing list
>> dev at openstreetmap.org
>> https://lists.openstreetmap.org/listinfo/dev
>>
>>
>> _______________________________________________
>> dev mailing list
>> dev at openstreetmap.org
>> https://lists.openstreetmap.org/listinfo/dev
>>
>>
>
>
> --
> Thank you for your time. Best regards.
> Dmitry.
>
>
>

-- 
Thank you for your time. Best regards.
Dmitry.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/dev/attachments/20160208/4215ff88/attachment-0001.html>