<div dir="ltr"><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote"><div>As for fixed sized blocks in vex, I did consider that option but
couldn’t come up with a compelling reason for it. I can see the case for
a maximum block size (so we know what the maximum size of allocation
will be), but can you give a concrete example of how fixed-size blocks
would be advantageous in practice? I would be very hesitant to split any
entity across multiple blocks.</div><div class=""><div id=":ow" class="" tabindex="0"><img class="" src="https://ssl.gstatic.com/ui/v1/icons/mail/images/cleardot.gif"></div></div></blockquote><div><br></div><div>When you need to read in relations-ways-nodes order, blocks save you from an unnecessary pass through the whole file (yes, you can skip decompression for nodes/ways, but you still have to read the whole file).<br><br></div><div>Second example: finding something by id. If you have blocks, it's easy to map a whole block into memory and do a binary search in log N block reads instead of scanning through the file every time.<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">2016-02-07 23:59 GMT+05:00 Andrew Byrd <span dir="ltr"><<a href="mailto:andrew@fastmail.net" target="_blank">andrew@fastmail.net</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Hi Dmitry,<div><br></div><div>Yes, there are similarities and I did study the o5m format before I began work on vex. The last section of my original article compares the two and gives my impressions of o5m: <a href="http://conveyal.com/blog/2015/04/27/osm-formats#comparisons-with-o5m" target="_blank">http://conveyal.com/blog/2015/04/27/osm-formats#comparisons-with-o5m</a></div><div><br></div><div>In summary: o5m uses string tables with a fixed size and an LRU eviction policy. Producers and consumers must keep their string tables exactly in sync. Strings are then referenced by integers indicating how recently they were used (1 to 15000). This adds quite a bit of complexity to o5m implementations, especially considering that this eviction strategy can backfire on certain inputs leading to files that are actually bigger than a basic gzipped text representation of the same data. 
According to <a href="http://wiki.openstreetmap.org/wiki/Talk:O5m#Compression_Algorithms" target="_blank">http://wiki.openstreetmap.org/wiki/Talk:O5m#Compression_Algorithms</a> o5m uses string tables specifically to avoid relying on general purpose compression. I find this unnecessary considering that zlib compression is quite effective, resource efficient (with adjustable compression level), and available practically everywhere.</div><div><br></div><div>There are a few other unusual design decisions documented and discussed at <a href="http://wiki.openstreetmap.org/wiki/O5m" target="_blank">http://wiki.openstreetmap.org/wiki/O5m</a> and <a href="http://wiki.openstreetmap.org/wiki/Talk:O5m" target="_blank">http://wiki.openstreetmap.org/wiki/Talk:O5m</a>. For example, strings are both introduced and terminated by a null byte, and are often stored in pairs (i.e. three null bytes per string pair, one of which is at the beginning of the string).</div><div><br></div><div>Of course I recognize o5m's contribution to the dialog on binary formats, and we can of course learn from the o5m concept, but my conclusion is that it does not have the combination of extreme simplicity and compactness necessary to complement the existing formats.</div><div><br></div><div>As for fixed sized blocks in vex, I did consider that option but couldn’t come up with a compelling reason for it. I can see the case for a maximum block size (so we know what the maximum size of allocation will be), but can you give a concrete example of how fixed-size blocks would be advantageous in practice? 
I would be very hesitant to split any entity across multiple blocks.</div><span class="HOEnZb"><font color="#888888"><div><br></div><div>-Andrew</div></font></span><div><div class="h5"><div><br><div><blockquote type="cite"><div>On 07 Feb 2016, at 09:06, Дмитрий Киселев <<a href="mailto:dmitry.v.kiselev@gmail.com" target="_blank">dmitry.v.kiselev@gmail.com</a>> wrote:</div><br><div><div dir="ltr"><div><div>Looks pretty similar to o5m, except that key=value tags are not round-buffered.<br><br></div>As a further extension, it would be nice to be able to have blocks of fixed size. <br>Just write nodes one by one until the byte buffer is full.<br></div><div>For extremely big relations (those larger than one block) it's possible to mark two adjacent blocks as connected, but there should be few of them.<br></div><div><br></div>It would help when reading, writing, and seeking over files.<br></div><div class="gmail_extra"><br><div class="gmail_quote">2016-02-07 3:47 GMT+05:00 Stadin, Benjamin <span dir="ltr"><<a href="mailto:Benjamin.Stadin@heidelberg-mobil.com" target="_blank">Benjamin.Stadin@heidelberg-mobil.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="auto">
<div>Hi Andrew,</div>
<div><br>
</div>
<div>Cap'n Proto (successor of ProtoBuffer from the guy who invented ProtoBuffer) and FlatBuffers (another ProtoBuffer successor, by Google) have gained a lot of traction since last year. Both eliminate many of the shortcomings of the original ProtoBuffer (they allow
random access, streaming, ...), and also improve on performance.</div>
<div><br>
</div>
<div><a href="https://github.com/google/flatbuffers" target="_blank">https://github.com/google/flatbuffers</a></div>
<div><br>
</div>
<div>Here is a comparison between ProtoBuffer competitors:</div>
<div><a href="https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-sbe.html" target="_blank">https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-sbe.html</a></div>
<div><br>
</div>
<div>In my opinion FlatBuffers is the most interesting. It seems to have very good language and platform support, and has quite a high adoption rate already. </div>
<div><br>
</div>
<div>I think it's well worth reconsidering whether to create your own file format and parser, for several reasons. Your concept looks well thought out; it should be possible to implement a lightweight parser using FlatBuffers for your data schema. </div>
<div><br>
</div>
<div>Regards</div>
<div>Ben <br>
<br>
<div>Sent from my iPad</div>
</div><div><div>
<div><br>
On 06.02.2016 at 22:37, Andrew Byrd <<a href="mailto:andrew@fastmail.net" target="_blank">andrew@fastmail.net</a>> wrote:<br>
<br>
</div>
<blockquote type="cite">
<div>
<div>Hello OSM developers,</div>
<div><br>
</div>
<div>Last spring I posted an article discussing some shortcomings of the PBF format and proposing a simpler binary OSM interchange format called VEX. There was a generally positive response at the time, including helpful feedback from other developers.
Since then I have revised the VEX specification as well as our implementation, and Conveyal has been using this format in our own day-to-day work.</div>
<div><br>
</div>
<div>I have written a new article describing the revised format:<br>
</div>
<a href="http://conveyal.com/blog/2016/02/06/vex-format-part-two" target="_blank">http://conveyal.com/blog/2016/02/06/vex-format-part-two</a>
<div><br>
</div>
<div>
<div>The main differences are 1) it is more regular and even simpler to parse; and 2) file blocks are compressed individually, allowing parallel processing and seeking to specific entity types. It is no longer smaller than PBF, but still comparable
in size.</div>
<br>
</div>
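<div>To illustrate point 2, here is a minimal sketch of individually compressed blocks. The header layout used here (a 1-byte entity type followed by a 4-byte big-endian compressed length) and the names <code>write_block</code> and <code>read_blocks</code> are assumptions made for illustration only; they are not taken from the VEX specification. The sketch shows how a reader interested in one entity type could seek past all other blocks without decompressing them.</div>

```python
import io
import struct
import zlib

# Hypothetical block header: 1-byte entity type + 4-byte big-endian payload length.
# This layout is an assumption for illustration, not the real VEX header.
HEADER = struct.Struct(">BI")
NODE, WAY, RELATION = 0, 1, 2


def write_block(out, entity_type, payload):
    """Compress one block on its own and prefix it with the header."""
    compressed = zlib.compress(payload)
    out.write(HEADER.pack(entity_type, len(compressed)))
    out.write(compressed)


def read_blocks(src, wanted_type):
    """Yield decompressed payloads of one entity type, seeking past the rest."""
    while True:
        header = src.read(HEADER.size)
        if len(header) != HEADER.size:
            return  # end of file (or truncated header)
        entity_type, length = HEADER.unpack(header)
        if entity_type == wanted_type:
            yield zlib.decompress(src.read(length))
        else:
            src.seek(length, io.SEEK_CUR)  # skip without decompressing


# Usage: write three blocks into memory, then read back only the way block.
buf = io.BytesIO()
write_block(buf, NODE, b"node data")
write_block(buf, WAY, b"way data")
write_block(buf, RELATION, b"relation data")
buf.seek(0)
ways = list(read_blocks(buf, WAY))
```

<div>Because each block in this sketch is a self-contained zlib stream, the blocks could in principle also be handed to separate workers and decompressed in parallel.</div>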
<div>Again, I would welcome any comments you may have on the revised format and the potential for a shift to simpler binary OSM formats.</div>
<div><br>
</div>
<div>Regards,</div>
<div>Andrew Byrd</div>
<div><br>
</div>
<div><br>
<div>
<blockquote type="cite">
<div>On 29 Apr 2015, at 01:35, andrew byrd <<a href="mailto:andrew@fastmail.net" target="_blank">andrew@fastmail.net</a>> wrote:</div>
<br>
<div>
<div>
<div>Hello OSM developers,<br>
</div>
<div> </div>
<div>Over the last few years I have worked on several pieces of software that consume and produce the PBF format. I have always appreciated the advantages of PBF over XML for our use cases, but over time it became apparent to me that PBF is significantly
more complex than would be necessary to meet its objectives of speed and compactness.<br>
</div>
<div> </div>
<div>Based on my observations about the effectiveness of various techniques used in PBF and other formats, I devised an alternative OSM representation that is consistently about 8% smaller than PBF but substantially simpler to encode and decode. This
work is presented in an article at <a href="http://conveyal.com/blog/2015/04/27/osm-formats/" target="_blank">
http://conveyal.com/blog/2015/04/27/osm-formats/</a>. I welcome any comments you may have on this article or on the potential for a shift to simpler binary OSM formats.<br>
</div>
<div> </div>
<div>Regards,<br>
</div>
<div>Andrew Byrd<br>
</div>
</div>
_______________________________________________<br>
dev mailing list<br>
<a href="mailto:dev@openstreetmap.org" target="_blank">dev@openstreetmap.org</a><br>
<a href="https://lists.openstreetmap.org/listinfo/dev" target="_blank">https://lists.openstreetmap.org/listinfo/dev</a><br>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
</div></div></div>
<br></blockquote></div><br><br clear="all"><br>-- <br><div><div dir="ltr">Thank you for your time. Best regards.<br>Dmitry.</div></div>
</div>
</div></blockquote></div><br></div></div></div></div></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature"><div dir="ltr">Thank you for your time. Best regards.<br>Dmitry.</div></div>
</div>