[OSM-dev] Indexing of PBF files
William Temperley
willtemperley at gmail.com
Tue Oct 16 21:13:09 UTC 2018
No, Osmium can't do what I described. The reader thread / worker thread
model you describe does not read the data in parallel on multiple machines,
which is what I have been doing, albeit with a preprocessing step to
separate the blocks as they are not currently directly addressable, or even
separable, without a sequential read.
A delimiter would however solve this problem.
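
To make the sequential read concrete, here is a rough Python sketch of the scan that is needed today to find the blob boundaries. It assumes the framing documented for the PBF format: a 4-byte big-endian length, a BlobHeader message (type = field 1, indexdata = field 2, datasize = field 3), then datasize bytes of Blob. The function names are my own for illustration, not taken from Osmium or any other library, and error handling is minimal.

```python
import io
import struct


def read_varint(buf, pos):
    """Decode one protobuf varint from buf starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_blob_header(buf):
    """Return (type, datasize) from a serialized BlobHeader message."""
    pos = 0
    blob_type = None
    datasize = None
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
            if field == 3:
                datasize = value
        elif wire_type == 2:  # length-delimited (string / bytes)
            length, pos = read_varint(buf, pos)
            if field == 1:
                blob_type = buf[pos:pos + length].decode("ascii")
            pos += length  # also skips indexdata (field 2)
        else:
            raise ValueError("unexpected wire type %d" % wire_type)
    if datasize is None:
        raise ValueError("BlobHeader without datasize")
    return blob_type, datasize


def scan_blob_offsets(f):
    """Walk the file front to back; return (offset, type, total_size) per blob."""
    offsets = []
    while True:
        start = f.tell()
        prefix = f.read(4)
        if not prefix:
            break  # clean end of file
        if len(prefix) < 4:
            raise ValueError("truncated length prefix")
        (header_len,) = struct.unpack(">I", prefix)
        header = f.read(header_len)
        if len(header) < header_len:
            raise ValueError("truncated BlobHeader")
        blob_type, datasize = parse_blob_header(header)
        offsets.append((start, blob_type, 4 + header_len + datasize))
        f.seek(datasize, io.SEEK_CUR)  # skip the Blob payload without decoding it
    return offsets
```

Each offset depends on the datasize of the blob before it, so a worker handed an arbitrary byte range in the middle of the file has no way to tell where the next blob starts. The list this returns is essentially what my preprocessing step has to produce before the blocks can be distributed.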
On Tue, 16 Oct 2018 at 22:43, Jochen Topf <jochen at remote.org> wrote:
> On Tue, Oct 16, 2018 at 10:18:08PM +0200, William Temperley wrote:
> > Requiring the sequential read makes using the pbf format difficult in
> > data parallel processing.
> >
> > When files are split into equal sized chunks to be processed in parallel,
> > it is necessary to be able to seek to the beginning of the next block
> > (blob) to begin processing there.
> >
> > This is not currently possible with the pbf format, as the file _must_ be
> > read sequentially to figure out where the blob ends / new one begins.
> > With an index, or even just a simple delimiter it would be possible to figure
> > this out in a parallel processing scenario.
>
> Osmium can do this just fine. It has one thread reading the data
> sequentially, figuring out where the blocks start and end and parceling
> out the block decoding work to other threads. Not as simple and probably
> not quite as fast as with an index pointing to those blocks, but it does
> work.
>
> Indexes have the drawback that you can't streaming-write the data any
> more, you have to go back to write the index. Or you write them at the
> end, then you can't streaming read any more (at least when you want to
> use the index).
>
> Jochen
> --
> Jochen Topf jochen at remote.org https://www.jochentopf.com/
> +49-351-31778688
>