[OSM-dev] Indexing of PBF files
Florian Lohoff
f at zz.de
Mon Feb 14 23:16:22 UTC 2022
Hi,
On Mon, Feb 14, 2022 at 05:21:48PM +0800, Andrew Byrd wrote:
> Jochen, you stated that there are good reasons why it’s standard to
> sort PBF files by ID. For future reference, can you confirm the
> reasons?
It is important when applying changes. You can stream the change file
and the original file concurrently with very little memory and apply
the changes to the objects with matching IDs. The process is then bound
by streaming I/O and possibly CPU, but not by block storage latency.
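To illustrate the point, here is a minimal sketch of such a merge in
Python. It is not how any particular OSM tool does it; the tuple layout,
the name apply_changes and the convention that a payload of None means
"delete" are assumptions made up for this example. It also assumes a
single object type and at most one change per ID, both streams sorted
by ID.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical record layout: (osm_id, payload).
# In a change record, payload None stands for "delete this id".
Obj = Tuple[int, Optional[str]]


def apply_changes(base: Iterable[Obj], changes: Iterable[Obj]) -> Iterator[Obj]:
    """Merge two id-sorted streams, holding only one record of each in memory."""
    base_it, change_it = iter(base), iter(changes)
    b = next(base_it, None)
    c = next(change_it, None)
    while b is not None or c is not None:
        if c is None or (b is not None and b[0] < c[0]):
            # No change for this id: copy the original object through.
            yield b
            b = next(base_it, None)
        elif b is None or c[0] < b[0]:
            # Id exists only in the change stream: a newly created object.
            if c[1] is not None:
                yield c
            c = next(change_it, None)
        else:
            # Same id in both streams: the change replaces or deletes it.
            if c[1] is not None:
                yield c
            b = next(base_it, None)
            c = next(change_it, None)


base = [(1, "a"), (2, "b"), (4, "d")]
changes = [(2, "B"), (3, "c"), (4, None), (5, "e")]
merged = list(apply_changes(base, changes))
```

Neither input is ever seeked or held in memory as a whole, which is
only possible because both are sorted by ID.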
The ordering also matters for references between nodes, ways and
relations. Later objects in the file only refer back to earlier ones,
not the other way around (except for relations, which may reference
later/higher relation IDs).
So the format is optimized (and has been since the .xml days) for
efficient updating and reduced memory consumption while processing
the whole file.
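As a rough sketch of what the back-reference ordering buys you: a
single pass can decide which ways to keep, because every node a way
refers to has already been seen. The record layout (kind, id, refs),
the function name ways_touching and the keep_node predicate are
hypothetical and only serve this example. Relations are ignored here
because they may reference later relations.

```python
from typing import Callable, Iterable, Iterator, List, Set, Tuple

# Hypothetical record layout: (kind, osm_id, node refs); refs is empty for nodes.
Obj = Tuple[str, int, List[int]]


def ways_touching(objects: Iterable[Obj],
                  keep_node: Callable[[int], bool]) -> Iterator[int]:
    """Yield ids of ways that reference at least one kept node, in one pass.

    Relies on all nodes appearing in the stream before any way.
    """
    kept: Set[int] = set()
    for kind, osm_id, refs in objects:
        if kind == "node":
            if keep_node(osm_id):
                kept.add(osm_id)
        elif kind == "way":
            # Every referenced node has already gone by, so no second pass.
            if any(ref in kept for ref in refs):
                yield osm_id


stream = [
    ("node", 1, []),
    ("node", 2, []),
    ("node", 3, []),
    ("way", 10, [1, 2]),
    ("way", 11, [3]),
]
selected = list(ways_touching(stream, keep_node=lambda node_id: node_id <= 2))
```

Only the set of kept node IDs stays in memory. If ways could appear
before their nodes, the file would have to be read twice or buffered.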
When processing files the size of the planet you can trade disk I/O
against memory consumption. For most users the whole planet has
outgrown physical memory, so it comes down to optimizing disk I/O.
By adding the osm_id ordering constraint you can switch processing from
"random I/O" to "streaming I/O", which makes a huge difference even on
NVMe SSD based block storage. You trade disk storage IOPS against
streaming I/O.
E.g. 500K IOPS against 3 GByte/s streaming read.
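A back-of-the-envelope version of that comparison, as a sketch only.
The two device figures are the ones above; the 100 bytes per object and
the idea that each random read yields exactly one useful object are
assumptions picked for illustration, not measurements.

```python
def objects_per_second_random(iops: float) -> float:
    """Assumption: every random read fetches exactly one useful object."""
    return iops


def objects_per_second_streaming(bytes_per_second: float,
                                 bytes_per_object: float) -> float:
    """Sequential read: every byte read belongs to an object we process."""
    return bytes_per_second / bytes_per_object


IOPS = 500_000                  # figure from the text
STREAM_BYTES_PER_SECOND = 3e9   # 3 GByte/s, figure from the text
ASSUMED_BYTES_PER_OBJECT = 100  # hypothetical average encoded object size

random_rate = objects_per_second_random(IOPS)
streaming_rate = objects_per_second_streaming(
    STREAM_BYTES_PER_SECOND, ASSUMED_BYTES_PER_OBJECT)
speedup = streaming_rate / random_rate
```

Under these assumptions streaming handles 30 million objects per second
against 500 thousand for random access, a factor of 60. Smaller objects
push the factor further in favour of streaming.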
Flo
--
Florian Lohoff f at zz.de
Any sufficiently advanced technology is indistinguishable from magic.