[OSM-dev] Ordering OSM-Files (was: visible-Flag in PBF)

Thu May 12 00:26:33 BST 2011

On Tue, May 10, 2011 at 11:12 AM, Jochen Topf <jochen at remote.org> wrote:
> On Mon, May 09, 2011 at 05:41:19PM -0500, Scott Crosby wrote:
>> Yes. How about the following optional feature flags to indicate sort order:
>>
>>  Sort.NodeWayRelationThenID
>>      --- Because this is the default order of the planet.
>>  Sort.Unordered
>>      --- Explicitly indicate that the order is unknown.
>>  Sort.RelationWayNodeThenID
>>      --- Because this order makes multipolygon/region extraction more efficient.
>>
>> What are the motivations for the other sort orders you proposed above?
>
> That's the problem. I don't know yet what orders will make sense and I don't
> think we should change the definitions of the PBF file each time a new order is
> "invented". But defining all of them from the get-go would be too many because
> it's a combinatorial problem.

I see no way to avoid preplanning. Readers and writers must be able to
agree on the meanings of various sort-order flags, otherwise they
cannot use them. That description might as well go in the
specification. This choice is also OSM-wide --- because XML files
likewise need to indicate the sort order.

>
> I am also thinking ahead to when we might have more types, for instance an area
> type. How would this fit in? Do we then need Sort.NodeWayAreaRelationThenID?

Yes.  And, as these files would likely be incompatible with earlier
software, we define additional keys.

>
> Maybe we can come up with a better idea how to code this.

We can reduce the combinatorial explosion by decomposing a bit:

 EntitiesSorted:Node,Way,Relation
 EntitiesSorted:Relation,Way,Node
 EntitiesSorted:Unordered

 NodesSorted:IdIncreasing
 NodesSorted:IdDecreasing
 NodesSorted:IdUnordered

 WaysSorted:IdIncreasing
 WaysSorted:IdDecreasing
 WaysSorted:IdUnordered

and similarly for relations.
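
To make the decomposition concrete, here is a rough sketch of how a reader
might interpret such flags. Everything in it is an assumption for
illustration: the "Key:Value" spelling follows the lists above, the
RelationsSorted key is my guess at what "similarly for relations" would look
like, and parse_sort_flags / is_planet_order are hypothetical helpers, not
part of any existing library or of the PBF specification. With three entity
orders and three ID orders per type, a combined scheme would need
3 * 3 * 3 * 3 = 81 names, while the decomposed scheme needs only
3 + 3 * 3 = 12 flag values.

```python
# Hypothetical flag vocabulary taken from the lists above; not part of the
# PBF specification.
ENTITY_ORDERS = {
    "Node,Way,Relation": ("node", "way", "relation"),
    "Relation,Way,Node": ("relation", "way", "node"),
    "Unordered": None,
}
ID_ORDERS = {"IdIncreasing", "IdDecreasing", "IdUnordered"}
# "RelationsSorted" is assumed by analogy with NodesSorted and WaysSorted.
PER_TYPE_KEYS = {
    "NodesSorted": "node",
    "WaysSorted": "way",
    "RelationsSorted": "relation",
}


def parse_sort_flags(flags):
    """Return (entity_order, id_orders) from an iterable of 'Key:Value' strings.

    Unknown keys are ignored so that older readers tolerate newer flags.
    Anything not stated explicitly is treated as unordered.
    """
    entity_order = None
    id_orders = {name: "IdUnordered" for name in PER_TYPE_KEYS.values()}
    for flag in flags:
        key, sep, value = flag.partition(":")
        if not sep:
            continue  # not a Key:Value flag, so not ours to interpret
        if key == "EntitiesSorted":
            if value not in ENTITY_ORDERS:
                raise ValueError("unknown entity order: " + value)
            entity_order = ENTITY_ORDERS[value]
        elif key in PER_TYPE_KEYS:
            if value not in ID_ORDERS:
                raise ValueError("unknown id order: " + value)
            id_orders[PER_TYPE_KEYS[key]] = value
    return entity_order, id_orders


def is_planet_order(flags):
    """True if the flags describe the default planet order."""
    entity_order, id_orders = parse_sort_flags(flags)
    return entity_order == ("node", "way", "relation") and all(
        order == "IdIncreasing" for order in id_orders.values()
    )
```

A reader that needs planet order would call is_planet_order() on the file's
optional features and fall back to sorting (or refuse the file) when it
returns False.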

>> I'm now less convinced of the value of this feature. What is it for?

(Feature: including entity counts for each block.)

>
> To save the cost of the zip decompression for blocks we are not interested
> in. I'll do some benchmarking to see how much this would save.

I seem to recall that osmosis can decode and --write-null a PBF planet
in about 15 minutes. Getting entity counts for each block should be
faster than that. You only need to decode to protocol buffers; the
delta decoding and entity creation can be skipped.
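
As a rough illustration of why counting is cheaper than full decoding: if
IDs are stored as a packed run of delta-coded zigzag varints (as the dense
node encoding does), the number of entries equals the number of bytes whose
high bit is clear, so no delta has to be undone and no entity built. The
sketch below is an assumption-laden toy, not osmosis code: the helper names
are made up, it works on an already-extracted packed field, and a real
reader would still have to decompress the blob and locate that field first.

```python
def zigzag(n):
    """Map a signed 64-bit integer to unsigned, as protobuf sint64 does."""
    return (n << 1) ^ (n >> 63)


def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # last byte has the high bit clear
            return bytes(out)


def pack_delta_ids(ids):
    """Build a packed field of delta-coded ids (test helper only)."""
    out = bytearray()
    previous = 0
    for current in ids:
        out += encode_varint(zigzag(current - previous))
        previous = current
    return bytes(out)


def count_packed_varints(payload):
    """Count entries without decoding them.

    Every varint ends with exactly one byte below 0x80, so counting those
    bytes gives the number of values in the packed field.
    """
    return sum(1 for byte in payload if byte < 0x80)
```

For example, count_packed_varints(pack_delta_ids([100, 105, 103, 100000]))
gives the number of IDs without ever reconstructing them.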

Scott