[osmosis-dev] osmchange file produced by osmconvert gives osmosis error

Tue Aug 7 13:39:49 BST 2012

Hi,

On 04.08.2012 07:01, Martijn van Exel wrote:
> I produced this small osmchange file using a recently compiled
> osmconvert: https://gist.github.com/3254628
> the osmosis --rxc task chokes on it though: https://gist.github.com/3254632
>
> Ideas?
yes, the cause of this is pretty simple. Osmosis requires lat and lon 
attributes on nodes, and these attributes are not set in the change file 
on the deleted nodes.

While the cause is pretty simple, the fix for this (_if_ we want to fix 
this) isn't going to be simple. The abscence of the lat/lon attributes 
on the deleted nodes is conceptually OK since it is going to be ignored 
anyway. However, the <node> parser (NodeElementProcessor to be precise) 
does not know about the context of the <node> element - and the 
osmChange <delete> is the only case where lat/lon is actually optional.

Also, the Osmosis data model (the Node class) has an implicit assumption 
that the nodes always have lat/lon - the fields are typed as double and 
can't thus be set to null, and there is no such thing like 
Node.isPositionDefined().

There also was a recent related API change which makes lat/lon optional 
for already deleted (visible="false") nodes, so now seems a good point 
to talk about that too.

So what can be done about this:

1. Make the lat/lon attributes optional and change the data model 
accordingly.
2. Make the lat/lon attributes optional everywhere and don't change the 
data model.
3. Teach the NodeElementProcessor about context it is called from and 
make it generate some bogus and/or sentinel values for the Node classes 
in that case (0, Double.NaN, Double.MAX_VALUE, whatever).
4. Ignore the whole issue and just say that lat/lon is required.

My personal and also very humble opinion about those:

1.: For this, we would need to change all the downstream tasks to check 
if the position is actually there. Given that the correct absence of 
lat/lon is a quite rare case, this seems rather costly.
2. and 3.: These are actually almost the same thing. If only the parser 
thinks lat/lon are optional and the downstream tasks think it's always 
there, the parser needs to put some value in the node for 
Node.getLatitude() to return.

The only difference between 2 and 3 is the amount of validation done in 
the parser (and therefore the assumptions the downstream tasks can have 
about the data). I think it's better validate in the parser and fail 
fast. Yes, bringing context to NodeElementProcessor is a bit ugly - but 
hey, it's deep down in the implementation classes and nobody sees it. If 
the input is actually broken (normal OSM XML with lon/lat not set), it's 
better to fail in the parser than pretend that it's OK leading to 
undefined behavior downstream.

4.: Well, we _could_ ignore osmconvert producing those files (no offense 
intended, Markus) and argue that it's a bug in osmconvert and it should 
be fixed on that side and so on. But I don't think it is wise to ignore 
the API... So this is not an option, despite having a very attractive 
implementation cost of zero :)

To sum it up, I tend to prefer 3.

Other thoughts on this point, anyone? :)

Greetings from Stuttgart
Igor