[OSM-dev] OsmChange format and 0.6

Thu Feb 12 12:18:22 GMT 2009

Matt Amos wrote:
> On Mon, Feb 9, 2009 at 1:06 AM, Frederik Ramm <frederik at remote.org> wrote:
>   
>>    I have come across a strange logic twist and want to confuse you
>> with it. (Maybe it was clear to anybody anyway, don't know.)
>>     
It wasn't clear to me.  I'd never thought about it before :-)
>> API 0.6 supports uploading OsmChange files, with the additional
>> requirement that each node/way/relation contained in the change file be
>> given an extra "changeset" attribute. (Escapes me now why we did not
>> have that in the header but I remember there was a valid reason for it.)
>>     
>
> the valid reason is simply that we re-use the node/way/relation XML
> readers and these require the changeset ID. in fact, the diff doesn't
> even need the changeset ID in the header as the URL already contains a
> valid changeset ID.
>
> at some point in the future it might be worth taking the changeset ID
> out of the element parsers and putting it into the controller. whether
> we want to make the change while 0.6 is so close to release, i'll
> leave for others to discuss ;-)
>   
If changesets were truly atomic then I'd be all for this.  I would have 
invented a new change format in osmosis that simply wrapped changesets 
into a replication file (*.osr maybe :-).  But with changesets being 
longer lived things I'll always have to treat entities individually 
within osmosis.  I understand why they've been implemented the way they 
have, but there's always going to be some messiness.

Now I realise the API doesn't have that constraint and can assume 
groups of entities all belong to a single changeset, but other than 
saving a few per cent of data traffic I don't think it matters much.
>> Now one should think that if I have two API servers and generate a diff
>> from server #1 with Osmosis, it should be possible to upload that to
>> server #2 through the API (after amending every object with a changeset
>> attribute).
>>
>> But that doesn't work: API 0.6 of course expects the uploader to report
>> the current version of each object they have, to make sure it matches
>> the database version before incrementing the version number and storing
>> the new data. Whereas Osmosis of course writes the current version
>> number to the file.
>>     
>
> in this, i lean towards both versions being in the file - the new one
> for <create>, the old one for <delete> and both for <modify>, but
> these are probably ideas to be kept for a future version of the
> format.
>   
There's always a bit of weirdness in translating from the changeset 
format to the database representation.  In the database a delete isn't 
really a delete, it's a modify that sets the visible flag to false and 
therefore gets a new version.  A create is also very similar to a modify 
in that they both create a new entry in the history table.  So when 
producing changesets osmosis treats create, modify and delete almost 
identically and just spits out the current version regardless.  Even if 
the API chooses to do something very different, osmosis can't if it 
wants to keep two databases identical; it always has to spit out the 
latest version of the entity regardless of the action.
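
To illustrate the point about the history table, here is a minimal 
sketch in Python.  It is not osmosis or API code; the HistoryRow class 
and the record() helper are hypothetical names, and the "previous 
version plus one" numbering is a simplification made for the example.

```python
from dataclasses import dataclass


@dataclass
class HistoryRow:
    # Hypothetical stand-in for one row of an entity history table.
    entity_id: int
    version: int
    visible: bool


def record(history, entity_id, visible=True):
    """Append a new history row for the entity and return it.

    Create, modify and delete all go through this one path; a delete
    is simply a new row with visible=False.
    """
    previous = [row.version for row in history if row.entity_id == entity_id]
    # Assumption for this sketch only: each new row is previous version + 1.
    row = HistoryRow(entity_id, max(previous, default=0) + 1, visible)
    history.append(row)
    return row


history = []
created = record(history, 1)                  # "create"
modified = record(history, 1)                 # "modify"
deleted = record(history, 1, visible=False)   # "delete"
```

All three actions leave a new row behind, and the version on that row 
is what ends up in the change file.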

As for the version difference between the API and osmosis, they're 
talking about slightly different things.  The API is dealing with a 
request to perform an action, whereas osmosis is describing the 
results of an action, therefore the data differs somewhat.  The two sets 
of data are going to be almost identical, but the version will differ as 
a result of one being the before and one the after.  It's a bit like 
trying to load a JOSM file into a mysql database: it kind of works 
because the data is similar, but there are some key differences such as 
pending actions, negative identifiers and so on that differ between the 
two, making them semi-incompatible.

If you wished to create a true replication capability via the API, you'd 
perhaps need an extra flag of some kind when making an API request that 
switches from "change this entity to look like this", to "keep a record 
of this entity changing to look like this".  In the absence of that I 
think we're stuck with having to fudge version numbers.

Slightly longer term, if I ever get around to implementing proper 
changeset replication (ie. keeping the mysql changeset table 
synchronized) then you won't be able to replicate to the API anyway 
because I'll have to enhance the *.osc file format to include changeset 
details and there's no way (that I'm aware of) of creating a changeset 
with a specific id via the API.
>> So, if you wanted to upload an Osmosis-generated .osc to another server,
>> not only would you have to add changesets, but you would also have to
>> decrement each version number!
>>     
>
> yup. this is the disadvantage of the warning in
> http://wiki.openstreetmap.org/wiki/Api06#Summary_of_changes_required_in_clients
> "There is no guarantee that newversion == oldversion+1". so
> decrementing the version number is not even guaranteed to work!
>   
I don't think there's any harm in doing the "decrement by one" thing so 
long as you're aware that it isn't true replication and will have some 
potential wrinkles.  Presumably this is mainly used for testing; I would 
have thought that a true replication environment would utilise direct 
table manipulation.
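
For anyone wanting to try the fudge, a rough sketch in Python of what 
it might look like.  The function name prepare_for_upload and the 
sample data are made up for illustration.  It adds the changeset 
attribute to every entity and decrements the version inside <modify> 
and <delete> blocks, which assumes newversion == oldversion+1 held on 
the source server (the wiki warns that it may not).  Versions inside 
<create> blocks are left alone.

```python
import xml.etree.ElementTree as ET


def prepare_for_upload(osc_text, changeset_id):
    """Return a copy of an osmosis-style osmChange document adjusted
    for upload: changeset attribute added, versions decremented."""
    root = ET.fromstring(osc_text)
    for action in root:
        if action.tag not in ("create", "modify", "delete"):
            continue
        for entity in action:
            entity.set("changeset", str(changeset_id))
            # osmosis writes the version *after* the change; the upload
            # needs the version *before* it.  Subtracting one is only a
            # guess at that earlier version.
            if action.tag != "create" and "version" in entity.attrib:
                entity.set("version", str(int(entity.get("version")) - 1))
    return ET.tostring(root, encoding="unicode")


# Made-up sample data, not taken from a real server.
SAMPLE = """<osmChange version="0.6">
  <create><node id="10" version="1" lat="1.0" lon="2.0"/></create>
  <modify><node id="5" version="3" lat="1.5" lon="2.5"/></modify>
  <delete><node id="7" version="4" lat="0.0" lon="0.0"/></delete>
</osmChange>"""

converted = prepare_for_upload(SAMPLE, 42)
```

As Matt notes, the user and timestamp in the file would still be 
overwritten by the destination server, so this is a testing aid rather 
than replication.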
>   
>> Come to think of it, you would also have to replace the IDs of all newly
>> created objects in the osc file by negative IDs so that the destination
>> server can issue its own IDs... so it seems that, while superficially
>> the same, there is a world of difference between an osc file as created
>> by Osmosis and one as read by the API. - Maybe we should have opted for
>> a wholly different format to make this clearer. Sigh.
>>     
>
> actually, you don't have to do this - the server treats all incoming
> IDs in <create> blocks as placeholders, whether they're negative or
> not.
>
> however, you'd lose all the timestamp and user information in those
> .osc files to be overwritten with whatever user you're logged-in as
> and the current time.
>
> you're right, maybe we shouldn't have tried to re-use a
> server-to-server sync format for client-to-server communications,
> where things like the allocation of new IDs and setting user and
> timestamp info are not trusted - we could have just omitted them from
> the file. on the other hand - do we really want YAOCF (yet another OSM
> change format) when there are already three?
>   
I don't have much to add because the existing change format suits me 
nicely :-)

But I don't see a problem with using the same format for multiple 
purposes where it works.  The basic osm format is used for multiple 
purposes as well.  It can describe a snapshot of data in planet format, 
a set of pending changes for a specific data subset in JOSM format, and 
even a complete history of data with "visible" attributes if required.  
They all have quite different purposes and represent different things 
but share common data types with a few key attribute changes.  The osc 
format is similar: it can represent either a set of changes that have 
occurred, such as osmosis changesets, or a set of changes to be uploaded 
to the API and applied.

We could get extra funky and create complex schemas that define base 
types with the common data attributes then extend them for each specific 
purpose but the effort involved is fairly considerable and doesn't 
provide much value that I can see.  I'd stick with what we have in 0.6, 
see how it works and look at refactoring things in 0.7 once we have some 
experience with it.

Brett

PS. I've written this so long after the fact that it is probably largely 
irrelevant to the discussion now :-)