[OSM-dev] OsmChange format and 0.6
Brett Henderson
brett at bretth.com
Thu Feb 12 12:18:22 GMT 2009
Matt Amos wrote:
> On Mon, Feb 9, 2009 at 1:06 AM, Frederik Ramm <frederik at remote.org> wrote:
>
>> I have come across a strange logic twist and want to confuse you
>> with it. (Maybe it was clear to anybody anyway, don't know.)
>>
It wasn't clear to me. I'd never thought about it before :-)
>> API 0.6 supports uploading OsmChange files, with the additional
>> requirement that each node/way/relation contained in the change file be
>> given an extra "changeset" attribute. (Escapes me now why we did not
>> have that in the header but I remember there was a valid reason for it.)
>>
>
> the valid reason is simply that we re-use the node/way/relation XML
> readers and these require the changeset ID. in fact, the diff doesn't
> even need the changeset ID in the header as the URL already contains a
> valid changeset ID.
>
> at some point in the future it might be worth taking the changeset ID
> out of the element parsers and putting it into the controller. whether
> we want to make the change while 0.6 is so close to release, i'll
> leave for others to discuss ;-)
>
If changesets were truly atomic then I'd be all for this. I would have
invented a new change format in osmosis that simply wrapped changesets
into a replication file (*.osr maybe :-). But with changesets being
longer lived things I'll always have to treat entities individually
within osmosis. I understand why they've been implemented the way they
have, but there's always going to be some messiness.
Now I realise the API doesn't have that constraint and can assume
groups of entities all belong to a single changeset, but other than
saving a few per cent of data traffic I don't think it matters much.
>> Now one should think that if I have two API server and generate a diff
>> from server #1 with Osmosis, it should be possible to upload that to
>> server #2 through the API (after amending every object with a changeset
>> attribute).
>>
>> But that doesn't work: API 0.6 of course expects the uploader to report
>> the current version of each object they have, to make sure it matches
>> the database version before incrementing the version number and storing
>> the new data. Whereas Osmosis of course writes the current version
>> number to the file.
>>
>
> in this, i lean towards both versions being in the file - the new one
> for <create>, the old one for <delete> and both for <modify>, but
> these are probably ideas to be kept for a future version of the
> format.
>
There's always a bit of weirdness in translating from the changeset
format to the database representation. In the database a delete isn't
really a delete, it's a modify that sets the visible flag to false and
therefore gets a new version. A create is also very similar to a modify
in that they both create a new entry in the history table. So when
producing changesets osmosis treats create, modify and delete almost
identically and just spits out the current version regardless. Even if
the API chooses to do something very different, osmosis can't: if it
wants to keep two databases identical, it always has to spit out the
latest version of the entity regardless of the action.
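To make that concrete, here is a toy Python model of a history table in
which every action, delete included, appends a row with a new version.
It is only a sketch and not osmosis or API code: HistoryRow, apply_action
and osc_action are made-up names, and deriving the action from "version
1" and the visible flag is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HistoryRow:
    entity_id: int
    version: int
    visible: bool
    tags: Dict[str, str]


def apply_action(history: List[HistoryRow], entity_id: int,
                 visible: bool, tags: Dict[str, str]) -> HistoryRow:
    """Append a new history row; create, modify and delete all bump the version."""
    previous = [row.version for row in history if row.entity_id == entity_id]
    row = HistoryRow(entity_id, max(previous, default=0) + 1, visible, tags)
    history.append(row)
    return row


def osc_action(row: HistoryRow) -> str:
    """Assumed mapping from a history row to an osc action block."""
    if not row.visible:
        return "delete"
    return "create" if row.version == 1 else "modify"


history: List[HistoryRow] = []
apply_action(history, 1, True, {"name": "old"})   # create
apply_action(history, 1, True, {"name": "new"})   # modify
apply_action(history, 1, False, {})               # delete is just another row
```

In this model the row written for the delete carries version 3, which is
the "after" version a replication file would report, not the version 2 an
uploader would have to quote.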
As for the version difference between the API and osmosis, they're
talking about slightly different things. The API is dealing with a
request to perform an action, whereas osmosis is describing the
results of an action, so the data differs somewhat. The two sets of
data are going to be almost identical, but the version will differ
because one is the "before" and the other the "after". It's a bit like
trying to load a JOSM file into a mysql database: it kind of works
because the data is similar, but there are some key differences (such
as pending actions and negative identifiers) that make the two
semi-incompatible.
If you wished to create a true replication capability via the API, you'd
perhaps need an extra flag of some kind when making an API request that
switches from "change this entity to look like this", to "keep a record
of this entity changing to look like this". In the absence of that I
think we're stuck with having to fudge version numbers.
Slightly longer term, if I ever get around to implementing proper
changeset replication (i.e. keeping the mysql changeset table
synchronized) then you won't be able to replicate to the API anyway
because I'll have to enhance the *.osc file format to include changeset
details and there's no way (that I'm aware of) of creating a changeset
with a specific id via the API.
>> So, if you wanted to upload an Osmosis-generated .osc to another server,
>> not only would you have to add changesets, but you would also have to
>> decrement each version number!
>>
>
> yup. this is the disadvantage of the warning in
> http://wiki.openstreetmap.org/wiki/Api06#Summary_of_changes_required_in_clients
> "There is no guarantee that newversion == oldversion+1". so
> decrementing the version number is not even guaranteed to work!
>
I don't think there's any harm in doing the "decrement by one" thing so
long as you're aware that it isn't true replication and will have some
potential wrinkles. Presumably this is mainly used for testing; I would
have thought that a true replication environment would use direct
table manipulation.
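For testing purposes, the fudge could look something like the Python
sketch below. It adds a "changeset" attribute to every entity and
decrements the version inside <modify> and <delete> blocks. The function
name prepare_for_upload is made up, leaving <create> entities untouched
is an assumption, and, as noted above, the result is not true
replication because newversion == oldversion+1 is not guaranteed.

```python
import xml.etree.ElementTree as ET


def prepare_for_upload(osc_xml: str, changeset_id: int) -> str:
    """Rewrite an osmosis-style osc document into an upload-style one.

    Assumption: the top-level children are <create>/<modify>/<delete>
    blocks whose children are node/way/relation elements.
    """
    root = ET.fromstring(osc_xml)
    for block in root:
        for entity in block:
            # The 0.6 element readers expect a changeset on every entity.
            entity.set("changeset", str(changeset_id))
            if block.tag in ("modify", "delete"):
                # osmosis wrote the "after" version; guess the "before" one.
                entity.set("version", str(int(entity.get("version")) - 1))
    return ET.tostring(root, encoding="unicode")


sample = (
    '<osmChange version="0.6">'
    '<create><node id="5" version="1" lat="0" lon="0"/></create>'
    '<modify><node id="1" version="3" lat="1" lon="2"/></modify>'
    '<delete><way id="7" version="2"/></delete>'
    '</osmChange>'
)
converted = prepare_for_upload(sample, 42)
```

Timestamp and user attributes are left in place here, but as Matt says
below they would be overwritten by the receiving server anyway.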
>
>> Come to think of it, you would also have to replace the IDs of all newly
>> created objects in the osc file by negative IDs so that the destination
>> server can issue its own IDs... so it seems that, while superficially
>> the same, there is a world of difference between an osc file as created
>> by Osmosis and one as read by the API. - Maybe we should have opted for
>> a wholly different format to make this clearer. Sigh.
>>
>
> actually, you don't have to do this - the server treats all incoming
> IDs in <create> blocks as placeholders, whether they're negative or
> not.
>
> however, you'd lose all the timestamp and user information in those
> .osc files to be overwritten with whatever user you're logged-in as
> and the current time.
>
> you're right, maybe we shouldn't have tried to re-use a
> server-to-server sync format for client-to-server communications,
> where things like the allocation of new IDs and setting user and
> timestamp info are not trusted - we could have just omitted them from
> the file. on the other hand - do we really want YAOCF (yet another OSM
> change format) when there are already three?
>
I don't have much to add because the existing change format suits me
nicely :-)
But I don't see a problem with using the same format for multiple
purposes where it works. The basic osm format is used for multiple
purposes as well. It can describe a snapshot of data in planet format,
a set of pending changes for a specific data subset in JOSM format, and
even a complete history of data with "visible" attributes if required.
They all have quite different purposes and represent different things
but share common data types with a few key attribute changes. The osc
format is similar: it can represent a set of changes that have already
occurred, such as osmosis changesets, or a set of changes to be
uploaded to and applied by the API.
We could get extra funky and create complex schemas that define base
types with the common data attributes then extend them for each specific
purpose but the effort involved is fairly considerable and doesn't
provide much value that I can see. I'd stick with what we have in 0.6,
see how it works and look at refactoring things in 0.7 once we have some
experience with it.
Brett
PS. I've written this so long after the fact that it is probably largely
irrelevant to the discussion now :-)