[OSM-dev] Osmosis, Changesets, Diffs (replicate) and general questions
Brett Henderson
brett at bretth.com
Sat Oct 31 01:49:13 GMT 2009
Lars Francke wrote:
>>> I'd like to include full changeset information in diffs but it's not
>>> trivial. I'm not sure if I'll ever get to this personally. I'd love to
>>> see somebody take it on though.
>>>
>> I'll have a look at it but I don't want to get your hopes up :)
>>
>
> I had a look and my initial enthusiasm has been dampened a little.....a lot :)
>
Hehe, welcome to my world :-)
> I think I understand the PostgreSQL/Skytools/xmin/transaction stuff
> now but I have to admit that I have problems with all the
> indirections/layers/redirections in EntityDao and some other classes.
> But it seems to me that it wouldn't be the best idea to make Changeset
> an Entity subclass as there are just too many differences.
>
This is one of the things I'm struggling with as well. The Bound data
type has similar issues because it doesn't even have an id, but it's
fudged in order to pass it through the same pipeline. I'm torn between
passing changesets through as another Entity type and having some unused
fields, and introducing more specific object hierarchies and making the
pipeline more generic/complex.
Honestly, I don't have any strong opinions on this one. It's hard to
work up enough enthusiasm to tackle it.
Where it starts to get tricky is in detecting which changesets need to
be sent through the pipeline, and in particular how to transfer the
bounding box information. From memory you need to send it through when
it's created (to avoid foreign key problems), when it's used by entities
(because the bounding box might have been updated), and when it is
closed. I'm not sure how that relates to the new replication code though;
the new algorithm would still need to be figured out for that one.
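To make the three cases above concrete, here is a rough sketch of the selection rule in Java. It is only an illustration under stated assumptions: ChangesetRow, ChangesetSelector, the createdAt/closedAt fields and the (start, end] interval are all hypothetical names invented for this example, not Osmosis classes or apidb columns, and it says nothing about how the xmin-based replication would actually detect these rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for a changeset record (not an Osmosis class).
final class ChangesetRow {
    final long id;
    final long createdAt; // assumed: creation time in epoch millis
    final long closedAt;  // assumed: close time in epoch millis

    ChangesetRow(long id, long createdAt, long closedAt) {
        this.id = id;
        this.createdAt = createdAt;
        this.closedAt = closedAt;
    }
}

final class ChangesetSelector {
    // Returns the ids of changesets that should be emitted for the
    // replication interval (start, end]. A changeset qualifies if it was
    // created in the interval, referenced by an entity changed in the
    // interval (its bounding box may have grown), or closed in the interval.
    static List<Long> select(List<ChangesetRow> rows, Set<Long> referencedIds,
                             long start, long end) {
        List<Long> result = new ArrayList<>();
        for (ChangesetRow row : rows) {
            boolean created = inInterval(row.createdAt, start, end);
            boolean used = referencedIds.contains(row.id);
            boolean closed = inInterval(row.closedAt, start, end);
            if (created || used || closed) {
                result.add(row.id);
            }
        }
        return result;
    }

    // Half-open interval check: start is exclusive, end is inclusive.
    private static boolean inInterval(long t, long start, long end) {
        return t > start && t <= end;
    }
}
```

Here referencedIds would be the set of changeset ids carried by the nodes, ways and relations already selected for the same interval, so the "used by entities" case falls out of the entity query rather than needing its own scan.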
> ReplicationDestination on the other hand needs ChangeContainers which
> require EntityContainers, which require Entity objects. Some of the
> functions in EntityDao don't apply either (changeset has no version or
> "timestamp", ...). In addition the OSMWriter would have to be extended
> and a ChangesetWriter needs to be written. I could certainly try to
> hack something together but I'm afraid that it'd fall short of your
> code standards :)
>
If the main issue is that timestamp and version are not used in a
Changeset then I'd lean towards just sub-classing Entity. It's kinda
messy but not atrocious. The alternatives add a lot of complexity.
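As a rough picture of what "just sub-classing Entity" with unused fields could look like, here is a minimal sketch. SketchEntity and SketchChangeset are hypothetical stand-ins written for this example; the real Osmosis Entity class has more fields and different constructors, and the placeholder values chosen here (version 0, null timestamp) are assumptions, not an agreed convention.

```java
import java.util.Date;

// Hypothetical, cut-down base type standing in for the pipeline's entity class.
abstract class SketchEntity {
    private final long id;
    private final int version;
    private final Date timestamp;

    SketchEntity(long id, int version, Date timestamp) {
        this.id = id;
        this.version = version;
        this.timestamp = timestamp;
    }

    long getId() { return id; }
    int getVersion() { return version; }
    Date getTimestamp() { return timestamp; }
}

// A changeset squeezed into the same hierarchy. Version and timestamp have
// no meaning for a changeset, so they are filled with placeholders.
final class SketchChangeset extends SketchEntity {
    static final int UNUSED_VERSION = 0; // assumed placeholder value

    private final double minLon;
    private final double minLat;
    private final double maxLon;
    private final double maxLat;
    private final boolean open;

    SketchChangeset(long id, double minLon, double minLat,
                    double maxLon, double maxLat, boolean open) {
        super(id, UNUSED_VERSION, null); // timestamp deliberately left unset
        this.minLon = minLon;
        this.minLat = minLat;
        this.maxLon = maxLon;
        this.maxLat = maxLat;
        this.open = open;
    }

    double getMinLon() { return minLon; }
    double getMinLat() { return minLat; }
    double getMaxLon() { return maxLon; }
    double getMaxLat() { return maxLat; }
    boolean isOpen() { return open; }
}
```

The messy part shows up downstream: any task that reads getVersion() or getTimestamp() has to know that a changeset carries placeholders there, which is the cost being weighed against a more generic pipeline.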
My main issue with accepting patches is that I'm usually the one who has
to maintain them so I do tend to be a little fussy ;-) But so long as
it has reasonable test coverage, and all tasks are updated (i.e.
including the existing timestamp-based replication tasks, and all
downstream tasks such as XML writers), and the various code checks (e.g.
checkstyle) pass then I shouldn't have a problem. If you're keen to
have a go we can create a branch and do some experiments to see what works.
> * I _only_ looked at the replication task, so it is _very_ possible
> that I overlooked something and my changes would break compatibility.
> I'll have a second look at this
> * The xmin-index for the changeset-table would have to be created but
> I suppose that wouldn't be a big problem
>
Yep, should be fine. The changeset table is relatively small compared
to the node table for example so we shouldn't have an issue getting the
index created if it is necessary.
> But I'd be glad if you could give me any pointers. I still won't
> promise anything but I'm still reading the code...so who knows.
>
You seem to have a reasonable handle on it. To be honest I'm not too
sure where to begin :-) There's a lot in there and I struggle to
remember how it all works. About all you can do is focus on one task at
a time. One thing that is quite confusing is that there are several
different database access methods in use. There's the original code
used by the old --read-apidb-change type tasks, then there's improved
code in the pgsql tasks, then there's the new replication tasks which
are Spring Framework based. As a result there are some redundant classes
in there that could be eliminated with a good refactor and rewrite of
the apidb tasks. The new Spring Framework stuff is the direction I'm
heading in as it requires far less code, is cleaner, and is less error
prone.
Don't be scared to have a go at it, and feel free to ask me to take a
look at some code before you spend too much time on it. So long as it's
done in a branch to keep the trunk relatively stable.
> On another note: Has anyone ever had a look at alternative database
> systems for OSM? No, I don't propose a change! I'd just be interested
> if anyone had a look at systems like HBase, MonetDB (Stefan de Konink
> does a lot of stuff with MonetDB if I remember correctly), MongoDB,
> CouchDB, Cassandra, ... and their possible use cases for OSM.
>
The only ones I've played with are MySQL (obsolete), PostgreSQL/PostGIS
(basis of apidb and pgsql tasks), and Berkeley DB Java Edition (deleted
because I couldn't get it to scale).
Cheers,
Brett