[OSM-dev] Changeset files (was Removing Minutely and Hourly Changesets)

Anthony osm at inbox.org
Mon Dec 21 14:24:36 GMT 2009

On Mon, Dec 21, 2009 at 6:44 AM, Jon Burgess <jburgess777 at googlemail.com> wrote:

> On Mon, 2009-12-21 at 01:08 -0500, Anthony wrote:
> > Cool.  If anyone familiar with the planet dumper tool is listening...
> >
> > In
> >
> > http://svn.openstreetmap.org/applications/utils/planet.osm/C/output_osm.c
> >
> > } else if ((*in >= 0) && (*in < 32)) {
> >             escape_tmp[len] = '?';
> >             len++;
> >
> > should be something like
> >
> > } else if ((*in > 0) && (*in < 32)) {
> >             len+=sprintf(&escape_tmp[len], "&#%d;", *in);
> >
> > "Something like" as in I haven't even checked if that compiles :).
> Most of the control characters are not allowed in a valid XML file. It
> makes no difference whether they are present as an ASCII character or as
> the equivalent entity.

Ah yes.  Hmm.  That said, most of the characters actually in the database
are carriage returns, which along with tabs and line feeds (also in the db)
are valid in XML.  Other characters are present - for instance ASCII 3 in
http://www.openstreetmap.org/browse/changeset/1325382 - those will be more
of a problem.

Hopefully the database can be cleaned of the rest of the characters, because
I'd imagine each dumper is going to have a slightly different way of dealing
with them.  Until that's done, I guess there's no right answer.
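To make the distinction concrete, here is a rough sketch of one way a dumper
could treat the two groups of control characters. It is not the planet
dumper's actual code: escape_control is a made-up name, and replacing the
invalid characters with '?' is just one possible policy. Tab, line feed and
carriage return are written as numeric character references, which keeps them
intact inside attribute values; everything else below 32 is replaced, since
XML 1.0 does not allow those characters even as references. Escaping of &, <,
> and quotes is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from output_osm.c: write the XML-safe form
 * of one byte into out (NUL-terminated) and return the number of bytes
 * written, not counting the NUL. */
static size_t escape_control(unsigned char c, char *out, size_t out_size)
{
    /* Longest output is "&#13;" (5 bytes) plus the terminating NUL. */
    assert(out != NULL && out_size >= 6);

    if (c == '\t' || c == '\n' || c == '\r') {
        /* Valid in XML 1.0; a numeric reference is not folded into a
         * space by attribute-value normalization. */
        return (size_t)snprintf(out, out_size, "&#%d;", c);
    }
    if (c < 32) {
        /* All other control characters are invalid in XML 1.0, whether
         * literal or as a reference, so substitute a placeholder. */
        memcpy(out, "?", 2);
        return 1;
    }
    /* Anything else passes through unchanged in this sketch. */
    out[0] = (char)c;
    out[1] = '\0';
    return 1;
}
```

With this, a carriage return comes out as &#13; and ASCII 3 comes out as ?.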

> > Of course, another thing to consider is that 1024 bytes isn't enough
> > for the truly pathological cases.  I think you need like 1531 or
> > something to handle that.  Fixing this might be enough to properly
> > process the current db, though.
> How do you arrive at the 1531 number?


255 quotes, each escaped as &quot; (6 bytes), plus a terminating NUL:
255 * 6 + 1 = 1531.  Not sure if that's the absolute longest encoded string,
but 255 quotes make a valid key/value, and the planet dumper would truncate
it, right?
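As a sanity check on the 1531 figure, a small sketch that computes the buffer
size needed for an escaped string. It assumes the dumper uses the five
predefined XML entities and that a key or value holds at most 255 one-byte
characters; MAX_TAG_CHARS, worst_case_size and escaped_size are names invented
for this example, not anything in output_osm.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed limit on key/value length, as discussed in this thread. */
#define MAX_TAG_CHARS 255

/* Worst case under these assumptions: every character is a quote, each
 * becoming "&quot;" (6 bytes), plus one byte for the terminating NUL. */
static size_t worst_case_size(void)
{
    return MAX_TAG_CHARS * strlen("&quot;") + 1;
}

/* Bytes needed for the escaped form of s, including the terminating NUL. */
static size_t escaped_size(const char *s)
{
    size_t n = 1; /* terminating NUL */

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        switch (*s) {
        case '"':               /* &quot; */
        case '\'':              /* &apos; */
            n += 6;
            break;
        case '&':               /* &amp; */
            n += 5;
            break;
        case '<':               /* &lt; */
        case '>':               /* &gt; */
            n += 4;
            break;
        default:
            n += 1;
            break;
        }
    }
    return n;
}
```

For a string of 255 quotes both functions give 1531, so a 1024-byte buffer
would be too small for that input.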

> > Any chance of adding num_changes?
> The current output reflects the same information as the /changeset API
> call. Do you think it should be there too?

Not as a bug, but as a feature request, I guess so.  It's more useful in the
dumps than the API (you can use it to make sure you've got everything
downloaded), but it'd be useful in the API as well, I suppose.  It seems to
be in the DB, so there shouldn't be a performance impact, right?

I see it's mentioned on http://wiki.openstreetmap.org/wiki/.osm