[OSM-dev] OSM Date Formats
Brett Henderson
brett at bretth.com
Sat Sep 29 02:25:14 BST 2007
Hi All,
There are a number of ways dates are being represented in osm files,
which adds complexity and a performance cost when parsing them.
JOSM writes dates in this format: "2007-07-22 13:42:29"
Osmosis writes dates in this format: "2007-07-10T11:32:32.000Z"
planet.rb writes dates in this format: "2007-02-12T18:43:01+00:00"
Note that osmosis and planet.rb are both using the correct xml date
format, but osmosis always writes in UTC, which is indicated by the Z
suffix. The osmosis format is shorter but includes millisecond
information which isn't necessary. From what I can tell, JOSM has just
done its own thing.
Parsing these files is tricky. Osmosis actually uses three separate
parsers: a custom parser reading the UTC format for speed, a standard
xml date parser as a fallback, and a customised JOSM parser as a second
fallback for all remaining cases. If you have a look at the JOSM code,
it has all kinds of trickery to parse a date format, much of which
would appear to be unnecessary if it stuck to the standard java support
for xml data types (javax.xml.datatype.XMLGregorianCalendar is available
from java 1.5).
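
As a rough sketch of the three-stage fallback described above (this is
not the actual osmosis code; the class name FallbackDateParser and its
methods are made up for illustration, and it assumes JOSM timestamps are
meant as UTC):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;

// Hypothetical illustration of a fast-path parser with two fallbacks.
// Not thread-safe: Calendar and SimpleDateFormat instances are reused.
class FallbackDateParser {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    private final Calendar utcCalendar = new GregorianCalendar(UTC);
    private final SimpleDateFormat josmFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    private final DatatypeFactory xmlFactory;

    FallbackDateParser() {
        // Assumption: the JOSM format carries no zone, so treat it as UTC.
        josmFormat.setTimeZone(UTC);
        try {
            xmlFactory = DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    Date parse(String text) {
        // 1. Fast path: fixed-width "yyyy-MM-ddTHH:mm:ss.SSSZ" only.
        Date fast = parseFixedUtc(text);
        if (fast != null) {
            return fast;
        }
        // 2. Generic xml date parser (handles offsets such as +00:00).
        try {
            return xmlFactory.newXMLGregorianCalendar(text).toGregorianCalendar().getTime();
        } catch (IllegalArgumentException e) {
            // Not a valid xml date, try the next parser.
        }
        // 3. JOSM style "yyyy-MM-dd HH:mm:ss".
        try {
            return josmFormat.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unrecognised date: " + text, e);
        }
    }

    // Returns null if the text does not match the fixed layout exactly.
    // Field ranges are not validated; the lenient Calendar rolls them over.
    private Date parseFixedUtc(String s) {
        if (s.length() != 24 || s.charAt(4) != '-' || s.charAt(7) != '-'
                || s.charAt(10) != 'T' || s.charAt(13) != ':' || s.charAt(16) != ':'
                || s.charAt(19) != '.' || s.charAt(23) != 'Z') {
            return null;
        }
        int year = digits(s, 0, 4);
        int month = digits(s, 5, 7);
        int day = digits(s, 8, 10);
        int hour = digits(s, 11, 13);
        int minute = digits(s, 14, 16);
        int second = digits(s, 17, 19);
        int millis = digits(s, 20, 23);
        if (year < 0 || month < 0 || day < 0 || hour < 0
                || minute < 0 || second < 0 || millis < 0) {
            return null;
        }
        utcCalendar.clear();
        utcCalendar.set(year, month - 1, day, hour, minute, second);
        utcCalendar.set(Calendar.MILLISECOND, millis);
        return utcCalendar.getTime();
    }

    // Parses s[from, to) as a non-negative integer, or returns -1.
    private static int digits(String s, int from, int to) {
        int value = 0;
        for (int i = from; i < to; i++) {
            int d = s.charAt(i) - '0';
            if (d < 0 || d > 9) {
                return -1;
            }
            value = value * 10 + d;
        }
        return value;
    }
}
```

The point of the ordering is that the cheap fixed-width check runs
first, so files written in the one expected format never touch the
slower generic parsers.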
I'd like to standardise on a common format. The custom osmosis parser
provides almost 10x speed improvement over the generic java xml date
parser but only works for a single format (currently the osmosis one),
and I don't want to write custom parsers for every format combination
out there. I thought that osmosis was going to become the new planet
dumper, which would have made the problem go away for me, but it appears
that's no longer the case with planet.c stepping in.
I'd like to standardise on a UTC date with Z suffix similar to the
osmosis example. I am willing to remove the millisecond information to
make it even shorter if necessary (I'll have to write my own formatter
but not a big job). As a nice side effect this would noticeably reduce
the size of the planet.
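
For illustration, a hand-written formatter for the shorter form could
look roughly like the sketch below. The class name
CompactUtcDateFormatter is hypothetical and this is not existing osmosis
code; it just shows UTC with a Z suffix and no milliseconds, which is 20
characters instead of 24.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical formatter producing "yyyy-MM-ddTHH:mm:ssZ" in UTC.
class CompactUtcDateFormatter {
    static String format(Date date) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.setTime(date);

        StringBuilder sb = new StringBuilder(20);
        pad(sb, cal.get(Calendar.YEAR), 4).append('-');
        pad(sb, cal.get(Calendar.MONTH) + 1, 2).append('-'); // MONTH is zero-based
        pad(sb, cal.get(Calendar.DAY_OF_MONTH), 2).append('T');
        pad(sb, cal.get(Calendar.HOUR_OF_DAY), 2).append(':');
        pad(sb, cal.get(Calendar.MINUTE), 2).append(':');
        pad(sb, cal.get(Calendar.SECOND), 2).append('Z');
        return sb.toString();
    }

    // Appends value left-padded with zeros to the given width.
    private static StringBuilder pad(StringBuilder sb, int value, int width) {
        String digits = Integer.toString(value);
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits);
    }
}
```

Any milliseconds on the input Date are simply dropped rather than
rounded.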
Thoughts?
Cheers,
Brett