[josm-dev] Date Parsing
Brett Henderson
brett at bretth.com
Fri Oct 12 07:49:01 BST 2007
Frederik Ramm wrote:
> Hi,
>
> I was trying to find out why the preparation of downloaded data
> suddenly took so long. Suddenly I remembered something that Brett
> Henderson had said about osmosis becoming unbearably slow when trying
> to parse dates, and true: I was able to cut the time it required to
> open my sample file from 16 seconds to 2 seconds just by commenting
> out date parsing!
>
> I have now settled on a method where the date is stored verbatim and
> written out exactly like it was read, and only parsed when the
> timestamp is required for display or processing (which doesn't happen
> often in JOSM).
>
That's a good idea. I thought about doing it in osmosis but it was a
bit tricky with the way it is structured at the moment: the XML date
parsing code is very separate from the data classes. I might revisit
this one day.
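For illustration, the lazy approach described above could look roughly like the sketch below. The class name LazyTimestamp and its methods are hypothetical and are not taken from JOSM or osmosis, and it assumes a Java version that provides java.time rather than the parsers discussed in this thread.

```java
import java.time.Instant;
import java.time.OffsetDateTime;

// Hypothetical holder: keeps the timestamp text exactly as read and
// only parses it the first time a caller asks for the parsed value.
final class LazyTimestamp {
    private final String raw;   // verbatim text from the input file
    private Instant parsed;     // null until first requested

    LazyTimestamp(String raw) {
        this.raw = raw;
    }

    // Used when writing the data back out: no parsing, no reformatting.
    String getRaw() {
        return raw;
    }

    // Used only for display or processing: parse once, then cache.
    Instant get() {
        if (parsed == null) {
            parsed = OffsetDateTime.parse(raw).toInstant();
        }
        return parsed;
    }
}
```

With something like this, loading a file costs one string reference per timestamp, and the parsing cost is only paid for the few objects whose timestamp is actually inspected.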
Apologies for hijacking the thread but here's some additional info ...
(my ulterior motive is to ask about the JOSM date parsing patch I posted
recently ;-)
Just on date parsing speed, there are three general "speeds" I've measured
for date parsing. I don't have any hard numbers handy.
1. XMLGregorianCalendar for UTC dates.
2. JOSM date parser for any date format, and XMLGregorianCalendar for
non-UTC dates.
3. Osmosis custom parser, which only handles UTC dates.
They're listed in order of increasing speed. The XMLGregorianCalendar
class is atrocious for UTC dates (with a Z suffix). The JOSM parser is
much faster and is about on a par with XMLGregorianCalendar for
non-UTC dates. The custom parser is much faster again, but it only handles
UTC dates and falls back to XMLGregorianCalendar for non-UTC dates. Each
type is almost an order of magnitude faster than the previous type.
From memory, the osmosis custom parser was close to two orders of
magnitude faster than XMLGregorianCalendar for UTC dates.
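To make the third option concrete, here is a minimal sketch of a fast path for the fixed UTC layout with a fallback to XMLGregorianCalendar for everything else. This is not the actual osmosis code; the class name FastUtcDateParser and its helper methods are made up for the example.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;

// Hypothetical parser: hand-rolled fast path for "yyyy-MM-ddTHH:mm:ssZ",
// XMLGregorianCalendar for any other layout. Not thread-safe, because
// the Calendar instance is reused between calls.
final class FastUtcDateParser {
    private static final int LENGTH = 20; // length of "2007-09-05T06:14:50Z"

    private final Calendar calendar =
            new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    private final DatatypeFactory fallback;

    FastUtcDateParser() {
        try {
            fallback = DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    Date parse(String s) {
        if (isCanonicalUtc(s)) {
            // Fast path: read the digits directly from fixed positions.
            // The calendar is lenient, so field ranges are not validated.
            calendar.clear();
            calendar.set(digits(s, 0, 4), digits(s, 5, 7) - 1, digits(s, 8, 10),
                    digits(s, 11, 13), digits(s, 14, 16), digits(s, 17, 19));
            return calendar.getTime();
        }
        // Slow path: non-UTC offsets, fractional seconds, and so on.
        return fallback.newXMLGregorianCalendar(s).toGregorianCalendar().getTime();
    }

    // True only if every separator and digit is where the fixed layout expects.
    private static boolean isCanonicalUtc(String s) {
        if (s.length() != LENGTH) {
            return false;
        }
        for (int i = 0; i < LENGTH; i++) {
            char c = s.charAt(i);
            switch (i) {
                case 4: case 7:
                    if (c != '-') return false;
                    break;
                case 10:
                    if (c != 'T') return false;
                    break;
                case 13: case 16:
                    if (c != ':') return false;
                    break;
                case 19:
                    if (c != 'Z') return false;
                    break;
                default:
                    if (c < '0' || c > '9') return false;
            }
        }
        return true;
    }

    // Converts the decimal digits in s[from, to) to an int.
    private static int digits(String s, int from, int to) {
        int value = 0;
        for (int i = from; i < to; i++) {
            value = value * 10 + (s.charAt(i) - '0');
        }
        return value;
    }
}
```

The point of the design is that the common case never touches a general-purpose parser: a length check and twenty character comparisons decide whether the fast path applies.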
I've mentioned it elsewhere but I was hoping to standardise on a common
date format that would allow optimisation of the common case.
Osmosis is already writing all dates in the standard UTC format
previously discussed (e.g. "2007-09-05T06:14:50Z"). I spoke to Jon
Burgess about updating the planet dump task to do the same thing; I'm not
sure if he's made any progress on that yet. And I've submitted the
patch for JOSM. If those three can be updated, it only leaves the API to
be updated out of the main OSM-producing tools I'm aware of.
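As a rough sketch of what producing that format involves on the writing side, something like the following would do. The class name UtcDateWriter is hypothetical, and this is not how any of the tools mentioned above are actually implemented.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical writer that always emits "yyyy-MM-ddTHH:mm:ssZ" in UTC.
// Not thread-safe, because SimpleDateFormat is not.
final class UtcDateWriter {
    private final SimpleDateFormat formatter;

    UtcDateWriter() {
        // The trailing 'Z' is a literal, so the time zone must be forced to UTC.
        formatter = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
    }

    String format(Date date) {
        return formatter.format(date);
    }
}
```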
Cheers,
Brett