[josm-dev] Date Parsing

Brett Henderson brett at bretth.com
Fri Oct 12 07:49:01 BST 2007


Frederik Ramm wrote:
> Hi,
>
>    I was trying to find out why the preparation of downloaded data
> suddenly took so long. Suddenly I remembered something that Brett
> Henderson had said about osmosis becoming unbearably slow when trying
> to parse dates, and true: I was able to cut the time it required to
> open my sample file from 16 seconds to 2 seconds just by commenting
> out date parsing!
>
> I have now settled on a method where the date is stored verbatim and
> written out exactly like it was read, and only parsed when the
> timestamp is required for display or processing (which doesn't happen
> often in JOSM).
>   
That's a good idea.  I thought about doing it in osmosis, but it was a
bit tricky with the way it is structured at the moment: the XML date
parsing code is very separate from the data classes.  I might revisit
this one day.
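
For illustration, the "store verbatim, parse only when needed" idea could
look roughly like the sketch below.  This is not the JOSM code; the class
name LazyTimestamp and its methods are hypothetical, and java.time is
used here purely as a convenient stand-in parser.

```java
import java.time.Instant;
import java.time.OffsetDateTime;

/** Hypothetical holder: keeps the timestamp text as read, parses on first use. */
final class LazyTimestamp {
    private final String raw;   // exactly as it appeared in the input
    private Instant parsed;     // stays null until somebody asks for it

    LazyTimestamp(String raw) {
        this.raw = raw;
    }

    /** Text to write back out unchanged; no parsing involved. */
    String raw() {
        return raw;
    }

    /** Parses on the first call only; later calls reuse the cached value. */
    Instant instant() {
        if (parsed == null) {
            // Accepts both "Z" and explicit offsets such as "+02:00".
            parsed = OffsetDateTime.parse(raw).toInstant();
        }
        return parsed;
    }

    /** True once the text has actually been parsed. */
    boolean isParsed() {
        return parsed != null;
    }
}
```

Loading and saving a file would then only ever touch raw(), so the parsing
cost is paid only for the few objects whose timestamp is actually displayed
or processed.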

Apologies for hijacking the thread but here's some additional info ... 
(my ulterior motive is to ask about the JOSM date parsing patch I posted 
recently ;-)

Just on date parsing speed, there are three general "speeds" I've
measured.  I don't have any hard numbers handy.
1. XmlGregorianCalendar for UTC dates.
2. JOSM date parser for any date format, and XmlGregorianCalendar for 
non-UTC dates.
3. Osmosis custom parser which only handles UTC dates.

They're listed from slowest to fastest.  The XmlGregorianCalendar
class is atrocious for UTC dates (with a Z suffix).  The JOSM parser is
much faster, roughly as fast as XmlGregorianCalendar is on non-UTC
dates.  The osmosis custom parser is much faster again, but it only
handles UTC dates and falls back to XmlGregorianCalendar for non-UTC
dates.  Each type is almost an order of magnitude faster than the
previous one.  From memory, the osmosis custom parser was close to two
orders of magnitude faster than XmlGregorianCalendar for UTC dates.
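
As a rough sketch of the third approach (a hand-written fast path for the
fixed UTC format, with a fallback for everything else), something like the
following would do.  It is not the actual osmosis parser; the class
FastUtcDateParser and its method names are made up for this example.  The
JDK class is spelled XMLGregorianCalendar and is obtained through
javax.xml.datatype.DatatypeFactory.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;

/** Hypothetical two-tier parser; not the osmosis or JOSM code. */
final class FastUtcDateParser {
    // 'd' marks a digit position; every other character must match exactly.
    private static final String TEMPLATE = "dddd-dd-ddTdd:dd:ddZ";

    private static final DatatypeFactory FACTORY;
    static {
        try {
            FACTORY = DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Returns milliseconds since the epoch for an XML dateTime string. */
    static long parseMillis(String s) {
        if (hasFixedUtcShape(s)) {
            // Fast path: read the fields straight out of fixed positions.
            // LocalDateTime.of rejects impossible values such as month 13.
            return LocalDateTime.of(
                    digits(s, 0, 4), digits(s, 5, 7), digits(s, 8, 10),
                    digits(s, 11, 13), digits(s, 14, 16), digits(s, 17, 19))
                    .toEpochSecond(ZoneOffset.UTC) * 1000L;
        }
        // Slow path: anything else (offsets, fractions) goes to the XML parser.
        return FACTORY.newXMLGregorianCalendar(s)
                .toGregorianCalendar().getTimeInMillis();
    }

    private static boolean hasFixedUtcShape(String s) {
        if (s.length() != TEMPLATE.length()) {
            return false;
        }
        for (int i = 0; i < TEMPLATE.length(); i++) {
            char expected = TEMPLATE.charAt(i);
            char actual = s.charAt(i);
            if (expected == 'd') {
                if (actual < '0' || actual > '9') {
                    return false;
                }
            } else if (actual != expected) {
                return false;
            }
        }
        return true;
    }

    /** Converts already-validated digits in [from, to) to an int. */
    private static int digits(String s, int from, int to) {
        int value = 0;
        for (int i = from; i < to; i++) {
            value = value * 10 + (s.charAt(i) - '0');
        }
        return value;
    }
}
```

The fast path never allocates a calendar object for the common case, which
is the kind of optimisation a single agreed-upon format makes possible.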

I've mentioned it elsewhere but I was hoping to standardise on a common 
date format that would allow optimisation of the common case.

Osmosis is already writing all dates in the standard UTC format
previously discussed (e.g. "2007-09-05T06:14:50Z").  I spoke to Jon
Burgess about updating the planet dump task to do the same thing; I'm
not sure whether he's made any progress on that yet.  And I've submitted
the patch for JOSM.  If those three can be updated, that leaves only the
API to be updated out of the main OSM-producing tools I'm aware of.
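
For reference, producing that format from Java might be sketched as
follows.  The class name UtcDateWriter is hypothetical and this is not
taken from osmosis or the JOSM patch; it simply assumes whole-second
precision, as in the example string above.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical writer for the fixed "yyyy-MM-ddTHH:mm:ssZ" UTC format. */
final class UtcDateWriter {
    // The literal 'Z' is safe because the formatter is pinned to UTC.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss'Z'")
                    .withZone(ZoneOffset.UTC);

    /** Formats the instant in UTC; fractional seconds are dropped. */
    static String format(Instant instant) {
        return FORMAT.format(instant);
    }
}
```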

Cheers,
Brett




