[OSM-dev] Osmosis enableDateParsing

Brett Henderson brett at bretth.com
Sun Jan 11 21:40:17 GMT 2009


Jochen Topf wrote:
> On Fri, Jan 09, 2009 at 11:31:03PM +1100, Brett Henderson wrote:
>   
>> Jochen Topf wrote:
>>     
>>> On Thu, Jan 08, 2009 at 09:29:50PM +1100, Brett Henderson wrote:
>>>   
>>>       
>>>> Jochen Topf wrote:
>>>>     
>>>>         
>>>>> On Thu, Jan 08, 2009 at 12:06:59PM +1100, Brett Henderson wrote:
>>>>>         
>>>>>           
>>>>>> However it should be very fast now.  I recently implemented a 
>>>>>> change to keep the date as a string internally (I thought I sent 
>>>>>> you an email about this, could be wrong).  If it is written to 
>>>>>> xml again the previous date string is used unchanged.  If it is 
>>>>>> written to something like a database then the string will be 
>>>>>> parsed when it is required.  Even if parsing does occur, the 
>>>>>> custom code used internally is orders of magnitude faster than 
>>>>>> standard java xml date parsing classes.  It is the same code I 
>>>>>> submitted for JOSM though so has been around for some time now.
>>>>>>             
>>>>>>             
>>>>> So can we get rid of the option?
>>>>>         
>>>>>           
>>>> It is still useful if the input file doesn't include timestamps.  I  
>>>> remember somebody using that recently.  I'll add a comment to the 
>>>> wiki to let people know it isn't useful for speed any more.
>>>>     
>>>>         
>>> Shouldn't Osmosis be able to see by itself that there is no timestamp to
>>> parse in that case?
>>>   
>>>       
>> A timestamp is required by most downstream tasks.  Osmosis could detect  
>> a missing timestamp but then it would have to invent one.  I'd prefer to  
>> play it safe than do something unexpected like that.  But I don't care  
>> strongly about it.
>>     
>
> Why invent one? Wouldn't it just be "null"? All jobs, of course, must
> handle this case gracefully. But they should anyway check their input. And
> if they can absolutely not work without a timestamp, they can throw an
> exception.
>
> I don't see what the timestamp parsing option can add. Either there is a
> timestamp or there isn't. The option can't influence that.
>   
We have three cases to deal with:
1. Files that contain timestamps.
2. Files that are erroneously missing timestamps.
3. Files that are missing timestamps because the tool that produced 
them does not provide them.

Case 1 is fully supported using the current implementation.
Case 2 is an error condition and should be treated as such.  The current 
code could be improved here to raise a better error message.  I think it 
results in a NullPointerException at the moment, which is a bug.
Case 3 is the one we're discussing here.
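
For case 2, a rough sketch of what a better error could look like.  This 
is not the osmosis code: MissingTimestampException, TimestampCheck and 
requireTimestamp are names made up for illustration, and java.time is 
used only to keep the example short.

```java
import java.time.Instant;

/** Hypothetical exception type, invented for this sketch (not an osmosis class). */
class MissingTimestampException extends RuntimeException {
    MissingTimestampException(String message) {
        super(message);
    }
}

/** Hypothetical helper showing an explicit check instead of a NullPointerException. */
class TimestampCheck {
    /**
     * Parses the raw timestamp attribute, or fails with a message that
     * names the entity whose timestamp is missing.
     */
    static Instant requireTimestamp(String entityType, long id, String rawTimestamp) {
        if (rawTimestamp == null || rawTimestamp.isEmpty()) {
            throw new MissingTimestampException(
                "The " + entityType + " with id " + id
                + " has no timestamp attribute in the input file.");
        }
        // ISO-8601 UTC form, e.g. 2009-01-11T21:40:17Z
        return Instant.parse(rawTimestamp);
    }
}
```

The point is only that the failure names the offending entity rather 
than surfacing as a bare null dereference somewhere downstream.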

The current implementation is simple because it assumes a timestamp is 
mandatory.  If a timestamp is not available but you wish to process the 
file, the way of getting osmosis to manufacture a timestamp for you is 
with the enableDateParsing=false option.  Perhaps the name isn't ideal 
but the option indicates whether the timestamp should be read from the 
input file, or taken from the task instantiation timestamp.
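
To make the behaviour of the option concrete, here is a minimal sketch 
of the decision it controls.  TimestampResolver and its members are 
hypothetical names for illustration, not the actual osmosis classes; 
only the enableDateParsing flag itself comes from the real option.

```java
import java.time.Instant;

/** Hypothetical sketch of choosing where an entity timestamp comes from. */
class TimestampResolver {
    private final boolean enableDateParsing;
    private final Instant instantiationTime;

    /**
     * @param enableDateParsing true to read timestamps from the input file
     * @param instantiationTime the time the task was created, used when parsing is disabled
     */
    TimestampResolver(boolean enableDateParsing, Instant instantiationTime) {
        this.enableDateParsing = enableDateParsing;
        this.instantiationTime = instantiationTime;
    }

    /** Returns the timestamp to attach to an entity. */
    Instant resolve(String rawTimestamp) {
        if (!enableDateParsing) {
            // Parsing disabled: every entity gets the same manufactured timestamp.
            return instantiationTime;
        }
        if (rawTimestamp == null) {
            // Parsing enabled: the timestamp is mandatory, so fail clearly.
            throw new IllegalArgumentException("Entity is missing a timestamp.");
        }
        return Instant.parse(rawTimestamp);
    }
}
```

With parsing disabled the input value is never looked at, so a file 
without timestamps passes through and every entity carries the same 
instant.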

We can go down the path of making timestamp optional throughout osmosis 
but I'd question its usefulness.  Currently 99(.9?)% of files out there 
contain a timestamp.  The only ones that don't (perhaps created 
manually) cannot be processed by many tasks without a timestamp being 
added, which is something the existing option provides.  Having to check 
for nulls all over the place is tedious, and a check is easy to miss.  I normally 
make everything mandatory unless there's a good case for making it 
optional.  From memory almost everything supported by osmosis is 
currently mandatory.  The only exception I can think of is the user (and 
uid) attributes, but in that case I use a static OsmUser instance called 
"NONE" rather than using null.
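
For anyone unfamiliar with the approach, this is roughly the shape of 
it.  The sketch uses a hypothetical UserRef class rather than the real 
OsmUser, whose fields and constructors may well differ; the id of -1 and 
the empty name are assumptions chosen for the example.

```java
/** Hypothetical stand-in for a user class with a shared "no user" instance. */
final class UserRef {
    /** Shared sentinel used instead of null when no user is known. */
    static final UserRef NONE = new UserRef(-1, "");

    private final int id;
    private final String name;

    private UserRef(int id, String name) {
        this.id = id;
        this.name = name;
    }

    /** Returns NONE when the input carries no user information at all. */
    static UserRef of(Integer uid, String name) {
        if (uid == null && name == null) {
            return NONE;
        }
        return new UserRef(uid == null ? -1 : uid, name == null ? "" : name);
    }

    int getId() {
        return id;
    }

    String getName() {
        return name;
    }
}
```

Callers can then use getName() without a null check, and can compare 
against UserRef.NONE when they need to know whether a user was present.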

So allowing null and handling it appropriately will work.  If you or 
somebody else is willing to make the change then I'm okay with it.  I 
just think it's a waste of time.

Having said all that, if timestamps are not going to be provided by all 
tools in the 0.6 timeframe then I'll take back everything I've said :-)  Is 
JOSM always adding a timestamp now that we have version numbers?

Brett





