[osmosis-dev] preprocessing for turn-restrictions using an osmosis-plugin

Thu Feb 19 12:26:20 GMT 2009

marcus.wolschon at googlemail.com wrote:
> On Thu, 19 Feb 2009 19:32:18 +1100, Brett Henderson <brett at bretth.com>
> wrote:
>   
>> I'd suggest that we extend the existing Dataset support to expose write 
>> methods.  The other option is that we leave the existing one as it is 
>> and provide a new one that extends the current one to provide write 
>> access (allowing some implementations to support read-only access only) 
>> but it's probably just creating extra work for not much benefit.  If an 
>> implementation doesn't support updates it can throw a meaningful error 
>> message saying write access isn't supported.
>>     
>
> Both ways are fine with me. However for the second one I suggest
> a method "isWriteSupported()".
>
>   
>> Existing tasks such as --read-pgsql which don't actually read but just 
>> expose a read-only database handle (the Dataset) to downstream tasks 
>> would need to be renamed.  Perhaps --get-pgsql-dataset?
>>     
>
> As --read-pgsql does not provide an entity-stream I guess it was
> misnamed from the start. What about --access-pgsql ? 
>   
I don't think it sounds any clearer to be honest :-)  But I'm not too 
fussed about the name for now.
>> The existing Dataset interface exposes the method "DatasetReader 
>> createReader();" and the DatasetReader provides a bunch of methods for 
>> reading from the db.  This might have to be re-factored somewhat.  The 
>> DatasetReader interface might be more appropriately called 
>> DatasetConnection.  The Dataset.createReader method might be more 
>> appropriately called createConnection.  Alternative name could be 
>> DatasetContext and createContext which is more generic than using the 
>> word "Connection" which implies an external database but I'm not too
>> fussy.
>>     
>
> Yes, a rename sounds useful. It was... quite confusing when I saw
> that for the first time and had to decide which interface I needed
> to actually implement to provide random access from a new data-format.
>
> Maybe: 
> DatasetReader => Dataset or RandomAccessData or Map
> Dataset => DatasetFactory or RandomAccessFactory or MapFactory
> DatasetSink => DatasetClient or RandomAccessClient or MapAccessClient
>   as the task need not be a sink for data coming from the Dataset;
>   it may instead be a source for data going into it, or both.
> with the tasks named:
>  --get-XYZ-dataset or --access-XYZ or --XYZ-map
>   
Okay, this might sound pedantic but I don't feel all that strongly about 
it.  The Dataset interface is what is passed between tasks.  It 
represents a set of data to be read (and now modified).  I'd rather not 
call it DatasetFactory because that would imply that it can be used to 
create datasets, which is not the case; the data already exists (it may 
be empty but it exists).  The issue I ran into when creating this is 
that multiple threads may access the Dataset concurrently (I initially 
created it to extract many bboxes concurrently).  Where the data is 
backed by a database that means each thread requires its own 
connection.  So each task must instantiate a connection when it starts 
processing, and close it when it is done.  The DatasetReader provides 
the means to hold that connection-specific context.  The DatasetReader 
name is no longer appropriate but I thought something equivalent would 
be okay.  Is it really that confusing?
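
A minimal sketch of the per-thread context idea described above: one shared dataset handle, with each worker thread opening its own context and closing it when done. All type names here (SketchDataset, SketchContext, CountingDataset) are hypothetical and are not the actual Osmosis interfaces; the counting implementation only stands in for a backend that would open a real database connection, and isWriteSupported() is included only because it was suggested earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these types are the real Osmosis interfaces.
class DatasetContextDemo {

    /** The handle passed between tasks; it represents data that already exists. */
    interface SketchDataset {
        SketchContext createContext();

        boolean isWriteSupported();
    }

    /** Per-thread state, e.g. one database connection per context. */
    interface SketchContext extends AutoCloseable {
        String describe();

        @Override
        void close();
    }

    /** Stand-in backend that only counts how many contexts were opened and closed. */
    static final class CountingDataset implements SketchDataset {
        private final AtomicInteger opened = new AtomicInteger();
        private final AtomicInteger closed = new AtomicInteger();

        @Override
        public SketchContext createContext() {
            final int id = opened.incrementAndGet();
            return new SketchContext() {
                @Override
                public String describe() {
                    return "context-" + id;
                }

                @Override
                public void close() {
                    closed.incrementAndGet();
                }
            };
        }

        @Override
        public boolean isWriteSupported() {
            return false; // this stand-in is read-only
        }

        int openedCount() {
            return opened.get();
        }

        int closedCount() {
            return closed.get();
        }
    }

    /** Starts the given number of threads; each opens and closes its own context. */
    static int[] runWorkers(int workers) {
        CountingDataset dataset = new CountingDataset();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                // try-with-resources closes the context when the work is done
                try (SketchContext ctx = dataset.createContext()) {
                    ctx.describe();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] {dataset.openedCount(), dataset.closedCount()};
    }

    public static void main(String[] args) {
        int[] counts = runWorkers(4);
        System.out.println("opened=" + counts[0] + " closed=" + counts[1]);
    }
}
```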

As for the DatasetSink name, I merely named it that to align with the 
rest of the Sink/Source model.  I like the sound of DatasetClient.

What does "Map" in some of the above names represent?  Map as in 
OpenStreetMap, or Map as in HashMap?
>
> As for the interface-change, I am currently using this:
> http://travelingsales.svn.sourceforge.net/viewvc/travelingsales/libosm/src/org/openstreetmap/osm/data/IDataSet.java?view=markup
> For OsmBin it has added methods
> getRelationsForWay(), getRelationsForNode(), getRelationsForRelation()
> instead of just getWaysForNode() .
> There is no getAllNodes() or getAllWays(), instead it uses
> getNodes(Bounds.World),
> however, many backends do not scale with that method. (I did not implement
> them with streaming of large areas from the database in mind.)
>   
Those methods all sound good.
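
For illustration, a reverse lookup of the kind these getRelationsFor...() methods imply could be backed by a simple in-memory index. This is only a sketch under assumptions: entities are reduced to plain long ids, and the class name RelationIndexSketch and the addMember() method are hypothetical. It is not the IDataSet implementation linked above.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory reverse index from a member to the relations that contain it.
class RelationIndexSketch {

    enum MemberType { NODE, WAY, RELATION }

    // member type -> member id -> ids of the relations referencing that member
    private final Map<MemberType, Map<Long, Set<Long>>> index =
            new EnumMap<>(MemberType.class);

    /** Records that the given relation has the given member. */
    void addMember(long relationId, MemberType type, long memberId) {
        index.computeIfAbsent(type, t -> new HashMap<>())
             .computeIfAbsent(memberId, id -> new TreeSet<>())
             .add(relationId);
    }

    private Set<Long> lookup(MemberType type, long memberId) {
        Map<Long, Set<Long>> byMember =
                index.getOrDefault(type, Collections.emptyMap());
        Set<Long> relations =
                byMember.getOrDefault(memberId, Collections.emptySet());
        return Collections.unmodifiableSet(relations);
    }

    Set<Long> getRelationsForNode(long nodeId) {
        return lookup(MemberType.NODE, nodeId);
    }

    Set<Long> getRelationsForWay(long wayId) {
        return lookup(MemberType.WAY, wayId);
    }

    Set<Long> getRelationsForRelation(long relationId) {
        return lookup(MemberType.RELATION, relationId);
    }

    public static void main(String[] args) {
        RelationIndexSketch idx = new RelationIndexSketch();
        // A turn restriction (relation 100) from way 1 via node 2 to way 3.
        idx.addMember(100L, MemberType.WAY, 1L);
        idx.addMember(100L, MemberType.NODE, 2L);
        idx.addMember(100L, MemberType.WAY, 3L);
        System.out.println(idx.getRelationsForWay(1L));
        System.out.println(idx.getRelationsForNode(2L));
    }
}
```
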
> If your new interface is not completely different I can provide you
> with tested implementations of random access to 
> * in-memory data (good for temporary data and unit-tests)
> * osm-xml-files  (could prove very useful)
> * osm-xml-files split by tile-number with hsqldb for an id->tile index
> * osmbin with 3 additional lower-detail maps that are generated and updated
>   on the fly (native format for Traveling Salesman)
> as well as less tested code for
> * hsqldb
> * mysql
> where you may borrow code to implement the update-functionality
> for your existing mysql and postgis datasets.
>   
This also sounds great.  Once we get the core in place we can look at 
adding some of these.
>> Your task could then implement both the DatasetSink and Sink interfaces 
>> allowing it to receive an incoming entity stream, and a handle to a 
>> database.  The command line would look something like:
>> osmosis --read-xml myfile.osm --get-pgsql-dataset 
>> dbAuthFile=authFile.txt --induce-ways-for-turn-restrictions.
>>
>> What do you think?
>>     
>
> Sounds very good.
> Is there anything I can help with?
>   
You can code it if you like :-)  Do you have svn access?
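
As a starting point, here is a rough sketch of the shape such a task could take: one class that receives both an entity stream and a dataset handle. The interfaces below are simplified, hypothetical stand-ins (entities are plain strings) and are not the real Osmosis Sink and DatasetSink. Buffering the entities and writing them when the stream completes is only an assumption, made so the example does not depend on which input arrives first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a task with two inputs; not the real Osmosis interfaces.
class DualInputTaskSketch {

    /** Stand-in for a sink that receives an entity stream. */
    interface EntitySinkSketch {
        void process(String entity);

        void complete();
    }

    /** Stand-in for a writeable dataset handle. */
    interface DatasetHandleSketch {
        void store(String entity);
    }

    /** Stand-in for a task that is handed a dataset rather than a stream. */
    interface DatasetClientSketch {
        void process(DatasetHandleSketch dataset);
    }

    /** Receives both inputs and writes the buffered entities on completion. */
    static final class DualInputTask implements EntitySinkSketch, DatasetClientSketch {
        private final List<String> buffered = new ArrayList<>();
        private DatasetHandleSketch dataset;

        @Override
        public void process(DatasetHandleSketch dataset) {
            this.dataset = dataset;
        }

        @Override
        public void process(String entity) {
            buffered.add(entity);
        }

        @Override
        public void complete() {
            if (dataset == null) {
                throw new IllegalStateException("no dataset handle was received");
            }
            for (String entity : buffered) {
                dataset.store(entity);
            }
            buffered.clear();
        }
    }

    public static void main(String[] args) {
        List<String> stored = new ArrayList<>();
        DatasetHandleSketch handle = stored::add;

        DualInputTask task = new DualInputTask();
        task.process("way 1");   // entity stream input
        task.process(handle);    // dataset handle input
        task.process("way 2");
        task.complete();
        System.out.println(stored);
    }
}
```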

I haven't been doing much lately other than writing emails.  I still 
haven't done the writeable entity stuff although that shouldn't take 
long once I get into it.  I want to get that done first, then I might 
have time for this writeable Dataset stuff.  If you want to make a start 
though it would be great.  The days of me having time to do it all 
myself appear to be over.

Brett