Hi Simon,<br><br>Sorry for the really slow reply on this.  I was kinda hoping somebody else with more knowledge would step in and answer your questions.  I'm not very familiar with the various filtering tasks ...<br><br>

Some comments below.<br><br><div class="gmail_quote">On Wed, Jul 7, 2010 at 3:41 AM, Simon Nuttall <span dir="ltr">&lt;<a href="mailto:simon.nuttall@gmail.com">simon.nuttall@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

We use osmosis to merge geofabrik's UK and Ireland extracts to fill a<br>

database from which we build the CycleStreets.net routing structures.<br>

<br>

osmosis<br>

 -v 100<br>

 --read-xml data/osm/downloads/great_britain.osm enableDateParsing=no<br>

 --sort-0.6 type="TypeThenId"<br>

 --read-xml data/osm/downloads/ireland.osm enableDateParsing=no<br>

 --sort-0.6 type="TypeThenId"<br>

 --merge --write-apidb dbType=mysql populateCurrentTables=no<br>

host=localhost database=britainOSM user=import password=xxx<br>

validateSchemaVersion=no<br>

<br>

That currently takes about 1.5 hours to run, building a 4.5 GB database.<br>

<br>

1. I have just discovered the keyValueList option.<br>

<br>

I could use this as a long list of  e.g.<br>

"highway.residential,highway.unclassified,highway.primary,...cycleway.opposite,access.yes...,foot...,bicycle..."<br>

pairs to filter the ways. I'd estimate there would be about 50 pairs<br>

in the list.<br>

<br>

My question here is would that speed up or slow down the osmosis?<br></blockquote><div><br>If you're writing all output to a database I suspect it would speed the process up, simply because less data reaches the database.  The --node-key-value and --way-key-value tasks are both very simple in implementation and do a HashMap lookup to check whether each node/way should be included.  It could possibly be made more efficient through the use of a Patricia trie index or somesuch, but I suspect it will be fairly fast anyway.  You may have to write out node and way data to temp files and then merge them together afterwards, because each --xxxx-key-value task only passes through the type of data it cares about.  These tasks don't support wildcards.<br>
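<br>To make the lookup idea concrete, here is a rough Python sketch of a key.value membership test.  It is not the Osmosis code: the function names and the sample ways are made up for illustration, and I'm assuming each way's tags are available as a simple key/value mapping.<br>

```python
# Hypothetical sketch of a key.value membership filter; this is not Osmosis code.

def parse_key_value_list(spec):
    """Turn "highway.primary,cycleway.opposite" into a set of (key, value) pairs."""
    pairs = set()
    for item in spec.split(","):
        # Split on the first dot only, so the value may itself contain dots.
        key, _, value = item.strip().partition(".")
        pairs.add((key, value))
    return pairs


def way_matches(tags, wanted):
    """Return True if at least one tag on the way is in the wanted set."""
    # Each membership test is a single hash lookup, independent of list length.
    return any((key, value) in wanted for key, value in tags.items())


wanted = parse_key_value_list("highway.residential,highway.primary,cycleway.opposite")

# Made-up sample ways, purely for illustration.
ways = [
    {"id": 1, "tags": {"highway": "primary", "name": "High Street"}},
    {"id": 2, "tags": {"building": "yes"}},
    {"id": 3, "tags": {"highway": "service", "cycleway": "opposite"}},
]

kept = [way["id"] for way in ways if way_matches(way["tags"], wanted)]
print(kept)
```

With a set like this, a list of 50 or so pairs costs about the same per way as a list of 5, which is why I wouldn't expect the filter itself to be the bottleneck.<br>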

<br>You may wish to check out the --tag-filter task, which is much more flexible.  I believe you'll be able to use wildcards as you require, and it can operate on a single stream rather than requiring temporary files as the --xxxx-key-value tasks do.  I have no idea how well it performs, but being able to use wildcards should help somewhat.<br>
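<br>As an illustration of what wildcard matching buys you, here is a small Python sketch.  It does not reproduce the real --tag-filter syntax or implementation; the rule format (a key mapped to a set of allowed values, with "*" meaning any value) is an assumption made only for this example.<br>

```python
# Hypothetical sketch of wildcard tag matching; it does not mirror the
# real --tag-filter syntax or implementation.

def tag_accepted(key, value, rules):
    """Accept a tag if its key has a rule and the value is listed or wildcarded."""
    allowed = rules.get(key)
    if allowed is None:
        return False
    return "*" in allowed or value in allowed


def way_accepted(tags, rules):
    """Accept a way if any one of its tags is accepted."""
    return any(tag_accepted(key, value, rules) for key, value in tags.items())


# Assumed rule format: key -> set of allowed values, "*" meaning any value.
rules = {
    "cycleway": {"*"},
    "highway": {"residential", "primary"},
}

print(way_accepted({"cycleway": "lane"}, rules))
print(way_accepted({"highway": "motorway"}, rules))
```

A single wildcard rule such as the cycleway one stands in for every cycleway.something pair you would otherwise have to list by hand.<br>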

 </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

<br>

Is there any scope for wildcards in that list, e.g. cycleway.* ?<br></blockquote><div><br>Yes, using the --tag-filter task.<br> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">


<br>

<br>

2.  Duplicate ways / multiple versions of nodes and ways.<br>

<br>

The processing we do in CycleStreets after this osmosis detects<br>

hundreds of duplicate ways and a few tens of duplicate nodes.<br>

<br>

Are there osmosis options for just catching the latest versions of the<br>

nodes and ways?<br>

(I have to say I don't really understand this area very well.)<br></blockquote><div><br>I suspect your input files also have these duplicates.  Is that the case?<br><br>There is no task for fixing files that have multiple versions, but there is a trick using the --simplify-change task, which I'll have to dig up ...<br>
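<br>In the meantime, this is roughly what "keep only the latest version" means, sketched in Python.  It is not the --simplify-change trick and not Osmosis code; it assumes each entity carries a type, an id and a numeric version, and the sample records are invented.<br>

```python
# Hypothetical sketch: keep the highest version of each (type, id) pair.
# This is not the --simplify-change trick and not Osmosis code.

def latest_versions(entities):
    """Return one entity per (type, id), choosing the highest version."""
    latest = {}
    for entity in entities:
        ident = (entity["type"], entity["id"])
        current = latest.get(ident)
        if current is None or entity["version"] > current["version"]:
            latest[ident] = entity
    # Dicts keep insertion order, so results follow first appearance.
    return list(latest.values())


# Invented sample records, purely for illustration.
entities = [
    {"type": "node", "id": 1, "version": 1},
    {"type": "node", "id": 1, "version": 2},
    {"type": "way", "id": 5, "version": 3},
    {"type": "way", "id": 5, "version": 2},
]

deduped = latest_versions(entities)
print(deduped)
```

Running something like this over your input files would also be a quick way to confirm whether the duplicates are already present before osmosis touches them.<br>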

</div></div><br>Brett<br><br>