<br><br><div class="gmail_quote">2009/10/23 Peter Childs <span dir="ltr"><<a href="mailto:pchilds@bcs.org">pchilds@bcs.org</a>></span><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

I'm looking to set up a local mirror of the OSM data so I can index it and work out some wonderful new way of searching it, etc.<div><br></div><div>Anyway, what's the best way to set this up?</div><div><br></div>


<div>I was looking at taking the planet.osm, possibly with diffs later, and throwing it at a SAX parser and then into a database.</div><div><br></div><div>I did consider using Osmosis, but it's too slow, and I'm thinking about soundexing and metaphoning the data as it's imported.</div>


<div><br></div><div>I'm also looking at being able to build a tree (parent/child) structure for areas, but these are only ideas currently.</div><div><br></div><div>Currently I'm importing planet.osm into a Postgres database using Osmosis to see how big it is, but it's been going all night and looks like it's only done about 5%, whereas decompressing the planet takes about 2 hours, so I was expecting it to be done in, say, 6?</div>


<div><br></div><div>Any ideas/help would be most useful.</div><br></blockquote></div><br>Currently, there are only two competing schemas for an OSM database: osmosis and osm2pgsql.<br>A full import takes time, and you will need a machine capable of very high I/O throughput. I don't think there is an easy way to import the data directly into a form that will just work. In addition, Osmosis has a SAX parser option which works very nicely, but you will still be limited by your hardware's I/O performance.<br>
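To give an idea of what the streaming (SAX) approach looks like outside Osmosis, here is a minimal sketch using Python's standard xml.sax module. It is not Osmosis's parser. The NameCollector class, the collect_names helper and the tiny SAMPLE document are made up for illustration; the only thing assumed about the data is the usual OSM XML layout of node/way/relation elements with tag children carrying k and v attributes.<br>

```python
import io
import xml.sax


class NameCollector(xml.sax.ContentHandler):
    """Hypothetical handler: counts OSM primitives and collects name tags."""

    def __init__(self):
        super().__init__()
        self.counts = {"node": 0, "way": 0, "relation": 0}
        self.names = []        # (element type, id, name) tuples
        self._current = None   # primitive currently being read, if any

    def startElement(self, name, attrs):
        if name in self.counts:
            self.counts[name] += 1
            self._current = (name, attrs.get("id"))
        elif name == "tag" and self._current and attrs.get("k") == "name":
            self.names.append((*self._current, attrs.get("v")))

    def endElement(self, name):
        if name in self.counts:
            self._current = None


def collect_names(stream):
    """Stream-parse an OSM XML file object; memory use stays roughly flat."""
    handler = NameCollector()
    xml.sax.parse(stream, handler)
    return handler


# Made-up sample data in the usual OSM XML layout.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
  <node id="1" lat="51.5" lon="-0.1"><tag k="name" v="London"/></node>
  <node id="2" lat="51.6" lon="-0.2"/>
  <way id="10"><nd ref="1"/><nd ref="2"/><tag k="name" v="High Street"/></way>
</osm>"""

if __name__ == "__main__":
    result = collect_names(io.BytesIO(SAMPLE))
    print(result.counts)
    print(result.names)
```

For a real import you would pass an open file (or a decompressing stream) instead of the in-memory sample and write rows to the database in batches rather than keeping a list; disk I/O will still be the bottleneck.<br>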

Personally, I don't think soundexing the data is very interesting, as it is very limited (read: English only). Double metaphone is a better idea, but I suspect you will first want to define the scope of the search you want to support and then expand on it afterwards.<br>
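To illustrate the limitation, here is a small sketch of classic American Soundex in Python. It is only one common variant of the algorithm, written for illustration; the soundex function name and the choice to drop every character outside A-Z are assumptions of this sketch, not something any OSM tool does.<br>

```python
def soundex(name: str) -> str:
    """Classic American Soundex (illustrative sketch; only A-Z is considered)."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    # Anything outside plain A-Z (accents, other scripts) is silently dropped.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue              # H and W do not separate equal codes
        code = codes.get(ch, "")  # vowels map to "" and reset the run
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]   # pad or truncate to letter + 3 digits


if __name__ == "__main__":
    for word in ("Robert", "Rupert", "München", "Munchen", "Москва"):
        print(word, repr(soundex(word)))
```

With this sketch, Robert and Rupert share a code, as intended for English names, but München and Munchen come out with different codes, and a name written in Cyrillic collapses to an empty string. For a worldwide dataset that is a serious gap.<br>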

Working on the full planet isn't going to be easy, since it is so huge. You may want to restrict yourself to a single smaller country such as the UK. In addition, if you want to perform meaningful searches, you will probably need your own database schema. The work that Brian Quinion is doing is absolutely brilliant from that point of view.<br>

<br>Emilie Laffray<br>