<meta charset="utf-8"><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); ">Paul Houle said: </span><div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); "><br>
> That said, my new strategy for dealing with "large dump files" is<br>> to cut the file into segments (like 'split') and recompress the<br>> fragments. If your processing chain allows it, this can be a powerful<br>
> way to get a concurrency speedup. If more dump files were published in<br>> this format, we could get the benefits of "parallel compression"<br>> without the cost.</span></div>
<div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); "><br></span></div><div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); ">This reminds me of an <a href="http://users.softlab.ece.ntua.gr/~ttsiod/buildWikipediaOffline.html">excellent solution to a similar problem</a> that may be applicable to working efficiently with the planet.osm file. It comes from handling the similarly sized English-language Wikipedia dump, which is distributed as a bzip2 file. </span></div>
<div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); "><br></span></div><div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); ">Basically, you split the file into chunks, as you've already done, but in addition you build an index that records the first complete entry in each chunk. That gives you effectively O(1) lookups into a huge compressed file: consult the index, then decompress only the chunk you need. Piping the output of bzcat to osmarender, by contrast, means decompressing and scanning the entire file every time. For Wikipedia, at least, the entries are self-contained and in alphabetical order, so this works. It's a great idea, and it allows a really fast offline Wikipedia reader built entirely from open source tools. Conceivably someone could adapt it to work with the planet.osm file more quickly. </span></div>
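<div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); "><br></span></div><div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); ">To make the idea concrete, here is a rough Python sketch of the chunk-plus-index approach. It is not the code from the linked article: the function names build_chunks and lookup are made up for illustration, and it assumes the entries are already sorted by title, contain no tabs or newlines, and are split on entry boundaries rather than at arbitrary byte offsets.</span></div>

```python
import bisect
import bz2


def build_chunks(entries, per_chunk):
    """Compress sorted (title, text) pairs into independent bz2 chunks.

    Returns (chunks, index), where index[i] is the first title stored
    in chunks[i]. Hypothetical helper, for illustration only.
    """
    chunks, index = [], []
    for start in range(0, len(entries), per_chunk):
        batch = entries[start:start + per_chunk]
        index.append(batch[0][0])  # first complete entry of this chunk
        payload = "".join(f"{title}\t{text}\n" for title, text in batch)
        chunks.append(bz2.compress(payload.encode("utf-8")))
    return chunks, index


def lookup(chunks, index, title):
    """Return the text stored under title, or None if it is absent."""
    # Pick the last chunk whose first title sorts at or before the target.
    pos = bisect.bisect_right(index, title) - 1
    if pos == -1:
        return None  # target sorts before every indexed title
    # Only this one chunk is decompressed; the others are never touched.
    for line in bz2.decompress(chunks[pos]).decode("utf-8").splitlines():
        key, _, text = line.partition("\t")
        if key == title:
            return text
    return None
```

<div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); ">Strictly speaking, the index search here is a binary search rather than O(1), but the expensive part, decompression, is limited to a single chunk no matter how large the whole dump is.</span></div>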
<div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); "><br></span></div><div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); ">Now, there are probably several big reasons the concept wouldn't work with the planet.osm file. I don't know a thing about its internal organization, so I can't say...</span></div>
<div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); "><br></span></div><div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); ">But perhaps there's some amount of data locality that can be exploited to make this work. If there's at least one type of information we can use to seek through the file and locate, say, a country or a boundary of some sort, then it could be possible.</span></div>
<div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); "><br></span></div><div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); ">Your post reminded me of the Wikipedia dump solution, so I thought I'd mention it. </span></div>
<div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); "><br></span></div><div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); ">Regards,</span></div>
<div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: rgb(34, 34, 34); ">-DC</span></div>