On Wed, Jan 7, 2009 at 2:28 PM, Brett Henderson <span dir="ltr">&lt;<a href="mailto:brett@bretth.com">brett@bretth.com</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="Ih2E3d">Karl Newman wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Heh, I was thinking along those lines, too, but I thought I'd wait until your format stabilizes. It would be quite handy to have a random-access data source for Osmosis without requiring a database installation.<br>

</blockquote></div>

I'm not sure if you realise this but there already is one of those.  I can't reach the wiki at the moment but the tasks in question are:<br>

--write-customdb<br>

--read-customdb<br>

<br>

I forget the details of the on-disk format, but it's built around the Osmosis store classes, so it uses Osmosis-specific object serialisation.<br>

<br>

The --read-customdb task doesn't actually stream data to downstream tasks; instead it provides the data in the form of a "dataset" object.  This can be consumed by tasks such as --dataset-dump, which will stream from the dataset, but it can also be consumed by any task wishing to access data randomly.<br>


<br>

Performance isn't great for large datasets, which is why I created the pgsql-simple schema.  I've never been able to get an on-disk format to scale nearly as well as a real database.<br>

<br>

I also wrote a set of tasks based on the Berkeley DB Java Edition but deleted it because performance was even worse.<br><font color="#888888">

<br>

Brett<br>

</font></blockquote></div><br>Actually yes, I was aware of that one, but I recalled you saying its performance was not good, and I thought you had removed it for that reason. I didn't pay close attention; I guess it was the BDB one that you removed instead.<br>

<br>Karl<br>