[OSM-dev] 0.6 move and downtime (re-scheduled)

Stefan de Konink stefan at konink.de
Fri Mar 13 02:37:50 GMT 2009


Grant Slater wrote:
> Large imports in the pipeline.

Partitioning is a scalable solution to that, not buying new hardware.

>> Now it is nice that you put 32GB of (extra expensive) memory in there, but 
>> most likely your hot performance would be far better with more (cheap) 
>> memory than with more disks. At the time I wrote my paper on OSM in Dec 2008 
>> there was about 72GB of CSV data. Thus with, let's say, 128GB you will 
>> have your entire database *IN MEMORY*, no fast disks required.
> 
> Indexes in memory, not data.

The current academic community disagrees with you on that. The data ends 
up in your memory anyway, because the processor cannot operate on the 
disk directly: whatever a query touches has to be pulled into RAM first. 
So whether you call it a block cache, mmap-based loading, or malloc-based 
indexing, it is all the same thing, and buying disks instead of memory is 
investing in the wrong bottleneck.
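
(To make that concrete, here is a rough sketch of my own, not code from 
either setup; "nodes.dat" is just a placeholder. Whichever access path 
you pick, the bytes are served from RAM:)

    /* Minimal sketch: whether you mmap() a data file or read() it into a
     * malloc()ed buffer, the pages end up in RAM either way -- the kernel
     * page cache backs both paths. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("nodes.dat", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        fstat(fd, &st);

        /* Variant 1: mmap -- pages are faulted into the page cache on access. */
        char *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        /* Variant 2: malloc + read -- the same bytes now sit in RAM twice:
         * once in the page cache and once in the private buffer.
         * (Single read() for brevity; a short read is possible.) */
        char *buf = malloc(st.st_size);
        if (buf && read(fd, buf, st.st_size) < 0) perror("read");

        /* Either way the working set has to fit in memory to be fast;
         * the disks only matter for the part that does not. */
        munmap(map, st.st_size);
        free(buf);
        close(fd);
        return 0;
    }

The second variant is only there to show that copying into a private 
buffer does not change where the data lives: the cheapest way to make the 
hot set fast is simply enough RAM to hold it.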

> Yes; 32GB is intentionally low for the price point, with lots of room to 
> expand. 8GB DDR2-ECC is not yet available. (It will also need to climb 
> off the silly price shelf.)
> Currently the DB is 344GB with indexes, excluding binary logs.

Wow... (serious wow) I have never seen a database expanded THAT much 
unless I was using an XML database.

>> ...or are you actually moving from your current OS to Solaris to utilize 
>> those 10 disks for your, let's say, less than 100G worth of geodata, using 
>> them as duplicates in the pool [as opposed to integrity duplicates]?
> 
> Solaris ZFS right?

Zpools yes.

> We are going with Linux because it's within our current skillset.
> RAID10 has extremely high read and write performance, ideally suited to 
> our database load.
> I'll look again at RAID5E and RAID6 (effectively pool duplicates with 
> entire-array integrity), but they will likely again be discarded due to 
> slow write performance on small blocks.

You are missing the point: with RAID you cannot move the disk head any 
faster, you can only LOAD a contiguous set of data at a higher rate. That 
is not your typical database access pattern, unless you keep the complete 
database on disk and scan through it on every query. It is not the case 
when you have indices and need to fetch rows from random places on the 
disk. Seek times only go down if each disk can seek independently for its 
own requests, not as a combined, striped pair.
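
(Again a rough sketch of my own rather than a measurement of your array; 
"testfile" and the sizes are placeholders:)

    /* Time a batch of random 8 KiB reads against a sequential scan of the
     * same total volume.  Striping (RAID10) raises sequential bandwidth;
     * every random read still pays a full seek.  Run it on a cold cache
     * (or with O_DIRECT) to see the disks rather than the page cache. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    #define BLOCK  8192
    #define NREADS 1024

    static double now(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        int fd = open("testfile", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        off_t size = lseek(fd, 0, SEEK_END);
        if (size < NREADS * BLOCK) { fprintf(stderr, "file too small\n"); return 1; }

        char buf[BLOCK];

        /* Random reads: each one costs roughly a seek plus rotational delay. */
        double t0 = now();
        for (int i = 0; i < NREADS; i++) {
            off_t off = (rand() % (size / BLOCK)) * BLOCK;
            pread(fd, buf, BLOCK, off);
        }
        double t_random = now() - t0;

        /* Sequential reads of the same volume: limited by bandwidth,
         * which striping does improve. */
        t0 = now();
        for (int i = 0; i < NREADS; i++)
            pread(fd, buf, BLOCK, (off_t)i * BLOCK);
        double t_seq = now() - t0;

        printf("random: %.3fs  sequential: %.3fs\n", t_random, t_seq);
        close(fd);
        return 0;
    }

On a striped array the sequential number improves roughly with the number 
of spindles, while the random number barely moves; the random case is the 
seek-bound index lookup I mean above.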


Stefan



