[Talk-ca] Bodies of seawater in Canada - area definitions
John Whelan
jwhelan0112 at gmail.com
Fri Oct 22 01:47:37 UTC 2021
Really off topic, but a bit of background. We measure disk performance in
milliseconds and memory in nanoseconds, and there are one million
nanoseconds in a millisecond. Basically memory costs more per gig than a
hard disk does. Hard disks rotate, so you wait for the platter to come
around, and if the head has to move there is a cost for that as well.
Traditionally we improved performance by reading in the entire track from
the hard disk rather than waiting for just the bit we wanted, and caching
the parts we didn't immediately need in memory. Typically databases are
not CPU limited; in my experience 3% CPU utilization was quite normal.
The limiting factor was usually memory and disk access.
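
To put rough numbers on that gap, here is a back-of-the-envelope sketch in
Python; the 100 ns and 10 ms figures are only illustrative ballpark values,
not measurements of any particular hardware:

    memory_ns = 100                  # assume ~100 ns for a RAM access
    disk_ms = 10                     # assume ~10 ms for a rotating-disk seek + read
    disk_ns = disk_ms * 1_000_000    # one million nanoseconds in a millisecond
    print(disk_ns / memory_ns)       # 100000.0 -> disk ~100,000x slower here
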
SSDs have improved things; we no longer have to move a head.
TPC.org runs a series of benchmarks, and companies put together hardware
and software to compete in them, including on price/performance. They are
a bit artificial in that the log files are typically turned off and the
code is sometimes optimised for the benchmark, but if you look at the
detail they give a very good idea of the optimal machine configuration.
One of the cheapest price/performance configurations, for example, has
16 x 64 GB memory modules. I'll let you do the maths. The companies use
whatever software they need, so some are happy with Windows, some with
UNIX of one flavour or another. You'll see many different database
products being used, but PostgreSQL isn't mentioned often in the
price/performance charts. It probably lacks a particular feature that the
benchmark tests. That doesn't mean it isn't a good choice for
OpenStreetMap.
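
For anyone not inclined to do the maths by hand, that example
configuration works out to a terabyte of RAM:

    modules = 16
    gb_per_module = 64
    print(modules * gb_per_module)   # 1024 GB, i.e. roughly 1 TB of memory
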
So the secret of price/performance, or even just performance, is
shovelling data into the CPU, and if it comes from memory, so much the
better. The best databases anticipate what the CPU will want next. When
it's time to write to the hard drives they do what is called a lazy
write: the data is shoved into memory and written to the hard drives a
little later on, while the CPU gets on with the next task. The database
remembers that the data is still in memory, so if a request comes in it
reads it from memory rather than going to the hard drive. That's roughly
a million times faster.
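
Very roughly, the lazy-write idea looks like the toy Python sketch below;
the class and its methods are made up for illustration and are not how any
particular database actually implements it:

    # Toy write-back ("lazy write") cache; illustrative only.
    class LazyCache:
        def __init__(self, storage):
            self.storage = storage    # slow backing store, i.e. the "hard drive"
            self.dirty = {}           # data held in memory, not yet flushed

        def write(self, key, value):
            self.dirty[key] = value   # acknowledge now, write to disk later

        def read(self, key):
            if key in self.dirty:     # still in memory? serve it from there
                return self.dirty[key]
            return self.storage[key]  # otherwise go down to the slow store

        def flush(self):
            self.storage.update(self.dirty)   # the "little later on"
            self.dirty.clear()

Used like this, a read that arrives before the flush never touches the
slow store:

    disk = {}
    cache = LazyCache(disk)
    cache.write("node:1", "highway=residential")
    print(cache.read("node:1"))       # served straight from memory
    cache.flush()                     # only now does it reach "disk"
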
Caching works on reads as well. Parts of the database will be accessed
more often than others. For OSM we can expect fewer mappers active at
3 am local time than at, say, 5 pm. So at 3 am we leave that bit of the
database on the hard drive and keep another bit in memory; at 5 pm we
keep the local bit in memory. This is very much a simplification, but the
basic idea holds true. Typically we hope to find 80% of the data we're
after already in memory, so 80% of requests are served at memory speed,
at perhaps 5% of the cost of a pure in-memory system, and that is where
the database software pays for itself. We can also gain performance with
a cluster, where the database is spread over a number of servers, but
that adds a level of complexity which doesn't always help on the
reliability side.
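
As a toy calculation of what an 80% hit rate buys (the latencies and the
hit rate here are assumptions for illustration, not measurements):

    hit_rate = 0.80
    memory_ns = 100                   # assumed RAM access time
    ssd_ns = 100_000                  # assumed ~0.1 ms SSD read

    average_ns = hit_rate * memory_ns + (1 - hit_rate) * ssd_ns
    print(average_ns)                 # 20080.0 -- 80% of requests answered at memory speed
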
I think Oracle was one of the first relational databases, but Sybase
looked at the problems Oracle had and wrote a relational database that
was an improvement in many ways. Microsoft licensed it for Windows as
Microsoft SQL Server, but when Sybase was a bit unresponsive to bug
reports Microsoft took the database code in house. They then spent money
hiring people who knew databases and rewrote Microsoft SQL Server. So in
my opinion Microsoft SQL Server is solid and self-tuning in its caches,
but it is a relational database, and relational databases are not optimal
in all circumstances.
One aspect of computer code is that once written it can run on many
machines. In my opinion, the more people who run it, the fewer
undocumented system features you'll hit. So from a reliability point of
view it's better not to be an outlier. Again, this is just my opinion.
Overall translation: to improve database performance you just throw in
faster SSDs and more memory. It's called throwing hardware at the
problem, but if you can optimise what you put in the database that's much
cheaper. Besides, I like being kind to databases.
Cheerio John
Iain Ingram wrote on 10/21/2021 1:52 PM:
> Database performance aside as I would see this falling into regular
> issues. If we see the database straining now we will see it in a year
> regardless.
>
> Just my two cents.