[OSM-talk] Upload slowness - what's going on?

Grant Slater openstreetmap at firefishy.com
Fri May 13 12:59:12 UTC 2016


Hi All,

On Monday 9th May 2016 the master OSM database server was moved to
York (Bytemark) from London (Imperial).
This was to avoid multiple upcoming weekends of planned power testing
& maintenance at the Imperial data centre. For the last few years
Imperial has housed all our main critical systems including master &
slave DB servers and frontend & backend web/api servers. We also added
4 new frontend/backend web/api servers to York on Monday.

We now have the master database server in York and the secondary
database server in Imperial. We also have a warm standby slave db in
AWS Ireland. A fourth SSD (NVMe) based DB server was delivered
yesterday (Thursday), but it needs testing (burn-in, reliability,
performance etc) before we can start using it. Slave DB servers can be
promoted to master if required.

The slave db servers serve Web/API read traffic and writes go to the
master. When the frontend + backend servers were in the same data
centre as the master db server the latency was <1ms. We now run a VPN
to connect the servers up and the latency is ~8ms Imperial to
Bytemark. Currently we are using the frontend & backend servers at
Imperial (closest to the slave db read server) and sending writes over
the VPN to Bytemark. The extra 8ms round trip is incurred multiple
times per upload, scaling with the size of the changeset; this is the
root cause of the slower uploads. The link between Imperial & Bytemark can handle
gigabit speeds. Over the last few days we've been tweaking the VPN
settings to get optimal latency & throughput over the links.
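To put rough numbers on why the changeset size matters: a minimal sketch, assuming (hypothetically) that each object in an upload costs a fixed number of sequential round trips to the master DB. The function name and the per-object round-trip count are illustrative assumptions, not a description of how the OSM API actually issues its queries; only the ~8ms figure comes from the post.

```python
# Back-of-the-envelope model of the extra upload delay caused by sending
# database writes over the ~8 ms Imperial <-> Bytemark VPN link.
# ASSUMPTION (hypothetical): every object costs `round_trips_per_object`
# sequential round trips to the master DB; the real count depends on the
# API implementation and is not stated in the post.

RTT_SECONDS = 0.008  # ~8 ms round trip, as reported in the post


def extra_upload_delay(num_objects: int,
                       round_trips_per_object: int = 1,
                       rtt: float = RTT_SECONDS) -> float:
    """Return the added latency (seconds) for one changeset upload.

    Before the move the app servers and master DB shared a data centre
    (<1 ms), so the old per-round-trip cost is treated as negligible here.
    """
    if num_objects < 0 or round_trips_per_object < 0:
        raise ValueError("counts must be non-negative")
    # Sequential round trips add up linearly with changeset size.
    return num_objects * round_trips_per_object * rtt


# Example: a 400-object changeset, the size mentioned later in the thread.
print(f"{extra_upload_delay(400):.1f} s")
print(f"{extra_upload_delay(400, round_trips_per_object=3):.1f} s")
```

Under these assumptions a 400-object upload gains about 3.2s with one round trip per object, or about 9.6s with three; small uploads barely notice, which matches the "large uploads take longer" observation.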

Today (for at least the weekend) we are switching to the new
frontend & backend servers in York (Bytemark). London Imperial will be
offline from approximately 5pm (GMT+1) for the first weekend of power
maintenance.

In summary: The slow uploads are a known issue and we'll fix them as
soon as practical. Our main concern has been setting up multiple data
centre redundancy to avoid extended downtime.

Here is the list of all core hardware and hosting locations:
https://hardware.openstreetmap.org/

Hope that answers the questions. ;-)

Photos or it didn't happen:
* Syncing & powering down before we start London -> York DB move:
https://twitter.com/OSM_Tech/status/729582996685213696
* Staged photo of racking up the master DB server at Bytemark:
https://twitter.com/OSM_Tech/status/729693392737832961
* Testing the new Frontend / Backend servers a week ago:
https://twitter.com/OSM_Tech/status/728286193696292865

Bytemark are a fantastic hosting company and their ongoing support of
the OpenStreetMap project is highly commendable. Please support them
;-) https://twitter.com/bytemark/status/729698435339853824

Kind regards,

Grant
Part of the OSM Ops team.


On 13 May 2016 at 11:44, Tim Waters <chippy2005 at gmail.com> wrote:
> I believe the Dev mailing list may have some of your technical answers
> https://lists.openstreetmap.org/pipermail/dev/2016-May/thread.html
>
> It appears from that list that the database servers are now a few
> hundred miles from where the web servers are, causing the increase
> in latency. I do not know if this is a permanent change, the thread on
> osm-dev does seem to indicate that things are still in flux.
>
> Tim
>
>
>
> On 13 May 2016 at 06:02, Ben Discoe <bdiscoe at gmail.com> wrote:
>> Several of us have noticed radically slower upload speeds for
>> changesets, roughly since the server move on May 9.  Like, as
>> painfully slow as it used to be, it's now several times slower.
>>
>> It's been discussed with @OSM_Tech on twitter, in this thread:
>> https://twitter.com/OSM_Tech/status/730857486618664960
>>
>> Before I get too hysterical, can somebody tell me what happened, and
>> can it be fixed?
>>
>> OSM_Tech's mysterious message:
>>   "Large uploads will take around 3 times longer. Small uploads extra
>> delay should be minimal."
>>
>> Does this mean that something did change?  Is it database writes that
>> are taking so much longer?  Changesets with as few as 400 objects are
>> taking several times longer, so what constitutes "large" vs. "small"?
>> Can it be fixed?  Can I donate large sums of money somewhere to help
>> it get fixed?
>>
>> Thanks,
>> Ben
>>
>> _______________________________________________
>> talk mailing list
>> talk at openstreetmap.org
>> https://lists.openstreetmap.org/listinfo/talk

