[OSM-talk] Upload slowness - what's going on?

Fri May 13 13:15:17 UTC 2016

Hi Grant,

Thank you for the technical explanation and especially for all the efforts
improving the infrastructure! It's unfortunate uploads are so much slower
now, but if the whole setup is more reliable/redundant, that's a small
price to pay.

Polyglot

2016-05-13 14:59 GMT+02:00 Grant Slater <openstreetmap at firefishy.com>:

> Hi All,
>
> On Monday 9th May 2016 the master OSM database server was moved to
> York (Bytemark) from London (Imperial).
> This was to avoid multiple upcoming weekends of planned power testing
> & maintenance at the Imperial data centre. For the last few years
> Imperial has housed all our main critical systems including master &
> slave DB servers and frontend & backend web/api servers. We also added
> 4 new frontend/backend web/api servers to York on Monday.
>
> We now have the master database server in York and the secondary
> database server in Imperial. We also have a warm standby slave db in
> AWS Ireland. A fourth SSD (NVMe) based DB server was delivered
> yesterday (Thursday), but it needs testing (burn-in, reliability,
> performance etc) before we can start using it. Slave DB servers can be
> promoted to master if required.
>
> The slave db servers serve Web/API read traffic and writes go to the
> master. When the frontend + backend servers were in the same data
> centre as the master db server the latency was <1ms. We now run a VPN
> to connect the servers up and the latency is ~8ms Imperial to
> Bytemark. Currently we are using the frontend & backend servers at
> Imperial (closest to the slave db read server) and sending writes over the
> VPN to Bytemark. The extra 8ms roundtrip is incurred multiple times,
> depending on the size of the uploaded changeset; this is the root cause of
> the slower uploads. The link between Imperial & Bytemark can handle
> gigabit speeds. Over the last few days we've been tweaking the VPN
> settings to get optimal latency & throughput over the links.
>
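A rough way to see why per-round-trip latency dominates large uploads is the
back-of-the-envelope sketch below. It is only an illustration, not how the OSM
API is implemented: the function name and the number of sequential database
round trips per object are assumptions made up for the example; only the ~8ms
added latency comes from the message above.

```python
def extra_upload_seconds(objects, round_trips_per_object, added_rtt_ms):
    """Estimate extra wall-clock time caused by added per-round-trip latency.

    round_trips_per_object is a hypothetical figure; the real number of
    sequential database round trips per uploaded object is not stated
    in this thread.
    """
    total_round_trips = objects * round_trips_per_object
    return total_round_trips * added_rtt_ms / 1000.0


# Hypothetical case: 400 objects, 3 sequential round trips each,
# ~8 ms of added latency per round trip (Imperial <-> Bytemark).
print(extra_upload_seconds(400, 3, 8.0))
```

Under these assumed numbers the added delay scales linearly with changeset
size, which would fit small uploads barely changing while large ones slow
down noticeably.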
> Over today (for at least the weekend) we are switching to the new
> frontend & backend servers in York (Bytemark). London Imperial will be
> offline from approximately 5pm (GMT+1) for the first weekend of power
> maintenance.
>
> In summary: The slow uploads are a known issue and we'll fix it as soon
> as practical. Our main concern has been setting up redundancy across
> multiple data centres to avoid extended downtime.
>
> Here is the list of all core hardware and hosting locations:
> https://hardware.openstreetmap.org/
>
> Hope that answers the questions. ;-)
>
> Photos or it didn't happen:
> * Syncing & powering down before we start London -> York DB move:
> https://twitter.com/OSM_Tech/status/729582996685213696
> * Staged photo of racking up the master DB server at Bytemark:
> https://twitter.com/OSM_Tech/status/729693392737832961
> * Testing the new Frontend / Backend servers a week ago:
> https://twitter.com/OSM_Tech/status/728286193696292865
>
> Bytemark are a fantastic hosting company and their ongoing support of
> the OpenStreetMap project is highly commendable. Please support them
> ;-) https://twitter.com/bytemark/status/729698435339853824
>
> Kind regards,
>
> Grant
> Part of the OSM Ops team.
>
>
> On 13 May 2016 at 11:44, Tim Waters <chippy2005 at gmail.com> wrote:
> > I believe the Dev mailing list may have some of your technical answers
> > https://lists.openstreetmap.org/pipermail/dev/2016-May/thread.html
> >
> > It appears from that list that the database servers are now a few
> > hundreds of miles from where the web servers are, causing the increase
> > in latency. I do not know if this is a permanent change; the thread on
> > osm-dev does seem to indicate that things are still in flux.
> >
> > Tim
> >
> >
> >
> > On 13 May 2016 at 06:02, Ben Discoe <bdiscoe at gmail.com> wrote:
> >> Several of us have noticed radically slower upload speeds for
> >> changesets, roughly since the server move on May 9.  Like, as
> >> painfully slow as it used to be, it's now several times slower.
> >>
> >> It's been discussed with @OSM_Tech on twitter, in this thread:
> >> https://twitter.com/OSM_Tech/status/730857486618664960
> >>
> >> Before I get too hysterical, can somebody tell me what happened, and
> >> can it be fixed?
> >>
> >> OSM_Tech's mysterious message:
> >>   "Large uploads will take around 3 times longer. Small uploads extra
> >> delay should be minimal."
> >>
> >> Does this mean that something did change?  Is it database writes that
> >> are taking so much longer?  Changesets with as few as 400 objects are
> >> taking several times longer; what constitutes "large" vs. "small"?
> >> Can it be fixed?  Can I donate large sums of money somewhere to help
> >> it get fixed?
> >>
> >> Thanks,
> >> Ben
> >>
> >> _______________________________________________
> >> talk mailing list
> >> talk at openstreetmap.org
> >> https://lists.openstreetmap.org/listinfo/talk