[openstreetmap/openstreetmap-website] Why are some diff uploads slow? (#1710)
mmd
notifications at github.com
Sun Dec 24 09:34:36 UTC 2017
I ran a very small local test based on [Changeset 53407241](http://www.openstreetmap.org/changeset/53407241), which took around 15 minutes to upload via JOSM. Most CPU time was spent in the _puma 001_ process (Rails). On the console I noticed a rather large number of single-row INSERT/DELETE/UPDATE operations:
- One per node (for both current and old nodes)
- One per tag
- One per node in a way
- Frequent re-checks of the current changeset
Overall, I also suspect shared/exclusive locks contribute to the longer processing time on the server. With long-running transactions, the chance increases that other people start working on the same objects.
I think it's time for a bit of out-of-the-box thinking, leaving the current implementation behind for a minute. That's why I gave [pyosmium](http://osmcode.org/pyosmium/) and some bulk SQL statements a short try.
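To illustrate the shape of the input, here's a minimal stdlib sketch of walking an osmChange diff and grouping elements by action (this is plain `xml.etree`, not pyosmium itself, and the tiny sample payload is invented):

```python
import xml.etree.ElementTree as ET

# A tiny, invented osmChange payload standing in for the 8 MB diff file.
OSC = """<osmChange version="0.6">
  <create>
    <node id="-1" lat="42.3712730" lon="-83.2172277">
      <tag k="highway" v="crossing"/>
    </node>
    <node id="-2" lat="42.3712670" lon="-83.4339064"/>
  </create>
  <delete>
    <node id="19302" version="2"/>
  </delete>
</osmChange>"""

def parse_osc(text):
    """Group the elements of an osmChange document by action."""
    actions = {"create": [], "modify": [], "delete": []}
    root = ET.fromstring(text)
    for action in root:          # <create>, <modify>, <delete>
        for elem in action:      # <node>, <way>, <relation>
            actions[action.tag].append((elem.tag, dict(elem.attrib)))
    return actions

actions = parse_osc(OSC)
print(len(actions["create"]), len(actions["delete"]))  # 2 created, 1 deleted
```

The point being: once all created objects of one type are collected in memory, they can be emitted as one bulk statement instead of thousands of single-row ones.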
The same 8 MB diff file was completely parsed in 140 ms. Based on this, I generated some bulk INSERT statements, effectively creating all 4300+ nodes in one go:
```sql
BEGIN;
INSERT INTO "current_nodes" ("latitude", "longitude", "changeset_id", "visible", "timestamp", "tile", "version") VALUES
(423712730, -832172277, 19, 't', '2017-12-24 09:15:27.204641', 1234, 1),
(423712670, -834339064, 19, 't', '2017-12-24 09:15:27.204675', 1234, 1),
(423715247, -832172360, 19, 't', '2017-12-24 09:15:27.204684', 1234, 1),
(423715140, -834339084, 19, 't', '2017-12-24 09:15:27.204691', 1234, 1),
(423717479, -834339184, 19, 't', '2017-12-24 09:15:27.204697', 1234, 1),
(423672264, -832171144, 19, 't', '2017-12-24 09:15:27.204704', 1234, 1),
(423672601, -834336964, 19, 't', '2017-12-24 09:15:27.204710', 1234, 1),
(423674831, -832171161, 19, 't', '2017-12-24 09:15:27.204717', 1234, 1)
[...many more...]
RETURNING id;
COMMIT;
```
This query took only a few hundred ms on my local db (which doesn't have much in it). The number of entries (i.e. the batch size) in one such INSERT statement could of course be made configurable, and the returned node ids used for later replacement operations. They could even be stored in a temporary table.
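A sketch of how such configurable batches might be assembled, using the column names from the example above (the helper names, batch size, and placeholder style — psycopg2-style `%s` — are my own assumptions, not the existing implementation):

```python
def chunked(rows, size):
    """Split rows into batches of at most `size` entries."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# Column order matches the bulk INSERT example above.
COLUMNS = ("latitude", "longitude", "changeset_id",
           "visible", "timestamp", "tile", "version")

def build_insert(batch):
    """Build one parameterized multi-row INSERT for current_nodes."""
    cols = ", ".join(f'"{c}"' for c in COLUMNS)
    row = "(" + ", ".join(["%s"] * len(COLUMNS)) + ")"
    values = ", ".join([row] * len(batch))
    sql = f'INSERT INTO "current_nodes" ({cols}) VALUES {values} RETURNING id'
    params = [value for r in batch for value in r]
    return sql, params

rows = [(423712730, -832172277, 19, True, "2017-12-24 09:15:27", 1234, 1),
        (423712670, -834339064, 19, True, "2017-12-24 09:15:27", 1234, 1),
        (423715247, -832172360, 19, True, "2017-12-24 09:15:27", 1234, 1)]
for batch in chunked(rows, 2):
    sql, params = build_insert(batch)  # execute with a db cursor per batch
```

The `RETURNING id` clause hands back the generated ids in insertion order, which is exactly what's needed to map placeholder ids from the diff to real ids.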
Regarding modify and delete operations, a huge part of the task is to pull data out of the database node by node and tag by tag, and INSERT it into the history tables. I wonder why we don't let the database do this job: provide only a list of object ids and have it do the copying in one go. This would take maybe a few seconds.
Here's a very rough idea:
```sql
INSERT INTO node_tags (node_id, version, k, v)
SELECT t.node_id, n.version, t.k, t.v
FROM current_node_tags t
INNER JOIN current_nodes n ON t.node_id = n.id
WHERE n.id IN (19302, 19301, 19300, 19299, 19298, 19297, 19296, 19295,
               19294, 19293, 19292, 19291, 19290, 19289, 19288, 19287,
               19286, 19285, 19284, 19283, 19282, 19281, 19280, 19279,
               19278, 19277, 19276, 19275, 19274, 19273, 19272, 19271,
               19270, 19269, 19268, 19267, 19266);
```
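Rather than interpolating a literal IN list, the ids could be passed as a single array parameter. A hedged sketch of how the application side might drive this (the helper name is invented, and psycopg2-style placeholders are assumed):

```python
# Same copy statement as the rough idea above, but taking the id list
# as one array parameter (psycopg2 adapts a Python list to a SQL array).
HISTORY_COPY_SQL = """
INSERT INTO node_tags (node_id, version, k, v)
SELECT t.node_id, n.version, t.k, t.v
FROM current_node_tags t
INNER JOIN current_nodes n ON t.node_id = n.id
WHERE n.id = ANY(%s)
"""

def copy_tags_to_history(cursor, node_ids):
    """Copy the current tags of the given nodes into the history table
    with a single statement, returning the number of copied rows."""
    cursor.execute(HISTORY_COPY_SQL, (list(node_ids),))
    return cursor.rowcount
```

One round-trip per object type instead of one per tag, which is where the per-row overhead in the current upload path seems to come from.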
I'm far from understanding all of the details of the current implementation. Nevertheless, I think there's a lot of potential for improvement here, even though it could mean that we can't reuse much of the current code. Wouldn't this be a good fit for cgimap?
Reply to this email directly or view it on GitHub:
https://github.com/openstreetmap/openstreetmap-website/issues/1710#issuecomment-353774921
More information about the rails-dev mailing list