[Photon] Expected throughput for global database
David Lenz
david.lenz at istari.ai
Sat Dec 28 12:03:45 UTC 2024
Hi,
I was wondering what throughput to expect from the global database
when running on a t2.2xlarge (8 vCPUs, 32 GB RAM, 500 GB gp3 SSD).
We're currently getting ~2.38 it/s with 6 workers sending requests in
parallel.
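
For reference, a figure like this can be measured with a small thread-pool harness. The following is a minimal sketch, not the actual script: `measure_throughput` and `fake_geocode` are hypothetical names, and the stand-in geocoder only sleeps instead of calling Photon. The 2.38 it/s figure corresponds to 1000 addresses in 420.58 seconds.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(addresses, geocode, max_workers=6):
    """Run `geocode` over `addresses` in a thread pool.

    Returns (results, items_per_second). Results keep the input order.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(geocode, addresses))
    elapsed = time.perf_counter() - start
    return results, len(addresses) / elapsed


def fake_geocode(address):
    """Hypothetical stand-in for a real Photon request (assumption)."""
    time.sleep(0.01)  # simulate request latency
    return {"query": address}


if __name__ == "__main__":
    sample = [f"address {i}" for i in range(60)]
    _, rate = measure_throughput(sample, fake_geocode, max_workers=6)
    print(f"{rate:.2f} it/s")
```

Replacing `fake_geocode` with a function that performs the real HTTP request would give a number comparable to the one above.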
Logs from a test run on 1k addresses:
----------------------------------------------------------------------------------
2024-12-28 11:49:45 [INFO] __main__ - Script started
2024-12-28 11:49:45 [INFO] __main__ - Reading input data from
s3://***/source/unique_addresses_for_geocoding_1k_sample.csv
2024-12-28 11:49:45 [INFO] botocore.credentials - Found credentials from
IAM Role: photon-geocoder-role
2024-12-28 11:49:45 [INFO] __main__ - Loaded 1000 records to geocode
2024-12-28 11:49:45 [INFO] __main__ - Starting parallel geocoding of
1000 addresses with max_workers=6
Geocoding: 1%|█▎ | 7/1000 [00:00<01:32, 10.75it/s]2024-12-28 11:49:45
[ERROR] __main__ - Network/Request error for address='SPT OILFIELD
EQUIPMENT & VESSELS MANUFACTURERS BUILDING RAS AL KHAIMAH United Arab
Emirates': 400 Client Error: Bad Request for url:
http://localhost:2322/api?q=SPT%20OILFIELD%20EQUIPMENT%20&%20VESSELS%20MANUFACTURERS%20BUILDING%20%20RAS%20AL%20KHAIMAH%20%20United%20Arab%20Emirates&limit=1
Geocoding: 5%|█████████▏ | 49/1000 [00:10<04:48, 3.30it/s]2024-12-28
11:49:55 [ERROR] __main__ - Network/Request error for address='JEWELLERY
& GEMPLEX DUBAI United Arab Emirates': 400 Client Error: Bad Request
for url:
http://localhost:2322/api?q=JEWELLERY%20&%20GEMPLEX%20%20DUBAI%20%20United%20Arab%20Emirates&limit=1
Geocoding: 9%|█████████████████ | 91/1000 [00:41<22:12,
1.47s/it]2024-12-28 11:50:27 [ERROR] __main__ - Network/Request error
for address='MADERO EDUARDO AV. 900 PISO:28 1106 BUENOS AIRES
Argentina': HTTPConnectionPool(host='localhost', port=2322): Read timed
out. (read timeout=10)
Geocoding: 9%|█████████████████▌ | 94/1000 [00:42<15:46,
1.04s/it]2024-12-28 11:50:29 [ERROR] __main__ - Network/Request error
for address='BVRD CASTRO BARROS 1527 5000 CORDOBA Argentina':
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read
timeout=10)
Geocoding: 10%|█████████████████▉ | 96/1000 [00:44<15:08,
1.01s/it]2024-12-28 11:50:36 [ERROR] __main__ - Network/Request error
for address='BERMEJO 1175 7600 MAR DEL PLATA Argentina':
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read
timeout=10)
2024-12-28 11:50:36 [ERROR] __main__ - Network/Request error for
address='AV CORDOBA 2428 1120 CAPITAL FEDERAL Argentina':
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read
timeout=10)
Geocoding: 10%|██████████████████▏ | 97/1000 [00:51<32:54,
2.19s/it]2024-12-28 11:50:36 [ERROR] __main__ - Network/Request error
for address='25 DE MAYO 509 3300 POSADAS Argentina':
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read
timeout=10)
2024-12-28 11:50:37 [ERROR] __main__ - Network/Request error for
address=' Argentina': HTTPConnectionPool(host='localhost',
port=2322): Read timed out. (read timeout=10)
Geocoding: 10%|██████████████████▌ | 100/1000 [00:52<20:04,
1.34s/it]2024-12-28 11:50:39 [ERROR] __main__ - Network/Request error
for address='AV MENDOZA D PEDRO D 3899 1294 BUENOS AIRES Argentina':
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read
timeout=10)
Geocoding: 10%|██████████████████▊ | 101/1000 [00:54<21:43,
1.45s/it]2024-12-28 11:50:39 [ERROR] __main__ - Network/Request error
for address='RAWSON 3150 1618 RICARDO ROJAS Argentina':
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read
timeout=10)
Geocoding: 12%|█████████████████████▊ | 117/1000 [01:14<19:49,
1.35s/it]2024-12-28 11:51:01 [ERROR] __main__ - Network/Request error
for address='REP DE HONDURAS 5663 PB 1414 CAPITAL FEDERAL Argentina':
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read
timeout=10)
Geocoding: 12%|██████████████████████▏ | 119/1000 [01:16<17:27,
1.19s/it]2024-12-28 11:51:02 [ERROR] __main__ - Network/Request error
for address='AV 11 DE SEPTIEMBRE KM 85 0 5925 FERREYRA Argentina':
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read
timeout=10)
Geocoding: 24%|█████████████████████████████████████████████▍ | 244/1000
[03:10<15:37, 1.24s/it]
2024-12-28 11:52:56 [ERROR] __main__ - Network/Request error for
address='AVENIDA A SIN NRO 7600 MAR DEL PLATA Argentina':
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read
timeout=10)
Geocoding: 25%|█████████████████████████████████████████████▊ | 246/1000
[03:11<12:23, 1.01it/s]
2024-12-28 11:52:59 [ERROR] __main__ - Network/Request error for
address='RUTA 34 KM 272 0 PISO:0 DPTO:0 S:0 T:0 M: 0 2324 TACURAL
Argentina': HTTPConnectionPool(host='localhost', port=2322): Read timed
out. (read timeout=10)
Geocoding: 100%|██████████| 1000/1000 [07:00<00:00, 2.38it/s]
2024-12-28 11:56:45 [INFO] __main__ - Finished parallel geocoding of
1000 addresses in 420.58 seconds
2024-12-28 11:56:45 [INFO] botocore.credentials - Found credentials from
IAM Role: photon-geocoder-role
2024-12-28 11:56:46 [INFO] __main__ - Results saved to
s3://istariai-photon-geocoding/source/unique_addresses_for_geocoding_1k_sample.gz
2024-12-28 11:56:46 [INFO] __main__ - Script finished
----------------------------------------------------------------------------------
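
Two of the errors in the log are 400 responses for addresses containing "&". In the logged URLs the ampersand is not percent-encoded, so it likely ends the `q` value early and the rest of the address is read as a separate parameter. A minimal sketch of building the URL with the standard library, assuming the endpoint and parameters shown in the log (`build_photon_url` is a hypothetical helper name):

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Endpoint as it appears in the log above.
PHOTON_API = "http://localhost:2322/api"


def build_photon_url(address: str, limit: int = 1) -> str:
    """Return a search URL with the address fully percent-encoded."""
    # quote_via=quote encodes spaces as %20 and '&' as %26, so the
    # ampersand stays inside the value of q.
    query = urlencode({"q": address, "limit": limit}, quote_via=quote)
    return f"{PHOTON_API}?{query}"


if __name__ == "__main__":
    address = "JEWELLERY & GEMPLEX  DUBAI  United Arab Emirates"
    url = build_photon_url(address)
    print(url)
    # The full address survives a round trip through the query string.
    print(parse_qs(urlsplit(url).query)["q"][0] == address)
```

If the script uses `requests`, passing `params={"q": address, "limit": 1}` to `requests.get` encodes the value in the same way.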
Best,
David
--
*Dr. David Lenz*
Co-Founder, istari.ai GmbH
*e:* david.lenz at istari.ai
*w:* www.istari.ai <http://www.istari.ai>
Julius-Hatry-Straße 1, 68163 Mannheim