[Photon] Expected throughput for global database

David Lenz david.lenz at istari.ai
Sat Dec 28 12:03:45 UTC 2024


Hi,
I was wondering what the expected throughput for the global database is 
when running on a t2.2xlarge (8 CPUs, 32 GB RAM, 500 GB gp3 SSD).

We're currently getting ~2.38 it/s (requests per second) with 6 workers 
sending requests in parallel.
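
For reference, a rough sketch of how that figure follows from the test run 
(1000 addresses in 420.58 seconds with max_workers=6, as reported in the log). 
The variable names are only illustrative, and the per-worker number is a mean 
that includes requests which ran into the 10 s read timeout:

```python
# Figures taken from the test run log in this message.
total_requests = 1000
elapsed_seconds = 420.58
workers = 6

# Overall throughput across all workers.
throughput = total_requests / elapsed_seconds

# Mean wall-clock time one worker spends per request (timeouts included).
mean_seconds_per_request = elapsed_seconds * workers / total_requests

print(f"{throughput:.2f} requests/s overall")
print(f"{mean_seconds_per_request:.2f} s per request per worker")
```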

Logs from a test run on 1k addresses:
----------------------------------------------------------------------------------
2024-12-28 11:49:45 [INFO] __main__ - Script started
2024-12-28 11:49:45 [INFO] __main__ - Reading input data from 
s3://***/source/unique_addresses_for_geocoding_1k_sample.csv
2024-12-28 11:49:45 [INFO] botocore.credentials - Found credentials from 
IAM Role: photon-geocoder-role
2024-12-28 11:49:45 [INFO] __main__ - Loaded 1000 records to geocode
2024-12-28 11:49:45 [INFO] __main__ - Starting parallel geocoding of 
1000 addresses with max_workers=6
Geocoding: 1%|█▎ | 7/1000 [00:00<01:32, 10.75it/s]2024-12-28 11:49:45 
[ERROR] __main__ - Network/Request error for address='SPT OILFIELD 
EQUIPMENT & VESSELS MANUFACTURERS BUILDING  RAS AL KHAIMAH United Arab 
Emirates': 400 Client Error: Bad Request for url: 
http://localhost:2322/api?q=SPT%20OILFIELD%20EQUIPMENT%20&%20VESSELS%20MANUFACTURERS%20BUILDING%20%20RAS%20AL%20KHAIMAH%20%20United%20Arab%20Emirates&limit=1
Geocoding: 5%|█████████▏ | 49/1000 [00:10<04:48,  3.30it/s]2024-12-28 
11:49:55 [ERROR] __main__ - Network/Request error for address='JEWELLERY 
& GEMPLEX  DUBAI  United Arab Emirates': 400 Client Error: Bad Request 
for url: 
http://localhost:2322/api?q=JEWELLERY%20&%20GEMPLEX%20%20DUBAI%20%20United%20Arab%20Emirates&limit=1
Geocoding: 9%|█████████████████ | 91/1000 [00:41<22:12,  
1.47s/it]2024-12-28 11:50:27 [ERROR] __main__ - Network/Request error 
for address='MADERO EDUARDO AV. 900 PISO:28 1106 BUENOS AIRES  
Argentina': HTTPConnectionPool(host='localhost', port=2322): Read timed 
out. (read timeout=10)
Geocoding: 9%|█████████████████▌ | 94/1000 [00:42<15:46,  
1.04s/it]2024-12-28 11:50:29 [ERROR] __main__ - Network/Request error 
for address='BVRD CASTRO BARROS 1527 5000 CORDOBA  Argentina': 
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read 
timeout=10)
Geocoding: 10%|█████████████████▉ | 96/1000 [00:44<15:08,  
1.01s/it]2024-12-28 11:50:36 [ERROR] __main__ - Network/Request error 
for address='BERMEJO 1175 7600 MAR DEL PLATA  Argentina': 
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read 
timeout=10)
2024-12-28 11:50:36 [ERROR] __main__ - Network/Request error for 
address='AV CORDOBA 2428 1120 CAPITAL FEDERAL  Argentina': 
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read 
timeout=10)
Geocoding: 10%|██████████████████▏ | 97/1000 [00:51<32:54,  
2.19s/it]2024-12-28 11:50:36 [ERROR] __main__ - Network/Request error 
for address='25 DE MAYO 509 3300 POSADAS  Argentina': 
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read 
timeout=10)
2024-12-28 11:50:37 [ERROR] __main__ - Network/Request error for 
address='    Argentina': HTTPConnectionPool(host='localhost', 
port=2322): Read timed out. (read timeout=10)
Geocoding: 10%|██████████████████▌ | 100/1000 [00:52<20:04,  
1.34s/it]2024-12-28 11:50:39 [ERROR] __main__ - Network/Request error 
for address='AV MENDOZA D PEDRO D 3899 1294 BUENOS AIRES  Argentina': 
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read 
timeout=10)
Geocoding: 10%|██████████████████▊ | 101/1000 [00:54<21:43,  
1.45s/it]2024-12-28 11:50:39 [ERROR] __main__ - Network/Request error 
for address='RAWSON 3150 1618 RICARDO ROJAS  Argentina': 
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read 
timeout=10)
Geocoding: 12%|█████████████████████▊ | 117/1000 [01:14<19:49,  
1.35s/it]2024-12-28 11:51:01 [ERROR] __main__ - Network/Request error 
for address='REP DE HONDURAS 5663 PB 1414 CAPITAL FEDERAL  Argentina': 
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read 
timeout=10)
Geocoding: 12%|██████████████████████▏ | 119/1000 [01:16<17:27,  
1.19s/it]2024-12-28 11:51:02 [ERROR] __main__ - Network/Request error 
for address='AV 11 DE SEPTIEMBRE KM 85 0 5925 FERREYRA  Argentina': 
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read 
timeout=10)
Geocoding: 24%|█████████████████████████████████████████████▍ | 244/1000 
[03:10<15:37,  1.24s/it]
2024-12-28 11:52:56 [ERROR] __main__ - Network/Request error for 
address='AVENIDA A SIN NRO 7600 MAR DEL PLATA  Argentina': 
HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read 
timeout=10)
Geocoding: 25%|█████████████████████████████████████████████▊ | 246/1000 
[03:11<12:23,  1.01it/s]
2024-12-28 11:52:59 [ERROR] __main__ - Network/Request error for 
address='RUTA 34 KM 272 0 PISO:0 DPTO:0 S:0 T:0 M: 0 2324 TACURAL 
Argentina': HTTPConnectionPool(host='localhost', port=2322): Read timed 
out. (read timeout=10)
Geocoding: 
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 
1000/1000 [07:00<00:00,  2.38it/s]
2024-12-28 11:56:45 [INFO] __main__ - Finished parallel geocoding of 
1000 addresses in 420.58 seconds
2024-12-28 11:56:45 [INFO] botocore.credentials - Found credentials from 
IAM Role: photon-geocoder-role
2024-12-28 11:56:46 [INFO] __main__ - Results saved to 
s3://istariai-photon-geocoding/source/unique_addresses_for_geocoding_1k_sample.gz
2024-12-28 11:56:46 [INFO] __main__ - Script finished
----------------------------------------------------------------------------------
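
A side note on the two 400 errors in the log above: both addresses contain an 
"&" that ends up unescaped in the request URL, so everything after it is 
parsed as a separate query parameter rather than as part of q. That is 
probably what triggers the Bad Request, although I have not confirmed it on 
the server side. A minimal sketch of encoding the address before building the 
URL, using only the Python standard library (build_query_url is a 
hypothetical helper, not part of our actual script):

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint as it appears in the log above.
PHOTON_API = "http://localhost:2322/api"


def build_query_url(address: str, limit: int = 1) -> str:
    """Return a query URL with the address safely percent-encoded.

    Hypothetical helper for illustration only.
    """
    # urlencode escapes '&' as %26, so it is no longer treated as a
    # separator between query parameters.
    return f"{PHOTON_API}?{urlencode({'q': address, 'limit': limit})}"


address = "JEWELLERY & GEMPLEX  DUBAI  United Arab Emirates"
url = build_query_url(address)

# Decoding the query string again yields the original address in q.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["q"][0] == address)
```

Passing the address via the params argument of requests.get should have the 
same effect, since requests encodes parameter values itself.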

Best,
David

-- 
*Dr. David Lenz*
Co-Founder, istari.ai GmbH
*e:* david.lenz at istari.ai
*w:* www.istari.ai <http://www.istari.ai>
Julius-Hatry-Straße 1, 68163 Mannheim