[Photon] Expected throughput for global database
David Lenz
david.lenz at istari.ai
Wed Jan 8 13:28:03 UTC 2025
Hi Sarah,
Thank you so much for your detailed response and the valuable insights!
We discovered that the low throughput in our tests was likely due to
running Photon in a Docker container. Once we installed Photon directly
on the system, we were able to achieve the expected throughput.
Thanks again for your support!
Best regards,
David
On 02.01.2025 at 16:17, Sarah Hoffmann via Photon wrote:
> Hi,
>
> I don't have any recent performance numbers, but the demo instance
> at photon.komoot.io (8 CPUs, 128GB RAM, NVMe) is currently serving about
> 25 requests/s and is far from its capacity.
>
> The regular mass tests I run manage about 9 requests/s single-threaded.
> With 6 workers, I'd expect at least 40 requests/s.
>
> Two observations:
>
> For a planet database, you should have at least 64GB of RAM if you need
> throughput; 128GB is better.
>
> The addresses you are trying to geocode are especially hard for Photon.
> They contain extra text that is unlikely to be found in the underlying
> OSM data (like SPT OILFIELD EQUIPMENT & VESSELS MANUFACTURERS BUILDING)
> or that is not supported (like PISO:28). Photon tries very hard to
> compensate with fuzzy matching, and that is expensive. Also note that
> Photon currently cannot handle abbreviations (such as AV) well.
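Following up on the point about extra text and abbreviations: a client-side pre-cleaning step might reduce how much fuzzy matching Photon has to do. This is a hypothetical sketch, not tested against Photon; the KEY:VALUE pattern, the abbreviation map and the two-token minimum are assumptions chosen to fit the sample addresses in the log, not rules from Photon itself.

```python
import re
from typing import Optional

# Hypothetical rule: drop "KEY:VALUE" unit fields such as "PISO:28",
# "DPTO:0" or "M: 0", which Photon does not support.
UNIT_FIELD = re.compile(r"\b[A-Z]+:\s*\S*")

# Hypothetical abbreviation map. "AV" for "AVENIDA" is common in Spanish
# addresses, but a real map should be checked against OSM street names.
ABBREVIATIONS = {"AV": "AVENIDA", "AV.": "AVENIDA"}


def clean_address(address: str) -> Optional[str]:
    """Return a cleaned query string, or None if too little is left."""
    text = UNIT_FIELD.sub(" ", address)
    # Expand known abbreviations token by token; split() also collapses
    # the repeated whitespace left behind by the substitution.
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    # Hypothetical threshold: a lone token such as " Argentina" is not
    # worth a geocoding request.
    if len(tokens) < 2:
        return None
    return " ".join(tokens)
```

For example, `clean_address("MADERO EDUARDO AV. 900 PISO:28 1106 BUENOS AIRES Argentina")` yields `"MADERO EDUARDO AVENIDA 900 1106 BUENOS AIRES Argentina"`. Whether this actually lowers Photon's response time would have to be measured.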
>
> Sarah
>
> On Sat, Dec 28, 2024 at 01:03:45PM +0100, David Lenz wrote:
>> Hi,
>> I was wondering what the expected throughput for the global database is
>> when running on a t2.2xlarge (8 CPUs, 32GB RAM, 500GB gp3 SSD).
>>
>> We're currently getting ~2.38 requests/s (it/s) with 6 workers sending
>> requests in parallel.
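For comparing runs, a minimal sketch of a parallel client with 6 workers that also reports requests/s. It is not the script that produced the logs; it assumes the local endpoint http://localhost:2322/api seen in the logs, uses only the standard library, and the names photon_geocode and geocode_all are hypothetical. The geocoding function is injectable so the concurrency part can be exercised without a running server.

```python
import json
import logging
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, Tuple
from urllib.parse import quote, urlencode
from urllib.request import urlopen

log = logging.getLogger(__name__)

# Assumption: local Photon instance on the default port, as in the logs.
PHOTON_API = "http://localhost:2322/api"


def photon_geocode(address: str, timeout: float = 10.0) -> Optional[dict]:
    """Return the first GeoJSON feature Photon finds, or None on failure."""
    query = urlencode({"q": address, "limit": 1}, quote_via=quote)
    try:
        with urlopen(f"{PHOTON_API}?{query}", timeout=timeout) as resp:
            features = json.load(resp).get("features", [])
    except (OSError, ValueError) as exc:  # HTTP errors, timeouts, bad JSON
        log.error("Request failed for address=%r: %s", address, exc)
        return None
    return features[0] if features else None


def geocode_all(
    addresses: Sequence[str],
    geocode: Callable[[str], Optional[dict]] = photon_geocode,
    max_workers: int = 6,
) -> Tuple[List[Optional[dict]], float]:
    """Geocode in parallel; return results (input order) and requests/s."""
    start = time.perf_counter()
    # Threads suit this I/O-bound work: each worker mostly waits on Photon.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(geocode, addresses))
    elapsed = time.perf_counter() - start
    rate = len(addresses) / elapsed if elapsed > 0 else 0.0
    return results, rate
```

For reference, the run below (1000 addresses in 420.58 s) works out to 1000 / 420.58 ≈ 2.38 requests/s.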
>>
>> Logs from a test run on 1k addresses:
>> ----------------------------------------------------------------------------------
>> 2024-12-28 11:49:45 [INFO] __main__ - Script started
>> 2024-12-28 11:49:45 [INFO] __main__ - Reading input data from s3://***/source/unique_addresses_for_geocoding_1k_sample.csv
>> 2024-12-28 11:49:45 [INFO] botocore.credentials - Found credentials from IAM Role: photon-geocoder-role
>> 2024-12-28 11:49:45 [INFO] __main__ - Loaded 1000 records to geocode
>> 2024-12-28 11:49:45 [INFO] __main__ - Starting parallel geocoding of 1000 addresses with max_workers=6
>> Geocoding: 1%|█▎ | 7/1000 [00:00<01:32, 10.75it/s]
>> 2024-12-28 11:49:45 [ERROR] __main__ - Network/Request error for address='SPT OILFIELD EQUIPMENT & VESSELS MANUFACTURERS BUILDING RAS AL KHAIMAH United Arab Emirates': 400 Client Error: Bad Request for url: http://localhost:2322/api?q=SPT%20OILFIELD%20EQUIPMENT%20&%20VESSELS%20MANUFACTURERS%20BUILDING%20%20RAS%20AL%20KHAIMAH%20%20United%20Arab%20Emirates&limit=1
>> Geocoding: 5%|█████████▏ | 49/1000 [00:10<04:48, 3.30it/s]
>> 2024-12-28 11:49:55 [ERROR] __main__ - Network/Request error for address='JEWELLERY & GEMPLEX DUBAI United Arab Emirates': 400 Client Error: Bad Request for url: http://localhost:2322/api?q=JEWELLERY%20&%20GEMPLEX%20%20DUBAI%20%20United%20Arab%20Emirates&limit=1
>> Geocoding: 9%|█████████████████ | 91/1000 [00:41<22:12, 1.47s/it]
>> 2024-12-28 11:50:27 [ERROR] __main__ - Network/Request error for address='MADERO EDUARDO AV. 900 PISO:28 1106 BUENOS AIRES Argentina': HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read timeout=10)
>> Geocoding: 9%|█████████████████▌ | 94/1000 [00:42<15:46, 1.04s/it]
>> 2024-12-28 11:50:29 [ERROR] __main__ - Network/Request error for address='BVRD CASTRO BARROS 1527 5000 CORDOBA Argentina': HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read timeout=10)
>> Geocoding: 10%|█████████████████▉ | 96/1000 [00:44<15:08, 1.01s/it]
>> 2024-12-28 11:50:36 [ERROR] __main__ - Network/Request error for address='BERMEJO 1175 7600 MAR DEL PLATA Argentina': HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read timeout=10)
>> 2024-12-28 11:50:36 [ERROR] __main__ - Network/Request error for address='AV CORDOBA 2428 1120 CAPITAL FEDERAL Argentina': HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read timeout=10)
>> Geocoding: 10%|██████████████████▏ | 97/1000 [00:51<32:54, 2.19s/it]
>> 2024-12-28 11:50:36 [ERROR] __main__ - Network/Request error for address='25 DE MAYO 509 3300 POSADAS Argentina': HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read timeout=10)
>> 2024-12-28 11:50:37 [ERROR] __main__ - Network/Request error for address=' Argentina': HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read timeout=10)
>> Geocoding: 10%|██████████████████▌ | 100/1000 [00:52<20:04, 1.34s/it]
>> 2024-12-28 11:50:39 [ERROR] __main__ - Network/Request error for address='AV MENDOZA D PEDRO D 3899 1294 BUENOS AIRES Argentina': HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read timeout=10)
>> Geocoding: 10%|██████████████████▊ | 101/1000 [00:54<21:43, 1.45s/it]
>> 2024-12-28 11:50:39 [ERROR] __main__ - Network/Request error for address='RAWSON 3150 1618 RICARDO ROJAS Argentina': HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read timeout=10)
>> Geocoding: 12%|█████████████████████▊ | 117/1000 [01:14<19:49, 1.35s/it]
>> 2024-12-28 11:51:01 [ERROR] __main__ - Network/Request error for address='REP DE HONDURAS 5663 PB 1414 CAPITAL FEDERAL Argentina': HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read timeout=10)
>> Geocoding: 12%|██████████████████████▏ | 119/1000 [01:16<17:27, 1.19s/it]
>> 2024-12-28 11:51:02 [ERROR] __main__ - Network/Request error for address='AV 11 DE SEPTIEMBRE KM 85 0 5925 FERREYRA Argentina': HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read timeout=10)
>> Geocoding: 24%|█████████████████████████████████████████████▍ | 244/1000 [03:10<15:37, 1.24s/it]
>> 2024-12-28 11:52:56 [ERROR] __main__ - Network/Request error for address='AVENIDA A SIN NRO 7600 MAR DEL PLATA Argentina': HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read timeout=10)
>> Geocoding: 25%|█████████████████████████████████████████████▊ | 246/1000 [03:11<12:23, 1.01it/s]
>> 2024-12-28 11:52:59 [ERROR] __main__ - Network/Request error for address='RUTA 34 KM 272 0 PISO:0 DPTO:0 S:0 T:0 M: 0 2324 TACURAL Argentina': HTTPConnectionPool(host='localhost', port=2322): Read timed out. (read timeout=10)
>> Geocoding: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|
>> 1000/1000 [07:00<00:00, 2.38it/s]
>> 2024-12-28 11:56:45 [INFO] __main__ - Finished parallel geocoding of 1000 addresses in 420.58 seconds
>> 2024-12-28 11:56:45 [INFO] botocore.credentials - Found credentials from IAM Role: photon-geocoder-role
>> 2024-12-28 11:56:46 [INFO] __main__ - Results saved to s3://istariai-photon-geocoding/source/unique_addresses_for_geocoding_1k_sample.gz
>> 2024-12-28 11:56:46 [INFO] __main__ - Script finished
>> ----------------------------------------------------------------------------------
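Both 400 responses in the log above come from addresses containing a literal "&", and in the logged URLs that "&" is left unencoded (...EQUIPMENT%20&%20VESSELS...). A server splitting the query string on "&" would therefore see q cut short and a stray extra fragment, which would plausibly explain the 400; this is an inference from the log, not a confirmed diagnosis. A minimal sketch using only the Python standard library, assuming the same local endpoint; naive_url only mimics the pattern seen in the log and, like the other names here, is hypothetical.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

PHOTON_API = "http://localhost:2322/api"  # local instance, as in the log


def naive_url(address: str) -> str:
    # Mimics the logged URLs: spaces become %20 but '&' stays literal,
    # so it is read as a parameter separator.
    return f"{PHOTON_API}?q={address.replace(' ', '%20')}&limit=1"


def encoded_url(address: str, limit: int = 1) -> str:
    # urlencode percent-encodes reserved characters in each value,
    # including '&' (%26); quote_via=quote keeps spaces as %20.
    query = urlencode({"q": address, "limit": limit}, quote_via=quote)
    return f"{PHOTON_API}?{query}"


def decoded_q(url: str) -> list:
    # What a server splitting the query on '&' would see as q.
    return parse_qs(urlsplit(url).query).get("q", [])
```

With the JEWELLERY address, `decoded_q(naive_url(...))` gives `['JEWELLERY ']`, while the encoded URL round-trips the full address. With requests, passing `params={"q": address, "limit": 1}` instead of formatting the URL by hand has the same effect.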
>>
>> Best,
>> David
>>
>> --
>> *Dr. David Lenz*
>> Co-Founder, istari.ai GmbH
>> *e:* david.lenz at istari.ai
>> *w:* www.istari.ai <http://www.istari.ai>
>> Julius-Hatry-Straße 1, 68163 Mannheim
>> _______________________________________________
>> Photon mailing list
>> Photon at openstreetmap.org
>> https://lists.openstreetmap.org/listinfo/photon
>