[OSM-talk] Pictures of opening hours signs for machine learning purposes

Bryce Cogswell bryceco at yahoo.com
Sat Apr 10 03:47:58 UTC 2021


> @Bryce: Did you already make significant efforts regarding deduplicating / sorting or otherwise processing the images? If yes, maybe you could share this altered dataset with Isaac and other interested parties?

I didn’t do any additional work on deduplicating the images. I’m not sure why you think this is important if you’re going to use it for ML training.

> @Bryce: Congratulations! I already saw some correctly recognized specimens! That is certainly encouraging, isn't it? Do you already know if/how you would proceed further? If you would be okay with publishing what you already have, maybe others could build upon that.
> 
> I remember one idea we had: If users of such a recognition feature would be willing to (automatically, with little/no effort) share the pictures to increase the pool of pictures, you could create a virtuous cycle, especially if you can motivate them to either mark detections as correct or let them fix them as needed.

Keep in mind I’m not doing any ML training, so having a larger sample size doesn’t benefit me. I wanted a large number of test images in order to measure the expected accuracy of the OCR and algorithm in real-world settings. My plan now is to build a stand-alone app for testing during surveying, improve the recognition by building better spatial models of how the text is laid out, and then finally integrate it into Go Map!!
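
As a rough illustration of what measuring accuracy over a set of test images could look like, here is a minimal sketch. It is not the code in the repository: the file names, the sample values, and the `normalize` and `accuracy` helpers are all made up for this example, and it assumes each image has one hand-checked opening_hours value to compare against.

```python
def normalize(value: str) -> str:
    """Collapse whitespace and case so trivial formatting differences are not counted as errors."""
    return " ".join(value.split()).lower()


def accuracy(expected: dict[str, str], recognized: dict[str, str]) -> float:
    """Fraction of test images whose recognized value matches the hand-checked one.

    An image with no recognized value at all counts as a miss.
    """
    if not expected:
        raise ValueError("no test images")
    hits = sum(
        1
        for name, truth in expected.items()
        if normalize(recognized.get(name, "")) == normalize(truth)
    )
    return hits / len(expected)


# Hypothetical sample data: hand-checked values per image (assumption).
expected = {
    "sign_001.jpg": "Mo-Fr 09:00-17:00",
    "sign_002.jpg": "Mo-Sa 10:00-18:00; Su off",
    "sign_003.jpg": "24/7",
    "sign_004.jpg": "Tu-Su 11:00-22:00",
}

# Hypothetical recognizer output; sign_004.jpg produced nothing (assumption).
recognized = {
    "sign_001.jpg": "Mo-Fr 09:00-17:00",
    "sign_002.jpg": "mo-sa 10:00-18:00;  su off",
    "sign_003.jpg": "Mo-Fr 08:00-20:00",
}

print(f"{accuracy(expected, recognized):.0%}")
```

Exact string matching is a deliberately strict choice here; a real evaluation might instead compare parsed opening hours so that equivalent notations count as the same result.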

I’m working on this at https://github.com/bryceco/OpeningHoursPhoto but the code is super rough at this point.
The image set is at https://gomaposm.com/opening_hours/opening_hours.zip (12.5GB download)

Bryce
