[OSM-talk] ReMAPTCHA Demo BETA 0.2 online! (Was: Hate captchas!!!!)

Mon Mar 31 16:47:40 UTC 2014

On 31/03/2014, Stefan Keller <sfkeller at gmail.com> wrote:
> The scrambling of the "Control Word" is as intense as other Captchas but
> it's only 5-6 chars (instead of 10 or more).

Hum, taking another look, it does seem more scrambled than I remember.
But when I first checked, there was only text over the map, not over
the imagery, so you changed stuff :p

It's better, but I still feel it's not as scrambled as the typical
ReCAPTCHA; for example it doesn't have anything overlaying it.

> We can afford this because an OCR needs to find first the boundaries of the
> word - and that's more difficult with labels on a map.

One major flaw with displaying the text on both layers (rendered and
satellite) is that you can subtract one image from the other to be
left with just the text without noise.
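
To make that concrete, here is a minimal sketch of the attack. It is not
based on the actual ReMAPTCHA code: the layers are toy grayscale grids,
the text value and all names (overlay, recover_text_mask) are made up,
and it assumes the two backgrounds differ at every pixel.

```python
# Hypothetical sketch: both layers are tiny grayscale grids (values 0-255).
TEXT = 255  # assumed pixel value of the overlaid control word


def overlay(background, mask):
    """Draw the same text mask on top of a background layer."""
    return [
        [TEXT if m else px for px, m in zip(bg_row, mask_row)]
        for bg_row, mask_row in zip(background, mask)
    ]


def recover_text_mask(layer_a, layer_b, tolerance=0):
    """Flag pixels that are (nearly) identical in both layers.

    The backgrounds differ, the overlaid text does not, so the
    near-zero differences mark the text.
    """
    return [
        [abs(a - b) <= tolerance for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


# Toy text shape and two backgrounds that differ at every pixel.
mask = [
    [False, True, True, False],
    [False, True, False, False],
    [False, True, True, False],
]
rendered = [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120]]
satellite = [[200, 180, 160, 140], [130, 125, 115, 105], [95, 85, 75, 65]]

recovered = recover_text_mask(overlay(rendered, mask), overlay(satellite, mask))
```

With real imagery you'd need a non-zero tolerance and some cleanup, but
the principle is the same: whatever is identical on both layers is the
text.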

Map labels are rare enough in the examples I saw. They might also be
easily filtered out as being non-scrambled and using known typefaces.

And if we get an area with many labels that somehow confuse a bot but
not a user, we're back to the "got a few words, try to give
combinations at random" problem.

> You have to realize that the other word has to be written just to indicate
> if there is a path - else you can omit it.


> The fact that there is a path is unknown to our system.

I don't understand that part. You've created the challenges, so you
must know the answer?

> So in your estimation, humans always succeeded when only typing the "Control
> Word".
> That (human) trick and not knowing the correct answer for the "Control
> Word" applies to all reCAPTCHAs.
> Bots need first to find 1. which one is the Control Word (including
> boundary) and then 2. to try OCR.

Part of my point is that the bot doesn't need to distinguish the
control word from the other word (assuming it OCRed the words
correctly, see above). It has a 33% chance of getting it right
randomly, which is fine. A 10% overall success rate is not an issue
for a bot (but would be for a human).
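
As a back-of-the-envelope sketch of that arithmetic: the 33% guess rate
and the 10% overall figure are the ones above, while the 30% OCR success
rate is purely an assumption picked to make the numbers line up, and the
function names are made up.

```python
def overall_success(p_ocr, p_guess):
    """Chance that one attempt passes: OCR must succeed AND the guess must hit."""
    return p_ocr * p_guess


def expected_attempts(p_success):
    """Mean number of independent tries until the first success."""
    return 1 / p_success


p_guess = 1 / 3  # random pick of which word(s) to submit
p_ocr = 0.3      # ASSUMED OCR success rate, not a measured value

p_overall = overall_success(p_ocr, p_guess)
attempts = expected_attempts(p_overall)

print(f"overall success per attempt: {p_overall:.0%}")
print(f"expected attempts per solved CAPTCHA: {attempts:.0f}")
```

Ten-ish automated attempts per success costs a bot next to nothing,
whereas a human failing nine times out of ten would just give up.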

>> Please drop the "scrambled text" idea altogether. And make solving a
>> CAPTCHA a fun activity in the process.
> Feedback so far was that it's at least more fun than typing 15 characters
> and helping G* instead of OSM.

That's my feedback too :) Note that I wouldn't be commenting if I
thought the work didn't have merit :p

But I'm afraid that the fun will quickly disappear, because we still
have to squint and type, and because you'll notice that you need to
raise the scrambling-related difficulty because bots still get through.

>> ...                        A "click features on the
>> satellite imagery" task is one way to do it, but I'm sure there are
>> others.
> This seems like a good idea and I'm open to collect those.

So I hope you'll explore the idea.

There have been plenty of other attempts with image-based no-typing
CAPTCHAs (search those terms), but I think that they suffered from the
high cost of getting tagged source material. An OSM-backed satellite
imagery CAPTCHA would have a huge amount of source material readily
available.

> Unfortunately nobody came up until now with one, which fulfilled the
> properties of a reCAPTCHA, i.e. fast and easy to understand challenge by
> humans.

I guess it's the usual problem of plenty of people having their idea
about a feature, but it takes one person to actually go and do the
work before they realize that they had different features in mind, or
that it was a bad idea... Thanks for working on that :)

