[OSM-dev] broken utf8 in minute changeset 200907140650
Ævar Arnfjörð Bjarmason
avarab at gmail.com
Tue Jul 14 17:09:38 BST 2009
On Tue, Jul 14, 2009 at 3:19 PM, Richard Fairhurst <richard at systemed.net> wrote:
> Ævar Arnfjörð Bjarmason wrote:
>> * Potlatch will enter whatever raw binary string the user
>> supplies into the database that the main API would reject
>> as an invalid request, hence the corrupt data
>
> Sort of.
>
> From a client point of view, the bug you filed is that Linux Flash Player
> has long been broken beyond belief and doesn't permit non-ASCII characters
> to be entered into a textfield.
> (See http://bugs.adobe.com/jira/browse/FP-40.)
Yes, from a client point of view. But the server portion of Potlatch
shouldn't trust the client side to do data validation. Doing
server-side content validation equivalent to the main API would have
prevented both the issue described in ticket:1936 and presumably this
issue too.
Right now buggy and/or malicious clients can submit data via the SWF
API that breaks the API for everyone who uses a real XML parser.
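To make "server-side content validation" concrete, here is a minimal
sketch of the most basic check: rejecting byte strings that are not
well-formed UTF-8 before they reach the database. It is written in
Python for illustration only; the function name is hypothetical, and
this is not the actual code of Potlatch's server side or of the main
API.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` is well-formed UTF-8 (hypothetical helper)."""
    try:
        # Strict decoding raises on truncated or otherwise malformed sequences.
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# A request handler would reject the tag value instead of storing it.
print(is_valid_utf8("Arnfjörð".encode("utf-8")))  # well-formed input
print(is_valid_utf8(b"\xc3"))                     # truncated two-byte sequence
```

Note that a check like this only catches malformed byte sequences. A
control byte such as 0x03 is well-formed UTF-8 and would pass, so it
is not sufficient on its own.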
> As I mentioned to you the other day, it would be really useful if some
> Linux-using OSMers could expand the reports at
> http://trac.openstreetmap.org/ticket/1936 so we can find exactly _how_ FP
> for Linux is breaking encoding, and fix it either in Potlatch or at the API.
> From the two examples you give, for two-byte UTF8, it appears to be adding
> 0x03 before the first byte and 0x83 0xC2 after it. But we need to work out
> whether this is a universal pattern for all two-byte UTF8 sequences, and
> what happens with longer sequences. This should be fairly trivial for
> someone with the Rails port installed on a Linux machine, I'd hope.
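For anyone who wants to test the pattern Richard describes, here is a
small Python sketch. It builds the byte string that pattern predicts
for a two-byte UTF-8 character, and for comparison the result of
reading UTF-8 bytes as Latin-1 and encoding them again. Both function
names are mine, the sample characters are my own choice rather than
the ones in the ticket, and the double-encoding comparison is only a
guess to check against real data, not an established diagnosis of the
Flash Player bug.

```python
def hypothesized_corruption(two_byte_utf8: bytes) -> bytes:
    """Apply the pattern from the mail: 0x03 before the first byte,
    0x83 0xC2 after it (hypothetical helper)."""
    if len(two_byte_utf8) != 2:
        raise ValueError("expected exactly one two-byte UTF-8 sequence")
    first, second = two_byte_utf8
    return bytes([0x03, first, 0x83, 0xC2, second])


def double_encoded(text: str) -> bytes:
    """UTF-8 bytes misread as Latin-1 and encoded as UTF-8 again
    (an assumption to compare against, not a confirmed cause)."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


for ch in ("ö", "ł"):
    predicted = hypothesized_corruption(ch.encode("utf-8"))
    # Compare the pattern (minus its leading 0x03) with double encoding.
    print(ch, predicted.hex(), double_encoded(ch).hex(),
          predicted[1:] == double_encoded(ch))
```

For "ö" (0xC3 0xB6) the two agree after the leading 0x03; for "ł"
(0xC5 0x82) they do not. So if double encoding is involved, the
"0x83 0xC2" pattern would only hold for characters whose first byte is
0xC3, which is exactly the kind of thing more reports could settle.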
I regret filing that bug report as is; it should really be split into
two separate issues:
* Potlatch's server side doesn't do validations of client-supplied
data equivalent to the main API
* Potlatch's server side could be really clever and automagically do
what the user means to work around one specific client bug in one
specific flash player client on Linux
But yes, it would be nice if someone supplied sufficient information
to solve the second issue, but the first issue is the important one.
>> And as has been pointed out there's an ambiguity as to what
>> sequences of bytes can be written to the database whether that
>> be full UTF-8 or some XML subset of it.
>
> Indeed.
.. But for the time being modifying the server portion of Potlatch to
only accept what the main API accepts would put a stop to these data
corruption errors.
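To illustrate the "full UTF-8 versus XML subset" distinction mentioned
above: the XML 1.0 specification only allows the code points in its
Char production, so a string can be perfectly valid UTF-8 and still be
unrepresentable in an XML 1.0 document. A rough Python sketch of that
check follows; the function names are hypothetical, and this is not
what the main API actually runs.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point `cp` matches the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]"""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml10_text(text: str) -> bool:
    """True if every character of `text` may appear in XML 1.0
    (hypothetical helper)."""
    return all(is_xml10_char(ord(ch)) for ch in text)


print(is_xml10_text("Ævar Arnfjörð"))          # ordinary non-ASCII text
print(is_xml10_text(b"\x03".decode("utf-8")))  # valid UTF-8, not an XML 1.0 char
```

The second case is the relevant one here: 0x03 decodes without error
as UTF-8, yet it is outside the Char production, and an XML parser
reading it back is entitled to reject the whole document.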