[OSM-dev] usernames, keys, and values

Anthony osm at inbox.org
Thu Oct 29 18:16:03 GMT 2009


On Thu, Oct 29, 2009 at 1:31 PM, Matt Amos <zerebubuth at gmail.com> wrote:
> On Wed, Oct 28, 2009 at 3:38 PM, Andy Allan <gravitystorm at gmail.com> wrote:
>> I would say that if the dump code and
>> http://www.w3.org/TR/REC-xml/#NT-Char
>> are in conflict, there's a bug in the dump code. But since I'm not
>> going to fix it, maybe I'll keep my opinions quiet :-)
>>
>> As for the rails code, there is (AFAIK) no explicit character
>> checking. The server implicitly relies on libxml to ensure the
>> characters in the XML requests and responses are only those allowed by
>> the XML spec above.
>
> there is explicit checking in the potlatch API, as that doesn't go
> through libxml:
>
> http://trac.openstreetmap.org/browser/sites/rails_port/app/controllers/amf_controller.rb#L909
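
For reference, the rule Andy points to is the XML 1.0 Char production, which admits TAB, LF and CR but no other code point below 0x20 (nor surrogates, U+FFFE or U+FFFF). Below is a minimal Python sketch of that check. The helper names is_xml_char and invalid_xml_chars are hypothetical and not taken from the dump or rails code. It shows why, for example, a BEL (0x07) cannot appear in an XML 1.0 document at all, even as a character reference.

```python
from typing import List


def is_xml_char(c: str) -> bool:
    """True if c matches the XML 1.0 Char production (section 2.2)."""
    cp = ord(c)
    return (
        cp in (0x9, 0xA, 0xD)              # TAB, LF, CR are the only allowed controls
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD          # skips surrogates and U+FFFE/U+FFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


def invalid_xml_chars(s: str) -> List[str]:
    """Return every character in s that no XML 1.0 document may contain."""
    return [c for c in s if not is_xml_char(c)]


print(invalid_xml_chars("line 1\nline 2"))  # []
print(invalid_xml_chars("bell\x07"))        # ['\x07']
```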

There doesn't seem to be a spec, so everyone's just making it up as
they go along.

But let me try to clarify, with a quote from the W3C's Canonical XML draft: "In
attribute values, the character information items TAB (#x9), newline
(#xA), and carriage-return (#xD) are represented by "&#x9;", "&#xA;",
and "&#xD;" respectively."
(http://www.w3.org/TR/2000/WD-xml-c14n-20000119.html)

Including a tab, newline, or carriage return unescaped in an XML
attribute would clearly be wrong: attribute-value normalization turns
each of them into a plain space, so the character is silently lost.
But as long as it's escaped, it's valid XML.
<tag k="name" v="line 1&#xA;line 2" /> is valid XML.  It may or may
not represent valid OSM data.  This is why I'm saying my question has
nothing at all to do with XML.
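
To make the escaped/unescaped difference concrete, here is a small Python sketch using only the standard library (xml.sax.saxutils.escape and ElementTree). The tag_element helper and the ATTR_ENTITIES table are hypothetical, written for this illustration and not taken from any OSM code. It emits the character references quoted above, then shows that a parser recovers the newline from "&#xA;" but turns a literal newline into a space.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical table: the quote character plus the three whitespace
# characters the canonical-XML draft writes as character references.
ATTR_ENTITIES = {'"': "&quot;", "\t": "&#x9;", "\n": "&#xA;", "\r": "&#xD;"}


def tag_element(k: str, v: str) -> str:
    """Serialize an OSM-style <tag/> with &, <, >, quotes and whitespace escaped."""
    return '<tag k="%s" v="%s" />' % (escape(k, ATTR_ENTITIES), escape(v, ATTR_ENTITIES))


escaped = tag_element("name", "line 1\nline 2")
print(escaped)                                # <tag k="name" v="line 1&#xA;line 2" />
print(repr(ET.fromstring(escaped).get("v")))  # 'line 1\nline 2'

# A literal newline inside the attribute is normalized to a space by the parser.
raw = '<tag k="name" v="line 1\nline 2" />'
print(repr(ET.fromstring(raw).get("v")))      # 'line 1 line 2'
```

The round trip only works because the escaping happens on output; whether such a value should be accepted as OSM data is a separate question.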

Apparently, under that Potlatch code, tabs, carriage returns, and
newlines are not allowed in keys or values (I don't know Ruby/Rails
well enough to say for sure, but that seems to be what Matt just
pointed out).  Usernames, by contrast, apparently *can* contain these
characters at this point.  Actually changing one's username to include
them would require an input method other than the web page, but I
don't see any code that forbids it.

Meanwhile, the planet dump code silently changes control characters
to "?".  This could cause problems (for instance, two distinct
usernames might end up as identical values in the dump), though it
would probably take a deliberate attack.
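
The collision risk is easy to demonstrate. The Python sketch below assumes the dump replaces every code point below 32 with "?". That is an assumption based on the description above, not on the dump source, so whether TAB, LF and CR are included is unknown. Both dump_style_replace and find_collisions are hypothetical helpers. Note that a legitimate username that really contains "?" collides as well.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def dump_style_replace(s: str) -> str:
    """Hypothetical stand-in for the dump: each code point below 32 becomes '?'."""
    return "".join("?" if ord(c) < 32 else c for c in s)


def find_collisions(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group distinct names by their dumped form; keep only groups that collide."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[dump_style_replace(name)].append(name)
    return {dumped: originals for dumped, originals in groups.items() if len(originals) > 1}


print(find_collisions(["alice\x01", "alice\x02", "alice?", "bob"]))
# {'alice?': ['alice\x01', 'alice\x02', 'alice?']}
```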

I wonder what happens if someone enters tabs into keys or values
through the API (which apparently has no check for this), and someone
then tries to edit that object in Potlatch.  That looks like a
denial-of-service attack to me.

It would be a good idea to publish an official spec stating exactly
which characters are allowed in keys, values, and usernames.  Simply
disallowing control characters (decimal value less than 32) altogether
would probably be best.  But if the decision is made to allow them,
fine; then they need to be handled properly.
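
As a sketch of the "disallow control characters" rule, a single check applied to keys, values and usernames on every input path (API, Potlatch, web page) could look like the Python below. The function name and error format are hypothetical. The function follows the "less than 32" wording literally, so U+007F and the C1 controls are not rejected.

```python
def check_osm_string(field: str, s: str) -> None:
    """Hypothetical validator: reject any character with code point below 32."""
    for i, c in enumerate(s):
        if ord(c) < 32:
            raise ValueError("%s: control character U+%04X at index %d" % (field, ord(c), i))


for field, text in [("k", "name"), ("v", "line 1\nline 2"), ("user", "alice")]:
    try:
        check_osm_string(field, text)
        print(field, "ok")
    except ValueError as err:
        print("rejected:", err)
# k ok
# rejected: v: control character U+000A at index 6
# user ok
```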
