[OSM-dev] strange Osmosis/XML/...? problem yesterday

Andy Allan gravitystorm at gmail.com
Fri Aug 14 16:31:37 BST 2009


On Fri, Aug 14, 2009 at 3:16 PM, Andy Allan<gravitystorm at gmail.com> wrote:
> On Fri, Aug 14, 2009 at 12:54 PM, Frederik Ramm<frederik at remote.org> wrote:
>> Hi,
>>
>> Frederik Ramm wrote:
>>> The result file should have been something like 400 bytes. This sounds
>>> trivial but in the original case where the .osc contained a large number
>>> of these characters, I suddenly had 2 MB of data in one tag.
>>
>> I forgot to mention: I'm posting this here on dev and not on the osmosis
>> list because it seems that other (at least Java) programs are also
>> affected; someone fixed the node later with a commit comment of "JOSM
>> says string too long" or so...
>
> The code points for these gothic characters are fine. See the
> following (awesome) site:
>
> http://decodeunicode.org/en/gothic
>
> A rough transliteration is HEJSPANOA. However, they lie outside the
> Basic Multilingual Plane (BMP) and can't be represented by a 16-bit
> integer. Java stores characters internally as 16-bit UCS-2 characters
> and so everything is going horribly wrong.
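
To make the BMP point concrete: in UTF-16, a character outside the BMP
needs two 16-bit code units (a surrogate pair), so any code that treats
one 16-bit unit as one character will mishandle it. A minimal sketch in
Python; the helper name utf16_code_units is made up for illustration and
U+10337 is just one letter picked from the Gothic block:

```python
# Illustration only: count the 16-bit code units each character needs in UTF-16.

def utf16_code_units(text):
    """Return the UTF-16 code units of `text` as a list of integers."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

latin_h = "H"                # U+0048, inside the BMP
gothic_hagl = "\U00010337"   # GOTHIC LETTER HAGL, outside the BMP

for ch in (latin_h, gothic_hagl):
    units = utf16_code_units(ch)
    print(f"U+{ord(ch):04X}: {len(units)} code unit(s):",
          " ".join(f"{u:04X}" for u in units))
```

The Latin letter fits in a single code unit, while the Gothic letter
comes out as the pair D800 DF37.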

Installing an SMP-aware font shows what JOSM is doing more easily than
reading Unicode code-points.

http://code2000.net/code2001.htm

I'll keep my (horrid) transliterations going here for the sake of everyone else.

v31 - HEJSPANOA
v32 - HHEHEJHEJSHEJSPHEJSPAHEJSPANHEJSPANOHEJSPANOA

i.e. the first letter, the first two letters, the first three letters etc.

I can see how you can quickly end up with a 2MB tag using this encoding scheme!
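
The growth pattern above can be modelled in a few lines. This is only a
sketch of the observed output (v31 to v32), not the code that actually
caused it; the function name expand_prefixes is hypothetical:

```python
# Model of the observed corruption: the value is replaced by the
# concatenation of all of its prefixes (first letter, first two, ...).

def expand_prefixes(value):
    """Concatenate every non-empty prefix of `value`, shortest first."""
    return "".join(value[:i] for i in range(1, len(value) + 1))

v31 = "HEJSPANOA"
v32 = expand_prefixes(v31)
print(v32)
print(len(v31), "->", len(v32))
```

A string of n characters turns into n(n+1)/2 characters, so the 9
letters here become 45. Under this model a run of about 2000 such
characters would already expand to roughly two million characters in a
single pass, though whether the original 2 MB tag arose in exactly that
way is an assumption.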

Cheers,
Andy



