[OSM-dev] Planet Dump Timings

Dave Stubbs osm.list at randomjunk.co.uk
Sun Sep 9 12:06:19 BST 2007


On 09/09/07, Sebastian Spaeth <Sebastian at sspaeth.de> wrote:
>
> On Sun, Sep 09, 2007 at 01:47:27AM +1000, Brett Henderson wrote:
> > The rounding differences so far have always been when a 5 is rounded one
> > way or the other so I don't think this is a concern.
>
> Given that the 7th decimal place is something like 1cm (or was it 1mm?) at
> the equator, I am not very concerned that this makes a big difference. It's
> interesting though, that osmosis doesn't seem to do proper rounding here. Do
> you use sprintf to output your values?



osmosis is probably using some form of DecimalFormat to output doubles, which
uses half-even (unbiased) rounding by default. sprintf may or may not round
the same way, depending on what system you're on. I'm guessing this accounts
for the discrepancy.
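
To illustrate the two rounding modes, here is a minimal Python sketch. It is
not what osmosis or planet.rb actually run; the helper name round_coord and
the sample values are made up for illustration, and the inputs are decimal
strings so that binary floating-point representation does not muddy the tie.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# Seven decimal places, the precision discussed in this thread.
SEVEN_PLACES = Decimal("0.0000001")


def round_coord(value: str, mode: str) -> str:
    """Round a decimal string to 7 places using the given rounding mode.

    Hypothetical helper for illustration only.
    """
    return str(Decimal(value).quantize(SEVEN_PLACES, rounding=mode))


# A tie after an even digit: the two modes disagree.
print(round_coord("0.12345665", ROUND_HALF_EVEN))  # 0.1234566
print(round_coord("0.12345665", ROUND_HALF_UP))    # 0.1234567

# A tie after an odd digit: both modes round up.
print(round_coord("0.12345675", ROUND_HALF_EVEN))  # 0.1234568
print(round_coord("0.12345675", ROUND_HALF_UP))    # 0.1234568
```

The two modes only differ when the discarded part is exactly a 5, which is
consistent with the differences Brett reported.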



> > The handling of non-latin characters concerns me but I have no idea if it's
> > a serious problem or not.
>
> > "Kronprinsesse Märthas allÃÂ(c)" is written by planet.rb (viewed by
> > less).
> > "Kronprinsesse MÃÆärthas allÃÆÃÂ(c)" is written by osmosis
> > (viewed by less).
> > "Kronprinsesse Märthas allÃÂ(c)" is displayed by MySQL Query
> > Browser.
> > "Kronprinsesse Märthas allé" is displayed when viewed from the API in
> > firefox.
>
> > Do you know much about unicode? Is there a way I can verify which of these
> > outputs is correct (if not both)?
>
> OK, I have to admit that my charset knowledge is rather minimal, so I don't
> know how serious this is. However, the fact that planet.rb and osmosis
> produce different output doesn't look good. The API should hand out all
> strings in UTF-8 as far as I know.
>
> It seems that Java uses a modified version of UTF-8 for its stream:
> http://en.wikipedia.org/wiki/UTF-8#Java. Looking at your example (loaded
> in Firefox), I see:
> Node 78270
> "Kronprinsesse M=C3=C2=A4rthas all=C3=C2=A9" is written by planet.rb
> (viewed by less).
> "Kronprinsesse M=C3=C6=C3=C2=A4rthas all=C3=C6=C3=C2=A9" is written by
> osmosis (viewed by less).
> "Kronprinsesse M=C3=83=C2=A4rthas all=C3=83=C2=A9" is displayed by MySQL
> Query Browser.
> "Kronprinsesse M=E4rthas all=E9" is displayed when viewed from the API in
> firefox.
>
> For example, the first letter ä is described as
> http://www.fileformat.info/info/unicode/char/00e4/index.htm in
> UTF-8/16/32.
>
> Somebody else who has more experience with this should know what to do
> about it; as I said, I have no experience with that.



It looks like the input string is not UTF-8 to start with, i.e. both the
planet.rb and osmosis outputs are wrong.

Is this running from the same system as the normal planet dump?
I'm just wondering if the MySQL connectors are in the wrong charset mode, or
else the data was imported badly in the first place.
The correct output for the first non-ASCII letter (ä) should be the two
bytes 0xC3 0xA4.
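
As a rough check of that claim, the Python sketch below shows the correct
UTF-8 bytes for ä and what happens if UTF-8 bytes are misread as Latin-1 and
then encoded as UTF-8 a second time. That Latin-1 is the charset involved is
my assumption, not something confirmed for this database; the function names
are made up for illustration.

```python
def double_encode(text: str) -> bytes:
    """Simulate UTF-8 bytes being misread as Latin-1 and re-encoded as UTF-8.

    Assumption: the wrong charset in the pipeline is Latin-1.
    """
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def undo_double_encoding(data: bytes) -> str:
    """Reverse one round of the Latin-1 double encoding shown above."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


# Correct UTF-8 for the first non-ASCII letter.
print("ä".encode("utf-8").hex(" "))      # c3 a4

# One round of double encoding.
print(double_encode("ä").hex(" "))       # c3 83 c2 a4
print(double_encode("é").hex(" "))       # c3 83 c2 a9

# Round trip back to the intended string.
broken = double_encode("Kronprinsesse Märthas allé")
print(undo_double_encoding(broken))      # Kronprinsesse Märthas allé
```

The double-encoded bytes C3 83 C2 A4 and C3 83 C2 A9 match what the MySQL
Query Browser line in the quoted example shows, which is why I suspect the
data is already mangled before either dump tool reads it.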