[OSM-dev] UTF-8 problems in informationfreeway?

Stefan Baebler stefan.baebler at gmail.com
Fri Jan 4 11:13:49 GMT 2008


On Jan 4, 2008 11:22 AM, 80n <80n80n at gmail.com> wrote:
> The 100 character truncation was a bug in Osmxapi, which is fixed now.
Goodie.

> About 600 tags (out of 180 million) are affected in Osmxapi's database,
> these will get corrected as they percolate through from the osmosis feed.
Surprisingly high number, given that the test was rather synthetic,
with an extremely high proportion of accented characters. But I guess
this is not uncommon in exotic parts of the world (Asia comes to mind).

Anyway, I've just changed the test node (in Potlatch this time), so it
should appear in the next hourly changeset.

Keeping my fingers crossed and an eye on
http://osmxapi.hypercube.telascience.org/api/0.5/node%5bplace=town%5d%5bbbox=14.5,46.1,14.8,46.2%5d
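For anyone who wants to run similar checks: the URL above is just an Osmxapi path expression with its square brackets percent-encoded (%5b is "[", %5d is "]"). A minimal Python sketch that builds such a query with the standard library, assuming the same 0.5 endpoint as above (which may no longer be live); `xapi_node_url` is a hypothetical helper, not part of any Osmxapi client:

```python
from urllib.parse import quote

# Endpoint as it appears in this thread; an assumption, not guaranteed to still exist.
XAPI_BASE = "http://osmxapi.hypercube.telascience.org/api/0.5/"

def xapi_node_url(predicates: dict[str, str]) -> str:
    """Hypothetical helper: build an Osmxapi-style node[k=v][k=v] query URL.

    Each predicate becomes one [key=value] filter. The brackets are
    percent-encoded, while '=' and ',' are kept literal as in the URL above.
    """
    path = "node" + "".join(f"[{k}={v}]" for k, v in predicates.items())
    return XAPI_BASE + quote(path, safe="=,")

if __name__ == "__main__":
    print(xapi_node_url({"place": "town", "bbox": "14.5,46.1,14.8,46.2"}))
```

`quote` emits uppercase hex (%5B), while the URL above uses lowercase; percent-encoding hex digits are case-insensitive, so both forms name the same resource.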

Stefan

> On Jan 4, 2008 10:17 AM, Stefan Baebler <stefan.baebler at gmail.com> wrote:
> > Khm, but where to set the new limit?
> >
> > According to http://www.facstaff.bucknell.edu/rbeard/name.html
> > it should be 310 bytes =
> > (length("Krungthepmahanakonbowornratanakosinmahintarayudyayamahadiloponoparatanarajthaniburiromudomrajniwesmahasatarnamornpimarnavatarsatitsakattiyavisanukamphrasit") - 1) * 2,
> > to accommodate slightly shorter names with all of the characters
> > being exotic, needing 2 bytes to encode :)
> >
> > Or the limits could simply be taken from the official OSM schema (for
> > ways at least; tags of nodes are a mess with semicolons).
> >
> > Stefan
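A note on the 2-bytes-per-character estimate: it holds for the accented Latin letters in the test note (code points U+0080 to U+07FF take 2 bytes in UTF-8), but UTF-8 uses up to 4 bytes per character. The euro sign and Thai script take 3, and characters outside the Basic Multilingual Plane take 4. A small Python sketch comparing character and byte counts; `byte_budget` is a hypothetical helper, and sizing storage as characters × 4 is just one conservative choice, not what Osmxapi or the OSM schema is known to do:

```python
def utf8_len(text: str) -> int:
    """Number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

def byte_budget(max_chars: int, bytes_per_char: int = 4) -> int:
    """Hypothetical helper: worst-case storage for max_chars characters.

    4 is the UTF-8 maximum per character; passing 2 reproduces the
    estimate above, which only holds for characters below U+0800.
    """
    return max_chars * bytes_per_char

# The note tag from the test node further down this thread.
NOTE = "Testing 34 random UTF-8 characters:ČčŽžŠšĐđĆć€ÄäËëÖöÜüŁłßÇç÷פ§ÉéÁáÂâ"

if __name__ == "__main__":
    # ASCII letter, Latin Extended-A, euro sign, Thai ko kai, emoji
    for ch in ["a", "č", "€", "\u0e01", "\U0001F600"]:
        print(repr(ch), utf8_len(ch), "bytes")
    print(len(NOTE), "characters,", utf8_len(NOTE), "bytes")
```

So a limit stated in characters and a limit enforced in bytes can disagree by up to a factor of 4, which is worth keeping in mind whichever number ends up in the schema.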
> >
> > On Jan 4, 2008 9:45 AM, 80n <80n80n at gmail.com> wrote:
> > > Hmmm... yes it's truncating at 100 characters.  Working on a fix...
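One way a length cut turns "Áá" into "Á�" (as in the output quoted below) is cutting the UTF-8 bytes in the middle of a character: the dangling lead byte can't be decoded, and a lenient decoder substitutes U+FFFD. This is a guess at the mechanism, not a description of Osmxapi's actual code. A minimal Python sketch contrasting a naive byte cut with one that backs off to a character boundary; both function names are hypothetical:

```python
def truncate_naive(text: str, max_bytes: int) -> str:
    """Cut the UTF-8 bytes at max_bytes and decode leniently.

    If the cut splits a multi-byte character, the leftover bytes are
    replaced with U+FFFD, the '?'-in-a-diamond seen in the output.
    """
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="replace")

def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut to at most max_bytes without splitting a character."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    cut = max_bytes
    # UTF-8 continuation bytes look like 10xxxxxx. Step back until the
    # byte at the cut starts a new character, so data[:cut] ends cleanly.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut].decode("utf-8")

if __name__ == "__main__":
    sample = "Áá"  # 4 bytes: C3 81 C3 A1
    print(repr(truncate_naive(sample, 3)))  # 'Á\ufffd'
    print(repr(truncate_utf8(sample, 3)))   # 'Á'
```

Slicing a Python str by characters never splits a code point, so this only bites when a limit described in characters is actually enforced on bytes somewhere (for instance by a storage layer); which layer did it in Osmxapi isn't stated in this thread.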
> > >
> > > On Jan 4, 2008 7:26 AM, Stefan Baebler <stefan.baebler at gmail.com> wrote:
> > > > Hi again!
> > > >
> > > > Osmxapi behaves much better now, but there is a problem with my test
> > > > node. In planet.osm it is:
> > > > <node id="29161753" timestamp="2007-12-22T05:59:49Z" lat="46.1356895" lon="14.7445634">
> > > >   <tag k="created_by" v="JOSM"/>
> > > >   <tag k="name" v="Moravče"/>
> > > >   <tag k="is_in" v="Slovenia, Europe"/>
> > > >   <tag k="place" v="town"/>
> > > >   <tag k="note" v="Testing 34 random UTF-8 characters:ČčŽžŠšĐđĆć€ÄäËëÖöÜüŁłßÇç÷פ§ÉéÁáÂâ"/>
> > > > </node>
> > > > while
> > > > http://osmxapi.hypercube.telascience.org/api/0.5/node%5bplace=town%5d%5bbbox=14.5,46.1,14.8,46.2%5d
> > > > gives
> > > > <node id="29161753" lat="46.1356895" lon="14.7445634" timestamp="2007-12-22T05:59:49Z">
> > > >   <tag k="is_in" v="Slovenia, Europe"/>
> > > >   <tag k="name" v="Moravče"/>
> > > >   <tag k="note" v="Testing 34 random UTF-8 characters:ČčŽžŠšĐđĆć€ÄäËëÖöÜüŁłßÇç÷פ§ÉéÁ�.."/>
> > > >   <tag k="place" v="town"/>
> > > > </node>
> > > >
> > > > Note that the last 2 characters in the note tag should be "Ââ".
> > > > Planet.osm is OK, but Osmxapi seems to misinterpret some characters.
> > > > Any ideas?
> > > >
> > > > UTF-8 characters in hourly diffs and their import into Osmxapi still need
> > > > to be checked.
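The manual comparison above (planet.osm versus the Osmxapi response) could be scripted, which would also help with checking the hourly diffs. A sketch using Python's xml.etree.ElementTree on the two node snippets from this mail; `node_tags` and `tag_mismatches` are hypothetical helpers, and in practice the second document would come from an HTTP request rather than a string literal:

```python
import xml.etree.ElementTree as ET

PLANET = '''<node id="29161753" timestamp="2007-12-22T05:59:49Z" lat="46.1356895" lon="14.7445634">
  <tag k="created_by" v="JOSM"/>
  <tag k="name" v="Moravče"/>
  <tag k="is_in" v="Slovenia, Europe"/>
  <tag k="place" v="town"/>
  <tag k="note" v="Testing 34 random UTF-8 characters:ČčŽžŠšĐđĆć€ÄäËëÖöÜüŁłßÇç÷פ§ÉéÁáÂâ"/>
</node>'''

# The Osmxapi output quoted above; \ufffd is the replacement character it returned.
XAPI = '''<node id="29161753" lat="46.1356895" lon="14.7445634" timestamp="2007-12-22T05:59:49Z">
  <tag k="is_in" v="Slovenia, Europe"/>
  <tag k="name" v="Moravče"/>
  <tag k="note" v="Testing 34 random UTF-8 characters:ČčŽžŠšĐđĆć€ÄäËëÖöÜüŁłßÇç÷פ§ÉéÁ\ufffd.."/>
  <tag k="place" v="town"/>
</node>'''

def node_tags(xml_text: str) -> dict[str, str]:
    """Return the k -> v tags of the single <node> element in xml_text."""
    node = ET.fromstring(xml_text)
    return {tag.get("k"): tag.get("v") for tag in node.iter("tag")}

def tag_mismatches(expected_xml: str, actual_xml: str) -> dict[str, tuple[str, str]]:
    """Tags present in both copies whose values differ, as (expected, actual).

    Keys present on only one side are skipped: the Osmxapi output omits
    created_by, which is not an encoding problem.
    """
    want, got = node_tags(expected_xml), node_tags(actual_xml)
    return {k: (want[k], got[k]) for k in want.keys() & got.keys() if want[k] != got[k]}

if __name__ == "__main__":
    for key, (expected, actual) in tag_mismatches(PLANET, XAPI).items():
        flag = " (contains U+FFFD)" if "\ufffd" in actual else ""
        print(f"{key}:{flag}\n  planet: {expected!r}\n  xapi:   {actual!r}")
```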
> > > >
> > > > Stefan

