[OSM-dev] UTF8 problem with last night's daily .osc
richard at systemeD.net
Sat Aug 30 09:12:01 BST 2008
Frederik Ramm wrote:
> Frederik Ramm wrote:
>> Closer inspection reveals that this is a tag value that has been
>> truncated at character #255, which happens to be in the MIDST of an
>> UTF-8 sequence. Ouch! Who truncates tags to 255 characters?
> It's a bit embarrassing to keep talking to myself here but in case
> anyone else is interested:
> The culprit is way #26604650 which was newly created with Potlatch
> 0.10b, apparently with the tag value being truncated in the middle
> of an UTF-8 sequence
Well, the relevant bit of the migration is
    create_table "current_way_tags", myisam_table do |t|
      t.column "id", :bigint, :limit => 64
      t.column "k", :string, :default => "", :null
      t.column "v", :string, :default => "", :null
and a :string means a MySQL 255-character VARCHAR (http://
rails-datatypes/)... so yes, that'll be why it's happening.
So I guess the solution is either for Osmosis to conform to Postel's
Law; or to change the datatype (presumably breaks indexing?); or for
Potlatch/amf_controller, which don't currently have any limit on
key/value lengths (well, 64k :) ), to preprocess keys/values by
truncating at the nearest UTF-8 boundary before 255 bytes.
Suggestions welcome as to how this should be done.
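
For the third option, here is a rough sketch of what "truncate at the
nearest UTF-8 boundary" could look like. It is not existing Potlatch or
amf_controller code: truncate_utf8 is a made-up helper name, it assumes
the input is valid UTF-8 and that the limit is counted in bytes, and it
relies on the String encoding methods of Ruby 1.9.3 or later. The idea is
to back up from the cut point while the first dropped byte is a
continuation byte (10xxxxxx), so a multi-byte sequence is never split.

```ruby
# Hypothetical helper, for illustration only.
# Returns str cut to at most max_bytes bytes without splitting a
# multi-byte UTF-8 sequence. Assumes str is valid UTF-8.
def truncate_utf8(str, max_bytes = 255)
  bytes = str.dup.force_encoding(Encoding::BINARY)
  return str if bytes.bytesize <= max_bytes

  cut = max_bytes
  # UTF-8 continuation bytes have the bit pattern 10xxxxxx. If the first
  # byte we would drop is one of those, the cut falls inside a character,
  # so move the cut back until it sits on the start of a character.
  cut -= 1 while cut > 0 && (bytes.getbyte(cut) & 0xC0) == 0x80

  bytes.byteslice(0, cut).force_encoding(Encoding::UTF_8)
end
```

With this, a value of 254 ASCII characters followed by one two-byte
character comes back as the 254 ASCII characters only, rather than 255
bytes ending in half a character.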