[OSM-dev] way 27483626 UTF-8 truncation
brett at bretth.com
Sat Oct 4 01:15:54 BST 2008
Florian Lohoff wrote:
> On Fri, Oct 03, 2008 at 01:36:31PM +0100, Matt Amos wrote:
>> Subject: [OSM-dev] way 27483626 UTF-8 truncation
>> i just noticed that the hourly change file
>> 2008100310-2008100311.osc.gz has an invalid UTF-8 string in the note
>> tag for way 27483626 (
>> http://www.openstreetmap.org/browse/way/27483626/history ). i have
>> truncated it to the nearest word, so this email is just to give
>> forewarning that hourly or daily diff imports today might have a bit
>> of trouble.
>> its the same problem as discussed here
> Another 2 change files contain utf-8 bugs and osmosis refuses to process
Any idea which nodes or ways are broken in these?
This isn't an osmosis bug. The database now has incorrect/corrupted tag
data in the history tables that needs to be corrected. Following the URL:
results in random results from the API.
If we can identify the broken records we can ask TomH nicely to fix
them. I can then move osmosis backwards in time to re-generate the
affected time period. I don't know how this broken data gets created in
the first place. There was some discussion about this the last time it
happened, I'll have to try to dig up the emails.
It's not simple to fix osmosis to prevent this occurring. Osmosis is
reading doubly encoded data from the database and removing the double
encoding as it writes to the xml file. It's a hack and there is no
simple way of verifying the data before it gets written to the file. I
have a local process running at home verifying the output which has
detected the problem, but I was asleep at the time it occurred :-)
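
To illustrate the double encoding described above, here is a minimal
Python sketch. It is not osmosis code. It assumes the extra layer comes
from UTF-8 bytes being misread as Latin-1 and encoded as UTF-8 again,
and the function names are made up for the example. It shows one way a
value that is still valid in its stored form can become invalid UTF-8
once the extra layer is removed, namely when the stored string has been
cut between two characters that together represent one original
multi-byte character. Whether that is what actually happened to the
broken records is not established here.

```python
def double_encode(text: str) -> bytes:
    """Simulate double encoding (assumed mechanism): the UTF-8 bytes are
    misread as Latin-1 characters and then encoded as UTF-8 again."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def undo_double_encoding(stored: bytes) -> bytes:
    """Remove one encoding layer: decode as UTF-8, then map every
    character back to the single byte it stood for."""
    return stored.decode("utf-8").encode("latin-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    stored = double_encode("Straße")

    # The intact value survives the round trip.
    print(is_valid_utf8(undo_double_encoding(stored)))

    # Cut the stored string after 5 characters, i.e. in the middle of
    # the two characters that represent the original "ß".
    truncated = stored.decode("utf-8")[:5].encode("utf-8")

    # The stored form is still valid UTF-8 ...
    print(is_valid_utf8(truncated))
    # ... but the bytes written after removing the extra layer are not.
    print(is_valid_utf8(undo_double_encoding(truncated)))
```

A check like is_valid_utf8 can only be applied after the extra layer
has been removed, which fits with the problem being noticed in the
generated output rather than in the stored data.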