[OSM-dev] Question: Tags key and value maximum length

Lars Francke lars.francke at gmail.com
Tue Apr 21 11:55:40 BST 2009


In that case I want to propose a change as 255 chars is a rather
arbitrary value for a maximum and I definitely see use cases where
more characters are needed. I just had a look at the first node that
was truncated (20833623) and Hamburg is now missing a lot of its post
codes. If anyone ever runs the opengeodb import again I can imagine
that there might be problems. Of the first 10 truncations, one tag
was probably an error; the rest were perfectly valid text :(
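
To illustrate the kind of loss I mean, here is a minimal Python sketch. It
assumes the cap is applied as a plain cut after 255 characters, which may
not be exactly what the server does. The function name truncate_tag and the
list of codes are made up for the example; this is not the actual tag value
of node 20833623.

```python
# Hypothetical sketch: effect of a hard 255-character cap on a long,
# semicolon-separated tag value. All values here are invented.
MAX_TAG_LENGTH = 255  # the limit under discussion


def truncate_tag(value: str, limit: int = MAX_TAG_LENGTH) -> str:
    """Cut a tag value to at most `limit` characters, as a naive cap would."""
    return value[:limit]


# Made-up list of 100 five-digit codes, not the real tag on any node.
codes = [str(20000 + i) for i in range(100)]
original = ";".join(codes)
truncated = truncate_tag(original)

# Only entries that survived the cut in full still match a real code.
surviving = [c for c in truncated.split(";") if c in codes]

print(len(original), len(truncated), len(surviving))
```

More than half of the entries are dropped, and the last remaining fragment
is a partial code that no longer means anything.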

255 chars is also a bit small for some descriptions, automatic imports
or notes. If you want a limit, at least choose something like 1000
or even 10000. As the truncated tags show, longer values are not used
very often, but there are valid uses for them.

But as we are using PostgreSQL now a limit is not really necessary.
From the PostgreSQL documentation:

"The storage requirement for a short string (up to 126 bytes) is 1
byte plus the actual string, which includes the space padding in the
case of character. Longer strings have 4 bytes overhead instead of 1.
Long strings are compressed by the system automatically, so the
physical requirement on disk might be less. Very long values are also
stored in background tables so that they do not interfere with rapid
access to shorter column values. In any case, the longest possible
character string that can be stored is about 1 GB. (The maximum value
that will be allowed for n in the data type declaration is less than
that. It wouldn't be very useful to change this because with multibyte
character encodings the number of characters and bytes can be quite
different anyway. If you desire to store long strings with no specific
upper limit, use text or character varying without a length specifier,
rather than making up an arbitrary length limit.)

"There are no performance differences between these three [character,
character varying, text] types, apart from increased storage size when
using the blank-padded type, and a few extra cycles to check the
length when storing into a length-constrained column. While
character(n) has performance advantages in some other database
systems, it has no such advantages in PostgreSQL. In most situations
text or character varying should be used instead."
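
The point in the quote about characters and bytes differing can be seen with
a short Python sketch. It assumes UTF-8 as the encoding, and the sample
string is just an illustration, not a real tag value.

```python
# Sketch, assuming UTF-8: character count and byte count of the same string.
# "\u00df" is the sharp s and "\u00fc" is u-umlaut; each needs 2 bytes in UTF-8.
value = "Stra\u00dfe in L\u00fcbeck"

chars = len(value)                       # number of characters (code points)
utf8_bytes = len(value.encode("utf-8"))  # number of bytes when stored as UTF-8

print(chars, utf8_bytes)
```

So a limit of "255" means different things depending on whether it is
counted in characters or in bytes, which is one more reason a fixed number
is hard to justify.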

Please consider this as I think it will make OSM a bit more future proof.

Thanks,
Lars



