[OSM-dev] [Talk-us] Abbreviating names in tools
Toby Murray
toby.murray at gmail.com
Thu Oct 4 20:47:09 BST 2012
On Thu, Oct 4, 2012 at 1:26 PM, andrzej zaborowski <balrogg at gmail.com> wrote:
> Hi,
> there's been a lot of talk at one point about abbreviating names in
> the OSM database vs. doing it when processing the data at the
> consumer's end. Since mapnik now supports alternative label placements I gave
> rendering automatically abbreviated names a try. This resulted in a
> (so far) tiny C library (https://github.com/balrog-kun/shrtnms) that
> abbreviates the names of map features that you give it. It's very
> rough but it already handles a couple of the main corner cases. It's
> just a start on collecting the list of all the abbreviations
> applicable in all the map languages. It currently has only basic lists
> for Polish, Spanish and English, where the rules don't differ much.
> German, for example, is trickier.
>
> I see two ways to use it for rendering:
>
> * inside mapnik stylesheets, perhaps by calling the C function from
> the SQL queries. This would require adding the postgres bindings,
> something I've never done.
> * inside osm2pgsql so that the abbreviated names are stored in table columns.
>
> As a quick hack I went for the latter option. It gets tricky with
> hstore and multilingual maps but it works as a first attempt. I have
> a patch at http://osm.trail.pl/osm2pgsql/0002-Generate-short-name-columns.patch
> that makes the necessary (small) changes to the osm2pgsql code and
> includes a snapshot of the library code.
>
> It ignores the actual language of a name and just tries to apply all
> the possible abbreviations from its list. This will eventually need a
> solution perhaps based on the location of a given map object and a
> fixed list of country polygons with their main languages. Perhaps it
> can use a common solution with highway shields rendering.
>
> One of the stylesheets at osm.trail.pl currently uses this code (only
> Europe imported at this time, but I CC'd talk-us because there was a
> long thread about this at one time). You can see that at z16 here
> "Norbroke Street" is spelt in full:
> http://a.osm.trail.pl/osmapa.pl/16/32723/21790.png
>
> and z15 shows Norbroke St when there's not enough space. (That
> stylesheet would also merge the two segments of Norbroke St and only
> show the name once, had that street not had a gap there.)
> http://b.osm.trail.pl/osmapa.pl/15/16361/10895.png
>
> How to apply
>
> In addition to the osm2pgsql patch you also need to add the
> auto-generated tags short_name and shortest_name to your osm2pgsql
> .style file so that they end up in the mapnik db. Then in the mapnik
> 2 stylesheet where you use a <TextSymbolizer
> foo=bar>[name]</TextSymbolizer>, you need to change it to:
>
> <TextSymbolizer foo=bar
> placement-type="list">[name]<Placement>[short_name]</Placement><Placement>[shortest_name]</Placement></TextSymbolizer>
>
> If there's a tag short_name or shortest_name in OSM data, they'll
> override the autogenerated versions. The difference between those two
> columns is in the degree to which they try to shorten the name, for
> instance for name=West Fulton Street, the new tags will be:
>
> short_name=W Fulton St
> shortest_name=Fulton
>
> For Polish if a street is named after a person, the person's
> first/second names will be first shortened to their initials and then,
> if necessary, omitted as is normally seen in cartography. For Spanish
> all articles and prepositions are also stripped. Unfortunately if the
> same rules are applied to US city names that often come from Spanish
> (Los Angeles) the same thing will happen (resulting in "Angeles",
> which makes no sense). I also notice that different Spanish
> abbreviations are used in Spain (Avenida -> Avda.) than in Latin
> America (Avenida -> Av.), so it's something to have in mind if running
> a global tile server or other service.
>
> Suggestions and lists of missing phrases are welcome, perhaps through
> github (but I'm away from my main machine this week). If this is
> something that users want to see in map tiles then that list will be
> needed at some point even though the code doesn't yet handle all the
> nuances correctly.
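For readers following along, the two levels of shortening described in the quoted message (West Fulton Street -> W Fulton St -> Fulton), plus the rule that explicit OSM tags override the generated ones, could be sketched roughly as below. This is only an illustration in Python, not the shrtnms code: the word lists and the function names short_name, shortest_name and name_columns are made up for the example, and it ignores language detection entirely.

```python
# Hypothetical word lists for illustration only; a real list would be
# much larger and kept per language.
DIRECTIONS = {"North": "N", "South": "S", "East": "E", "West": "W"}
STREET_TYPES = {"Street": "St", "Avenue": "Ave", "Road": "Rd"}


def short_name(name):
    """Replace known direction and street-type words with abbreviations."""
    words = name.split()
    return " ".join(DIRECTIONS.get(w, STREET_TYPES.get(w, w)) for w in words)


def shortest_name(name):
    """Drop direction and street-type words, unless nothing would remain."""
    words = name.split()
    kept = [w for w in words if w not in DIRECTIONS and w not in STREET_TYPES]
    return " ".join(kept) if kept else name


def name_columns(tags):
    """Build the three label candidates; explicit OSM tags win over generated ones."""
    name = tags["name"]
    return {
        "name": name,
        "short_name": tags.get("short_name", short_name(name)),
        "shortest_name": tags.get("shortest_name", shortest_name(name)),
    }


print(name_columns({"name": "West Fulton Street"}))
```

A renderer would then try name first and fall back to short_name and shortest_name when the label does not fit, which is what the placement-type="list" symbolizer in the quoted message expresses.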
Very nice work. I'm not sure if this will be of help to you or not but
as part of writing my ogr2osm translation for TIGER 2011 I converted
the TIGER technical documentation appendix E (Feature Name Types) from
PDF to a CSV file for easier processing. TIGER probably uses some
abbreviations that people may not always expect but it is a handy list
to have at least.
https://github.com/ToeBee/ogr2osm-translations/blob/master/tiger2011_abbrev.csv
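A list like this could be turned into a lookup table along the following lines. This is a minimal sketch, not taken from the ogr2osm translation: the inline sample data and the column names "full" and "abbrev" are assumptions made for the example, so check the actual header of the CSV before reusing it.

```python
import csv
import io

# Stand-in data. The column names "full" and "abbrev" are assumptions,
# not necessarily the layout of tiger2011_abbrev.csv.
SAMPLE_CSV = """full,abbrev
Street,St
Avenue,Ave
Boulevard,Blvd
"""


def load_abbreviations(fileobj):
    """Return a dict mapping each full word to its abbreviation."""
    return {row["full"]: row["abbrev"] for row in csv.DictReader(fileobj)}


def abbreviate(name, table):
    """Abbreviate every word of the name that appears in the table."""
    return " ".join(table.get(word, word) for word in name.split())


table = load_abbreviations(io.StringIO(SAMPLE_CSV))
print(abbreviate("Fulton Street", table))
```

With a real file, io.StringIO(SAMPLE_CSV) would be replaced by an open() call on the downloaded CSV.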
Toby