[OSM-dev] [OSM-talk] osm2pgsql hstore (was: Wind turbines no longer rendered on mapnik layer)

Fri Feb 17 12:12:48 GMT 2012

Here is a Spatialite database for testing

http://latuviitta.org/documents/relation_test.zip

First trials suggest that joined tables with proper indexes are not at all
slow and compared with the standard tables created by osm2pgsql the query
times can be several times faster. This is not a fair comparison because
osm2pgsql tables are missing the attribute indexes but so they normally do
in the Mapnik chain. Query times vary, so try yourself. Data is osm_lines
from Berlin from some months ago.

By the way, recent development around GDAL suggests that Spatialite may be
remarkably faster with spatial and attribute queries than PostGIS.

query from osm_line from osm2pgsql
=======================
select geometry, highway from osm_line
where highway='primary'
0.531 seconds

query from joined tables
===============================
select * from line_geometry, tags
where tags.tag='highway' and tags.value='primary'
and tags.join_id=line_geometry.osm_id
0.059 seconds

About the database
==================

I made two new tables from osm_line table, one for geometries and another
for tags. I did only populate tags table with some tags. You may continue
with that, all the SQL commands I used are listed below. There may be
errors from my trials and errors pasted there but mostly they should be
OK. Some #comments added, too.

#create tables "tags" and "line_geometry"
CREATE TABLE tags (
join_id,
tag,
value)

CREATE TABLE line_geometry (
geometry,
osm_id)

#populate line_geometry from osm_line
insert into line_geometry (geometry,osm_id)
select geometry as geometry, osm_id as osm_id
from osm_line

#create indexes for joins and key/value queries
CREATE   INDEX line_osm_id_idx
ON "line_geometry"
(
osm_id
)

CREATE INDEX tags_join_id_idx
ON "tags"
(
join_id
)

CREATE INDEX tags_tag_idx
ON "tags"
(
tag
)

CREATE INDEX tags_value_idx
ON "tags"
(
value
)

#arrange some data into the tags table as key-value pairs

insert into tags(join_id,tag,value)
select osm_id,'highway',highway
from osm_line where highway is not null

insert into tags(join_id,tag,value)
select osm_id,'railway',railway
from osm_line where railway is not null

insert into tags(join_id,tag,value)
select osm_id,'foot',foot
from osm_line where foot is not null

insert into tags(join_id,tag,value)
select osm_id,'maxspeed',maxspeed
from osm_line where maxspeed is not null

-Jukka Rahkonen-

Jukka Rahkonen kirjoitti:
> Jochen Topf wrote:
>> On Fri, Feb 17, 2012 at 07:44:41AM +0000, Jukka Rahkonen wrote:
>>> Frederik Ramm <frederik <at> remote.org> writes:
>>>
>>> >
>>> > Hi,
>>> >
>>> > On 02/16/2012 07:25 PM, Graham Jones wrote:
>>> > > This reminded me of a question I have been meaning to ask for quite
>>> a
>>> > > while - is a database generated by osm2pgsql with an hstore
>>> expected
>>> to
>>> > > perform similarly to one without?
>>> >
>>> > I never ran one with hstore when I think of what this must mean for
>>> the
>>> > database engine, and storage space, then I shudder and would not be
>>> > surprised by the factor 5 you mentioned.
>>>
>>> How about doing it with relations? Let the importer program create four
>>> tables
>>> osm_point
>>> osm_line
>>> osm_polygon
>>> osm_relation
>>>
>>> These tables would each have two attributes: Geometry and osm_is.
>>> Geometries in
>>> the osm_relation would be of type "geometry collection", that is, a
>>> collection
>>> of whatever, and it could hold for example the tranport route relation
>>> with the
>>> route and bus stops and everything in one PostGIS geometry object.
>>>
>>> Then there should be one or four tables for tags (everything in one
>>> table or
>>> split to suit the geometry tables). The three attributes would be
>>> osm_id, key,
>>> and value. Osm_id would be used as a foreing key for joining geometries
>>> and
>>> attributes. If osm_id can not be guaranteed to be unique then there
>>> should be
>>> point_id, line_id, polygon_id and relation_id added to suitable places.
>>>
>>> Every attribute would be indexed. For Mapnik use where all that is
>>> needed is to
>>> do simple SQL selects.  I guess that this database would be faster to
>>> query than
>>> hstore even it also contains all the tags. It might be faster for
>>> Mapnik
>>> than
>>> the current tables with extra wide attribute schema because now
>>> attributes are
>>> not indexed at all (or are they?). SQL queries according to tags and
>>> values or
>>> even with a part of the tag and value strings would be easier to do
>>> than
>>> from
>>> hstore. People could enhance attributes for some special uses by
>>> converting
>>> max_speeds and other values which are actually measures from strings to
>>> integers
>>> or doubles etc.
>>>
>>> This option is so obvious that I believe that someone must have tried
>>> it
>>> already. It would be nice to hear about experiences.
>>
>> Much too slow. Requiring joins on every query is not a good idea. And
>> having
>> the right indexes is important. You can't just index everything and hope
>> it
>> would do the right thing. (The most important index btw is the geometry
>> index
>> not the attribute indexes, although having the right attribute indexes
>> here
>> or there can help.)
>
> Sorry, havin spatial index is so obvious that it did not even comen into
> my mind to mention it. Do you have cold numbers about speed differences or
> suggestions for good SQL queries to test with if I will get interested
> enough to do a test some day? With this database structure I do not
> foresee big difficulties with indexing, and joins with foreing keys is
> vastly used in relational databases so perhaps it is not necessarily so
> slow.
>
> -Jukka Rahkonen-
>
>> Jochen
>> --
>> Jochen Topf  jochen at remote.org  http://www.remote.org/jochen/
>> +49-721-388298
>>
>>
>
>
>
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> http://lists.openstreetmap.org/listinfo/dev
>