[OSM-dev] Questions on PostGIS

Wed Feb 6 13:27:45 GMT 2008

Hello Brett,

Brett Henderson schreef:
> Hi All,
>
> I was hoping to knock up a simple PostgreSQL schema (ie. just planet 
> data, no history) for manipulating large data sets, primarily to help 
> with extraction of bounding boxes.  I've written my own custom database 
> within osmosis but I don't think I'll ever get it to scale as I hoped so 
> a real database appears to be necessary.
>
> I'm planning to use the PostGIS support but my knowledge is extremely 
> lacking in this area.  
try http://postgis.refractions.net/docs/. The documentation is realy 
straight forward and easy to understand.
> I'm prepared to go away and do some more research 
> but given that others already have experience with this they might have 
> some ideas on how this should be done.  I haven't even looked at the way 
> the mapnik database works so point me in that direction if its 
> appropriate.  Just keep in mind that I'm trying to keep things fairly 
> simple if possible.
>
> So far I've created a node table as follows (tags are in a separate table):
> CREATE TABLE node
> (
>   id bigint NOT NULL,
>   user_name text NOT NULL,
>   coordinate point NOT NULL,
>   tstamp timestamp without time zone NOT NULL,
>   CONSTRAINT pk_node PRIMARY KEY (id)
> )
> WITHOUT OIDS;
> ALTER TABLE node OWNER TO postgres;
>
>   
* I don't know what version of postGIS you are using, but it is advised 
not to use the one shipped with postGRES. See if you can uninstall it 
and take the latest postGIS from http://postgis.refractions.net/
Anyway, the datatype point should best be replaced with GEOMETRY.
* In postGIS there is a GeometryColumn constructor:

SELECT AddGeometryColumn('parks', 'park_geom', 128, 'MULTIPOLYGON', 2 );

Which makes it possible for you to first define a regular table and add the geometry later.

When it comes to points: I often read X and Y values in normal double columns in the database. When the data is loaded, I add the Geometry column and then do a:

ST_PointFromText ('point (X Y)', 4326)  (4326 is the SRID for the lat/lon wgs84 coordinatesystem.

> CREATE INDEX idx_node_location
>   ON node
>   USING gist
>   (circle(coordinate, 0::double precision));
>
> Does this look to be on the right track?  What is the right way of 
> indexing point types?  Or should I be using different data types?  I'm a 
> bit confused about the difference between geometry types and types such 
> as point, circle, etc.
>   
I think the syntax for your GIST index simply needs to be:

CREATE INDEX idx_node_location
  ON node
  USING gist
  (my_geometry_column);

>   More importantly, how do I query within a bounding box?
>   
This is a query I use with PHP to get all the geometry elements within a 
Google Earth Window (dynamicaly determined on users field of view).
This simply says: select kml from addr_kml where my point is withing the 
box that is created dynamically.
Note the && they are important. This is also documented on the postgis site.

'select kml from addr_kml where
        wkb_geometry && SetSRID(ST_MakeBox3d(
        ST_GEOMFromText( \'POINT( '. $bboxEast . ' '. $bboxSouth . ' 
)\', 4326 ),
        ST_GEOMFromText( \'POINT( '. $bboxWest . ' '. $bboxNorth . ' 
)\', 4326 ))::box3d, 4326);';

> One other thing I noticed is that PostGIS has its own methods for adding 
> geometry columns to tables.  Is this something I should be using?  Will 
> this adversely affect the size of the on-disk data?
>   
It is more convenient for human users, but has no technical difference 
to what you are doing. so no.
> The next question is how do I support ways?  My current schema just has 
> a way table and a way_node table which aligns with the way the 
> production mysql db works.  Perhaps if I used geo-spatial types instead 
> I could do smarter things with bounding box queries (ie. find ways that 
> cross a box without any nodes residing within it).  My main concerns are 
> that this may 1. drastically increase load times 2. increase data size.
>
> The way tables in question are below (again way tags are elsewhere):
>
> CREATE TABLE way
> (
>   id bigint NOT NULL,
>   user_name text NOT NULL,
>   tstamp timestamp without time zone NOT NULL,
>   CONSTRAINT pk_way PRIMARY KEY (id)
> )
> WITHOUT OIDS;
> ALTER TABLE way OWNER TO postgres;
>
> CREATE TABLE way_node
> (
>   way_id bigint NOT NULL,
>   node_id bigint NOT NULL,
>   sequence_id smallint NOT NULL,
>   CONSTRAINT pk_way_node PRIMARY KEY (way_id, sequence_id)
> )
> WITHOUT OIDS;
> ALTER TABLE way_node OWNER TO postgres;
>   
This is indeed an issue with the datastructure from OSM. Building 
queries to combine nodes to dynamically generate ways on every request 
is a performance killer!
Basically I think it would be best to write a script that will generate 
REAL ways (polylines) in the postGIS database in a real geometric table. 
That way you will take optimized use of geometry functions.
> Sorry for the unstructured mess of thoughts above but any input would be 
> appreciated,
>   
No problem
> Cheers,
> Brett
>
>
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
>
>