[Tile-serving] [openstreetmap/osm2pgsql] Handling of schemas (Issue #2011)

Mon Jul 24 08:43:11 UTC 2023

We have an issue with the way we are handling schemas that we need to solve. This has been made worse [with the changes in PostgreSQL 15 around schema permissions](https://www.crunchydata.com/blog/be-ready-public-schema-changes-in-postgres-15).

Historically osm2pgsql didn't support schemas at all, everything was simply done without mentioning any schema, which normally meant tables, indexes, etc. ended up in the `public` schema which is always there and used to be writable by every database user. Then users wanted to be able to put tables, etc. into specific schemas, so we added functionality for that. If you set a schema (on the command line or in the flex config file), that schema is used, otherwise the former behaviour is used.

Note that there is a difference between "not using a schema at all" and "using the `public` schema". In the second case the schema is hardcoded to be `public`, it just happens to be the one that would also be used by default in many cases. In the first case no schema is specified, which means PosgtreSQL will use the `search_path` and uses the first schema that exists in that search path to create objects. By default the `search_path` is `"$user", public` which means if there is a schema with the name of the user running osm2pgsql, that schema is used, otherwise `public` is used.

It used to be that nobody has a schema with the name of their user, so no schema basically meant `public`. But with the change in Pg 15 this assumption is wrong. The `public` schema isn't writable any more in databases that you don't own, which lead to more people using schemas, specifically the schema with the same name as their user, which is what a lot of docs recommend you do. This breaks the implicit assumption that "no schema == `public`" which we have in our minds and which is also in our code in some places.

Now there are probably three common setups for osm2pgsql databases:
1. Use one user as owner of the database and use the **same** user for osm2pgsql. In that case you don't need any special schemas, the owner of the database can create objects in the `public` schema.
2. Use one user as owner of the database and use a **different** user for osm2pgsql. Add a schema with the same name as the user used by osm2pgsql, don't mention any schemas on the osm2pgsql command line/config. That schema should be used by osm2pgsql if the `search_path` was not changed from the default. (This case is currently broken by some assumption in the case that "no schema == `public`".)
3. Create specific schema(s) for osm2pgsql and explicitly use them on the command line/config. No problem there.

Now there is the question how we want things to be. There are basically two options:
1. Remove any assumptions in the code that  "no schema == `public`". Things will mostly "just work", but the user will have to understand how the `search_path` affects schema choice and that they possibly need that schema named after the user. Any fiddling with the `search_path` will affect the behaviour of osm2pgsql, possibly breaking things in more or less subtle ways, especially if the `search_path` is changed after import but before updates are run or so.
2. Always force use of a schema everywhere in the osm2pgsql code. The default would change from "no schema" to the `public` schema. If you can't write to the public schema osm2pgsql will error out early on. We can clearly tell the user what to do in that case (create schema, set the schema in osm2pgsql or make `public` writable). Changes to the `search_path` will not affect osm2pgsql any more. The code becomes simpler, because we don't need to differentiate any more between "no schema" and "some schema". Users don't have to understand the details of the `search_path` any more.

This issue came to light now with #2010 which is caused by a "no schema == `public`" assumption in the new code that deals with the `osm2pgsql_properties` table. To figure out whether that table exists the code checks the `pg_tables` table in the database and it uses `public` as schema. What we would have to do is take the `search_path` into account for that check which would be difficult. Instead we probably would just try to read from the table and if that fails assume that the table is not there.

All in all I think option (2) would be the better solution, because it is conceptually simpler and doesn't rely on the `search_path`. But it is a change from current behaviour and would mean some users will have to add command line options / configs settings which they didn't have to do before.

@pnorman, @lonvia opinions?

-- 
Reply to this email directly or view it on GitHub:
https://github.com/openstreetmap/osm2pgsql/issues/2011
You are receiving this because you are subscribed to this thread.

Message ID: <openstreetmap/osm2pgsql/issues/2011 at github.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/tile-serving/attachments/20230724/238be3b9/attachment.htm>