[Tile-serving] [openstreetmap/osm2pgsql] Making indexes more flexible (Issue #1780)
Jochen Topf
notifications at github.com
Wed Sep 28 15:17:47 UTC 2022
*All of the following concerns only the flex output. We are not planning to change any of the other outputs.*
One of the most asked for features is to have more functionality around indexes:
* #27
* #37
* #1311
* #1357
* #1691
## Current situation
Osm2pgsql will always create indexes on the id columns of all tables if the database is created for update (with `--slim` and without `--drop`). These indexes are needed for osm2pgsql itself, so it can do updates properly. This will not change.
In addition osm2pgsql always creates a GIST index on the geometry column of each table. If a table doesn't have a geometry column, no index is created. If it has several, only the first will get the index.
Indexes are created in parallel (up to the number set with `--number-processes`), unless `--disable-parallel-indexing` is set.
## Features we want
* Allow tables without any indexes
* Allow indexes on any column (of any column type)
* Allow indexes on several columns at once
* Allow indexes on expressions
* Allow any type of index (btree, gist, ...)
* Allow setting of WHERE conditions
* Allow setting of TABLESPACE
* Allow setting of other options like UNIQUE etc.
See also the [PostgreSQL docs for indexes](https://www.postgresql.org/docs/current/sql-createindex.html).
Also any solution must be backwards compatible, so running osm2pgsql with an unchanged config file and command line options must still do the same.
## Proposal
Add `indexes` field to the options given to the [`osm2pgsql.define_table()` command](https://osm2pgsql.org/doc/manual.html#defining-a-table).
The `indexes` fields can have one of the following values:
* `nil` (default) means to create an index on the first geometry field. This takes care of the backwards compatibilty.
* `false` means that we do not want any indexes on this table.
* a list of index definitions describing what indexes we want. The list may be empty which is the same as setting `indexes` to `false`.
## Examples
Here is an examples config showing the old way of doing things:
```Lua
local data_table = osm2pgsql.define_table{
name = "data",
ids = { type = 'any', id_column = 'osm_id', type_column = 'osm_type' },
index = nil,
columns = {
{ column = 'name', type = 'text' },
{ column = 'tags', type = 'jsonb' },
{ column = 'geom', type = 'geometry' }
}
}
```
You can set this instead if you don't want any index:
```Lua
indexes = false,
```
Or this does the same:
```Lua
indexes = {},
```
Or you can indicate exactly what you want:
```Lua
indexes = {{ column = 'geom', method = 'gist', fillfactor = 'auto' },
{ column = 'name', method = 'btree', tablespace = 'name_index_ts' },
{ column = {'street', 'housenumber'} },
{ expression = 'upper(name)', include = {'id'}, where = 'ever' }}
```
## Index options
The following index options are available. Some values for these options can be checked by osm2pgsql to allow nice error messages. Others can not be checked by osm2pgsql and we can only report the error message from the database.
Generally not all options (such as `fillfactor`) are supported for all index methods but we can not reliably check this because it might be different in different database versions.
We can create the indexes directly after creating their tables to check whether the `CREATE INDEX` command is successful and remove it again afterwards. Otherwise users might only get the error message after many hours of import.
### `column`
The column name or an array of column names to build the index on. Must be at least one column. Names are checked against available column names for this index. Can not be used together with `expression`.
### `method`
The indexing method, `btree` (default), `gist`, etc. Checked against the list of methods supported by the databases, as returned by the query `SELECT * FROM pg_am WHERE amtype='i';`.
### `tablespace`
The tablespace to use for this index. Default (`nil`) is the tablespace set by the `index_tablespace` option on the table or the default tablespace if none is set. Checked against the list of available tablespaces (`SELECT spcname FROM pg_tablespace;`).
We can keep the `index_tablespace` option on the table or possibly mark it as deprecated later on.
### `expression`
An expression interpreted as-is by the database. No check is done. Can not be used together with `column`.
### `include`
Extra columns to include, default is none. Checked against the list of columns of this table. Only available for PostgreSQL 11+.
### `fillfactor`
Set the fill factor. Allowed values are integers 1 to 100 or `auto` (default) which means set to `100` for non-updateable databases (created without `--slim` or with `--drop`) and leave unset otherwise. (This is the current behaviour for backwards compatibility.)
### `where`
Add a WHERE clause to the index creation. The content is passed through to the database without check.
### `unique`
Create a UNIQUE index. Must be set to `true` or `false` (default).
## Showing progress and help with crash recovery
See also #207 for the question of how processing progress can be shown in a better way.
With modern PostgreSQL versions it is possible to monitor index creation progress through the `pg_stat_progress_create_index` system table. If we create a table `osm2pgsql_indexes` and add all indexes to be created to that table, we can always get the current status with a simple SQL query.
We can add the SQL `CREATE INDEX` command that osm2pgsql issues to that table. This would allow advanced users to re-run index creation manually if the import failed while creating indexes. In the long run this could be part of a more automatic approach to resume failed imports.
* #799
* #1751
--
Reply to this email directly or view it on GitHub:
https://github.com/openstreetmap/osm2pgsql/issues/1780
You are receiving this because you are subscribed to this thread.
Message ID: <openstreetmap/osm2pgsql/issues/1780 at github.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/tile-serving/attachments/20220928/aa02aafc/attachment.htm>
More information about the Tile-serving
mailing list