[Tile-serving] [openstreetmap/osm2pgsql] New "flex" backend (#1036)

Jochen Topf notifications at github.com
Thu Dec 19 10:44:21 UTC 2019


Over the past few weeks I have started working on a new output backend. I am calling it the "flex" backend because it is more flexible than any of the other backends and should eventually support all use cases currently covered by them, and many more. This flexibility comes mostly from heavier use of Lua scripts, which are used for configuration as well as for more powerful callback functions.

This work has mostly been triggered by #230, but it also touches many other issues like #901.

Here are some design thoughts:

# Osm2pgsql design for a more flexible backend and Lua integration

We write a new backend according to the ideas presented in this document. The
new backend should be able to do everything the current backends can do and
would replace them in the long term. (This also means that the C++-only
transforms without Lua will not be supported forever.)

# Table configuration

The new setup can work with any number of database tables. We provide Lua
functions to define tables and their columns. Different tables can have different
column setups: for instance, some tables might have lots of columns for tags
("old-style"), while others have only an HSTORE or JSONB column for tags.
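
As a rough illustration of "different tables, different column setups", here is a minimal sketch. It is written in Python purely to model the idea; it is not the Lua API, and `define_table`, `Table`, the table names, and the column types are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical stand-in for a table defined from the configuration script."""
    name: str
    columns: dict                      # column name -> SQL type name
    rows: list = field(default_factory=list)

    def add_row(self, **values):
        # Reject values for columns that were never defined for this table.
        unknown = sorted(set(values) - set(self.columns))
        if unknown:
            raise KeyError(f"{self.name}: unknown columns {unknown}")
        self.rows.append(values)


def define_table(name, columns):
    """Register a table with its own column setup (hypothetical helper)."""
    return Table(name=name, columns=dict(columns))


# "Old-style" table: one column per tag of interest.
roads = define_table("roads", [
    ("name", "text"), ("highway", "text"), ("oneway", "boolean"),
    ("geom", "geometry(LineString)"),
])

# Generic table: all tags go into a single JSONB column.
pois = define_table("pois", [("tags", "jsonb"), ("geom", "geometry(Point)")])

roads.add_row(name="Main St", highway="residential", oneway=False)
pois.add_row(tags={"amenity": "cafe", "name": "Joe's"})
```

The point of the sketch is only that each table carries its own column list, so a wide tag-per-column table and a single-JSONB-column table can coexist in one configuration.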

# Lua callbacks

For every OSM object (node, way, or relation) that is processed by osm2pgsql, a
Lua callback is called with the data from this object as parameter. The Lua
callback can process the data of the OSM object in any way it likes and emits
zero, one, or more table entries (for any table) that should be added by
osm2pgsql. This is done by calling an `add_row()` function on those tables.

Processing in a Lua callback includes but is not limited to:

* Decide which table or tables (if any) the data should go into.
* Map attributes and tags from OSM objects to tables and their columns.
* Convert data types (oneway=yes as boolean, width=10ft as integer in meters,
  put several tags into an HSTORE or JSON(B) column, etc.)
* Convert geometries (in limited ways), like turning around line strings or
  calculating centroids.

Part of this functionality would be supported by C++ functions callable from
Lua (something like `geom = MakeCentroid(geom)`).
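
The kind of per-object processing listed above could look roughly like the following. This is a Python model of the idea, not the Lua callback interface: `process_way`, `parse_oneway`, `parse_width_m`, the `roads` table name, and the dictionary layout of a way are assumptions made for the example. The width is rounded to whole meters to match the "integer in meters" example in the list.

```python
import re

FEET_TO_METERS = 0.3048


def parse_oneway(value):
    """Map a oneway tag value to True/False, or None if absent or unrecognized."""
    if value in ("yes", "true", "1"):
        return True
    if value in ("no", "false", "0"):
        return False
    return None


def parse_width_m(value):
    """Parse values like '10ft', '4 m' or '3.2' into whole meters, else None."""
    if value is None:
        return None
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(m|ft)?\s*", value)
    if match is None:
        return None
    number = float(match.group(1))
    if match.group(2) == "ft":
        number *= FEET_TO_METERS
    return round(number)


def process_way(way):
    """Return zero or more (table_name, row) pairs for one way."""
    tags = dict(way["tags"])           # copy so the input is not modified
    if "highway" not in tags:
        return []                      # not interesting: emit nothing
    row = {
        "way_id": way["id"],
        "highway": tags.pop("highway"),
        "oneway": parse_oneway(tags.pop("oneway", None)),
        "width_m": parse_width_m(tags.pop("width", None)),
        "tags": tags,                  # remaining tags, e.g. for an HSTORE/JSONB column
    }
    return [("roads", row)]
```

A way tagged `highway=residential`, `oneway=yes`, `width=10ft` yields one row for the hypothetical `roads` table; a way without a `highway` tag yields none.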

# Getting data of related objects

For some use cases we need data from related objects that we want to add to the
currently processed object. We might, for instance, want the roles and tags
from all relations a way is a member of to be added to the way data in some
form.

This is solved by allowing Lua functions to:
* tell osm2pgsql that it should process certain OSM objects (`mark` the object)
* store data in temporary storage in memory
* retrieve data from temporary storage

So in the mentioned use case the following would happen:
1. When processing a way with, say, a `highway` tag, the Lua script tells
   osm2pgsql: Please find all relations having this way as member and call
   a Lua callback function later to re-process this way.
2. In a later step osm2pgsql processes relations (either because they are in
   the input data and/or it gets relations out of the existing database) and
   calls the Lua callback function with the data of the way from step 1 and
   the relation data.
3. This Lua callback can now use all of the data it gets and put them into
   any table. In the mentioned use case it could add a PostgreSQL ARRAY
   with relation tags to the way, or merge specific tags in a comma-separated
   string, or whatever is needed.
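
The three steps above can be sketched as a small two-pass model. Again this is Python used only for illustration, under the assumption that ways and relations arrive as plain dictionaries; `TwoPassProcessor`, its method names, and the `route_refs` column are hypothetical and not part of any osm2pgsql interface.

```python
from collections import defaultdict


class TwoPassProcessor:
    def __init__(self):
        self.marked_ways = set()   # ways that asked to be re-processed
        self.storage = {}          # temporary in-memory storage, keyed by way id
        self.rows = []             # rows for a hypothetical "roads" table

    def process_way(self, way):
        """Pass 1: mark interesting ways and remember their data."""
        if "highway" in way["tags"]:
            self.marked_ways.add(way["id"])
            self.storage[way["id"]] = way

    def process_relations(self, relations):
        """Pass 2: collect (role, relation tags) for every marked member way."""
        memberships = defaultdict(list)
        for relation in relations:
            for member in relation["members"]:
                if member["type"] == "way" and member["ref"] in self.marked_ways:
                    memberships[member["ref"]].append(
                        (member["role"], relation["tags"]))
        for way_id in sorted(self.marked_ways):
            self.reprocess_way(self.storage[way_id], memberships[way_id])

    def reprocess_way(self, way, memberships):
        """Callback that sees the way together with its relation data."""
        refs = [tags["ref"] for _role, tags in memberships if "ref" in tags]
        self.rows.append({
            "way_id": way["id"],
            "highway": way["tags"]["highway"],
            "route_refs": refs,    # would map to a PostgreSQL ARRAY column
        })


processor = TwoPassProcessor()
for way in [{"id": 10, "tags": {"highway": "primary"}},
            {"id": 11, "tags": {"building": "yes"}}]:
    processor.process_way(way)

processor.process_relations([
    {"id": 100, "tags": {"type": "route", "ref": "A1"},
     "members": [{"type": "way", "ref": 10, "role": "forward"}]},
    {"id": 101, "tags": {"type": "route", "ref": "B2"},
     "members": [{"type": "way", "ref": 10, "role": ""},
                 {"type": "way", "ref": 11, "role": ""}]},
])
```

Only way 10 was marked in the first pass, so only it is re-processed with relation data; way 11 is a member of relation 101 but is ignored because nothing asked for it.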

Note that this design means that finding related OSM objects is restricted to
one level of relatedness. So you can bring together ways with their nodes and
relations with their member nodes, ways, or relations. But further "nesting"
of relations can't be resolved. This simplifies the design here considerably
and should support enough use cases. More complicated processing has to be
done somewhere else.

# Drawbacks of the design

The design has its problems:

* Reliance on Lua: In the long term, osm2pgsql would no longer work without
  Lua. If this is seen as a problem, we could always keep the C++ transforms;
  users of those would just not get all the new possibilities.
* Lua and performance: We would rely more heavily on Lua scripts which could
  have performance impacts. From the experience with current osm2pgsql it
  seems that this isn't too bad and we just have to be careful when
  implementing things. The bottleneck is probably always going to be the
  database anyway.
* Because the database schema becomes more flexible and more processing is
  done in Lua, there might be more cases where you need a full re-import
  after some changes. On the other hand: We *want* pre-processing where it
  is cheap and easy instead of relying on complex SQL queries at rendering
  time. This is a tradeoff that every user has to make themselves. We allow
  lots of flexibility by having support for HSTORE and JSONB, so that's
  always an option.

# Advanced Issues

## Extra data

For some use cases it might be useful to allow extra data to be added to
the processing step. For instance, we might want to have height data from
somewhere outside OSM and add it to geometries. Or we might want to add to
each highway the information whether it is in a country with left- or
right-hand traffic.

The design of the Lua API should keep this in mind and allow this extra data
to be made available somehow, but this is for a future step.

## Prefiltering in C++

It might be useful for better performance to allow some kind of optional
pre-filtering in C++ before Lua functions are called. So we might want to
be able to define that a Lua function is only called for OSM objects with
a `highway` tag. This way we can save ourselves the expensive Lua calls for
everything that's not a highway.

We can see how the performance looks before deciding whether we need this, but
it would change the configuration needed pretty fundamentally, so we have to do
this before finalizing the implementation.
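
To show how such pre-filtering would change the configuration, here is a minimal sketch in which a callback is registered together with a required tag key and the dispatcher checks the key before making the call. It is a Python model only: `Dispatcher`, `register`, and `required_key` are hypothetical names, and in the real design the cheap check would run in C++ while the callback would be a Lua function.

```python
class Dispatcher:
    def __init__(self):
        self.handlers = []   # list of (required_key or None, callback)

    def register(self, callback, required_key=None):
        """Register a callback, optionally only for objects having a tag key."""
        self.handlers.append((required_key, callback))

    def dispatch(self, obj):
        for required_key, callback in self.handlers:
            # Cheap check first; the (expensive) callback only runs on a match.
            if required_key is None or required_key in obj["tags"]:
                callback(obj)


highway_ids = []   # objects seen by the filtered callback
all_ids = []       # objects seen by the unfiltered callback

dispatcher = Dispatcher()
dispatcher.register(lambda obj: highway_ids.append(obj["id"]), required_key="highway")
dispatcher.register(lambda obj: all_ids.append(obj["id"]))

for obj in [{"id": 1, "tags": {"highway": "service"}},
            {"id": 2, "tags": {"building": "yes"}},
            {"id": 3, "tags": {}}]:
    dispatcher.dispatch(obj)
```

The filter has to be declared at registration time, which is why adding it later would change the shape of the configuration rather than just its internals.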

## Import vs. Update Mode and Separation of Passes

The Lua scripts should be told whether we are in "create" or in "append"
mode. Alternatively, we could have separate scripts for the two modes.

We might also want to tell the Lua scripts whether they are run in pass 1
(when reading the data) or in pass 2 (when working on "marked" data).

It is currently unclear how the two passes needed for append mode and the
two passes needed for the more complex relationship processing will interact.


