[Tile-serving] [openstreetmap/osm2pgsql] Memory usage during phase 2 flex processing (#1535)

mboeringa notifications at github.com
Sat Jul 10 12:23:38 UTC 2021


> You can't compare the amount of memory used by some Lua structure with how much this will take in the database. Memory usage in Lua will be much larger.

Yes, I realize that the type of data structure used, and the way the data is stored, can make a huge difference. I recently had to handle a somewhat similar issue, where I needed to store unique IDs and the vertex counts of polygons for a multi-threaded Python application. My first, naive approach was to store this information as a nested Python list-of-lists structure, with a separate sub-list for each polygon record. With a few hundred million records, memory usage soared to over 130 GB... After reading up on Python objects and memory consumption, I finally settled on re-implementing this as one big numpy array, which probably reduced memory consumption by a factor of 20.
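
To illustrate the per-record overhead, here is a rough sketch (not the code of the actual application). It uses the standard-library `array` module as a dependency-free stand-in for a numpy array, and the record count, ID values, and the `deep_size` helper are made up for the example. The sizes come from `sys.getsizeof`, so they are approximate and will differ between Python versions and platforms:

```python
import sys
from array import array


def deep_size(records):
    """Approximate bytes used by a list of [id, vertex_count] sub-lists.

    Each distinct object is counted once, because CPython shares small
    integers between records.
    """
    seen = set()
    total = sys.getsizeof(records)
    for rec in records:
        total += sys.getsizeof(rec)
        for value in rec:
            if id(value) not in seen:
                seen.add(id(value))
                total += sys.getsizeof(value)
    return total


N = 100_000  # made-up record count, far below the real data set

# Naive layout: one small Python list per polygon record.
nested = [[10_000_000 + i, 4 + i % 50] for i in range(N)]

# Packed layout: two flat typed arrays (a stand-in for one numpy array).
ids = array("q", (rec[0] for rec in nested))     # signed 64-bit IDs
counts = array("i", (rec[1] for rec in nested))  # C int vertex counts

nested_total = deep_size(nested)
packed_total = sys.getsizeof(ids) + sys.getsizeof(counts)

print(f"nested lists : {nested_total / N:.1f} bytes per record")
print(f"packed arrays: {packed_total / N:.1f} bytes per record")
print(f"ratio        : {nested_total / packed_total:.1f}x")
```

The exact ratio depends on the data and the interpreter, but the nested layout pays for a separate list object and a separate integer object per record, while the packed layout stores only the raw values.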

> To figure out where the problem is, I suggest running the exact same config, but with the one line removed where you are actually storing anything in the global variable.

Yes, thanks for the suggestion. I will attempt that. It will take some time before I can report the results though, as I have another process running that I would like to finish first. 

By the way, this is the code involved:

```
phase2_admin_ways = {}

...

function osm2pgsql.process_way(object)
    if osm2pgsql.stage == 1 then
        if clean_tags(object.tags) then
            return
        end

        local area_tags = isarea(object.tags)
        if object.is_closed and area_tags then
            add_polygon(object.tags)

            if z_order(object.tags) ~= nil then
                add_transport_polygon(object.tags)
            end
        else
            add_line(object.tags)

            if z_order(object.tags) ~= nil then
                add_transport_line(object.tags)
            end

            if roads(object.tags) then
                add_roads(object.tags)
            end
        end
    elseif osm2pgsql.stage == 2 then
        -- Stage two processing is called on ways that are part of admin boundary relations
        local props = phase2_admin_ways[object.id]
        if props ~= nil then
            tables.admin:add_row({admin_level = props.level, multiple_relations = (props.parents > 1), geom = { create = 'line' }})
        end
    end
end

function osm2pgsql.process_relation(object)
    -- grab the type tag before filtering tags
    local type = object.tags.type
    object.tags.type = nil

    if clean_tags(object.tags) then
        return
    end
    if type == "boundary" or (type == "multipolygon" and object.tags["boundary"]) then
        add_line(object.tags)

        if roads(object.tags) then
            add_roads(object.tags)
        end

        add_polygon(object.tags)

    elseif type == "multipolygon" then
        add_polygon(object.tags)

        if z_order(object.tags) ~= nil then
            add_transport_polygon(object.tags)
        end
    elseif type == "route" then
        add_line(object.tags)
        add_route(object)
        -- TODO: Remove this, roads tags don't belong on route relations
        if roads(object.tags) then
            add_roads(object.tags)
        end
    end
end

function osm2pgsql.select_relation_members(relation)
    if relation.tags.type == 'boundary'
       and relation.tags.boundary == 'administrative' then
        local admin = tonumber(admin_level(relation.tags.admin_level))
        if admin ~= nil then
            for _, ref in ipairs(osm2pgsql.way_member_ids(relation)) do
                -- Store the lowest admin_level, and how many relations it is used in
                if phase2_admin_ways[ref] == nil then
                    phase2_admin_ways[ref] = {level = admin, parents = 1}
                else
                    if phase2_admin_ways[ref].level == admin then
                        phase2_admin_ways[ref].parents = phase2_admin_ways[ref].parents + 1
                    elseif admin < phase2_admin_ways[ref].level then
                        phase2_admin_ways[ref] = {level = admin, parents = 1}
                    end
                end
            end
            return { ways = osm2pgsql.way_member_ids(relation) }
        end
    end
end
```


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openstreetmap/osm2pgsql/issues/1535#issuecomment-877629626