[Tile-serving] osm2pgsql DB keeps corrupting on power loss

Stephen D junkmail at scd31.com
Sat Aug 10 13:35:26 UTC 2019


Hi,

I was able to free up another ~100GB of space (64GB from moving my swap 
to a different disk, and 47GB from temporarily turning off reserved 
blocks). I have restarted the import and it's currently generating the 
planet_osm_polygon indexes.
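
For reference, the reserved blocks can be reclaimed temporarily with 
tune2fs, assuming an ext4 filesystem; the device name below is only an 
example:

    # reclaim the root-reserved blocks for the duration of the import
    tune2fs -m 0 /dev/sda1
    # restore the default 5% reservation once the import is finished
    tune2fs -m 5 /dev/sda1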

I'm using a 1TB SSD, but I have a Nominatim planet import taking up 
some of the disk space. The flat file and planet.osm.pbf both live on 
a RAID 5 array of three 10K RPM SAS disks.

Based on the version number, I'm running the latest version of 
osm2pgsql. However, it's probably been around two months since I last 
downloaded the source code and compiled it.
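
A quick way to check which build is installed:

    # print the compiled-in version string
    osm2pgsql --version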

The SSD is currently 94% full, but usage has stopped going up (it's 
been generating the planet_osm_polygon indexes for around 10 hours). 
The last time disk usage was climbing was 2-3 hours ago.
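
One simple way to keep an eye on this (the mount point is an example):

    # report free space on the SSD once a minute
    watch -n 60 df -h /mnt/ssd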

It looks like it should be good this time, but if it's not I'll try 
symlinking the Nominatim DB onto a different disk.
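
A tablespace might be a cleaner alternative to a symlink; a rough 
sketch, assuming the database is called nominatim and the RAID array 
is mounted at /mnt/raid:

    # create a directory PostgreSQL can own on the other disk
    mkdir /mnt/raid/pg_nominatim
    chown postgres:postgres /mnt/raid/pg_nominatim
    # register it as a tablespace and move the database onto it
    # (the ALTER needs the database to have no active connections)
    sudo -u postgres psql -c "CREATE TABLESPACE nominatim_space LOCATION '/mnt/raid/pg_nominatim';"
    sudo -u postgres psql -c "ALTER DATABASE nominatim SET TABLESPACE nominatim_space;"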

Stephen

On 2019-08-09 4:30 p.m., Sarah Hoffmann wrote:
> On Thu, Aug 08, 2019 at 08:31:28AM -0300, Stephen D wrote:
>> Hi again,
>>
>> The import unfortunately still ran out of space, but it got farther.
>> I'm missing one index, which I'm regenerating now, and one table
>> (planet_osm_polygon) is still unlogged.
> Interesting. I would have expected that dropping the middle tables
> would give you enough space to do the copies. I trust that you are
> using the latest version of osm2pgsql and running the import with
> --drop?
>
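
(For reference, a --drop import invocation looks roughly like this; 
the database name and paths are examples only:)

    # --drop discards the slim "middle" tables after processing, freeing
    # their space before the final tables are copied and indexed
    osm2pgsql -d gis --slim --drop --flat-nodes /mnt/raid/nodes.bin \
        /mnt/raid/planet.osm.pbf
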
>> I'm regenerating the index now. Afterwards, can I simply run "ALTER TABLE
>> planet_osm_polygon SET LOGGED"? I'm assuming no, since the data won't be
>> sorted geographically that way.
> You could do it. The geographic sorting is really just an optimisation;
> the original tables are otherwise fine. However, my understanding is
> that tables can't simply be converted in place: a 'SET LOGGED'
> internally also results in making a copy of the table, so you would
> likely run out of disk space again.
>
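
(If you do try it, a minimal sketch, assuming the default gis 
database:)

    # SET LOGGED rewrites the whole table through the WAL, so roughly
    # one extra copy of planet_osm_polygon worth of free space is needed
    psql -d gis -c "ALTER TABLE planet_osm_polygon SET LOGGED;"
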
> At this point I would really recommend buying a larger disk. If that
> is not an option, have a look at ZFS with compression enabled. People
> have reported that it plays well with the osm2pgsql import.
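
(For the ZFS route, a minimal sketch, assuming a pool named tank:)

    # lz4 compression is cheap enough to leave on permanently; an 8K
    # recordsize matches the PostgreSQL page size
    zfs create -o compression=lz4 -o recordsize=8k tank/pgdata
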
>
> Sarah
>
>> Thanks,
>>
>> Stephen
>>
>> On 2019-08-05 4:28 a.m., Sarah Hoffmann wrote:
>>> Hi,
>>>
>>> On Sun, Aug 04, 2019 at 07:19:06PM -0400, Kevin Kenny wrote:
>>>> On Sun, Aug 4, 2019 at 6:30 PM Stephen - junkmail <junkmail at scd31.com> wrote:
>>>>> If I understand that right, it means planet_osm_polygon and planet_osm_line are unlogged. That makes sense as they're the ones being corrupted. I am absolutely positive I didn't use the --unlogged option, especially when I reimported after I received your email. Is there anything else that would cause the tables to be unlogged?
>>>> If I recall correctly, the tables are unlogged during the import for
>>>> speed. That's ordinarily not a risk, since osm2pgsql doesn't provide
>>>> any method to restart an import in progress.
>>>>
>>>> But what you're doing is building the indices manually, without
>>>> turning logging on at the end, after osm2pgsql has aborted. That's
>>>> certain to leave logging turned off, since neither you nor osm2pgsql
>>>> has turned it on.
>>> To elaborate on that: the data is initially imported into unlogged
>>> tables and then copied over into normal tables after the import is
>>> finished. This is done so that the final tables are sorted
>>> geographically, which considerably speeds up rendering later. The
>>> obvious disadvantage is that you need twice the disk space when
>>> importing.
>>>
>>> If your import aborted for lack of disk space, then you are not only
>>> missing the indexes; your tables also weren't copied into normal
>>> (logged) tables.
>>>
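
(A quick way to see which tables are affected; relpersistence is 'u' 
for unlogged and 'p' for permanent, and the gis database name is an 
assumption:)

    psql -d gis -c "SELECT relname, relpersistence FROM pg_class
        WHERE relname LIKE 'planet_osm%' AND relkind = 'r';"
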
>>> Try importing with --disable-parallel-indexing. This will force the
>>> tables to be copied one after another instead of in parallel and
>>> should save a bit of space (at the expense of taking a bit longer).
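
(For completeness, the flag is simply added to the normal command 
line, e.g.:)

    osm2pgsql -d gis --slim --drop --disable-parallel-indexing \
        --flat-nodes /mnt/raid/nodes.bin /mnt/raid/planet.osm.pbf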
>>>
>>> Sarah
