[osmosis-dev] Cutting PBF file into 1° tiles

Mon Apr 18 13:01:06 UTC 2016

On 18/04/2016 12:23, Jochen Topf wrote:
> On Mo, Apr 18, 2016 at 11:52:24 +0200, Sylvain Melin wrote:
>> On 18/04/2016 11:00, Jochen Topf wrote:
>>> On Mo, Apr 18, 2016 at 10:10:06 +0200, Sylvain Melin wrote:
>>>> My plan is to:
>>>> - use a planet-sized PBF file
>>>> - cut it into 1° tiles using osmosis
>>>> - filter and extract the data from these tiles as shapefiles using libosmium
>>> If you are writing your own program anyway to create those shapefiles, why
>>> don't you do the splitting in this step *after* creating the geometries and
>>> before writing them into shapefiles? That is probably much easier to do than
>>> based on the PBF due to the structure of the OSM data files.
>>>
>>> Jochen
>> Maybe I'm wrong, but it is because I don't want to parse the full
>> planet.osm.pbf every time I want to extract a small set of data.
>> The processing time seems to grow exponentially with the size of the source file
> The time of what processing exactly? I don't see anything in what you are doing
> that should scale worse than linearly. Of course if you don't have enough memory
> you'll run into problems.
>
>> so having an intermediate level with 1° sized pbf containing everything
>> seems very practical to me.
> In theory yes, but, as you noticed, you'll have to handle all objects specially
> that straddle tile boundaries.
>
>> Also, my osmium program loops over the target tiles and parses the appropriate
>> PBF:
>>
>> for each j in [-90,89]
>> {
>>         for each i in [-180,179]
>>         {
>>                 create osmium::handler
>>                 parse i_j.pbf with osmium::io::Reader
>>                 extract data to single handler with osmium::apply
>>         }
>> }
>>
>> Do you think it would be more efficient to have a single big PBF and extract
>> data to several handlers?
> It will probably be most efficient to just do everything in one go. And only at
> the moment where you are writing out the finished feature into the shapefile,
> decide in which shapefile it should belong. You'll only have one handler, but
> 180*360 output shapefiles.
>
>> Is it even possible without filling the RAM?
> Depends on how much RAM you have. You'll need 32GB RAM for the node location
> store. And you'll need some RAM to buffer the output, because you can't write
> to 180*360 files at the same time efficiently. Maybe fewer files would be
> better? (Also you'll have not only one shape file for each tile, but probably
> dozens for all the different layers of data, which makes this problem worse.)
>
> So if you don't have this kind of memory, you have a problem.
>
> You can also have a look at
> https://github.com/joto/osm-history-splitter
>
> which should be more efficient at splitting a planet into smaller files than
> Osmosis. But people have reported some issues with this software. It is on my
> TODO list to look at this and fix them, but that will take a while.
>
> Jochen
OK, I got it! Unfortunately, I don't have enough RAM for this method.
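
For the record, the "decide at write time" step Jochen describes could be sketched roughly as below. This is only a minimal sketch: `TileIndex`, `tile_for` and `tile_name` are hypothetical names of mine, not part of libosmium, and I am assuming the tile is picked from a single representative coordinate of the finished feature and named `i_j` like my current PBF tiles.

```cpp
#include <cmath>
#include <string>

// Hypothetical helper type: index of the 1-degree tile holding a coordinate.
struct TileIndex {
    int i; // longitude index, expected range [-180, 179]
    int j; // latitude index, expected range [-90, 89]
};

// Map a (lon, lat) pair in degrees to its 1-degree tile.
// std::floor is used so that negative coordinates round towards the
// south-west corner of the tile (e.g. -0.5 falls in tile -1, not 0).
TileIndex tile_for(double lon, double lat) {
    int i = static_cast<int>(std::floor(lon));
    int j = static_cast<int>(std::floor(lat));
    // Assumption: points exactly on lon = 180 or lat = 90 go into the
    // last tile instead of creating an extra row or column.
    if (i > 179) i = 179;
    if (i < -180) i = -180;
    if (j > 89) j = 89;
    if (j < -90) j = -90;
    return {i, j};
}

// Build the "i_j" base name used to select the output file.
std::string tile_name(const TileIndex& t) {
    return std::to_string(t.i) + "_" + std::to_string(t.j);
}
```

With something like this, the single handler would call `tile_name(tile_for(lon, lat))` for each finished feature and append it to the matching output. It does not solve the buffering problem for 180*360 open files.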

I did not think about it before, but given the small amount of data I
need, I wonder if using XAPI to request data per degree isn't the most
obvious way to get it, unless XAPI has the same kind of problem with
the borders.
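
Requesting data per degree would need one bounding box per tile. A minimal sketch of building it, assuming the usual OSM `left,bottom,right,top` order for the bbox value (`tile_bbox` is a hypothetical helper of mine, and the rest of the request URL is deliberately left out):

```cpp
#include <string>

// Hypothetical helper: bounding box of the 1-degree tile whose south-west
// corner is at (i, j), formatted as "left,bottom,right,top".
std::string tile_bbox(int i, int j) {
    return std::to_string(i) + "," + std::to_string(j) + "," +
           std::to_string(i + 1) + "," + std::to_string(j + 1);
}
```

For example, `tile_bbox(2, 48)` gives `2,48,3,49`.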

I'll also take a look at osm-history-splitter.

Thank you very much!

Sylvain