[osmosis-dev] Cutting PBF file into 1° tiles

Sylvain Melin s.melin at alsim.com
Wed Apr 20 13:47:21 UTC 2016


On 18/04/2016 15:01, Sylvain Melin wrote:
> On 18/04/2016 12:23, Jochen Topf wrote:
>> On Mo, Apr 18, 2016 at 11:52:24 +0200, Sylvain Melin wrote:
>>> On 18/04/2016 11:00, Jochen Topf wrote:
>>>> On Mo, Apr 18, 2016 at 10:10:06 +0200, Sylvain Melin wrote:
>>>>> My plan is to:
>>>>> - use a planet-sized PBF file
>>>>> - cut it into 1° tiles using osmosis
>>>>> - filter and extract the data from these tiles as shapefiles using
>>>>>   libosmium
>>>> If you are writing your own program anyway to create those shapefiles,
>>>> why don't you do the splitting in this step *after* creating the
>>>> geometries and before writing them into shapefiles? That is probably
>>>> much easier than splitting the PBF itself, because of the way OSM
>>>> data files are structured.
>>>>
>>>> Jochen
>>> Maybe I'm wrong, but it is because I don't want to parse the full
>>> planet.osm.pbf every time I want to extract a small set of data.
>>> The processing time seems to grow exponentially with the size of the
>>> source file
>> The time of what processing exactly? I don't see anything in what you
>> are doing that should scale worse than linearly. Of course if you don't
>> have enough memory you'll run into problems.
>>
>>> so having an intermediate level of 1° PBF files containing everything
>>> seems very practical to me.
>> In theory yes, but, as you noticed, you'll have to give special handling
>> to all objects that straddle tile boundaries.
>>
>>> Also, my osmium program loops over the target tiles and parses the
>>> appropriate PBF:
>>>
>>> for each j in [-90,89]
>>> {
>>>         for each i in [-180,179]
>>>         {
>>>                 create osmium::handler
>>>                 parse i_j.pbf with osmium::io::Reader
>>>                 extract data to single handler with osmium::apply
>>>         }
>>> }
>>>
>>> Do you think it would be more efficient to have a single big PBF and
>>> extract data to several handlers?
>> It will probably be most efficient to just do everything in one go, and
>> only at the moment where you are writing out the finished feature
>> decide which shapefile it belongs in. You'll only have one handler, but
>> 180*360 output shapefiles.
>>
>>> Is it even possible without filling the RAM?
>> Depends on how much RAM you have. You'll need 32GB RAM for the node
>> location store. And you'll need some RAM to buffer the output, because
>> you can't write to 180*360 files at the same time efficiently. Maybe
>> fewer files would be better? (Also you'll have not only one shapefile
>> for each tile, but probably dozens for all the different layers of
>> data, which makes this problem worse.)
>>
>> So if you don't have this kind of memory, you have a problem.
>>
>> You can also have a look at
>> https://github.com/joto/osm-history-splitter
>>
>> which should be more efficient at splitting a planet into smaller files
>> than Osmosis. But people have reported some issues with this software.
>> It is on my TODO list to look at these and fix them, but that will take
>> a while.
>>
>> Jochen
> OK, I got it! Unfortunately, I don't have enough RAM for this method.
>
> I had not thought about it before, but given the small amount of data I
> need, I wonder whether using XAPI to request data per degree isn't the
> most obvious way to get it, unless XAPI has the same kind of problem
> with the borders.
>
> I'll also take a look at osm-history-splitter.
>
> Thank you very much!
>
> Sylvain
>
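
For reference, the single-pass approach suggested in the quoted reply (one handler, choosing the output file only when a finished feature is written) might look roughly like the Python sketch below. It is not libosmium code: `tile_of`, `tiles_for_feature` and `bucket_features` are hypothetical helpers, features are assumed to be plain lists of (lon, lat) vertices, and a feature is assigned to every tile in which it has a vertex. That last point is a simplification, since a long segment can cross a tile without having a vertex in it.

```python
import math
from collections import defaultdict


def tile_of(lon, lat):
    """Return (i, j), the south-west corner in whole degrees of the
    1-degree tile containing the point (same i_j naming as the tiles)."""
    i = min(math.floor(lon), 179)  # lon == 180 is folded into the last column
    j = min(math.floor(lat), 89)   # lat == 90 is folded into the last row
    return i, j


def tiles_for_feature(coords):
    """Return the set of tiles in which the feature has at least one vertex."""
    return {tile_of(lon, lat) for lon, lat in coords}


def bucket_features(features):
    """Group feature names by tile; a feature that straddles a boundary
    is listed whole in every tile it touches instead of being clipped."""
    buckets = defaultdict(list)
    for name, coords in features:
        for tile in tiles_for_feature(coords):
            buckets[tile].append(name)
    return buckets
```

Each bucket would then be flushed to the shapefile of its tile, which is where the output buffering mentioned in the reply comes in.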

I finally found a proper method to do this.

I wrote a bash script that uses the Overpass API to request and filter the
data, and converts the resulting osm.xml file to a shapefile with my osmium
program.
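
As an illustration of the kind of request such a script can send for one 1° tile, here is a minimal sketch, written in Python rather than bash. The `["highway"]` filter, the function names and the choice of the public overpass-api.de instance are assumptions made for the example, not necessarily what the script described above uses. Overpass QL takes the bounding box in (south, west, north, east) order.

```python
from urllib.parse import urlencode

# Public Overpass instance; assumed here for illustration.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def tile_query(i, j, tag_filter='["highway"]'):
    """Build an Overpass QL query for the 1-degree tile whose south-west
    corner is (lon=i, lat=j). tag_filter is a hypothetical example filter."""
    south, west, north, east = j, i, j + 1, i + 1
    bbox = f"{south},{west},{north},{east}"
    return (
        "[out:xml][timeout:180];"
        f"way{tag_filter}({bbox});"  # ways matching the filter in the tile
        "(._;>;);"                   # plus all nodes those ways refer to
        "out body;"
    )


def tile_request_url(i, j, tag_filter='["highway"]'):
    """Return the GET URL carrying the query in the 'data' parameter."""
    return OVERPASS_URL + "?" + urlencode({"data": tile_query(i, j, tag_filter)})
```

The `(._;>;);` step asks for every node of each matched way, so a way that crosses the tile edge comes back whole. The response can be saved as an .osm file and passed to the converter.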

The Overpass API does not clip data at the edges of the bounding box.
Also, I only keep the data I need on my hard drive, and I'm sure it's up
to date.

Thank you for your help.
I hope it will help people facing the same issue.

Regards,
Sylvain




