[OSM-talk] First drop in planet size ?

Sun Mar 14 12:39:55 GMT 2010

Pray tell, how do you process the compressed bzip2 data?
Is there documentation on this?
Thanks,

On Sat, Mar 13, 2010 at 9:51 PM, Nic Roets <nroets at gmail.com> wrote:

> No. It runs on the uncompressed planet, like this:
> bzcat /osm/planet-10*.osm.bz2 |   /osm/gosmore/bboxSplit \
>   -85.05113   73.12500    9.44906  180.00000 gzip 0720048510241024.osm.gz \
>   -25.48295  120.58594   72.91964  180.00000 gzip 0855020310240587.osm.gz \
>   -85.05113   98.43750   13.23995  172.61719 gzip 0792047410031024.osm.gz \
> ...
>
> I'm not too worried about further optimizations: unlike with Wikipedia,
> there isn't the same urgency to have up-to-date data, except for disaster
> relief.
>
>
> On Sat, Mar 13, 2010 at 10:42 PM, jamesmikedupont at googlemail.com
> <jamesmikedupont at googlemail.com> wrote:
> > Are you bunzipping the code? Are you scanning the bzip blocks?
> > You say it is faster than bunzip, but maybe you just mean that it is very fast.
> >
> > I have experimented with bzip2recover to extract blocks on their own.
> > I made a Perl script to extract blocks from a Wikipedia file, which can be
> > used to split the processing of the huge file among many people in parallel.
> >
> > https://code.launchpad.net/~jamesmikedupont/+junk/openstreetmap-wikipedia
> >
> > It is a tool to extract lat/long coords from the Wikipedia articles.
> >
> > Such processing of the large files would allow us to team up and all help.
> > We really need to just have an index file of all the blocks so that we can
> > find the ones that we need. Imagine being able to process the bzip file
> > directly!
> >
> > mike
> >
> > On Sat, Mar 13, 2010 at 9:31 PM, Nic Roets <nroets at gmail.com> wrote:
> >>
> >> Hello James,
> >>
> >> I wanted to split the planet into overlapping bboxes like this (click
> >> to see actual size):
> >> http://dev.openstreetmap.de/gosmore/
> >>
> >> On talk I described how I was dissatisfied with osmosis's memory
> >> consumption. So I came up with this observation: Most entities will
> >> end up in one or two extracts. And when it's two, it's in a pattern
> >> that is often repeated, say Africa bbox and Middle East bbox. Never
> >> Africa and Canada. So of the 2^168 possible combinations only around
> >> 3000 are actually used.
> >>
> >> So bboxSplit allocates 16 bits for each entity. Those are then indexes
> >> into the array of 'younions'. If a new node comes along, I check it
> >> against the list of bboxes and it typically matches 1 or 2. So to find out
> >> quickly if I already have that combination of bboxes, I also have an
> >> STL map on the array of younions. A hashtable would have been faster.
> >>
> >> Ways and relations also trigger the code that merges younions.
> >>
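The bookkeeping Nic describes above can be sketched roughly as follows. This is only an illustration, not the bboxSplit source: the class name, method names and the three bounding boxes are made up, and a Python dict (a hashtable) stands in for the STL map he mentions. Each distinct combination of bboxes gets a small id, and each entity stores only that id in 16 bits.

```python
from array import array


class YounionTable:
    """Maps each distinct combination of bboxes to a small integer id."""

    def __init__(self, bboxes):
        self.bboxes = bboxes   # list of (min_lat, min_lon, max_lat, max_lon)
        self.younions = []     # id -> frozenset of bbox indexes
        self.index = {}        # frozenset of bbox indexes -> id

    def intern(self, combo):
        """Return the id of a combination, adding it if it is new."""
        combo = frozenset(combo)
        if combo not in self.index:
            if len(self.younions) >= 1 << 16:
                raise OverflowError("more combinations than 16 bits can index")
            self.index[combo] = len(self.younions)
            self.younions.append(combo)
        return self.index[combo]

    def for_node(self, lat, lon):
        """Test a node against every bbox and intern the matching set."""
        return self.intern(
            i for i, (s, w, n, e) in enumerate(self.bboxes)
            if s <= lat <= n and w <= lon <= e
        )

    def merge(self, a, b):
        """Id of the union of two combinations, e.g. for a way's nodes."""
        return self.intern(self.younions[a] | self.younions[b])


# Made-up boxes for illustration only: "Africa", "Middle East", "Canada".
table = YounionTable([(-35, -20, 38, 52), (10, 25, 45, 65), (40, -140, 85, -50)])

node_ids = array("H")                            # unsigned short: 16 bits per entity
node_ids.append(table.for_node(30.0, 31.2))     # Cairo: Africa + Middle East
node_ids.append(table.for_node(6.5, 3.4))       # Lagos: Africa only
node_ids.append(table.for_node(30.1, 31.3))     # reuses the first combination
way_id = table.merge(node_ids[0], node_ids[1])  # a way touching both nodes
```

Because only a few thousand combinations occur in practice, the table stays tiny while the per-entity cost is fixed at two bytes, which is presumably where the memory saving over a full per-entity bbox list comes from.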
> >> bboxSplit is faster than the corresponding bunzip and any program that
> >> uses libxml, i.e. very fast.
> >>
> >> Regards,
> >> Nic
> >>
> >> On Sat, Mar 13, 2010 at 10:03 PM, jamesmikedupont at googlemail.com
> >> <jamesmikedupont at googlemail.com> wrote:
> >> > That is very deep C++ code!
> >> > Care to comment on how it works?
> >> > I would be very interested to understand its performance! It looks very
> >> > fast.
> >> > mike
> >> >
> >> > On Sat, Mar 13, 2010 at 7:06 PM, Nic Roets <nroets at gmail.com> wrote:
> >> >>
> >> >> My understanding is that all XML-compliant* parsers will abort at the
> >> >> file offsets that Frederik mentions.
> >> >> My advice is to use the egrep filter when in doubt, because you will
> >> >> lose no more than a dozen lines in a planet file of billions of
> >> >> lines.
> >> >>
> >> >> *: (My split program is not compliant and will happily ignore these
> >> >> errors:
> >> >> http://trac.openstreetmap.org/browser/applications/rendering/gosmore/bboxSplit.cpp
> >> >> )
> >> >>
> >> >> On Sat, Mar 13, 2010 at 7:44 PM, John Mitchell <mitchelljj98 at gmail.com>
> >> >> wrote:
> >> >> > Will this also be a problem if you try to import via osm2pgsql into
> >> >> > postgres?
> >> >> >
> >> >> > Thanks,
> >> >> >
> >> >> > John
> >> >> >
> >> >> > On 3/13/10, hbogner <hbogner at gmail.com> wrote:
> >> >> >> Thx for help, I'll try it.
> >> >> >>
> >> >> >> Now I have to follow 'dev' too :D
> >> >> >>
> >> >> >> Nic Roets wrote:
> >> >> >>> There's a bug in the code that generated this week's planet. You
> >> >> >>> should either wait until next week or filter the planet with the
> >> >> >>> following command:
> >> >> >>> bzcat /osm/planet-10*.osm.bz2 |egrep -v '&#[0-9]*;'|...
> >> >> >>>
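For anyone who would rather not depend on bzcat and egrep, the same filter can be approximated in Python. This is a sketch under assumptions: the function names and file paths are hypothetical, and the planet is assumed to be UTF-8. It streams the .bz2 file with the standard bz2 module and drops every line matching the pattern from the command above.

```python
import bz2
import re

# Same pattern as the egrep command: "&#", any run of digits, then ";".
CHAR_REF = re.compile(r"&#[0-9]*;")


def drop_char_refs(lines):
    """Yield only the lines that contain no numeric character reference."""
    for line in lines:
        if not CHAR_REF.search(line):
            yield line


def filter_planet(src_path, dst_path):
    """Stream a .bz2 planet through the filter without unpacking it to disk."""
    with bz2.open(src_path, "rt", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(drop_char_refs(src))
```

A call such as filter_planet("planet.osm.bz2", "planet-filtered.osm") would write the cleaned, uncompressed XML. Like the egrep pattern, this only catches decimal references, and named entities such as &amp; pass through untouched.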
> >> >> >>> There has been a long discussion on 'dev', mentioning other
> >> >> >>> remedies.
> >> >> >>>
> >> >> >>
> >> >> >>
> >> >> >> _______________________________________________
> >> >> >> talk mailing list
> >> >> >> talk at openstreetmap.org
> >> >> >> http://lists.openstreetmap.org/listinfo/talk
> >> >> >>
> >> >> >
> >> >> >
> >> >> > --
> >> >> > John J. Mitchell
> >> >> >
> >> >> >
> >> >>
> >> >
> >> >
> >
> >
>