[OSM-dev] Timestamp in PBF files

Jochen Topf jochen at remote.org
Thu Nov 22 16:19:11 GMT 2012


On Wed, Nov 21, 2012 at 07:00:32PM +0100, Frederik Ramm wrote:
> On 11/21/12 18:46, Jochen Topf wrote:
> >On Tue, Nov 20, 2012 at 09:17:59PM -0600, Scott Crosby wrote:
> >>How many nodes in the planet lack a latitude or longitude? Using a MAXINT
> >>encoding will cost about 8 bytes for each missing latitude or longitude.
> >> It's possible to reduce this to 2-3 bytes, but the format gets
> >>uglier/hackier. IMHO, probably not worth that cost.
> >
> >I just counted those cases. In the history dump from October 2012 there are
> >2344 nodes without coordinates. Hardly worth thinking about...
> 
> That sounds implausibly low.
> 
> Given that
> 
> 1. every deleted node should be in that file without coordinates
> 2. we're currently at node id 2.03 billion,
> 3. there are 1.66 billion visible nodes in the database
> 
> we should have something like 370 million deleted nodes.
> 
> Hm, we probably have to remove from that number those nodes that
> were deleted in ancient times where we've meanwhile dropped the
> history, and maybe some from the first TIGER import where we
> manually removed them from the database, but still - at least every
> node deleted in the past couple of years *should* show up with
> visible=false in the full history dump, and any node with
> visible=false *should* not have coordinates.
> 
> Either there's an error in my thinking, or in your count, or in the
> script that does the history export ;)
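
For reference, the estimate quoted above can be written out as a small
sketch. It assumes that node ids are handed out sequentially without gaps,
so the highest id approximates the number of nodes ever created; the
variable names are purely illustrative.

```python
# Rough upper bound on deleted nodes, using the figures quoted above.
# Assumption: node ids are assigned sequentially without gaps, so the
# current maximum id approximates the total number of nodes ever created.
MAX_NODE_ID = 2_030_000_000      # "node id 2.03 billion"
VISIBLE_NODES = 1_660_000_000    # "1.66 billion visible nodes"
COUNTED_WITHOUT_COORDS = 2_344   # count from the October 2012 history dump

deleted_upper_bound = MAX_NODE_ID - VISIBLE_NODES

print(f"expected deleted nodes (upper bound): {deleted_upper_bound:,}")
print(f"nodes counted without coordinates:    {COUNTED_WITHOUT_COORDS:,}")
```

The bound overestimates, because nodes whose history has been dropped
would not appear in the dump at all.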

I checked this in more detail. The cases I found are all years old (the
last one is from May 2008). Apparently the OSM server did not check coordinates
for validity back then. So all these nodes were in the database, and their lat
and/or lon happened to have the MAXINT value I use to signify undefined
coordinates. Of course they should never have had those values, but they did.
So these cases are not nodes with redacted coordinates.
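
To illustrate the collision, here is a minimal sketch. It assumes that
coordinates are held as 32-bit signed integers in units of 1e-7 degrees and
that the largest int32 value is the "undefined" marker; the names and the
exact sentinel are assumptions for illustration, not taken from any
particular tool.

```python
# Assumption: fixed-point coordinates in units of 1e-7 degrees, with the
# maximum 32-bit signed integer used as the "undefined" sentinel.
COORD_SCALE = 10_000_000
UNDEFINED = 2**31 - 1  # hypothetical MAXINT sentinel


def to_fixed(degrees):
    """Convert degrees to the fixed-point integer representation."""
    return round(degrees * COORD_SCALE)


def is_undefined(lat, lon):
    """True if either coordinate carries the sentinel value."""
    return lat == UNDEFINED or lon == UNDEFINED


def is_valid(lat, lon):
    """True if both fixed-point coordinates lie in the legal range."""
    return abs(lat) <= 90 * COORD_SCALE and abs(lon) <= 180 * COORD_SCALE


def count_undefined(nodes):
    """Count (lat, lon) pairs that a sentinel check reports as undefined."""
    return sum(1 for lat, lon in nodes if is_undefined(lat, lon))


nodes = [
    (to_fixed(49.0), to_fixed(8.4)),  # ordinary node
    (UNDEFINED, to_fixed(8.4)),       # bogus stored lat that equals the sentinel
    (to_fixed(49.0), UNDEFINED),      # bogus stored lon that equals the sentinel
]
print(count_undefined(nodes))
```

Under these assumptions the sentinel lies outside the legal range for both
lat and lon, so it can only be confused with real data if the stored values
were never validated.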

I don't know why there are no redacted nodes. Matt mentioned that he hasn't
implemented that yet. But that would mean we have non-ODbL-clean data in
the full history dump. Frankly, this is all getting a bit too confusing for me.
I hope the people who implemented these things will at some point document
them and/or fix those cases.

Jochen
-- 
Jochen Topf  jochen at remote.org  http://www.remote.org/jochen/  +49-721-388298


