[OSM-talk] The long tail

Thu Jul 6 14:32:44 BST 2006

On Thu, Jul 06, 2006 at 12:15:33PM +0100, SteveC wrote:
> The difference is that the history data, the bits that say 'node foo was
> here and then moved to there' aren't included. I've been wrong in the
> past, but isn't the entire point that the newer data is better than the
> old? To me it looks like instead of distributing all the source code to
> openstreetmap like we do, people are insisting I produce the database
> behind subversion with all the changes for the past two years.
> 
> Once you accept that privacy is important (which I'm sure some people
> won't) and that we don't have a policy... Then you end up with something
> that is pretty much like planet.osm but more work to produce. But I'll
> still do it, it'll be cool for the animations of progress if nothing
> else!

The reason this database is needed is not social, as with planet.osm,
but technical: too many times, someone in the OSM project has had to
invent data and then run benchmarks and statistics against something
completely unlike the actual OSM database, which is not useful to the
project. Having the real database available allows for more precise
statistical analysis, which in turn lets more people help with speeding
up the database.

I'm not understanding how it's more work to produce, though -- you've
got no monster SQL queries. I don't know the exact format of the data,
but it seems that something like this would work:

mysqldump > file
mysql < file (on slave)
mysql:
  * drop table gps_points
  * drop table gpx_file_tags
  * drop table gpx_files
  * drop table gpx_pending_files
  * drop table users
  * for i in ['areas','meta_nodes','meta_ways','meta_areas',
              'meta_segments', 'nodes', 'segments', 'way_segments',
              'way_tags', 'ways']:
       update i set user_id=0;
       update i set timestamp=0;
       (or alter table i drop column user_id, alter table i drop column
         timestamp)
Then re-dump it. Dropping tables should be practically instantaneous,
you have no need to create XML, and you have no need for the current
monster SQL queries -- so if someone wants a more regular planet.osm,
they could build it themselves from this dataset. (Of course, you plan
to make it more current anyway.)
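
To make the scrub step concrete, here is a rough Python sketch that only
generates the SQL for it. The function name scrub_statements and the
drop_columns switch are made up for illustration, and the table and column
names are copied from my list above; I'm assuming every listed table really
has user_id and timestamp columns, which I haven't checked against the live
schema.

```python
# Hypothetical helper: builds the SQL for the scrub step sketched above.
# Table names come from the list in this message; that each history table
# has both user_id and timestamp columns is an assumption.

PRIVATE_TABLES = [
    "gps_points", "gpx_file_tags", "gpx_files", "gpx_pending_files", "users",
]
HISTORY_TABLES = [
    "areas", "meta_nodes", "meta_ways", "meta_areas", "meta_segments",
    "nodes", "segments", "way_segments", "way_tags", "ways",
]


def scrub_statements(drop_columns=False):
    """Return the SQL statements that strip private data from a copy."""
    # Tables holding only personal data are dropped outright.
    stmts = [f"DROP TABLE {table};" for table in PRIVATE_TABLES]
    for table in HISTORY_TABLES:
        if drop_columns:
            # Alternative: remove the identifying columns entirely.
            stmts.append(
                f"ALTER TABLE {table} "
                "DROP COLUMN user_id, DROP COLUMN `timestamp`;"
            )
        else:
            # Default: keep the columns but blank out their contents.
            stmts.append(
                f"UPDATE {table} SET user_id = 0, `timestamp` = 0;"
            )
    return stmts


if __name__ == "__main__":
    print("\n".join(scrub_statements()))
```

The output could be piped into mysql on the slave copy before re-dumping.
It never touches a database itself, so it is safe to run and inspect first.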

> In any case, hopefully after Saturday we'll have daily planet.osm dumps
> (with that naming convention someone specced out) which I should have
> worked harder to produce in the past. Then, we can integrate the cool
> openlayers stuff crschmidt's done in to the front page and get away from
> the flaky tiles we're serving, if NickW doesn't mind too much that we'd
> be dropping his work.

I think that would be beneficial to the project, but I'm biased ;) 

> I think if we get it right, most people will be happy to allow their
> data to be used under the new terms. And besides I don't know the
> statistics but I think the key contributors data wise are actually
> pretty small in number and on this list.

I'm excited about this, as I think it opens up a number of possibilities
for the project and users of the project that did not previously exist.

-- 
Christopher Schmidt
Web Developer