[OSM-dev] getnodes query optimization

Sat Jun 3 00:08:40 BST 2006

On Sat, Jun 03, 2006 at 12:23:38AM +0200, Lars Aronsson wrote:
> Christopher Schmidt wrote:
> 
> > (The 'order by nodes.timestamp' probably makes a difference, but most
> > apps should hopefully not depend on the order of nodes.)
> Storing current versions together with history is a mistake in 
> itself.  History should be kept away in a separate ("archive") 
> table, to speed up everyday operations on current data.  But now 
> you were asking about the retrieval query, not the table 
> structure.

However, it's possible to create the equivalent of this with slightly
less database upheaval: simply add a flag to each row that marks whether
it is the current version. SteveC is under the impression that this will
slow things down unnecessarily; I feel that the 4x improvement it offers
in selects would be large enough to make it a worthwhile change.
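A minimal sketch of the flag idea follows. It assumes a simplified, hypothetical nodes table (columns id, lat, lon, timestamp, current), which is not the real OSM schema. It uses Python's built-in sqlite3 only so it runs standalone, not MySQL. The extra cost on write is one UPDATE to clear the old flag. The payoff on read is a bounding-box select that filters on the flag, instead of checking every candidate row against its node's newest timestamp. No performance numbers are implied here.

```python
import sqlite3

# Hypothetical schema: every edit appends a row; "current" = 1 marks the latest version.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE nodes (
    id        INTEGER NOT NULL,
    lat       REAL    NOT NULL,
    lon       REAL    NOT NULL,
    timestamp INTEGER NOT NULL,
    current   INTEGER NOT NULL DEFAULT 1
)""")
conn.execute("CREATE INDEX nodes_current_latlon ON nodes (current, lat, lon)")


def save_node(conn, node_id, lat, lon, ts):
    """Append a new version of a node and clear the flag on the previous one."""
    with conn:  # both statements commit (or roll back) together
        conn.execute("UPDATE nodes SET current = 0 WHERE id = ? AND current = 1",
                     (node_id,))
        conn.execute("INSERT INTO nodes (id, lat, lon, timestamp, current) "
                     "VALUES (?, ?, ?, ?, 1)", (node_id, lat, lon, ts))


def nodes_in_bbox_flag(conn, s, w, n, e):
    """Read path with the flag: a plain filter on current = 1."""
    return conn.execute(
        "SELECT id, lat, lon FROM nodes WHERE current = 1 "
        "AND lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? ORDER BY id",
        (s, n, w, e)).fetchall()


def nodes_in_bbox_history(conn, s, w, n, e):
    """Read path without the flag: each row is compared with its node's newest timestamp."""
    return conn.execute(
        "SELECT a.id, a.lat, a.lon FROM nodes AS a "
        "WHERE a.timestamp = (SELECT MAX(b.timestamp) FROM nodes AS b WHERE b.id = a.id) "
        "AND a.lat BETWEEN ? AND ? AND a.lon BETWEEN ? AND ? ORDER BY a.id",
        (s, n, w, e)).fetchall()


# Small demo: node 1 moves within the box, node 2 moves out of it.
save_node(conn, 1, 51.50, -0.12, 100)
save_node(conn, 1, 51.51, -0.13, 200)
save_node(conn, 2, 52.00, 0.50, 150)
save_node(conn, 2, 60.00, 5.00, 250)
print(nodes_in_bbox_flag(conn, 51.0, -1.0, 53.0, 1.0))
```

Both read paths return the same rows. The flag version simply avoids the per-row MAX(timestamp) lookup, and the (current, lat, lon) index lets the database skip superseded rows entirely.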

I'll have harder numbers later, but I think that there may be
significant benefits to be realized *without* gigantic changes in the
way things are done: Minor modifications can offer a large benefit
without horribly impacting the existing code/infrastructure. 

> Then again, if we were to reorganize the database, it probably 
> would make sense to switch from line segments to polylines and 
> PostGIS.  So far, the reason against is that PostGIS doesn't 
> natively handle editing history, because it was designed for 
> traditional GIS, not for a map wiki.  If you are going to propose 
> this change and be successful, you have to come from inside the 
> wiki world.  GIS experience is not enough.  People won't listen.

MySQL doesn't handle editing history any more than Postgres or any other
database does. You write it into your application, and you could do this
with Postgres just as easily as with MySQL. I'm not arguing that OSM
should necessarily move towards a geometric database -- I think that
fighting that battle would be fighting something I've already lost ;) --
but I'm just making clear that using a database with geometric
extensions or built-ins is no more or less able to preserve old data
(unless I'm severely misunderstanding something).
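The point that history is application logic can be sketched concretely. This is a minimal example, assuming the separate archive-table layout Lars describes, with hypothetical table and function names. It uses only ordinary SELECT, INSERT and DELETE statements, which MySQL and PostgreSQL accept just as well. sqlite3 is used here only so the example is self-contained.

```python
import sqlite3

# Hypothetical layout: "nodes" holds only current rows; "nodes_history" holds superseded ones.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id INTEGER PRIMARY KEY, lat REAL NOT NULL, lon REAL NOT NULL, timestamp INTEGER NOT NULL
);
CREATE TABLE nodes_history (
    id INTEGER NOT NULL, lat REAL NOT NULL, lon REAL NOT NULL, timestamp INTEGER NOT NULL
);
""")


def save_node(conn, node_id, lat, lon, ts):
    """Archive the previous version (if any), then write the new current row."""
    with conn:  # archive + replace happen in one transaction
        conn.execute("INSERT INTO nodes_history (id, lat, lon, timestamp) "
                     "SELECT id, lat, lon, timestamp FROM nodes WHERE id = ?", (node_id,))
        conn.execute("DELETE FROM nodes WHERE id = ?", (node_id,))
        conn.execute("INSERT INTO nodes (id, lat, lon, timestamp) VALUES (?, ?, ?, ?)",
                     (node_id, lat, lon, ts))


save_node(conn, 1, 51.50, -0.12, 100)
save_node(conn, 1, 51.51, -0.13, 200)  # the first version moves to nodes_history
print(conn.execute("SELECT lat, lon, timestamp FROM nodes WHERE id = 1").fetchall())
print(conn.execute("SELECT lat, lon, timestamp FROM nodes_history WHERE id = 1").fetchall())
```

Nothing here depends on geometric types: swapping lat/lon for a PostGIS geometry column would change the column definitions, not the versioning logic.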

> While I'm dissenting the current technical platform, I'm focusing 
> on gathering more line segments for Sweden.  Without the data, it 
> doesn't matter what data structures or algorithms we use.  With 
> the data, we can change the structure and algorithms later.

I can't help with gathering line segments. I have a full dataset which
is of roughly 20x higher quality than TIGER and is released under an
Attribution license, but given the request not to load the database more
than is necessary, I'm not going to load it. (The data, for the record,
is the MassGIS department of transportation roads layer -- but like
TIGER, it's an accurate, high-density data source which doesn't help
the OSM project all that much, insofar as it's already free.) Until I
load this data in, gathering data is kind of silly: there's no need to
draw streets I'm likely to just re-import later.

-- 
Christopher Schmidt
Web Developer