[OSM-dev] Suggestion to replace created_by tags

Richard Fairhurst richard at systemeD.net
Sat Apr 28 13:37:19 BST 2007

I'm about to make a post about the data model. Please shoot me now.

At present, some editors tag everything they do with the created_by  
tag. So if you draw and upload a way using JOSM, the way will be  
tagged as created_by=JOSM, as will its constituent segments, as will  
its nodes.

There are a handful of issues with this.

1. Not all clients do it, and because it's a tag, it's not enforced.  
So you might have something tagged 'created_by=JOSM' which has  
subsequently been modified by another editor or a script. This  
defeats the point of the tag...

2. ...and means there's no effective versioning. You can't strip out  
edits by a dodgy client if the data is still tagged as  

3. It makes "untagged" data appear tagged. It would be really handy  
to be able to do SELECT * FROM current_nodes WHERE (latitude BETWEEN  
a AND b) AND (longitude BETWEEN c AND d) AND tags IS NOT NULL - in  
other words, find all the POIs within a bounding box in one easy  
query. created_by prevents this.

4. It's an extra burden on the database.
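
To make point 3 concrete, here is a minimal sketch using Python's built-in sqlite3 module. The four-column current_nodes table and the "key=value" text format for tags are assumptions made for illustration, not the real schema. The query is the one quoted above.

```python
import sqlite3


def poi_ids(rows, bbox):
    """Return ids of nodes inside bbox whose tags column is not NULL."""
    conn = sqlite3.connect(":memory:")
    # Assumed, simplified layout: the real current_nodes table may differ.
    conn.execute(
        "CREATE TABLE current_nodes ("
        "id INTEGER PRIMARY KEY, latitude REAL, longitude REAL, tags TEXT)"
    )
    conn.executemany("INSERT INTO current_nodes VALUES (?, ?, ?, ?)", rows)
    a, b, c, d = bbox
    found = conn.execute(
        "SELECT id FROM current_nodes "
        "WHERE (latitude BETWEEN ? AND ?) AND (longitude BETWEEN ? AND ?) "
        "AND tags IS NOT NULL ORDER BY id",
        (a, b, c, d),
    ).fetchall()
    conn.close()
    return [row[0] for row in found]


BBOX = (51.0, 52.0, -1.0, 0.0)

# Node 2 is a plain way node: its only "tag" is the editor's created_by.
with_created_by = [
    (1, 51.5, -0.5, "amenity=pub"),
    (2, 51.6, -0.4, "created_by=JOSM"),
    (3, 51.7, -0.3, None),
]
# The same data if created_by were never written into the tags.
without_created_by = [
    (1, 51.5, -0.5, "amenity=pub"),
    (2, 51.6, -0.4, None),
    (3, 51.7, -0.3, None),
]

print(poi_ids(with_created_by, BBOX))     # [1, 2]
print(poi_ids(without_created_by, BBOX))  # [1]
```

With created_by present, node 2 is returned as if it were a POI. Without it, only the pub comes back.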

I'd suggest that we get rid of the created_by tag and, instead,  
introduce a new 'client id' into the XML message body spoken by the API.

This would be a unique identifier for the client (JOSM, coastline  
script, Potlatch, whatever) making this particular edit. For  
efficiency, I'd suggest it could be numeric (with a dictionary on the  
wiki), and could potentially also include a version number. So 2.80  
might indicate the node was created/modified by version 80 of the  
coastline script. That way, if a bug appears in version 81 (and that  
alone) which generates corrupt data, that data can easily be removed.
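
As a rough sketch of how a "client.version" identifier such as 2.80 might be decoded: the CLIENTS dictionary below stands in for the proposed wiki dictionary. Only the mapping of 2 to the coastline script follows from the example above; the other numbers, and the names parse_client_id and describe, are made up for illustration.

```python
from typing import NamedTuple, Optional

# Hypothetical stand-in for the dictionary kept on the wiki.
# Only 2 = coastline script is implied by the example; 1 and 3 are invented.
CLIENTS = {1: "JOSM", 2: "coastline script", 3: "Potlatch"}


class ClientId(NamedTuple):
    client: int
    version: Optional[int]  # None when no version part was sent


def parse_client_id(text: str) -> ClientId:
    """Split an identifier like '2.80' into client number and version."""
    client, sep, version = text.partition(".")
    return ClientId(int(client), int(version) if sep else None)


def describe(cid: ClientId) -> str:
    """Render a parsed identifier using the client dictionary."""
    name = CLIENTS.get(cid.client, "unknown client")
    if cid.version is None:
        return name
    return f"{name} version {cid.version}"


print(describe(parse_client_id("2.80")))  # coastline script version 80
print(describe(parse_client_id("3")))     # Potlatch
```

One design point this brings out: the identifier has to be handled as text, or as two separate integers, rather than as a decimal number. As a float, 2.80 and 2.8 are the same value, so versions 80 and 8 would be indistinguishable.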

On the database, this would then be stored in a new column in the  
nodes, segments and ways tables. (There'd be no need to include it in  
current_nodes, current_segments and current_ways, which are the most  
frequently read.) Because a new row is created for each edit, we  
would then have full versioning.
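
A minimal sketch of what the extra column buys us, again with sqlite3. The column names (client_id, client_version, timestamp) and the cut-down nodes table are assumptions for illustration. Storing client and version as two integer columns is one possible layout, not necessarily the right one.

```python
import sqlite3


def make_history(rows):
    """Build an in-memory nodes history table with one row per edit."""
    conn = sqlite3.connect(":memory:")
    # Assumed, simplified layout: the real nodes table has more columns.
    conn.execute(
        "CREATE TABLE nodes ("
        "id INTEGER, timestamp TEXT, latitude REAL, longitude REAL, "
        "client_id INTEGER, client_version INTEGER)"
    )
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def edits_by(conn, client_id, client_version):
    """List (node id, timestamp) for every edit made by one client version."""
    return conn.execute(
        "SELECT id, timestamp FROM nodes "
        "WHERE client_id = ? AND client_version = ? "
        "ORDER BY id, timestamp",
        (client_id, client_version),
    ).fetchall()


history = make_history([
    (10, "2007-04-01 10:00:00", 50.10, -5.00, 2, 80),
    (10, "2007-04-20 09:00:00", 50.90, -5.00, 2, 81),  # buggy version 81
    (11, "2007-04-20 09:00:05", 50.20, -5.10, 2, 81),  # buggy version 81
    (12, "2007-04-21 12:00:00", 51.50, -0.10, 1, 5),
])

print(edits_by(history, 2, 81))
# [(10, '2007-04-20 09:00:00'), (11, '2007-04-20 09:00:05')]
```

The sketch only finds the suspect edits. Deciding how to revert them, especially where a later edit sits on top, is a separate question.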

This should be a pretty easy change to implement, perhaps as part of  
the 0.4 API, and could potentially save us data problems in the future.

