[OSM-talk] Downloading Version 3 of all bus stops in a country

Safwat Halaby swiftfast at gmx.com
Wed Sep 27 12:35:47 UTC 2017

I think my last two replies never got through and were sent privately
instead. Here's a rephrasing (which is possibly better anyway).

On Wed, 2017-09-27 at 10:53 +0200, Marc Gemis wrote:
> isn't it possible that the 2017 contains data from e.g. 2014, which
> has not been reviewed in the meantime ?
> Which would then mean that the 2015 edit is more recent.
> Or is there a review date in the "database" ?

There is no internal review date, and what you describe could happen.
This is the only drawback I can think of. The first run would be
imperfect in that regard, but the government publishes new GTFS dumps
daily, so after the first run this would no longer be an issue: the
"review date" would be "yesterday", or at worst "sometime in the last
few weeks".
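GTFS `stops.txt` has no per-stop modification date, so one way to get
the "review date" described above is to diff consecutive daily dumps
and remember the first dump in which each stop's current values
appeared. This is a minimal sketch, not the actual script: it assumes
each dump has already been parsed into a dict keyed by `stop_id`, and
`update_change_dates` is a hypothetical helper name.

```python
from datetime import date
from typing import Dict, Optional, Tuple

# Hypothetical per-stop record taken from a parsed GTFS dump:
# (stop_lat, stop_lon, stop_name)
StopValues = Tuple[float, float, str]


def update_change_dates(
    previous: Optional[Dict[str, StopValues]],
    current: Dict[str, StopValues],
    dump_date: date,
    change_dates: Dict[str, date],
) -> Dict[str, date]:
    """Return, for each stop in `current`, the date it last changed.

    On the first run there is no previous dump, so every stop gets the
    dump date. That is the imperfect case: the stop may in fact have
    been unchanged (and unreviewed) for years.
    """
    result: Dict[str, date] = {}
    for stop_id, values in current.items():
        unchanged = previous is not None and previous.get(stop_id) == values
        if unchanged and stop_id in change_dates:
            # Same values as the previous dump: keep the older change date.
            result[stop_id] = change_dates[stop_id]
        else:
            # New stop, edited stop, or first run: it changed in this dump.
            result[stop_id] = dump_date
    return result
```

After the first run, a stop's change date only moves forward when the
government actually alters its values, which is what makes the daily
dumps usable as a rolling review date.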

This is still better than any manual work that is practically possible,
considering our mapper density and the number of stops. In 5 years no
one has been able to manually review or edit all those stops, and they
are horribly stale.

Note that:

1. extra tags (shelter, wheelchair) will be preserved despite the
2. the new database is considered to be of pretty good quality, so
issues will likely be negligible or nonexistent.
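Point 1 can be handled by overwriting only the keys the import owns and
leaving every other key on the OSM node untouched. A minimal sketch,
assuming tags are plain string dicts and that the import owns exactly
the keys in the hypothetical `MANAGED_KEYS` set (the real script's key
list may well differ):

```python
from typing import Dict

# Hypothetical: the keys this sketch treats as owned by the GTFS import.
MANAGED_KEYS = {"name", "ref"}


def merge_tags(osm_tags: Dict[str, str], gtfs_tags: Dict[str, str]) -> Dict[str, str]:
    """Apply GTFS values for managed keys; keep every other existing key."""
    merged = dict(osm_tags)  # start from everything the mappers added
    for key, value in gtfs_tags.items():
        if key in MANAGED_KEYS:
            merged[key] = value
    return merged
```

Keys such as `shelter` or `wheelchair` are never in `MANAGED_KEYS`, so
they survive every run regardless of what the GTFS contains.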


On Wed, 2017-09-27 at 11:07 +0200, Jo wrote:
> If there is a conflict regarding position or tags, they should be
> resolved by a human mapper. If I were to apply the newer is better
> approach, we would constantly be reverting back to the positions the
> operators think their stops are at.
> It's important to respect the mappers work, because without mappers
> quickly dies off.
> It might be complex to put such a solution in place, but it should be
> obvious that if data flows in 2 directions, too much simplification
> doesn't work.

The script will not "constantly revert".

1. Government publishes a database with bad edit X; the script adds it.
2. A user fixes it with fix Y.
3. Government publishes a database that still contains bad edit X.

The script ignores step 3, because "newer is better" and the user's
edit is newer. No constant reverts are involved.
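The "newer is better" rule reduces to comparing two timestamps per
stop: when the government last changed the stop, and when the OSM node
was last edited. A minimal sketch, assuming both are available as
`datetime` values (OSM elements do carry a `timestamp` for their latest
version; the GTFS change time has to be derived by the script itself);
`should_update` is a hypothetical name:

```python
from datetime import datetime


def should_update(gtfs_changed_at: datetime, osm_edited_at: datetime) -> bool:
    """Return True only if the government's change is newer than the OSM edit.

    A mapper's fix made after the government's last change wins, so
    republishing an unchanged (possibly bad) value never reverts it.
    """
    return gtfs_changed_at > osm_edited_at


# Step 1: X published 2017-09-01. Step 2: mapper fixes it 2017-09-10.
# Step 3: X republished unchanged, so its change time is still 09-01.
print(should_update(datetime(2017, 9, 1), datetime(2017, 9, 10)))   # False
# A genuinely new government value Z on 2017-09-20 would be applied.
print(should_update(datetime(2017, 9, 20), datetime(2017, 9, 10)))  # True
```

The second call is exactly the case where the script overrides the
mapper, which is why the rule depends on new government values being
better than old ones.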

However, in the following scenario, the script would indeed introduce a
bad edit to OSM:

1. Government publishes a database with bad edit X; the script adds it.
2. A user fixes it with fix Y.
3. Government publishes a database with a NEW BAD EDIT Z.

So the script rests on the assumption that whenever the government
updates a bus stop, it's because the stop has actually changed or
because a newer, better measurement was made. Z is therefore always
assumed to be better than X (and usually better than Y, because it's
newer), so this scenario shouldn't arise in practice.

If a provider is unreliable, making random edits in its new database
that are LESS accurate than its older database, then this script is not
suitable for dealing with that provider and the more complex approach
of your project is needed. This does not seem to be the case in Israel.
To quote anonymous_gushdan_mapper from the forums:

> Israel has something like 30,000 bus stops,
> and they change daily all across the country.
> There's no way human mappers could ever verify
> the accuracy of all of them, unless you have
> someone working full-time on
> this. However, the data is considered extremely
> accurate, and inaccuracies are quite rare.
> We do have a system that announces the name
> of the next stop, which uses this data.

> I think people from other countries don't realize that this
> is not a single, private operator data, nor it's a single
> city data - it's government generated data that controls
> the entire public transportation network in the country.
> If a bus stop is not in this dataset, it doesn't exist.
> There will never be anything more accurate for bus stops
> in Israel than this dataset.

> If accuracy is important to us, we *must* implement this
> importing script, otherwise the data on OSM will get
> stale quickly - just like the current data is stale
> and shows a lot of bus stops that have been
> since then moved or canceled.


So I could probably naively always trust the GTFS and it would still be
mostly fine. But I still want to give mappers the ability to fix
coordinates and such without the next run overriding their edits
(unless the government has updated those values in the meantime).

Lastly, I'd like to point out that no local mappers are complaining
about this and the feedback is so far positive.

