[OSM-talk] JOSM shp file plugin

Gregory Arenius gregory at arenius.com
Thu Oct 21 07:00:55 BST 2010


Hi,

>> There isn't a day that goes by when the vast gaps in the OSM dataset (the
>> missing address nodes, missing turn restrictions, missing building
>> outlines, missing subdivisions, missing everything and whatnot) don't
>> hugely degrade the usefulness of the project.
>>
>
> OSM is not a data dumping ground; OSM is a community project. Importing all
> these things without a community to support them is worth less than nothing;
> it hurts the project rather than helping it.
>
> If you have a shape file with building outlines, configure your Mapnik
> instance to render the buildings from that.
>

So anybody who wants to see building outlines should spend thousands of
hours tracing them by hand, a mind-numbingly boring, tedious task, or just
run their own private Mapnik instance and render with that?  What kind of
statement is that?  Why don't you go configure your Mapnik not to use any
data from an import and use that instead?  It's not a fair statement to make.
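
For the record, here is roughly what "configure your Mapnik instance"
amounts to.  This is a minimal sketch using the Mapnik 2.x Python bindings;
the buildings.shp file name and the styling are my own assumptions.  It's a
fine afternoon project for a developer, and useless advice for everyone
else:

import mapnik

# Render building footprints straight from a shapefile, bypassing OSM.
m = mapnik.Map(800, 600)

style = mapnik.Style()
rule = mapnik.Rule()
rule.symbols.append(mapnik.PolygonSymbolizer(mapnik.Color('#d0c0b0')))
style.rules.append(rule)
m.append_style('buildings', style)

layer = mapnik.Layer('buildings')
layer.datasource = mapnik.Shapefile(file='buildings.shp')  # assumed local file
layer.styles.append('buildings')
m.layers.append(layer)

m.zoom_all()
mapnik.render_to_file(m, 'buildings.png', 'png')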

> There is a huge amount of data out there that is under an acceptable
> license to import into OSM that would be a great asset to the project.
>

> No, no, and no again. OSM is not a pool to collect the free geodata of the
> world. Because you are right - there is an *awful* lot of geodata available
> and we do _not_ want to burden our infrastructure with dead stuff that
> nobody cares about.


You really don't think that there is data out there that we could import
that would be an asset to the project? None? At all?

Sure, there is data out there that we don't need and don't want in OSM,
because it's not as good as what we've got or it's not the type of data
that the project is about, and we don't need to burden our infrastructure
with that.  I'm not saying that we should be a "dumping ground" of free
geodata and that everything out there should go in.  I'm saying that there
is a lot of great stuff out there and that we should figure out how to
bring it in.

> You can say "just go collect it manually" but if we know the data is
> already there we're not going to put in years of work duplicating it
> just to appease this anti-import mindset that some on this list have.
>

> Let's say it is a pro-community mindset. Prove that there's the manpower and
> the interest to maintain the imported data and you might have a point.
>

I've put in a lot of transit data, such as bus stops, by hand.  How do you
prove that there is the manpower and interest to keep this updated?  You
can't.  In fact, the city updates their GTFS feed more often and more
accurately than I can hope to keep up with all the changes they make by
doing everything on foot.  It is something that people use and would like
to see in OSM, so it certainly isn't "dead stuff".  What we need is a good
toolchain for doing imports and for pulling in changes from upstream
sources like TIGER and GTFS feeds where appropriate.
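
To make this concrete: the core of such a toolchain isn't complicated.
Here's a minimal sketch that turns a GTFS stops.txt into a file JOSM can
open.  The gtfs:stop_id tag and the file names are my own assumptions, not
any agreed-upon convention:

import csv
from xml.sax.saxutils import quoteattr

def gtfs_stops_to_osm(stops_path, out_path):
    # GTFS stops.txt is a CSV with stop_id, stop_name, stop_lat, stop_lon.
    with open(stops_path) as stops, open(out_path, 'w') as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        out.write('<osm version="0.6" generator="gtfs-sketch">\n')
        new_id = -1  # negative ids mark objects as new in JOSM
        for row in csv.DictReader(stops):
            out.write('  <node id="%d" lat="%s" lon="%s">\n'
                      % (new_id, row['stop_lat'], row['stop_lon']))
            out.write('    <tag k="highway" v="bus_stop"/>\n')
            out.write('    <tag k="name" v=%s/>\n' % quoteattr(row['stop_name']))
            out.write('    <tag k="gtfs:stop_id" v=%s/>\n'
                      % quoteattr(row['stop_id']))
            out.write('  </node>\n')
            new_id -= 1
        out.write('</osm>\n')

The hard part isn't this; it's diffing a fresh feed against what's already
in OSM so that updates don't trample hand-mapped work.  That's exactly the
tooling discussion we should be having.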

The US has lots of free data.  You seem to think that importing this data
hurts the US because people who just "look" at the map don't see open spaces
to fill in and therefore don't contribute and create community.  That if
only we didn't do imports, the community would form to gather the data by
hand and everything would be good.  I don't think this is the case.  A
community didn't form in the US pre-TIGER import, when the map was a blank
slate here.  It didn't, because we knew the data was there and that
importing it would make a lot more sense than trying to duplicate it.

Take for instance the San Francisco address data that I've been working on
cleaning up so that it can be imported.  Having address data in OSM makes
it a much more useful dataset, especially for routing.  As far as addresses
go in San Francisco, a few shops and restaurants currently have them
entered in OSM.  There are also a couple dozen blocks that have address
range ways alongside them.  Other than that, there is no address data at
all in OSM for San Francisco.  We can import this dataset, which is really
pretty good to start with and will be even better once I've cleaned it up a
bit more.  It will probably be about 200k nodes.  At a rough estimate,
given how many miles of streets would need to be walked and how much data
would have to be input, I'd say it would take somewhere between 3,000 and
6,000 man-hours to duplicate.  Why should we not do it?  Just because we
can't prove that we'll be able to maintain it?  It's not like the addresses
jump around frequently.
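
The cleanup itself is mostly mechanical.  Here's a sketch of the kind of
pass I mean, using the GDAL/OGR Python bindings; the NUMBER and STREET
field names and the suffix table are stand-ins for the real schema:

from osgeo import ogr

SUFFIXES = {'ST': 'Street', 'AVE': 'Avenue', 'BLVD': 'Boulevard',
            'DR': 'Drive'}

def clean_addresses(shp_path):
    # Yields (housenumber, street, lon, lat).  Assumes the shapefile is
    # already in WGS84; reproject first if it isn't.
    seen = set()
    ds = ogr.Open(shp_path)
    for feature in ds.GetLayer(0):
        number = feature.GetField('NUMBER').strip()
        words = feature.GetField('STREET').strip().title().split()
        if words and words[-1].upper() in SUFFIXES:
            words[-1] = SUFFIXES[words[-1].upper()]  # expand 'St' to 'Street'
        street = ' '.join(words)
        if (number, street) in seen:  # drop duplicate address records
            continue
        seen.add((number, street))
        point = feature.GetGeometryRef()
        yield number, street, point.GetX(), point.GetY()

Each surviving record becomes one node tagged addr:housenumber and
addr:street.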

I know that in Europe, especially Germany, the whole "army of mappers with
boots on the ground" thing is working really well, and that's great.  Over
here in the US we don't have that.  It would be nice if we did, but we
don't.  What we do have is a lot of public domain government data, much of
which is constantly being maintained and updated by the government.  Lots
of us would like to work with what we have and make good use of those
government datasets, some of which are really good.

I guess I'm just frustrated that anytime someone even thinks the word
"import" they suffer an onslaught of condescending "imports are bad" and
"community, community, community" diatribes.  This thread is a great
example.  Someone wondered about making a tool that could help make imports
easier to perform.  Nobody talked about the technical details of it.
What's the best way to do it?  What techniques can we use to help prevent
duplicates if there is an overlap between the datasets?  (I sketch one
simple approach below.)  Are there existing conflation tools we could
integrate as well?  Are there ways we could automatically flag large
uploads so that we can check that the data is coming from a legal source?
Could we set up a test server so that people can work through the process
without touching the live servers and without the high barrier to entry of
setting up their own?  None of this gets discussed.  It really feels like
every time someone tries to discuss how to build an import tool, the thread
gets hijacked by this whole anti-import debate.  I don't think that is a
proper way to go about things.
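
On the duplicate question specifically, even a naive first pass is easy to
state: skip any incoming point that lands within some threshold of an
existing object of the same kind.  A sketch, with the 20 m threshold and
the data shapes as assumptions; anything handling 200k nodes would want a
spatial index instead of this brute-force scan:

import math

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in metres between two WGS84 points.
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def drop_likely_duplicates(incoming, existing, max_dist_m=20.0):
    # Keep only incoming (lat, lon) points that aren't within max_dist_m of
    # any existing point; a stand-in for a real conflation step.
    kept = []
    for lat, lon in incoming:
        if not any(haversine_m(lat, lon, elat, elon) <= max_dist_m
                   for elat, elon in existing):
            kept.append((lat, lon))
    return kept

Even something this crude, wired into an import tool with review in JOSM,
would move the discussion further than another round of this debate.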

Cheers,
Greg