[Talk-us] Import of EPA data
David Fawcett
david.fawcett at gmail.com
Mon Dec 14 19:27:57 GMT 2009
My preference is that you roll the data import back. The spatial
accuracy is poor, and I know that there is more current data for at
least my state. I am new to OSM, but importing 100k points with known
accuracy issues does not make sense. Many of these points are
difficult to crowd-improve because acquiring the info needed to
improve the data is not as simple as overlaying it on an air photo and
moving the line.
I also don't think that man_made=envionmental_hazard is an appropriate
tag. I think that if environmental data like this is imported into OSM,
some standard tags should be developed to classify it (maybe
this has already been done; I haven't found it yet). For this data,
perhaps an EPA namespace could be used. Right now, these points show up on
the default Mapnik render with no symbols and because they have long
names, they cover up significant map space. It would be useful to use
tags that would allow the renderer stylesheet to include or exclude
the environmental POIs.
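
As a rough illustration of the include/exclude idea above, here is a minimal sketch. It assumes a hypothetical "epa:" key prefix as the namespace (not an agreed OSM tagging scheme) and a simple list-of-dicts node layout; the function names are made up for the example.

```python
# Hypothetical namespace: the "epa:" key prefix is an assumption for this
# sketch, not an established OSM tagging convention.
EPA_PREFIX = "epa:"


def is_epa_feature(tags):
    """Return True if any tag key lives in the assumed EPA namespace."""
    return any(key.startswith(EPA_PREFIX) for key in tags)


def select_for_render(nodes, include_epa=False):
    """Keep or drop EPA-namespaced nodes before handing data to a renderer."""
    return [n for n in nodes if include_epa or not is_epa_feature(n["tags"])]


# Made-up sample nodes; the registry id is the ref quoted later in this thread.
nodes = [
    {"id": 1, "tags": {"amenity": "cafe", "name": "Corner Coffee"}},
    {"id": 2, "tags": {"epa:registry_id": "110010106081", "name": "EXAMPLE WWTP"}},
]

basemap_nodes = select_for_render(nodes)                   # EPA points left out
hazard_nodes = select_for_render(nodes, include_epa=True)  # everything kept
```

With a shared key prefix, a stylesheet or pre-processing step only needs one rule to switch the whole class of features on or off.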
In the past I would have used the default render for a project
basemap; now I would want to do a custom render, so that these
features, which are prominent but irrelevant to many mapping projects,
would not appear.
I would like to have us come up with a more standard language on how
to tag and attribute these things. I think that since these types of
features change fairly frequently, there should be a way to update
these in an automated way.
In reality, aside from the POIs, maybe data like this isn't really a
good fit for OSM. It is difficult to manage a copy of rapidly
changing data that other people maintain. Maybe it is better to have
resources describing how people can mash up this kind of data with OSM
base data instead of importing it. An extreme example would be
traffic speed data. It would be crazy to submit this data to OSM and
keep it up to date (to the minute). This data changes more slowly than that,
but where is the line?
I hope that you haven't seen my comments as 'abuse'. I haven't
intended any of my critique and suggestions to be an attack on you or
what you are doing. My intention is to see that data like this is
imported using the most recent and accurate sources, that it is
tagged/classified in the best way possible, and that there is a scheme
to keep it up-to-date.
I think that for the same reason that data related to potential
environmental impacts is important, we need to be careful with the
accuracy. If a Superfund site shows up in your front yard and it
really belongs 10 miles away, it is a little different than having a
coffee shop POI placed there.
David.
On Sun, Dec 13, 2009 at 4:08 AM, jamesmikedupont at googlemail.com
<jamesmikedupont at googlemail.com> wrote:
> Dear Team,
> I am willing to put some work into this but I need a clear directive:
>
> Do you want to just revert my EPA import or should I put the work into
> fixing it?
>
> Please give me a direction. I get abuse for importing "junk", but on
> the other side people have been happy to get this data. So there is
> some conflict here.
>
> It is technically possible to fix the data; here is my algorithm:
>
> 1. pull the 10 changesets off the server, using the export routine.
> Convert from osmchange to josm format.
> 2. Replace the CAPS name with a nicer name. Devise some rules to
> convert the WWTP to waste water treatment plant.
> 3. Check if the item has been decommissioned (simple list lookup); if
> so, make it non-visible and mark it.
> 4. Check whether the state/area does not want this data (NJ has state-level data)
> 5. Check the map if there are any other overlapping nodes in a certain
> radius (needs to have the world file)
> /- Ideally OSM would have an export routine to include nearby nodes.
> 6. Check the map if there are any additional nodes with the same
> name... now this will be very hard. But there should be a way to find
> possible fuzzy matches in an area.
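
Steps 2, 3, 5 and 6 of the quoted list could look roughly like the sketch below. It is only an illustration under stated assumptions: the abbreviation table (beyond WWTP), the 500 m radius, the 0.8 similarity threshold, the node dictionaries and all function names are hypothetical, and the fuzzy match uses Python's standard `difflib` rather than anything the original import used.

```python
import difflib
import math

# Illustrative abbreviation table; only WWTP is taken from the message above.
ABBREVIATIONS = {"WWTP": "Waste Water Treatment Plant"}


def nicer_name(raw):
    """Step 2: expand known abbreviations and title-case an all-caps name."""
    return " ".join(ABBREVIATIONS.get(w, w.capitalize()) for w in raw.split())


def is_decommissioned(ref, decommissioned_refs):
    """Step 3: simple list lookup against a set of decommissioned refs."""
    return ref in decommissioned_refs


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    radius = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def possible_duplicates(node, others, radius_m=500.0, min_ratio=0.8):
    """Steps 5-6: ids of nearby nodes whose names are a fuzzy match."""
    hits = []
    for other in others:
        dist = haversine_m(node["lat"], node["lon"], other["lat"], other["lon"])
        if dist > radius_m:
            continue  # outside the search radius
        ratio = difflib.SequenceMatcher(
            None, node["name"].lower(), other["name"].lower()
        ).ratio()
        if ratio >= min_ratio:
            hits.append(other["id"])
    return hits


# Made-up sample data for illustration only.
imported = {"id": 1, "lat": 40.0, "lon": -75.0,
            "name": nicer_name("SPRINGFIELD WWTP")}
existing = [
    {"id": 2, "lat": 40.001, "lon": -75.0,
     "name": "Springfield Wastewater Treatment Plant"},
    {"id": 3, "lat": 41.0, "lon": -75.0,
     "name": "Springfield Waste Water Treatment Plant"},
]
candidates = possible_duplicates(imported, existing)
```

Candidates found this way would still need human review; a name and distance match is a hint, not proof of a duplicate.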
>
> Now, in fact this processing would be best done on a state or county
> level. First you would want to split up the data into chunks and
> distribute the processing. I don't know about the chunking mechanisms
> for the USA data. Of course not all areas even have EPA hazards.
>
> But we could take a set of shape files for the chunks, use them to
> split up the EPA data, create a weighted list of areas with the most
> nodes and then extract the world files for those areas.
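
The chunking idea in the quoted paragraph can be sketched as follows. To stay self-contained it replaces the shape files with hypothetical rectangular bounding boxes, so the chunk names, coordinates and function names are all assumptions; real state or county boundaries would need a proper point-in-polygon test.

```python
from collections import Counter

# Hypothetical chunks as (min_lon, min_lat, max_lon, max_lat) boxes; real
# chunks would come from state or county shape files.
CHUNKS = {
    "chunk_a": (-76.0, 39.0, -75.0, 40.0),
    "chunk_b": (-75.0, 39.0, -74.0, 40.0),
}


def chunk_for(lon, lat, chunks=CHUNKS):
    """Return the name of the first chunk containing the point, or None."""
    for name, (min_lon, min_lat, max_lon, max_lat) in chunks.items():
        if min_lon <= lon < max_lon and min_lat <= lat < max_lat:
            return name
    return None


def rank_chunks(points, chunks=CHUNKS):
    """Count points per chunk and list chunks with the most points first."""
    counts = Counter()
    for lon, lat in points:
        name = chunk_for(lon, lat, chunks)
        if name is not None:
            counts[name] += 1
    return counts.most_common()


# Made-up (lon, lat) points; the last one falls outside every chunk.
points = [(-75.5, 39.5), (-74.5, 39.5), (-74.2, 39.9), (-80.0, 39.5)]
ranking = rank_chunks(points)
```

The ranked list gives an order in which to extract map data, starting with the areas that hold the most imported nodes.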
>
> Now, there are other things to do:
> A. Be able to pull out the EPA record for each node and augment it
> with the given data. Decide based on that data how to create better
> symbols. This will create a huge load on the servers and could be
> considered a form of DDoSing. That is why I have not started to do so.
> Ideally the EPA will update the KML file and include the basic
> information about it, the type of the hazard and the date of activity.
>
> B. If the company is just listed as a regulated producer of waste, and
> there is no hazard, we should still want to include the listing, since
> it is a POI.
>
> C. Now the points that have been deleted by people or modified already
> should be fed back to the EPA as an update.
>
> D. If you look at the EPA webpage, they are using Bing Maps, and it
> contains many more local records, such as each doctor's house with an X-ray
> device. I have not found where to get this data from, but it could be
> used to complete the map with more POIs. There are about 10x more
> points in that dataset.
>
> Well these are my ideas for processing the data, let me know if anyone
> supports further work in this area.
>
> thanks,
>
> mike
>
>
> On Sun, Dec 13, 2009 at 10:49 AM, Minh Nguyen <mxn at zoomtown.com> wrote:
>> On 12/12/09 7:03 AM,
>> jamesmikedupont at googlemail.com wrote:
>>> The ref is for the node itself.
>>> If you follow them, you will find a ton of information about the item
>>> from the EPA.
>>> It has been suggested to change this to website.
>>
>> Wouldn't "url" be a better tag for it? For your example, the "ref" would
>> actually be more like "110010106081".
>>
>> --
>> Minh Nguyen <mxn at zoomtown.com>
>> [[en:User:Mxn]] [[vi:User:Mxn]] [[m:User:Mxn]]
>> AIM: trycom2000; Jabber: mxn at 1ec5.org; Blog: http://notes.1ec5.org/
>>
>>
>> _______________________________________________
>> Talk-us mailing list
>> Talk-us at openstreetmap.org
>> http://lists.openstreetmap.org/listinfo/talk-us
>>
>