[OSM-talk-be] Importing csv, good practices and tips : Multipharma Case Study
Lennard
ldp at xs4all.nl
Thu May 19 09:54:53 UTC 2011
> 1. How to be sure not to override existing data ?
Compare with existing pharmacies before importing. For instance, you can
download all pharmacies through xapi and open those in an editor, together
with your to-be-imported data. Comparing every location by hand will be
tedious, especially the first time, but it does give very good results.
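You could also run both the xapi result and your file through a script that
compares locations and flags any two pharmacies within a certain distance of
each other. This is easier for large datasets, but more prone to errors: it
can flag two genuinely different pharmacies that are close together, and miss
a duplicate whose coordinates are off by more than the threshold.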
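A minimal sketch of such a nearness check, in Python. The record layout
(dicts with "lat" and "lon" keys) and the 50 m threshold are assumptions made
for illustration; how you get the xapi result into that shape is up to you.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_near_duplicates(candidates, existing, max_dist_m=50.0):
    """Return (candidate, existing, distance) triples within max_dist_m."""
    hits = []
    for cand in candidates:
        for ex in existing:
            d = haversine_m(cand["lat"], cand["lon"], ex["lat"], ex["lon"])
            if d <= max_dist_m:
                hits.append((cand, ex, d))
    return hits
```

The double loop compares every pair, which should be fine for a few hundred
pharmacies. Anything it reports still deserves a manual look before you
decide to skip or merge.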
> 2. How to maintain such a list ?
> I mean, if I receive an updated CSV in some month, should I upload the
> whole set ? What about duplicates then ?
You definitely shouldn't upload the whole set, only the changes. If you
get a new csv, the easiest approach is to compare it to the csv that was
the source of the last import and note any changes. Then go over those
manually or with a script. That's easier than having to recheck every
pharmacy in the country.
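As a rough sketch, comparing the old and new csv could look like this in
Python. The choice of key columns is an assumption: use whatever identifies a
pharmacy uniquely in your file (the "ref" column in the usage note is
hypothetical).

```python
import csv


def load_rows(path, encoding="utf-8"):
    """Read a csv file with a header row into a list of dicts."""
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.DictReader(f))


def diff_rows(old_rows, new_rows, key_fields):
    """Split the difference between two row sets into three lists.

    Returns (added, removed, changed), where changed holds
    (old_row, new_row) pairs that share a key but differ elsewhere.
    """
    def index(rows):
        return {tuple(r[k] for k in key_fields): r for r in rows}

    old, new = index(old_rows), index(new_rows)
    added = [new[k] for k in new if k not in old]
    removed = [old[k] for k in old if k not in new]
    changed = [(old[k], new[k]) for k in new
               if k in old and old[k] != new[k]]
    return added, removed, changed
```

For example, diff_rows(load_rows("old.csv"), load_rows("new.csv"), ["ref"])
gives you three short lists to go over instead of the whole country.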
If the original data has a Multipharma reference ID, you could add that as
a ref=* key (or a more specific variant). That alone would make updates
much easier to perform. But do keep in mind that people could remove or
alter the ref, so a basic nearness test with existing data is still a good
idea.
> 3. After work, the current fields in the CSV are
> 1. Latitude
> 2. Longitude
> 3. Country
> 4. Name
> 5. addr:postcode
> 6. addr:city
> 7. addr:street
> 8. addr:housenumber
> 9. phone
> 10. fax
> 11. amenity
Country is not essential and should probably be dropped.
If there are proper boundaries and place=* nodes, addr:city is not
strictly necessary either, but it makes sense to add it anyway.
> Should I add more fields ? Source ? addr:full ? something else ?
Source=*, definitely. Also make a note of this import on the wiki 'Import
Catalogue', with contact details and a note on the license under which the
original data was released to OSM.
addr:full is probably not needed, because you already have the more
specific addr: keys.
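To illustrate, here is a small Python sketch that turns one csv row into a
position plus tags along those lines: Country is dropped, empty fields are
skipped and a source tag is added. The column names are taken from your list;
the function name and the source value in the usage note are made up for the
example.

```python
def row_to_node(row, source):
    """Convert one csv row (a dict) into lat/lon plus OSM-style tags."""
    skip = {"Latitude", "Longitude", "Country"}
    tags = {}
    for column, value in row.items():
        if column in skip:
            continue
        value = (value or "").strip()
        if not value:
            continue  # don't create empty tags
        # OSM keys are lower case, so the Name column becomes name=*
        key = "name" if column == "Name" else column
        tags[key] = value
    tags["source"] = source
    return {
        "lat": float(row["Latitude"]),
        "lon": float(row["Longitude"]),
        "tags": tags,
    }
```

Calling row_to_node(row, "Multipharma csv 2011") on each row gives you
something that is easy to inspect before it goes anywhere near the API.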
> 4. What is the preferred file encoding when submitting ? Iso-8859-1 or
> UTF8 ? I suppose this is the latter, but I want to be sure.
UTF-8. All keys and values in OSM are UTF-8 encoded.
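If your csv turns out to be ISO-8859-1, re-encoding it is simple. A minimal
Python sketch, assuming the input really is ISO-8859-1 (running it on a file
that is already UTF-8 would mangle the accented characters); the function
names are only for illustration.

```python
def latin1_to_utf8(data):
    """Re-encode ISO-8859-1 bytes as UTF-8 bytes."""
    return data.decode("iso-8859-1").encode("utf-8")


def convert_file(src_path, dst_path):
    """Write a UTF-8 copy of an ISO-8859-1 encoded file."""
    with open(src_path, "rb") as src:
        converted = latin1_to_utf8(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

Check a few names with accents (Liège, for instance) in the result before
you upload anything.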
A final note: OSM has a test API. You're well advised to upload your data
there first, to test your conversion and upload process and to check the
results. It's at http://api06.dev.openstreetmap.org/
--
Lennard