[Talk-ca] NRN GML file splitter
Frank Steggink
steggink at steggink.org
Wed Sep 16 03:37:29 BST 2009
Hi,
Because the geobase2osm.py script always takes about 20 minutes on the
Quebec data, I've written a fairly simple Python (2.6) script that splits
the big province-wide GML file into smaller pieces, each about the size
of one NTS tile. It can be found here: [1] The script creates many
files, each named after the original GML file with the NTS tile name
(in the form 999x99) added to it. It leaves a margin of about 1 km (0.01
degree of latitude, 0.015 degree of longitude) to compensate for the
small shift between the Canadian NAD83 datum and WGS84.
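For readers unfamiliar with the 999x99 names: they identify 1:50,000 NTS sheets (a three-digit series number, a map-area letter from A to P, and a sheet number from 01 to 16). Below is a minimal sketch, not the code from [1], of how a point could be mapped to such a name and how the margin could be applied when collecting the tiles that overlap a bounding box. It assumes the regular NTS grid between 40°N and 68°N (series blocks of 8° longitude by 4° latitude, map areas of 2° by 1°, sheets of 30' by 15', with letters and sheet numbers running back and forth starting in the south-east corner) and does not model the datum shift itself. The names nts_tile_name and tiles_for_bbox are hypothetical.

```python
import math

AREA_LETTERS = "ABCDEFGHIJKLMNOP"   # map areas within a series block
SHEET_LAT, SHEET_LON = 0.25, 0.5    # a 1:50,000 sheet spans 15' lat by 30' lon


def _serpentine(row, col_from_east):
    """0-based position in a 4x4 grid counted from the south-east corner:
    even rows run east to west, odd rows run west to east."""
    if row % 2 == 0:
        return row * 4 + col_from_east
    return row * 4 + (3 - col_from_east)


def nts_tile_name(lat, lon):
    """Name of the 1:50,000 NTS sheet containing (lat, lon), e.g. '021M01'.

    Assumes the regular grid between 40 N and 68 N; lon is negative (west).
    """
    if not (40.0 <= lat < 68.0 and -144.0 < lon <= -48.0):
        raise ValueError("point outside the grid handled by this sketch")
    west = -lon
    lat_band = int((lat - 40.0) // 4.0)     # 4-degree bands north of 40 N
    lon_band = int((west - 48.0) // 8.0)    # 8-degree bands west of 48 W
    series = lon_band * 10 + lat_band

    # Offsets inside the series block: north of its south edge, west of its east edge.
    dlat = lat - (40.0 + 4.0 * lat_band)
    dwest = west - (48.0 + 8.0 * lon_band)

    area_row, area_col = int(dlat // 1.0), int(dwest // 2.0)
    letter = AREA_LETTERS[_serpentine(area_row, area_col)]

    sheet = _serpentine(int((dlat - area_row) // SHEET_LAT),
                        int((dwest - 2.0 * area_col) // SHEET_LON)) + 1
    return "%03d%s%02d" % (series, letter, sheet)


def tiles_for_bbox(min_lon, min_lat, max_lon, max_lat,
                   margin_lat=0.01, margin_lon=0.015):
    """Sorted names of all sheets overlapping the bbox grown by the margin
    (about 1 km, to absorb the NAD83/WGS84 shift)."""
    min_lat, max_lat = min_lat - margin_lat, max_lat + margin_lat
    min_lon, max_lon = min_lon - margin_lon, max_lon + margin_lon
    names = set()
    for row in range(math.floor(min_lat / SHEET_LAT), math.floor(max_lat / SHEET_LAT) + 1):
        for col in range(math.floor(min_lon / SHEET_LON), math.floor(max_lon / SHEET_LON) + 1):
            # Name each grid cell by its centre so boundary points are unambiguous.
            names.add(nts_tile_name((row + 0.5) * SHEET_LAT, (col + 0.5) * SHEET_LON))
    return sorted(names)
```

For example, nts_tile_name(47.125, -70.25) returns "021M01", the tile used for testing above. A point at 47.005°N, 70.25°W lies just inside the southern edge of that sheet, so tiles_for_bbox returns ["021L16", "021M01"] for it: the 0.01° margin reaches into the neighbouring sheet to the south.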
This script has not been tested well! I've only verified that it works
for tile 021M01. (The generated OSM files have the same size.) With the
split files, geobase2osm takes only a few seconds, though busy tiles will
take longer. And since the Quebec data only has roads, without street
names etc., I don't know how the script holds up for other provinces.
I'll do the verification later, but if someone wants to play around with
it, feel free. It also isn't very cleaned up. For example, the function
"getTileName" should actually be "getTileNames", because it returns a
list of names. It's also barely commented, the programming is not
particularly defensive, and you'll have to do without any ATPs :)
The only input parameter is the name of the province-wide GML file.
Processing the Quebec file (1.1 GB) takes between 4 and 5 minutes on my
machine and generates 851 files. The algorithm is as follows:
* Read a feature (gml:featureMember tags) and store it as a list of
  strings (one for each line)
* Extract the geometry coordinates (only for features with gml:Point or
  gml:LineString)
* Calculate the bounding box
* Calculate the NTS tiles overlapping the bounding box
* Write the feature to the output file of each of those tiles
  * If the output file doesn't exist yet, create it first and write a
    "header"
* When done, close all output files
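The steps above could be sketched roughly as follows. This is a hypothetical Python 3 rendering, not the script from [1]. It assumes the NRN geometries are written as GML 2 <gml:coordinates> elements holding "lon,lat" tuples, treats everything before the first feature as the header and everything after the last one as a trailer (so each output file keeps the closing root element), and simply skips features without a Point or LineString geometry, which the original may handle differently. The tile lookup is passed in as a function (for instance an NTS lookup that applies the ~1 km margin), so the sketch stands alone; split_gml, feature_bbox, output_path and the "_" separator in the output names are made up for illustration.

```python
import os
import re

FEATURE_START = "<gml:featureMember"
FEATURE_END = "</gml:featureMember>"
# Assumption: geometries are GML 2 <gml:coordinates> holding "lon,lat[,z]" tuples.
COORDS_RE = re.compile(r"<gml:coordinates[^>]*>(.*?)</gml:coordinates>", re.DOTALL)


def feature_bbox(feature_lines):
    """(min_lon, min_lat, max_lon, max_lat) of a feature, or None when it has
    no gml:Point / gml:LineString geometry."""
    text = "".join(feature_lines)
    if "<gml:Point" not in text and "<gml:LineString" not in text:
        return None
    lons, lats = [], []
    for match in COORDS_RE.finditer(text):
        for pair in match.group(1).split():   # tuples are whitespace-separated
            x, y = pair.split(",")[:2]
            lons.append(float(x))
            lats.append(float(y))
    if not lons:
        return None
    return min(lons), min(lats), max(lons), max(lats)


def output_path(gml_path, tile_name):
    """Hypothetical naming scheme: roads.gml -> roads_021M01.gml."""
    root, ext = os.path.splitext(gml_path)
    return "%s_%s%s" % (root, tile_name, ext)


def split_gml(gml_path, tiles_for_bbox):
    """Copy each feature into one output file per overlapping tile.

    tiles_for_bbox(min_lon, min_lat, max_lon, max_lat) returns tile names.
    Returns the sorted names of the tiles that received at least one feature.
    """
    outputs = {}               # tile name -> open file, created on first use
    header, trailer = [], []
    feature = None             # lines of the feature being read, or None
    seen_feature = False
    try:
        with open(gml_path) as src:
            for line in src:   # stream line by line; the file is never fully loaded
                if feature is None and FEATURE_START in line:
                    feature, trailer, seen_feature = [], [], True
                if feature is None:
                    # Outside a feature: header before the first one, trailer after.
                    (trailer if seen_feature else header).append(line)
                    continue
                feature.append(line)
                if FEATURE_END not in line:
                    continue
                bbox = feature_bbox(feature)
                tiles = tiles_for_bbox(*bbox) if bbox is not None else []
                for tile in tiles:
                    out = outputs.get(tile)
                    if out is None:
                        out = open(output_path(gml_path, tile), "w")
                        out.writelines(header)
                        outputs[tile] = out
                    out.writelines(feature)
                feature = None
    finally:
        for out in outputs.values():
            out.writelines(trailer)   # e.g. the closing root element
            out.close()
    return sorted(outputs)
```

Called as split_gml(sys.argv[1], tiles_for_bbox), the sketch needs only the province-wide file name, like the original script. Because every output file stays open until the end, the operating system's limit on simultaneously open files may matter when a province produces several hundred tiles.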
Parsing is done line by line, so it won't work for arbitrary GML, even
if the data covers Canadian territory. The interpretation of features
and geometries is also geared towards the NRN files. I can't guarantee
that it will work on other data, but it might be worth a try. (I would
need to look into it first.) Comments are welcome :)
Have fun,
Frank
[1] http://www.steggink.org/temp/nrn_splitter.py.txt