[Talk-ca] NRN GML file splitter

Frank Steggink steggink at steggink.org
Wed Sep 16 03:37:29 BST 2009


Hi,

Because the geobase2osm.py script always takes about 20 minutes for the
Quebec data, I've created a fairly simple Python (2.6) script to split
the big province-wide GML file into smaller portions, each about the
size of one NTS tile. It can be found here: [1]. The script creates many
files, each named after the original GML file with the NTS tile name
(as 999x99) added. It leaves a margin of about 1 km (0.01 degrees lat,
0.015 degrees lon) to allow for the small shift between the Canadian
NAD83 datum and WGS84.
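
As a rough check on those margins, here is a small Python sketch (not
taken from the script itself) that converts the two degree values to
kilometres on a spherical Earth and pads a bounding box with them. The
function names, the 6371 km radius and the (min_lon, min_lat, max_lon,
max_lat) tuple layout are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation (assumption)


def margin_km(lat_margin_deg, lon_margin_deg, at_lat_deg):
    """Return (north-south, east-west) size of the margins in kilometres."""
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0
    north_south = lat_margin_deg * km_per_deg
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = lon_margin_deg * km_per_deg * math.cos(math.radians(at_lat_deg))
    return north_south, east_west


def expand_bbox(bbox, lat_margin=0.01, lon_margin=0.015):
    """Pad a (min_lon, min_lat, max_lon, max_lat) box on all four sides."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return (min_lon - lon_margin, min_lat - lat_margin,
            max_lon + lon_margin, max_lat + lat_margin)


if __name__ == "__main__":
    print(margin_km(0.01, 0.015, 47.0))
    print(expand_bbox((-71.0, 47.0, -70.9, 47.1)))
```

At 47 degrees north both margins come out at a little over 1 km. For an
overlap test, padding the feature's box as done here should give the
same result as padding the tile by the same amounts; which of the two
the script does is not stated in this mail.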

This script has not been tested well! I've verified that it works with
tile 021M01 only. (The generated OSM files have the same size.)
geobase2osm now takes only a few seconds for this tile, though busy
tiles will take longer. And since the Quebec data contains only roads,
without street names etc., I don't know how the script holds up for
other provinces. I'll do the verification later, but if someone wants
to play around with it, feel free to do so. The code has also not been
cleaned up much. For example, the function "getTileName" should
actually be "getTileNames", because it returns a list of names. It's
also barely commented, the programming is not particularly defensive,
and you'll have to do without any ATPs :)

The only input parameter is the name of the province-wide GML file.
Execution time for the Quebec file (1.1 GB) is between 4 and 5 minutes
on my machine, and it generates 851 files. The algorithm is like this:
* Read a feature (gml:featureMember tags), and store it as a set of 
strings (one for each line)
* Extract geometry coordinates (only for features with gml:Point or 
gml:LineString)
* Calculate bounding box
* Calculate NTS tiles overlapping the bounding box
* Write the feature to the output file
    * Create the output file first, if it doesn't exist, and write a 
"header"
* When done, close all output files
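
The steps above could be sketched roughly as follows. This is not the
code from [1]: it is a minimal Python 3 illustration that assumes each
gml:featureMember block starts and ends on recognisable lines, that
coordinates appear as "lon,lat" pairs inside a gml:coordinates element
on a single line, and that a tile can be stood in for by a plain 0.25 by
0.5 degree grid cell instead of a real NTS name. All function names and
the (row, col) key are made up for the example, and the margin is left
out to keep it short.

```python
import math
import re

# Assumed layout: <gml:coordinates>lon,lat lon,lat ...</gml:coordinates>
COORDS_RE = re.compile(r"<gml:coordinates[^>]*>([^<]*)</gml:coordinates>")

CELL_LAT = 0.25  # assumed cell height in degrees (stand-in for a tile)
CELL_LON = 0.5   # assumed cell width in degrees (stand-in for a tile)


def read_features(lines):
    """Yield each gml:featureMember block as a list of lines."""
    current = None
    for line in lines:
        if "<gml:featureMember>" in line:
            current = []
        if current is not None:
            current.append(line)
            if "</gml:featureMember>" in line:
                yield current
                current = None


def bounding_box(feature_lines):
    """Return (min_lon, min_lat, max_lon, max_lat), or None without geometry."""
    lons, lats = [], []
    for line in feature_lines:
        for match in COORDS_RE.finditer(line):
            for pair in match.group(1).split():
                parts = pair.split(",")
                lons.append(float(parts[0]))
                lats.append(float(parts[1]))
    if not lons:
        return None
    return min(lons), min(lats), max(lons), max(lats)


def cell_keys(bbox):
    """List every (row, col) grid cell that the bounding box overlaps."""
    min_lon, min_lat, max_lon, max_lat = bbox
    rows = range(math.floor(min_lat / CELL_LAT), math.floor(max_lat / CELL_LAT) + 1)
    cols = range(math.floor(min_lon / CELL_LON), math.floor(max_lon / CELL_LON) + 1)
    return [(row, col) for row in rows for col in cols]


def assign_features(lines):
    """Yield (cell_key, feature_lines) for every cell a feature overlaps."""
    for feature in read_features(lines):
        bbox = bounding_box(feature)
        if bbox is None:
            continue  # no Point/LineString coordinates found
        for key in cell_keys(bbox):
            yield key, feature
```

A caller could iterate over assign_features(open(path)) and keep a dict
of open output files keyed by cell, writing the header the first time a
key is seen and closing all files at the end. A feature that crosses a
cell boundary is yielded once per cell, so it ends up in each of those
files.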

Parsing is done line by line, so it won't work for arbitrary GML, even
if the data covers Canadian territory. The interpretation of features
and geometries is also geared towards the NRN files. I can't guarantee
it will work on other data, but it might be worth a try. (I would need
to look into it first.) Comments are welcome :)

Have fun,

Frank

[1] http://www.steggink.org/temp/nrn_splitter.py.txt




