[Talk-us] TIGER expansion bot

Serge Wroclawski emacsen at gmail.com
Tue Nov 27 02:29:26 GMT 2012

Hello all,

After the OSM US Google Hangout two weeks ago, there was talk of
bringing back the effort I started six months ago to create a TIGER
expansion bot to run against the roads in the US.

I've brushed off the code and made several improvements to it (more on
this later in the mail).

In order to facilitate community involvement, I've talked with the OSM
US board and we're going to have a process by which the code is
officially vetted.

That process begins with this email. I'm making the most recent
version of the code available at:

(there was a URL for the previous version, but this is where the
current, up to date code will live).

I encourage people to review the code.

In addition, on Thursday, November 29th at 8pm EST on Google Plus,
we'll have another public hangout where I'll do a code walkthrough.
This will be an opportunity for people to bring up questions or
concerns they have about specific code issues.

From there, barring any major issues, I'll send a follow-up to this
email with a final request for comment. People are encouraged to send
in any specific code-related issues, and that review period will stay
open for one week.

After that, the code will be executed, and that execution period will
probably be several days, as I'll be manually supervising the
execution myself.

In anticipation of the code walkthrough on Thursday, I'll give a high
level overview of the code, as well as the changes from the version
six months ago.

The code is written in Python, and it uses a simple XML parser to
parse OSM XML. I have a simple framework for handling this in the
pyxbot.py file, which handles the parsing and selection process.
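
As a rough illustration of the parsing step (this is not the actual
pyxbot.py code; the function name iter_ways and the sample data are
made up for the example), a streaming parse of OSM XML with the
standard library might look like this:

```python
import io
import xml.etree.ElementTree as ET


def iter_ways(source):
    """Yield (way_id, tags) for each <way> element in an OSM XML stream.

    Hypothetical helper for illustration only.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "way":
            # Collect the way's <tag k="..." v="..."/> children into a dict.
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            yield elem.get("id"), tags
            elem.clear()  # drop child elements to keep memory use down


# Tiny made-up extract, just enough to exercise the parser.
SAMPLE = b"""<osm version="0.6">
  <way id="1">
    <nd ref="10"/>
    <tag k="highway" v="residential"/>
    <tag k="name" v="N Main St"/>
  </way>
  <way id="2">
    <nd ref="11"/>
    <tag k="building" v="yes"/>
  </way>
</osm>"""

ways = list(iter_ways(io.BytesIO(SAMPLE)))
```

Streaming with iterparse avoids loading a whole extract into memory,
which matters for state-sized files.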

The tiger.py file contains TIGER specific expansion code, and the
selection process is quite simple. The selector looks for ways which
have both a "highway" key and a "name" key among their tags.
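
A minimal sketch of that selection rule, assuming each way's tags are
available as a plain dict (the name is_candidate and the sample ways
are hypothetical, not taken from tiger.py):

```python
def is_candidate(tags):
    """True when a way carries both a highway tag and a name tag."""
    return "highway" in tags and "name" in tags


# Made-up sample: only the first way has both keys.
ways = [
    ("1", {"highway": "residential", "name": "N Main St"}),
    ("2", {"highway": "service"}),
    ("3", {"building": "yes", "name": "City Hall"}),
]

selected = [way_id for way_id, tags in ways if is_candidate(tags)]
```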

The selected ways then go through a transformation, which looks for
name, name_1, name_2, etc. and for the corresponding tiger tags
(tiger:name_base, etc.). It then pieces apart the existing name and
reconstructs it using the expanded tiger tags. If the new name is
different, then it is stored.
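
The transformation could be sketched roughly as follows. This is an
illustration under assumptions, not the code in tiger.py: the two
lookup tables are tiny stand-ins for the real expansion table, the
function name expand_names is made up, and it assumes that the tiger
tags for name_1 carry the same _1 suffix (tiger:name_base_1, and so
on). It only rewrites a name when the existing name matches what the
tiger tags would produce in abbreviated form.

```python
import re

# Tiny illustrative tables; stand-ins for the much larger real table.
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}
TYPES = {"St": "Street", "Ave": "Avenue", "Rd": "Road", "Dr": "Drive"}

# Matches name, name_1, name_2, ... but not tiger:name_base etc.
NAME_KEY = re.compile(r"^name(_\d+)?$")


def join_parts(*parts):
    """Join the non-empty name components with single spaces."""
    return " ".join(p for p in parts if p)


def expand_names(tags):
    """Return {name_key: expanded_name} for names that would change."""
    changes = {}
    for key, current in tags.items():
        match = NAME_KEY.match(key)
        if not match:
            continue
        suffix = match.group(1) or ""  # "" for name, "_1" for name_1
        base = tags.get("tiger:name_base" + suffix)
        if base is None:
            continue  # no TIGER components to rebuild from
        prefix = tags.get("tiger:name_direction_prefix" + suffix, "")
        road_type = tags.get("tiger:name_type" + suffix, "")
        dir_suffix = tags.get("tiger:name_direction_suffix" + suffix, "")

        # Only touch names that still match the abbreviated TIGER form.
        if join_parts(prefix, base, road_type, dir_suffix) != current:
            continue

        expanded = join_parts(
            DIRECTIONS.get(prefix, prefix),
            base,
            TYPES.get(road_type, road_type),
            DIRECTIONS.get(dir_suffix, dir_suffix),
        )
        if expanded != current:
            changes[key] = expanded
    return changes


example = {
    "highway": "residential",
    "name": "N Main St",
    "tiger:name_direction_prefix": "N",
    "tiger:name_base": "Main",
    "tiger:name_type": "St",
    "name_1": "Old Mill Rd",
    "tiger:name_base_1": "Old Mill",
    "tiger:name_type_1": "Rd",
}
changes = expand_names(example)
```

Rebuilding from the separate tiger components, rather than doing a
search and replace on the name string, means the base name itself is
never expanded, only the direction and type around it.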

If the name is already properly expanded, then the way is ignored, but
if there's a problem with the tag expansion, then that way's
information is stored elsewhere for review.

The review file (a CSV file) contains information about all the ways
that didn't process properly, such as the way ID, the (primary) name,
and the reason for the failure.
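
For illustration, writing such a review file with Python's csv module
might look like the sketch below. The column names, the function name
write_review_rows, and the sample row are assumptions made for the
example; the actual layout of the bot's CSV may differ.

```python
import csv
import io


def write_review_rows(rows, out):
    """Write (way_id, name, reason) tuples as CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(["way_id", "name", "reason"])  # assumed column names
    for way_id, name, reason in rows:
        writer.writerow([way_id, name, reason])


# Made-up failure record, written to an in-memory buffer for the demo.
buffer = io.StringIO()
write_review_rows(
    [("123", "N Main St", "name does not match TIGER tags")],
    buffer,
)
lines = buffer.getvalue().splitlines()
```

When writing to a real file, open it with newline="" as the csv module
documentation recommends.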

This file can then be reviewed later, or fed into a MapRoulette
challenge [1].

Now, for those folks who looked at the code six months ago, these are
the major changes:

1. I've expanded the expansion table quite a bit, through extensive testing.

2. I've added the review file functionality.

3. I've added name_1, etc. functionality.

4. The code is more modular than it was.

5. The code is easier to run from the command line.

So, the code is out there. If you have technical questions, I'll go
into more depth Thursday.

- Serge

[1] I'm hacking on the MapRoulette code to make it easier to add new
challenges, such as this.
