[OSM-dev] GSoC Week 1-3

Paul Norman penorman at mac.com
Sat Jun 22 07:29:43 UTC 2013

This is my first report. Due to later scheduling conflicts I started coding
early, so this report covers 3 weeks.

Note: This was four calendar weeks, but one week was SOTM-US and due to visa
restrictions I could do no GSoC work there at all. Accordingly, I'll just
divide this 3-week period into halves.

First half:

I did initial benchmarking of the import speed with osmosis to pgsnapshot.
One of the interesting parts of working with OSM data is getting to work
with lots of data. One of the downsides of this is you can find out after
running a benchmark for 6 hours that 32GB is no longer enough memory for an
InMemory node location store. That's useful to know, but not exactly what I
was hoping to find out.

The benchmarking produced data on the speed and size of various snapshot
options, which should be valuable in determining whether PostGIS linestrings
or JOINs through way_nodes are more efficient. I've put the data up on the
wiki, but it's in a form that's a bit difficult to read right now and needs
some clean-up.
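
To make the linestring versus way_nodes comparison concrete, here is a toy
in-memory sketch in Python. It is not the pgsnapshot schema or the benchmark
itself: the dictionaries, the sample coordinates and both function names are
made up for illustration. The only thing assumed from the real schema is that
way_nodes rows carry a way id, a node id and a sequence number.

```python
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (lon, lat)

# Toy stand-in for a nodes table: node id -> coordinate (made-up data).
nodes: Dict[int, Coord] = {
    1: (-123.10, 49.20),
    2: (-123.11, 49.21),
    3: (-123.12, 49.22),
}

# Toy stand-in for way_nodes: (way_id, node_id, sequence_id) rows,
# deliberately stored out of order.
way_nodes: List[Tuple[int, int, int]] = [
    (10, 3, 2),
    (10, 1, 0),
    (10, 2, 1),
]

# Toy stand-in for a linestring stored with the way at import time.
way_linestrings: Dict[int, List[Coord]] = {
    10: [(-123.10, 49.20), (-123.11, 49.21), (-123.12, 49.22)],
}


def geometry_via_join(way_id: int) -> List[Coord]:
    """Rebuild the geometry at query time: filter way_nodes by way,
    order by sequence, then look each node up."""
    members = sorted(
        (seq, node_id) for (wid, node_id, seq) in way_nodes if wid == way_id
    )
    return [nodes[node_id] for _, node_id in members]


def geometry_via_linestring(way_id: int) -> List[Coord]:
    """Read the geometry that was precomputed and stored with the way."""
    return way_linestrings[way_id]
```

Both functions return the same list of coordinates. The trade-off being
measured is that the stored linestring costs extra import time and disk
space, while the join does its work on every query.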

I also made some improvements to the pgsnapshot loading SQL in osmosis,
making it clearer for what should be the common use case.

Second half:

Cgimap's deployment situation is unusual. It's an important part of the
OSM stack and relied on by everyone who edits OSM, but it's probably the
least deployed of any part of the stack. It really only comes into play with
high-load API servers, so I'm not sure exactly who is running it. Unlike
osm2pgsql or the Rails port, it doesn't have well-developed install
instructions. It also requires loading data into an apidb, which is annoying
in itself.

It also has interesting wiki documentation, which I'll probably completely
rewrite in the course of GSoC. This half was a lot of false starts, trying
different things, and swearing at fcgi. I also did an install on a clean EC2
instance to check for undocumented dependencies, and found a few issues.

On the plus side, I now have cgimap running with lighttpd and a map? call is
successfully returning data from an extract I loaded up in an apidb. On the
minus side, 'make test' still fails even though the program is actually
working okay.
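
For anyone wanting to try the same check, a map? call is just an HTTP GET with
a bbox parameter. Below is a minimal Python sketch that builds such a URL,
assuming the standard API 0.6 path (/api/0.6/map?bbox=left,bottom,right,top).
The base URL and the function names are placeholders for illustration, not
anything from cgimap or my setup.

```python
from urllib.request import urlopen


def build_map_url(base_url: str, left: float, bottom: float,
                  right: float, top: float) -> str:
    """Build an API 0.6 map call URL for a bounding box given in degrees."""
    if not (-180 <= left < right <= 180):
        raise ValueError("need -180 <= left < right <= 180")
    if not (-90 <= bottom < top <= 90):
        raise ValueError("need -90 <= bottom < top <= 90")
    bbox = ",".join(str(v) for v in (left, bottom, right, top))
    return f"{base_url.rstrip('/')}/api/0.6/map?bbox={bbox}"


def fetch_map(base_url: str, left: float, bottom: float,
              right: float, top: float) -> bytes:
    """Fetch the OSM XML for the bounding box (requires a running server)."""
    with urlopen(build_map_url(base_url, left, bottom, right, top)) as resp:
        return resp.read()
```

For example, build_map_url("http://localhost", -123.2, 49.2, -123.1, 49.3)
gives a URL that can be pasted into curl or a browser. The server may still
reject the request if the box is larger than it is configured to allow.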

Upcoming week:

My first priority is to build with the other API calls (enable-experimental)
and get them working. This *should* be quick.

The next is to take writable_pgsql_selection and start on the simpler calls.
To do this I'll have to work through the code flow to see how to start this
without needing to rewrite everything before getting any results.

For non-coding work, I need to convert the problems I've found into GitHub
issues, pull requests or documentation fixes.
