[osmosis-dev] Single node ways

Toby Murray toby.murray at gmail.com
Thu Nov 15 08:05:27 GMT 2012


On Wed, Nov 14, 2012 at 9:24 AM, Shaun McDonald
<shaun at shaunmcdonald.me.uk> wrote:
>
> On 14 Nov 2012, at 11:48, Paweł Paprota <ppawel at fastmail.fm> wrote:
>
>> On 11/14/2012 12:33 PM, Brett Henderson wrote:
>>>
>>> It sounds good to me.  Osmosis typically tries to maintain data accuracy
>>> with no surprises, so I'm not particularly happy with the current
>>> situation of dropping ways even if they are invalid.
>>>
>>> Jochen Topf was the one who originally introduced the checks in
>>> WayGeometryBuilder to ensure a Way contained at least two nodes.  He
>>> might have some thoughts on whether we can remove the checks.  Perhaps
>>> it was simply introduced to avoid the additional overheads of having to
>>> do st_isvalid() checks?
>>>
>>
>> Based on my experience with processing geometry for OSM objects I'd strongly discourage having any invalid geometries in the database. This leads to very unpleasant surprises with ST_Union, ST_Intersection and other spatial functions. Upgrading the GEOS library (which PostGIS uses) helps a bit but still many operations can behave very strangely and after hours/days of debugging you find yourself hitting the "invalid geometry" wall.
>>
>> Whether Osmosis should be responsible for maintaining valid geometries is kind of a different question I think - depends on policy. But whatever you decide, it needs to be communicated front and center in documentation what geometry is created.
>
> The problem arises if you are using Osmosis to shunt the data into a database that you use to find and highlight these invalid geometries for the community to go and fix in the source data.
>
> I think that Osmosis could have a filter to drop invalid data, or even the inverse of only outputting the invalid data.

Yes, that is how I discovered this "feature" in the first place. I was
generating a list of single node ways from my pgsnapshot database for
someone who wanted to fix them. There didn't seem to be as many as I
thought there should be. When I went looking I noticed that the only
ones in my database are from after I started minutely replication.

Which brings me back to the point that invalid geometries already
exist in the database. Pawel's point about this causing weirdness with
some of the PostGIS functions is still something to consider, though.
While some invalid geometries are already in the database, taking out
these checks would increase their number on a fresh planet import by
quite a bit. Like, an order of magnitude or two.

Is it possible to check the validity of a linestring in Java? I see
the LineString class has a checkConsistency method; however, it is
returning false for all linestrings, even valid ones. I'm not seeing
another obvious method. If this were possible, I would suggest adding
an option to the write-pgsql(-dump) tasks to control this behavior.
Something like includeInvalidLinestrings=yes/no, which would allow the
user to choose. Dropping invalid linestrings this way would also
remove the multi-node-at-same-location ways from the database.
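
For illustration, a check along these lines could be done on the raw
node coordinates without any geometry library. This is only a sketch:
WayLineCheck and hasTwoDistinctPoints are made-up names, not existing
Osmosis or PostGIS JDBC APIs, and the "at least two distinct points"
rule is an assumed approximation of what a full validity check would
report for linestrings. Zero-node ways fail this check too, so they
would need separate handling if they are to be kept.

```java
// Hypothetical helper, not part of Osmosis or the PostGIS JDBC classes.
final class WayLineCheck {
    private WayLineCheck() { }

    /**
     * Returns true when the coordinate list contains at least two points
     * that differ in longitude or latitude.
     * Each element of coords is a {lon, lat} pair.
     */
    static boolean hasTwoDistinctPoints(double[][] coords) {
        if (coords == null || coords.length < 2) {
            return false; // zero-node and single-node ways
        }
        double[] first = coords[0];
        for (int i = 1; i < coords.length; i++) {
            // Any point that differs from the first gives two distinct points.
            if (coords[i][0] != first[0] || coords[i][1] != first[1]) {
                return true;
            }
        }
        return false; // every node sits at the same location
    }
}
```

A way with nodes at (0, 0) and (1, 1) passes; a single-node way, or a
way whose nodes all sit at (5, 5), does not.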

Even if checking for validity is not possible, the option could still
be added. Maybe "avoidInvalidLinestrings"? If "no" then shove
everything in. Otherwise keep the current behavior and drop single
node ways to minimize invalid linestrings. This would also partially
address Pawel's concern about calling this issue to the user's
attention since the option along with a description would be listed in
the detailed usage.
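
To make the proposed semantics concrete, here is a rough sketch of the
decision the option would control. LinestringPolicy and shouldWriteWay
are hypothetical names, and avoidInvalidLinestrings is only the option
suggested above, not something Osmosis has today. The "yes" branch
mirrors the current node-count check, which drops zero-node ways along
with single-node ways.

```java
// Hypothetical sketch of the proposed option; none of these names exist in Osmosis.
final class LinestringPolicy {
    private LinestringPolicy() { }

    /**
     * Decides whether a way with the given number of nodes is written out.
     *
     * @param nodeCount               number of node references in the way
     * @param avoidInvalidLinestrings the proposed option: true for "yes", false for "no"
     */
    static boolean shouldWriteWay(int nodeCount, boolean avoidInvalidLinestrings) {
        if (!avoidInvalidLinestrings) {
            return true; // "no": shove everything in
        }
        // "yes": keep the current behavior and drop ways with fewer than two nodes.
        return nodeCount >= 2;
    }
}
```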

And what about zero node ways? As I mentioned before, technically
these appear to be valid and are assigned a static "empty geometry"
value (not null). Right now they are being excluded along with single
node ways. Should they be included regardless of what happens with
single node ways?

Toby


