[Tile-serving] [openstreetmap/osm2pgsql] Large values cause import failure when using int4 columns (#955)

Andy Allan notifications at github.com
Fri Sep 27 13:54:27 UTC 2019


I thought I'd give this a go, "how hard can it be", and so on. Turns out, not so easy, at least not easy for me.

My first attempt was to change the parsing pattern (and variables) to be `int32_t`, to match the size of integers that postgresql expects. In table.cpp:

```diff
     case COLUMN_TYPE_INT: {
         // For integers we take the first number, or the average if it's a-b
-        long from, to;
-        int items = sscanf(value.c_str(), "%ld-%ld", &from, &to);
+        int32_t from, to;
+        int items = sscanf(value.c_str(), "%d-%d", &from, &to);
         if (items == 1) {
             m_copy.add_column(from);
```
Unfortunately this shows a big problem, which is the behaviour of `sscanf` when parsing out-of-range numbers. In this case, a tag value of `"10000000000"` (ten billion) gets outputted into the database as `1410065408`! The c++ standard says, of sscanf,

> If this object does not have an appropriate type, _or if the result of the conversion cannot be represented in the object_, the behavior is undefined.

This is totally unhelpful! My compiler (g++) appears to output the value modulo UINT32_MAX. Which is totally useless, since we can't detect whether the output is valid or has been wrapped (in this case, wrapped around multiple times).

My second approach is to keep the parsing into long, and just restrict the output range:

```diff
         // For integers we take the first number, or the average if it's a-b
         long from, to;
         int items = sscanf(value.c_str(), "%ld-%ld", &from, &to);
-        if (items == 1) {
+        if (items == 1 && from <= std::numeric_limits<int32_t>::max() && from >= std::numeric_limits<int32_t>::min()) {
             m_copy.add_column(from);
```

This works surprisingly well. The out-of-range behaviour for `"%ld"` patterns is completely different from `"%d"` patterns! For a tag value like `"10000000000000000000"`, it outputs `9223372036854775807` i.e. it clamps to INT64_MAX, rather than doing any kind of modulo. That's much more useful.

However, it's still relying on undefined behaviour of sscanf, which makes me uncomfortable. A bit of reading around the internet suggests that there's no practical way of using sscanf without exposure to undefined behaviour while reading larger-than-expected numbers. 

https://stackoverflow.com/questions/28007600/detecting-integral-overflow-with-scanf

One suggestion is to limit the number of digits read, so for our example, we could limit the parsing to 18 digits. That's not a problem, since we're dropping anything larger than int32 anyway.

```diff
         // For integers we take the first number, or the average if it's a-b
-        long from, to;
-        int items = sscanf(value.c_str(), "%ld-%ld", &from, &to);
-        if (items == 1) {
+        int64_t from, to;
+        int items = sscanf(value.c_str(), "%18ld-%18ld", &from, &to);
+        if (items == 1 && from <= std::numeric_limits<int32_t>::max() && from >= std::numeric_limits<int32_t>::min()) {
             m_copy.add_column(from);
```
This has the same output as to my second option, but I think by limiting the pattern to `%18ld` we are avoiding the undefined behaviour. I also think we should change from `long` to `int64_t` to ensure this works and doesn't break somewhere where long is only 32 bits.

But maybe this is all barking up the wrong tree, and we should move to some kind of `strtol` based approach as mentioned in the above stackoverflow post. That would be beyond my capabilities!

If the `long` -> `int64_t` and `%ld` -> `%18ld` stuff is OK, then I can work up a pull request.


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openstreetmap/osm2pgsql/issues/955#issuecomment-535949338
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/tile-serving/attachments/20190927/c6351293/attachment-0001.html>


More information about the Tile-serving mailing list