[Tile-serving] [openstreetmap/osm2pgsql] Gazetteer: move main tag filtering into processing function (#1018)

Jochen Topf notifications at github.com
Fri Dec 6 10:14:00 UTC 2019


joto requested changes on this pull request.

This is a more superficial review as I don't understand the intricacies of what you are doing here.

>  
-    for (auto const &item : tags) {
-        char const *k = item.key();
-        if (prefix.compare(0, plen, k) == 0 &&
-            (k[plen] == '\0' || k[plen] == ':')) {
-            ret.push_back(&item);
+    char const *operator ()(osmium::Tag const &t) const noexcept
+    {
+        if (strncmp(t.key(), m_domain, m_len) == 0

We should get into the habit of properly using `std::strncmp` etc. Strictly speaking this gives you the C++ standard version of that function whil `strncmp` is the old C function. This might not matter for `strncmp`, but it does matter in some cases, for instance `std::abs()` vs. `abs()`, because `std::abs()` has overloads for different types, but `abs()` hasn't.

Same thing above with `strlen`, other cases of `strncmp` etc.

> -                if (!postcode) {
-                    postcode = v;
-                }
-            } else if (strcmp(addr_key, "country") == 0) {
-                if (!country && strlen(v) == 2) {
-                    country = v;
-                }
-            } else {
-                bool first = std::none_of(
-                    m_address.begin(), m_address.end(), [&](ptag_t const &t) {
-                        return strcmp(t.first, addr_key) == 0;
-                    });
-                if (first) {
-                    m_address.emplace_back(addr_key, v);
-                }
+            bool first = std::none_of(

`bool const`?

>          }
+
+        return nullptr;

The `DomainMatcher` class is sufficiently complex that it should have some tests.

>  {
-    std::vector<osmium::Tag const *> domain_name;
-    if (std::get<2>(tag) & SF_MAIN_NAMED_KEY) {
-        domain_name = domain_names(std::get<0>(tag), o.tags());
-        if (domain_name.empty()) {
-            return false;
-        }
-    }
+    // first throw away unnamed mains
+    auto mend = std::remove_if(m_main.begin(), m_main.end(), [&](pmaintag_t const &t) {
+            auto flags = std::get<2>(t);

`auto const`? More like these below, I will not mention it again.

>              return false;
-        }
+            });
+
+    // any non-fallback mains left?
+    bool has_primary =
+        std::any_of(m_main.begin(), mend, [](pmaintag_t const &t) {
+            return !(std::get<2>(t) & SF_MAIN_FALLBACK);
+            });
+
+    if (has_primary) {
+        // remove all fallbacks
+        mend = std::remove_if(m_main.begin(), mend, [&](pmaintag_t const &t) {

Should probably not reuse `mend` here.

> +{
+    for (auto const &tag : m_main) {
+        buffer.new_line();
+        // osm_id
+        buffer.add_column(o.id());
+        // osm_type
+        char const osm_type[2] = {
+            (char)toupper(osmium::item_type_to_char(o.type())), '\0'};
+        buffer.add_column(osm_type);
+        // class
+        buffer.add_column(std::get<0>(tag));
+        // type
+        buffer.add_column(std::get<1>(tag));
+        // names
+        if (std::get<2>(tag) & SF_MAIN_NAMED_KEY) {
+            DomainMatcher m(std::get<0>(tag));

`m{...}`

>  
-        if (first) {
+            if (first) {
+                buffer.add_null_column();
+            } else {
+                buffer.finish_hash();

Rather complex code to get the `new_hash/finish_hash` pair right. Can we either
1. just use an empty hash instead of a null (which would greatly simplify this code) or
2. wrap this somehow in a class?


> -    // admin_level
-    buffer.add_column(m_admin_level);
-    // address
-    if (m_address.empty()) {
-        buffer.add_null_column();
-    } else {
-        buffer.new_hash();
-        for (auto const &a : m_address) {
-            if (strcmp(a.first, "tiger:county") == 0) {
-                std::string term;
-                auto *end = strchr(a.second, ',');
-                if (end) {
-                    auto len = (std::string::size_type)(end - a.second);
-                    term = std::string(a.second, len);
+            buffer.new_hash();
+            for (auto const &a : m_address) {

One-letter variable names are so 80s :-)

> -    buffer.add_column(m_admin_level);
-    // address
-    if (m_address.empty()) {
-        buffer.add_null_column();
-    } else {
-        buffer.new_hash();
-        for (auto const &a : m_address) {
-            if (strcmp(a.first, "tiger:county") == 0) {
-                std::string term;
-                auto *end = strchr(a.second, ',');
-                if (end) {
-                    auto len = (std::string::size_type)(end - a.second);
-                    term = std::string(a.second, len);
+            buffer.new_hash();
+            for (auto const &a : m_address) {
+                if (strcmp(a.first, "tiger:county") == 0) {

This code is complex enough for an extra function with tests.

> @@ -53,6 +53,22 @@ class db_deleter_place_t
     std::vector<item_t> m_deletables;
 };
 
+class gazetteer_copy_mgr_t : public db_copy_mgr_t<db_deleter_place_t>
+{
+public:
+    gazetteer_copy_mgr_t(std::shared_ptr<db_copy_thread_t> const &processor)
+    : db_copy_mgr_t<db_deleter_place_t>(processor),
+      m_table(std::make_shared<db_target_descr_t>("place", "place_id"))
+    {}
+
+    using db_copy_mgr_t<db_deleter_place_t>::new_line;
+
+    void new_line() { new_line(m_table); }

Maybe this is a bit too "clever" with the `using` and the overload?

> +    CHECK(1 == node_count(conn, 201, "building"));
+    CHECK(0 == node_count(conn, 202, "building"));
+}
+
+TEST_CASE("Main tag deleted")
+{
+    import("n1 Tamenity=restaurant x12.3 y3\n"
+           "n2 Thighway=bus_stop,railway=stop,name=X x56.4 y-4\n");
+
+    auto conn = db.connect();
+
+    CHECK(1 == node_count(conn, 1, "amenity"));
+    CHECK(1 == node_count(conn, 2, "highway"));
+    CHECK(1 == node_count(conn, 2, "railway"));
+
+    update("n1 Tatiy=restaurant x12.3 y3\n"

`atiy` ? Maybe something like `not_a=restaurant` or so to make clearer what's happening here?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openstreetmap/osm2pgsql/pull/1018#pullrequestreview-328079461
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/tile-serving/attachments/20191206/a2f9676b/attachment-0001.html>


More information about the Tile-serving mailing list