<HTML><BODY>Hello!<br><br>First, a follow-up on the upload order of huge changes spanning multiple<br>changesets. I submitted a JOSM enhancement request asking to look into it: [1]. Other<br>tools are able to reorder objects in changesets to make them more readable to<br>humans, so it should be doable to implement similar behavior in JOSM.<br>Apparently the JOSM developers consider the current upload strategy good enough,<br>so we will continue using the tools we currently have, with the logic they offer now.<br>We can also try to "upload more often", see below.<br><br>Now I want to present a reworked data processing scheme for<br>this import. Links to two small OSM data samples for your inspection<br>and feedback are given below.<br><br>=== Overview of the process<br><br>1. Split the import raster file into even smaller pieces of about 5×5 km. Processing<br>smaller pieces (previously the unit of work was a county, which can be<br>very large) allows, among other things, for faster loading into JOSM, smaller individual<br>changesets, and a lower risk of having to deal with huge multipolygons. It also creates<br>some new challenges, but they can be discussed later.<br><br>2. Create a raster mask layer for the import raster data to mask out areas where<br> data conflicts are certain or very likely to happen. 
Note that this<br> stage is done only once for the whole country: the resulting raster mask<br> layer is later split into smaller tiles to match the import data tiles.<br> How it is done:<br> 2.1 Download current vector OSM data from Geofabrik in the form of SHP files.<br> 2.2 Open it in QGIS.<br> 2.3 Choose area layers that have features we want to have in the mask layer:<br> buildings, landuse (and possibly pois).<br> To have shared boundaries between existing water and new landuse ways,<br> water shapes are not included in the mask, as the import does not<br> bring new water ways, and all intersections between new landcover and<br> existing water should be addressed at conflation (see below).<br> 2.4 Query for all tag variations and choose those that we do not want to<br> mask land cover, namely "military" and "nature_reserve", possibly others.<br> Delete such features from the mask layers.<br> 2.5 Merge these layers into a single layer and export it as a GeoTIFF raster<br> having the same projection, resolution and boundaries as the input data.<br> Now both raster files can be compared pixel by pixel.<br> 2.6 An overview of what the mask layer looks like: https://atakua.org/p/nmd/sweden-mask-layer-overview.png.<br> White areas are what we are after (if they are not water, of course).<br><br>3. Apply the mask layer to individual tiles of import data.<br> This is done by `gdal_polygonize.py` from the GDAL library.<br> All pixels that have a non-empty value in the mask layer are considered to be<br> null in the input layer, which prevents the vectorizing process from drawing<br> vectors through such areas.<br><br>4. Tag and filter the resulting vector data with scripts. Details are given in<br> the import plan. 
Compared with earlier iterations of the process,<br> smaller pieces of data are vectorized separately from each other.<br> Also unlike the earlier process, areas with already present OSM data are<br> already marked as missing at this point, which changes the conflation<br> process.<br><br>The data processing pipeline up to this point is depicted in the following diagram.<br>
<pre>
+----------------------+         +-----------------------------+
|                      |         |                             |
| NMD-raster image     |         | Geofabrik export data in SHP|
| (new data to import) |         | (data already in OSM)       |
+------------------+---+         +--------+--------------------+
                   |                      |
                   |                      |
                   |                      | gdal_rasterize
                   |                      |
                   |             +--------v-------------------+
                   |             |                            |
                   |             | OSM raster tile in TIFF    |
                   |             |                            |
                   |             +--------+-------------------+
                   |                      |
                   |                      | negate_raster.py
                   |                      |
                   |             +--------v--------------------+
                   |             |                             |
                   |             | Mask tile (empty/not empty) |
                   |             |                             |
                   |             +-+---------------------------+
                   |               |
                   v               v
                gdal_polygonize.py -mask
                           +
                           |
                           |
                 +---------v------------+
                 |                      |
                 | Vector data in GML   |
                 |                      |
                 +---------+------------+
                           |
                           | nmd-gml-to-osm.py
                           |
                 +---------v------------+         +-------------------------------+
                 |                      |         |                               |
                 | Vector data in OSM   |         | JOSM loaded actual data layer |
                 |                      |         |                               |
                 +---------+------------+         +------------------+------------+
                           |                                         |
                           +-----> Open in JOSM, merge layers, <-----+
                                    fix warnings and problems
                                                +
                                                v
                                        +-------+----+
                                        |            |
                                        | Changeset  |
                                        |            |
                                        +------------+
</pre>
<br>It is possible in principle to manually refine the resulting changeset up to the<br>point when it is good enough to be uploaded. However, we have to create a new<br>JOSM plugin/tool that can snap nodes of new ways to the nodes of existing ways. This is<br>needed to help with merging closely placed new and old ways on the boundaries<br>of the mask layer, as this is the bulk of the remaining work. 
The need for such<br>a plugin is showcased below.<br><br>=== Example data<br><br>To showcase this approach, two tiles of import data are provided that went through the<br>flow just described. Both tiles are about 9×12 km and are taken from two locations<br>in Katrineholm county. Links to the files: [2], [3].<br><br>The files have prefixes in their names that encode the tile position, and suffixes<br>describing their role.<br><br>* 0233-Katrineholms_XXX_XXX.tif - source raster data to be imported<br>* 0233-mask_XXX_XXX.tif - raster mask generated from the latest Geofabrik export<br>* 0233-Katrineholms_XXX_XXX-osm-export.osm - layer downloaded from OSM via JOSM.<br> You can download it yourself in your editor by the usual means.<br>* 0233-Katrineholms_XXX_XXX-masked.osm - vectorized import data meant for<br> uploading.<br>* 0233-Katrineholms_XXX_XXX-merged-result.osm - combination of the previous two<br> layers, the main file for your inspection and comments.<br>* 0233-Katrineholms_XXX_XXX-nomask.osm - a reference file showing what the import<br> data would look like if no masking was applied to it. You can ignore it.<br><br>Note how the `-masked.osm` and `-osm-export.osm` layers have mutually exclusive coverage<br>of the same tile. Where there is data in one layer, there is a hole in the other,<br>and vice versa. Notice also that for the tile 001_001,<br>`-masked.osm` and `-nomask.osm` are almost identical to each other because<br>there is no previously added conflicting data (the mask TIF is tiny),<br>while for 009_001 the `-masked.osm` layer is very small,<br>precisely because that tile is almost completely mapped by hand.<br><br>I will now talk about the `-merged-result.osm` files.<br>Please note that *no manual editing was done* to them. This is to showcase<br>the "vanilla" state of the data right after the scripts finished. 
It shows typical<br>remaining data quality issues that require manual/tool-assisted resolution by an<br>uploader working on a tile.<br><br>=== Known issues and call to action<br><br>One would expect certain classes of data quality problems in the import data<br>at this stage, which would need the attention of an uploader.<br>Below I list, in no particular order, the classes of problems known to me and<br>outline the expected ways to address them.<br><br>1. Overlapping adjacent land use areas, such as forest slipping into water.<br> They are fixed by the uploader using the upcoming "Snap to" tool.<br> Situations that such a tool cannot solve are due to too large a difference between a<br> previously mapped lake and a newly added forest. They are often an indication<br> that either the previously mapped water boundary is rather roughly outlined, or<br> that there is an error in the import data.<br> It is the goal of this import to have fewer than 1 "slipping" node per 1000<br> correctly snapped nodes on shared shore/forest lines. This should be<br> considered an acceptable signal-to-noise ratio for a rather minor error which,<br> when rare, is trivial to interpret and fix when visually discovered by an<br> uploader or future mappers.<br><br>2. Self-intersecting new ways, inner ways peeking out of outer ways, overlapping<br> new ways. These are caused in part by differences in the coordinate<br> precision used by external tools (11 digits after the dot) and OSM (7 digits).<br> Such issues are automatically detected by the JOSM validator and will be<br> addressed by individual uploaders. None of them should be present in a final<br> changeset.<br><br>3. New islands in lakes not being part of that lake's relation. This should be a rare<br> case for very small islets not previously mapped.<br> Strictly speaking it does not adhere to lake mapping recommendations<br> ("make a multipolygon with all islands as inner ways"). 
I do not consider<br> it to be a huge problem, as it is rare in practice (not many islets<br> are left unmapped). Having an islet mapped as a patch of forest inside a lake<br> has an unambiguous meaning for a future mapper.<br><br>4. New ways along roads, railroads and similarly shaped linear man-made objects<br> may look "jagged" and contain extra nodes. In certain situations this<br> might be considered acceptable, e.g. a cutline under a power line in a forest<br> never has straight borders as the forest tries to conquer back some territory.<br> In most cases it has to be corrected, e.g. when farmland runs along a motorway.<br> Such cases are to be assessed by individual uploaders and should be fixed using<br> the "Simplify Way" tool (recommended threshold value: 20), other filters or manually.<br><br>5. Tiny polygons of about 3-16 nodes appearing at the borders of larger areas.<br> Some of them make sense if you look at aerial imagery, but most are just<br> annoying noise. They can be filtered by the `filter-osm.py` script or other<br> automatic means before editing or uploading. Many of them cause JOSM<br> validator warnings and as such cannot slip through unnoticed.<br><br>6. Duplicated nodes (different classes detected by JOSM). This is partly an<br> artefact of the data conversion script, partly a coordinate precision loss issue.<br> It is automatically fixable inside JOSM with one button press and as such is<br> not worth paying too much attention to.<br><br>My call to action for this mailing list is to inspect the two OSM files I provided<br>for any other classes of inconsistencies that I overlooked. It makes sense to<br>think of quality issues in classes, as something happening systematically, not<br>just "I think this particular way should be curved a little to the left".<br>Something that repeats over and over can create a lot of trouble when scaled to<br>the size of the full import. Looking forward to your feedback.<br><br>References:<br><br>1. 
https://josm.openstreetmap.de/ticket/17664<br>2. https://atakua.org/p/nmd/showcase-masked/001_001.tar.bz2<br>3. https://atakua.org/p/nmd/showcase-masked/009_001.tar.bz2<br><br><br>With best regards,<br>Grigory Rechistov<br><br></BODY></HTML>