[OSM-talk] Japan KSJ2 Import

Shu Higashi s_higash at mua.biglobe.ne.jp
Tue Jun 21 04:59:13 BST 2011


Hi Frederik,

I'm a member of the OpenStreetMap Japan community.

I'll ask the community members whether those tags are necessary.
Please give us some time.

It would be helpful for us
if there were some kind of norm or guidance for the source tag,
because we have not reached a clear conclusion on
how to use source tags when the original source data has been modified.

Shu Higashi

2011/6/21, Frederik Ramm <frederik at remote.org>:
> Hi,
>
>     is someone on this list involved in OSM in Japan? I'll go to talk-jp
> with the issue if not, but maybe the right people are reading this here
> also.
>
> I noticed that a lot of data has been imported from a "KSJ2" data set,
> and this data has many tags that I consider unnecessary.
>
> The whole import seems to comprise about 3.5 million nodes, 680k ways,
> and 9000 relations.
>
> 3.3 million nodes are tagged with something like
>
>      <tag k="KSJ2:coordinate" v="32.787857 130.687672"/>
>      <tag k="KSJ2:lat" v="32.787857"/>
>      <tag k="KSJ2:long" v="130.687672"/>
>
> which means that the node coordinates are stored three times - once in
> the node itself and twice in the tags.
>
> About 3.3 million objects are tagged with something like
>
>      <tag k="note" v="National-Land Numerical Information (Railway)
> 2007, MLIT Japan"/>
>      <tag k="note:ja" v="国土数値情報(鉄道データ)平成19年国土交通省"/>
>      <tag k="source" v="KSJ2"/>
>      <tag k="source_ref"
> v="http://nlftp.mlit.go.jp/ksj/jpgis/datalist/KsjTmplt-N02-v1_1.html"/>
>
> which is a lot of text where in my opinion a simple source tag on the
> changeset would have been sufficient. (The overwhelming majority of
> source_ref tags, 2.9 million, point to "KsjTmplt-N03.html", but another
> 17 are in use; the distribution for note:ja is similar, with two
> messages being used 1.8 and 1.0 million times respectively, and a
> handful of others in use.)
>
> 3.1 million nodes used by ways are tagged with something like
>
>      <tag k="KSJ2:curve_id" v="c00100298"/>
>      <tag k="KSJ2:filename" v="N03-090320_40_new.xml"/>
>
> which strikes me as a bit unnecessary as well; if really required, then
> that could go on the way using the nodes and not on every single node!
>
> In addition to that, we have 1.1 million objects tagged with
>
>      <tag k="created_by"
> v="National-Land-Numerical-Information_MLIT_Japan"/>
>
> - also something that we usually put on changesets, and that seems to
> duplicate information already in the source and note tags.
>
> There are also about 360k occurrences, on nodes used by ways, of the
> tags KSJ2:INT, KSJ2:INT_label, KSJ2:LIN, KSJ2:OPC, KSJ2:RAC; I have no
> idea what these are for but do they have to go on the nodes really?
>
> I would like to see this (in my opinion) superfluous information
> removed. We would get rid of about 30 million tags. The size of the
> Japan dataset (in XML form) would shrink by 13% from 13.1 to 11.5 GB,
> the .osm.pbf would shrink by 14% from 585 to 501 MB. About 1 GB of
> database storage would be saved on the central OSM database server.
>
> Needless to say, any software that processes the Japan dataset would
> also run faster and consume fewer resources.
>
> Can anybody comment on this? Are any of the tags that I mentioned above
> actually used by anyone for anything?
>
> In addition, there are 22 multipolygons from the same import, with more
> than 1000 members each (the top three being #1337942 with 10865 members,
> #1060553 with 5637, and #1069424 with 4518). While it is not wrong for a
> multipolygon to have so many members, this makes the affected areas very
> difficult to render and edit, and has the potential to bring
> unsuspecting relation processing software to a halt. Most of these
> multipolygons cannot even be downloaded via the API because it takes so
> long. I would like these multipolygons (all natural=wood I believe)
> split up into smaller entities.
>
> It would be great if someone involved with the Japan community could
> deal with these issues; but I would also be willing to do it myself if
> that's ok with the community in Japan.
>
> Finally, I am unsure if the KSJ2 import is even complete; if it is not,
> and still ongoing, then the numbers reported above might not even be the
> last word. In that case I would like to ask whoever is masterminding the
> import to maybe modify their scripts to include fewer superfluous tags.
> (Objects in question seem to be uploaded by a variety of users so I
> cannot detect from the object history alone who runs the import.)
>
> Bye
> Frederik
>
> --
> Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00'09" E008°23'33"
>
> _______________________________________________
> talk mailing list
> talk at openstreetmap.org
> http://lists.openstreetmap.org/listinfo/talk
>
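
For anyone who wants to check how redundant the coordinate tags quoted above are, here is a minimal sketch in Python. It is not a script used by the import or by anyone in this thread. It assumes the standard OSM XML node layout (lat/lon attributes plus <tag k v> children); the function name, the tolerance value, and the sample node id are made up for illustration. It removes KSJ2:coordinate, KSJ2:lat and KSJ2:long only when the tag values agree with the node's own position.

```python
import xml.etree.ElementTree as ET

# Keys described in the thread as duplicating the node's own coordinates.
REDUNDANT_KEYS = ("KSJ2:coordinate", "KSJ2:lat", "KSJ2:long")


def strip_redundant_coordinate_tags(node, tolerance=1e-7):
    """Remove KSJ2 coordinate tags from an OSM <node> element.

    Tags are removed only when KSJ2:lat and KSJ2:long agree with the
    node's lat/lon attributes (within `tolerance`, an assumed value).
    Returns the number of tags removed.
    """
    tags = {t.get("k"): t for t in node.findall("tag")}
    lat_tag = tags.get("KSJ2:lat")
    lon_tag = tags.get("KSJ2:long")
    if lat_tag is None or lon_tag is None:
        return 0

    lat_matches = abs(float(lat_tag.get("v")) - float(node.get("lat"))) <= tolerance
    lon_matches = abs(float(lon_tag.get("v")) - float(node.get("lon"))) <= tolerance
    if not (lat_matches and lon_matches):
        # The tags carry different information; leave the node untouched.
        return 0

    removed = 0
    for key in REDUNDANT_KEYS:
        if key in tags:
            node.remove(tags[key])
            removed += 1
    return removed


# Hypothetical sample node built from the tag values quoted in the thread.
SAMPLE = """<node id="1" lat="32.787857" lon="130.687672">
  <tag k="KSJ2:coordinate" v="32.787857 130.687672"/>
  <tag k="KSJ2:lat" v="32.787857"/>
  <tag k="KSJ2:long" v="130.687672"/>
  <tag k="source" v="KSJ2"/>
</node>"""

node = ET.fromstring(SAMPLE)
removed = strip_redundant_coordinate_tags(node)
remaining = [t.get("k") for t in node.findall("tag")]
```

Comparing before deleting is a deliberate choice: if a node has been moved by hand since the import, the tags no longer duplicate the position, and whether to keep them is a question for the community rather than for a script.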


