[OSM-dev] REXML speed again

David Sheldon dave-osm at earth.li
Sun Jun 18 13:17:25 BST 2006


I had a look at the code in SVN this morning and was quite surprised to
see that the GPX generating code that is used by trackpoints.rb (which
in turn is used quite a lot by JOSM) still used REXML.

In a lot of cases the speed here will be limited by the database, but
I'm sure we don't want to be wasting more CPU time on the web server
than we have to.

trackpoints.rb gets up to 5000 points out of the database, then sends
them to the user as a simple GPX file.

I wrote a test app that generated 5000 random points then generated the
XML. I then benchmarked this on the current code, the code re-written
using libxml, and a short bit of custom code that generates the XML by
string concatenation. The results were (for 5 runs of 5000 points):

               user     system      total        real
Old code:  16.050000   0.410000  16.460000 ( 17.939686)
New code:   1.590000   0.020000   1.610000 (  1.765993)
Custom  :   0.370000   0.000000   0.370000 (  0.411103)

As you can see, libxml is about 10 times faster than REXML, and creating
simple XML by hand is another 4 times faster. 

Hand-crafted XML is easy in this case as the only things that we are
putting in our XML are floats, and they will not need escaping. The
method used here is:

def makeGPX(points)
  ret = "<gpx xmlns='http://www.topografix.com/GPX/1/0/' creator='OpenStreetMap.org' version='1.0'><trk><trkseg>"

  points.each do |p|
    ret << "<trkpt lat='#{p.latitude}' lon='#{p.longitude}' />"
  end

  ret << "</gpx>"
end

This can then be called instead of the last three lines of
trackpoints.rb.

Alternatively I attach a patch to osm/gpx.rb that will use libxml.
trackpoints.rb appears to be the only code that uses gpx.rb, so nothing
else should be affected by this change.

I think that architecturally using libxml is better and more
maintainable, but the makeGPX method is faster, and speed is something
that users percieve us having problems with. I'll let someone else
decide which solution to go-live with.

David
-- 
"[Hackers] then only have to crack the password to take control"
  -- IT Week on a terrible Unix security flaw
-------------- next part --------------
Index: gpx.rb
===================================================================
--- gpx.rb	(revision 1078)
+++ gpx.rb	(working copy)
@@ -1,86 +1,82 @@
 module OSM
 
-  require 'rexml/document'
+  require 'xml/libxml'
 
-  #include REXML
-
   class Gpx
 
-    include REXML
+    include XML
   
     def initialize
-      @root = Element.new 'gpx'
-      @root.attributes['version'] = "1.0"
-      @root.attributes['xmlns'] = "http://www.topografix.com/GPX/1/0/"
-      @root.attributes['creator'] = "OpenStreetMap.org"
+      @doc = Document.new()
+      @doc.root = Node.new('gpx')
+      @doc.root['version'] = "1.0"
+      @doc.root['xmlns'] = "http://www.topografix.com/GPX/1/0/"
+      @doc.root['creator'] = "OpenStreetMap.org"
     end
 
     def addnode(node)
-      el1 = Element.new 'wpt' 
-      el1.attributes['lat'] = node.latitude
-      el1.attributes['lon'] = node.longitude
-      el1.text = node.uid.to_s
-      @root.add el1
+      el1 = Node.new('wpt')
+      el1['lat'] = node.latitude.to_s
+      el1['lon'] = node.longitude.to_s
+      el1 << node.uid.to_s
+      @doc.root << el1
     end
 
     def addtrk(trk)
-      el1 = Element.new 'trk'
-      el2 = Element.new 'trkseg'
+      el1 = Node.new('trk')
+      el2 = Node.new('trkseg')
 
       trk.each do |p|
-        el3 = Element.new 'trkpt'
-        el3.attributes['lat'] = p.latitude
-        el3.attributes['lon'] = p.longitude
-        el2.add el3
+        el3 = Node.new('trkpt')
+        el3['lat'] = p.latitude.to_s
+        el3['lon'] = p.longitude.to_s
+        el2 << el3
       end
 
-      el1.add el2
-      @root.add el1
+      el1 << el2
+      @doc.root << el1
       
     end
 
     def addline(line, node_a, node_b)
-      el1 = Element.new('trk')
-      el2 = Element.new('name')
-      el2.text = line.to_s
-      el1.add el2
+      el1 = Node.new('trk')
+      el2 = Node.new('name')
+      el2 << line.to_s
+      el1 << el2
 
-      el3 = Element.new('trkseg')
+      el3 = Node.new('trkseg')
       
-      el4 = Element.new('trkpt')
-      el4.attributes['lat'] = node_a.latitude
-      el4.attributes['lon'] = node_a.longitude
-      el5 = Element.new('name')
-      el5.text = node_a.uid.to_s
+      el4 = Node.new('trkpt')
+      el4['lat'] = node_a.latitude.to_s
+      el4['lon'] = node_a.longitude.to_s
+      el5 = Node.new('name')
+      el5 << node_a.uid.to_s
 
-      el4.add el5
-      el3.add el4
+      el4 << el5
+      el3 << el4
 
-      el6 = Element.new('trkpt')
-      el6.attributes['lat'] = node_b.latitude
-      el6.attributes['lon'] = node_b.longitude
-      el7 = Element.new('name')
-      el7.text = node_b.uid.to_s
+      el6 = Node.new('trkpt')
+      el6['lat'] = node_b.latitude.to_s
+      el6['lon'] = node_b.longitude.to_s
+      el7 = Node.new('name')
+      el7 << node_b.uid.to_s
 
-      el6.add el7
+      el6 << el7
 
-      el3.add el6
+      el3 << el6
 
-      el1.add el3
+      el1 << el3
 
-      @root.add el1
+      @doc.root << el1
 
     end
 
     def to_s
-      return @root.to_s
+      return @doc.to_s(false)
     end
 
     def to_s_pretty
-      hum = ''
-
-      @root.write(hum, 0)
-      return hum
+      return @doc.to_s
     end
 
   end


More information about the dev mailing list