[josm-dev] Jumbo Patch

Petr Nejedly Petr.Nejedly at Sun.COM
Mon Dec 17 23:49:50 GMT 2007


Gabriel Ebner napsal(a):
> On Mon, Dec 17, 2007 at 11:22:54PM +0100, Petr Nejedly wrote:
>> Frederik Ramm napsal(a):
>>> On the other hand, if we re-write anyway, then we might just ditch
>>> Java altogether and do it in C++, or maybe simply use Merkaator as a
>>> start (don't know how well it is designed on the inside but it is a
>>> fast C++ app that compiles easily on Win, Mac and Linux which is
>>> probably enough for us). If people are really concerned about memory
>>> usage and such (as in "you can't keep a list of ways with every node
>>> as even the empty list will consume 80 bytes") there's just no beating
>>> good old C++.
>> Really? Oh come on, you need to know your libraries, regardless of the language.
>> Smallest possible growing list implementation would consume 32B (for single-slot,
>> but still with the possibility of growing transparently) - one object with two
>> fields and an object array.
> 
> I think we can get this down to 24 bytes (2 words for the VM[1], one to the
> head element) by using a singly-linked list and representing nil as null.
> 
> [1] Are there any recent/accurate informations on the object layout of Sun's
> VM?  The best I've found is this:
> http://java.sun.com/developer/technicalArticles/Networking/HotSpot/
> But it's almost 10 years old.

The information there is right.
"new Object()" is 8B (assuming 32bit JVM), as is any stateless instance.
Object[0] and Object[1] are both 16B, because HotSpot has 8B object granularity.

If the size of the backref collection was real concern (i.e. we would agree the
info belongs there, which it doesn't, and it would really end up on all node instances),
we could still go for:

private Object backRefs;

if (backRefs == null) {
     // case for no way referencing this node
     ...
} else if (backRefs instanceof Way) {
     Way theOnlyOne = (Way)backRefs;
     ...
} else {
     assert backRefs instanceof Way[];
     Collection<Way> ways = Arrays.asList((Way[])backRefs);
}

4B for 0-1 ways, 24B for 2-3 ways

>> inline one of them into the other data structure (Node in this case) thus saving
>> one object header, but the fields will still be there. And in both cases, you can
>> take the path of reused field and 4B/singleslot for typical case. At least as long
>> as RTTI works for you in your C++ implementation.
> 
> You don't even need RTTI as you don't usually pass around lists as pointers of
> unknown type.
> 
> I think the biggest saving in moving to C++ would come from being able to use
> non-reference composite types. 
Sure, this is the biggest weakness of Java as I see it. Smart API design can overcome it though.

> Say we have this code now:
> 
>   public class Node {
>     public Coordinate coord;
>   }
> 
>   public class Coordinate {
>     public double lat, lon;
>   }
> 
> A Node instance takes 56 bytes now: 24 bytes for Node, 32 bytes for
> Coordinate.

No, the Node alone chimes in at 48B and there are two Coordinates (before/after projection),
each consuming 24B.
If the merging code ever gets in the way and parse all the timestamps
(which it used to do a month ago, hopefully got fixed by latest improvement
in the MergeVisitor), you'll get another Date instance.
But in fact, the biggest consumer of the heap in DataSet are tons and tons
of duplicated strings. I have a patch to post later that deals with this
(260MB->160MB for czechia.osm's 500.000 Nodes)

> By not using a reference type we could save 24 bytes here (one pointer + one
> header).
12B only (do you count with 8B words or what?)

But imagine this:

public interface Coordinate {
     public double getLatitude();
     public double getLongitude();
}

public class Node implements Coordinate {
     private double lat, lon;
}

Of course we can afford that only because Coordinate has no logic attached,
otherwise we'd need to duplicate the logic in Node.

-- 
Petr "Nenik" Nejedly, NetBeans/Sun Microsystems, http://www.netbeans.org
355/113 -- Not the famous irrational number PI, but an incredible simulation!




More information about the josm-dev mailing list