There are a few issues that can skew the interpretation of your results.
* Memory usage
17MB is not unheard of simply to load the JVM ... What I would be interested
in is the delta in VM used .... that is,
load the JVM, run a very simple program, snapshot the VM used, then run
your spamprobe tests within the JVM and see where it goes.
Even so, garbage collection is designed such that, unless you really
force it otherwise, the JVM will allocate more memory instead of reclaiming it.
This can be a "feature" ... the Java JVM is not optimizing for minimum
memory used. Likely you're never hitting the GC at all until exit ...
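To make the "delta" idea concrete, here is a minimal sketch. It assumes you
are happy to measure heap usage from inside the JVM with Runtime, which is
not the same thing as the resident size the OS reports. The class name
MemoryDelta and the workload() method are hypothetical stand-ins for the
real parsing code.

```java
import java.util.ArrayList;
import java.util.List;

public class MemoryDelta {
    /** Heap bytes currently in use, as reported by the JVM itself. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Hypothetical stand-in for the real work (e.g. the parsing test). */
    static int workload() {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            tokens.add("token" + i);
        }
        return tokens.size();
    }

    public static void main(String[] args) {
        System.gc(); // only a hint; the JVM is free to ignore it
        long baseline = usedHeapBytes();

        int produced = workload();

        long after = usedHeapBytes();
        // The delta can even be negative if a collection ran in between.
        System.out.println("tokens produced: " + produced);
        System.out.println("baseline heap bytes: " + baseline);
        System.out.println("heap bytes after work: " + after);
        System.out.println("delta: " + (after - baseline));
    }
}
```

Comparing the baseline figure against the figure after the work gives a
rough idea of how much of the footprint belongs to the program rather than
to the JVM itself.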
* GC on exit.
The main theoretical advantage (IMHO) of a GC environment vs. say C or C++ is
that it doesn't free ... so you spend no time calling free()/delete.
However, the tests you ran, if you're running end-to-end (start, run, exit), in
fact DO call free at the end (or a final GC ...)
This is why App Servers are used (among many other reasons). You prime the
pump, run the app, and DO NOT include the exit/GC time as part of your
calculations. It would be interesting to simulate this, i.e. make a Java
main that runs some dummy stuff to 'prime the pump', then set your timer to
0, run the spamprobe, then report the time. THEN exit.
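Something along these lines is what I have in mind. It is only a sketch:
WarmTimer and parseOnce() are made-up names, and parseOnce() is a trivial
stand-in for the real spamprobe parse. The warm-up passes are run but not
timed, and the time is reported before the JVM exits.

```java
public class WarmTimer {
    /** Hypothetical stand-in for one parse pass: counts whitespace-separated tokens. */
    static long parseOnce(String text) {
        long tokens = 0;
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) {
                tokens++;
            }
        }
        return tokens;
    }

    /** Runs untimed warm-up passes, then returns nanoseconds spent in the timed passes. */
    static long timeWarm(String text, int warmups, int runs) {
        long sink = 0;
        for (int i = 0; i < warmups; i++) {
            sink += parseOnce(text); // 'prime the pump'
        }
        long start = System.nanoTime(); // timer set to 0 here
        for (int i = 0; i < runs; i++) {
            sink += parseOnce(text);
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            throw new IllegalStateException("unreachable; keeps the result in use");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String sample = "Subject: cheap pills buy now limited offer";
        long nanos = timeWarm(sample, 10_000, 10_000);
        System.out.println("timed passes took " + (nanos / 1_000_000.0) + " ms");
        // The JVM exits only after the time has been reported.
    }
}
```

The numbers you get this way leave out both startup and exit, so they only
answer the "how fast is the operation" question.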
Of course, is this a "fair" calculation? It all depends on what you want to
calculate. If you're calculating a usage that means start/run/exit, then no.
But if you're trying to calculate the time of the "operation", then maybe so
... the theory here is that GC can run when nobody cares ... (say between
events ... )
Anyway ... this is the logic I learned when taking a C# class last year ...
it makes some kind of perverted sense in the right usage pattern ...
The usage pattern of a JVM starting/running/exiting for short operations is
not a usage pattern that Java excels at ... even if it runs fast in between ...
This is why Java running in an app-server type environment is what is
typically used ... For spamprobe that would mean keeping a JVM alive for
long periods ... you might (and I mean *might* ... not *probably will*)
start to see some of the performance you're hoping for in that environment.
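As a rough illustration of the "keep the JVM alive" pattern, here is a
sketch of a worker that stays up and handles one request per input line, so
startup is paid once. ResidentWorker, handle() and serve() are hypothetical
names, and handle() just counts tokens in place of real spamprobe work.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

public class ResidentWorker {
    /** Hypothetical stand-in for scoring one message: counts its tokens. */
    static int handle(String request) {
        String trimmed = request.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Serves one request per line until end of input; returns how many were served. */
    static int serve(Reader in, Writer out) {
        BufferedReader reader = new BufferedReader(in);
        PrintWriter writer = new PrintWriter(out, true);
        int served = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                long start = System.nanoTime();
                int tokens = handle(line);
                long micros = (System.nanoTime() - start) / 1_000;
                writer.println(tokens + " tokens in " + micros + " us");
                served++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        writer.flush();
        return served;
    }

    public static void main(String[] args) {
        // One long-lived process; each line on stdin is a separate request.
        serve(new InputStreamReader(System.in), new OutputStreamWriter(System.out));
    }
}
```

Whether the later requests actually come out faster than the first ones is
exactly the thing you would have to measure.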
I *have seen* real Java programs running enterprise transaction systems,
running 500+ transactions/sec doing real work, faster and more scalable than
equivalent C programs ... but they take 30+ seconds to get up and going ...
I wouldn't be surprised if the next (N+1) version of Windows has a CLR in the
OS or something to "hide" the startup overhead so that the one-shot usage
model becomes efficient.
----- Original Message -----
From: "Brian Burton" <bburton@...>
To: "Spamprobe Users" <spamprobe-users@...>
Sent: Tuesday, November 15, 2005 9:20 PM
Subject: [Spamprobe-users] fun with Java
> Wow. I like to think of myself as a guy who isn't prone to naive
> flights of fancy. I'm a practical man. Sure, I program in Java for a
> living but I have a good understanding of the trade offs involved. Java
> has huge advantages in development time and security over C and C++.
> The price you pay for these (big) benefits is increased memory
> consumption and slower execution. When I wanted performance I'd switch
> to C++ for raw execution speed. I'd pay the price in increased
> development time, memory management debugging, etc. I knew all this,
> had experienced it in the past, and didn't think much more of it.
> Then last week I read about the release of GCJ 4.x and how much GCJ had
> improved since I last played around with it. I started to get really
> excited at the prospect of a completely open source Java execution
> environment that would run on just about all open source OS's. It
> brought back the grand dreams I'd had when I experimented with mono a
> year or so ago. After all I've been programming in C++ for years and
> like the language but, I have to admit, I enjoy Java development a whole
> lot more.
> So I decide to give GCJ a fair test. Over the last week or so I've been
> porting some of the spamprobe code over to Java. What I have is
> basically just the parsing code. The Java implementation uses the same
> well factored (IMHO) design as the C++ spamprobe implementation so there
> wouldn't be any question about differences in algorithms causing
> performance differences. This would be an excellent barometer for what
> to expect if I moved forward with a full Java implementation.
> Writing the Java port was a hoot. I was able to strip out all of the
> nasty C++ memory management code (yuck) since in Java the GC would take
> care of all of that for me. I could also write and run unit tests using
> junit to test each class as I went. I could use the excellent debugging
> and refactoring tools in IDEA to make the coding more fun. It was a lot
> of fun and I couldn't wait to get to the point where I had a program to
> Well, finally this evening I was able to do a side by side comparison.
> I wrote a program in Java that parses an mbox file and dumps out some of
> the headers for each message along with the tokens in the message in
> alphabetical order. There isn't any database access involved. Just
> straight parse and print. I then modified the spamprobe.cc file to add
> a new command that does exactly the same thing.
> I compiled the Java program using gcj to produce a native executable
> (using -O4). That gave me three different programs to test: spamprobe
> (native C++), a gcj compiled native Java program, and a bytecode Java
> program running inside of Sun's JDK (using hotspot).
> I downloaded a big mbox file containing a lot of spams to my laptop
> (yuck, will have to scrub the drive to remove the slime later) and
> proceeded to run all three programs, parsing the file multiple times
> per run and dumping to /dev/null.
> Here is the summary (average over several runs):
> Program              Seconds   Res. MB
> spamprobe                 49   1.8-2.4
> native gcj program       124   17.0
> hotspot program          202   40-400
> I had a lot of trouble running under hotspot. That's really weird. I
> got OutOfMemory errors from the VM until I ran it with -Xmx512m (max
> heap size 512 MB!). That's got to be a bug in the VM. I'll have to
> investigate it more.
> So the native C++ program came in at approx. 2.5x faster and used
> approx. 1/10 the resident memory size of the native java program. Wow.
> I was really hoping that a natively compiled Java program would be
> around 80-90% the performance of a C++ program. Not even close.
> The memory usage difference floored me. I spent a LOT of time on the
> memory management code in spamprobe so I knew it was pretty tight.
> Still, I didn't expect the Boehm GC to let the heap grow quite as big as
> it did.
> I'll continue to play with this (it's fun) and see if there are knobs I
> can turn or buttons I can push to change gcj's optimizations to get
> better performance.
> I just wanted to share this with the list in case anyone found it
> All the best,