Archive for the ‘ Java ’ Category

32-bit or 64-bit JVM? How about a Hybrid?

Before x86-64 came along, the decision on whether to use 32-bit or 64-bit mode for architectures that supported both was relatively simple: use 64-bit mode if the application requires the larger address space, 32-bit mode otherwise. After all, no point in reducing the amount of data that fits into the processor cache while increasing memory usage and bandwidth if the application doesn’t need the extra addressing space.

When it comes to x86-64, however, there’s also the fact that the number of named general-purpose registers has doubled from 8 to 16 in 64-bit mode. For CPU intensive apps, this may mean better performance at the cost of extra memory usage. On the other hand, for memory intensive apps 32-bit mode might be better if you manage to fit your application within the address space provided. Wouldn’t it be nice if there was a single JVM that would cover the common cases?

It turns out that the HotSpot engineers have been working on doing just that through a feature called Compressed oops. The benefits:

  • Heaps up to 32GB (instead of the theoretical 4GB in 32-bit that in practice is closer to 3GB)
  • 64-bit mode so we get to use the extra registers
  • Managed pointers (including Java references) are 32-bit so we don’t waste memory or cache space

The main disadvantage is that encoding and decoding are required to translate to and from native addresses. HotSpot tries to avoid these operations as much as possible and they are relatively cheap. The hope is that the extra registers give enough of a boost to offset the extra cost introduced by the encoding/decoding.
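To make the encoding/decoding step concrete, here is a minimal sketch of how a 32-bit reference can cover a 32GB heap. It assumes objects are aligned on 8-byte boundaries, so the low three bits of every heap offset are zero and can be shifted away. The class name, the `heapBase` parameter and the shift constant are illustrative assumptions, not HotSpot source code.

```java
// Illustrative sketch only: the names and the 8-byte alignment are assumptions.
final class CompressedOopSketch {
    // log2 of the assumed 8-byte object alignment
    static final int SHIFT = 3;

    // Encode a native address as a 32-bit offset from the heap base.
    static int encode(long heapBase, long address) {
        return (int) ((address - heapBase) >>> SHIFT);
    }

    // Decode a 32-bit value back to a native address (treat the int as unsigned).
    static long decode(long heapBase, int narrow) {
        return heapBase + ((narrow & 0xFFFFFFFFL) << SHIFT);
    }

    public static void main(String[] args) {
        long heapBase = 0x1_0000_0000L;                      // hypothetical heap start
        long address = heapBase + 24L * 1024 * 1024 * 1024;  // an object 24GB into the heap
        int narrow = encode(heapBase, address);
        System.out.println(decode(heapBase, narrow) == address);                 // prints "true"
        System.out.println(((1L << 32) << SHIFT) / (1L << 30) + "GB addressable"); // prints "32GB addressable"
    }
}
```

With 2^32 possible values and 8 bytes per step, the reachable range is 2^35 bytes, which matches the 32GB limit listed above.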

Compressed Oops have been included (but disabled by default) in the performance release JDK6u6p (requires you to fill in a survey), so I decided to try it in an internal application and compare it with 64-bit mode and 32-bit mode.

The tested application has two phases, a single-threaded one followed by a multi-threaded one. Both phases do a large amount of allocation so memory bandwidth is very important. All tests were done on a dual quad-core Xeon 5400 series with 10GB of RAM. I should note that a different JDK version had to be used for 32-bit mode (JDK6u10rc2) because there is no Linux x86 build of JDK6u6p. I chose the largest heap size that would allow the 32-bit JVM to run the benchmark to completion without crashing.
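The results that follow report the used heap after a full GC, but the post does not say how that figure was collected. One way to read a comparable number from inside the JVM is sketched below, using the standard `java.lang.management` API. The class and method names are hypothetical, and `System.gc()` is only a request that the JVM may ignore (for example with `-XX:+DisableExplicitGC`).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not the measurement method used in the post.
final class HeapAfterGcSketch {
    // Request a collection, then read the used heap size in bytes.
    static long usedHeapAfterGcBytes() {
        System.gc(); // a hint only; the JVM is free to ignore it
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) {
        long usedMb = usedHeapAfterGcBytes() / (1024 * 1024);
        System.out.println("Used heap after requested GC: " + usedMb + "MB");
    }
}
```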

I started by running the application with a smaller dataset:

JDK6u10rc2 32-bit
Single-threaded phase: 6298ms
Multi-threaded phase (8 threads on 8 cores): 17043ms
Used Heap after full GC: 430MB
JVM Args: -XX:MaxPermSize=256m -Xms3328m -Xmx3328m -server -XX:+UseConcMarkSweepGC

JDK6u6p 64-bit with Compressed Oops
Single-threaded phase: 6345ms
Multi-threaded phase (8 threads on 8 cores): 16348ms
Used Heap after full GC: 500MB
JVM Args: -XX:MaxPermSize=256m -Xms3328m -Xmx3328m -server -XX:+UseConcMarkSweepGC -XX:+UseCompressedOops

The performance numbers are similar and the memory usage of the 64-bit JVM with Compressed Oops is 16% larger.

JDK6u6p 64-bit
Single-threaded phase: 6463ms
Multi-threaded phase (8 threads on 8 cores): 18778ms
Used Heap after full GC: 700MB
JVM Args: -XX:MaxPermSize=256m -Xms3328m -Xmx3328m -server -XX:+UseConcMarkSweepGC

The performance is again similar, but the memory usage of the 64-bit JVM is much higher, over 60% higher than the 32-bit JVM one.
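The two overhead figures follow directly from the heap sizes reported for the smaller dataset. A short sketch of the arithmetic, taking the 32-bit result as the baseline (the class and method names are illustrative):

```java
import java.util.Locale;

// Recomputes the heap overhead percentages from the figures reported above.
final class HeapOverheadSketch {
    // Relative increase of measured over baseline, in percent.
    static double overheadPercent(double baselineMb, double measuredMb) {
        return (measuredMb - baselineMb) / baselineMb * 100.0;
    }

    // Locale.ROOT keeps the decimal separator a dot regardless of system locale.
    static String format(double percent) {
        return String.format(Locale.ROOT, "%.1f%%", percent);
    }

    public static void main(String[] args) {
        double baseline32Bit = 430; // MB, 32-bit JVM
        System.out.println("Compressed Oops: " + format(overheadPercent(baseline32Bit, 500))); // 16.3%
        System.out.println("Plain 64-bit: " + format(overheadPercent(baseline32Bit, 700)));    // 62.8%
    }
}
```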

Let’s try the larger dataset now:

JDK6u10rc2 32-bit
Single-threaded phase: 14188ms
Multi-threaded phase (8 threads on 8 cores): 73451ms
Used Heap after full GC: 1.25GB
JVM Args: -XX:MaxPermSize=256m -Xms3328m -Xmx3328m -server -XX:+UseConcMarkSweepGC

JDK6u6p 64-bit with Compressed Oops
Single-threaded phase: 13742ms
Multi-threaded phase (8 threads on 8 cores): 76664ms
Used Heap after full GC: 1.45GB
JVM Args: -XX:MaxPermSize=256m -Xms3328m -Xmx3328m -server -XX:+UseConcMarkSweepGC -XX:+UseCompressedOops

The performance difference and memory overhead are similar to those seen with the smaller dataset. The benefit of Compressed Oops here is that we still have plenty of headroom while the 32-bit JVM is getting closer to its limits. This may not be apparent from the heap size after a full GC, but during the multi-threaded phase the peak memory usage is quite a bit larger and the fact that the allocation rate is high does not help. This becomes more obvious when we look at the results for the 64-bit JVM.

JDK6u6p 64-bit
Single-threaded phase: 14610ms
Multi-threaded phase (8 threads on 8 cores): 104992ms
Used Heap after full GC: 2GB
JVM Args: -XX:MaxPermSize=256m -Xms4224m -Xmx4224m -server -XX:+UseConcMarkSweepGC

I had to increase the Xms/Xmx to 4224m for the application to run to completion. Even so, the multi-threaded phase took a substantial performance hit compared to the other two JVM configurations: it was roughly 37% slower than with Compressed Oops and 43% slower than the 32-bit JVM. All in all, the 64-bit JVM without compressed oops does not do well here.

In conclusion, it seems that compressed oops is a feature with a lot of promise and it allows the 64-bit JVM to be competitive even in cases that favour the 32-bit JVM. It might be interesting to test applications with different characteristics to compare the results. It’s also worth mentioning that since this is a new feature, it’s possible that performance will improve further before it’s integrated into the normal JDK releases. As it is though, it already hits a sweet spot and if it weren’t for the potential for instability, I would be ready to ditch my 32-bit JVM.

Update: The early access release of JDK 6 Update 14 also contains this feature.
Update 2: This feature is enabled by default since JDK 6 Update 23.
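Since the default depends on the JDK version, it can be useful to check what a running JVM is actually doing. The sketch below reads the `UseCompressedOops` flag through `HotSpotDiagnosticMXBean`. It assumes a HotSpot-based JVM on Java 7 or later (for `getPlatformMXBean`), and the class and method names are hypothetical. On a JVM that does not know the flag, it falls back to "unknown".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// HotSpot-specific sketch; other JVMs may not provide this MXBean at all.
final class CompressedOopsCheck {
    // Returns "true", "false", or "unknown" if the flag cannot be read.
    static String compressedOopsSetting() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unknown";
            }
            return bean.getVMOption("UseCompressedOops").getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the VM has no such option (for example a 32-bit JVM).
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseCompressedOops = " + compressedOopsSetting());
    }
}
```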

via 32-bit or 64-bit JVM? How about a Hybrid? « Ismael Juma.

Apache Tomcat 6.0 – Clustering/Session Replication HOW-TO

In this release of session replication, Tomcat can perform an all-to-all replication of session state using the DeltaManager or perform backup replication to only one node using the BackupManager. The all-to-all replication is an algorithm that is only efficient when the clusters are small. For larger clusters, use primary-secondary session replication, in which the session is stored on only one backup server, by setting up the BackupManager.
Currently you can use the domain worker attribute (mod_jk > 1.2.8) to build cluster partitions with the potential of having a more scalable cluster solution with the DeltaManager (you’ll need to configure the domain interceptor for this). In order to keep the network traffic down in an all-to-all environment, you can split your cluster into smaller groups. This can be easily achieved by using different multicast addresses for the different groups. A very simple setup would look like this:

via Apache Tomcat 6.0 – Clustering/Session Replication HOW-TO.

HowTo – Tomcat Wiki

How do I obtain a thread dump of my running webapp?

You can only get a thread dump of the entire JVM, not just your webapp. This shouldn’t be a big deal, but should be made clear: you are getting a dump of all JVM threads, not just those “for your application”, whatever that means.

Getting a thread dump depends a lot on your environment. Please choose the section below that matches your environment best. The more universal and convenient options are presented first, while the more difficult ones or those for specific setups are provided later. Generally, you should start at the top of the list and work your way down until you find a technique that works for you.

If you are running Sun JDK 1.6 or higher

Sun’s JDK (not the JRE) ships with a program called jstack (or jstack.exe on Microsoft Windows) which will give you a thread dump on standard output. Pipe the output into a file and you have your thread dump. You will need the process id (“pid”) of the process to dump. Use of the program jps (jps.exe on Microsoft Windows) can help you determine the pid of a specific Java process.
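jps finds the pid from outside the process. If it is more convenient for the application to log its own pid at startup, a minimal sketch follows. It assumes Java 9 or later, where `ProcessHandle` is available; that API does not exist on the JDK 1.6 mentioned above. The class and method names are hypothetical.

```java
// Assumes Java 9+; ProcessHandle is not available on older JDKs.
final class PidSketch {
    // Returns the operating-system process id of the current JVM.
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        // The printed value is what jstack expects as its pid argument.
        System.out.println("JVM pid: " + currentPid());
    }
}
```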

If you are on *NIX running Sun JDK

Sun provides jstack on *nix systems from version 1.4 onward. See the above tip if you have such an environment.

If you are running on *NIX

Send a SIGQUIT to the process. The thread dump will be sent to stdout which is likely to be redirected to CATALINA_BASE/logs/catalina.out.

To send a SIGQUIT, use kill -3 <pid> from the command line.

If you are running Tomcat as a service on Microsoft Windows

Edit your service to add the “//MS//” option to the command line. This enables the “Monitor Service”, which puts an icon in the system tray while Tomcat is running. Right-clicking the Tomcat monitor in the system tray allows you to produce a thread dump on stdout.

If you have Tomcat running in a console

*NIX: press CTRL-\

Microsoft Windows: press CTRL-BREAK

This will produce a thread dump on standard output, but it may not be possible to capture it to a file.
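All of the options above trigger the dump from outside the application. As an additional option that the wiki entry does not cover, the JVM can also list its own threads through the standard `Thread.getAllStackTraces()` call. The sketch below uses a hypothetical class name, and its output format is simplified compared with what jstack prints. Like the other techniques, it covers every thread in the JVM, not just those of one webapp.

```java
import java.util.Map;

// In-process alternative; output is simplified compared with jstack.
final class ThreadDumpSketch {
    // Builds a text dump of every live thread in this JVM.
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            out.append('"').append(t.getName()).append('"')
               .append(" daemon=").append(t.isDaemon())
               .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Because the result is an ordinary string, it can be written to a log file or returned from a diagnostic page, which avoids the problem of capturing console output.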

How do I use Hibernate and database connection pooling with Tomcat?

See TomcatHibernate

How do I set up Tomcat virtual hosts in a development environment?

See TomcatDevelopmentVirtualHosts

via HowTo – Tomcat Wiki.