RR's Random Ramblings: Java Tuning in a Nutshell

Monday, March 12, 2012

Java Tuning in a Nutshell - Part 1

While delivering a training recently, I got a request to put together a JVM tuning cheat sheet. Given the 50+ parameters available on the Sun HotSpot JVM, this request is understandable. The diagram below is what I came up with. I’ve tried to narrow it down to the most important flags, the ones that will solve 80% of JVM performance needs with 20% of the tuning effort.

This article assumes basic JVM tuning knowledge: the different generations used in the Sun HotSpot JVM, the different garbage collection algorithms available, etc. Although it is intended primarily for enterprise-grade Oracle Fusion Middleware products, it applies to most server JVMs with large heaps hosted on server-class, multi-core machines. This is not an exhaustive list, only low-hanging fruit. In fact, many JDK 1.6 users need no tuning at all: the JVM picks good defaults and ergonomics does a decent job. Follow this only if the default behavior is not good enough (for instance, frequent garbage collections, low throughput or long GC pauses).

In my experience, a non-trivial production topology with Oracle Fusion Middleware products often requires this level of tuning. This includes Oracle WebLogic Server (Java EE apps), Oracle Coherence, Oracle Service Bus, Oracle SOA Suite, BPM, AIA and other enterprise FMW apps running on the Sun HotSpot JVM. I’ve used a mind map below to help visualize the relationships and dependencies between the various JVM tuning flags. In the diagram, the flags in black are the ones to try first, the ones in gray are optional, and anything not covered here can be ignored! :)

I’ve categorized the flags into 4 groups:

Garbage collection (GC): The garbage collection algorithm is one of the two mandatory tunables for Java performance tuning. Start with -XX:+UseParallelOldGC. If GC pauses are not acceptable, switch to -XX:+UseConcMarkSweepGC, which prioritizes low application pause times at the cost of raw application throughput. Specify -XX:ParallelGCThreads to limit the number of GC threads (yes, limit: each JVM sizes its GC thread count from all the cores on the machine, so the default is usually too high when multiple WebLogic servers share a large, multi-core machine). Recommendations for values and other flags will be covered later.
Heap tuning: This is the other mandatory tunable. I’m using ‘heap’ as an umbrella term for all Java memory spaces; technically, Perm and Stack are not part of the Java heap in Sun HotSpot. The required flags in my tuning exercise are total heap size (-Xms, -Xmx), young generation size (-Xmn) and permanent generation size (-XX:PermSize, -XX:MaxPermSize). -Xss tuning is optional. I only use it when tuning a 32-bit, heap-constrained JVM, and then only to shrink thread stacks so that more native address space is left over for -Xmx. In any case, never set -Xss below 128k for Fusion Middleware (the default is usually 512k to 1m, depending on the OS).
Logging: GC logging is mandatory only for the duration of the tuning exercise itself. However, due to its low overhead (typically only one line is written per collection, and collections are relatively infrequent), it is highly recommended for production as well. Otherwise, you will not be able to make an educated tuning decision if and when things don't work as expected.
(Optional) Other Performance: These are only used for fine tuning when performance is the driver for the tuning exercise. Even then, try these only after GC and heap are well tuned to begin with.
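To make the four groups concrete, here is a minimal Java sketch that assembles a command line from them. It is an illustration, not a reproduction of the diagram: the class and method names are hypothetical, the sizes and thread count are placeholders to replace with your own numbers, and the logging flags shown (-verbose:gc, -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps, -Xloggc) are the usual JDK 6/7 HotSpot choices. These are JDK 6/7-era flags; later JDKs removed or replaced several of them (the permanent generation is gone as of JDK 8).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: assembles JDK 6/7-era HotSpot flags from the four groups.
// Sizes and thread counts are placeholders, not recommendations.
final class JvmFlagSketch {
    static List<String> build(boolean lowPause, int gcThreads,
                              String heap, String young, String perm, String gcLog) {
        List<String> flags = new ArrayList<>();
        // 1. Garbage collection: throughput collector first, CMS only if pauses are unacceptable
        flags.add(lowPause ? "-XX:+UseConcMarkSweepGC" : "-XX:+UseParallelOldGC");
        flags.add("-XX:ParallelGCThreads=" + gcThreads);
        // 2. Heap: equal initial and max sizes so the JVM never needs a Full GC to resize
        Collections.addAll(flags, "-Xms" + heap, "-Xmx" + heap, "-Xmn" + young,
                "-XX:PermSize=" + perm, "-XX:MaxPermSize=" + perm);
        // 3. Logging: detailed, timestamped GC log written to a file
        Collections.addAll(flags, "-verbose:gc", "-XX:+PrintGCDetails",
                "-XX:+PrintGCTimeStamps", "-Xloggc:" + gcLog);
        // 4. Other performance flags are left out until GC and heap are well tuned
        return flags;
    }

    public static void main(String[] args) {
        // Placeholder values for a throughput-oriented server
        List<String> flags = build(false, 4, "2g", "512m", "256m", "gc.log");
        System.out.println(String.join(" ", flags));
    }
}
```

The sketch only assembles strings; it does not check, for example, that -Xmn is smaller than the total heap.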

The primary requirement that warrants JVM tuning in production Oracle Fusion Middleware is not performance but unacceptable GC pauses. The culprit is almost always a Full GC that causes a long application pause. Symptoms include temporarily unresponsive servers, client session timeouts, etc. If you’re capturing GC logs using the flags in the diagram, a search for “Full GC” will show how many Full GCs occurred, how frequently, and how long they took. Following the tunables in the diagram above, this is how you can solve the problem (I have highlighted the parameters to match those in the diagram):
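As an aside, the “Full GC” search itself is easy to script. The following is a minimal Java sketch, assuming logs written with -XX:+PrintGCDetails where each collection line ends with a [Times: user=... sys=..., real=N secs] block; the class name FullGcScanner is hypothetical. It records the wall-clock (real) pause of every line mentioning “Full GC”, and records NaN when a line carries no timing (for example, when class-unloading output splits one collection across several lines).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: summarizes Full GC pauses in a -XX:+PrintGCDetails log.
final class FullGcScanner {
    // Wall-clock pause from the trailing "[Times: user=... sys=..., real=N secs]" block
    private static final Pattern REAL = Pattern.compile("real=(\\d+(?:\\.\\d+)?) secs");

    // One entry per line mentioning "Full GC": its real pause in seconds, or NaN if the line has no timing
    static List<Double> fullGcPauses(List<String> logLines) {
        List<Double> pauses = new ArrayList<>();
        for (String line : logLines) {
            if (!line.contains("Full GC")) {
                continue;
            }
            Matcher m = REAL.matcher(line);
            pauses.add(m.find() ? Double.parseDouble(m.group(1)) : Double.NaN);
        }
        return pauses;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the GC log file
        List<Double> pauses = fullGcPauses(Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8));
        double total = 0, longest = 0;
        for (double p : pauses) {
            if (!Double.isNaN(p)) {
                total += p;
                longest = Math.max(longest, p);
            }
        }
        System.out.printf("Full GCs: %d, total pause: %.2f s, longest: %.2f s%n",
                pauses.size(), total, longest);
    }
}
```

Run it as java FullGcScanner gc.log. A steadily rising count or a long worst-case pause is the signal to work through the causes that follow.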

Heap not sized correctly, causing Full GCs

-Xmx should be equal to -Xms. Growing from -Xms to -Xmx requires Full GCs to resize the heap. Set these to the same value if Full GCs are to be completely eliminated in production.
-XX:PermSize should be equal to -XX:MaxPermSize
Both params need to be specified and should have the same value. Otherwise, a Full GC is required for each Perm Gen resize while it grows up to MaxPermSize.
-XX:NewSize is specified but not equal to -XX:MaxNewSize
Like the other heap params, resizing the new/young gen requires a Full GC. The preferred approach is to avoid these two parameters and use -Xmn instead. This eliminates the problem, as setting, say, "-Xmn1g" is the same as setting "-XX:NewSize=1g -XX:MaxNewSize=1g".
-XX:SurvivorRatio is specified but -XX:-UseAdaptiveSizePolicy is not. The SurvivorRatio you specify will not stick if AdaptiveSizePolicy is in effect: by default, the JVM adapts and overrides your value based on runtime heuristics. Use -XX:-UseAdaptiveSizePolicy to disable adaptive sizing of generations (notice the 'minus' sign preceding UseAdaptiveSizePolicy).
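These four checks are mechanical enough to automate. Below is a minimal Java sketch, assuming the JVM arguments are available as a list of strings (for example, copied out of a start script); HeapFlagLint is a hypothetical name. Sizes are compared as literal strings, so 1g and 1024m would be reported as different even though they are equal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical lint for the heap-sizing mistakes listed above (JDK 6/7 flag names).
final class HeapFlagLint {
    // Value following the first argument that starts with prefix, or null if absent
    private static String valueOf(List<String> args, String prefix) {
        for (String a : args) {
            if (a.startsWith(prefix)) {
                return a.substring(prefix.length());
            }
        }
        return null;
    }

    static List<String> check(List<String> args) {
        List<String> warnings = new ArrayList<>();
        // Sizes are compared as literal strings: "1g" and "1024m" count as different
        if (!Objects.equals(valueOf(args, "-Xms"), valueOf(args, "-Xmx"))) {
            warnings.add("-Xms and -Xmx differ: growing the heap needs Full GCs");
        }
        String perm = valueOf(args, "-XX:PermSize=");
        String maxPerm = valueOf(args, "-XX:MaxPermSize=");
        if (perm == null || maxPerm == null || !perm.equals(maxPerm)) {
            warnings.add("-XX:PermSize and -XX:MaxPermSize should both be set to the same value");
        }
        if (!Objects.equals(valueOf(args, "-XX:NewSize="), valueOf(args, "-XX:MaxNewSize="))) {
            warnings.add("-XX:NewSize differs from -XX:MaxNewSize: prefer -Xmn");
        }
        if (valueOf(args, "-XX:SurvivorRatio=") != null
                && !args.contains("-XX:-UseAdaptiveSizePolicy")) {
            warnings.add("-XX:SurvivorRatio will not stick without -XX:-UseAdaptiveSizePolicy");
        }
        return warnings;
    }

    public static void main(String[] args) {
        // A hypothetical start-script argument list with three of the mistakes above
        List<String> jvmArgs = Arrays.asList("-Xms1g", "-Xmx2g", "-XX:MaxPermSize=256m", "-XX:SurvivorRatio=8");
        for (String w : check(jvmArgs)) {
            System.out.println("WARN " + w);
        }
    }
}
```

Inside a running JVM, ManagementFactory.getRuntimeMXBean().getInputArguments() returns that JVM's own arguments, which could be passed straight to check().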

-XX:+UseConcMarkSweepGC is almost always used when there is a strict latency requirement or Service Level Agreement (SLA) and long GC pauses are unacceptable, that is, when Full GCs must be avoided at all costs. However, there are many reasons why Full GCs can still occur:

Although -XX:+UseConcMarkSweepGC is specified, CMS can, and often will, kick in too late, causing a Full GC when it can’t catch up. In other words, although CMS is collecting garbage, the application threads executing concurrently run out of heap for allocation because CMS couldn't free garbage soon enough. At this point, the JVM stops all application threads and does a Full GC. This is reported as a “concurrent mode failure” in GC logs. The reason for a concurrent mode failure: the JVM dynamically decides when CMS should be initiated and keeps adjusting that point based on statistics. However, production load is often bursty, which leads to misses or miscalculations in the last dynamically computed initiation value. To prevent this, provide a static value for CMS initiation. Use -XX:CMSInitiatingOccupancyFraction (a percentage of old generation occupancy) to tell the JVM at what point it should initiate CMS. A value between 40 and 70 usually works for most Fusion Middleware products. Start with the higher value (70) and tune down only if you still see the string “concurrent mode failure” in GC logs.
Secondly, always specify -XX:+UseCMSInitiatingOccupancyOnly when CMSInitiatingOccupancyFraction is used; otherwise the value you specify does not stick (the JVM will dynamically change it on the fly again). This is very important and commonly missed.
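Because CMSInitiatingOccupancyFraction is measured against the old (CMS) generation, it can help to translate a candidate value into megabytes before picking one. The minimal Java sketch below does that arithmetic and also flags the commonly missed pairing with -XX:+UseCMSInitiatingOccupancyOnly. CmsTrigger is a hypothetical name, and the 3 GB old generation in main() is a hypothetical example (roughly what -Xmx4g -Xmn1g leaves for the old gen).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for choosing -XX:CMSInitiatingOccupancyFraction.
final class CmsTrigger {
    // Old-generation occupancy (bytes) at which a CMS cycle starts for the given fraction,
    // assuming -XX:+UseCMSInitiatingOccupancyOnly keeps the JVM from adjusting it
    static long triggerBytes(long oldGenCapacityBytes, int fraction) {
        if (fraction < 0 || fraction > 100) {
            throw new IllegalArgumentException("fraction must be between 0 and 100");
        }
        return oldGenCapacityBytes * fraction / 100;
    }

    // True when a fraction is set without the flag that makes it stick
    static boolean fractionWillDrift(List<String> jvmArgs) {
        boolean hasFraction = false;
        for (String a : jvmArgs) {
            if (a.startsWith("-XX:CMSInitiatingOccupancyFraction=")) {
                hasFraction = true;
            }
        }
        return hasFraction && !jvmArgs.contains("-XX:+UseCMSInitiatingOccupancyOnly");
    }

    public static void main(String[] args) {
        long oldGen = 3L * 1024 * 1024 * 1024; // hypothetical 3 GB old generation
        for (int fraction = 70; fraction >= 40; fraction -= 10) {
            long mb = triggerBytes(oldGen, fraction) / (1024 * 1024);
            System.out.printf("fraction=%d -> CMS starts at about %d MB of old gen%n", fraction, mb);
        }
        System.out.println("drift risk: " + fractionWillDrift(
                Arrays.asList("-XX:+UseConcMarkSweepGC", "-XX:CMSInitiatingOccupancyFraction=70")));
    }
}
```

Starting at 70 and stepping down in tens, as the loop does, mirrors the advice above: each step lowers the point at which CMS begins, leaving more headroom for bursty allocation.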

-XX:+UseParallelGC is used instead of -XX:+UseParallelOldGC

-XX:+UseParallelOldGC does old gen collection in parallel, unlike -XX:+UseParallelGC. In both cases, young gen (minor) collections are parallel. With multiple threads doing old gen collection, the overall Full GC pause can be reduced.
If no GC params are specified, UseParallelGC is usually the default (this may have changed in later versions of JDK 6), so it is safe to always specify -XX:+UseParallelOldGC explicitly when throughput is the goal.
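To confirm which collector a running JVM actually uses, rather than trusting the start script, HotSpot exposes its flags through the com.sun.management.HotSpotDiagnosticMXBean interface. A minimal sketch, assuming JDK 7 or later (for ManagementFactory.getPlatformMXBean); CollectorCheck is a hypothetical name. Flags that a given JVM does not recognize (UseParallelOldGC and UseConcMarkSweepGC may not exist on newer JDKs) are reported as unsupported instead of failing.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

// Hypothetical helper: reports the collector flags of the running HotSpot JVM.
final class CollectorCheck {
    // Current value of a HotSpot flag, or null if this JVM does not know the flag
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            return hotspot.getVMOption(name).getValue();
        } catch (IllegalArgumentException unknownFlag) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Collector names as exposed through JMX (typically one young and one old gen collector)
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("collector: " + gc.getName());
        }
        for (String flag : new String[] {"UseParallelGC", "UseParallelOldGC", "UseConcMarkSweepGC"}) {
            String value = flagValue(flag);
            System.out.println(flag + " = " + (value == null ? "(not supported by this JVM)" : value));
        }
    }
}
```

Run it with the same flags as the server (for example, java -XX:+UseParallelOldGC CollectorCheck) to see what the JVM made of them.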

Rarely, no matter how well you tune your JVM, the heap eventually gets backed up and results in back-to-back Full GCs (again, use GC logs to guide you). If this is the case, your code may have introduced a memory/reference leak. To confirm, take a few heap dumps over time and compare them to see if any particular object count keeps growing, even after GC completes. Again, this is very rare, so make sure you do your due diligence with JVM tuning first.
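Comparing full heap dumps usually needs a heap analyzer; a lighter-weight variant of the same idea is to diff class histograms. jmap -histo:live &lt;pid&gt; forces a full collection first and then prints per-class instance counts, so a class whose count rises in every snapshot is a leak suspect. The Java sketch below swaps histograms in for heap dumps; it assumes the usual jmap row layout (rank, #instances, #bytes, class name), and HistoDiff is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds classes whose live instance count keeps growing
// across several `jmap -histo:live <pid>` snapshots.
final class HistoDiff {
    // Data row: rank, #instances, #bytes, class name (e.g. "   1:   123456   7890123  [C")
    private static final Pattern ROW = Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+(\\d+)\\s+(\\S+)");

    // Class name -> instance count for one histogram (header and "Total" lines are skipped)
    static Map<String, Long> instanceCounts(List<String> histoLines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : histoLines) {
            Matcher m = ROW.matcher(line);
            if (m.find()) {
                counts.put(m.group(3), Long.parseLong(m.group(1)));
            }
        }
        return counts;
    }

    // Classes whose count rose between every pair of consecutive snapshots (oldest first)
    static List<String> steadilyGrowing(List<Map<String, Long>> snapshots) {
        List<String> growing = new ArrayList<>();
        if (snapshots.size() < 2) {
            return growing;
        }
        for (String cls : snapshots.get(0).keySet()) {
            boolean grew = true;
            for (int i = 1; i < snapshots.size() && grew; i++) {
                Long before = snapshots.get(i - 1).get(cls);
                Long after = snapshots.get(i).get(cls);
                grew = before != null && after != null && after > before;
            }
            if (grew) {
                growing.add(cls);
            }
        }
        Collections.sort(growing);
        return growing;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a file holding one histogram, oldest first
        List<Map<String, Long>> snapshots = new ArrayList<>();
        for (String file : args) {
            snapshots.add(instanceCounts(Files.readAllLines(Paths.get(file), StandardCharsets.UTF_8)));
        }
        for (String cls : steadilyGrowing(snapshots)) {
            System.out.println("growing: " + cls);
        }
    }
}
```

Take the snapshots some minutes apart under steady load; a short burst of traffic can make a healthy class look like it is growing.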

I’d be interested in your comments or questions after you try this out. Happy tuning!

30 comments:

Mauro, March 14, 2012 at 1:28 AM
The correct parameter is -XX:+UseParallelOldGC
Sundance, March 14, 2012 at 2:20 AM
What about Hotspot JDK1.7, any good tuning tips specifically for the latest JVM?
effendi, March 19, 2012 at 5:11 PM
This is excellent, thanks.
effendi, March 19, 2012 at 8:39 PM
Question - I notice you didn't discuss -XX:+UseLargePages - any particular reason?
Anonymous, March 22, 2012 at 6:42 AM
What are your thoughts on -XX:+UseCompressedOops? I've seen a lot of posts and confusion about it on the Internets.
Matt, March 22, 2012 at 8:11 AM
You don't mention -XX:+UseTLAB. Have you not found it helpful?
Adi Lev, March 22, 2012 at 12:17 PM
We found that for real-time applications the Oracle (originally BEA) JRockit JVM performs even better, and its memory leak profiling on production systems is excellent for troubleshooting.
Unknown, March 22, 2012 at 12:56 PM
what about the new-new G1 Garbage collector? To use it, just do: -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC

The new G1 collector does better w/ pauses and also does a more thorough job w/ Young collection than the CMS collector mentioned above.
Unknown, March 22, 2012 at 5:49 PM
Rupesh, can you suggest any particularly helpful in-depth resources for learning more about the details of Oracle's VM implementations? I'm looking for books. Thanks.
Will Thames, March 22, 2012 at 6:10 PM
Some missing logging flags (there are more, but these are the minimum needed to know when performance is bad):

-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime
Unknown, March 23, 2012 at 4:40 AM
Nice blog! Enjoy reading it.
Unknown, March 26, 2012 at 10:52 AM
Hi,
I get a lot of Full GCs in prod, and there are 2 types.
Type-1
1287470.552: [Full GC (System) [PSYoungGen: 1494K->0K(692224K)] [PSOldGen: 190715K->179290K(1400832K)] 192209K->179290K(2093056K) [PSPermGen: 138023K->138023K(139264K)], 5.1737903 secs] [Times: user=5.17 sys=0.01, real=5.17 secs]

Type-2
1305498.007: [Full GC (System)[Unloading class sun.reflect.GeneratedMethodAccessor9791]
1309103.580: [Full GC (System)[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor5242]
1312709.715: [Full GC (System)[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor5323]
1316315.734: [Full GC (System)[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor5364]

Is Type-2 purely a code/framework problem? Can setting appropriate heap and/or GC params avoid these Type-2 Full GC cycles?

Thanks
Amit
Matt Fowles, April 10, 2012 at 8:34 AM
Under I find it best to include `-XX:+PrintVMOptions`, it doesn't do much special, but it makes my tests easier to analyze after the fact.