At the tail end of the Rich Web Experience last year, I had a chance to sit down with Doug Hawkins and talk about his work optimizing the JVM in performance-critical environments with Azul Systems. We discuss garbage collection, speculative optimization, and some of the tools you can use to get better insight into the performance of your application.
If you’re interested in learning more about optimizing the JVM, check out ArchConf West where Doug is one of our Featured Speakers. He’s also going to be giving some great talks at our flagship event, UberConf. He is also an NFJS Tour regular.
Also, check out his Optimizing Java video course.
Douglas Hawkins: Hello.
Michael: Doug, you live out in the Bay Area.
Douglas: Yes, I do.
Michael: What exactly is it that you do?
Douglas: I work on the Java Virtual Machine, not for Oracle but for a company called Azul Systems that is targeted at high performance Java applications.
Michael: I like how you had to slip that in there, that it’s not for Oracle.
That’s cool. Doug knows more about the internals of the JVM than anybody I’ve ever met.
Douglas: There are people who know more but, yeah. Three of them.
Michael: Are you optimizing?
Douglas: We do a variety of different things, one of which is that we have a different garbage collector. Increasingly in the garbage collection world we're seeing that we've gotten pretty good at the throughput side of things, but now latency is more and more of a concern since we're looking at web-oriented applications where there's some SLA requiring your application to respond within a certain amount of time.
It's not just about how many requests you can handle, it's that you can handle the request quickly. The garbage collection world is going in that direction, whether it's the garbage collector in something very new like Go, or G1, which will become the default collector in Java 9, or the collector in our VM. They're all becoming more latency oriented rather than throughput oriented.
Michael: Knowing as much as you do about the garbage collector, both your advanced one and what’s currently built into most JVM environments is there anything that you see that people tend to do wrong? To put it differently, are we crippling the garbage collector as it is? Are we at an optimum and we do need to start changing the design of these things?
Douglas: It's complicated. For the average application I think the garbage collectors have gotten to the point where they're actually quite good. Right now, with Java, you pick between the two different old-generation strategies that are available based on whether it's front-end, low-latency work. There you might use Concurrent Mark Sweep.
If you're doing more back-end, batch-type work, then Parallel Mark-Compact would be a better choice. Separating those two makes a lot of sense. That's what the really large-scale companies like Twitter and Netflix do.
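As a rough sketch of the two choices Doug describes, here are the HotSpot flags that select each old-generation strategy (flag names as of Java 8; the jar names are illustrative):

```shell
# Front-end, latency-sensitive service: Concurrent Mark Sweep old generation
java -XX:+UseConcMarkSweepGC -Xms4g -Xmx4g -jar frontend-service.jar

# Back-end, throughput-oriented batch job: parallel mark-compact old generation
java -XX:+UseParallelGC -Xms4g -Xmx4g -jar batch-job.jar

# G1, the attempt to balance the two (default from Java 9 onward)
java -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -jar service.jar
```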
Twitter actually employs former JVM people from Oracle just to tune the garbage collector for Twitter’s workloads.
Michael: Oh, wow.
Douglas: G1 is trying to strike a balance between these two things. G1, it's going to take a while to get all the heuristics right. They've been working on it for a really long time now. They think they're close, but there's this debate going on on the OpenJDK mailing list about making it the default collector in Java 9. You see that the HotSpot people are like, "We've got to make it the default collector. It's not going to get any better unless we make it the default collector."
Michael: If you’re relying on the heuristics then you have to have a large sample of what people are doing.
Douglas: You need a large set of test programs.
Then the guys who don’t work for Oracle, who are out doing GC tuning for really demanding customers are like, “From our experience, it’s not ready yet.” Both sides are probably right. The guys who are doing the GC tuning are probably right. It’s not ready for the type of applications that they’re working on yet. But they’re also dealing with the extreme end.
The HotSpot guys are right that they’re not going to make forward progress without exposing it to more programs and learning the few situations where it still falls over. This is something we’ve gone through with every garbage collector.
It’s going to happen again. I’ll trust that the HotSpot guys are right and G1 is ready to be the default collector. For most people, they’re not going to see the difference, regardless of which collector they choose.
Michael: That’s one of many low level optimizations that falls in your area of expertise.
Douglas: Yes. I don’t work on the garbage collector too much. I work with plenty of people who do. I get to benefit from all they’ve learned. They’ve worked on G1. They’ve worked on our garbage collector extensively. I’ve learned a lot from them about how the garbage collectors fail.
My particular area of expertise is actually around speculative JIT compilation. In the JVM, because it's a contained environment and we can monitor what your application is doing, we can have what's called a profile-guided compiler.
We go around and follow you and see what your program is doing. Say, "Oh, that's a thing you're doing a lot. Let's optimize that." We even optimize based on history: up until this point, that if has always been true. You always go that way. You never exercise the else case. We'll optimize for that.
It sounds kind of crazy, but it actually does get you pretty significant performance wins; even simple if-branch optimizations can get you 12 percent on a lot of benchmarks. These are the sorts of things that traditionally a compiler couldn't be good at. In C++ you have to explicitly mark a method virtual to get a dynamic call, and if it's a dynamic call it's very hard to optimize.
In Java we can follow your program around and go, “Oh, it’s a dynamic call but the reality is in this particular context you’re only using strings. Even though you’re calling object equals, you’re really only calling string equals. We can optimize that.”
All of this is speculative based on the fact that what your program did at the very beginning when we’re profiling is what it’s going to keep doing.
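A minimal sketch of the two speculations Doug describes: an always-taken branch, and a virtual call that in practice only ever sees one receiver type. The class and method names are illustrative, not from the interview.

```java
public class SpeculationDemo {

    // The JIT profiles this branch; if 'valid' has always been true, the
    // compiled code optimizes for that path and deoptimizes if it's ever false.
    static int branchy(int x, boolean valid) {
        if (valid) {
            return x + 1;   // the only path ever taken while profiling
        } else {
            return -x;      // never exercised: compiled as an uncommon trap
        }
    }

    // Object.equals is a dynamic (virtual) call, but if profiling shows every
    // element is a String, HotSpot can speculate, check the type once, and
    // inline String.equals directly.
    static int countEqual(Object[] items, Object probe) {
        int n = 0;
        for (Object item : items) {
            if (item.equals(probe)) {   // polymorphic in theory, monomorphic here
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        Object[] words = {"jit", "gc", "jit", "jvm"};
        System.out.println(branchy(41, true));        // 42
        System.out.println(countEqual(words, "jit")); // 2
    }
}
```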
Michael: Can that lead to a situation where you get really bad performance if suddenly the landscape of the execution changes?
Douglas: Yes. You get into a couple things. It starts off somewhat optimistic and then you slowly move more pessimistic as you prove that the speculations didn’t work. There’s a delicate balancing act to what’s a big enough win to speculate on versus the penalties that come if you’re wrong.
It's not just a penalty in terms of the throughput dropping. That will happen, but that's sort of necessary to make the code correct. There's also a penalty for going from your JITed code back to the interpreter, which is orders of magnitude slower.
The actual transition back to the interpreter is really fast but there are situations where when you do that you also throw away the compilation entirely. That compilation is a shared piece of code across all your Java threads. It’s not just the one thread that experienced this aberrant behavior that slows down. It’s every single thread because the compilation went away.
This, in really performance sensitive environments, which is what I’m focused on, becomes a big deal. Some of these are even scarier in that they require a stop the world pause to correct. People only worry about stop the world pauses caused by the garbage collector but that’s because those are the big ones. There are all these little stop the world pauses for things like making a lock thin or thick. That’s happening all the time.
Deoptimizing methods under certain situations can do that, as well. Most of these just don’t matter because they’re often one time correctional events but for the people who really care about peak performance at startup it starts to become an issue.
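Here is a sketch of the kind of behavior change that can invalidate a speculative compilation: a call site that is monomorphic during profiling, then sees a second type later. The classes are hypothetical; running with `-XX:+PrintCompilation` may show the method being "made not entrant" after the second type appears, though the exact JIT behavior varies by JVM and version.

```java
interface Shape { double area(); }

class Circle implements Shape {
    public double area() { return Math.PI * 1.0; } // fixed radius 1 for the sketch
}

class Square implements Shape {
    public double area() { return 4.0; }           // fixed side 2 for the sketch
}

public class DeoptDemo {
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();   // monomorphic while only Circles exist
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] circles = new Shape[10_000];
        java.util.Arrays.fill(circles, new Circle());
        for (int i = 0; i < 1_000; i++) {
            total(circles);    // phase one: the JIT may speculate "always Circle"
        }

        circles[0] = new Square();  // phase two: the speculation is now wrong
        System.out.println(total(circles)); // still correct, but may deoptimize
    }
}
```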
Michael: Are there any ways that you can actually hint at some of these optimizers?
Douglas: In HotSpot there’s not a lot you can do. Slowly they are giving themselves the tools to use within the JDK to do some interesting things. The most important optimization the compiler does is inlining. In some places it’s a good idea to inline and in some places it’s a bad idea.
There are annotations inside the JDK that say, "Always inline this, no matter what. I know it's important. I know it's a good idea to do it." But that's not available for the general public to use. They use it to optimize the JDK code itself.
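While those JDK-internal annotations aren't public, HotSpot does expose command-line inlining hints that anyone can use. A sketch (the class and method names are illustrative):

```shell
# Ask HotSpot to inline a specific hot method
java -XX:CompileCommand=inline,com/example/Hot.tightLoop -jar app.jar

# Or forbid inlining, e.g. to isolate a method in profiles
java -XX:CompileCommand=dontinline,com/example/Hot.tightLoop -jar app.jar

# Raise the bytecode-size threshold below which HotSpot inlines freely
java -XX:MaxInlineSize=70 -jar app.jar
```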
There are things about exploiting our modern memory subsystems where if you’re changing a particular variable a lot you want to make sure it has a cache line to itself because otherwise any other variable that’s adjacent to it in the same cache line, two different cores would fight for it.
There’s an annotation that says, “Hey, give me my own cache line.” They’re sprinkling that throughout the JDK, as well. They haven’t really made most of those things available for your average Java developer to use. They’re hoping that you’re using the JDK classes themselves. By improving the JDK classes, you get the benefit, too.
Michael: If I wanted to dig deeper into the JVM and start doing some low level optimizations what are some resources that you would recommend to get started?
Douglas: If you wanted to see the general workflow of the VM itself there’s a very nice tool called JITWatch that’s a GUI that will show you what JIT compilations are happening. If you build an appropriate plug-in, you can also then see not just the Java code, not just the byte code, but the actual machine code that comes out when HotSpot’s running your program.
JITWatch will even do nice things like it will highlight, “Oh, this method got inlined,” or, “This method didn’t get inlined. I think it would be a good idea if it did,” so that you can go and try to correct some of those things.
A lot of the inlining choices are based on bytecode size, so JITWatch also comes with a tool called JarScan, which will go through all your JARs and tell you, "You've got a method that's too big. That's going to be hard to inline."
There are some pretty cool things. If you want to keep going deeper, you can go look at the Java Microbenchmark Harness (JMH), which is a way to write microbenchmarks. If you run it on Linux it has integration with really low-level performance tools like perf. You can see down to the machine instruction what's hot.
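JMH itself is a separate dependency, so as a minimal stand-in, this sketch shows the warmup-then-measure pattern that JMH automates far more rigorously (forking, dead-code-elimination guards, statistical reporting). The method and iteration counts are arbitrary.

```java
public class TinyBench {
    // The "benchmark": cheap enough that JIT effects dominate the timing.
    static long sumTo(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        long sink = 0;
        // Warmup: give the JIT time to profile and compile sumTo.
        for (int i = 0; i < 50_000; i++) sink += sumTo(1_000);

        // Measure only after warmup.
        long start = System.nanoTime();
        for (int i = 0; i < 50_000; i++) sink += sumTo(1_000);
        long elapsed = System.nanoTime() - start;

        // Consume 'sink' so the JIT can't prove the loops are dead code.
        System.out.println("avg ns/op ~ " + (elapsed / 50_000) + " (sink=" + sink + ")");
    }
}
```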
This has become an interesting area lately. There are a lot of people talking about the problems with Java profilers at conferences. There are a lot of people contributing back to OpenJDK to make this stuff better. A lot of interesting work is going on at Netflix, being contributed by a team that's led by a guy named Brendan Gregg, who's doing all sorts of interesting visualizations to show you where your time is going.
Michael: One other thing, I know you’ve got a series of videos that are coming out.
Douglas: Yes, I do. I've done a series of videos for O'Reilly that show you how to use JITWatch and JMH, and let you see the types of optimizations that are happening as your program runs. They also tell you how the compiler goes about dissecting your code.
Just like with SQL execution plans where it’s important to understand, “Oh, I have an index. If this index is highly selective then I match a small set of things. I can do an index scan or something like that.” There’s a similar set of concepts for a compiler. It’s just an execution plan.
If you understand what those concepts are you can learn what will trip up the compiler. As long as you don’t trip up the compiler then you can let it do all the optimization for you rather than spending a whole bunch of time on things that may not be helping that much.
Michael: That makes a lot of sense. What’s the title of that video series?
Douglas: It’s going to be, “Understanding Java Performance.”
Michael: Excellent. Doug, thank you so much for your time. I look forward to seeing you on the road in 2016. Some of your talks are, “Java Optimizations That Don’t Matter…”
Douglas: Yeah. “Some Java Optimizations That Matter and Some That Don’t.” Talks on understanding and architecting for garbage collection, and talks around the mechanics of the VM, like when does it compile, what are these types of speculative optimizations.
I’m looking forward to next year. I’m doing a talk that marries the static optimizations and the speculative optimizations into one talk. We look at how they all fit together with a five line piece of Java code.
It does take an hour and a half to explain how the VM dissects five lines of Java code.
Michael: That’s awesome. I can’t wait to see it.
Doug, thanks so much for being here. Thanks for your time. I’ll see you next year.
Michael: At No Fluff, Just Stuff we bring the best technologists to you on a road show format. Early bird discounts are available for the 2016 season. Check out the entire show line up and tour dates at NoFluffJustStuff.com.
I’m your host Michael Carducci. Thanks for listening. Stay subscribed.