A deep, deep dive into Java debugging

A deep, deep dive into Java debugging

Follow senior architect Robbie Gates all the way down the rabbit hole to the JVM code, and learn a few new debugging tricks to keep up your sleeve.

This is the story of a Java debugging journey that started with a question I couldn’t answer about Java stack traces. As a long time Java programmer, I am approached for help by developers who encounter unusual problems in the language. Diving in and getting acquainted with all the dark corners of Java is something I really enjoy, mainly because I emerge with a better understanding of the language, and also because it equips our team with better tools to solve the everyday problems … as well as the unusual ones.

Our trip takes us through a deeper look at Java lambdas and method references, and ends up with a quick foray into the JVM code. We’ll use a couple of debugging tools and techniques to figure it all out, and learn a little about implementation details and diagnostic JVM options. It’s a good example of how, with the source in hand, you can demystify a general phenomenon, such as missing frames.

It all started with a NullPointerException

A co-worker approached me with the following stack trace:

java.lang.NullPointerException
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.Collections$2.tryAdvance(Collections.java:4717)
	at java.util.Collections$2.forEachRemaining(Collections.java:4725)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at com.atlassian.jira.jql.util.ParentEpicFinalTerminalClauseFactory.getFinalTerminalQuery(ParentEpicFinalTerminalClauseFactory.java:47)

This was accompanied by another 364 less-interesting frames. What’s interesting, and what brought him to me, is the code of the first frame. It’s from the JDK (build 1.8.0_172-b11), and it is as follows:

ReferencePipeline.java
public final  Stream map(Function<? super P_OUT, ? extends R> mapper) {
    Objects.requireNonNull(mapper);
    return new StatelessOp<P_OUT, R>(this, StreamShape.REFERENCE,
                                 StreamOpFlag.NOT_SORTED | StreamOpFlag.NOT_DISTINCT) {
        @Override
        Sink opWrapSink(int flags, Sink sink) {
            return new Sink.ChainedReference<P_OUT, R>(sink) {
                @Override
                public void accept(P_OUT u) {
                    downstream.accept(mapper.apply(u));
                }
            };
        }
    };
}

The question at hand was, “What could be null on line 193 (highlighted above in red)?”  It can’t be mapper because:

It can’t be downstream, because if you hunt down its declaration you find:

Sink.java
static abstract class ChainedReference<T, E_OUT> implements Sink {
    protected final Sink<? super E_OUT> downstream;
 
    public ChainedReference(Sink<? super E_OUT> downstream) {
        this.downstream = Objects.requireNonNull(downstream);
    }

As you can see:

My next thought was that maybe P_OUT was a boxed type, like Integer or Long, and that to invoke mapper.apply, the compiler had inserted an unboxing of u.Unboxing throws a NullPointerException if the reference is null (detailed in JLS 8, 5.1.8). For example, code like int value = getValue();

can throw an exception, if getValue() is declared to return Integer and actually returns null because the compiler inserts a call to Integer.intValue to convert the returned Integer to an int. However, looking at the code for getFinalTerminalQuery in this case revealed that P_OUT was com.atlassian.jira.issue.Issue, which isn’t boxed – it’s the main issue interface in Jira.

Where’s the null?

Although these questions didn’t find the source of the NullPointerException, they did reveal that mapper was Issue::getId – that is – a method reference (as per JLS 8, 15.13), and this seemed unusual enough to be interesting and worth looking at further. At this point I was somewhat bemused, but also quite excited, because I couldn’t explain what I was seeing … which meant that I was about to learn something new! So I broke out a bit of test code by boiling down the case above to the following code:

Main.java
public class Main {
    static class Thing {
        Long
        getId() {
            return 163L;
        }
    }

    public static void main(String[] args) {
        Thing thing = null;
        System.out.println(
            Collections.singleton(thing).stream().map(Thing::getId).collect(toList())
        );
    }
}

Sure enough, running this code got me a NullPointerException with a stack similar enough to the above, at least at the pointy end. By the way, you can clone that repository using git clone git@bitbucket.org:atlassian/missing-frame.git if you want to see it for yourself. There are some handy canned command lines in the README.

In other words, you’ll get a NullPointerException if you apply a valid method reference (like Thing::getId in the example) to a null reference.

This is not surprising. What is surprising, until you know this, is that you won’t see a frame for the invocation of that function. You can’t see a frame for getId of course, because that would mean you’d be in a member function where this was null, which won’t happen while you’re looking. This has nothing to do with the use of the Stream by the way, you can get the same effect from

Thing thing = null;
Function<Thing, Long> function = Thing::getId;
function.apply(thing);

But I didn’t know that at the time I was writing the test code. This is one of the nice things about writing test code, you can freely edit it and play around with what exactly is causing the effect you are investigating.

But where is the exception thrown?

In IntelliJ, you can use View/Show Bytecode on these examples to see the JVM bytecode. Other IDEs have similar functionality. You could also use javap from the command line. In any case, you will see something like:

INVOKEDYNAMIC apply()Ljava/util/function/Function; [
  // handle kind 0x6 : INVOKESTATIC
  java/lang/invoke/LambdaMetafactory.metafactory(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite;
  // arguments:
  (Ljava/lang/Object;)Ljava/lang/Object;, 
  // handle kind 0x5 : INVOKEVIRTUAL
  com/atlassian/rgates/missingframe/Main$Thing.getId()Ljava/lang/Long;, 
  (Lcom/atlassian/rgates/missingframe/Main$Thing;)Ljava/lang/Long;
]

What is invokedynamic, you say? In this case, we need to break out the Java Virtual Machine Specification, version 8, section 6.5, which tells us that some information is looked up and then resolved (as per JVMS 8, section 5.4.3.6) to produce a MethodHandle object. This is then invoke()d to get a CallSite which is bound to that instruction, effectively caching the costly lookup. Then, we invokeExact the CallSite.target to get a Function object. Whoah. It’s worth noting this is not what is throwing – it hasn’t even looked at the null reference yet, and the spec is quite clear that an invokedynamic instruction which is bound to a call site object never throws a NullPointerException or …”.

So what is the Function we get? If you look at it in IntelliJ, for example by breakpointing the constructor of NullPointerException and inspecting the mapper, you see an object which claims to be a com.atlassian.rgates.missingframe.Main$$Lambda$1/6738746, but you won’t find that in your source. These classes are generated on the fly by the LambdaMetafactory referenced in the INVOKEDYNAMIC opcode. The code that writes them is in the delightfully named java.lang.invoke.InnerClassLambdaMetafactory#spinInnerClass. It’s worth noting they don’t go through the normal ClassLoader machinery, but use the wonderful sun.misc.Unsafe#defineAnonymousClass to get themselves loaded.

Anyway, now that I was this far down, I had to keep digging. So I dumped the class spun up by spinInnerClass to a file by evaluating new FileOutputStream(“lambdaClass.class”).write(classBytes) in the debugger. This is a common trick I use when debugging – evaluating an expression which intentionally has side effects. There is some data we want to inspect – classBytes in this example. To inspect it, we want to pass it off to another tool, and so we need to get it in a file. The expression has the side effect of writing classBytes to the file lambdaClass.class, and I can use this for further processing. Once I had the bytes from the built-on-the-fly class in a file, I could use the javap utility to dump it.

0: aload_1
1: checkcast     #15                 // class com/atlassian/rgates/missingframe/Main$Thing
4: invokevirtual #19                 // Method com/atlassian/rgates/missingframe/Main$Thing.getId:()Ljava/lang/Long;
7: areturn

Voila! It’s the invokevirtual that is throwing NullPointerException, as per JVMS 8, section 6.5if objectref is null, the invokevirtual instruction throws a NullPointerException. So that’s kind of what one might expect – a runtime generated wrapper which calls the function named by the method reference.

But the frame is still missing!

There totally should be a frame for apply. In fact, if you’re playing at home and have breakpointed the NullPointerException constructor like me, you can see the apply frame on the stack. However, if you step through the code until after the call to java.lang.Throwable#fillInStackTrace(int), you won’t find the frame in the output of Throwable.getStackTrace. At this point, the only place left to inspect was fillInStackTrace, and of course that is native. So you need to hop over to the OpenJdk source, and look at void java_lang_Throwable::fill_in_stack_trace(Handle throwable, methodHandle method, TRAPS) in hotspot/src/share/vm/classfile/javaClasses.cpp. I didn’t grok the whole method, but I did skip through it enough to find:

javaClasses.cpp
if (method->is_hidden()) {
  if (skip_hidden)  continue;
}

Methods can be hidden? TIL. I wonder who calls set_hidden? As it turns out, void ClassFileParser::MethodAnnotationCollector::apply_to(methodHandle m) says:

classFileParser.cpp
if (has_annotation(_method_LambdaForm_Hidden))
  m->set_hidden(true);

A bit more digging reveals that the value of _method_LambdaForm_Hidden is the string Ljava/lang/invoke/LambdaForm$Hidden;. To tie this back to the Java code, I again used javap to decompile the class file we dumped above. This time, however, I added an extra flag to javap – the -v flag, which makes javap verbose. In particular, this makes it dump the strings in the class constant pool – our string is at index #13, and we can see the runtime annotation #13 is present on the apply method:

#13 = Utf8               Ljava/lang/invoke/LambdaForm$Hidden;

...
public java.lang.Object apply(java.lang.Object);
...
RuntimeVisibleAnnotations:
  0: #13()

So, you can’t see the frame because it’s hidden. In fact, a bit further up in spinInnerClass, it calls:

mv.visitAnnotation("Ljava/lang/invoke/LambdaForm$Hidden;", true);

to add the annotation, but I missed this on the first read through. Reading this backwards to understand the flow: When generating the runtime wrappers used to invoke the method references, the generation code annotates the apply method with a java.lang.invoke.LambdaForm$Hidden annotation, and the JVM code (which fills in stack traces to implement the Java level fillInStackTrace function) checks for this annotation, and skips over frames for methods with this annotation.

One more thing …

The sharp eyed will have noticed that the JVM code in javaClasses.cpp above has a skip_hidden check also, which turns out to be set from ShowHiddenFrames, which is mentioned in:

diagnostic(bool, ShowHiddenFrames, false,                                 \
        "show method handle implementation frames (usually hidden)")      \

Reading some documentation in this file led me to java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal which shows a wealth of JVM diagnostic and internal stuff, including:

bool ShowHiddenFrames                          = false                               {diagnostic}

and thus, finally to:

:; java -XX:+UnlockDiagnosticVMOptions -XX:+ShowHiddenFrames -cp build/libs/missing-frame.jar com.atlassian.rgates.missingframe.Main
Exception in thread "main" java.lang.NullPointerException
	at com.atlassian.rgates.missingframe.Main$Lambda$1/295530567.apply(:1000004)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.Collections$2.tryAdvance(Collections.java:4717)
	at java.util.Collections$2.forEachRemaining(Collections.java:4725)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at com.atlassian.rgates.missingframe.Main.main(Main.java:17)

There’s the frame! The error is at <Unknown>:1000004 –  the JVM doesn’t know the source name (hence <Unknown>), and uses 1000000 + bci for line numbers if it has no line numbers, where bci is the byte code index of the invokevirtual we identified as the cause above.

And my co-worker?

I got back to my co-worker with my findings, and they responded Curiosity: Satisfied. It’s nice to go beyond just “use this workaround” and instead gain a full understanding of exactly why the code is behaving in the way we observe. It’s especially nice when I get to learn a few things about Java implementation details and JVM internals.

 

P.S. Our engineering teams are hiring. A lot. Just sayin’.

Exit mobile version