Java Deep Reflection: How to Hack Integer and String

I am currently reading the book "The Pragmatic Programmer" by Andrew Hunt and David Thomas. In this book, the authors give the following task:

Which of these "impossible" things can happen?
[…]
3. In C++: a = 2; b = 3; if (a + b != 5) exit(1);
[…]

One of the correct answers is 3. In C++ there are several reasons why the condition "a + b != 5" could be met:

Operator overloading: you can have, for example, the '+' operator perform a multiplication.
Variable aliasing: b is an alias for a, so the assignment b = 3 also sets a to 3, making the sum 6.

Since neither of these is available in Java, I was wondering: Can I write the same code in Java so that the condition is met? The answer is yes. You will learn how this is possible in this article.

You can find the article's code examples in my GitHub repository.

2 + 3 = 5

Let's start very simply: with a main function, in which two primitive ints, a and b, are declared, followed by the code we want to hack:

import static java.lang.System.exit;

public class ImpossibleThings1 {
  public static void main(String[] args) {
    int a, b;
    a = 2; b = 3; if (a + b != 5) exit(1);
  }
}

Of course, in this example, a + b = 5, so the program terminates normally, i.e., with exit code 0.

2 + 3 = 6: Deep Reflection with Integer

It doesn't take many changes to make the condition true (i.e., 2 + 3 is not 5) and have the program end with exit code 1:

import java.lang.reflect.Field;

import static java.lang.System.exit;

public class ImpossibleThings2 {
  static {
    try {
      Field VALUE = Integer.class.getDeclaredField("value");
      VALUE.setAccessible(true);
      VALUE.set(2, 3);
    } catch (ReflectiveOperationException e) {
      throw new Error(e);
    }
  }

  public static void main(String[] args) {
    Integer a, b;
    a = 2; b = 3; if (a + b != 5) exit(1);
  }
}

Here is the proof:

Screenshot showing the output "exit code 1"

What have we done? We use Integer instead of int and make use of extensive autoboxing and unboxing. In the next section, I will describe what exactly is going on here.

Auto(un)boxing uncovered

In the following example, I have replaced autoboxing and unboxing with explicit boxing and unboxing. This makes it clearer what happens.

import java.lang.reflect.Field;

import static java.lang.System.exit;

public class ImpossibleThings3 {
  static {
    try {
      Field VALUE = Integer.class.getDeclaredField("value");
      VALUE.setAccessible(true);
      Integer two = Integer.valueOf(2);
      VALUE.set(two, 3);
    } catch (ReflectiveOperationException e) {
      throw new Error(e);
    }
  }

  public static void main(String[] args) {
    Integer a, b;
    a = Integer.valueOf(2);
    b = Integer.valueOf(3);
    if (a.intValue() + b.intValue() != 5) exit(1);
  }
}

Integer.valueOf() returns cached integer instances for the values -128 to 127.*
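
If you want to see the cache at work without any reflection, a minimal sketch could look like this. The class name IntegerCacheDemo and the method sameInstance() are my own choices for illustration. The comparison deliberately uses == to check object identity rather than equals().

```java
// Sketch: checks whether Integer.valueOf() hands out the same object twice.
final class IntegerCacheDemo {

  /** Returns true if two valueOf() calls for the same number yield the same object. */
  static boolean sameInstance(int n) {
    Integer first = Integer.valueOf(n);
    Integer second = Integer.valueOf(n);
    return first == second; // identity comparison on purpose, not equals()
  }

  public static void main(String[] args) {
    System.out.println("2:    " + sameInstance(2));
    System.out.println("127:  " + sameInstance(127));
    // Outside -128..127 the result depends on the JVM and its cache settings.
    System.out.println("1000: " + sameInstance(1000));
  }
}
```

For values inside the range -128 to 127, sameInstance() should return true. For larger values, the result depends on the JVM and its settings.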

An Integer object stores the actual value in a private field called value. You can retrieve that value via intValue().

In the static initializer, we get the cached Integer object for the number 2. Using deep reflection, we set its value to the number 3. Since value is a private field, we must first allow access to it with Field.setAccessible(true).

If we now printed this object with System.out.println(two), we would see "3".

In the main method, a = 2 is boxed to a = Integer.valueOf(2), which, in turn, returns the same cached Integer instance as two, whose value is now 3. b is also 3, so a + b gives 6, which is known to be unequal to 5 (unless the 5 has also been "hacked" … which, as far as I know, is not possible with an int primitive).
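
The same mechanism can be tried out on a class of your own, which avoids touching JDK internals. The following is a minimal sketch: Box is a hypothetical stand-in for Integer (it is not part of the JDK), and the names PrivateFieldDemo and overwrite() are made up for this illustration.

```java
import java.lang.reflect.Field;

final class PrivateFieldDemo {

  /** Hypothetical stand-in for Integer: a wrapper with a private final field. */
  static final class Box {
    private final int value;

    Box(int value) {
      this.value = value;
    }

    int intValue() {
      return value;
    }
  }

  /** Overwrites the private final field "value" of the given Box via reflection. */
  static void overwrite(Box box, int newValue) {
    try {
      Field field = Box.class.getDeclaredField("value");
      field.setAccessible(true); // suppresses the access checks for this Field object
      field.setInt(box, newValue);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Box two = new Box(2);
    overwrite(two, 3);
    System.out.println(two.intValue()); // prints 3
  }
}
```

Every piece of code that holds a reference to the same Box instance now sees the new value, which is exactly the effect we exploit with the cached Integer object.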

(*The documentation of Integer.valueOf() guarantees caching for the range -128 to 127; values outside this range may be cached as well. On HotSpot, you can raise the upper bound of the cache with -XX:AutoBoxCacheMax.)

Deep Reflection with Strings

You can do the same with Strings. The following examples work with Java 9 or higher. An adaptation for older versions follows below.

import java.lang.reflect.Field;

public class ImpossibleThings4 {
  static {
    try {
      Field VALUE = String.class.getDeclaredField("value");
      VALUE.setAccessible(true);
      VALUE.set("Hello world", "You have been hacked".getBytes());
    } catch (ReflectiveOperationException e) {
      throw new Error(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("Hello world");
  }
}

The output of this program is "You have been hacked". Here's the proof:

Screenshot showing the output of "You have been hacked"

However, it is not always as simple as it seems in this example. To what extent we can manipulate strings with deep reflection depends on three factors:

whether Strings exist as constants or are created at runtime,
whether Strings contain special characters that cannot be encoded as Latin-1,
which Java version we use.

Strings must be constants

First of all, the Strings must be compile-time constants. Only constants with identical content are interned, i.e., resolved to the same String object.

The following still works:

public class ImpossibleThings5 {
  static { ... }

  public static void main(String[] args) {
    System.out.println("Hello" + " " + "world");
  }
}

Here the compiler already concatenates the three parts into a single String – so that, at runtime, this is the same String as the one whose value content we change.

The following, however, does not work:

public class ImpossibleThings6 {
  static { ... }

  public static void main(String[] args) {
    System.out.println("Hello " + getName());
  }

  private static String getName() {
    return "world";
  }
}

Here, "Hello " and "world" are concatenated only at runtime. The concatenation creates a new String object with value containing "Hello world".

Comparing object identities

It gets clearer if we look at the identities of the String objects. Looking at the first String example again:

import java.lang.reflect.Field;

public class ImpossibleThings4WithIdentity {
  static {
    try {
      Field VALUE = String.class.getDeclaredField("value");
      VALUE.setAccessible(true);
      String s1 = "Hello world";
      System.out.println("identityHashCode(s1) = " + System.identityHashCode(s1));
      VALUE.set(s1, "You have been hacked".getBytes());
    } catch (ReflectiveOperationException e) {
      throw new Error(e);
    }
  }

  public static void main(String[] args) {
    String s2 = "Hello world";
    System.out.println("identityHashCode(s2) = " + System.identityHashCode(s2));
    System.out.println(s2);
  }
}

The output is:

Screenshot displaying the object identities

The String object s1, which we modify in the static initializer, is, therefore, identical* to the String object s2, which we print out in the main method. So we print out precisely the String that we have changed using deep reflection.

Let's check the same for the String we concatenated from String constants in the source code:

public class ImpossibleThings5WithIdentity {
  static { ... }

  public static void main(String[] args) {
    String s2 = "Hello" + " " + "world";
    System.out.println("identityHashCode(s2) = " + System.identityHashCode(s2));
    System.out.println(s2);
  }
}

We see the following output:

Also in this example, the String objects s1 and s2 are identical.*

And finally, we check the object identities in the third variant, where the getName() method returns part of the String:

public class ImpossibleThings6WithIdentity {
  static { ... }

  public static void main(String[] args) {
    String s2 = "Hello " + getName();
    System.out.println("identityHashCode(s2) = " + System.identityHashCode(s2));
    System.out.println(s2);
  }

  private static String getName() {
    return "world";
  }
}

Here is the output of the third test:

So we have confirmed that s1 and s2 are two different String objects. Therefore, changing s1 by reflection does not affect s2.

(* Two non-identical objects could also have the same identity hash code. We would still have to check the identity with s1 == s2. However, the probability is minimal, so for our examples, comparing hash codes is sufficient.)
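
For a check that does not rely on hash codes, the three variants can be compared directly with ==. The following sketch does this without any reflection. The class and method names are my own, and the intern() call at the end is an addition that does not appear in the examples above.

```java
final class StringIdentityDemo {

  // A method call is not a compile-time constant,
  // so any concatenation with its result happens at runtime.
  static String getName() {
    return "world";
  }

  /** The compiler folds "Hello" + " " + "world" into the single constant "Hello world". */
  static boolean compileTimeConcatIsSameObject() {
    String s1 = "Hello world";
    String s2 = "Hello" + " " + "world";
    return s1 == s2;
  }

  /** Concatenation at runtime creates a new String object. */
  static boolean runtimeConcatIsSameObject() {
    String s1 = "Hello world";
    String s2 = "Hello " + getName();
    return s1 == s2;
  }

  /** intern() returns the canonical instance from the String pool. */
  static boolean internedRuntimeConcatIsSameObject() {
    String s1 = "Hello world";
    String s2 = ("Hello " + getName()).intern();
    return s1 == s2;
  }

  public static void main(String[] args) {
    System.out.println(compileTimeConcatIsSameObject());     // prints true
    System.out.println(runtimeConcatIsSameObject());         // prints false
    System.out.println(internedRuntimeConcatIsSameObject()); // prints true
  }
}
```

The last case shows that a String created at runtime can still end up as the hacked instance if some code interns it.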

String representation: Latin-1 vs. UTF-16

If we slightly modify the first String example, we get a rather unexpected result. Let's change the String we're going to print from "Hello world" to "Hello world ✓" (with a checkmark at the end):

import java.lang.reflect.Field;

public class ImpossibleThings7 {
  static {
    try {
      Field VALUE = String.class.getDeclaredField("value");
      VALUE.setAccessible(true);
      VALUE.set("Hello world ✓", "You have been hacked".getBytes());
    } catch (ReflectiveOperationException e) {
      throw new Error(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("Hello world ✓");
  }
}

What will the code print out now? What do you think? (We are still at Java 9 or higher.)

"Hello world ✓"
"You have been hacked"
"You have been hacked ✓"
"潙⁵慨敶戠敥⁮慨正摥"

You can find the answer in the following screenshot:

How can this be explained?

For an explanation, we have to look at the internal representation of a String. Since Java 9, a String stores its characters internally in a byte[]. How the characters are encoded into bytes depends on whether the String contains only Latin-1-encodable characters or others as well. If the String contains only characters that can be encoded in Latin-1, exactly one byte is used per character. However, if the String also contains other characters, the whole String is encoded as UTF-16.

This feature is called "Compact Strings", is defined in JEP 254, and is activated by default. You can deactivate it with the VM option -XX:-CompactStrings, in which case Strings are always stored as UTF-16.
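
To estimate which of the two representations a given String would get, you can ask the Latin-1 charset whether it can encode every character. This is only a sketch from the outside; the JVM makes the decision internally, and the names Latin1Check and isLatin1Encodable() are my own.

```java
import java.nio.charset.StandardCharsets;

final class Latin1Check {

  /** Returns true if every character of s can be encoded in ISO-8859-1 (Latin-1). */
  static boolean isLatin1Encodable(String s) {
    return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
  }

  public static void main(String[] args) {
    System.out.println(isLatin1Encodable("Hello world"));        // prints true
    System.out.println(isLatin1Encodable("Hello world \u2713")); // prints false (U+2713 is the checkmark)
  }
}
```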

What does that mean for our example?

The String "Hello world" is represented by the following bytes:

48 65 6c 6c 6f 20 77 6f 72 6c 64
^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^
H  e  l  l  o     w  o  r  l  d

The string "Hello world ✓" is stored as follows:

48 00 65 00 6c 00 6c 00 6f 00 20 00 77 00 6f 00 72 00 6c 00 64 00 20 00 13 27
^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^
  H     e     l     l     o           w     o     r     l     d           ✓

(Here in little-endian format since I work on an Intel system.)
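
The two byte sequences can be reproduced with the public API by encoding the Strings explicitly. The sketch below does not read the internal value field; it merely encodes with the charsets that correspond to the two layouts shown above. The class and method names are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

final class EncodingHexDump {

  /** Formats the bytes as lowercase hex values separated by spaces. */
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  /** One byte per character, as for Strings that are Latin-1 encodable. */
  static String latin1Hex(String s) {
    return hex(s.getBytes(StandardCharsets.ISO_8859_1));
  }

  /** Two bytes per character, low byte first. */
  static String utf16LittleEndianHex(String s) {
    return hex(s.getBytes(StandardCharsets.UTF_16LE));
  }

  public static void main(String[] args) {
    System.out.println(latin1Hex("Hello world"));
    System.out.println(utf16LittleEndianHex("Hello world \u2713")); // \u2713 is the checkmark
  }
}
```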

The information on how the String is encoded is stored in a field called coder. 0 stands for Latin-1, and 1 for UTF-16.

In the string "Hello world ✓" the field coder, therefore, contains the value 1 due to the UTF-16 encoding.

In the previous code example, we set the value field of the string "Hello world ✓" to "You have been hacked".getBytes(). The method getBytes() returns the bytes in the platform's default character encoding. Since Java 18 (JEP 400), this is UTF-8 unless overridden via the system property "file.encoding"; in earlier versions, it depends on the operating system and locale (UTF-8 on most Linux and macOS systems).

Since the string "You have been hacked" does not contain any special characters, its UTF-8 encoding is identical to its Latin-1 encoding, so it occupies exactly one byte per character.
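
You can verify this claim, and see where it stops holding, by comparing the two encodings directly. The following is a small sketch with names of my own choosing; it always passes the charset explicitly instead of relying on the default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CharsetComparison {

  /** Returns true if s is encoded to the same bytes in UTF-8 and in Latin-1. */
  static boolean sameInUtf8AndLatin1(String s) {
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
    byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
    return Arrays.equals(utf8, latin1);
  }

  public static void main(String[] args) {
    System.out.println("Default charset: " + Charset.defaultCharset());
    System.out.println(sameInUtf8AndLatin1("You have been hacked")); // prints true
    System.out.println(sameInUtf8AndLatin1("caf\u00e9"));            // prints false
  }
}
```

As soon as a character outside the ASCII range is involved, UTF-8 needs more than one byte for it, and the two encodings diverge.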

The String "Hello world ✓" thus contains the following byte sequence in its value field:

59 6f 75 20 68 61 76 65 20 62 65 65 6e 20 68 61 63 6b 65 64
^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^
Y  o  u     h  a  v  e     b  e  e  n     h  a  c  k  e  d

Since the coder field of "Hello world ✓" still contains 1 (because of the original UTF-16 encoding), the byte array is interpreted as UTF-16, and that is what leads to the output of Chinese characters.

Roughly speaking, we have done the following:

byte[] bytes = "You have been hacked".getBytes(StandardCharsets.UTF_8);
String string = new String(bytes, StandardCharsets.UTF_16LE); // little-endian, as in the byte layout above
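
Here is the same idea as a complete, runnable sketch. Note one assumption: it decodes with UTF_16LE, because Java's plain UTF_16 charset assumes big-endian byte order when there is no byte-order mark, whereas the byte layout shown above is little-endian. The class and method names are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

final class MisdecodeDemo {

  /** Encodes s as UTF-8 and then (wrongly) decodes the bytes as little-endian UTF-16. */
  static String decodeAsUtf16LittleEndian(String s) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    return new String(bytes, StandardCharsets.UTF_16LE);
  }

  public static void main(String[] args) {
    String garbled = decodeAsUtf16LittleEndian("You have been hacked");
    // Each pair of ASCII bytes is combined into one UTF-16 code unit,
    // so 20 bytes become 10 characters.
    System.out.println(garbled.length()); // prints 10
    System.out.println(garbled);
  }
}
```

The first two bytes, 59 and 6f, are combined into the code unit U+6F59, which is the first of the Chinese characters in the output.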

How can we solve the problem?

Pretty simple: we also have to copy the content of coder. While we're at it, we also change how we copy value: instead of calling getBytes() on the String "You have been hacked", we read its value field directly. So far, getBytes() has matched the internal byte array only by chance: "You have been hacked" contains only ASCII characters, and the default character encoding (at least for me and most likely for you) is ASCII-compatible.

import java.lang.reflect.Field;

public class ImpossibleThings8 {
  static {
    try {
      Field VALUE = String.class.getDeclaredField("value");
      VALUE.setAccessible(true);

      Field CODER = String.class.getDeclaredField("coder");
      CODER.setAccessible(true);

      VALUE.set("Hello world ✓", VALUE.get("You have been hacked"));
      CODER.set("Hello world ✓", CODER.get("You have been hacked"));
    } catch (ReflectiveOperationException e) {
      throw new Error(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("Hello world ✓");
  }
}

Instead of Chinese characters, we now see "You have been hacked" again:

Screenshot displaying "You have been hacked" instead of Chinese characters

Specifying String constants twice is not a beautiful thing to do. We solve this by extracting the code into a method and passing the two Strings as parameters:

import java.lang.reflect.Field;

public class StringHacker_Java9 {
  public static void hackString(String victim, String replacement) {
    try {
      Field VALUE = String.class.getDeclaredField("value");
      VALUE.setAccessible(true);

      Field CODER = String.class.getDeclaredField("coder");
      CODER.setAccessible(true);

      VALUE.set(victim, VALUE.get(replacement));
      CODER.set(victim, CODER.get(replacement));
    } catch (ReflectiveOperationException e) {
      throw new Error(e);
    }
  }
}

Next, we have to take a look at older Java versions.

String representation: byte[] vs. char[]

As mentioned in the introduction of this section, the examples only work with Java 9 or higher. The reason: up to Java 8, the value of a String was not stored in a byte[] but in a char[]. Accordingly, the field coder did not exist up to Java 8.

If we started the previous examples with Java 8,

the call VALUE.set(…, "…".getBytes()) would throw an IllegalArgumentException: Can not set final [C field java.lang.String.value to [B.
in the last two examples (where we do not explicitly set a byte array, but copy the contents of value), the subsequent call to String.class.getDeclaredField("coder") would throw a NoSuchFieldException: coder.

We have already eliminated the IllegalArgumentException in the last two examples. And we can simply ignore the NoSuchFieldException – if the field coder does not exist, we do not need to copy it:

import java.lang.reflect.Field;

public class StringHacker_Java7 {
  public static void hackString(String from, String to) {
    try {
      Field VALUE = String.class.getDeclaredField("value");
      VALUE.setAccessible(true);
      VALUE.set(from, VALUE.get(to));

      // For "Compact Strings" introduced in Java 9
      try {
        Field CODER = String.class.getDeclaredField("coder");
        CODER.setAccessible(true);
        CODER.set(from, CODER.get(to));
      } catch (NoSuchFieldException e) {
        // Ignore
      }
    } catch (ReflectiveOperationException e) {
      throw new Error(e);
    }
  }
}

Here is proof that this code also runs under Java 7:

Substrings with offset and count

If we go back further in the history of Java, we encounter another change of the String internals from Java 6 to Java 7. Up to Java 6, the value character array was reused if you created a substring with String.substring().

For this operation, the character array of the original String was transferred unchanged into the substring. And the fields offset and count of the substring indicated the section of the character array representing its content.

The goal of this logic was to reduce memory consumption.

More often, however, the opposite happened: when the original String was no longer needed, the shorter substring still held a reference to the original character array, which was now unnecessarily long and could not be garbage-collected. For this reason, the Java developers changed String.substring() in Java 7 (as of update 6) so that only the required part of the character array is copied into the substring.
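
The difference between the two strategies can be illustrated with a simplified model. Please note that SharedArrayString is a hypothetical class written for this article, not the JDK source. It only mirrors the three fields value, offset, and count described above.

```java
import java.util.Arrays;

/** Hypothetical, simplified model of the old String layout (not the JDK source). */
final class SharedArrayString {
  final char[] value; // backing array, possibly shared with other instances
  final int offset;   // index of the first character belonging to this string
  final int count;    // number of characters belonging to this string

  SharedArrayString(char[] value, int offset, int count) {
    this.value = value;
    this.offset = offset;
    this.count = count;
  }

  static SharedArrayString of(String s) {
    return new SharedArrayString(s.toCharArray(), 0, s.length());
  }

  /** Old strategy: reuse the backing array and only adjust offset and count. */
  SharedArrayString sharingSubstring(int begin, int end) {
    return new SharedArrayString(value, offset + begin, end - begin);
  }

  /** New strategy: copy only the required characters into a fresh array. */
  SharedArrayString copyingSubstring(int begin, int end) {
    char[] copy = Arrays.copyOfRange(value, offset + begin, offset + end);
    return new SharedArrayString(copy, 0, copy.length);
  }

  @Override
  public String toString() {
    return new String(value, offset, count);
  }

  public static void main(String[] args) {
    SharedArrayString hello = of("Hello world");
    SharedArrayString shared = hello.sharingSubstring(6, 11);
    SharedArrayString copied = hello.copyingSubstring(6, 11);
    System.out.println(shared + " uses an array of length " + shared.value.length); // 11
    System.out.println(copied + " uses an array of length " + copied.value.length); // 5
  }
}
```

In the sharing variant, the five-character substring keeps the full eleven-character array alive, which is precisely the memory problem described above.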

Therefore, to make our code run on Java 6 and lower, we also need to copy the offset and count fields.

Before Java 7, there was neither the ReflectiveOperationException nor the possibility to catch several exception types in one catch block. That makes the catch block a bit verbose. Here is the code that also runs under Java 6:

import java.lang.reflect.Field;

public class StringHacker {
  public static void hackString(String from, String to) {
    try {
      Field VALUE = String.class.getDeclaredField("value");
      VALUE.setAccessible(true);
      VALUE.set(from, VALUE.get(to));

      // "offset" and "count" for Strings up to Java 6
      try {
        Field OFFSET = String.class.getDeclaredField("offset");
        OFFSET.setAccessible(true);
        OFFSET.setInt(from, OFFSET.getInt(to));

        Field COUNT = String.class.getDeclaredField("count");
        COUNT.setAccessible(true);
        COUNT.setInt(from, COUNT.getInt(to));
      } catch (NoSuchFieldException e) {
        // Ignore
      }

      // For "Compact Strings" introduced in Java 9
      try {
        Field CODER = String.class.getDeclaredField("coder");
        CODER.setAccessible(true);
        CODER.set(from, CODER.get(to));
      } catch (NoSuchFieldException e) {
        // Ignore
      }
    } catch (IllegalAccessException e) {
      e.printStackTrace();
    } catch (NoSuchFieldException e) {
      e.printStackTrace();
    }
  }
}

The following screenshot shows the code running on Java 6:

Experiment "Compressed Strings" in Java 6u21

The article would not be complete if I did not briefly mention the "Compressed Strings" introduced as "experimental" in Java 6 (not to be confused with the aforementioned "Compact Strings" introduced in Java 9).

If enabled with the -XX:+UseCompressedStrings VM option, then, if a String contains only Latin-1 characters, the value field stores a byte array instead of a character array. However, this was not done in the String source code, but internally in the JVM. This optimization saved memory but was very inefficient because the byte array had to be converted to a character array for almost all string operations. In Java 7, the developers removed this feature again.

Since this optimization was done JVM-internally, our code also works with activated "Compressed Strings" without further adjustment:

String deep reflection with Java 6 and "-XX:+UseCompressedStrings"

Conclusion

In practice, you should refrain from changing the internal values of cached Integer or String objects. This approach could have unforeseeable consequences. Such a modification affects not only your own code but also the rest of the project, including all libraries and frameworks loaded by the same classloader.

Also, you should not rely on the internal representation of a class. As shown in the String example, this can change from one Java version to the next.

Furthermore, since Java 9, running the code examples from this article produces a warning:

An illegal reflective access operation has occurred
[...]
All illegal access operations will be denied in a future release

This means we must not assume that our code will work forever. In Java 14 (release candidate) and 15 (early access), the code still works. Since Java 16 (JEP 396), however, the JDK's internals are strongly encapsulated by default, and setAccessible(true) on these fields throws an InaccessibleObjectException unless the package is opened explicitly, e.g., with --add-opens java.base/java.lang=ALL-UNNAMED.
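
To find out whether the JVM you are running on permits deep reflection into String or Integer, a small probe like the following could help. It is a sketch under the assumption that you run Java 9 or higher, since it relies on trySetAccessible(), which was added in that release. The class names DeepReflectionCheck and Sample are made up for this illustration.

```java
import java.lang.reflect.Field;

final class DeepReflectionCheck {

  /** Hypothetical class of our own, used for comparison. */
  static final class Sample {
    private int secret = 42;
  }

  /**
   * Returns true if the named field exists and could be made accessible.
   * trySetAccessible() returns false instead of throwing when access is denied.
   */
  static boolean canOpen(Class<?> type, String fieldName) {
    try {
      Field field = type.getDeclaredField(fieldName);
      return field.trySetAccessible();
    } catch (NoSuchFieldException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("String.value:  " + canOpen(String.class, "value"));
    System.out.println("Integer.value: " + canOpen(Integer.class, "value"));
    System.out.println("Sample.secret: " + canOpen(Sample.class, "secret"));
  }
}
```

On JVMs that strongly encapsulate java.lang, the first two lines should print false unless the package has been opened, for example with --add-opens java.base/java.lang=ALL-UNNAMED. The field of our own class can be opened in any case.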