I am currently reading the book "The Pragmatic Programmer" by Andrew Hunt and David Thomas. In this book, the authors give the following task:
Which of these "impossible" things can happen?
[…]
3. In C++: a = 2; b = 3; if (a + b != 5) exit(1);
[…]
One of the correct answers is 3. In C++, there are several reasons why the condition "a + b != 5" could be met:
- Operator overloading: you can have, for example, the '+' operator perform a multiplication.
- Variable aliasing: b is an alias for a, so the assignment b = 3 also sets a to 3, making the sum 6.
Since neither of these is available in Java, I was wondering: Can I write the same code in Java so that the condition is met? The answer is yes. You will learn how this is possible in this article.
You can find the article's code examples in my GitHub repository.
2 + 3 = 5
Let's start very simply: with a main method in which two primitive ints, a and b, are declared, followed by the code we want to hack:
import static java.lang.System.exit;

public class ImpossibleThings1 {
    public static void main(String[] args) {
        int a, b;
        a = 2; b = 3; if (a + b != 5) exit(1);
    }
}
Of course, in this example, a + b = 5, so the program ends regularly, i.e., with exit code 0.
2 + 3 = 6: Deep Reflection with Integer
It does not take many changes to make the condition true (that 2 + 3 is not 5) and have the program end with exit code 1:
import static java.lang.System.exit;

import java.lang.reflect.Field;

public class ImpossibleThings2 {
    static {
        try {
            // Overwrite the private "value" field of the cached Integer for 2
            Field VALUE = Integer.class.getDeclaredField("value");
            VALUE.setAccessible(true);
            VALUE.set(2, 3);
        } catch (ReflectiveOperationException e) {
            throw new Error(e);
        }
    }

    public static void main(String[] args) {
        Integer a, b;
        a = 2; b = 3; if (a + b != 5) exit(1);
    }
}
What have we done? We use Integer instead of int and make extensive use of autoboxing and unboxing. In the next section, I will describe what exactly is going on here.
Auto(un)boxing uncovered
In the following example, I have replaced autoboxing and unboxing with explicit boxing and unboxing. This makes it clearer what happens.

import static java.lang.System.exit;

import java.lang.reflect.Field;

public class ImpossibleThings3 {
    static {
        try {
            Field VALUE = Integer.class.getDeclaredField("value");
            VALUE.setAccessible(true);
            Integer two = Integer.valueOf(2);
            VALUE.set(two, 3);
        } catch (ReflectiveOperationException e) {
            throw new Error(e);
        }
    }

    public static void main(String[] args) {
        Integer a, b;
        a = Integer.valueOf(2); b = Integer.valueOf(3); if (a.intValue() + b.intValue() != 5) exit(1);
    }
}
Integer.valueOf() returns cached Integer instances for the values -128 to 127.*
An Integer object stores the actual value in a private field called value. You can retrieve that value via intValue().
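The caching can be observed without any reflection. The following is a minimal sketch; the class name IntegerCacheDemo and the helper boxesToSameObject are made up for this illustration. It boxes the same int twice and compares the two references:

```java
class IntegerCacheDemo {

    // True when boxing the same int twice yields the very same object.
    static boolean boxesToSameObject(int i) {
        Integer first = Integer.valueOf(i);
        Integer second = Integer.valueOf(i);
        return first == second; // reference comparison on purpose
    }

    public static void main(String[] args) {
        System.out.println("2    -> " + boxesToSameObject(2));
        System.out.println("127  -> " + boxesToSameObject(127));
        // Outside -128..127 the result depends on the JVM and its settings.
        System.out.println("1000 -> " + boxesToSameObject(1000));
    }
}
```

For values inside the cached range, both calls hand out one shared object, which is exactly why changing that object affects every piece of code that boxes the same number.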
In the static initializer, we get the cached Integer object for the number 2. Using deep reflection, we set its value to the number 3. Since value is a private field, we must first allow access to it with Field.setAccessible(true).
If we now printed this object with System.out.println(two), we would see "3".
In the main method, a = 2 is boxed to a = Integer.valueOf(2), which, in turn, returns the same cached Integer instance as two, whose value is now 3. b is also 3, so a + b gives 6, which is known to be unequal to 5 (unless the 5 has also been "hacked" … which, as far as I know, is not possible with an int primitive).
(* The Java Language Specification and the Javadoc of Integer.valueOf() guarantee this caching for the values -128 to 127. You can raise the upper bound of the cached range with the VM option -XX:AutoBoxCacheMax.)
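The reflective write from the static initializer can be tried in isolation on a class of our own, where no JDK internals are involved. This is only a sketch under that assumption: Box is a hypothetical stand-in for Integer, and DeepReflectionDemo and overwrite are names invented for this example.

```java
import java.lang.reflect.Field;

class DeepReflectionDemo {

    // Hypothetical stand-in for Integer: the payload lives in a private final field.
    static final class Box {
        private final int value;

        Box(int value) {
            this.value = value;
        }

        int intValue() {
            return value;
        }
    }

    // Overwrites the private final field, like the static initializer does for Integer.
    static void overwrite(Box box, int newValue) throws ReflectiveOperationException {
        Field field = Box.class.getDeclaredField("value");
        field.setAccessible(true); // required: the field is private and final
        field.setInt(box, newValue);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Box box = new Box(2);
        overwrite(box, 3);
        System.out.println(box.intValue());
    }
}
```

Because Box belongs to our own code, this variant changes nothing outside the example, unlike the manipulation of the shared Integer cache.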
Deep Reflection with Strings
You can do the same with Strings. The following examples work with Java 9 or higher. An adaptation for older versions follows below.
import java.lang.reflect.Field;

public class ImpossibleThings4 {
    static {
        try {
            Field VALUE = String.class.getDeclaredField("value");
            VALUE.setAccessible(true);
            VALUE.set("Hello world", "You have been hacked".getBytes());
        } catch (ReflectiveOperationException e) {
            throw new Error(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Hello world");
    }
}
The output of this program is "You have been hacked".
However, it is not always as simple as it seems in this example. To what extent we can manipulate strings with deep reflection depends on three factors:
- whether Strings exist as constants or are created at runtime,
- whether Strings contain special characters that cannot be encoded as Latin-1,
- which Java version we use.
Strings must be constants
First of all, the Strings must be defined as constants. Only String constants with the same content are resolved to the same String object.
The following still works:
public class ImpossibleThings5 {
    static { ... }

    public static void main(String[] args) {
        System.out.println("Hello" + " " + "world");
    }
}
Here, the compiler already concatenates the three parts into a single String, so at runtime, this is the same String as the one whose value content we change.
The following, however, does not work:
public class ImpossibleThings6 {
    static { ... }

    public static void main(String[] args) {
        System.out.println("Hello " + getName());
    }

    private static String getName() {
        return "world";
    }
}
Here, "Hello " and "world" are concatenated only at runtime. The concatenation creates a new String object with value containing "Hello world".
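The difference between the two variants can be made visible with a plain reference comparison, without touching any internals. A small sketch follows; StringIdentityDemo and its method names are made up for this illustration.

```java
class StringIdentityDemo {

    static String getName() {
        return "world"; // a method call is not a constant expression
    }

    // Constant expression: the compiler folds it into the single literal "Hello world".
    static boolean constantConcatIsSameObject() {
        return "Hello world" == "Hello" + " " + "world";
    }

    // Runtime concatenation: a new String object with equal content.
    static boolean runtimeConcatIsSameObject() {
        return "Hello world" == "Hello " + getName();
    }

    // intern() maps the runtime String back to the pooled constant.
    static boolean internedConcatIsSameObject() {
        return "Hello world" == ("Hello " + getName()).intern();
    }

    public static void main(String[] args) {
        System.out.println("constant concat: " + constantConcatIsSameObject());
        System.out.println("runtime concat:  " + runtimeConcatIsSameObject());
        System.out.println("interned concat: " + internedConcatIsSameObject());
    }
}
```

Only the runtime concatenation yields a different object, and that object is the one the reflection hack never touches.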
Comparing object identities
It gets clearer if we look at the identities of the String objects. Looking at the first String example again:
import java.lang.reflect.Field;

public class ImpossibleThings4WithIdentity {
    static {
        try {
            Field VALUE = String.class.getDeclaredField("value");
            VALUE.setAccessible(true);
            String s1 = "Hello world";
            System.out.println("identityHashCode(s1) = " + System.identityHashCode(s1));
            VALUE.set(s1, "You have been hacked".getBytes());
        } catch (ReflectiveOperationException e) {
            throw new Error(e);
        }
    }

    public static void main(String[] args) {
        String s2 = "Hello world";
        System.out.println("identityHashCode(s2) = " + System.identityHashCode(s2));
        System.out.println(s2);
    }
}
The output shows the same identity hash code for s1 and s2. The String object s1, which we modify in the static initializer, is therefore identical* to the String object s2, which we print out in the main method. So we print out precisely the String that we have changed using deep reflection.
Let's check the same for the String we concatenated from String constants in the source code:
public class ImpossibleThings5WithIdentity {
    static { ... }

    public static void main(String[] args) {
        String s2 = "Hello" + " " + "world";
        System.out.println("identityHashCode(s2) = " + System.identityHashCode(s2));
        System.out.println(s2);
    }
}
The output again shows the same identity hash code for both: in this example, too, the String objects s1 and s2 are identical.*
And finally, we check the object identities in the third variant, where the getName() method returns part of the String:
public class ImpossibleThings6WithIdentity {
    static { ... }

    public static void main(String[] args) {
        String s2 = "Hello " + getName();
        System.out.println("identityHashCode(s2) = " + System.identityHashCode(s2));
        System.out.println(s2);
    }

    private static String getName() {
        return "world";
    }
}
In the third test, the output shows two different identity hash codes. So we have confirmed that s1 and s2 are two different String objects. Therefore, changing s1 by reflection does not affect s2.
(* Two non-identical objects could also have the same identity hash code. Strictly speaking, we would have to check the identity with s1 == s2. However, the probability of a collision is minimal, so for our examples, comparing hash codes is sufficient.)
String representation: Latin-1 vs. UTF-16
If we slightly modify the first String example, we get a rather unexpected result. Let's change the String we're going to print from "Hello world" to "Hello world ✓" (with a checkmark at the end):
import java.lang.reflect.Field;

public class ImpossibleThings7 {
    static {
        try {
            Field VALUE = String.class.getDeclaredField("value");
            VALUE.setAccessible(true);
            VALUE.set("Hello world ✓", "You have been hacked".getBytes());
        } catch (ReflectiveOperationException e) {
            throw new Error(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Hello world ✓");
    }
}
What will the code print out now? What do you think? (We are still at Java 9 or higher.)
- "Hello world ✓"
- "You have been hacked"
- "You have been hacked ✓"
- "潙⁵慨敶戠敥慨正摥"
The answer is the last option: the program prints the Chinese characters.
How can this be explained?
For an explanation, we have to look at the internal representation of a String. Since Java 9, a String stores its characters in a byte[]. The way characters are encoded into bytes depends on whether the String contains only Latin-1-encodable characters or others as well. If the String contains only characters that can be encoded in Latin-1, exactly one byte is used per character. However, if the String also contains other characters, it is encoded as UTF-16.
This feature is called "String Compaction", is defined in JEP 254, and is activated by default. You can deactivate it with the VM option -XX:-CompactStrings, in which case Strings are always stored as UTF-16.
What does that mean for our example?
- The String "Hello world" is represented by the following bytes:

48 65 6c 6c 6f 20 77 6f 72 6c 64
^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^
H  e  l  l  o     w  o  r  l  d

- The String "Hello world ✓" is stored as follows:

48 00 65 00 6c 00 6c 00 6f 00 20 00 77 00 6f 00 72 00 6c 00 64 00 20 00 13 27
^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^
H     e     l     l     o           w     o     r     l     d           ✓
(Here in little-endian format since I work on an Intel system.)
The information on how the String is encoded is stored in a field called coder. 0 stands for Latin-1, and 1 for UTF-16.
In the String "Hello world ✓", the field coder therefore contains the value 1 due to the UTF-16 encoding.
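The two byte layouts can be reproduced with the public API by encoding the Strings explicitly. Note that this is only a sketch: it does not read the internal value field, it merely encodes the same characters as Latin-1 and as little-endian UTF-16. The names StringBytesDemo, hex, latin1Hex, and utf16LeHex are invented for this example.

```java
import java.nio.charset.StandardCharsets;

class StringBytesDemo {

    // Formats a byte array as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // One byte per character, as used when coder is 0.
    static String latin1Hex(String s) {
        return hex(s.getBytes(StandardCharsets.ISO_8859_1));
    }

    // Two bytes per character in little-endian order, as used on Intel when coder is 1.
    static String utf16LeHex(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_16LE));
    }

    public static void main(String[] args) {
        System.out.println(latin1Hex("Hello world"));
        System.out.println(utf16LeHex("Hello world \u2713")); // \u2713 is the checkmark
    }
}
```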
In the previous code example, we set the value field of the String "Hello world ✓" to "You have been hacked".getBytes(). The method getBytes() returns the bytes in the default character encoding. Since Java 18 (JEP 400), this is UTF-8 unless overridden by the system property "file.encoding"; in older versions, the default depends on the operating system and locale (it is UTF-8 on most Linux and macOS systems, for example).
Since the String "You have been hacked" contains only ASCII characters, its UTF-8 encoding is identical to its Latin-1 encoding, so it occupies exactly one byte per character.
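Whether a given String has the same bytes in both encodings is easy to check. The following sketch prints the default charset of the running JVM and compares the Latin-1 and UTF-8 bytes; DefaultCharsetDemo and sameBytesInLatin1AndUtf8 are names made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DefaultCharsetDemo {

    // True if the String encodes to identical bytes in Latin-1 and UTF-8.
    static boolean sameBytesInLatin1AndUtf8(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(latin1, utf8);
    }

    public static void main(String[] args) {
        // The charset that getBytes() without arguments would use on this JVM
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println(sameBytesInLatin1AndUtf8("You have been hacked"));
        System.out.println(sameBytesInLatin1AndUtf8("Hello world \u2713"));
    }
}
```

Passing an explicit charset to getBytes() avoids the dependency on the default altogether.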
The String "Hello world ✓" thus contains the following byte sequence in its value field:

59 6f 75 20 68 61 76 65 20 62 65 65 6e 20 68 61 63 6b 65 64
^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^
Y  o  u     h  a  v  e     b  e  e  n     h  a  c  k  e  d
Since the coder field of "Hello world ✓" still contains a 1 (because of the initial UTF-16 encoding), the byte array is interpreted as UTF-16, and that's what leads to the output of the Chinese characters.
Roughly speaking, we have done the following:
byte[] bytes = "You have been hacked".getBytes(StandardCharsets.UTF_8);
// Little-endian, matching the byte order shown above (plain UTF_16 would default to big-endian)
String string = new String(bytes, StandardCharsets.UTF_16LE);
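The byte order matters for which characters come out. The following sketch decodes the 20 single-byte characters as UTF-16 in both byte orders and prints the resulting code units; MisreadBytesDemo and decode are hypothetical names for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MisreadBytesDemo {

    // Encodes the text with one byte per character and decodes the result as UTF-16.
    static String decode(boolean littleEndian) {
        byte[] bytes = "You have been hacked".getBytes(StandardCharsets.ISO_8859_1);
        Charset utf16 = littleEndian ? StandardCharsets.UTF_16LE : StandardCharsets.UTF_16BE;
        return new String(bytes, utf16);
    }

    // Lists the UTF-16 code units of a String as hex numbers.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("little-endian: " + codeUnits(decode(true)));
        System.out.println("big-endian:    " + codeUnits(decode(false)));
    }
}
```

Every pair of bytes becomes one character, so the 20 bytes turn into 10 characters. In little-endian order, the first pair 59 6f becomes U+6F59, the first character of the garbled output above.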
How can we solve the problem?
Pretty simple: we also have to copy the content of coder. While we are at it, we also change the copying of value so that we read the corresponding field from the String "You have been hacked" instead of calling its getBytes() method. Up to now, this method has delivered a byte array matching the internal representation purely by chance. It worked because "You have been hacked" contains only ASCII characters, and the default character encoding stores each of them in a single byte (at least for me and most likely for you).
import java.lang.reflect.Field;

public class ImpossibleThings8 {
    static {
        try {
            Field VALUE = String.class.getDeclaredField("value");
            VALUE.setAccessible(true);
            Field CODER = String.class.getDeclaredField("coder");
            CODER.setAccessible(true);
            VALUE.set("Hello world ✓", VALUE.get("You have been hacked"));
            CODER.set("Hello world ✓", CODER.get("You have been hacked"));
        } catch (ReflectiveOperationException e) {
            throw new Error(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Hello world ✓");
    }
}
Instead of Chinese characters, we now see "You have been hacked" again.
Specifying the String constants twice is not elegant. We solve this by extracting the code into a method and passing the two Strings as parameters:
import java.lang.reflect.Field;

public class StringHacker_Java9 {
    public static void hackString(String victim, String replacement) {
        try {
            Field VALUE = String.class.getDeclaredField("value");
            VALUE.setAccessible(true);
            Field CODER = String.class.getDeclaredField("coder");
            CODER.setAccessible(true);
            VALUE.set(victim, VALUE.get(replacement));
            CODER.set(victim, CODER.get(replacement));
        } catch (ReflectiveOperationException e) {
            throw new Error(e);
        }
    }
}
Next, we have to take a look at older Java versions.
String representation: byte[] vs. char[]
As mentioned at the beginning of this chapter, the examples only work with Java 9 or higher. The reason: up to Java 8, the value of a String was not stored in a byte[] but in a char[]. Accordingly, the field coder did not exist up to Java 8.
If we started the previous examples with Java 8,
- the call VALUE.set("…".getBytes()) would throw an IllegalArgumentException: Can not set final [C field java.lang.String.value to [B.
- in the last two examples (where we do not explicitly set a byte array, but copy the contents of value), the subsequent call to String.class.getDeclaredField("coder") would throw a NoSuchFieldException: coder.
We have already eliminated the IllegalArgumentException in the last two examples. And we can simply ignore the NoSuchFieldException: if the field coder does not exist, we do not need to copy it:
import java.lang.reflect.Field;

public class StringHacker_Java7 {
    public static void hackString(String from, String to) {
        try {
            Field VALUE = String.class.getDeclaredField("value");
            VALUE.setAccessible(true);
            VALUE.set(from, VALUE.get(to));

            // For "Compact Strings" introduced in Java 9
            try {
                Field CODER = String.class.getDeclaredField("coder");
                CODER.setAccessible(true);
                CODER.set(from, CODER.get(to));
            } catch (NoSuchFieldException e) {
                // Ignore
            }
        } catch (ReflectiveOperationException e) {
            throw new Error(e);
        }
    }
}
This code also runs under Java 7.
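The pattern of treating a missing field as "nothing to copy" can also be written as a small lookup helper. This is only one possible sketch, and it assumes Java 8 or newer because it uses Optional; OptionalFieldDemo and findField are names invented for this example.

```java
import java.lang.reflect.Field;
import java.util.Optional;

class OptionalFieldDemo {

    // Looks up a declared field; a missing field yields an empty Optional instead of an exception.
    static Optional<Field> findField(Class<?> type, String name) {
        try {
            return Optional.of(type.getDeclaredField(name));
        } catch (NoSuchFieldException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Which of these fields exist depends on the Java version running this code.
        for (String name : new String[] {"value", "coder", "offset", "count"}) {
            System.out.println(name + " present: " + findField(String.class, name).isPresent());
        }
    }
}
```

The lookup alone does not require setAccessible(true). Only reading or writing the field does.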
Substrings with offset and count
If we go back further in the history of Java, we encounter another change of the String internals from Java 6 to Java 7. Up to Java 6, the value character array was reused if you created a substring with String.substring().
For this operation, the character array of the original String was transferred unchanged into the substring. And the fields offset and count of the substring indicated the section of the character array representing its content.
The goal of this logic was to reduce memory consumption.
More often, however, the opposite happened: When the original String was no longer needed, the shorter substring still held a reference to the original, unnecessarily long character array, which therefore could not be garbage-collected. For this reason, the Java developers changed String.substring() in Java 7 (more precisely, in update 6) so that only the required part of the character array is copied into the substring.
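The two substring strategies can be modeled with a small class of our own. The following is a simplified sketch and not the JDK source: SharedArrayString, substringShared, and substringCopied are hypothetical names, and only the fields value, offset, and count mirror the names from the old String class.

```java
import java.util.Arrays;

class SharedArrayString {
    final char[] value;
    final int offset;
    final int count;

    SharedArrayString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedArrayString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Java 6 style: the substring points into the original array.
    SharedArrayString substringShared(int begin, int end) {
        return new SharedArrayString(value, offset + begin, end - begin);
    }

    // Java 7 style: the substring gets its own, shorter array.
    SharedArrayString substringCopied(int begin, int end) {
        char[] copy = Arrays.copyOfRange(value, offset + begin, offset + end);
        return new SharedArrayString(copy, 0, copy.length);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedArrayString original = new SharedArrayString("Hello world");
        SharedArrayString shared = original.substringShared(6, 11);
        SharedArrayString copied = original.substringCopied(6, 11);
        System.out.println(shared + " keeps " + shared.value.length + " chars alive");
        System.out.println(copied + " keeps " + copied.value.length + " chars alive");
    }
}
```

In the shared variant, the five-character substring keeps the whole original array reachable. In the copied variant, offset is always 0 and count always equals the array length, which is why the two fields could be dropped.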
Therefore, to make our code run on Java 6 and lower, we also need to copy the offset and count fields.
Before Java 7, there was neither the ReflectiveOperationException nor the possibility to catch several exception types in one catch block. That makes the catch block a bit verbose. Here is the code that also runs under Java 6:
import java.lang.reflect.Field;

public class StringHacker {
    public static void hackString(String from, String to) {
        try {
            Field VALUE = String.class.getDeclaredField("value");
            VALUE.setAccessible(true);
            VALUE.set(from, VALUE.get(to));

            // "offset" and "count" for Strings up to Java 6
            try {
                Field OFFSET = String.class.getDeclaredField("offset");
                OFFSET.setAccessible(true);
                OFFSET.setInt(from, OFFSET.getInt(to));
                Field COUNT = String.class.getDeclaredField("count");
                COUNT.setAccessible(true);
                COUNT.setInt(from, COUNT.getInt(to));
            } catch (NoSuchFieldException e) {
                // Ignore
            }

            // For "Compact Strings" introduced in Java 9
            try {
                Field CODER = String.class.getDeclaredField("coder");
                CODER.setAccessible(true);
                CODER.set(from, CODER.get(to));
            } catch (NoSuchFieldException e) {
                // Ignore
            }
        } catch (IllegalAccessException e) {
            e.printStackTrace();
        } catch (NoSuchFieldException e) {
            e.printStackTrace();
        }
    }
}
Experiment "Compressed Strings" in Java 6u21
The article would not be complete if I did not briefly mention the "Compressed Strings" introduced as "experimental" in Java 6 (not to be confused with the aforementioned "Compact Strings" introduced in Java 9).
If enabled with the -XX:+UseCompressedStrings VM option, then, if a String contains only Latin-1 characters, the value field stores a byte array instead of a character array. However, this was not done in the String source code, but internally in the JVM. This optimization saved memory but was very inefficient because the byte array had to be converted to a character array for almost all string operations. In Java 7, the developers removed this feature again.
Since this optimization was done JVM-internally, our code also works with activated "Compressed Strings" without further adjustment.
Conclusion
In practice, you should refrain from changing the internal values of cached Integer or String objects. This approach could have unforeseeable consequences. Such a modification affects not only your own code but also the rest of the project, including all libraries and frameworks loaded by the same classloader.
Also, you should not rely on the internal representation of a class. As shown in the String example, this can change from one Java version to the next.
Furthermore, since Java 9, the code examples from this article produce a warning:
An illegal reflective access operation has occurred
[...]
All illegal access operations will be denied in a future release
This means: We must not assume that our code will work forever. In Java 14 (release candidate) and 15 (early access), the code still works. Since Java 16 (JEP 396), however, the JDK internals are strongly encapsulated by default: the call to setAccessible(true) throws an InaccessibleObjectException unless the JVM is started with --add-opens java.base/java.lang=ALL-UNNAMED.
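Whether deep reflection into java.lang is possible on a given JVM can be probed without modifying anything. The sketch below assumes Java 9 or newer, since it relies on trySetAccessible(); AccessProbeDemo, canOpen, and the nested class Own are names made up for this illustration.

```java
import java.lang.reflect.Field;

class AccessProbeDemo {

    // A class of our own, for comparison with the JDK class.
    static class Own {
        private int secret = 42;
    }

    // Returns true if access to the field could be enabled; nothing is read or written.
    static boolean canOpen(Class<?> type, String fieldName) throws NoSuchFieldException {
        Field field = type.getDeclaredField(fieldName);
        // Unlike setAccessible(true), this returns false instead of throwing an exception.
        return field.trySetAccessible();
    }

    public static void main(String[] args) throws NoSuchFieldException {
        System.out.println("String.value: " + canOpen(String.class, "value"));
        System.out.println("Own.secret:   " + canOpen(Own.class, "secret"));
    }
}
```

The result for String.value depends on the Java version and on the command-line options of the JVM, while fields of our own classes remain accessible.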