Java's String.substring() method is one of the most used Java methods ever (at least according to Google search results). Reason enough to take a closer look at the method.
This article describes how to use substring() and how it works internally. The implementation has changed significantly over the course of Java releases. Experienced Java developers familiar with the use of the method can jump directly to the "How Does Substring Work in Java?" section.
String.substring()
The String.substring() method returns a substring of the original string, based on a beginning index and an ending index. The best way to explain this is with an example.
In the following example, we extract the substring from position 5 to 8 from the string "HappyCoders" (counting starts at 0), i.e., the characters 'C' (5), 'o' (6), 'd' (7), and 'e' (8).
When invoking the substring() method, we specify the beginning position, which is 5, as the first parameter, and the position after the ending position, which is 9, as the second parameter:
String string = "HappyCoders";
String substring = string.substring(5,9);
System.out.println("substring = " + substring);
As expected, the program prints "substring = Code". The length of the substring is the second parameter minus the first, i.e., 9 - 5 = 4.
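The index rules above can be checked with a few edge cases. The following is a minimal sketch; the class name SubstringIndexDemo and the helper slice() are only illustrative and not part of the article's code:

```java
public class SubstringIndexDemo {

  // Returns the characters from beginIndex (inclusive) to endIndex (exclusive)
  static String slice(String s, int beginIndex, int endIndex) {
    return s.substring(beginIndex, endIndex);
  }

  public static void main(String[] args) {
    String string = "HappyCoders";

    // The length is always endIndex - beginIndex: 9 - 5 = 4
    System.out.println(slice(string, 5, 9).length()); // 4

    // Equal indexes yield the empty string
    System.out.println(slice(string, 5, 5).isEmpty()); // true

    // The full range yields a string equal to the original
    System.out.println(slice(string, 0, string.length()).equals(string)); // true
  }
}
```

Treating the ending index as exclusive is what makes the length calculation a simple subtraction.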
Specifying the Substring Length
As shown in the previous example, we need to pass the beginning and ending index of the substring to the substring() method. However, sometimes we know the requested substring length rather than the ending index.
This is easily solved: we can calculate the ending index as beginning index plus length. We can extract this directly into a method like the following:
public static String substring(String string, int beginIndex, int length) {
  int endIndex = beginIndex + length;
  return string.substring(beginIndex, endIndex);
}
We can then call the method as follows:
String code = substring("HappyCoders", 5, 4);
We do not need to validate the parameters; the String.substring() method does that for us.
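To see the built-in validation at work, here is a small sketch that calls the helper with valid and invalid arguments. The class name SubstringValidationDemo and the method isRejected() are hypothetical names introduced only for this illustration:

```java
public class SubstringValidationDemo {

  // Same helper as above: ending index = beginning index + length
  static String substring(String string, int beginIndex, int length) {
    int endIndex = beginIndex + length;
    return string.substring(beginIndex, endIndex);
  }

  // Hypothetical helper for this sketch: reports whether String.substring() rejects the call
  static boolean isRejected(String string, int beginIndex, int length) {
    try {
      substring(string, beginIndex, length);
      return false;
    } catch (StringIndexOutOfBoundsException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(isRejected("HappyCoders", 5, 4));  // false: valid range
    System.out.println(isRejected("HappyCoders", 5, 20)); // true: ending index 25 > length 11
    System.out.println(isRejected("HappyCoders", -1, 4)); // true: negative beginning index
    System.out.println(isRejected("HappyCoders", 5, -2)); // true: ending index 3 < beginning index 5
  }
}
```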
Substring to the End
To get a substring starting from a given position to the end of the string, we can use an overloaded String.substring() method where we only need to specify the beginning index.
The following substring example shows how we extract from the string "Do or do not. There is no try." the substring from position 14 to the end (i.e., the second sentence):
String yodaQuote = "Do or do not. There is no try.";
String thereIsNoTry = yodaQuote.substring(14);
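As a quick check, the one-argument variant should behave like the two-argument variant with the string length as ending index. The following sketch (with the illustrative class name SubstringToEndDemo and helper fromIndex()) prints the results:

```java
public class SubstringToEndDemo {

  // Returns everything from beginIndex (inclusive) to the end of the string
  static String fromIndex(String s, int beginIndex) {
    return s.substring(beginIndex);
  }

  public static void main(String[] args) {
    String yodaQuote = "Do or do not. There is no try.";

    System.out.println(fromIndex(yodaQuote, 14)); // There is no try.

    // Same result as passing the string length as ending index
    String twoArgs = yodaQuote.substring(14, yodaQuote.length());
    System.out.println(fromIndex(yodaQuote, 14).equals(twoArgs)); // true

    // A beginning index equal to the length is allowed and yields the empty string
    System.out.println(fromIndex(yodaQuote, yodaQuote.length()).isEmpty()); // true
  }
}
```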
Substring from the End
Another task could be to extract a substring of a given length from the end of a string. To do this, we need to calculate the beginning index as the length of the string minus the requested substring length. We should also extract this into a method:
public static String substringFromEnd(String string, int length) {
  int beginIndex = string.length() - length;
  return string.substring(beginIndex);
}
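A usage sketch for this helper follows. Note that the helper throws an exception if the string is shorter than the requested length; the variant substringFromEndLenient() is a hypothetical addition for this sketch (not from the article) that returns the whole string in that case:

```java
public class SubstringFromEndDemo {

  // The helper from above: the last `length` characters of the string
  static String substringFromEnd(String string, int length) {
    int beginIndex = string.length() - length;
    return string.substring(beginIndex);
  }

  // Hypothetical lenient variant (an assumption of this sketch):
  // returns the whole string if it is shorter than the requested length
  static String substringFromEndLenient(String string, int length) {
    int beginIndex = Math.max(0, string.length() - length);
    return string.substring(beginIndex);
  }

  public static void main(String[] args) {
    System.out.println(substringFromEnd("HappyCoders", 6));  // Coders
    System.out.println(substringFromEndLenient("Java", 6));  // Java

    // substringFromEnd("Java", 6) would throw a StringIndexOutOfBoundsException
    // because the calculated beginning index is -2
  }
}
```

Whether the strict or the lenient behavior is preferable depends on the caller; the strict version makes wrong lengths visible early.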
Other Substring Tasks
This section shows solutions to various string/substring tasks that must be solved using methods other than String.substring().
How to Find a Substring within a String
To find a particular substring within a given string, you use Java's String.indexOf() method. Let's say we want to find the positions of "Happy" and "Code" in "HappyCoders". Here's how it works:
String string = "HappyCoders";
int happyIndex = string.indexOf("Happy");
int codeIndex = string.indexOf("Code");
For "Happy", indexOf() returns 0; and for "Code", it returns 5.
If the specified substring is not found, indexOf() returns -1.
You can find the last position of a substring with lastIndexOf():
String string = "The needs of the many outweigh the needs of the few, or the one.";
int lastNeedsIndex = string.lastIndexOf("needs");
In this example, lastIndexOf() returns 35.
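If all positions of a substring are needed, the overloaded indexOf(String, int) method, which starts searching at a given index, can be called in a loop. The following is a minimal sketch; the class name FindAllOccurrences and the method indexesOf() are illustrative names, and the sketch counts overlapping matches as well:

```java
import java.util.ArrayList;
import java.util.List;

public class FindAllOccurrences {

  // Collects the starting positions of all (also overlapping) occurrences
  static List<Integer> indexesOf(String string, String substring) {
    List<Integer> result = new ArrayList<>();
    if (substring.isEmpty()) {
      return result; // the empty string would match at every position
    }
    int index = string.indexOf(substring);
    while (index != -1) {
      result.add(index);
      // Continue searching one position after the last match
      index = string.indexOf(substring, index + 1);
    }
    return result;
  }

  public static void main(String[] args) {
    String string = "The needs of the many outweigh the needs of the few, or the one.";
    System.out.println(indexesOf(string, "needs")); // [4, 35]
    System.out.println(indexesOf(string, "Spock")); // []
  }
}
```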
How to Check If a String Contains a Substring
To check whether a string contains a particular substring, we can use the String.contains() method, which has been available since Java 5. The following code, for example, checks whether the string "foobar" contains the string "oo":
String string = "foobar";
boolean containsOo = string.contains("oo");
Before Java 5, we had to use the indexOf() method instead:
boolean containsOo = string.indexOf("oo") != -1;
In fact, the String.contains() method internally calls String.indexOf().
How to Replace a Substring within a String
We can replace a substring in Java using the String.replace() method. In the following example, every occurrence of the word "the" in the given sentence is replaced with "a":
String string = "the quick brown fox jumps over the lazy dog";
string = string.replace("the", "a");
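One detail worth knowing: String.replace() treats the search text literally, whereas String.replaceAll() interprets it as a regular expression. The following sketch contrasts the two; the class name ReplaceDemo and the two wrapper methods are only illustrative:

```java
public class ReplaceDemo {

  // Replaces every literal occurrence of target
  static String replaceLiteral(String text, String target, String replacement) {
    return text.replace(target, replacement);
  }

  // Replaces every match of the regular expression
  static String replaceRegex(String text, String regex, String replacement) {
    return text.replaceAll(regex, replacement);
  }

  public static void main(String[] args) {
    String sentence = "the quick brown fox jumps over the lazy dog";
    System.out.println(replaceLiteral(sentence, "the", "a"));
    // a quick brown fox jumps over a lazy dog

    // "." is a plain character for replace() ...
    System.out.println(replaceLiteral("1.5.2", ".", "-")); // 1-5-2

    // ... but matches any character when used as a regular expression
    System.out.println(replaceRegex("1.5.2", ".", "-")); // -----
  }
}
```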
How to Remove a Substring within a String
To remove a substring, we can replace it with the empty string "". In the following example, we delete every occurrence of "no ":
String string = "When there is no emotion, there is no motive for violence.";
string = string.replace("no ", "");
How Does Substring Work in Java?
String is one of the most commonly used Java classes and often takes up a large part of the heap. No wonder it has been optimized repeatedly over time.
For example, the hash code calculation was changed several times, and Java 9 introduced Compact Strings. Since then, strings containing only Latin-1 characters have been encoded with only one byte per character instead of two.
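The size difference between the two encodings can be illustrated by encoding a string explicitly. Note that this sketch only measures explicit encodings via getBytes(); it does not inspect the internal storage of String, and the class name EncodingSizeDemo and its methods are illustrative:

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizeDemo {

  // Number of bytes when every character is encoded with one byte (Latin-1)
  static int latin1Size(String text) {
    return text.getBytes(StandardCharsets.ISO_8859_1).length;
  }

  // Number of bytes when the text is encoded as UTF-16 (without byte order mark)
  static int utf16Size(String text) {
    return text.getBytes(StandardCharsets.UTF_16LE).length;
  }

  public static void main(String[] args) {
    System.out.println(latin1Size("HappyCoders")); // 11
    System.out.println(utf16Size("HappyCoders"));  // 22
  }
}
```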
The substring function has also been fundamentally changed:
Up to and including Java 6, a substring created by substring() points to the same char array as the original string. The beginning position and length of the substring are stored in the string's offset and count fields.
Here is the relevant part of the substring method from Java 1 to 6:
public String substring(int beginIndex, int endIndex) {
  // ... parameter validation ...
  return ((beginIndex == 0) && (endIndex == count)) ? this :
      new String(offset + beginIndex, endIndex - beginIndex, value);
}
If the substring covers the complete original string, the original string is returned. Otherwise, the following constructor is called:
String(int offset, int count, char value[]) {
  this.value = value;
  this.offset = offset;
  this.count = count;
}
The substring and the original string thus share a char array and differ only in the offset and count values that define the underlying section of the char array. The JDK developers expected this to have two advantages:
- Less memory usage on the heap
- Faster execution of the substring method than if the array were copied
However, one important aspect was not considered:
If the original string is no longer needed, the garbage collector cannot clean up its char array because the substring still references it. For example, if the original string contains 10,000 characters and the substring contains only ten characters, then 9,990 characters, or just under 20 KB (one char occupies two bytes) of the heap would be wasted.
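The effect can be modeled with a small toy class. This is only a sketch of the sharing idea under simplified assumptions, not the actual JDK code; the class SharedString and its method wastedChars() are hypothetical names:

```java
public class SharedArraySketch {

  // Hypothetical stand-in for the pre-Java-7 String layout:
  // a view (offset, count) onto a char array that may be shared
  static final class SharedString {
    final char[] value;
    final int offset;
    final int count;

    SharedString(char[] value, int offset, int count) {
      this.value = value;
      this.offset = offset;
      this.count = count;
    }

    // Creates a new view onto the same array; nothing is copied
    SharedString substring(int beginIndex, int endIndex) {
      return new SharedString(value, offset + beginIndex, endIndex - beginIndex);
    }

    // Characters that are kept alive by this view but not used by it
    int wastedChars() {
      return value.length - count;
    }
  }

  public static void main(String[] args) {
    SharedString original = new SharedString(new char[10_000], 0, 10_000);
    SharedString substring = original.substring(0, 10);

    System.out.println(substring.value == original.value); // true: same array
    System.out.println(substring.wastedChars());           // 9990
    System.out.println(substring.wastedChars() * 2);       // 19980 (bytes, two per char)
  }
}
```

As long as the small view is reachable, the whole 10,000-character array stays reachable as well, even if the original view has long been discarded.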
Java developers who were aware of this often employed one of the following two workarounds:
String substring = new String(string.substring(5, 9));
String substring = "" + string.substring(5, 9);
The string constructor used in the first line checks whether the string passed is a substring. If so, it creates a copy of the requested section. The string concatenation used in the second line only leads to the desired result as of Java 5 (see below).
Ultimately, the JDK developers weighed the pros and cons of the previous solution and decided to change the implementation in Java 7 so that multiple strings no longer share char arrays. Instead, the substring function (or the String constructor it calls) creates a copy of the requested section of the char array.
In Java 7, substring() is implemented as follows:
public String substring(int beginIndex, int endIndex) {
  // ... parameter validation, which also computes subLen = endIndex - beginIndex ...
  return ((beginIndex == 0) && (endIndex == value.length)) ? this
      : new String(value, beginIndex, subLen);
}
At first sight, this looks the same. At a second glance, you notice that count has been replaced by value.length, which is the length of the char array. Since each string has its own char array, the offset and count fields are no longer needed.
It also calls a different String constructor (with value at the beginning instead of at the end). This constructor looks like this:
public String(char value[], int offset, int count) {
  // ... parameter validation ...
  this.value = Arrays.copyOfRange(value, offset, offset+count);
}
Therefore, a copy of the requested section of the char array is created.
In Java 9, the substring method has been modified to take into account the encoding used (1-byte Latin-1 vs. 2-byte UTF-16). However, the basic functionality (calling Arrays.copyOfRange) has been retained.
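What copying with Arrays.copyOfRange() means in practice can be shown on a plain char array. The following sketch is not the JDK implementation; the class name CopyOfRangeDemo and the helper copySection() are illustrative:

```java
import java.util.Arrays;

public class CopyOfRangeDemo {

  // Copies `count` characters starting at `offset` into a new, independent array
  static char[] copySection(char[] value, int offset, int count) {
    return Arrays.copyOfRange(value, offset, offset + count);
  }

  public static void main(String[] args) {
    char[] original = "HappyCoders".toCharArray();
    char[] section = copySection(original, 5, 4);

    System.out.println(section.length);      // 4
    System.out.println(new String(section)); // Code

    // Changing the original array does not affect the copy
    original[5] = 'X';
    System.out.println(new String(section)); // Code
  }
}
```

Since the copy holds only the requested characters, the larger original array can be garbage-collected once nothing else references it.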
String.substring Internals – Demo
I wrote a small program to demonstrate the changes of the substring method over the Java versions. You can also find the code in this GitHub repository.
package eu.happycoders.substring;

import java.lang.reflect.Field;

// Note: Arrays is deliberately not imported from java.util. It is the replacement
// class from this package, so that the program also runs on Java versions older than 5.
public class SubstringInternalsDemo {
  public static void main(String[] args) throws IllegalAccessException {
    String string = "HappyCoders.eu";
    String substring = string.substring(5, 9);

    printDetails("original string", string);
    printDetails("substring", substring);
    printDetails("substring appended to empty string", "" + substring);
    printDetails("substring wrapped with new string", new String(substring));
  }

  private static void printDetails(String name, String string)
      throws IllegalAccessException {
    System.out.println(name + ":");
    System.out.println(" string identity : " + identity(string));
    System.out.println(" string : " + string);

    Object value = getPrivateField(string, "value");
    System.out.println(" value[] identity : " + identity(value));
    System.out.println(" value[] : " + valueToString(value));

    // Java 1-6: offset + count
    Integer offset = (Integer) getPrivateField(string, "offset");
    if (offset != null) {
      System.out.println(" offset : " + offset);
    }
    Integer count = (Integer) getPrivateField(string, "count");
    if (count != null) {
      System.out.println(" count : " + count);
    }

    // Java 9+: coder
    Byte coder = (Byte) getPrivateField(string, "coder");
    if (coder != null) {
      System.out.println(" coder : " + coder);
    }
    System.out.println();
  }

  // Identity hash code, used to tell whether two references point to the same object
  private static String identity(Object o) {
    return "@" + Integer.toHexString(System.identityHashCode(o));
  }

  // The value field is a char[] up to Java 8 and a byte[] since Java 9
  private static String valueToString(Object value) {
    if (value instanceof byte[]) {
      return Arrays.toString((byte[]) value);
    }
    if (value instanceof char[]) {
      return Arrays.toString((char[]) value);
    }
    return value.toString();
  }

  // Reads a private field of String via reflection; returns null if the field
  // does not exist in the Java version used
  private static Object getPrivateField(String string, String fieldName)
      throws IllegalAccessException {
    try {
      Field field = String.class.getDeclaredField(fieldName);
      field.setAccessible(true);
      return field.get(string);
    } catch (NoSuchFieldException e) {
      return null;
    }
  }
}
The program shows the identities and values of the strings and substrings and their internal fields. To test the workarounds described above, the substrings are concatenated once with an empty string and once wrapped by new String(…).
To make the program run with versions older than Java 5, I could not use java.util.Arrays.toString(). A replacement implementation of Arrays is also in the GitHub repo.
If we run the program with the oldest Java version still downloadable, Java 1.2, we get the following output:
original string:
string identity : @b450fff4
string : HappyCoders.eu
value[] identity : @b454fff4
value[] : [H, a, p, p, y, C, o, d, e, r, s, ., e, u]
offset : 0
count : 14
substring:
string identity : @b42cfff4
string : Code
value[] identity : @b454fff4
value[] : [H, a, p, p, y, C, o, d, e, r, s, ., e, u]
offset : 5
count : 4
substring appended to empty string:
string identity : @b42cfff4
string : Code
value[] identity : @b454fff4
value[] : [H, a, p, p, y, C, o, d, e, r, s, ., e, u]
offset : 5
count : 4
substring wrapped with new string:
string identity : @bf34fff4
string : Code
value[] identity : @bf30fff4
value[] : [C, o, d, e]
offset : 0
count : 4
We can see that string, substring, and the substring concatenated with "" all refer to the identical char array @b454fff4. The string created with new String(…), on the other hand, uses a separate char array that contains only the text "Code".
In Java 1.3 and 1.4, string concatenation leads to a different result (you can find the entire output for all Java versions in the results directory on GitHub):
...
substring appended to empty string:
string identity : @20c10f
string : Code
value[] identity : @62eec8
value[] : [C, o, d, e, , , , , , , , , , , , ]
offset : 0
count : 4
...
That's because, in these versions, a StringBuffer is used for concatenation, which is created with an initial length of 16 characters and whose toString() method returns a string using the same char array.
In Java 5, the result of concatenating the substring with an empty string changes again:
...
substring appended to empty string:
string identity : @1004901
string : Code
value[] identity : @1b90b39
value[] : [C, o, d, e]
offset : 0
count : 4
...
As of Java 5, StringBuffer.toString() and StringBuilder.toString() call a String constructor that, like the one shown above, copies only the section of the char array that is needed.
In Java 7 and 8, the output then looks like this:
original string:
string identity : @26ffd553
string : HappyCoders.eu
value[] identity : @f74f6ef
value[] : [H, a, p, p, y, C, o, d, e, r, s, ., e, u]
substring:
string identity : @47ffccd6
string : Code
value[] identity : @6ae11a87
value[] : [C, o, d, e]
substring appended to empty string:
string identity : @6094cbe2
string : Code
value[] identity : @48d593f7
value[] : [C, o, d, e]
substring wrapped with new string:
string identity : @3de5627c
string : Code
value[] identity : @6ae11a87
value[] : [C, o, d, e]
As explained above, as of Java 7, the substring returned by String.substring() points to a separate char array. Also, the offset and count fields no longer exist.
The workarounds by concatenation or invoking the String constructor are thus no longer necessary. It is still noticeable that string concatenation creates a new string with a new char array, while the String constructor reuses the char array.
Since Java 9, String no longer contains a char array, but a byte array:
original string:
string identity : @4c203ea1
string : HappyCoders.eu
value[] identity : @71be98f5
value[] : [72, 97, 112, 112, 121, 67, 111, 100, 101, 114, 115, 46, 101, 117]
coder : 0
substring:
string identity : @96532d6
string : Code
value[] identity : @3796751b
value[] : [67, 111, 100, 101]
coder : 0
substring appended to empty string:
string identity : @3498ed
string : Code
value[] identity : @1a407d53
value[] : [67, 111, 100, 101]
coder : 0
substring wrapped with new string:
string identity : @3d8c7aca
string : Code
value[] identity : @3796751b
value[] : [67, 111, 100, 101]
coder : 0
Analogous to the previous Java versions, string concatenation creates a new byte array, while the String constructor reuses the existing byte array.
Summary
This article has shown how to use String.substring(), how the method works internally, and how its functionality has changed over time.