Java substring() Method

Java's String.substring() method is one of the most used Java methods ever (at least according to Google search results). That alone is reason enough to take a closer look at it.

This article describes how to use substring() and how it works internally. The implementation has changed considerably over the course of Java releases. Experienced Java developers who already know how to use the method can jump directly to the "How Does Substring Work in Java?" section.

String.substring()

The String.substring() method returns a substring of the original string, defined by a beginning index (inclusive) and an ending index (exclusive).

In the following example, we extract a substring from position 5 to 8 from the string "HappyCoders" (counting starts at 0):

When invoking the substring() method, we specify the beginning position, which is 5, as the first parameter, and the position after the ending position, which is 9, as the second parameter:

String string = "HappyCoders";
String substring = string.substring(5, 9);
System.out.println("substring = " + substring);

As expected, the program prints "substring = Code". The length of the substring equals the ending position minus the beginning position, i.e., 9 - 5 = 4.

Specifying the Substring Length

As shown in the previous example, we need to pass the beginning and ending index of the substring to the substring() method. Sometimes, however, we know the requested substring length rather than the ending index.

This is easily solved: we can calculate the ending index as beginning index plus length. We can extract this directly into a method like the following:

public static String substring(String string, int beginIndex, int length) {
  int endIndex = beginIndex + length;
  return string.substring(beginIndex, endIndex);
}

We can then call the method as follows:

String code = substring("HappyCoders", 5, 4);

We do not need to validate the parameters; the String.substring() method does that for us and throws an IndexOutOfBoundsException if the indexes are out of range.
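
To illustrate the validation, here is a minimal sketch that calls the helper with valid and invalid arguments. The class name SubstringValidationDemo and the tryRun() method are made up for this example; only the substring() helper comes from the text above.

```java
class SubstringValidationDemo {

  // Helper from the article: substring by beginning index and length.
  static String substring(String string, int beginIndex, int length) {
    int endIndex = beginIndex + length;
    return string.substring(beginIndex, endIndex);
  }

  // Hypothetical helper: returns the substring, or the name of the
  // exception that String.substring() threw for invalid indexes.
  static String tryRun(String string, int beginIndex, int length) {
    try {
      return substring(string, beginIndex, length);
    } catch (StringIndexOutOfBoundsException e) {
      return "StringIndexOutOfBoundsException";
    }
  }

  public static void main(String[] args) {
    System.out.println(tryRun("HappyCoders", 5, 4));  // valid range
    System.out.println(tryRun("HappyCoders", -1, 4)); // negative beginning index
    System.out.println(tryRun("HappyCoders", 5, 10)); // ending index beyond the string
    System.out.println(tryRun("HappyCoders", 5, -2)); // ending index before beginning index
  }
}
```

The first call prints "Code"; the other three end in a StringIndexOutOfBoundsException, which is a subclass of IndexOutOfBoundsException.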

Substring to the End

To get a substring starting from a given position to the end of the string, we can use an overloaded String.substring() method where we only need to specify the beginning index.

The following substring example shows how we extract from the string "Do or do not. There is no try." the substring from position 14 to the end (i.e., the second sentence):

String yodaQuote = "Do or do not. There is no try.";
String thereIsNoTry = yodaQuote.substring(14);

Substring from the End

Another task could be to extract a substring of a given length from the end of a string. To do this, we calculate the beginning index as the length of the string minus the requested substring length. We can extract this into a method as well:

public static String substringFromEnd(String string, int length) {
  int beginIndex = string.length() - length;
  return string.substring(beginIndex);
}
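
The following sketch shows how the method might be called and what happens at the boundaries. The class name SubstringFromEndDemo is an assumption made for this example; the helper itself is the one from the text.

```java
class SubstringFromEndDemo {

  // Helper from the article: the last `length` characters of the string.
  static String substringFromEnd(String string, int length) {
    int beginIndex = string.length() - length;
    return string.substring(beginIndex);
  }

  public static void main(String[] args) {
    // The last six characters of "HappyCoders"
    System.out.println(substringFromEnd("HappyCoders", 6));

    // A length of 0 yields the empty string (beginIndex equals the string length)
    System.out.println(substringFromEnd("HappyCoders", 0).isEmpty());

    try {
      // Requesting more characters than the string has makes beginIndex negative
      substringFromEnd("HappyCoders", 12);
    } catch (StringIndexOutOfBoundsException e) {
      System.out.println("too long: " + e.getClass().getSimpleName());
    }
  }
}
```

The first call returns "Coders". If a caller may pass a length larger than the string, the method would need to clamp the beginning index to 0 or let the exception propagate, depending on the use case.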

Other Substring Tasks

This section shows solutions to various string/substring tasks that must be solved using methods other than String.substring().

How to Find a Substring within a String

To find a particular substring within a given string, you use Java's String.indexOf() method. Let's say we want to find the positions of "Happy" and "Code" in "HappyCoders". Here's how it works:

String string = "HappyCoders";
int happyIndex = string.indexOf("Happy");
int codeIndex = string.indexOf("Code");

For "Happy", indexOf() returns 0; and for "Code", it returns 5.

If the specified substring is not found, indexOf() returns -1.

You can find the last position of a substring with lastIndexOf():

String string = "The needs of the many outweigh the needs of the few, or the one.";
int lastNeedsIndex = string.lastIndexOf("needs");

In this example, lastIndexOf() returns 35.
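
If all positions of a substring are needed, the overloaded indexOf(String, int) method, which starts searching at a given index, can be called in a loop. The following is a minimal sketch; the class FindAllOccurrences and the method indexesOf() are hypothetical names, and the handling of an empty search string is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;

class FindAllOccurrences {

  // Collects the start index of every (possibly overlapping) occurrence
  // of `part` in `text`.
  static List<Integer> indexesOf(String text, String part) {
    List<Integer> indexes = new ArrayList<>();
    if (part.isEmpty()) {
      // Assumption for this sketch: an empty search string matches nowhere
      return indexes;
    }
    int index = text.indexOf(part);
    while (index != -1) {
      indexes.add(index);
      // Continue searching one position after the last hit
      index = text.indexOf(part, index + 1);
    }
    return indexes;
  }

  public static void main(String[] args) {
    String string = "The needs of the many outweigh the needs of the few, or the one.";
    System.out.println(indexesOf(string, "needs")); // prints [4, 35]
  }
}
```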

How to Check If a String Contains a Substring

To check whether a string contains a particular substring, we can use the String.contains() method, which has been available since Java 5. The following code, for example, checks whether the string "foobar" contains the string "oo":

String string = "foobar";
boolean containsOo = string.contains("oo");

Before Java 5, we had to use the indexOf() method instead:

boolean containsOo = string.indexOf("oo") != -1;

In fact, the String.contains() method internally calls String.indexOf().

How to Replace a Substring within a String

We can replace a substring in Java using the String.replace() method. In the following example, every occurrence of the word "the" in the given sentence is replaced with "a":

String string = "the quick brown fox jumps over the lazy dog";
string = string.replace("the", "a");
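
The following sketch prints the result of this replacement and illustrates two details that are easy to overlook: replace() matches character sequences rather than whole words, and it returns a new string instead of modifying the original. The class name ReplaceDemo and the second example string are made up for illustration.

```java
class ReplaceDemo {
  public static void main(String[] args) {
    String string = "the quick brown fox jumps over the lazy dog";

    // Prints: a quick brown fox jumps over a lazy dog
    System.out.println(string.replace("the", "a"));

    // replace() does not respect word boundaries:
    // the "the" inside "other" is replaced as well, so this prints: a oar dog
    System.out.println("the other dog".replace("the", "a"));

    // Strings are immutable: the original is unchanged unless we reassign it
    System.out.println(string);
  }
}
```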

How to Remove a Substring within a String

To remove a substring, we can replace it with the empty string "". In the following example, we delete every occurrence of "no ":

String string = "When there is no emotion, there is no motive for violence.";
string = string.replace("no ", "");

How Does Substring Work in Java?

String is one of the most commonly used Java classes and often takes up a large part of the heap. No wonder it has been optimized repeatedly over time.

For example, the hash code calculation was changed several times, and Java 9 introduced Compact Strings. Since then, strings containing only Latin-1 characters are encoded with only one byte per character instead of two.

The substring function has also been fundamentally changed:

Up to and including Java 6, a substring created by substring() points to the same char array as the original string. The beginning position and length of the substring are stored in the string's offset and count fields.

Here is the relevant part of the substring method from Java 1 to 6:

public String substring(int beginIndex, int endIndex) {
    // ... parameter validation ... 
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}

If the substring covers the complete original string, the original string is returned. Otherwise, the following constructor is called:

String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}

The substring and the original string thus share a char array and differ only in the offset and count values that define the underlying section of the char array. The JDK developers expected this to have two advantages:

Less memory usage on the heap
Faster execution of the substring method than if the array were copied

However, one important aspect was not considered:

If the original string is no longer needed, the garbage collector cannot clean up its char array because the substring still references it. For example, if the original string contains 10,000 characters and the substring contains only ten characters, then 9,990 characters, or just under 20 KB (one char occupies two bytes) of the heap would be wasted.
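
The effect can be reproduced with a small model of the old layout. The following is only a sketch under stated assumptions: SharedArrayString is a made-up class that mimics the value, offset, and count fields described above; it is not the JDK's String implementation, and parameter validation is omitted.

```java
import java.util.Arrays;

class SharedArrayString {
  final char[] value; // backing array, possibly shared with other instances
  final int offset;   // index of the first character within value
  final int count;    // number of characters

  SharedArrayString(String s) {
    this(s.toCharArray(), 0, s.length());
  }

  private SharedArrayString(char[] value, int offset, int count) {
    this.value = value;
    this.offset = offset;
    this.count = count;
  }

  // Mimics the old behavior: no copying, only a new offset and count
  SharedArrayString substring(int beginIndex, int endIndex) {
    return new SharedArrayString(value, offset + beginIndex, endIndex - beginIndex);
  }

  // Chars in the backing array that this instance keeps alive but does not use
  int unusedChars() {
    return value.length - count;
  }

  @Override
  public String toString() {
    return new String(value, offset, count);
  }

  public static void main(String[] args) {
    char[] chars = new char[10_000];
    Arrays.fill(chars, 'x');
    SharedArrayString original = new SharedArrayString(new String(chars));
    SharedArrayString part = original.substring(0, 10);

    System.out.println("shared array: " + (part.value == original.value)); // true
    System.out.println("unused chars kept alive: " + part.unusedChars());  // 9990
  }
}
```

Even if original becomes unreachable, part still references the 10,000-character array, so the garbage collector cannot reclaim it.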

Java developers who were aware of this often employed one of the following two workarounds:

String substring = new String(string.substring(5, 9));
String substring = "" + string.substring(5, 9);

The String constructor used in the first line checks whether the char array of the string passed is larger than the string itself, as is the case for a substring. If so, it creates a copy of the requested section. The string concatenation used in the second line only leads to the desired result as of Java 5 (see below).

Ultimately, the JDK developers weighed the pros and cons of the previous solution and decided to change the implementation in Java 7 so that multiple strings no longer share char arrays. Instead, the substring function (or the String constructor it calls) creates a copy of the requested section of the char array.

In Java 7, substring() is implemented as follows:

public String substring(int beginIndex, int endIndex) {
    // ... parameter validation; subLen is computed here as endIndex - beginIndex ...
    return ((beginIndex == 0) && (endIndex == value.length)) ? this
        : new String(value, beginIndex, subLen);
}

At first glance, this looks the same. On closer inspection, you notice that count has been replaced by value.length, which is the length of the char array. Since each string has its own char array, the offset and count fields are no longer needed.

It also calls a different String constructor (with value at the beginning instead of at the end). This constructor looks like this:

public String(char value[], int offset, int count) {
    // ... parameter validation ... 
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}

Therefore, a copy of the requested section of the char array is created.

In Java 9, the substring method was modified to take into account the encoding used (1-byte Latin-1 vs. 2-byte UTF-16). However, the basic functionality (calling Arrays.copyOfRange) was retained.
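
The difference between the two encodings can be approximated with a few lines of code. This is only a model, assuming Compact Strings are enabled (the default); the class CompactStringSizes and its methods are hypothetical and do not access the real internal byte array.

```java
class CompactStringSizes {

  // True if every char fits into one byte, i.e. the string can be
  // stored in the Latin-1 encoding.
  static boolean isLatin1(String s) {
    for (int i = 0; i < s.length(); i++) {
      if (s.charAt(i) > 0xFF) {
        return false;
      }
    }
    return true;
  }

  // Approximate length of the internal byte array:
  // one byte per char for Latin-1, two bytes per char for UTF-16.
  static int approximateValueLength(String s) {
    return isLatin1(s) ? s.length() : 2 * s.length();
  }

  public static void main(String[] args) {
    System.out.println(approximateValueLength("Code"));      // 4 (Latin-1 only)
    System.out.println(approximateValueLength("Cod\u20AC")); // 8 (the euro sign is outside Latin-1)
  }
}
```

A single character outside Latin-1 is enough to switch the whole string to two bytes per character.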

String.substring Internals – Demo

I wrote a small program to demonstrate how the substring method has changed across Java versions. You can also find the code in this GitHub repository.

package eu.happycoders.substring;

import java.lang.reflect.Field;

public class SubstringInternalsDemo {
  public static void main(String[] args) throws IllegalAccessException {
    String string = "HappyCoders.eu";
    String substring = string.substring(5, 9);

    printDetails("original string", string);
    printDetails("substring", substring);
    printDetails("substring appended to empty string", "" + substring);
    printDetails("substring wrapped with new string", new String(substring));
  }

  private static void printDetails(String name, String string)
      throws IllegalAccessException {
    System.out.println(name + ":");
    System.out.println("  string identity  : " + identity(string));
    System.out.println("  string           : " + string);

    Object value = getPrivateField(string, "value");
    System.out.println("  value[] identity : " + identity(value));
    System.out.println("  value[]          : " + valueToString(value));

    // Java 1-6: offset + count
    Integer offset = (Integer) getPrivateField(string, "offset");
    if (offset != null) {
      System.out.println("  offset           : " + offset);
    }

    Integer count = (Integer) getPrivateField(string, "count");
    if (count != null) {
      System.out.println("  count            : " + count);
    }

    // Java 9+: coder
    Byte coder = (Byte) getPrivateField(string, "coder");
    if (coder != null) {
      System.out.println("  coder            : " + coder);
    }

    System.out.println();
  }

  private static String identity(Object o) {
    return "@" + Integer.toHexString(System.identityHashCode(o));
  }

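  // Arrays refers to the replacement implementation from the GitHub repo
  // (see below), not to java.util.Arrays, which is why there is no import.
  private static String valueToString(Object value) {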
    if (value instanceof byte[]) {
      return Arrays.toString((byte[]) value);
    }

    if (value instanceof char[]) {
      return Arrays.toString((char[]) value);
    }

    return value.toString();
  }

  private static Object getPrivateField(String string, String fieldName)
      throws IllegalAccessException {
    try {
      Field field = String.class.getDeclaredField(fieldName);
      field.setAccessible(true);
      return field.get(string);
    } catch (NoSuchFieldException e) {
      return null;
    }
  }
}

The program prints the identities and values of the strings and substrings and of their internal fields. To test the workarounds described above, the substring is once concatenated with an empty string and once wrapped in new String(…).

To make the program run with versions older than Java 5, I could not use java.util.Arrays.toString(). A replacement implementation of Arrays is also in the GitHub repo.

If we run the program with the oldest Java version still downloadable, Java 1.2, we get the following output:

original string:
  string identity  : @b450fff4
  string           : HappyCoders.eu
  value[] identity : @b454fff4
  value[]          : [H, a, p, p, y, C, o, d, e, r, s, ., e, u]
  offset           : 0
  count            : 14

substring:
  string identity  : @b42cfff4
  string           : Code
  value[] identity : @b454fff4
  value[]          : [H, a, p, p, y, C, o, d, e, r, s, ., e, u]
  offset           : 5
  count            : 4

substring appended to empty string:
  string identity  : @b42cfff4
  string           : Code
  value[] identity : @b454fff4
  value[]          : [H, a, p, p, y, C, o, d, e, r, s, ., e, u]
  offset           : 5
  count            : 4

substring wrapped with new string:
  string identity  : @bf34fff4
  string           : Code
  value[] identity : @bf30fff4
  value[]          : [C, o, d, e]
  offset           : 0
  count            : 4

We can see that string, substring, and the substring concatenated with "" all refer to the identical char array @b454fff4. The string created with new String(…), on the other hand, uses a separate char array that contains only the text "Code".

In Java 1.3 and 1.4, string concatenation leads to a different result (you can find the entire output for all Java versions in the results directory on GitHub):

...

substring appended to empty string:
  string identity  : @20c10f
  string           : Code
  value[] identity : @62eec8
  value[]          : [C, o, d, e,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ]
  offset           : 0
  count            : 4

...

That's because, in these versions, a StringBuffer is used for concatenation. It is created with an initial capacity of 16 characters, and its toString() method returns a string that shares the buffer's char array.

In Java 5, the result of concatenating the substring with an empty string changes again:

...

substring appended to empty string:
  string identity  : @1004901
  string           : Code
  value[] identity : @1b90b39
  value[]          : [C, o, d, e]
  offset           : 0
  count            : 4

...

As of Java 5, StringBuffer.toString() and StringBuilder.toString() call the String(char[], int, int) constructor, which copies only the section of the char array that is needed instead of sharing the buffer's array.

In Java 7 and 8, the output then looks like this:

original string:
  string identity  : @26ffd553
  string           : HappyCoders.eu
  value[] identity : @f74f6ef
  value[]          : [H, a, p, p, y, C, o, d, e, r, s, ., e, u]

substring:
  string identity  : @47ffccd6
  string           : Code
  value[] identity : @6ae11a87
  value[]          : [C, o, d, e]

substring appended to empty string:
  string identity  : @6094cbe2
  string           : Code
  value[] identity : @48d593f7
  value[]          : [C, o, d, e]

substring wrapped with new string:
  string identity  : @3de5627c
  string           : Code
  value[] identity : @6ae11a87
  value[]          : [C, o, d, e]

As explained above, as of Java 7, the substring returned by String.substring() points to a separate char array. Also, the offset and count fields no longer exist.

The workarounds using concatenation or the String constructor are thus no longer necessary. It is worth noting, however, that string concatenation creates a new string with a new char array, while the String constructor reuses the char array.

Since Java 9, String no longer contains a char array, but a byte array:

original string:
  string identity  : @4c203ea1
  string           : HappyCoders.eu
  value[] identity : @71be98f5
  value[]          : [72, 97, 112, 112, 121, 67, 111, 100, 101, 114, 115, 46, 101, 117]
  coder            : 0

substring:
  string identity  : @96532d6
  string           : Code
  value[] identity : @3796751b
  value[]          : [67, 111, 100, 101]
  coder            : 0

substring appended to empty string:
  string identity  : @3498ed
  string           : Code
  value[] identity : @1a407d53
  value[]          : [67, 111, 100, 101]
  coder            : 0

substring wrapped with new string:
  string identity  : @3d8c7aca
  string           : Code
  value[] identity : @3796751b
  value[]          : [67, 111, 100, 101]
  coder            : 0

As in the previous Java versions, string concatenation creates a new byte array, while the String constructor reuses the existing byte array.

Summary

This article has shown how to use String.substring(), how the method works internally, and how its functionality has changed over time.