Know your way around String Concatenation

Understanding the workings of ‘String Concatenation’ in Java.

As Java developers, we have all used ‘String Concatenation’ at some point. But do we really understand what happens underneath?

As Strings are immutable in Java, once a String is created, it cannot be changed. Hence when we concatenate one String with another, a new String is created. Thus if you don’t handle String Concatenation wisely, you’ll end up with a lot of unnecessary String objects on your heap waiting to be garbage collected. This can also raise performance concerns.
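To make the point about immutability concrete, here is a minimal sketch. The class name ImmutableStringDemo and the helper join are made up for this illustration; the sketch only shows that concatenating at run-time leaves the original String untouched and hands back a different object.

```java
class ImmutableStringDemo {

    // Hypothetical helper: joins two strings with the + operator at run-time.
    static String join(String base, String suffix) {
        return base + suffix;
    }

    public static void main(String[] args) {
        String original = "Hello";
        String combined = join(original, " world");

        System.out.println(original);             // prints: Hello
        System.out.println(combined);             // prints: Hello world
        // == compares object identity here, which is exactly what we want to inspect.
        System.out.println(original == combined); // prints: false
    }
}
```

The original String still reads "Hello" after the call; the result lives in a second object.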

Therefore, I thought it best to explore ‘String Concatenation’ in a little more detail to make sure we use it properly.


Static String Concatenation

If all of the substrings used to form the final string are known at the time of compilation (without loops or conditional statements), this form of string concatenation is referred to as ‘Static String Concatenation’.

Example 1: Simple concatenation of compile-time constant strings

Let’s perform a simple concatenation of strings, which are constants at compile-time.

As you may already know, for a variable to be a compile-time constant, it needs to satisfy all of the following conditions.

  • declared as final
  • of a primitive type or of type String (in our case we'll be using only strings)
  • initialized at the time of declaration
  • initialized with a compile-time constant expression

Source code:

public class StringConcatenateTest {
  public static void main(String[] args) {
    final String str1 = "Hello ";
    final String str2 = "world";
    String str = str1 + str2;
  }
}

Compiled Bytecode:

public class StringConcatenateTest {
  public StringConcatenateTest();
   Code:
    0: aload_0
    1: invokespecial #1        // Method java/lang/Object."<init>":()V
    4: return  
public static void main(java.lang.String[]);
   Code:
    0: ldc           #2         // String Hello
    2: astore_1
    3: ldc           #3         // String world
    5: astore_2
    6: ldc           #4         // String Hello world
    8: astore_3
    9: return
 }

Since the two variables (str1 and str2) are compile-time constants, the concatenation of those two variables becomes a compile-time constant expression. Therefore the compiler does the concatenation at compile-time (and does not wait till run-time):

String str = str1 + str2; // after inlining the values, the compiler translates this to: String str = "Hello world";

The value of a compile-time constant expression is compiled inline where it is used, i.e. if a variable is found to be a compile-time constant, the compiler replaces that variable’s name everywhere in the code with its value.

So in the above example, the compiler replaces the final variables with their values to create the string "Hello world" at compile time, which is then loaded by the ldc instruction at offset 6.

Remember that when you are concatenating strings which are compile-time constants, there’s no need to use a StringBuilder. The compiler takes care of it.
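One way to observe this folding without reading bytecode is to compare object identity against the equivalent literal. The sketch below is only an illustration (the class and method names are invented here). It relies on the Java Language Specification rule that String constant expressions are interned, while a concatenation that is not a constant expression yields a newly created String.

```java
class ConstantExpressionDemo {

    // Both operands are constant variables, so a + b is folded at compile time.
    static boolean bothFinal() {
        final String a = "Hello ";
        final String b = "world";
        return (a + b) == "Hello world"; // identity comparison on purpose
    }

    // 'a' is not final, so the concatenation happens at run-time.
    static boolean oneNonFinal() {
        String a = "Hello ";
        final String b = "world";
        return (a + b) == "Hello world";
    }

    // 'a' is final but not initialized at its declaration, so it is not a constant.
    static boolean blankFinal() {
        final String a;
        a = "Hello ";
        final String b = "world";
        return (a + b) == "Hello world";
    }

    public static void main(String[] args) {
        System.out.println(bothFinal());   // prints: true
        System.out.println(oneNonFinal()); // prints: false
        System.out.println(blankFinal());  // prints: false
    }
}
```

Comparing Strings with == is normally a bug; it is used here only to reveal whether the compiler or the run-time produced the result.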


Example 2: Simple concatenation of string literals using the += operation

Let’s perform a simple concatenation of string literals using the += operation (without loops or conditional statements).

Source Code:

public class StringConcatenateTest {
  public static void main(String[] args) {
    String str = "Hello ";
    str += "world";
  }
}

Compiled Bytecode:

public class StringConcatenateTest {
  public StringConcatenateTest();
   Code:
    0: aload_0
    1: invokespecial #1  // Method java/lang/Object."<init>":()V
    4: return  
public static void main(java.lang.String[]);
   Code:
    0: ldc           #2  // String Hello
    2: astore_1
    3: new           #3  // class java/lang/StringBuilder
    6: dup
    7: invokespecial #4  // Method java/lang/StringBuilder."<init>":()V
   10: aload_1
   11: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   14: ldc           #6  // String world
   16: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   19: invokevirtual #7  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
   22: astore_1
   23: return
 }

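As you can see in the bytecode, a StringBuilder is used. This means the Java compiler has optimized the generated bytecode. Here the string concatenation with the += operator is replaced by StringBuilder calls equivalent to:

str = new StringBuilder().append(str).append("world").toString();

This optimization is known as a ‘static string concatenation optimization’. Note that the bytecode shown in this article is what javac generates up to Java 8; from Java 9 onwards, javac by default compiles such concatenations to an invokedynamic call instead of explicit StringBuilder calls.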

So you don't need to manually apply a StringBuilder when you are doing a static String concatenation (without any loops or conditional statements). The compiler can handle it for you.


Example 3: Simple concatenation of non-final string variables using the + operation

Let’s perform a simple concatenation of non-final string variables, each initialized with a string literal, using the + operation (without loops or conditional statements).

Source Code:

public class StringConcatenateTest {
  public static void main(String[] args) {
    String str1 = "Hello ";
    String str2 = "world";
    String str = str1 + str2;
  }
}

Compiled Bytecode:

public class StringConcatenateTest {
public StringConcatenateTest();
 Code:
  0: aload_0
  1: invokespecial #1 // Method java/lang/Object."<init>":()V
  4: return  
public static void main(java.lang.String[]);
 Code:
  0: ldc           #2 // String Hello
  2: astore_1
  3: ldc           #3 // String world
  5: astore_2
  6: new           #4 // class java/lang/StringBuilder
  9: dup
 10: invokespecial #5 // Method java/lang/StringBuilder."<init>":()V
 13: aload_1
 14: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
 17: aload_2
 18: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
 21: invokevirtual #7 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
 24: astore_3
 25: return
}

In the above piece of code, the string variables are not final, so they are not compile-time constants. Thus the concatenation is NOT handled at compile time; it is deferred until run-time.

Clearly, the bytecode stores "Hello " and "world" in two separate variables and uses a StringBuilder to perform the concatenation.


Dynamic String Concatenation

‘Dynamic String Concatenation’ refers to the concatenation of substrings whose result is known only at run-time. This is the case when substrings are appended to a string within a For loop or within a conditional statement (if, if-else, switch statements, etc.).
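As a small illustration of the conditional case, the sketch below builds a string whose pieces depend on run-time values. The class ConditionalConcatDemo, the method greet and its parameters are hypothetical and exist only for this example.

```java
class ConditionalConcatDemo {

    // Hypothetical helper: which substrings get appended is only known at run-time.
    static String greet(String name, boolean formal, int unread) {
        StringBuilder sb = new StringBuilder();
        if (formal) {
            sb.append("Good day, ");
        } else {
            sb.append("Hi, ");
        }
        sb.append(name);
        if (unread > 0) {
            sb.append(" (").append(unread).append(" unread)");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(greet("Ada", true, 2));  // prints: Good day, Ada (2 unread)
        System.out.println(greet("Ada", false, 0)); // prints: Hi, Ada
    }
}
```

Because the branches taken depend on the arguments, the compiler cannot know the final string in advance.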

Concatenation of String Literals inside a For loop

If you need to concatenate strings inside a For loop, you should create and use a StringBuilder yourself, because the compiler will not do this for you in an efficient way.

Example 1: Use String literals inside the For loop and perform the concatenation

Let’s use string literals inside the For loop and perform a concatenation.

Source Code:

public class StringConcatenateTest {
  public static void main(String[] args) {
    String str = "Say ";
    for (int i = 0; i < 2; i++) {
      str += "Hello ";
    }
    str += "world.";
  }
}

Compiled Bytecode:

public class StringConcatenateTest {
public StringConcatenateTest();
 Code:
  0: aload_0
  1: invokespecial #1  // Method java/lang/Object."<init>":()V
  4: return  
public static void main(java.lang.String[]);
 Code:
  0: ldc           #2  // String Say
  2: astore_1
  3: iconst_0
  4: istore_2
  5: iload_2
  6: iconst_2
  7: if_icmpge     36
 10: new           #3  // class java/lang/StringBuilder
 13: dup
 14: invokespecial #4  // Method java/lang/StringBuilder."<init>":()V
 17: aload_1
 18: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
 21: ldc           #6  // String Hello
 23: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
 26: invokevirtual #7  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
 29: astore_1
 30: iinc          2, 1
 33: goto          5
 36: new           #3  // class java/lang/StringBuilder
 39: dup
 40: invokespecial #4  // Method java/lang/StringBuilder."<init>":()V
 43: aload_1
 44: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
 47: ldc           #8  // String world.
 49: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
 52: invokevirtual #7  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
 55: astore_1
 56: return
}

Using the += concatenation operator to concatenate substrings to a string defined outside the body of the loop will cause performance degradation.

Here a new instance of StringBuilder is created on every iteration. This is because the static string concatenation optimization is applied to each concatenation statement individually, including the one in the body of the loop; the compiler does not move the StringBuilder out of the loop. The compiler cannot compute the result of the concatenation without executing the instructions, which is not its role.

If we were to write out source code equivalent to the bytecode of the above StringConcatenateTest class, it would look as follows:

String str = "Say ";
for (int i = 0; i < 2; i++) {
    StringBuilder tmp = new StringBuilder();
    tmp.append(str);
    tmp.append("Hello ");
    str = tmp.toString();
}
StringBuilder tmp2 = new StringBuilder();
tmp2.append(str);
tmp2.append("world.");
str = tmp2.toString();

The characters inside the str variable are copied to the StringBuilder instance before "Hello " is appended, and the toString method then copies them once more into a new String. The concatenation of "world." after the loop makes yet another copy of the characters accumulated so far.
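To get a feel for how this repeated copying adds up, here is a rough counting model rather than a benchmark. It is a sketch under stated assumptions: every append copies the characters it receives, every toString copies the whole content, and internal buffer growth is ignored. The class and method names are invented for this illustration.

```java
class CopyCostDemo {

    // Models the += loop: a fresh StringBuilder per iteration.
    // Assumption: append copies its argument, toString copies the full content.
    static long copiedWithPlusEquals(int initialLength, int pieceLength, int iterations) {
        long copied = 0;
        long length = initialLength;
        for (int i = 0; i < iterations; i++) {
            copied += length + pieceLength; // append(str) and append(piece)
            length += pieceLength;
            copied += length;               // toString()
        }
        return copied;
    }

    // Models one shared StringBuilder: each character is appended once,
    // followed by a single toString() at the end (buffer resizing ignored).
    static long copiedWithSingleBuilder(int initialLength, int pieceLength, int iterations) {
        long length = initialLength + (long) pieceLength * iterations;
        return length + length;
    }

    public static void main(String[] args) {
        // "Say " has 4 characters, "Hello " has 6.
        System.out.println(copiedWithPlusEquals(4, 6, 2));    // prints: 52
        System.out.println(copiedWithSingleBuilder(4, 6, 2)); // prints: 32
    }
}
```

With only two iterations the gap is small, but in this model the += variant grows quadratically with the number of iterations while the single builder grows linearly.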

Thus the solution is to manually create a StringBuilder instance outside the loop.


Example 2: Create a StringBuilder outside the For loop and use it inside the loop for concatenation

Let’s initialize a StringBuilder manually outside the For loop and use it inside the loop to do the string concatenation.

Source Code:

public class StringConcatenateTest {
  public static void main(String[] args) {
    StringBuilder str = new StringBuilder();
    str.append("Say ");
    for (int i = 0; i < 2; i++) {
      str.append("Hello ");
    }
    str.append("world.");
  }
}

Compiled Bytecode:

public class StringConcatenateTest {
public StringConcatenateTest();
 Code:
  0: aload_0
  1: invokespecial #1  // Method java/lang/Object."<init>":()V
  4: return  
public static void main(java.lang.String[]);
 Code:
  0: new           #2  // class java/lang/StringBuilder
  3: dup
  4: invokespecial #3  // Method java/lang/StringBuilder."<init>":()V
  7: astore_1
  8: aload_1
  9: ldc           #4  // String Say
 11: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder; 
 14: pop
 15: iconst_0 
 16: istore_2
 17: iload_2
 18: iconst_2
 19: if_icmpge     35
 22: aload_1
 23: ldc           #6  // String Hello
 25: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
 28: pop
 29: iinc          2, 1
 32: goto          17
 35: aload_1
 36: ldc           #7  // String world.
 38: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
 41: pop
 42: return
}

When we observe the bytecode, we see that only one StringBuilder is created, namely the one we initialized manually. This is a lot better in terms of memory consumption than the previous approach of using the += operator.

The reason is that unlike a String, a StringBuilder can be mutated.
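The mutability can be seen directly: append modifies the builder in place and returns a reference to that same builder. The following is a minimal sketch with invented names (MutableBuilderDemo, build, appendReturnsSameInstance), not a definitive implementation.

```java
class MutableBuilderDemo {

    // Builds the sentence from the example with a single StringBuilder.
    static String build(int repetitions) {
        StringBuilder sb = new StringBuilder("Say ");
        for (int i = 0; i < repetitions; i++) {
            sb.append("Hello ");
        }
        return sb.append("world.").toString();
    }

    // append() mutates the builder and returns the very same instance.
    static boolean appendReturnsSameInstance() {
        StringBuilder sb = new StringBuilder("Say ");
        StringBuilder returned = sb.append("Hello ");
        return returned == sb;
    }

    public static void main(String[] args) {
        System.out.println(build(2));                    // prints: Say Hello Hello world.
        System.out.println(appendReturnsSameInstance()); // prints: true
    }
}
```

No intermediate String is created inside the loop; only the final toString call produces one.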

However, in cases where a mutable string is accessed by multiple threads and no external synchronization is employed, remember to use StringBuffer rather than StringBuilder, since the methods of StringBuffer are synchronized.
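As a hedged illustration of the multi-threaded case, the sketch below lets two threads append to one shared StringBuffer without any external locking. The class and method names are assumptions made for this example. Because each append call on a StringBuffer is synchronized, no append is lost and the final length is predictable, although the order in which the two threads interleave is not.

```java
class SharedBufferDemo {

    // Two threads append "ab" to the same StringBuffer; returns the final length.
    static int appendFromTwoThreads(int appendsPerThread) {
        StringBuffer shared = new StringBuffer();
        Runnable task = () -> {
            for (int i = 0; i < appendsPerThread; i++) {
                shared.append("ab"); // synchronized per call
            }
        };
        Thread first = new Thread(task);
        Thread second = new Thread(task);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return shared.length();
    }

    public static void main(String[] args) {
        // 2 threads * 1000 appends * 2 characters
        System.out.println(appendFromTwoThreads(1000)); // prints: 4000
    }
}
```

Swapping in a StringBuilder here would remove that guarantee, since StringBuilder performs no synchronization.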


Key Takeaways

  • If you are concatenating compile-time constant strings, the final string is resolved at compile-time.
  • Further, for static string concatenation, you do not need to use a mutable string object like a StringBuilder. The Java compiler will handle it for you.
  • For dynamic string concatenation, you need to use a mutable string object like a StringBuilder for better performance.