Difference between String trim() and strip() methods in Java 11

Among other changes, JDK 11 introduces six new methods in the java.lang.String class:

  • repeat(int) - Repeats the String as many times as provided by the int parameter
  • lines() - Uses a Spliterator to lazily provide lines from the source string
  • isBlank() - Indicates if the String is empty or contains only white space characters
  • stripLeading() - Removes the white space from the beginning
  • stripTrailing() - Removes the white space from the end
  • strip() - Removes the white space from both the beginning and the end of the string
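
For readers who want to see the six methods side by side, here is a minimal sketch. The class name Java11StringMethodsDemo and its two helper methods are made up for illustration; only the String methods themselves come from JDK 11.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; requires Java 11 or later.
class Java11StringMethodsDemo {
    // repeat(int): concatenates the string with itself the given number of times
    static String repeated() {
        return "ab".repeat(3);
    }

    // lines(): splits lazily on line terminators, which are not part of the results
    static List<String> splitLines() {
        return "one\ntwo\r\nthree".lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(repeated());                          // ababab
        System.out.println(splitLines());                        // [one, two, three]
        System.out.println(" \t\n".isBlank());                   // true
        System.out.println("[" + "  x  ".stripLeading() + "]");  // [x  ]
        System.out.println("[" + "  x  ".stripTrailing() + "]"); // [  x]
        System.out.println("[" + "  x  ".strip() + "]");         // [x]
    }
}
```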

In particular, strip() looks very similar to trim(). As per this article, the strip*() methods are designed to:

The String.strip(), String.stripLeading(), and String.stripTrailing() methods trim white space [as determined by Character.isWhitespace()] off either the front, back, or both front and back of the targeted String.

String.trim() JavaDoc states:

/**
  * Returns a string whose value is this string, with any leading and trailing
  * whitespace removed.
  * ...
  */

This is almost identical to the quote above.

What exactly is the difference between String.trim() and String.strip() since Java 11?

Bornite answered 10/7, 2018 at 13:28 Comment(0)
B
225

In short: strip() is the "Unicode-aware" evolution of trim(). That is, trim() removes only characters <= U+0020 (space), whereas strip() removes all Unicode whitespace characters as determined by Character.isWhitespace(int) (but not all control characters, such as \0).
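
A small sketch of both directions of that difference. The class and constant names are hypothetical; the behaviour shown follows from the two definitions of whitespace described in this answer.

```java
// Hypothetical demo class; requires Java 11 or later for strip().
class TrimVsStripContrast {
    // U+2000 (EN QUAD) is Unicode whitespace but greater than U+0020
    static final String UNICODE_PADDED = "\u2000abc\u2000";
    // U+0000 (NUL) is below U+0020, but Character.isWhitespace returns false for it
    static final String NUL_PADDED = "\u0000abc\u0000";

    public static void main(String[] args) {
        System.out.println(UNICODE_PADDED.trim().length());  // 5: trim() leaves U+2000 in place
        System.out.println(UNICODE_PADDED.strip().length()); // 3: strip() removes U+2000
        System.out.println(NUL_PADDED.trim().length());      // 3: trim() removes NUL
        System.out.println(NUL_PADDED.strip().length());     // 5: strip() leaves NUL in place
    }
}
```

So neither method is a drop-in replacement for the other when the input may contain control characters.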

CSR : JDK-8200378

Problem

String::trim has existed from early days of Java when Unicode had not fully evolved to the standard we widely use today.

The definition of space used by String::trim is any code point less than or equal to the space code point (\u0020), commonly referred to as ASCII or ISO control characters.

Unicode-aware trimming routines should use Character::isWhitespace(int).

Additionally, developers have not been able to specifically remove indentation white space or to specifically remove trailing white space.

Solution

Introduce trimming methods that are Unicode white space aware and provide additional control of leading only or trailing only.

A common characteristic of these new methods is that they use a different (newer) definition of "whitespace" than did old methods such as String.trim(). Bug JDK-8200373.

The current JavaDoc for String::trim does not make it clear which definition of "space" is being used in the code. With additional trimming methods coming in the near future that use a different definition of space, clarification is imperative. String::trim uses the definition of space as any codepoint that is less than or equal to the space character codepoint (\u0020.) Newer trimming methods will use the definition of (white) space as any codepoint that returns true when passed to the Character::isWhitespace predicate.
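
The two definitions of "space" quoted above can be written down as predicates and compared directly. This is only an illustrative sketch, not the JDK's implementation; the names WhitespaceDefinitions, TRIM_SPACE, STRIP_SPACE and countUpToSpace are invented here.

```java
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

// Hypothetical demo class comparing the two definitions of "space".
class WhitespaceDefinitions {
    // Definition used by trim(): any code point up to and including U+0020
    static final IntPredicate TRIM_SPACE = cp -> cp <= ' ';
    // Definition used by the strip methods: Character.isWhitespace(int)
    static final IntPredicate STRIP_SPACE = Character::isWhitespace;

    // Counts the code points in the range U+0000..U+0020 accepted by the predicate
    static long countUpToSpace(IntPredicate definition) {
        return IntStream.rangeClosed(0, ' ').filter(definition).count();
    }

    public static void main(String[] args) {
        System.out.println(countUpToSpace(TRIM_SPACE));  // 33
        System.out.println(countUpToSpace(STRIP_SPACE)); // 10
        // U+2000 (EN QUAD) is accepted only by the newer definition
        System.out.println(TRIM_SPACE.test(0x2000) + " " + STRIP_SPACE.test(0x2000)); // false true
        // U+00A0 (NO-BREAK SPACE) is accepted by neither definition
        System.out.println(TRIM_SPACE.test(0x00A0) + " " + STRIP_SPACE.test(0x00A0)); // false false
    }
}
```

Of the 33 code points that trim() treats as space, only 10 (tab through carriage return, U+001C through U+001F, and the space itself) also count as whitespace for strip().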

The method isWhitespace(char) was added to Character with JDK 1.1, but the method isWhitespace(int) was not introduced to the Character class until JDK 1.5. The latter method (the one accepting a parameter of type int) was added to support supplementary characters. The Javadoc comments for the Character class define supplementary characters (typically modeled with int-based "code point") versus BMP characters (typically modeled with single character):

The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values ... A char value, therefore, represents Basic Multilingual Plane (BMP) code points, including the surrogate code points, or code units of the UTF-16 encoding. An int value represents all Unicode code points, including supplementary code points. ... The methods that only accept a char value cannot support supplementary characters. ... The methods that accept an int value support all Unicode characters, including supplementary characters.
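
To make the char versus int distinction concrete, here is a short sketch using U+1F600 as an arbitrary example of a supplementary code point. The class and constant names are hypothetical.

```java
// Hypothetical demo class showing a supplementary code point in a String.
class SupplementaryCodePointDemo {
    // U+1F600 lies above U+FFFF, so UTF-16 stores it as a surrogate pair
    static final String SUPPLEMENTARY = "\uD83D\uDE00";

    public static void main(String[] args) {
        int cp = SUPPLEMENTARY.codePointAt(0);
        System.out.println(SUPPLEMENTARY.length());                                  // 2 char values
        System.out.println(SUPPLEMENTARY.codePointCount(0, SUPPLEMENTARY.length())); // 1 code point
        System.out.println(Integer.toHexString(cp));                                 // 1f600
        System.out.println(Character.isSupplementaryCodePoint(cp));                  // true
        // Only the int overload can be asked about the whole code point
        System.out.println(Character.isWhitespace(cp));                              // false
    }
}
```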

OpenJDK Changeset.


Benchmark comparison between trim() and strip() - Why is String.strip() 5 times faster than String.trim() for blank string In Java 11

Bornite answered 10/7, 2018 at 13:28 Comment(6)
Interesting that the symbol '\u0000' is not deleted by strip, but is deleted by trim. - Ribald
Why not upgrade trim() itself, instead of creating a new method? It would have worked on existing applications without any developer intervention. Or is that the very reason they decided to create a new method? - Ashaashamed
@Ashaashamed Because a big part of the Java ethos is to maximize backwards-compatibility. Changing the behavior of a method like String::trim would bring unwelcome surprises to existing codebases. - Scauper
Is strip's universe of characters a superset of trim's universe of characters? In other words, does strip strip more than trim trims? - Marashio
Apparently trim's set of characters is not a subset of strip's, since the symbol '\u0000' is not deleted by strip, but is deleted by trim :-/ ... - Telophase
@MikhailKholodkov could you please update the "In short" statement? As mentioned by CHEM_Eugene, strip does not remove all control characters. The current answer might make developers believe they can 'update' their code to use strip, but in reality they might be removing control character sanitization by accident this way. Could you please change it to: "In short: trim removes only characters <= U+0020 (space); strip removes all Unicode whitespace characters (but not all control characters, such as \0)" (or similar) - Drummer
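
For code that relied on trim() to drop leading and trailing control characters and also wants Unicode whitespace removed, one possible approach is to trim with the union of both definitions. The sketch below is an assumption about how that could be done, not something from the JDK; CombinedTrim, isTrimOrStripSpace and stripAll are hypothetical names.

```java
// Hypothetical helper class; not part of the JDK.
class CombinedTrim {
    // True if either trim() or strip() would treat the code point as removable
    static boolean isTrimOrStripSpace(int cp) {
        return cp <= ' ' || Character.isWhitespace(cp);
    }

    // Removes leading and trailing code points matched by either definition
    static String stripAll(String s) {
        int start = 0;
        int end = s.length();
        while (start < end) {
            int cp = s.codePointAt(start);
            if (!isTrimOrStripSpace(cp)) break;
            start += Character.charCount(cp);
        }
        while (end > start) {
            int cp = s.codePointBefore(end);
            if (!isTrimOrStripSpace(cp)) break;
            end -= Character.charCount(cp);
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String s = " \u0000\u2000abc\u2000\u0000 ";
        System.out.println(s.trim().length());   // 5: stops at U+2000 on both sides
        System.out.println(s.strip().length());  // 7: stops at NUL on both sides
        System.out.println(stripAll(s));         // abc
    }
}
```

Chaining s.strip().trim() is not equivalent in general, because each call stops at the first character its own definition does not cover.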
B
60

Here is a unit-test that illustrates the answer by @MikhailKholodkov, using Java 11.

(Note that \u2000 is above \u0020 and not considered whitespace by trim())

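// Imports assume JUnit 4; adjust them if you use a different test framework.
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class StringTestCase {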
    @Test
    public void testSame() {
        String s = "\t abc \n";

        assertEquals("abc", s.trim());
        assertEquals("abc", s.strip());
    }

    @Test
    public void testDifferent() {
        Character c = '\u2000';
        String s = c + "abc" + c;

        assertTrue(Character.isWhitespace(c));
        assertEquals(s, s.trim());
        assertEquals("abc", s.strip());
    }
}
Bufflehead answered 30/8, 2018 at 23:43 Comment(1)
I suggest you also print the output. - Swingeing
C
29

In general, both methods remove leading and trailing spaces from a string. The difference shows up when we work with Unicode characters or multilingual text.

trim() removes all leading and trailing characters whose code point is less than or equal to 32 (U+0020, the space character).

The Unicode standard defines various space characters with code points greater than 32 (U+0020), for example 8193 (U+2001).

To identify these space characters, the method isWhitespace(int) was added to the Character class in Java 1.5. This method uses Unicode to identify space characters. You can read more about Unicode space characters here.

The new strip method, added in Java 11, uses this Character.isWhitespace(int) method to cover a wide range of white space characters and remove them.

Example

public class StringTrimVsStripTest {
    public static void main(String[] args) {
        String string = '\u2001'+"String    with    space"+ '\u2001';
        System.out.println("Before: \"" + string+"\"");
        System.out.println("After trim: \"" + string.trim()+"\"");
        System.out.println("After strip: \"" + string.strip()+"\"");
    }
}

Output

Before: "  String    with    space  "
After trim: " String    with    space "
After strip: "String    with    space"

Note: If you are running on a Windows machine, you may not see the same output because the console may support only a limited set of Unicode characters. You can try an online compiler to test this code.

Charlenecharleroi answered 10/7, 2020 at 14:48 Comment(1)
How is it different from the other two answers? - Prostitution
D
7

An example where strip() and trim() produce different output:

String s = "test string\u205F";
String stripped = s.strip();
System.out.printf("'%s'%n", stripped); // 'test string'

String trimmed = s.trim();
System.out.printf("'%s'%n", trimmed); // 'test string' followed by U+205F, which trim() keeps
Diorama answered 8/7, 2021 at 6:50 Comment(0)
