Remove all whitespaces from String but keep ONE newline
I have this input String (containing tabs, spaces, and line breaks):


        That      is a test.              
    seems to work       pretty good? working.








    Another test  again.

[Edit]: I should have provided the String itself for better testing, as Stack Overflow removes all special characters (tabs, ...):

String testContent = "\n\t\n\t\t\t\n\t\t\tDas      ist ein Test.\t\t\t  \n\tsoweit scheint das \t\tganze zu? funktionieren.\n\n\n\n\t\t\n\t\t\n\t\t\t      \n\t\t\t      \n    \t\t\t\n    \tNoch ein  Test.\n    \t\n    \t\n    \t";

And I want to reach this state:


That is a test.
seems to work pretty good? working.
Another test again.

String expectedOutput = "Das ist ein Test.\nsoweit scheint das ganze zu? funktionieren.\nNoch ein Test.\n";

Any ideas? Can this be achieved using regexes?

replaceAll("\\s+", " ") is NOT what I'm looking for. If this regex preserved exactly one of the existing newlines, it would be perfect.

I have tried this, but it seems suboptimal to me:

BufferedReader bufReader = new BufferedReader(new StringReader(testContent));
String line = null;
StringBuilder newString = new StringBuilder();
while ((line = bufReader.readLine()) != null) {
    String temp = line.replaceAll("\\s+", " ");
    if (!temp.trim().equals("")) {
        newString.append(temp.trim());
        newString.append("\n");
    }
}
Feudatory answered 19/3, 2013 at 8:39 Comment(4)
I think you need some custom logic for that: search for a non-whitespace character after one space, and for a non-space character after a newline. - Douglassdougy
What is the logic you want? Trimming consecutive whitespace to a single whitespace character? - Shingle
@BlackMaggie Yeah, that sums it up, I think. - Feudatory
@zvzdhk No, as this doesn't remove tabs and doesn't collapse all newlines to single ones. - Feudatory
P
15

In a single regex (plus a small patch for tabs):

input.replaceAll("^\\s+|\\s+$|\\s*(\n)\\s*|(\\s)\\s*", "$1$2")
     .replace("\t"," ");

The regex looks daunting, but in fact decomposes nicely into these parts that are OR-ed together:

  • ^\s+ – match whitespace at the beginning;
  • \s+$ – match whitespace at the end;
  • \s*(\n)\s* – match whitespace containing a newline, and capture that newline;
  • (\s)\s* – match whitespace, capturing the first whitespace character.

The result will be a match with two capture groups, but only one of the groups may be non-empty at a time. This allows me to replace the match with "$1$2", which means "concatenate the two capture groups."

The only remaining problem is that I can't replace a tab with a space using this approach, so I fix that up with a simple non-regex character replacement.
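
As a quick check, here is a minimal, self-contained sketch that runs the expression above on the testContent string from the question. The class and method names (SingleRegexDemo, normalize) are made up for illustration. Two things to be aware of: because of the \s+$ branch the result has no trailing newline (unlike expectedOutput in the question), and the tab patch only covers tabs, so a lone \r or form feed captured by the second group would pass through unchanged.

```java
class SingleRegexDemo {

    // Hypothetical helper that wraps the expression from this answer.
    static String normalize(String input) {
        return input
                .replaceAll("^\\s+|\\s+$|\\s*(\n)\\s*|(\\s)\\s*", "$1$2")
                .replace("\t", " "); // patch: a captured tab becomes a space
    }

    public static void main(String[] args) {
        String testContent = "\n\t\n\t\t\t\n\t\t\tDas      ist ein Test.\t\t\t  \n"
                + "\tsoweit scheint das \t\tganze zu? funktionieren.\n\n\n\n\t\t\n\t\t\n"
                + "\t\t\t      \n\t\t\t      \n    \t\t\t\n    \tNoch ein  Test.\n"
                + "    \t\n    \t\n    \t";

        // Make the newlines visible when printing.
        System.out.println(normalize(testContent).replace("\n", "\\n"));
        // prints: Das ist ein Test.\nsoweit scheint das ganze zu? funktionieren.\nNoch ein Test.
    }
}
```

If the trailing newline from expectedOutput is required, appending "\n" to the result is the simplest way to get it.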

Past answered 19/3, 2013 at 9:5 Comment(0)
K
7

In 4 steps:

text
    // 1. compress all non-newline whitespace to a single space
    .replaceAll("[\\s&&[^\\n]]+", " ")
    // 2. remove the space left at the beginning or end of each line
    //    (match ' ' only: \s would also match '\n' and could join lines)
    .replaceAll("(?m)^ | $", "")
    // 3. compress multiple newlines to a single newline
    .replaceAll("\\n+", "\n")
    // 4. remove a newline from the beginning or end of the string
    .replaceAll("^\n|\n$", "")
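
Step 1 relies on Java's character-class intersection: [\s&&[^\n]] reads as "whitespace, but not a newline". A minimal sketch of just that step, with hypothetical class and method names:

```java
class IntersectionDemo {

    // Collapses every run of non-newline whitespace into one space.
    // Newlines are left untouched, which is why the later steps are needed.
    static String collapseInline(String text) {
        return text.replaceAll("[\\s&&[^\\n]]+", " ");
    }

    public static void main(String[] args) {
        String sample = "a \t  b\n\t\tc";
        // Make the newline visible when printing.
        System.out.println(collapseInline(sample).replace("\n", "\\n"));
        // prints: a b\n c
    }
}
```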
Khosrow answered 19/3, 2013 at 9:0 Comment(1)
The only problems my solution had were that it left a single space at the end of a line if there was any whitespace there, and left a single newline at the beginning/end if the string had leading or trailing newlines. I just fixed it (at last, I hope :)). - Khosrow
V
2

If I understand correctly, you simply want to replace a succession of newlines with one newline. So replace \n\n* with \n (with appropriate flags). If some lines contain only whitespace, remove that whitespace first (^\s\s*$ in multiline mode), then replace the newlines.

Edit: The only issue here is that some newlines might remain here and there, so you have to be careful to collapse the spaces first and then fix the empty-line problem. You could probably condense this into a single regex, but it is easier to read with these three patterns:

 Pattern spaces = Pattern.compile("[\t ]+");
 Pattern emptyLines = Pattern.compile("^\\s+$?", Pattern.MULTILINE);
 Pattern newlines = Pattern.compile("\\s*\\n+");
 System.out.print(
      newlines.matcher(emptyLines.matcher(spaces.matcher(
        input).replaceAll(" ")).replaceAll("")).replaceAll("\n"));
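
For reference, a self-contained sketch that wraps the same three patterns in a method and applies them to the testContent string from the question. The class and method names (ThreePatternDemo, normalize) are assumptions made for this example; the patterns themselves are unchanged.

```java
import java.util.regex.Pattern;

class ThreePatternDemo {

    private static final Pattern SPACES = Pattern.compile("[\t ]+");
    private static final Pattern EMPTY_LINES = Pattern.compile("^\\s+$?", Pattern.MULTILINE);
    private static final Pattern NEWLINES = Pattern.compile("\\s*\\n+");

    // Applies the three patterns in the order used above.
    static String normalize(String input) {
        String collapsed = SPACES.matcher(input).replaceAll(" ");            // tabs/spaces -> one space
        String noLeading = EMPTY_LINES.matcher(collapsed).replaceAll("");    // whitespace at line starts
        return NEWLINES.matcher(noLeading).replaceAll("\n");                 // trailing space + newlines -> "\n"
    }

    public static void main(String[] args) {
        String testContent = "\n\t\n\t\t\t\n\t\t\tDas      ist ein Test.\t\t\t  \n"
                + "\tsoweit scheint das \t\tganze zu? funktionieren.\n\n\n\n\t\t\n\t\t\n"
                + "\t\t\t      \n\t\t\t      \n    \t\t\t\n    \tNoch ein  Test.\n"
                + "    \t\n    \t\n    \t";
        String expectedOutput =
                "Das ist ein Test.\nsoweit scheint das ganze zu? funktionieren.\nNoch ein Test.\n";

        System.out.println(expectedOutput.equals(normalize(testContent))); // prints: true
    }
}
```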
Vallonia answered 19/3, 2013 at 8:45 Comment(1)
This also works correctly in my case :) I will have to try more test data. - Feudatory
B
2

Why don't you do

String[] lines = split(s,"\n")
String[] noExtraSpaces = removeSpacesInEachLine(lines)
String result = join(noExtraSpaces,"\n")

Don't forget https://softwareengineering.stackexchange.com/questions/10998/what-does-the-jamie-zawinskis-quotation-about-regular-expressions-mean
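
The helper names above (split, removeSpacesInEachLine, join) are placeholders rather than real library calls. One possible way to spell the idea out in plain Java is sketched below; it assumes that whitespace-only lines should be dropped, as in the question, and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

class SplitJoinDemo {

    // Split on newlines, clean each line, drop empty lines, join again.
    static String normalize(String s) {
        List<String> kept = new ArrayList<>();
        for (String line : s.split("\n")) {
            // trim() strips leading/trailing whitespace (including tabs);
            // the regex then collapses inner runs to a single space.
            String cleaned = line.trim().replaceAll("\\s+", " ");
            if (!cleaned.isEmpty()) {
                kept.add(cleaned);
            }
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        // Make the newline visible when printing.
        System.out.println(normalize("xx\n \n\n yy").replace("\n", "\\n"));
        // prints: xx\nyy
    }
}
```

Note that String.join puts newlines only between lines, so the result has no trailing newline.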

Berberine answered 19/3, 2013 at 8:48 Comment(3)
What about "xx\n \n\n yy"? - Past
@MarkoTopolnik Well, that depends on whether you need to keep the empty lines or remove them. - Berberine
@Berberine I'm sorry, but Stack Overflow removed all special characters and converted them to spaces. Therefore I've just added Strings containing all special characters. - Feudatory
J
2

First collapse consecutive newlines into one, then collapse the remaining whitespace (but not newlines) into single spaces, and finally remove the whitespace left at the beginning and end of each line:

String test = "      This is              a real\n\n\n\n\n\n\n\n\n test !!\n\n\n   bye";
// collapse runs of newlines
test = test.replaceAll("\n+", "\n");
// whitespace other than '\n' -> one space
test = test.replaceAll("[^\\S\\n]+", " ");
// trim each line (reusing the second regex here would delete every space)
test = test.replaceAll("(?m)^\\s+|[^\\S\\n]+$", "");

Output:

This is a real
test !!
bye
Jahdal answered 19/3, 2013 at 8:49 Comment(1)
@MarounMaroun For me it removes all spaces. I've just added example Strings (containing the right escape characters). - Feudatory
