How do I use System.getProperty("line.separator").toString()?
S

7

21

I have a Tab-delimited String (representing a table) that is passed to my method. When I print it to the command line, it appears like a table with rows:

https://i.sstatic.net/2fAyq.gif

The command window is correctly buffered. My thinking is that there is definitely a new line character before or after each row.

My problem is that I want to split up the incoming string into individual strings representing the rows of the table. So far I have:

private static final String newLine = System.getProperty("line.separator").toString();
private static final String tab = "\t";
private static String[] rows;
...

rows = tabDelimitedTable.split(newLine);    //problem is here
    
System.out.println();
System.out.println("################### start debug ####################");

System.out.println((tabDelimitedTable.contains(newLine)) ? "True" : "False");
    
System.out.println("#################### end debug###################");
System.out.println();

output:

################### start debug ####################
False
#################### end debug###################

Obviously there is something in the string telling the OS to start a new line. Yet it apparently contains no newline characters.

Running the latest JDK on Windows XP SP3.

Any ideas?

Shigella answered 18/8, 2010 at 21:41 Comment(2)
Why .toString()? It is already a String, unless it is null, in which case you get a NullPointerException. - Swaziland
Yes, I just wanted to make sure that I was passing a string instead of a character to .split(). I should have read the documentation on the .getProperty() method, but writing .toString() was faster than opening up my browser lol. - Shigella
B
29

Try

rows = tabDelimitedTable.split("[" + newLine + "]");

This should solve the regex problem.

Also, though it is not that important, the return type of

System.getProperty("line.separator")

is String so no need to call toString().

Bandeau answered 18/8, 2010 at 22:6 Comment(2)
Thanks, this worked. I guess I just read the documentation for the split() method and saw that it took a String, not understanding the difference between a regex and a String. - Shigella
If this is Windows and newline is "\r\n", this will in fact split between the \r and the \n, creating spurious empty strings. - Araucanian
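To make the point in the comment above concrete, here is a minimal sketch (not from the original answer) that compares the character-class split with a split on the separator as one literal unit. The class name, method names, and sample data are hypothetical; Pattern.quote is the standard java.util.regex call for escaping a literal.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample data are illustrative only.
class SeparatorSplitDemo {
    // Builds a character class from the separator, so with "\r\n" each of
    // \r and \n is treated as its own delimiter.
    static String[] splitWithCharClass(String text, String separator) {
        return text.split("[" + separator + "]");
    }

    // Treats the separator as one literal unit; Pattern.quote escapes any
    // regex metacharacters it might contain.
    static String[] splitOnLiteral(String text, String separator) {
        return text.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        String table = "r1\ta\r\nr2\tb\r\nr3\tc";
        // Five elements: an empty string appears between each \r and \n.
        System.out.println(Arrays.toString(splitWithCharClass(table, "\r\n")));
        // Three elements: one per row.
        System.out.println(Arrays.toString(splitOnLiteral(table, "\r\n")));
    }
}
```

Splitting on the literal only helps when the input really uses that exact separator. It does nothing for input whose line endings differ from the platform default.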
A
30

The problem

You must NOT assume that an arbitrary input text file uses the "correct" platform-specific newline separator. This seems to be the source of your problem; it has little to do with regex.

To illustrate, on the Windows platform, System.getProperty("line.separator") is "\r\n" (CR+LF). However, when you run your Java code on this platform, you may very well have to deal with an input file whose line separator is simply "\n" (LF). Maybe this file was originally created on a Unix platform and then transferred in binary (instead of text) mode to Windows. There are many scenarios like this, where you must parse an input text file that does not use the current platform's newline separator.

(Incidentally, when a Windows text file is transferred to Unix in binary mode, many editors display ^M at the end of each line, which confuses people who don't know what is going on.)

When you are producing a text file as output, you should probably prefer the platform-specific newline separator, but when you are consuming a text file as input, it's probably not safe to make the assumption that it correctly uses the platform specific newline separator.
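As a rough illustration of the mismatch described above (a sketch, not code from the question), the check below returns false whenever LF-only text is tested against the Windows separator, which is the same symptom as the "False" in the question's debug output. The class, method, and sample data are hypothetical. System.lineSeparator() assumes Java 7 or later.

```java
// Hypothetical demo class; names and sample data are illustrative only.
class SeparatorMismatchDemo {
    // True only if the text contains the given separator as an exact substring.
    static boolean usesSeparator(String text, String separator) {
        return text.contains(separator);
    }

    public static void main(String[] args) {
        String unixStyle = "row1\ta\nrow2\tb\n"; // LF-only input
        System.out.println(usesSeparator(unixStyle, "\r\n")); // false
        System.out.println(usesSeparator(unixStyle, "\n"));   // true
        // When producing output, the platform separator is the usual choice.
        System.out.print("row1\ta" + System.lineSeparator());
    }
}
```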


The solution

One way to solve the problem is to use e.g. java.util.Scanner. It has a nextLine() method that can return the next line (if one exists), correctly handling any inconsistency between the platform's newline separator and the input text file.

You can also combine two Scanner instances, one to scan the file line by line and another to scan the tokens of each line. Here's a simple usage example that breaks each line into a List<String>. The entire file therefore becomes a List<List<String>>.

This is probably a better approach than reading the entire file into one huge String and then splitting it into lines (which are then split into parts).

    // Imports needed: java.util.ArrayList, java.util.List, java.util.Scanner
    String text
        = "row1\tblah\tblah\tblah\n"
        + "row2\t1\t2\t3\t4\r\n"
        + "row3\tA\tB\tC\r"
        + "row4";

    System.out.println(text);
    //  row1    blah    blah    blah
    //  row2    1   2   3   4
    //  row3    A   B   C
    //  row4

    List<List<String>> input = new ArrayList<List<String>>();

    Scanner sc = new Scanner(text);
    while (sc.hasNextLine()) {
        Scanner lineSc = new Scanner(sc.nextLine()).useDelimiter("\t");
        List<String> line = new ArrayList<String>();
        while (lineSc.hasNext()) {
            line.add(lineSc.next());
        }
        input.add(line);
    }
    System.out.println(input);
    // [[row1, blah, blah, blah], [row2, 1, 2, 3, 4], [row3, A, B, C], [row4]]

See also

  • Effective Java 2nd Edition, Item 25: Prefer lists to arrays

Araucanian answered 19/8, 2010 at 8:15 Comment(2)
Thank you for taking the time to answer. I tried one of the other solutions and it worked (it was faster than setting up the scanners in the right places). Since this is only a small portion of my Java program, and since I know exactly what the input will be (it's not an arbitrary input text file), I can assume the default newline character. I've looked at the other method that returned this input string, and it uses the platform default character. Thanks for all of your help though. - Shigella
I was parsing Outlook PST email headers on Linux and the ^M comment helped me understand the output of cat -A. Definitely didn't want the line.separator property in my case. - Inquest
A
2

On Windows, line.separator is a CR/LF combination.

The Java String.split() method takes a regular expression. So I think there's some confusion here.

Astragalus answered 18/8, 2010 at 21:55 Comment(0)
S
2

Try BufferedReader.readLine() instead of all this complication. It recognizes \n, \r, and \r\n as line terminators.
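A minimal sketch of that suggestion, assuming the table arrives as a String (as in the question) rather than as a file. The class and method names are hypothetical, and the try-with-resources syntax assumes Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; names and sample data are illustrative only.
class ReadLineDemo {
    // Returns one element per line, whichever terminator each line uses.
    static List<String> readRows(String text) {
        List<String> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(line); // the terminator itself is not included
            }
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Mixed terminators: LF, CR+LF, and a lone CR.
        List<String> rows = readRows("row1\ta\nrow2\tb\r\nrow3\tc\rrow4");
        System.out.println(rows.size()); // 4
    }
}
```

Each row can then be split on "\t" to get its cells.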

Swaziland answered 19/8, 2010 at 9:8 Comment(0)
G
1

I think your problem is that String.split() treats its argument as a regex, and regexes treat newlines specially. You may need to explicitly create a regex object to pass to split() (there is another overload of it) and configure that regex to allow newlines by passing MULTILINE in the flags parameter of Pattern.compile().

Goldschmidt answered 18/8, 2010 at 21:52 Comment(3)
The MULTILINE flag only applies when you are using the start/end anchors (^ and $) in your regex. - Decencies
The MULTILINE flag also causes the "." character to recognize line separators as a match. - Decencies
@James: nope, according to the specs (and according to my testing), MULTILINE "(?m)" does not cause the "." character to match line separators. That would be the DOTALL flag "(?s)". - Freda
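The disagreement in these comments can be checked directly. The sketch below (hypothetical class and method names, standard java.util.regex calls) tests which flag changes the behavior of "." and which changes the behavior of ^ and $.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class RegexFlagDemo {
    // Does "." match a single line feed under the given flags?
    static boolean dotMatchesNewline(int flags) {
        return Pattern.compile(".", flags).matcher("\n").matches();
    }

    // Does "^b$" find the middle line of "a\nb\nc" under the given flags?
    static boolean anchorsMatchInnerLine(int flags) {
        return Pattern.compile("^b$", flags).matcher("a\nb\nc").find();
    }

    public static void main(String[] args) {
        System.out.println(dotMatchesNewline(Pattern.MULTILINE));     // false
        System.out.println(dotMatchesNewline(Pattern.DOTALL));        // true
        System.out.println(anchorsMatchInnerLine(Pattern.MULTILINE)); // true
        System.out.println(anchorsMatchInnerLine(Pattern.DOTALL));    // false
    }
}
```

Neither flag matters for a split pattern that contains no ".", "^", or "$", which is the case for the patterns discussed in this thread.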
M
1

The other responders are correct that split() takes a regex as the argument, so you'll have to fix that first. The other problem is that you're assuming that the line break characters are the same as the system default. Depending on where the data is coming from, and where the program is running, this assumption may not be correct.

Misbehave answered 18/8, 2010 at 22:0 Comment(1)
Wow... "Responders" sounds so cool. I'm going to use that from now on. - Kildare
D
1

Try this:

rows = tabDelimitedTable.split("[\\r\\n]+");

This should work regardless of what line delimiters are in the input, and will ignore blank lines.

Decencies answered 18/8, 2010 at 22:21 Comment(2)
I originally wanted this Java program to run on Mac/Linux as well, hence the System.getProperty() method. - Shigella
You could still potentially have input which includes non-system-default line separators. This regex will catch all combinations regardless of the platform and input. - Decencies
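For comparison, here is a small sketch of the pattern from this answer next to the \R linebreak matcher, which is not mentioned in the thread and assumes Java 8 or later. The class name, method names, and sample input are hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo class; names and sample data are illustrative only.
class AnyLineBreakSplitDemo {
    // A run of CR/LF characters acts as a single delimiter, so blank lines vanish.
    static String[] splitSkippingBlankLines(String text) {
        return text.split("[\\r\\n]+");
    }

    // \R matches one line break at a time (with \r\n as a unit), so a blank
    // line survives as an empty string.
    static String[] splitKeepingBlankLines(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        String mixed = "a\r\nb\n\nc\rd"; // CR+LF, LF, a blank line, lone CR
        System.out.println(Arrays.toString(splitSkippingBlankLines(mixed))); // [a, b, c, d]
        System.out.println(Arrays.toString(splitKeepingBlankLines(mixed)));  // [a, b, , c, d]
    }
}
```

Which one fits depends on whether blank rows carry meaning in the table. Note that with either pattern, input that starts with a line break still yields an empty first element.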
