Read String line by line
E

11

167

Given a string that isn't too long, what is the best way to read it line by line?

I know you can do:

BufferedReader reader = new BufferedReader(new StringReader(<string>));
reader.readLine();

Another way would be to take the substring on the eol:

final String eol = System.getProperty("line.separator");
output = output.substring(output.indexOf(eol) + eol.length());

Any other maybe simpler ways of doing it? I have no problems with the above approaches, just interested to know if any of you know something that may look simpler and more efficient?

Eudiometer answered 8/7, 2009 at 7:34 Comment(1)
Well, your requirement said "read it line by line", which implies you don't need all the lines in memory at one time, so I would stick with the BufferedReader or Scanner approach, whichever you feel more comfortable with (I don't know which is more efficient). This way your memory requirements are lower. It will also allow you to "scale up" the application to use larger strings by potentially reading data from a file in the future.Singlephase
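The BufferedReader snippet in the question only shows a single readLine() call. A minimal sketch of the full loop might look like the following; the class name ReaderLoopSketch and the helper readLines are made up for illustration and are not part of the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named only for this illustration.
class ReaderLoopSketch {

    /** Collects every line of the text; readLine() returns null at end of input. */
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // the line terminator is already stripped
            }
        } catch (IOException e) {
            // Not expected for an in-memory StringReader; rethrown unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : readLines("first\nsecond\r\nthird")) {
            System.out.println(line);
        }
    }
}
```

If only one line is needed at a time, the body of the while loop can process the line directly instead of collecting it.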
D
154

You can also use the split method of String:

String[] lines = myString.split(System.getProperty("line.separator"));

This gives you all lines in a handy array.

I don't know about the performance of split. It uses regular expressions.

Democratize answered 8/7, 2009 at 7:37 Comment(6)
And hope the line separator doesn't have regex characters in it. :)Hobble
"line.separator" is not reliable anyway. Just because the code is running on (e.g.) Unix, what's to stop the file from having Windows-style "\r\n" line separators? BufferedReader.readLine() and Scanner.nextLine() always check for all three styles of separator.Dallapiccola
I know this comment is really old, but ... The question doesn't mention files at all. Assuming the String was not read from a file, this approach is probably safe.Edelstein
@Edelstein This is not safe even for manually constructed Strings: if you're on Windows, construct your String with '\n', and then split on line.separator, you get no lines.Rumelia
Huh? If I create a string on my linux box using line.separator and someone else reads it on windows using line.separator, it's still humped. That's not incompetent coders from doing stupid things, it's just how things (don't always) work.Prohibitive
How about latest JDK/11 API - String.lines?Shelashelagh
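The two concerns raised in these comments (regex metacharacters in the separator, and a separator that does not match the text) can be illustrated with a small sketch. The class and method names here are hypothetical; Pattern.quote and System.lineSeparator() are standard JDK calls.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not taken from any answer above.
class SeparatorMismatchSketch {

    /** Splits on one literal separator; Pattern.quote guards against regex metacharacters. */
    static String[] splitOnLiteral(String text, String separator) {
        return text.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        String unixText = "one\ntwo\nthree";

        // Separator matches the text: three pieces.
        System.out.println(splitOnLiteral(unixText, "\n").length);

        // Windows-style separator applied to Unix-style text: nothing matches, one piece.
        System.out.println(splitOnLiteral(unixText, "\r\n").length);

        // The platform separator gives whichever of the two results fits the current OS.
        System.out.println(splitOnLiteral(unixText, System.lineSeparator()).length);
    }
}
```

Quoting removes the regex concern, but the mismatch concern remains: the result still depends on whether the text was written with the same separator the code splits on.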
T
236

There is also Scanner. You can use it just like the BufferedReader:

Scanner scanner = new Scanner(myString);
while (scanner.hasNextLine()) {
  String line = scanner.nextLine();
  // process the line
}
scanner.close();

I think that this is a bit cleaner than both of the suggested approaches.

Trophy answered 8/7, 2009 at 7:36 Comment(4)
I don't think it's a fair comparison though - String.split relies on the entire input being read into memory, which isn't always feasible (e.g. for large files).Laurentium
The input has to reside in memory, given that the input is String. The memory overhead is the array. Also, the resulting Strings reuse the same back-end character array.Trophy
Beware: Scanner can produce wrong results if you scan a UTF-8 file with Unicode characters and don't specify the encoding in Scanner. It might interpret a different character as end of line. On Windows it uses the platform's default encoding.Disjunctive
At first it didn't work because my original code used BufferedReader with try/finally; after I removed that, it worked.Sedgewick
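On the encoding caveat in the comments above: it applies when Scanner decodes bytes (a file or stream), not when it is handed a String, which is already decoded. A hedged sketch of passing the charset explicitly follows; the class and method names are invented for the example, and a byte array stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class; the byte array is a stand-in for file contents.
class ScannerCharsetSketch {

    /** Decodes the bytes as UTF-8 instead of relying on the platform default charset. */
    static List<String> readUtf8Lines(byte[] utf8Bytes) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8.name())) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] bytes = "h\u00e9llo\nw\u00f6rld".getBytes(StandardCharsets.UTF_8);
        System.out.println(readUtf8Lines(bytes));
    }
}
```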
M
52

Since I was especially interested in the efficiency angle, I created a little test class (below). Outcome for 5,000,000 lines:

Comparing line breaking performance of different solutions
Testing 5000000 lines
Split (all): 14665 ms
Split (CR only): 3752 ms
Scanner: 10005 ms
Reader: 2060 ms

As usual, exact times may vary, but the ratio holds true however often I've run it.

Conclusion: the "simpler" and "more efficient" requirements of the OP can't be satisfied simultaneously. The split solution (in either incarnation) is simpler, but the Reader implementation beats the others hands down.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/**
 * Test class for splitting a string into lines at linebreaks
 */
public class LineBreakTest {
    /** Main method: pass in desired line count as first parameter (default = 10000). */
    public static void main(String[] args) {
        int lineCount = args.length == 0 ? 10000 : Integer.parseInt(args[0]);
        System.out.println("Comparing line breaking performance of different solutions");
        System.out.printf("Testing %d lines%n", lineCount);
        String text = createText(lineCount);
        testSplitAllPlatforms(text);
        testSplitWindowsOnly(text);
        testScanner(text);
        testReader(text);
    }

    private static void testSplitAllPlatforms(String text) {
        long start = System.currentTimeMillis();
        // Matches Windows (\r\n), Unix (\n), and classic Mac (\r) line endings
        text.split("\r\n|\n|\r");
        System.out.printf("Split (all): %d%n", System.currentTimeMillis() - start);
    }

    private static void testSplitWindowsOnly(String text) {
        long start = System.currentTimeMillis();
        // Despite the method name and the label, "\n" is LF, the Unix line ending
        text.split("\n");
        System.out.printf("Split (CR only): %d%n", System.currentTimeMillis() - start);
    }

    private static void testScanner(String text) {
        long start = System.currentTimeMillis();
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                result.add(scanner.nextLine());
            }
        }
        System.out.printf("Scanner: %d%n", System.currentTimeMillis() - start);
    }

    private static void testReader(String text) {
        long start = System.currentTimeMillis();
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line = reader.readLine();
            while (line != null) {
                result.add(line);
                line = reader.readLine();
            }
        } catch (IOException exc) {
            // quit
        }
        System.out.printf("Reader: %d%n", System.currentTimeMillis() - start);
    }

    private static String createText(int lineCount) {
        StringBuilder result = new StringBuilder();
        StringBuilder lineBuilder = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            lineBuilder.append("word ");
        }
        String line = lineBuilder.toString();
        for (int i = 0; i < lineCount; i++) {
            result.append(line);
            result.append("\n");
        }
        return result.toString();
    }
}
Multicellular answered 2/11, 2014 at 10:12 Comment(1)
As of Java8, the BufferedReader has a lines() function returning a Stream<String> of the lines, which you can collect into a list if you wish, or process the stream.Postaxial
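As a rough sketch of what "collect into a list, or process the stream" could look like, the following filters out blank lines before collecting. The class name and the nonBlankLines helper are assumptions made for this example only.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class for BufferedReader.lines() (available since Java 8).
class ReaderLinesSketch {

    /** Returns the lines that contain at least one non-whitespace character. */
    static List<String> nonBlankLines(String text) {
        // lines() is lazy; the terminal collect() call drives the actual reading.
        // The reader is left unclosed here because a StringReader holds no OS resource.
        return new BufferedReader(new StringReader(text))
                .lines()
                .filter(line -> !line.trim().isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(nonBlankLines("a\n\n   \nb\r\nc"));
    }
}
```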
E
28

Using Apache Commons IOUtils you can do this nicely via

List<String> lines = IOUtils.readLines(new StringReader(string));

It's not doing anything clever, but it's nice and compact. It'll handle streams as well, and you can get a LineIterator too if you prefer.

Enchant answered 8/7, 2009 at 8:40 Comment(2)
One drawback of this approach is that IOUtils.readLines(Reader) throws an IOException. Even though this will probably never happen with a StringReader, you'll have to catch or declare it.Grearson
There is a slight typo, it should be: List lines = IOUtils.readLines(new StringReader(string));Marin
T
27

Since Java 11, there is a new method String.lines:

/**
 * Returns a stream of lines extracted from this string,
 * separated by line terminators.
 * ...
 */
public Stream<String> lines() { ... }

Usage:

"line1\nline2\nlines3"
    .lines()
    .forEach(System.out::println);
Trollop answered 10/4, 2015 at 8:21 Comment(1)
I believe this answer should be the accepted one, as it is native, cross-platform, and the most concise of them all.Recrudesce
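To make the "cross-platform" point concrete, here is a small sketch (Java 11 or later) that feeds String.lines a string mixing all three terminator styles. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; requires Java 11+ for String.lines().
class StringLinesSketch {

    /** Collects the stream returned by String.lines() into a list. */
    static List<String> toLines(String text) {
        return text.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // \n, \r\n and \r are all treated as terminators; the final \n adds no empty line.
        System.out.println(toLines("a\nb\r\nc\rd\n"));

        // Empty lines in the middle are kept.
        System.out.println(toLines("x\n\ny"));
    }
}
```

Unlike split with its default limit, the stream keeps interior empty lines, and a single trailing terminator does not produce an extra empty element.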
F
21

A solution using Java 8 features such as the Stream API and method references:

new BufferedReader(new StringReader(myString))
        .lines().forEach(System.out::println);

or

public void someMethod(String myLongString) {

    new BufferedReader(new StringReader(myLongString))
            .lines().forEach(this::parseString);
}

private void parseString(String data) {
    //do something
}
Fealty answered 4/5, 2016 at 12:6 Comment(0)
C
7

You can also use:

String[] lines = someString.split("\n");

If that doesn't work try replacing \n with \r\n.

Color answered 23/7, 2012 at 18:47 Comment(3)
Hardcoding the representation of newline makes the solution platform-dependent.Piny
@Piny I would argue the same can be said about not hardcoding it - if you don't hardcode it, you'll get a different outcome on different platforms for the same input (i.e. with exactly the same line breaks instead of platform-dependent line breaks in the input). This isn't really a yes/no and you have to think about what your input will be.Durwood
Yeah, in practice I've used and seen the method I answered with hundreds of times. It's just more straightforward to have one line that breaks your text chunks than using the Scanner class. That is, if your string isn't abnormally massive.Color
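One concrete consequence of hardcoding "\n" is worth seeing: on text with Windows-style endings the split still succeeds, but every piece keeps a trailing carriage return. The sketch below uses made-up class and method names to show this, together with a pattern that tolerates an optional "\r".

```java
// Hypothetical demo class comparing two hardcoded split patterns.
class CarriageReturnSketch {

    /** Splits on LF only, as in the answer above. */
    static String[] splitLfOnly(String text) {
        return text.split("\n");
    }

    /** Splits on LF with an optional preceding CR. */
    static String[] splitCrLfOrLf(String text) {
        return text.split("\\r?\\n");
    }

    public static void main(String[] args) {
        String windowsText = "a\r\nb";

        // The first piece is "a\r": length 2, not 1.
        System.out.println(splitLfOnly(windowsText)[0].length());

        // With the optional CR in the pattern, the first piece is just "a".
        System.out.println(splitCrLfOrLf(windowsText)[0].length());
    }
}
```

A stray "\r" is easy to miss because it is invisible when printed, yet it breaks equals comparisons and number parsing on the affected lines.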
R
7

You can use the Stream API with a StringReader wrapped in a BufferedReader, which gained a lines() method returning a stream in Java 8:

import java.util.stream.*;
import java.io.*;
class test {
    public static void main(String... a) {
        String s = "this is a \nmultiline\rstring\r\nusing different newline styles";

        new BufferedReader(new StringReader(s)).lines().forEach(
            (line) -> System.out.println("one line of the string: " + line)
        );
    }
}

Gives

one line of the string: this is a
one line of the string: multiline
one line of the string: string
one line of the string: using different newline styles

Just like in BufferedReader's readLine, the newline character(s) themselves are not included. All kinds of newline separators are supported (in the same string even).

Rumelia answered 4/5, 2016 at 11:52 Comment(1)
Didn't even know that! Thanks a lot.Amoebaean
T
5

Or use the try-with-resources statement combined with Scanner:

    try (Scanner scanner = new Scanner(value)) {
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            // process the line
        }
    }
Tiaratibbetts answered 22/10, 2015 at 9:59 Comment(0)
C
3

You can try the following regular expression:

\r?\n

Code:

String input = "\nab\n\n    \n\ncd\nef\n\n\n\n\n";
String[] lines = input.split("\\r?\\n", -1);
int n = 1;
for(String line : lines) {
    System.out.printf("\tLine %02d \"%s\"%n", n++, line);
}

Output:

Line 01 ""
Line 02 "ab"
Line 03 ""
Line 04 "    "
Line 05 ""
Line 06 "cd"
Line 07 "ef"
Line 08 ""
Line 09 ""
Line 10 ""
Line 11 ""
Line 12 ""
Corvine answered 22/6, 2016 at 22:46 Comment(0)
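The -1 limit in the answer above is what keeps the trailing empty lines. A short sketch of the difference, with hypothetical class and method names:

```java
// Hypothetical demo class contrasting split limits.
class SplitLimitSketch {

    /** Default limit (0): trailing empty strings are dropped from the result. */
    static int countDefault(String text) {
        return text.split("\\r?\\n").length;
    }

    /** Negative limit: trailing empty strings are kept. */
    static int countKeepingTrailing(String text) {
        return text.split("\\r?\\n", -1).length;
    }

    public static void main(String[] args) {
        String text = "a\nb\n\n";
        System.out.println(countDefault(text));         // pieces: "a", "b"
        System.out.println(countKeepingTrailing(text)); // pieces: "a", "b", "", ""
    }
}
```

Which behavior is wanted depends on whether a final line terminator should count as starting one more (empty) line.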
L
1

The easiest and most universal approach would be to just use the regex linebreak matcher \R, which matches any Unicode linebreak sequence:

import java.util.regex.Pattern;

Pattern NEWLINE = Pattern.compile("\\R");
String[] lines = NEWLINE.split(input);

@see https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html
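As a hedged illustration of what "any Unicode linebreak sequence" covers, the sketch below splits a string that mixes CRLF, CR, LF, the Unicode line separator, and the next-line character. The class and method names are invented; \R itself is available in java.util.regex since Java 8.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class for the \R linebreak matcher.
class UnicodeLinebreakSketch {

    private static final Pattern LINEBREAK = Pattern.compile("\\R");

    /** Splits on any linebreak sequence; a CRLF pair counts as a single break. */
    static String[] splitLines(String text) {
        return LINEBREAK.split(text);
    }

    public static void main(String[] args) {
        String mixed = "a\r\nb\rc\nd\u2028e\u0085f";
        System.out.println(Arrays.toString(splitLines(mixed)));
    }
}
```

Compiling the pattern once and reusing it, as the answer does, avoids recompiling the regex on every call.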

Lyontine answered 4/3, 2020 at 5:57 Comment(0)
