How to find out which line separator BufferedReader#readLine() used to split the line?
T

9

14

I am reading a file via the BufferedReader

String filename = ...
br = new BufferedReader( new FileInputStream(filename));
while (true) {
   String s = br.readLine();
   if (s == null) break;
   ...
}

I need to know whether the lines are separated by '\n' or '\r\n'. Is there a way I can find out?

I don't want to open the FileInputStream a second time just to scan it up front. Ideally I would like to ask the BufferedReader, since it must know.

I am happy to override the BufferedReader to hack it but I really don't want to open the filestream twice.

Thanks,

Note: the current line separator (returned by System.getProperty("line.separator")) cannot be used, as the file could have been written by another app on another operating system.

Typhon answered 24/5, 2011 at 16:10 Comment(0)
M
7

After reading the Java docs (I confess to being a Pythonista), it seems that there isn't a clean way to determine the line-ending convention used in a specific file.

The best thing I can recommend is that you use BufferedReader.read() and iterate over every character in the file. Something like this:

// needs: java.io.BufferedReader, java.io.FileInputStream, java.io.InputStreamReader
String filename = ...
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(filename)));
StringBuilder l = new StringBuilder();
int c;
while ((c = br.read()) != -1) {
    if (c == '\n') {
        // line ended with "\n": do stuff with l and the separator
        l.setLength(0);
    } else if (c == '\r') {
        br.mark(1);                 // remember the position so we can peek one char
        if (br.read() == '\n') {
            // do extra stuff since you know that you've got a "\r\n"
        } else {
            br.reset();             // lone "\r": put the peeked character back
        }
        l.setLength(0);
    } else {
        l.append((char) c);
    }
}
// if l.length() > 0 here, the last line had no terminator
Monadelphous answered 24/5, 2011 at 16:43 Comment(0)
A
14

To stay consistent with the BufferedReader class, you can use the following method, which handles the \n, \r, \n\r and \r\n line separators:

public static String retrieveLineSeparator(File file) throws IOException {
    char current;
    String lineSeparator = "";
    FileInputStream fis = new FileInputStream(file);
    try {
        while (fis.available() > 0) {
            current = (char) fis.read();
            if ((current == '\n') || (current == '\r')) {
                lineSeparator += current;
                if (fis.available() > 0) {
                    char next = (char) fis.read();
                    if ((next != current)
                            && ((next == '\r') || (next == '\n'))) {
                        lineSeparator += next;
                    }
                }
                return lineSeparator;
            }
        }
    } finally {
        if (fis!=null) {
            fis.close();
        }
    }
    return null;
}
Abbieabbot answered 11/12, 2012 at 20:32 Comment(3)
Thanks, it works great, however FileInputStream object is not closed properly. - Vitek
You should add a next != current check, otherwise if the file would start with empty lines you could get \n\n or \r\r as separators. - Threonine
Thanks serh.nechaev and M. Schenk, I've taken your changes into account. - Abbieabbot
K
3

BufferedReader.readLine() does not provide any means of determining what the line break was. If you need to know, you'll need to read characters in yourself and find line breaks yourself.

You may be interested in the internal LineBuffer class from Guava (as well as the public LineReader class it's used in). LineBuffer provides a callback method void handleLine(String line, String end) where end is the line break characters. You could probably base something to do what you want on that. An API might look something like public Line readLine() where Line is an object that contains both the line text and the line end.
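The readLine()-returning-a-Line idea above can be sketched without Guava. This is only an illustration: the class name TerminatorAwareReader and the nested Line type are hypothetical, not part of the JDK or of Guava, and the sketch treats "\n", "\r" and "\r\n" as the only terminators.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

public class TerminatorAwareReader {

    /** Hypothetical value type: the line text plus the terminator that ended it. */
    public static final class Line {
        public final String text;
        public final String end; // "\n", "\r", "\r\n", or "" if the input ended first

        public Line(String text, String end) {
            this.text = text;
            this.end = end;
        }
    }

    private final PushbackReader in;

    public TerminatorAwareReader(Reader reader) {
        // A one-character pushback buffer is enough to peek past a '\r'.
        this.in = new PushbackReader(reader, 1);
    }

    /** Returns the next line, or null once the input is exhausted. */
    public Line readLine() throws IOException {
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                return new Line(text.toString(), "\n");
            }
            if (c == '\r') {
                int next = in.read();
                if (next == '\n') {
                    return new Line(text.toString(), "\r\n");
                }
                if (next != -1) {
                    in.unread(next); // not part of the terminator, put it back
                }
                return new Line(text.toString(), "\r");
            }
            text.append((char) c);
        }
        // Final line without a terminator, or end of input.
        return text.length() > 0 ? new Line(text.toString(), "") : null;
    }
}
```

Passing a BufferedReader as the constructor argument keeps the per-character reads cheap, and the file is still opened only once.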

Kayleigh answered 24/5, 2011 at 16:30 Comment(4)
@gshauger: You could say that about a whole lot of problems, which doesn't mean it isn't better to use one. In the case of LineBuffer, it's internal anyway so adding the whole library wouldn't help... he could just copy that file in though. - Kayleigh
I wouldn't say that about a lot of problems...only the ones that don't need an unnecessary dependency...which is what you're recommending. Plus this isn't the first time you've unnecessarily flogged the Guava library. - Kweiyang
@gshauger: When someone else has written code that will save you from having to write it yourself, sometimes it's useful to use that, particularly when you consider that little problems like this rarely exist in isolation. I happen to be very familiar with Guava and so I tend to suggest solutions using it when I believe they're easier or more appropriate than doing the extra work with just the JDK. Your apparent distaste for libraries doesn't affect the validity of my answers. (I was mainly suggesting that the OP might want to reference some existing code that can do what he wants.) - Kayleigh
@gshauger: I have a distaste for writing and maintaining large amounts of code that others have already written and tested and will maintain for you and the impact of that on the "quality, scalability, deployability and usability of a properly engineered piece of software". I do agree that dependencies should be chosen carefully, but personally I believe that Guava has an extremely high power to weight ratio and that most Java projects can benefit from using it. In the end, though, it's up to the OP what they wish to do... I'm just providing an option they may not have been aware of. - Kayleigh
M
2

BufferedReader does not accept a FileInputStream; its constructor takes a Reader, so the stream has to be wrapped in an InputStreamReader first.

No, you cannot find out the line terminator character that was used in the file being read by BufferedReader. That information is lost while reading the file.

Unfortunately, all the answers below are incorrect.

Edit: And yes you can always extend BufferedReader to include the additional functionality you desire.
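A related option, offered only as a sketch: instead of subclassing BufferedReader (whose readLine() works on a private buffer), wrap the underlying Reader in a FilterReader that watches the characters as they stream past and remembers the first terminator. The class name SeparatorSniffingReader is hypothetical, and the sketch assumes only the first separator in the file matters.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

public class SeparatorSniffingReader extends FilterReader {
    private String separator;  // first terminator seen so far, or null
    private boolean pendingCR; // the last inspected char was an unresolved '\r'

    public SeparatorSniffingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c == -1) {
            inspectEnd();
        } else {
            inspect((char) c);
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n == -1) {
            inspectEnd();
        }
        for (int i = 0; i < n && separator == null; i++) {
            inspect(buf[off + i]);
        }
        return n;
    }

    private void inspect(char c) {
        if (separator != null) {
            return;
        }
        if (pendingCR) {
            // The char after '\r' decides between "\r\n" and a lone "\r".
            separator = (c == '\n') ? "\r\n" : "\r";
            pendingCR = false;
        } else if (c == '\n') {
            separator = "\n";
        } else if (c == '\r') {
            pendingCR = true;
        }
    }

    private void inspectEnd() {
        if (separator == null && pendingCR) {
            separator = "\r"; // input ended right after a '\r'
            pendingCR = false;
        }
    }

    /** Returns the first separator seen, or null if none has been seen yet. */
    public String getSeparator() {
        return separator;
    }
}
```

Usage would be along the lines of new BufferedReader(new SeparatorSniffingReader(new InputStreamReader(stream))), keeping a reference to the sniffer and calling getSeparator() after the first readLine(). Characters skipped via skip() are not inspected, and in the rare case where a '\r' is the last character of a buffer fill the result stays null until more input is read.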

Mannerly answered 24/5, 2011 at 16:23 Comment(0)
A
2

The answer is that you can't find out what the line ending was.

I was looking into what can end a line in that same function. After looking at the BufferedReader source code, I can say that BufferedReader.readLine ends a line on '\r' or '\n' and skips a leftover '\n' that directly follows a '\r'. This is hardcoded and does not depend on any settings.
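A small check of that behavior, written as a sketch with a hypothetical class name (ReadLineDemo): the same text with three different terminators yields identical lists of lines, so nothing about the separator survives readLine().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {

    /** Reads every line of the given content through BufferedReader.readLine(). */
    static List<String> readAll(String content) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(content))) {
            String s;
            while ((s = br.readLine()) != null) {
                lines.add(s);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String> unix = readAll("a\nb\n");
        List<String> windows = readAll("a\r\nb\r\n");
        List<String> oldMac = readAll("a\rb\r");
        // All three lists are [a, b]; the terminators are gone.
        System.out.println(unix.equals(windows) && windows.equals(oldMac)); // prints "true"
    }
}
```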

Argive answered 20/7, 2012 at 11:29 Comment(0)
I
1

If you happen to be reading this file into a Swing text component then you can just use the JTextComponent.read(...) method to load the file into the Document. Then you can use:

textComponent.getDocument().getProperty( DefaultEditorKit.EndOfLineStringProperty );

to get the actual EOL string that was used in the file.
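For illustration, here is a sketch that skips the text component and calls DefaultEditorKit.read on a PlainDocument directly. It assumes the property is populated by the editor kit while it loads an initially empty document, which is my reading of the JDK behavior rather than something stated in this answer; the class and method names (EolProperty, detect) are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultEditorKit;
import javax.swing.text.Document;
import javax.swing.text.PlainDocument;

public class EolProperty {

    /** Loads the reader into a fresh document and returns the recorded EOL string. */
    static String detect(Reader reader) throws IOException, BadLocationException {
        Document doc = new PlainDocument();
        // Assumption: the kit records the EOL string when reading into an empty document.
        new DefaultEditorKit().read(reader, doc, 0);
        return (String) doc.getProperty(DefaultEditorKit.EndOfLineStringProperty);
    }
}
```

Note that this loads the whole file into memory, so it mostly makes sense when the content is going to be displayed or edited anyway.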

Imprecate answered 24/5, 2011 at 16:38 Comment(5)
@EJP That would apply in any solution. - Imprecate
No it wouldn't. You can imagine an API where you read a line and then retrieve the terminator that was used for that line. - Divvy
@EJP, Based on the poster's comments I thought he just wanted to know if the file was created on Windows ("\r\n") or Unix ("\n"), in which case he only cared about the first line separator. If he cares about every line, then yes every line would need to be parsed. - Imprecate
@camrickr agreed, but that applies to the problem, rather than to 'any solution'. - Divvy
@EJP, yes I meant any solution to this question, not any solution in general. - Imprecate
E
1

Maybe you could use Scanner instead.

You can pass regular expressions to Scanner#useDelimiter() to set custom delimiter.

String regex="(\r)?\n";
String filename=....;
Scanner scan = new Scanner(new FileInputStream(filename));
scan.useDelimiter(Pattern.compile(regex));
while (scan.hasNext()) {
    String str= scan.next();
    // todo
}

You can use the code below to wrap an existing BufferedReader in a Scanner:

 new Scanner(bufferedReader);
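Splitting on a delimiter does not by itself reveal which delimiter matched. One possible way to get at it, sketched here with a hypothetical helper name (firstSeparator), is Scanner#findWithinHorizon, which returns the text matched by a pattern; a horizon of 0 means the search is unbounded.

```java
import java.util.Scanner;

public class FirstSeparator {

    /**
     * Returns the first line separator found in the source ("\r\n", "\r" or "\n"),
     * or null if the input contains none. The scanner is left positioned just
     * after that separator, so the first line is consumed.
     */
    static String firstSeparator(Readable source) {
        Scanner scan = new Scanner(source);
        // "\r\n" is listed first so a Windows ending is not reported as a lone "\r".
        return scan.findWithinHorizon("\r\n|\r|\n", 0);
    }
}
```

Because the search is unbounded, a large input with no line separator at all would be buffered entirely; passing a positive horizon limits how far the Scanner looks.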
Empale answered 29/4, 2020 at 6:16 Comment(0)
G
0

Not sure if this is useful, but sometimes I need to find out the line delimiter long after I've already read the file.

In this case I use this code:

/**
* <h1> Identify which line delimiter is used in a string </h1>
*
* This is useful when processing files that were created on different operating systems.
*
* @param str - the string with the mystery line delimiter.
* @return  the line delimiter for windows, {@code \r\n}, <br>
*           unix/linux {@code \n} or legacy mac {@code \r} <br>
*           if none can be identified, it falls back to unix {@code \n}
*/
public static String identifyLineDelimiter(String str) {
    if (str.matches("(?s).*(\\r\\n).*")) {     //Windows //$NON-NLS-1$
        return "\r\n"; //$NON-NLS-1$
    } else if (str.matches("(?s).*(\\n).*")) { //Unix/Linux //$NON-NLS-1$
        return "\n"; //$NON-NLS-1$
    } else if (str.matches("(?s).*(\\r).*")) { //Legacy mac os 9. Newer OS X use \n //$NON-NLS-1$
        return "\r"; //$NON-NLS-1$
    } else {
        return "\n";  //fallback onto '\n' if nothing matches. //$NON-NLS-1$
    }
}
Gate answered 22/7, 2014 at 14:37 Comment(0)
S
-2

If you are using Groovy, you can simply do:

def lineSeparator = new File('path/to/file').text.contains('\r\n') ? '\r\n' : '\n'
Stereo answered 2/6, 2014 at 15:39 Comment(1)
The only thing I can guess is that the user was asking about Java; it looks that way from the question's tags. Not sure, though. - Byelection
