I think what is going on is the following:
All lines that contain ^L (ff) get modified to remove everything up to and including the ^L, but in addition you have the side effect described in observation 1 below: all \r (cr) characters also get removed. However, if a cr appears before the ^L, nextLine() treats it as a line terminator too. Note how, in the files below, the input contains six line terminators (a nl, a cr nl pair, or a lone cr) and the output also contains six, but they're all nl. So the c is preserved, because nextLine() puts it on a different line from the ^L. Probably not what you want. See below.
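A minimal sketch of that behaviour, assuming the same bytes as the example input further down. Since your ^L-stripping code isn't shown, `line.replaceAll(".*\f", "")` is a hypothetical stand-in for it, and `NextLineDemo`/`lines` are illustrative names. `Scanner.nextLine()` ends a line at \n, at \r\n, and at a lone \r, so the input splits into six lines and the c ends up on its own line, away from any ^L.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class NextLineDemo {
    // Splits text the way Scanner.nextLine() does: \n, \r\n and a lone \r
    // all end a line (as do U+0085, U+2028 and U+2029).
    static List<String> lines(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                result.add(sc.nextLine());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same bytes as the example input: a nl b cr nl c cr cr sp ff nl d ff nl
        String input = "a\nb\r\nc\r\r \f\nd\f\n";
        StringBuilder out = new StringBuilder();
        for (String line : lines(input)) {
            // Hypothetical stand-in for the ^L handling: drop everything up to the ^L
            out.append(line.replaceAll(".*\f", "")).append('\n');
        }
        System.out.println(lines(input).size());                        // 6
        System.out.println(out.toString().equals("a\nb\nc\n\n\n\n"));   // true
    }
}
```

The six lines are "a", "b", "c", "", " ^L" and "d^L"; writing each back with a single nl gives exactly the a nl b nl c nl nl nl nl seen in the output file.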
Some observations
1. The source file was generated on a system that uses \r\n to end a line, and your program is being run on a system that does not. Because of this, every 0xd (\r) is removed. That alone makes the two files different sizes, even if there are no ^L characters at all.
2. But you probably overlooked #1 because vim operates in DOS mode (recognizing \r\n as the newline separator) or non-DOS mode (only \n), depending on what it finds when it opens the file, and it hides this from the user if it can. In fact, to test this I had to force in \r characters with ^V^M, because I was editing with vim on Linux.
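Observation 1 can be reproduced in isolation. This is a sketch assuming the program copies line by line with nextLine() and ends each line with a bare \n, which is what BufferedWriter.newLine() writes on Linux (on Windows it writes \r\n); `copyLines` is a hypothetical helper. A CRLF file with no ^L in it still shrinks by one byte per line.

```java
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class CrlfShrinkDemo {
    // Copies text line by line the way a nextLine()-based loop would,
    // ending every line with a Unix newline (the \r of each \r\n is lost).
    static String copyLines(String text) {
        StringBuilder out = new StringBuilder();
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                out.append(sc.nextLine()).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dosText = "x\r\ny\r\n"; // no form feeds at all
        String copied = copyLines(dosText);
        System.out.println(dosText.getBytes(StandardCharsets.US_ASCII).length); // 6
        System.out.println(copied.getBytes(StandardCharsets.US_ASCII).length);  // 4
    }
}
```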
Your means of testing is probably od -x (for hex, right?). But that prints 2-byte words (shorts) rather than individual bytes, which is not what you want. Consider the following input file and the output file after your program runs, as viewed in vi:
Input file
a
b^M
c^M^M ^L
d^L
Output file
a
b
c
Well, maybe that's right; let's see what od has to say.
od -x of input file
0a61 0d62 630a 0d0d 0c20 640a 0a0c
od -x of output file
0a61 0a62 0a63 0a0a 000a
Huh, where did that null come from? But wait, from the man page of od:
-t type Specify the output format. type is a string containing one or more of the following kinds of type specifiers:
a Named characters (ASCII). Control characters are displayed using the following names:
-h, -x Output hexadecimal shorts. Equivalent to -t x2.
-a Output named characters. Equivalent to -t a.
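Oh, OK: -x prints 2-byte shorts, so the odd ninth byte of the output file gets padded with a zero byte (and each pair of bytes appears swapped on a little-endian machine). So instead use the -a option: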
od -a of input
a nl b cr nl c cr cr sp ff nl d ff nl
od -a of output
a nl b nl c nl nl nl nl
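To see why -x shows a 00 that -a does not, here is a rough Java imitation of the two views. It is a sketch, not od's actual implementation: `hexWords` assumes a little-endian machine (second byte of each pair printed first) and zero-pads an odd trailing byte, and `named` only knows the few control characters in this example. Both helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

public class OdDemo {
    // Like `od -x` on a little-endian machine: two bytes per word, the second
    // byte printed first; an odd trailing byte is padded with 0x00.
    static String hexWords(byte[] b) {
        StringJoiner sj = new StringJoiner(" ");
        for (int i = 0; i < b.length; i += 2) {
            int lo = b[i] & 0xff;
            int hi = (i + 1 < b.length) ? b[i + 1] & 0xff : 0; // padding byte
            sj.add(String.format("%02x%02x", hi, lo));
        }
        return sj.toString();
    }

    // Like `od -a`, but only for the handful of control characters used here.
    static String named(byte[] b) {
        StringJoiner sj = new StringJoiner(" ");
        for (byte x : b) {
            switch (x) {
                case '\n': sj.add("nl"); break;
                case '\r': sj.add("cr"); break;
                case '\f': sj.add("ff"); break;
                case ' ':  sj.add("sp"); break;
                default:   sj.add(String.valueOf((char) x));
            }
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        byte[] output = "a\nb\nc\n\n\n\n".getBytes(StandardCharsets.US_ASCII); // 9 bytes
        System.out.println(hexWords(output)); // 0a61 0a62 0a63 0a0a 000a
        System.out.println(named(output));    // a nl b nl c nl nl nl nl
    }
}
```

The output file is nine bytes long, so the last word holds a single nl plus a zero pad byte that is not in the file; -a prints one name per byte and has nothing to pad.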
Forcing Java to ignore \r
And finally, all that being said, you really have to overcome Java's built-in assumption that \r ends a line. Scanner.nextLine() treats \n, \r\n and a lone \r as line terminators. Even when you explicitly give the scanner a pattern in UNIX_LINES mode ((?d), where only \n ends a line), it still splits its input into tokens with its default delimiter, which is any run of whitespace, including \r, ^L and space. So you must also override the delimiter (see below). I've found the following will probably do what you want by insisting on Unix line semantics. I also added some logic to not output blank lines.
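import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;
import java.util.regex.Pattern;

public static void repl(File original, File file) throws IOException
{
    // (?d) = UNIX_LINES: only \n is a line terminator, so '.' also matches \r and ^L
    Pattern pattern1 = Pattern.compile("(?d).*");
    try (Scanner fileScanner = new Scanner(original);
         BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                 new FileOutputStream(file), StandardCharsets.UTF_8)))
    {
        // Split tokens on \n only; the default delimiter (any whitespace)
        // would also split on \r, ^L and spaces
        fileScanner.useDelimiter("(?d)\\n");
        while (fileScanner.hasNext(pattern1))
        {
            String next = fileScanner.next(pattern1);
            // Remove everything up to and including the last ^L, and every \r
            next = next.replaceAll("(?d)(.*\\x0C)|(\\x0D)", "");
            if (!next.isEmpty())    // don't output blank lines
            {
                out.write(next);
                out.write('\n');    // Unix newline; newLine() would use the platform separator
            }
        }
    }   // try-with-resources flushes and closes both streams
}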
With this change, the output above becomes:
od -a of input
a nl b cr nl c cr cr sp ff nl d ff nl
od -a of output
a nl b nl
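To try this without touching files, here is a sketch that applies the same Scanner settings and replaceAll pattern to a String; the `ReplDemo` class, the `filter` method and its `unixDelimiter` flag are hypothetical. It also shows why the delimiter matters: with Scanner's default delimiter (any whitespace, including \r, ^L and space), the tokens never contain a ^L, so nothing gets stripped and the c and d lines survive.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

public class ReplDemo {
    // Same logic as repl(), but reading from and returning a String.
    static String filter(String text, boolean unixDelimiter) {
        Pattern anyLine = Pattern.compile("(?d).*"); // '.' matches everything except \n
        StringBuilder out = new StringBuilder();
        try (Scanner sc = new Scanner(text)) {
            if (unixDelimiter) {
                sc.useDelimiter("(?d)\\n"); // tokens are whole \n-terminated lines
            }
            while (sc.hasNext(anyLine)) {
                String next = sc.next(anyLine)
                        .replaceAll("(?d)(.*\\x0C)|(\\x0D)", "");
                if (!next.isEmpty()) {
                    out.append(next).append('\n');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "a\nb\r\nc\r\r \f\nd\f\n";
        System.out.println(filter(input, true).equals("a\nb\n"));        // true
        System.out.println(filter(input, false).equals("a\nb\nc\nd\n")); // true
    }
}
```

With the \n delimiter, the tokens are "a", "b\r", "c\r\r \f" and "d\f"; the last two reduce to empty strings and are skipped, which matches the a nl b nl above.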