Short answer, do this:
public static String readFile( String filePath ) throws IOException
{
    Reader reader = new FileReader( filePath );
    StringBuilder sb = new StringBuilder();
    char buffer[] = new char[16384]; // read 16k blocks
    int len; // how much content was read?
    while( ( len = reader.read( buffer ) ) > 0 ){
        sb.append( buffer, 0, len );
    }
    reader.close();
    return sb.toString();
}
It's very straightforward, very fast, and works well even for unreasonably large text files (100+ MB).
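If you're on Java 7 or newer, you can wrap the same code in try-with-resources so the reader gets closed even if read() throws. A minimal variant (same logic, just safer cleanup):

public static String readFile( String filePath ) throws IOException
{
    try( Reader reader = new FileReader( filePath ) ){
        StringBuilder sb = new StringBuilder();
        char buffer[] = new char[16384]; // read 16k blocks
        int len; // how much content was read?
        while( ( len = reader.read( buffer ) ) > 0 ){
            sb.append( buffer, 0, len );
        }
        return sb.toString();
    } // the reader is closed here automatically, even if an exception was thrown
}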
Long answer:
(Code at the end)
Many times it won't matter, but this method is pretty fast and quite readable. In fact, it's an order of complexity faster than @Raceimation's answer -- O(n) instead of O(n^2).
I've tested six methods (from slow to fast):
- concat: reading line by line, concatenating with str += ... *This is alarmingly slow even for smaller files (takes ~70 seconds for a 3 MB file), because every += copies everything read so far.*
- strbuilder guessing length: StringBuilder, initialized with the file's size as its capacity. I'm guessing it's slow because it has to find such a huge chunk of contiguous memory up front.
- strbuilder with line buffer: StringBuilder, file is read line by line
- strbuffer with char[] buffer: Concat with StringBuffer, read the file in 16k blocks
- strbuilder with char[] buffer: Concat with StringBuilder, read the file in 16k blocks
- preallocate byte[filesize] buffer: Allocate a byte[] buffer the size of the file and let the Java API decide how to buffer individual blocks.
Conclusion:
Preallocating the buffer entirely is the fastest on very large files, but the method isn't very versatile because the total file size must be known ahead of time. That's why I suggest using StringBuilder with a char[] buffer: it's still simple, and if needed it's easily changed to accept any input stream instead of just files (see the sketch below). Yet it's certainly fast enough for all reasonable cases.
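As an illustration of that last point, the same loop works unchanged against any Reader. A minimal sketch of what that could look like (the readAll name and the InputStreamReader example are mine, not part of the test code below):

public static String readAll( Reader reader ) throws IOException
{
    StringBuilder sb = new StringBuilder();
    char buffer[] = new char[16384]; // read 16k blocks
    int len;
    while( ( len = reader.read( buffer ) ) > 0 ){
        sb.append( buffer, 0, len );
    }
    reader.close();
    return sb.toString();
}

// a file:           readAll( new FileReader( "test.txt" ) )
// any other stream: readAll( new InputStreamReader( someInputStream, "UTF-8" ) )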
Test Results + Code
import java.io.*;

public class Test
{
    static final int N = 5;

    public final static void main( String args[] ) throws IOException{
        test( "1k.txt", true );
        test( "10k.txt", true );
        // concat with += would take ages here, so we skip it
        test( "100k.txt", false );
        test( "2142k.txt", false );
        test( "pruned-names.csv", false );
        // ah, what the heck, why not try a binary file
        test( "/Users/hansi/Downloads/xcode46graphicstools6938140a.dmg", false );
    }

    public static void test( String file, boolean includeConcat ) throws IOException{
        System.out.println( "Reading " + file + " (~" + (new File(file).length()/1024) + "Kbytes)" );
        strbuilderwithchars( file );
        strbuilderwithchars( file );
        strbuilderwithchars( file );
        tick( "Warm up... " );

        if( includeConcat ){
            for( int i = 0; i < N; i++ )
                concat( file );
            tick( "> Concat with += " );
        }
        else{
            tick( "> Concat with += **skipped** " );
        }

        for( int i = 0; i < N; i++ )
            strbuilderguess( file );
        tick( "> StringBuilder init with length " );

        for( int i = 0; i < N; i++ )
            strbuilder( file );
        tick( "> StringBuilder with line buffer " );

        for( int i = 0; i < N; i++ )
            strbuilderwithchars( file );
        tick( "> StringBuilder with char[] buffer" );

        for( int i = 0; i < N; i++ )
            strbufferwithchars( file );
        tick( "> StringBuffer with char[] buffer " );

        for( int i = 0; i < N; i++ )
            singleBuffer( file );
        tick( "> Allocate byte[filesize] " );

        System.out.println();
    }

    public static long now = System.currentTimeMillis();
    public static void tick( String message ){
        long t = System.currentTimeMillis();
        System.out.println( message + ": " + ( t - now )/N + " ms" );
        now = t;
    }

    // StringBuilder with char[] buffer
    // + works if filesize is unknown
    // + pretty fast
    public static String strbuilderwithchars( String filePath ) throws IOException
    {
        Reader reader = new FileReader( filePath );
        StringBuilder sb = new StringBuilder();
        char buffer[] = new char[16384]; // read 16k blocks
        int len; // how much content was read?
        while( ( len = reader.read( buffer ) ) > 0 ){
            sb.append( buffer, 0, len );
        }
        reader.close();
        return sb.toString();
    }

    // StringBuffer with char[] buffer
    // + works if filesize is unknown
    // + faster than StringBuilder on my computer
    // - should be slower than StringBuilder, which confuses me
    public static String strbufferwithchars( String filePath ) throws IOException
    {
        Reader reader = new FileReader( filePath );
        StringBuffer sb = new StringBuffer();
        char buffer[] = new char[16384]; // read 16k blocks
        int len; // how much content was read?
        while( ( len = reader.read( buffer ) ) > 0 ){
            sb.append( buffer, 0, len );
        }
        reader.close();
        return sb.toString();
    }

    // StringBuilder init with length
    // - needs the file size (used as the initial capacity)
    // - drops line breaks, because readLine() strips them
    // - not faster than any of the other methods, but more complicated
    public static String strbuilderguess(String filePath) throws IOException
    {
        File file = new File( filePath );
        BufferedReader reader = new BufferedReader(new FileReader(file));
        String line;
        StringBuilder sb = new StringBuilder( (int)file.length() );
        while( ( line = reader.readLine() ) != null)
        {
            sb.append( line );
        }
        reader.close();
        return sb.toString();
    }

    // StringBuilder with line buffer
    // + works if filesize is unknown
    // + pretty fast
    // - speed may (!) vary with line length
    // - drops line breaks, because readLine() strips them
    public static String strbuilder(String filePath) throws IOException
    {
        BufferedReader reader = new BufferedReader(new FileReader(filePath));
        String line;
        StringBuilder sb = new StringBuilder();
        while( ( line = reader.readLine() ) != null)
        {
            sb.append( line );
        }
        reader.close();
        return sb.toString();
    }

    // Concat with +=
    // - slow
    // - slow
    // - really slow
    public static String concat(String filePath) throws IOException
    {
        BufferedReader reader = new BufferedReader(new FileReader(filePath));
        String line, results = "";
        int i = 0;
        while( ( line = reader.readLine() ) != null)
        {
            results += line;
            i++;
        }
        reader.close();
        return results;
    }

    // Allocate byte[filesize]
    // + seems to be the fastest for large files
    // - only works if the filesize is known in advance, so it's less versatile for an insignificant performance gain
    // + shortest code
    public static String singleBuffer( String filePath ) throws IOException{
        File file = new File( filePath );
        byte buffer[] = new byte[(int) file.length()]; // buffer for the entire file
        DataInputStream in = new DataInputStream( new FileInputStream( file ) );
        in.readFully( buffer ); // a single read() is not guaranteed to fill the buffer
        in.close();
        return new String( buffer );
    }
}
/**
*** RESULTS ***
Reading 1k.txt (~31Kbytes)
Warm up... : 0 ms
> Concat with += : 37 ms
> StringBuilder init with length : 0 ms
> StringBuilder with line buffer : 0 ms
> StringBuilder with char[] buffer: 0 ms
> StringBuffer with char[] buffer : 0 ms
> Allocate byte[filesize] : 1 ms
Reading 10k.txt (~313Kbytes)
Warm up... : 0 ms
> Concat with += : 708 ms
> StringBuilder init with length : 2 ms
> StringBuilder with line buffer : 2 ms
> StringBuilder with char[] buffer: 1 ms
> StringBuffer with char[] buffer : 1 ms
> Allocate byte[filesize] : 1 ms
Reading 100k.txt (~3136Kbytes)
Warm up... : 7 ms
> Concat with += **skipped** : 0 ms
> StringBuilder init with length : 19 ms
> StringBuilder with line buffer : 21 ms
> StringBuilder with char[] buffer: 9 ms
> StringBuffer with char[] buffer : 9 ms
> Allocate byte[filesize] : 8 ms
Reading 2142k.txt (~67204Kbytes)
Warm up... : 181 ms
> Concat with += **skipped** : 0 ms
> StringBuilder init with length : 367 ms
> StringBuilder with line buffer : 372 ms
> StringBuilder with char[] buffer: 208 ms
> StringBuffer with char[] buffer : 202 ms
> Allocate byte[filesize] : 199 ms
Reading pruned-names.csv (~11200Kbytes)
Warm up... : 23 ms
> Concat with += **skipped** : 0 ms
> StringBuilder init with length : 54 ms
> StringBuilder with line buffer : 57 ms
> StringBuilder with char[] buffer: 32 ms
> StringBuffer with char[] buffer : 31 ms
> Allocate byte[filesize] : 32 ms
Reading /Users/hansi/Downloads/xcode46graphicstools6938140a.dmg (~123429Kbytes)
Warm up... : 1665 ms
> Concat with += **skipped** : 0 ms
> StringBuilder init with length : 2899 ms
> StringBuilder with line buffer : 2978 ms
> StringBuilder with char[] buffer: 2702 ms
> StringBuffer with char[] buffer : 2684 ms
> Allocate byte[filesize] : 1567 ms
**/
PS. You might have noticed that StringBuffer comes out slightly faster than StringBuilder here. That makes very little sense, because the two classes are the same except that StringBuilder is not synchronized. If anyone can (or can't) reproduce this... I'm most curious :)
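If you want to poke at that in isolation (i.e. without file I/O in the mix), here's a rough, hypothetical micro-benchmark sketch -- no JMH, just a manual warm-up and System.currentTimeMillis(), so treat the numbers as a sanity check rather than proof:

import java.util.Arrays;

public class AppendBench
{
    static final int ROUNDS = 2000; // ~32 million chars per run; give the JVM some heap (e.g. -Xmx512m) if needed

    public static void main( String args[] ){
        char chunk[] = new char[16384]; // same 16k chunk size as in the tests above
        Arrays.fill( chunk, 'x' );

        // crude warm-up so the JIT gets a chance to compile both paths
        for( int i = 0; i < 3; i++ ){
            runBuilder( chunk );
            runBuffer( chunk );
        }

        long t0 = System.currentTimeMillis();
        int a = runBuilder( chunk );
        long t1 = System.currentTimeMillis();
        int b = runBuffer( chunk );
        long t2 = System.currentTimeMillis();

        System.out.println( "StringBuilder: " + ( t1 - t0 ) + " ms (" + a + " chars)" );
        System.out.println( "StringBuffer : " + ( t2 - t1 ) + " ms (" + b + " chars)" );
    }

    static int runBuilder( char chunk[] ){
        StringBuilder sb = new StringBuilder();
        for( int i = 0; i < ROUNDS; i++ )
            sb.append( chunk, 0, chunk.length );
        return sb.length(); // return something so the work can't be optimized away
    }

    static int runBuffer( char chunk[] ){
        StringBuffer sb = new StringBuffer();
        for( int i = 0; i < ROUNDS; i++ )
            sb.append( chunk, 0, chunk.length );
        return sb.length();
    }
}

On a modern JIT I'd expect the two to be within noise of each other; if the file tests consistently show otherwise, the difference more likely comes from I/O and GC timing than from synchronization.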