I have a very large file (several GB) in AWS S3, and I only need a small number of lines in the file which satisfy a certain condition. I don't want to load the entire file in-memory and then search for and print those few lines - the memory load for this would be too high. The right way would be to only load those lines in-memory which are needed.
As per AWS documentation to read from file:
fullObject = s3Client.getObject(new GetObjectRequest(bucketName, key));
displayTextInputStream(fullObject.getObjectContent());
private static void displayTextInputStream(InputStream input) throws IOException {
// Read the text input stream one line at a time and display each line.
BufferedReader reader = new BufferedReader(new InputStreamReader(input));
String line = null;
while ((line = reader.readLine()) != null) {
System.out.println(line);
}
System.out.println();
}
Here we are using a BufferedReader. It is not clear to me what is happening underneath here.
Are we making a network call to S3 each time we are reading a new line, and only keeping the current line in the buffer? Or is the entire file loaded in-memory and then read line-by-line by BufferedReader? Or is it somewhere in between?