Count all the words in a file using Java Streams

2

7

I was trying to count the number of unique words in a text file. For the sake of simplicity, my current file content is:

This is a sample file

My attempt is:

long wordCount = 
    Files.lines(Paths.get("sample.txt"))
         .map(line -> line.split("\\s+"))
         .distinct()
         .count();
System.out.println(wordCount);

This compiles and runs fine, but results in 1, while it should be 5.

Goins answered 9/1, 2019 at 6:43 Comment(1)
Possible duplicate of "How to count words in a text file, java 8-style" - Discount
12

You are mapping each line to an array (transforming a Stream<String> into a Stream<String[]>) and then counting the arrays, i.e. the number of lines in the file. distinct() does not help here, because arrays are compared by identity, so every array counts as distinct.

You should use flatMap to create a Stream<String> of all the words in the file, and after the distinct() and count() operations, you'll get the number of distinct words.

long wordCount = 
    Files.lines(Paths.get("sample.txt"))
         .flatMap(line -> Arrays.stream(line.split("\\s+")))
         .distinct()
         .count();
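
To see the difference between the two pipelines side by side, here is a minimal, self-contained sketch. The class and method names (DistinctWordCount, countWithMap, countWithFlatMap) are made up for illustration; the try-with-resources block is an addition so that the file handle opened by Files.lines is closed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original question or answer.
class DistinctWordCount {

    // map: produces one String[] per line. distinct() compares the arrays
    // by identity, so this effectively counts the lines of the file.
    static long countWithMap(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.map(line -> line.split("\\s+"))
                        .distinct()
                        .count();
        }
    }

    // flatMap: flattens the per-line arrays into a single Stream<String>,
    // so distinct() and count() operate on the words themselves.
    static long countWithFlatMap(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.flatMap(line -> Arrays.stream(line.split("\\s+")))
                        .distinct()
                        .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes sample.txt exists unless a path is passed as an argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "sample.txt");
        System.out.println("map:     " + countWithMap(file));
        System.out.println("flatMap: " + countWithFlatMap(file));
    }
}
```

For the single-line sample file from the question, countWithMap returns 1 and countWithFlatMap returns 5.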
Krystenkrystin answered 9/1, 2019 at 6:44 Comment(1)
It might be more efficient not to scan for line breaks when you only want to count words, e.g. in Java 9: new Scanner(Paths.get("sample.txt")).findAll("\\S+").map(MatchResult::group).distinct().count(). Another advantage of this approach is that it won’t treat empty lines as words. In either case, whether you use Files.lines or Scanner.findAll, the resource should be closed after use in production code. - Kriegspiel
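
The Scanner-based variant from the comment could be written out roughly as follows. This is only a sketch: it requires Java 9 or later for Scanner.findAll, the class and method names are hypothetical, and passing "UTF-8" explicitly is an assumption made here to match the default charset of Files.lines.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Scanner;
import java.util.regex.MatchResult;

// Hypothetical helper class illustrating the Scanner.findAll approach.
class ScannerWordCount {

    // Streams every run of non-whitespace characters in the file,
    // regardless of line boundaries, and counts the distinct ones.
    static long countDistinctWords(Path file) throws IOException {
        // try-with-resources closes the Scanner and the underlying file.
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            return scanner.findAll("\\S+")
                          .map(MatchResult::group)
                          .distinct()
                          .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes sample.txt exists unless a path is passed as an argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "sample.txt");
        System.out.println(countDistinctWords(file));
    }
}
```

Because the pattern only matches non-whitespace runs, blank lines contribute nothing to the result.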
7

You seem to be counting the lines in your file instead:

map(line -> line.split("\\s+")) // this is a Stream<String[]>

You should additionally use Stream.flatMap, as follows:

long wordCount = Files.lines(Paths.get("sample.txt"))
        .map(line -> line.split("\\s+"))
        .flatMap(Arrays::stream)
        .distinct()
        .count();
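
One caveat with split("\\s+"): an empty line splits into a single empty string, and a line with leading whitespace yields an empty first element, so both would be counted as a "word". A possible way to handle this, sketched here with a hypothetical class name and an added filter step, is to drop empty tokens before distinct():

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.stream.Stream;

// Hypothetical helper class, a variation on the pipeline above.
class NonEmptyWordCount {

    static long countDistinctWords(Path file) throws IOException {
        // try-with-resources closes the file opened by Files.lines.
        try (Stream<String> lines = Files.lines(file)) {
            return lines.map(line -> line.split("\\s+"))
                        .flatMap(Arrays::stream)
                        // Drop the empty strings produced by blank lines
                        // and by leading whitespace.
                        .filter(word -> !word.isEmpty())
                        .distinct()
                        .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes sample.txt exists unless a path is passed as an argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "sample.txt");
        System.out.println(countDistinctWords(file));
    }
}
```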
Aesculapian answered 9/1, 2019 at 6:45 Comment(0)
