How to convert stream of character to Map<Character, Integer> [duplicate]

I want the map's key to be a Character and its value an Integer, where the value is the frequency of that character.

The code below won't compile, probably because I am using the wrong signature for .toMap:

String s  = "zwdacb";

// Snippet to get a map
s.chars().map(i->(char)i).collect(Collectors.toMap(Character key->key,(Character k)->1,(Integer ov,Integer nv)->ov+nv));

The compiler gives this error:

Candidates for method call Collectors.toMap(Character key->key,(Character k)->1,(Integer ov,Integer nv)->ov+nv) are:  
  
Collector<Object, ?, Map<Object, Object>> toMap(Function<? super Object, ?>, Function<? super Object, ?>)   

Collector<Object, ?, Map<Object, Object>> toMap(Function<? super Object, ?>, Function<? super Object, ?>, BinaryOperator<Object>)   

Collector<T, ?, M> toMap(Function<? super T, ? extends K>, Function<? super T, ? extends U>, BinaryOperator<U>, Supplier<M>) 
Irresolute answered 19/10, 2023 at 16:36 Comment(2)
The documentation of ConcurrentHashMap actually has an example of this. – Thistle
Character key->key is not proper lambda syntax. – Forgetful

Use Collectors.counting, and wrap it in collectingAndThen to transform each Long value into an Integer value:

Map<Character, Integer> frequencies =
    s.chars().mapToObj(c -> (char) c).collect(
        Collectors.groupingBy(c -> c,
            Collectors.collectingAndThen(
                Collectors.counting(), Long::intValue)));
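If the Long-to-Integer step is what bothers you, another downstream collector is Collectors.summingInt, which adds up an int per element and produces an Integer directly. This is a minimal sketch that swaps in summingInt rather than reproducing the answer's code. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.stream.Collectors;

class SummingIntFrequencies {
    // Each char contributes 1; summingInt totals those as an Integer,
    // so no Long-to-Integer conversion is needed afterwards.
    static Map<Character, Integer> frequencies(String s) {
        return s.chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.groupingBy(c -> c, Collectors.summingInt(c -> 1)));
    }

    public static void main(String[] args) {
        System.out.println(frequencies("zwdddaaaaacbb"));
    }
}
```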
Thistle answered 19/10, 2023 at 17:19 Comment(6)
Thanks, but I think this would be slower. – Irresolute
@curiousengineer: why do you think that? It isn't. – Wellspoken
Because you are converting the Long object into an Integer again, instead of getting an Integer directly for the value part of the map. – Irresolute
You’re unwrapping the Long once per Map entry, not as part of the counting process. Any effect on performance is negligible. – Thistle
@Irresolute your approach is boxing every intermediate result into an Integer, whereas this answer’s solution only boxes the final counts. The performance differences, if you’re able to measure them, will depend on the actual input string. If performance really matters for character frequency, you wouldn’t use a Map<Character, Integer> for this task in the first place. – Hashish
@curiousengineer, you're correct, this would be slower. – Ophicleide
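Hashish's last point can be illustrated with a minimal sketch that skips the Map entirely and counts into a plain int[] indexed by char value, so nothing is boxed. It assumes you are counting UTF-16 char values (like the answers here that use chars(), it sees an emoji as two surrogate halves), and the class name is hypothetical. The trade-off is a 65,536-entry array (about 256 KB) allocated per call, so whether it is actually faster for your input is something to benchmark rather than assume.

```java
class ArrayFrequencies {
    // One slot per possible char value (0..65535); counting never boxes anything.
    static int[] countChars(String s) {
        int[] counts = new int[Character.MAX_VALUE + 1];
        for (int i = 0; i < s.length(); i++) {
            counts[s.charAt(i)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countChars("zwdddaaaaacbb");
        // Print only the char values that occurred, in ascending order.
        for (int c = 0; c < counts.length; c++) {
            if (counts[c] > 0) {
                System.out.println((char) c + "=" + counts[c]);
            }
        }
    }
}
```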

Use the groupingBy and counting methods.

String s  = "zwdacb";
Map<Character, Long> m
    = s.chars()
       .mapToObj(x -> (char) x)
       .collect(Collectors.groupingBy(x -> x, Collectors.counting()));

Output

{a=1, b=1, c=1, d=1, w=1, z=1}

Here is a relevant question and answer.
StackOverflow – How to count frequency of characters in a string?
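The output above happens to be in alphabetical order, but HashMap makes no ordering guarantee. If you want sorted keys regardless of the input, one option is the three-argument groupingBy overload with TreeMap::new as the map factory. This is a sketch of that variation, not part of the answer itself, and the class name is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class SortedFrequencies {
    // groupingBy(classifier, mapFactory, downstream): TreeMap keeps keys in char order.
    static Map<Character, Long> frequencies(String s) {
        return s.chars()
                .mapToObj(x -> (char) x)
                .collect(Collectors.groupingBy(x -> x, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(frequencies("zwdacb")); // {a=1, b=1, c=1, d=1, w=1, z=1}
    }
}
```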

Ophicleide answered 19/10, 2023 at 17:48 Comment(2)
I like this even more than my own answer! +1 – Isooctane
@CardinalSystem, I had to look it up, actually. I don't use streams often; I would have used a loop here. – Ophicleide

tl;dr

Use code points, not char.

Map < Integer, Integer > frequencies =
        input
                .codePoints ( )                          // Generate an `IntStream` of `int` primitive values, one or more per character encountered in the input text. 
                .boxed ( )                               // Convert each `int` primitive into an `Integer` object.
                .collect (
                        Collectors.toMap (
                                Function.identity ( ) ,  // Code point.
                                ( Integer a ) -> 1 ,
                                Integer :: sum           // Increment the frequency count, the value for this entry in our Map. 
                        )
                );

Code point

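Unfortunately, the char type has been essentially broken since Java 2, and legacy since Java 5. As a 16-bit value, char is physically incapable of representing most characters. Example: try "😷".length(), which returns 2 rather than 1, because that emoji is stored as a pair of char values (a surrogate pair).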
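A quick way to see this is to compare String#length, which counts char values, with String#codePointCount, which counts code points. This sketch writes the emoji as its surrogate pair \uD83D\uDE37 so the source file stays plain ASCII; the class and method names are hypothetical.

```java
class CodePointLength {
    // Reports how many UTF-16 char values and how many code points a string contains.
    static String describe(String s) {
        return "chars=" + s.length() + ", codePoints=" + s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F637 (the face-with-medical-mask emoji) written as its surrogate pair.
        String mask = "\uD83D\uDE37";
        System.out.println(describe(mask));                            // chars=2, codePoints=1
        System.out.println(Character.isHighSurrogate(mask.charAt(0))); // true
        System.out.println(Integer.toHexString(mask.codePointAt(0)));  // 1f637
    }
}
```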

Instead, use code point integer numbers to work with individual characters.

Here is a code-point-savvy version of the Collectors.toMap approach seen in the Answers by WJS and by Cardinal System.

As the key in our map, we use Integer to represent each code point.

The IntStream returned by String#codePoints yields a series of int primitive values, one per code point found in the input string. A map key must be an object rather than a primitive, so we convert each code point from an int primitive to an Integer object. That conversion is performed by the call to .boxed().

String input = "😷🦜zwdddaaaaacbb🦜";
Map < Integer, Integer > frequencies =
        input
                .codePoints ( )                
                .boxed ( )
                .collect (
                        Collectors.toMap (
                                ( Integer codePoint ) -> codePoint ,
                                ( Integer a ) -> 1 ,
                                Integer :: sum
                        )
                );

System.out.println ( "frequencies = " + frequencies );
frequencies.forEach ( ( Integer codePoint , Integer count ) -> System.out.println ( Character.toString ( codePoint ) + " = " + count ) );
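If you would rather have readable keys in the map itself instead of code point numbers, one option is to turn each code point back into a String with Character.toChars and group on that, using a LinkedHashMap so entries appear in order of first occurrence. This is my own sketch of a variation, not the answer's code. The class name is hypothetical, and the input is the same text as above but with the emoji written as \u escapes so the source file stays ASCII.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class CodePointStringKeys {
    // Key = the code point as a String (one or two chars); value = its count.
    static Map<String, Long> frequencies(String input) {
        return input.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.groupingBy(Function.identity(),
                        LinkedHashMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Same text as the answer's input, with the two emoji written as surrogate pairs.
        String input = "\uD83D\uDE37\uD83E\uDD9Czwdddaaaaacbb\uD83E\uDD9C";
        frequencies(input).forEach((key, count) -> System.out.println(key + " = " + count));
    }
}
```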

Function.identity

The first argument to toMap says: for every code point, use that code point itself as the key. A shorter way of writing that is Function.identity().

String input = "😷🦜zwdddaaaaacbb🦜";
Map < Integer, Integer > frequencies =
        input
                .codePoints ( )                          // Generate an `IntStream` of `int` primitive values, one or more per character encountered in the input text. 
                .boxed ( )                               // Convert each `int` primitive into an `Integer` object.
                .collect (
                        Collectors.toMap (
                                Function.identity ( ) ,  // Code point.
                                ( Integer a ) -> 1 ,
                                Integer :: sum           // Increment the frequency count, the value for this entry in our Map. 
                        )
                );

System.out.println ( "frequencies = " + frequencies );
frequencies.forEach ( ( Integer codePoint , Integer count ) -> System.out.println ( Character.toString ( codePoint ) + " = " + count ) );

Result

When run:

frequencies = {97=5, 98=2, 99=1, 100=3, 128567=1, 119=1, 122=1, 129436=2}
a = 5
b = 2
c = 1
d = 3
😷 = 1
w = 1
z = 1
🦜 = 2

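In contrast, if we change .codePoints() to .chars(), we get incorrect results: each emoji is split into its two surrogate char values, which cannot be displayed on their own (hence the ? marks):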

frequencies = {97=5, 98=2, 99=1, 100=3, 119=1, 56887=1, 122=1, 56732=2, 55357=1, 55358=2}
a = 5
b = 2
c = 1
d = 3
w = 1
? = 1
z = 1
? = 2
? = 1
? = 2
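To see where those four extra entries come from, this sketch counts char values (as chars() does) into a TreeMap and prints each one as a number along with Character.isSurrogate. The class name is hypothetical, and the emoji are written as \u escapes so the source file stays ASCII.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class SurrogateHalves {
    // Counts UTF-16 char values; each emoji outside the BMP contributes two surrogate entries.
    static Map<Character, Long> charCounts(String s) {
        return s.chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.groupingBy(c -> c, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        String input = "\uD83D\uDE37\uD83E\uDD9Czwdddaaaaacbb\uD83E\uDD9C";
        charCounts(input).forEach((ch, count) -> System.out.println(
                (int) ch + " surrogate=" + Character.isSurrogate(ch) + " count=" + count));
    }
}
```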
Simony answered 19/10, 2023 at 22:0 Comment(5)
Bourque, great reply, but I am unclear on the difference here between simply calling chars() and codePoints() to get the stream. Please cover that as well in your response. – Irresolute
@Irresolute I appended a chunk at the end of the Answer, substituting .chars() for .codePoints(). That code gets erroneous results for most characters, namely those beyond the BMP. – Simony
@Irresolute As noted in my Answer, just try: "😷".length(). Compare to "😷".codePoints().count(). Avoid char. – Simony
I understand now, thanks for the nuanced clarification. Many people know stuff, but are bad at explaining things. You have a knack for both. – Irresolute
And now try with "🏳️‍🌈". – Hashish
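Hashish's point is that a code point is not always what a reader sees as one character either: that flag is several code points joined by a zero-width joiner. A hedged sketch that instead counts extended grapheme clusters with the regex construct \X (this assumes Java 9 or later, for \X and Matcher.results). Whether a particular emoji sequence comes out as a single cluster depends on the Unicode version your JDK implements, so treat this as an approximation. The class name is hypothetical, and the demo uses a simpler case: a letter followed by a combining accent.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class GraphemeFrequencies {
    // \X matches one extended grapheme cluster (a base character plus combining marks, etc.).
    private static final Pattern GRAPHEME = Pattern.compile("\\X");

    static Map<String, Long> frequencies(String input) {
        return GRAPHEME.matcher(input).results()
                .map(MatchResult::group)
                .collect(Collectors.groupingBy(g -> g, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Letter e followed by COMBINING ACUTE ACCENT renders as one letter but is two code points.
        String input = "e\u0301e\u0301e";
        System.out.println(input.codePoints().count()); // 5
        System.out.println(frequencies(input).size());  // 2
    }
}
```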

I would use Collectors.toMap(Function, Function, BinaryOperator), which takes a mergeFunction parameter that you can use to calculate the frequencies. To use this approach, you need to map your IntStream to a Stream<Character> using IntStream.mapToObj(IntFunction).

Here's an example:

String input = "aabbbcdd";
Map<Character, Integer> frequencies = input.chars().mapToObj(i -> (char) i)
        .collect(Collectors.toMap(c -> c, c -> 1, (t, u) -> t + u));
System.out.println(frequencies);

Output:

{a=2, b=3, c=1, d=2}
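The fourth overload listed in the question's compiler error, toMap(keyMapper, valueMapper, mergeFunction, mapSupplier), additionally lets you choose the map implementation. This is a minimal sketch using it with TreeMap::new so the counts come out in sorted key order regardless of input order; the class name is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ToMapWithSupplier {
    static Map<Character, Integer> frequencies(String input) {
        return input.chars()
                .mapToObj(i -> (char) i)
                .collect(Collectors.toMap(
                        c -> c,          // key: the character itself
                        c -> 1,          // value for the first occurrence
                        Integer::sum,    // merge: add counts for repeated keys
                        TreeMap::new));  // map supplier: keeps keys sorted
    }

    public static void main(String[] args) {
        System.out.println(frequencies("ddcbbbaa")); // {a=2, b=3, c=1, d=2}
    }
}
```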
Isooctane answered 19/10, 2023 at 16:47 Comment(0)

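First: .map() on an IntStream returns another IntStream, so you should use mapToObj instead.

Second: you are missing parentheses around the explicitly typed lambda parameter: Character key->key must be written as (Character key)->key.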

s.chars().mapToObj(i->(char)i).collect(Collectors.toMap((Character key)->key,(Character k)->1,(Integer ov,Integer nv)->ov+nv));
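To make the first point concrete: IntStream.map takes an IntUnaryOperator, so the (char) cast inside it is widened straight back to int and the stream stays an IntStream of numbers. This small sketch contrasts the two; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapVersusMapToObj {
    // map(...) must return int, so the cast to char is immediately widened back to int.
    static int[] withMap(String s) {
        return s.chars().map(i -> (char) i).toArray();
    }

    // mapToObj(...) boxes each cast value, giving a Stream<Character>.
    static List<Character> withMapToObj(String s) {
        return s.chars().mapToObj(i -> (char) i).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(withMap("ab"))); // [97, 98]
        System.out.println(withMapToObj("ab"));             // [a, b]
    }
}
```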
Cyclic answered 19/10, 2023 at 16:54 Comment(0)

If you are open to using a third-party library, you could use Eclipse Collections CharAdapter or CodePointAdapter classes, and create either a CharBag or IntBag to count the characters by simply calling toBag on either adapter. While this answer doesn't use Streams, the advantage here, aside from being more concise, is that there is no boxing of char as Character instances or int as Integer instances. Everything remains either as char or int primitive values.

But before you continue using chars or codePoints with Strings in Java, you should read this blog from Cay Horstmann: Stop using char in Java. And Code Points.

If you're still convinced you want to use chars or codePoints, here are some examples with Eclipse Collections.

Here's a char example.

@Test
public void chars()
{
    String input = "abacab";
    CharAdapter chars = Strings.asChars(input);
    CharBag counts = chars.toBag();

    MutableCharBag expected = CharBags.mutable.empty();
    expected.addOccurrences('a', 3);
    expected.addOccurrences('b', 2);
    expected.addOccurrences('c', 1);    
    Assertions.assertEquals(expected, counts);

    CharAdapter output = Strings.toChars(counts.toSortedList().toArray());
    Assertions.assertEquals(Strings.asChars("aaabbc"), output);
}

Here's a code point example.

@Test
public void codePoints()
{
    // Using the example input from Basil Bourque's answer
    String input = "😷🦜zwdddaaaaacbb🦜";
    CodePointAdapter codePoints = Strings.asCodePoints(input);
    IntBag counts = codePoints.toBag();

    MutableIntBag expected = IntBags.mutable.empty();
    expected.addOccurrences(97, 5);
    expected.addOccurrences(98, 2);
    expected.addOccurrences(99, 1);
    expected.addOccurrences(100, 3);
    expected.addOccurrences(128567, 1);
    expected.addOccurrences(119, 1);
    expected.addOccurrences(122, 1);
    expected.addOccurrences(129436, 2);
    Assertions.assertEquals(expected, counts);

    CodePointAdapter output = Strings.toCodePoints(counts.toSortedList().toArray());
    Assertions.assertEquals(Strings.asCodePoints("aaaaabbcdddwz😷🦜🦜"), output);
}

Note: I am a committer for Eclipse Collections.

Dagmar answered 20/10, 2023 at 4:13 Comment(0)

Using plain Java:

public static Map<Character, Long> histogram(String str) {
    Map<Character, Long> map = new HashMap<>();

    for (int i = 0; i < str.length(); i++) {
        char ch = str.charAt(i);
        map.put(ch, map.getOrDefault(ch, 0L) + 1);
    }

    return Collections.unmodifiableMap(map);
}

Using Stream:

public static Map<Character, Long> histogram(String str) {
    return IntStream.range(0, str.length())
            .mapToObj(str::charAt)
            .collect(Collectors.groupingBy(Function.identity(),
                                           Collectors.counting()));
}
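Both versions above count char values. If you want the plain loop to count code points instead, as discussed in the codePoints answer, this is a sketch of the same loop that reads with codePointAt and advances by Character.charCount. The class name is hypothetical, and the keys are Integer code points rather than Character.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class CodePointHistogram {
    static Map<Integer, Long> histogram(String str) {
        Map<Integer, Long> map = new HashMap<>();
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);    // reads a whole surrogate pair when present
            map.merge(cp, 1L, Long::sum);   // start at 1, otherwise add 1
            i += Character.charCount(cp);   // 2 for supplementary code points, else 1
        }
        return Collections.unmodifiableMap(map);
    }

    public static void main(String[] args) {
        // "aab" followed by U+1F99C written as a surrogate pair.
        System.out.println(histogram("aab\uD83E\uDD9C"));
    }
}
```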
Boathouse answered 20/10, 2023 at 8:15 Comment(0)

Try it like this: use toMap with a value of 1 for the first occurrence of each character, and Integer::sum as the merge function to add one for each further occurrence.

String s = "zwdddaaaaacbb";
Map<Character, Integer> frequencies = s.chars()
        .mapToObj(chr -> (char) chr)
        .collect(Collectors.toMap(chr -> chr, a -> 1, Integer::sum));

frequencies.entrySet().forEach(System.out::println);

prints

a=5
b=2
c=1
d=3
w=1
z=1

Note that if you collect to a Map<Character,Long> you can do it as follows:

Map<Character, Long> frequencies = s.chars().mapToObj(c -> (char) c)
   .collect(Collectors.groupingBy(c -> c, Collectors.counting()));

Streams are not always the most efficient solution, either. Here is one that uses a loop and the Map.merge method:

Map<Character, Integer> frequencies = new HashMap<>();
for (char c : s.toCharArray()) {
    frequencies.merge(c, 1, Integer::sum); // c is already a char; it is autoboxed to Character
}
frequencies.entrySet().forEach(System.out::println);
Encyclical answered 19/10, 2023 at 17:37 Comment(1)
Yes, I also feel that many times they are just slower. – Irresolute

Here is another way of producing the desired result:

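import java.util.stream.Stream;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.summingInt;

Map<Character, Integer> result = Stream.of(s.split(""))                 // one String per char
                                       .collect(groupingBy(
                                                  e -> Character.valueOf(e.charAt(0)),  // key: that char
                                                  summingInt(e -> 1)                    // value: 1 per occurrence
                                       ));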

NOTE: Prior to Java 8, s.split("") returned an empty string as its first element, so there you should use s.split("(?!^)") to get a String[] of individual characters as strings.

The Stream#collect method accepts any Collector; the one used here is Collectors#groupingBy, with Collectors#summingInt as the downstream collector that adds 1 for each occurrence.


The result can also be produced with the three-argument IntStream#collect method:

Map<Character, Integer> result =
    s.chars()
         .collect(
             () -> new HashMap<>(),
             (m, k) -> m.merge((char) k, 1, (v, x) -> v + x),
             (m1, m2) -> m1.putAll(m2)
          );
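The third argument to IntStream#collect (the combiner) is only invoked when the stream runs in parallel, and in that case m1.putAll(m2) would overwrite the count for any character that appears in both partial maps instead of adding the two counts. This is a hedged sketch of a combiner that merges counts instead, exercised on a parallel stream; the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class ParallelSafeCollect {
    static Map<Character, Integer> frequencies(String s) {
        return s.chars().parallel().collect(
                HashMap::new,                                     // one map per partition
                (m, c) -> m.merge((char) c, 1, Integer::sum),     // count within a partition
                (m1, m2) -> m2.forEach(                           // add partial counts together
                        (k, v) -> m1.merge(k, v, Integer::sum)));
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append("abc");
        }
        // With a merging combiner, every count is exactly 10000 even when run in parallel.
        System.out.println(frequencies(sb.toString()));
    }
}
```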
Fixation answered 20/10, 2023 at 6:54 Comment(0)
