Java 8 Streams: Map the same object multiple times based on different properties
Asked Answered
S

1

29

I was presented with an interesting problem by a colleague of mine and I was unable to find a neat and pretty Java 8 solution. The problem is to stream through a list of POJOs and then collect them in a map based on multiple properties - the mapping causes the POJO to occur multiple times

Imagine the following POJO:

private static class Customer {
    public String first;
    public String last;

    public Customer(String first, String last) {
        this.first = first;
        this.last = last;
    }

    public String toString() {
        return "Customer(" + first + " " + last + ")";
    }
}

Set it up as a List<Customer>:

// The list of customers
List<Customer> customers = Arrays.asList(
        new Customer("Johnny", "Puma"),
        new Customer("Super", "Mac"));

Alternative 1: Use a Map outside of the "stream" (or rather outside forEach).

// Alt 1: not pretty since the resulting map is "outside" of
// the stream. If parallel streams are used it must be
// ConcurrentHashMap
Map<String, Customer> res1 = new HashMap<>();
customers.stream().forEach(c -> {
    res1.put(c.first, c);
    res1.put(c.last, c);
});

Alternative 2: Create map entries and stream them, then flatMap them. IMO it is a bit too verbose and not so easy to read.

// Alt 2: A bit verbose and "new AbstractMap.SimpleEntry" feels as
// a "hard" dependency to AbstractMap
Map<String, Customer> res2 =
        customers.stream()
                .map(p -> {
                    Map.Entry<String, Customer> firstEntry = new AbstractMap.SimpleEntry<>(p.first, p);
                    Map.Entry<String, Customer> lastEntry = new AbstractMap.SimpleEntry<>(p.last, p);
                    return Stream.of(firstEntry, lastEntry);
                })
                .flatMap(Function.identity())
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue));

Alternative 3: This is another one that I came up with the "prettiest" code so far but it uses the three-arg version of reduce and the third parameter is a bit dodgy as found in this question: Purpose of third argument to 'reduce' function in Java 8 functional programming. Furthermore, reduce does not seem like a good fit for this problem since it is mutating and parallel streams may not work with the approach below.

// Alt 3: using reduce. Not so pretty
Map<String, Customer> res3 = customers.stream().reduce(
        new HashMap<>(),
        (m, p) -> {
            m.put(p.first, p);
            m.put(p.last, p);
            return m;
        }, (m1, m2) -> m2 /* <- NOT USED UNLESS PARALLEL */);

If the above code is printed like this:

System.out.println(res1);
System.out.println(res2);
System.out.println(res3);

The result would be:

{Super=Customer(Super Mac), Johnny=Customer(Johnny Puma), Mac=Customer(Super Mac), Puma=Customer(Johnny Puma)}
{Super=Customer(Super Mac), Johnny=Customer(Johnny Puma), Mac=Customer(Super Mac), Puma=Customer(Johnny Puma)}
{Super=Customer(Super Mac), Johnny=Customer(Johnny Puma), Mac=Customer(Super Mac), Puma=Customer(Johnny Puma)}

So, now to my question: How should I, in a Java 8 orderly fashion, stream through the List<Customer> and then somehow collect it as a Map<String, Customer> where you split the whole thing as two keys (first AND last) i.e. the Customer is mapped twice. I do not want to use any 3rd party libraries, I do not want to use a map outside of the stream as in alt 1. Are there any other nice alternatives?

The full code can be found on hastebin for simple copy-paste to get the whole thing running.

Scoria answered 13/2, 2015 at 20:31 Comment(0)
V
22

I think your alternatives 2 and 3 can be re-written to be more clear:

Alternative 2:

Map<String, Customer> res2 = customers.stream()
    .flatMap(
        c -> Stream.of(c.first, c.last)
        .map(k -> new AbstractMap.SimpleImmutableEntry<>(k, c))
    ).collect(toMap(Map.Entry::getKey, Map.Entry::getValue));

Alternative 3: Your code abuses reduce by mutating the HashMap. To do mutable reduction, use collect:

Map<String, Customer> res3 = customers.stream()
    .collect(
        HashMap::new, 
        (m,c) -> {m.put(c.first, c); m.put(c.last, c);}, 
        HashMap::putAll
    );

Note that these are not identical. Alternative 2 will throw an exception if there are duplicate keys while Alternative 3 will silently overwrite the entries.

If overwriting entries in case of duplicate keys is what you want, I would personally prefer Alternative 3. It is immediately clear to me what it does. It most closely resembles the iterative solution. I would expect it to be more performant as Alternative 2 has to do a bunch of allocations per customer with all that flatmapping.

However, Alternative 2 has a huge advantage over Alternative 3 by separating the production of entries from their aggregation. This gives you a great deal of flexibility. For example, if you want to change Alternative 2 to overwrite entries on duplicate keys instead of throwing an exception, you would simply add (a,b) -> b to toMap(...). If you decide you want to collect matching entries into a list, all you would have to do is replace toMap(...) with groupingBy(...), etc.

Valenti answered 13/2, 2015 at 21:20 Comment(2)
It's not just that collect() is "the preferred way"; using reduce() for this is simply wrong. Reduce is for values; the contortions that Alternative 3 goes through violates the specification of reduce(). And if later you try and run that stream in parallel, you'll just get the wrong answer. On the other hand, using collect() as Misha shows is designed to handle this case and is safe either sequentially or in parallel.Monoicous
@BrianGoetz Thank you, Brian. I modified my answer to be more forceful just as you posted your comment.Valenti

© 2022 - 2024 — McMap. All rights reserved.