Concatenating ImmutableLists
Asked Answered
A

4

7

I have a List<ImmutableList<T>>. I want to flatten it into a single ImmutableList<T> that is a concatenation of all the internal ImmutableLists. These lists can be very long so I do not want this operation to perform a copy of all the elements. The number of ImmutableLists to flatten will be relatively small, so it is fine that lookup will be linear in the number of ImmutableLists. I would strongly prefer that the concatenation will return an Immutable collection. And I need it to return a List that can be accessed in a random location.

Is there a way to do this in Guava?

There is Iterables.concat but that returns an Iterable. To convert this into an ImmutableList again will be linear in the size of the lists IIUC.

Alberic answered 20/6, 2016 at 10:2 Comment(9)
why do you think an Iterable is mutable? all I see in it is iterator()Abase
Sorry meant that I need a list. I need to be able to access the i-th element. Updating question.Alberic
Ok, so if you had lists with counts like 11, 30, 5, 20, NewList[20] is partway through the second list?Abase
It doesn't seem to exist for some reason: github.com/google/guava/issues/1029 . Here is a comment you might be interested in: github.com/google/guava/issues/1029#issuecomment-61353929Flouncing
Why must the result be an ImmutableList? Would it be acceptable for it to be a List that cannot be mutated?Collector
This is very deliberately not in Guava. ImmutableList is intended to guarantee O(1) lookup, if it doesn't have that property it isn't an ImmutableList.Martamartaban
@LouisWasserman where is this guarantee mentioned? I don't seem to be present in the javadoc of ImmutableList. In practice when used with small number of underlying lists, performance will not be a concern.Comras
@Comras Guava needs to provide better guarantees for its users than 'in practice' and 'for certain inputs', and performance is absolutely a concern. A caller could easily construct a ConcatenatedList that uses many small lists and therefore approaches O(n) time. Hiding that behavior from the caller leads to performance problems.Collector
@Collector if retrieval performance is important, Guava could in theory make copy of supplied lists if number of lists supplied in constructor is larger than a certain constant, say 1000. The copy will be avoided in the majority of cases, but performance will be still O(1) with respect to total number of elements.Comras
C
6

By design Guava does not allow you to define your own ImmutableList implementations (if it did, there'd be no way to enforce that it was immutable). Working around this by defining your own class in the com.google.common.collect package is a terrible idea. You break the promises of the Guava library and are running firmly in "undefined behavior" territory, for no benefit.

Looking at your requirements:

  • You need to concatenate the elements of n ImmutableList instances in sub-linear time.
  • You would like the result to also be immutable.
  • You need the result to implement List, and possibly be an ImmutableList.

As you know you can get the first two bullets with a call to Iterables.concat(), but if you need an O(1) random-access List this won't cut it. There isn't a standard List implementation (in Java or Guava) that is backed by a sequence of Lists, but it's straightforward to create one yourself:

/**
 * A constant-time view into several {@link ImmutableList} instances, as if they were
   concatenated together. Since the backing lists are immutable this class is also
   immutable and therefore thread-safe.
 * 
 * More precisely, this class provides O(log n) element access where n is the number of
 * input lists. Assuming the number of lists is small relative to the total number of
 * elements this is effectively constant time.
 */
public class MultiListView<E> extends AbstractList<E> implements RandomAccess {
  private final ImmutableList<ImmutableList<E>> elements;
  private final int size;
  private final int[] startIndexes;

  private MutliListView(Iterable<ImmutableList<E>> elements) {
    this.elements = ImmutableList.copyOf(elements);
    startIndexes = new int[elements.size()];
    int currentSize = 0;
    for (int i = 0; i < this.elements.size(); i++) {
      List<E> ls = this.elements.get(i);
      startIndexes[i] = ls.size();
      currentSize += ls.size();
    }
  }

  @Override
  public E get(int index) {
    checkElementIndex(index, size);
    int location = Arrays.binarySearch(startIndexes, index);
    if (location >= 0) {
      return elements.get(location).get(0);
    }
    location = (~location) - 1;
    return elements.get(location).get(index - startIndexes[location]);
  }

  @Override
  public int size() {
    return size;
  }

  // The default iterator returned by AbstractList.iterator() calls .get()
  // which is likely slower than just concatenating the backing lists' iterators
  @Override
  public Iterator<E> iterator() {
    return Iterables.concat(elements).iterator();
  }

  public static MultiListView<E> of(Iterable<ImmutableList<E>> lists) {
    return new MultiListView<>(lists);
  }

  public static MultiListView<E> of(ImmutableList<E> ... lists) {
    return of(Arrays.asList(lists));
  }
}

This class is immutable even though it doesn't extend ImmutableList or ImmutableCollection, therefore there's no need for it to actually extend ImmutableList.

As to whether such a class should be provided by Guava; you can make your case in the associated issue, but the reason this doesn't already exist is that surprisingly few users actually need it. Be sure there isn't a reasonable way to solve your problem with an Iterable before using MultiListView.

Collector answered 22/6, 2016 at 4:4 Comment(10)
can you be more specific and expand on what is meant by "undefined behavior territory"?Comras
@Comras I mean that the authors of Guava intentionally designed ImmutableList to be un-extendable. Spoofing the com.google namespace to workaround that intent means you can no longer rely on the invariants their APIs provide. ImmutableCollection calls this out specifically: "This type cannot be subclassed outside this package (which would allow these guarantees to be violated)."Collector
well, implementation in ConcatenatedList satisfies all guarantees mentioned in ImmutableCollection javadoc and if that implementation was in Guava library itself there would be no issue. I understand that Guava authors intentionally designed the classes that way, but it is not necessarily a good choice. There is huge number of examples where you can not guarantee something but still allow subclassing in the API, for example java.util.AbstractSet can not guarantee that implementations have no duplicates, etc.Comras
@Comras AbstractSet is designed to be extended by others. ImmutableCollection and it subtypes are not. The fact that (you think) you can satisfy the necessary guarantees doesn't change anything; you're still violating the spec. In particular as Louis mentioned ConcatenatedList is not O(1) (it's close, but that's not the point). You can always define your own com.hgrey.ImmutableList type if you don't like the invariants Guava provides.Collector
not precise enough. AbstractSet is made open for extension, whereas ImmutableList is closed for extension. There is no magical difference is design of these classes (unless you can point it to me). Saying something is O(1) without mentioning in respect to what, is meaningless. Not sure where Louis got that statement from. Defining com.hgrey.ImmutableList would mean forking full library, adding ConcatenatedList to application you control is much less effort.Comras
@Comras Louis is a Guava dev. Clearly we philosophically disagree about the importance of code isolation and respecting others' APIs, so let's drop the issue. I simply would never want to work in your codebase. Clearly you wouldn't like to work in mine.Collector
AbstractList.iterator() calls .get() that's not trueAlkene
@Alkene yes it does.Collector
@Collector Just creating an iterator does not call get(). The iterator's method next() calls it as seen in your link. If all sublists are RandomAccess as the code claims, Iterables.concat() will not perform better than the default implementation.Alkene
Well yes, but there's little point in calling iterator() if you're not actually going to iterate... I've clarified the comment.Collector
B
1

Firstly, @dimo414's answer is right on the mark - with a clean wrapper view implementation and advice.

Still, I would like to emphasise that since Java 8, you probably just want to do:

    listOfList.stream()
            .flatMap(ImmutableList::stream)
            .collect(ImmutableList.toImmutableList());

The guava issue was since closed as working-as-intended with the remark:

We are more down on lazy view collections than we used to be (especially now that Stream exists) (...)

At least, profile your own use case before trying the view-collection approach.

Under the hood using streams, what effectively happens is that a new backing array is populated with references to the elements - the elements themselves are not deeply copied. So there's very low number of objects created (GC costs) and linear copies from backing-arrays to backing-arrays usually proceed faster than you might expect even with large inner-lists. (They work well with CPU cache prefetch).

Depending on how much you do with the result, the stream version might work out faster that the wrapper version's extra indirection every time you access it.

Borders answered 30/9, 2022 at 9:5 Comment(0)
P
0

Here is probably a slightly more readable version of dimo414 implementation, which processes empty lists correctly and populates startIndexes correctly:

public class ImmutableMultiListView<E> extends AbstractList<E> implements RandomAccess {
  private final ImmutableList<ImmutableList<E>> listOfLists;
  private final int[] startIndexes;

  private final int size;

  private ImmutableMultiListView(List<ImmutableList<E>> originalListOfLists) {
    this.listOfLists =
        originalListOfLists.stream().filter(l -> !l.isEmpty()).collect(toImmutableList());
    startIndexes = new int[listOfLists.size()];
    int sumSize = 0;
    for (int i = 0; i < listOfLists.size(); i++) {
      List<E> list = listOfLists.get(i);
      sumSize += list.size();
      if (i < startIndexes.length - 1) {
        startIndexes[i + 1] = sumSize;
      }
    }
    this.size = sumSize;
  }

  @Override
  public E get(int index) {
    checkElementIndex(index, size);
    int location = Arrays.binarySearch(startIndexes, index);
    if (location >= 0) {
      return listOfLists.get(location).get(0);
    } else {
      // See Arrays#binarySearch Javadoc:
      int insertionPoint = -location - 1;
      int listIndex = insertionPoint - 1;
      return listOfLists.get(listIndex).get(index - startIndexes[listIndex]);
    }
  }

  @Override
  public int size() {
    return size;
  }

  // AbstractList.iterator() calls .get(), which is slower than just concatenating
  // the backing lists' iterators
  @Override
  public Iterator<E> iterator() {
    return Iterables.concat(listOfLists).iterator();
  }

  public static <E> ImmutableMultiListView<E> of(List<ImmutableList<E>> lists) {
    return new ImmutableMultiListView<>(lists);
  }
}

Pitchman answered 3/7, 2020 at 5:12 Comment(0)
C
-2

Not sure if it is possible just with Guava classes, but it seems not difficult to implement, how about something like the following:

package com.google.common.collect;

import java.util.List;

public class ConcatenatedList<T> extends ImmutableList<T> {

    private final List<ImmutableList<T>> underlyingLists;

    public ConcatenatedList(List<ImmutableList<T>> underlyingLists) {
        this.underlyingLists = underlyingLists;
    }

    @Override
    public T get(int index) {
        for (ImmutableList<T> list : underlyingLists) {
            if (index < list.size()) return list.get(index);
            index -= list.size();
        }
        throw new IndexOutOfBoundsException();
    }

    @Override
    boolean isPartialView() {
        for (ImmutableList<T> list : underlyingLists) {
            if (list.isPartialView()) return true;
        }
        return false;
    }

    @Override
    public int size() {
        int result = 0;
        for (ImmutableList<T> list : underlyingLists) {
            result += list.size();
        }
        return result;
    }
}

Note package declaration, it needs to be like that to access Guava's ImmutableList package access constructor. Be aware that this implementation might break with future version of Guava, since the constructor is not part of API. Also as mentioned in the javadoc of ImmutableList and in comments this class was not intended to be subclassed by the original library author. However, there is no good reason for not using it in application you control and it has additional benefit of expressing immutability in the type signature compared to MultiListView suggested in the other answer.

Comras answered 20/6, 2016 at 10:40 Comment(6)
+1, beat me to it. If the "small count of lists" becomes bigger, it may be better to have a tree splitting on count, but this is much simpler atmAbase
Hmm... It's a shame that it doesn't exist in Guava but thanks for the code.Alberic
Please don't author your own classes in the com.google package space...Collector
It is specifically intended in Guava's API design that you not be allowed to do this.Martamartaban
There is no other way to satisfy original requirement of having ImmutableList type for the concatenated list. Having this type is classic "is a" property since the provided implementation is immutable and saying so in the type is beneficial.Comras
Correct, by design you cannot construct your own ImmutableList implementations with your own behavior. That means the original requirements need to be revisited, not that we should mangle the type hierarchy.Collector

© 2022 - 2025 — McMap. All rights reserved.