LINQ query — Data aggregation (group adjacent)
Asked Answered
P

8

23

Let's take a class called Cls:

public class Cls
{
    public int SequenceNumber { get; set; }
    public int Value { get; set; }
}

Now, let's populate some collection with following elements:

Sequence
Number      Value
========    =====
1           9
2           9
3           15
4           15
5           15
6           30
7           9

What I need to do, is to enumerate over Sequence Numbers and check if the next element has the same value. If yes, values are aggregated and so, desired output is as following:

Sequence    Sequence
Number      Number
From        To          Value
========    ========    =====
1           2           9
3           5           15
6           6           30
7           7           9

How can I perform this operation using LINQ query?

Politick answered 14/2, 2013 at 16:16 Comment(5)
I reckon you're gonna need to use a standard for-each loop here, interesting question though, and well put +1Rear
Very interesting question, but I somehow doubt that LINQ version will be much more readable than the foreach loop version. I'm hoping an answer here can prove me otherwise.Lacedaemonian
You could group by value and then search the grouped collections for contiguous sequences, then split by them and sort by "from", but I think I agree the imperative version won't be much less readable in this particular case.Chacha
See also: https://mcmap.net/q/585479/-c-linq-get-sets-with-adjacent/21727Hahnke
See the same problem at CodeGolf: codegolf.stackexchange.com/questions/10797/…Ferdinand
J
23

You can use Linq's GroupBy in a modified version which groups only if the two items are adjacent, then it's easy as:

var result = classes
    .GroupAdjacent(c => c.Value)
    .Select(g => new { 
        SequenceNumFrom = g.Min(c => c.SequenceNumber),
        SequenceNumTo = g.Max(c => c.SequenceNumber),  
        Value = g.Key
    });

foreach (var x in result)
    Console.WriteLine("SequenceNumFrom:{0} SequenceNumTo:{1} Value:{2}", x.SequenceNumFrom, x.SequenceNumTo, x.Value);

DEMO

Result:

SequenceNumFrom:1  SequenceNumTo:2  Value:9
SequenceNumFrom:3  SequenceNumTo:5  Value:15
SequenceNumFrom:6  SequenceNumTo:6  Value:30
SequenceNumFrom:7  SequenceNumTo:7  Value:9

This is the extension method to to group adjacent items:

public static IEnumerable<IGrouping<TKey, TSource>> GroupAdjacent<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector)
    {
        TKey last = default(TKey);
        bool haveLast = false;
        List<TSource> list = new List<TSource>();
        foreach (TSource s in source)
        {
            TKey k = keySelector(s);
            if (haveLast)
            {
                if (!k.Equals(last))
                {
                    yield return new GroupOfAdjacent<TSource, TKey>(list, last);
                    list = new List<TSource>();
                    list.Add(s);
                    last = k;
                }
                else
                {
                    list.Add(s);
                    last = k;
                }
            }
            else
            {
                list.Add(s);
                last = k;
                haveLast = true;
            }
        }
        if (haveLast)
            yield return new GroupOfAdjacent<TSource, TKey>(list, last);
    }
}

and the class used:

public class GroupOfAdjacent<TSource, TKey> : IEnumerable<TSource>, IGrouping<TKey, TSource>
{
    public TKey Key { get; set; }
    private List<TSource> GroupList { get; set; }
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return ((System.Collections.Generic.IEnumerable<TSource>)this).GetEnumerator();
    }
    System.Collections.Generic.IEnumerator<TSource> System.Collections.Generic.IEnumerable<TSource>.GetEnumerator()
    {
        foreach (var s in GroupList)
            yield return s;
    }
    public GroupOfAdjacent(List<TSource> source, TKey key)
    {
        GroupList = source;
        Key = key;
    }
}
Johiah answered 14/2, 2013 at 16:32 Comment(5)
+1 Great answer, that's a lot of code though, I think I would just use a regular for-each loop and build a new collection that wayRear
Lots of code? It's a completely reusable, generalized solution. Not a lot, given that. Fantastic answer and a new tool for the toolbox. +1Donavon
The original source for this code appears to be blogs.msdn.microsoft.com/ericwhite/2008/04/20/…Litchi
@Quails4Eva: i must admit that i really don't know where the original source was, i'm fairly sure that it wasn't me, but i don't know that blog either. The author says: "this approach was suggested by one of the LINQ architects a year or two ago", so he's not the real author either.Johiah
@Tim Schmelter That is a fair point. I would interpret that as they suggested the approach and he wrote the code, or at least that the LINQ architect's version isn't publicly available. Regardless, I'm not too worried about it, it just seemed strange to me to find identical solutions 5 years apart so I added a link to the earlier version.Litchi
C
3

You can use this linq query

Demo

var values = (new[] { 9, 9, 15, 15, 15, 30, 9 }).Select((x, i) => new { x, i });

var query = from v in values
            let firstNonValue = values.Where(v2 => v2.i >= v.i && v2.x != v.x).FirstOrDefault()
            let grouping = firstNonValue == null ? int.MaxValue : firstNonValue.i
            group v by grouping into v
            select new
            {
              From = v.Min(y => y.i) + 1,
              To = v.Max(y => y.i) + 1,
              Value = v.Min(y => y.x)
            };
Caracal answered 14/2, 2013 at 17:17 Comment(0)
O
3

MoreLinq provides this functionality out of the box

It's called GroupAdjacent and is implemented as extension method on IEnumerable:

Groups the adjacent elements of a sequence according to a specified key selector function.

enumerable.GroupAdjacent(e => e.Key)

There is even a Nuget "source" package that contains only that method, if you don't want to pull in an additional binary Nuget package.

The method returns an IEnumerable<IGrouping<TKey, TValue>>, so its output can be processed in the same way output from GroupBy would be.

Osei answered 24/4, 2015 at 6:12 Comment(1)
I think this should be marked as the correct answer. I personally prefer adding the nuget package to copy/paste. Besides, it's worth understanding what else is in MoreLinq that I've been missing.Maddie
R
2

You can do it like this:

var all = new [] {
    new Cls(1, 9)
,   new Cls(2, 9)
,   new Cls(3, 15)
,   new Cls(4, 15)
,   new Cls(5, 15)
,   new Cls(6, 30)
,   new Cls(7, 9)
};
var f = all.First();
var res = all.Skip(1).Aggregate(
    new List<Run> {new Run {From = f.SequenceNumber, To = f.SequenceNumber, Value = f.Value} }
,   (p, v) => {
    if (v.Value == p.Last().Value) {
        p.Last().To = v.SequenceNumber;
    } else {
        p.Add(new Run {From = v.SequenceNumber, To = v.SequenceNumber, Value = v.Value});
    }
    return p;
});
foreach (var r in res) {
    Console.WriteLine("{0} - {1} : {2}", r.From, r.To, r.Value);
}

The idea is to use Aggregate creatively: starting with a list consisting of a single Run, examine the content of the list we've got so far at each stage of aggregation (the if statement in the lambda). Depending on the last value, either continue the old run, or start a new one.

Here is a demo on ideone.

Ruelu answered 14/2, 2013 at 16:32 Comment(3)
IMHO it's better to use a foreach loop when there is that much code in a lambda.Muscarine
@Muscarine It's not just the amount of code, it's the fact that it's causing side effects and depending on those side effects. When the important parts of any LINQ call cause side effects it usually means that portion should just be in a foreach.Mazur
@Mazur I agree - I would not use LINQ for run detection precisely for the reason of side effects. I view this as a cure answer to a LINQ puzzle, because the OP asked for LINQ explicitly.Ruelu
S
2

I was able to accomplish it by creating a custom extension method.

static class Extensions {
  internal static IEnumerable<Tuple<int, int, int>> GroupAdj(this IEnumerable<Cls> enumerable) {
    Cls start = null;
    Cls end = null;
    int value = Int32.MinValue;

    foreach (Cls cls in enumerable) {
      if (start == null) {
        start = cls;
        end = cls;
        continue;
      }

      if (start.Value == cls.Value) {
        end = cls;
        continue;
      }

      yield return Tuple.Create(start.SequenceNumber, end.SequenceNumber, start.Value);
      start = cls;
      end = cls;
    }

    yield return Tuple.Create(start.SequenceNumber, end.SequenceNumber, start.Value);
  }
}

Here's the implementation:

static void Main() {
  List<Cls> items = new List<Cls> {
    new Cls { SequenceNumber = 1, Value = 9 },
    new Cls { SequenceNumber = 2, Value = 9 },
    new Cls { SequenceNumber = 3, Value = 15 },
    new Cls { SequenceNumber = 4, Value = 15 },
    new Cls { SequenceNumber = 5, Value = 15 },
    new Cls { SequenceNumber = 6, Value = 30 },
    new Cls { SequenceNumber = 7, Value = 9 }
  };

  Console.WriteLine("From  To    Value");
  Console.WriteLine("===== ===== =====");
  foreach (var item in items.OrderBy(i => i.SequenceNumber).GroupAdj()) {
    Console.WriteLine("{0,-5} {1,-5} {2,-5}", item.Item1, item.Item2, item.Item3);
  }
}

And the expected output:

From  To    Value
===== ===== =====
1     2     9
3     5     15
6     6     30
7     7     9
Slap answered 14/2, 2013 at 16:38 Comment(0)
H
2

Here is an implementation without any helper methods:

var grp = 0;
var results =
from i
in
input.Zip(
    input.Skip(1).Concat(new [] {input.Last ()}),
    (n1, n2) => Tuple.Create(
        n1, (n2.Value == n1.Value) ? grp : grp++
    )
)
group i by i.Item2 into gp
select new {SequenceNumFrom = gp.Min(x => x.Item1.SequenceNumber),SequenceNumTo = gp.Max(x => x.Item1.SequenceNumber), Value = gp.Min(x => x.Item1.Value)};

The idea is:

  • Keep track of your own grouping indicator, grp.
  • Join each item of the collection to the next item in the collection (via Skip(1) and Zip).
  • If the Values match, they are in the same group; otherwise, increment grp to signal the start of the next group.
Hahnke answered 14/2, 2013 at 17:24 Comment(0)
A
1

Untested dark magic follows. The imperative version seems like it would be easier in this case.

IEnumerable<Cls> data = ...;
var query = data
    .GroupBy(x => x.Value)
    .Select(g => new
    {
        Value = g.Key,
        Sequences = g
            .OrderBy(x => x.SequenceNumber)
            .Select((x,i) => new
            {
                x.SequenceNumber,
                OffsetSequenceNumber = x.SequenceNumber - i
            })
            .GroupBy(x => x.OffsetSequenceNumber)
            .Select(g => g
                .Select(x => x.SequenceNumber)
                .OrderBy(x => x)
                .ToList())
            .ToList()
    })
    .SelectMany(x => x.Sequences
        .Select(s => new { First = s.First(), Last = s.Last(), x.Value }))
    .OrderBy(x => x.First)
    .ToList();
Archerfish answered 14/2, 2013 at 16:51 Comment(0)
P
0

Let me propose another option, which yields lazily both sequence of groups and elements inside groups.

Demonstration in .NET Fiddle

Implementation:

public static class EnumerableExtensions
{
    public static IEnumerable<IGrouping<TKey, TSource>> GroupAdjacent<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey>? comparer = null)
    {
        var comparerOrDefault = comparer ?? EqualityComparer<TKey>.Default;
        using var iterator = new Iterator<TSource>(source.GetEnumerator());
        iterator.MoveNext();
        while (iterator.HasCurrent)
        {
            var key = keySelector(iterator.Current);
            var elements = YieldAdjacentElements(iterator, key, keySelector, comparerOrDefault);
            yield return new Grouping<TKey, TSource>(key, elements);
            while (iterator.HasCurrentWithKey(key, keySelector, comparerOrDefault))
            {
                iterator.MoveNext();
            }
        }
    }

    static IEnumerable<TSource> YieldAdjacentElements<TKey, TSource>(
        Iterator<TSource> iterator,
        TKey key,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer)
    {
        while (iterator.HasCurrentWithKey(key, keySelector, comparer))
        {
            yield return iterator.Current;
            iterator.MoveNext();
        }
    }

    private static bool HasCurrentWithKey<TKey, TSource>(
        this Iterator<TSource> iterator,
        TKey key,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer) =>
        iterator.HasCurrent && comparer.Equals(keySelector(iterator.Current), key);

    private sealed class Grouping<TKey, TElement> : IGrouping<TKey, TElement>
    {
        public Grouping(TKey key, IEnumerable<TElement> elements)
        {
            Key = key;
            Elements = elements;
        }

        public TKey Key { get; }

        public IEnumerable<TElement> Elements { get; }

        public IEnumerator<TElement> GetEnumerator() => Elements.GetEnumerator();

        IEnumerator IEnumerable.GetEnumerator() => Elements.GetEnumerator();
    }

    private sealed class Iterator<T> : IDisposable
    {
        private readonly IEnumerator<T> _enumerator;

        public Iterator(IEnumerator<T> enumerator)
        {
            _enumerator = enumerator;
        }

        public bool HasCurrent { get; private set; }

        public T Current => _enumerator.Current;

        public void MoveNext()
        {
            HasCurrent = _enumerator.MoveNext();
        }

        public void Dispose()
        {
            _enumerator.Dispose();
        }
    }
}

Notice, that it is impossible to achieve such level of laziness with regular GroupBy operation, since it needs to look through the whole collection before yielding the first group.

Particularly, in my case migration from GroupBy to GroupAdjacent in connection with lazy handling of whole pipeline helped to resolve memory consumption issues for large sequences.

In general, GroupAdjacent can be used as lazy and more efficient alternative of GroupBy, provided that input collection satisfies condition, that keys are sorted (or at least not fragmented), and provided that all operations in pipeline are lazy.

Plossl answered 31/5, 2022 at 8:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.