Calculate difference from previous item with LINQ

I'm trying to prepare data for a graph using LINQ.

The problem I can't solve is how to calculate the "difference to previous" value.

The result I expect is:

ID= 1, Date= Now, DiffToPrev= 0;

ID= 1, Date= Now+1, DiffToPrev= 3;

ID= 1, Date= Now+2, DiffToPrev= 7;

ID= 1, Date= Now+3, DiffToPrev= -6;

etc...

Can you help me create such a query?

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    public class MyObject
    {
        public int ID { get; set; }
        public DateTime Date { get; set; }
        public int Value { get; set; }
    }

    class Program
    {
        static void Main()
        {
            var list = new List<MyObject>
            {
                new MyObject { ID = 1, Date = DateTime.Now, Value = 5 },
                new MyObject { ID = 1, Date = DateTime.Now.AddDays(1), Value = 8 },
                new MyObject { ID = 1, Date = DateTime.Now.AddDays(2), Value = 15 },
                new MyObject { ID = 1, Date = DateTime.Now.AddDays(3), Value = 9 },
                new MyObject { ID = 1, Date = DateTime.Now.AddDays(4), Value = 12 },
                new MyObject { ID = 1, Date = DateTime.Now.AddDays(5), Value = 25 },
                new MyObject { ID = 2, Date = DateTime.Now, Value = 10 },
                new MyObject { ID = 2, Date = DateTime.Now.AddDays(1), Value = 7 },
                new MyObject { ID = 2, Date = DateTime.Now.AddDays(2), Value = 19 },
                new MyObject { ID = 2, Date = DateTime.Now.AddDays(3), Value = 12 },
                new MyObject { ID = 2, Date = DateTime.Now.AddDays(4), Value = 15 },
                new MyObject { ID = 2, Date = DateTime.Now.AddDays(5), Value = 18 }
            };

            Console.WriteLine(list);

            Console.ReadLine();
        }
    }
}
Dues answered 10/9, 2010 at 8:14 Comment(0)
C
78

One option (for LINQ to Objects) would be to create your own LINQ operator:

// I don't like this name :(
public static IEnumerable<TResult> SelectWithPrevious<TSource, TResult>
    (this IEnumerable<TSource> source,
     Func<TSource, TSource, TResult> projection)
{
    using (var iterator = source.GetEnumerator())
    {
        if (!iterator.MoveNext())
        {
             yield break;
        }
        TSource previous = iterator.Current;
        while (iterator.MoveNext())
        {
            yield return projection(previous, iterator.Current);
            previous = iterator.Current;
        }
    }
}

This enables you to perform your projection using only a single pass of the source sequence, which is always a bonus (imagine running it over a large log file).

Note that it will project a sequence of length n into a sequence of length n-1 - you may want to prepend a "dummy" first element, for example. (Or change the method to include one.)

Here's an example of how you'd use it:

var query = list.SelectWithPrevious((prev, cur) =>
     new { ID = cur.ID, Date = cur.Date, DateDiff = (cur.Date - prev.Date).Days });

Note that this will pair the final item of one ID with the first item of the next ID... you may wish to group your sequence by ID first.
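
A minimal sketch of the "group by ID first" suggestion, assuming the MyObject class from the question and the SelectWithPrevious operator above (both repeated here so the snippet compiles by itself). The DiffSketch class and DiffsPerId helper are names made up for illustration, and the sketch diffs Value rather than Date because that is what the question's expected output shows. Each ID group is ordered by date and processed separately, so the last item of one ID is never paired with the first item of the next.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyObject
{
    public int ID { get; set; }
    public DateTime Date { get; set; }
    public int Value { get; set; }
}

public static class DiffSketch
{
    // Same operator as above, repeated so the sketch is self-contained.
    public static IEnumerable<TResult> SelectWithPrevious<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, TSource, TResult> projection)
    {
        using (var iterator = source.GetEnumerator())
        {
            if (!iterator.MoveNext())
            {
                yield break;
            }
            TSource previous = iterator.Current;
            while (iterator.MoveNext())
            {
                yield return projection(previous, iterator.Current);
                previous = iterator.Current;
            }
        }
    }

    // Hypothetical helper: pairs are formed only inside each ID group.
    // Each tuple holds (ID, Date, difference to the previous Value).
    public static List<Tuple<int, DateTime, int>> DiffsPerId(IEnumerable<MyObject> list)
    {
        return list
            .GroupBy(x => x.ID)
            .SelectMany(g => g
                .OrderBy(x => x.Date)
                .SelectWithPrevious((prev, cur) =>
                    Tuple.Create(cur.ID, cur.Date, cur.Value - prev.Value)))
            .ToList();
    }
}
```

A group of n items still yields n-1 results, so with the question's data ID 1 would give the differences 3, 7, -6, 3, 13 and ID 2 would give -3, 12, -7, 3, 3.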

Crossstitch answered 10/9, 2010 at 8:38 Comment(19)
This seems like the right answer, but I can't figure out how to use it. – Dues
I guess this one would be more efficient than Branimir's answer, right? – Dues
@Martynas: It's more general than Branimir's answer, and more efficient than Felix's. – Crossstitch
Cool :) It seems I need to deepen my LINQ knowledge more than I thought. Thank you. – Dues
Hi. I actually have data like "ID", "Color", "Date", "Value", where ID, Color and Date are the key by which the data should be grouped, but I can't figure out how to group the data and use your function. Can you help by adding a group by clause to this query? – Dues
@Martynas: If you want to group by multiple values, just use group x by new { x.Id, x.Color, x.Date } – Crossstitch
@John - I've updated the list so it has 12 elements, and added ".GroupBy(g => new {g.ID, g.Date})." to the query, but there's a problem, as I suspected. When the query starts calculating the first element with ID=2, it takes the last element of the sequence with ID=1 and subtracts the value of the element with ID=2 from it. In this particular case it's 25 - 10 = 15. Is it possible to omit this element somehow? I guess there will be nested queries in this, right? – Dues
@Martynas: I'm afraid it's hard to tell exactly what's going on at this point - it feels like it would be better served by asking a new question. – Crossstitch
@John Thanks for your help. I'll prepare a new example then :) – Dues
@John, I've put a new example here - https://mcmap.net/q/535358/-calculate-difference-from-previous-item-in-a-group-with-single-linq-query/444149 It would be cool to hear your opinion on this. – Dues
That's a nice little function Jon; sweet and simple. – Sauveur
Added a modified version that doesn't skip the first item... see my answer for the code - https://mcmap.net/q/512438/-calculate-difference-from-previous-item-with-linq/26307521#26307521 – Shepperd
@JonSkeet Since you don't like the function name, can I suggest SelectAdjacent? – Locklin
@MillieSmith: I'm not sure that's any better, to be honest... but let's keep thinking :) – Crossstitch
ByFormer or ByAdjacent :) – Clubwoman
I call mine Scan, based on the APL scan operator (I have many variations). Why are you wrapping an IEnumerator that doesn't implement IDisposable in a using statement? – Ridgepole
@NetMage: IEnumerator<T> does implement IDisposable, and you should always use it - just like foreach does implicitly. The non-generic version doesn't. – Crossstitch
@JonSkeet Thanks - I see that now. My Reference Source spelunking is still weak. – Ridgepole
Clean, nice, and most importantly O(1). – Shamikashamma
C
21

Use the index to get the previous object:

   var LinqList = list.Select( 
       (myObject, index) => 
          new { 
            ID = myObject.ID, 
            Date = myObject.Date, 
            Value = myObject.Value, 
            DiffToPrev = (index > 0 ? myObject.Value - list[index - 1].Value : 0)
          }
   );
Clew answered 10/9, 2010 at 8:44 Comment(7)
@Martynas: Note that this isn't very general purpose though - it only works in scenarios where you can index into the collection. – Crossstitch
@JonSkeet The OP has a list and didn't ask for general purpose, so this is a superior answer. – Transmogrify
@JimBalter: The purpose of Stack Overflow is to serve more than just the OP's question. Sometimes it makes sense to stick strictly to the bounds of what's required (although I'd at least have formatted this code to avoid scrolling), but other times I think it's helpful to give more generally-useful approaches. – Crossstitch
I like it: nice and simple, as LINQ is supposed to be! @JonSkeet, your custom operator has enriched my skills and also provided a good example of working with an iterator. But my fellow team members and I would like to keep the code as simple and readable as possible. – Lang
@MichaelG: Note that this only works when you have random access to the list. It doesn't work with arbitrary IEnumerable<T>. (I also personally think it takes longer to understand that this is trying to do something with adjacent elements than a method called SelectWithPrevious. But readability is at least somewhat subjective.) – Crossstitch
Hi @JonSkeet, I used BenchmarkDotNet and .NET 4.8 to compare both algorithms: SelectWithPrevious and SelectWithIndex. Interestingly, there is indeed a favorable performance ratio of 1 vs 1.43 for SelectWithPrevious when there are only 5 elements in the list. However, starting from a list of 10 items and all the way up to lists of 1000000 :D, the ratio evens out and even gets better for the SelectWithIndex solution: 1 vs 0.9. I am very curious about that! – Lang
@MichaelG: I wouldn't particularly expect a significant performance difference - but SelectWithIndex requires the source to be accessible by index, whereas SelectWithPrevious doesn't. – Crossstitch
S
9

In .NET 4 you can use the Zip method to process two items at a time, like this:

        var list1 = list.Take(list.Count() - 1);
        var list2 = list.Skip(1);
        var diff = list1.Zip(list2, (item1, item2) => ...);
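
Purely as an illustration of what such a projection might look like, here is a small sketch that assumes the values are plain ints; the ZipSketch class and Diffs helper are made-up names, not part of the answer. Zip stops as soon as the shorter sequence ends, so pairing the sequence with Skip(1) is enough by itself, without the Take. Note that the source is enumerated twice, which matters for sequences that are expensive or impossible to replay.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipSketch
{
    // Hypothetical helper: difference between each value and the one before it.
    public static List<int> Diffs(IEnumerable<int> values)
    {
        // Pairs element i (prev) with element i+1 (cur); yields n-1 results.
        return values.Zip(values.Skip(1), (prev, cur) => cur - prev).ToList();
    }
}
```

For the Value column of ID 1 in the question (5, 8, 15, 9, 12, 25), this gives 3, 7, -6, 3, 13.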
Sorn answered 10/9, 2010 at 8:26 Comment(0)
S
8

Modification of Jon Skeet's answer to not skip the first item:

public static IEnumerable<TResult> SelectWithPrev<TSource, TResult>
    (this IEnumerable<TSource> source, 
    Func<TSource, TSource, bool, TResult> projection)
{
    using (var iterator = source.GetEnumerator())
    {
        var isfirst = true;
        var previous = default(TSource);
        while (iterator.MoveNext())
        {
            yield return projection(iterator.Current, previous, isfirst);
            isfirst = false;
            previous = iterator.Current;
        }
    }
}

A few key differences: it passes a third bool parameter to the projection to indicate whether the item is the first element of the enumerable, and I also switched the order of the current/previous parameters.

Here's the matching example:

var query = list.SelectWithPrev((cur, prev, isfirst) =>
    new { 
        ID = cur.ID, 
        Date = cur.Date, 
        DateDiff = isfirst ? 0 : (cur.Date - prev.Date).Days
    });
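
A sketch of this variant applied to plain int values instead of dates, to show how the first item can be given a difference of 0 as in the question's expected output. The SelectWithPrev operator is repeated here (written with foreach rather than the explicit iterator, which should behave the same) so the snippet compiles by itself; the FirstItemSketch class and DiffToPrev helper are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstItemSketch
{
    // Same idea as SelectWithPrev above: every item is projected,
    // and isfirst tells the projection that previous is not meaningful yet.
    public static IEnumerable<TResult> SelectWithPrev<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, TSource, bool, TResult> projection)
    {
        var isfirst = true;
        var previous = default(TSource);
        foreach (var current in source)
        {
            yield return projection(current, previous, isfirst);
            isfirst = false;
            previous = current;
        }
    }

    // Hypothetical helper: the first item gets 0, later items get cur - prev.
    public static List<int> DiffToPrev(IEnumerable<int> values)
    {
        return values
            .SelectWithPrev((cur, prev, isfirst) => isfirst ? 0 : cur - prev)
            .ToList();
    }
}
```

For the values 5, 8, 15, 9 this produces 0, 3, 7, -6, one result per input item.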
Shepperd answered 10/10, 2014 at 20:23 Comment(1)
Hi @Edyn, thank you for this. Any idea why Jon skips the first row? I suppose it's something related to the requirement of the main post, but your update really helped me, thanks! – Nonintervention
B
3

Further to Felix Ungman's post above, below is an example of how you can achieve the data you need making use of Zip():

        var diffs = list.Skip(1).Zip(list,
            (curr, prev) => new { CurrentID = curr.ID, PreviousID = prev.ID, CurrDate = curr.Date, PrevDate = prev.Date, DiffToPrev = (curr.Date - prev.Date).Days })
            .ToList();

        diffs.ForEach(fe => Console.WriteLine(string.Format("Current ID: {0}, Previous ID: {1} Current Date: {2}, Previous Date: {3} Diff: {4}",
            fe.CurrentID, fe.PreviousID, fe.CurrDate, fe.PrevDate, fe.DiffToPrev)));

Basically, you are zipping two versions of the same list, but the first version (the current list) begins at the second element in the collection; otherwise each element would be compared with itself, always giving a difference of zero.

I hope this makes sense,

Dave

Board answered 9/7, 2013 at 5:36 Comment(0)
C
2

Yet another mod on Jon Skeet's version (thanks for your solution +1). Except this is returning an enumerable of tuples.

public static IEnumerable<Tuple<T, T>> Intermediate<T>(this IEnumerable<T> source)
{
    using (var iterator = source.GetEnumerator())
    {
        if (!iterator.MoveNext())
        {
            yield break;
        }
        T previous = iterator.Current;
        while (iterator.MoveNext())
        {
            yield return new Tuple<T, T>(previous, iterator.Current);
            previous = iterator.Current;
        }
    }
}

This does NOT return the first item, because it is about returning the pairs between adjacent items.

Use it like:

public class MyObject
{
    public int ID { get; set; }
    public DateTime Date { get; set; }
    public int Value { get; set; }
}

var myObjectList = new List<MyObject>();

// don't forget to order on `Date`

foreach(var deltaItem in myObjectList.Intermediate())
{
    // Tuple<T, T> exposes Item1 (previous) and Item2 (current)
    var delta = deltaItem.Item2.Value - deltaItem.Item1.Value;
    // ..
}

OR

var newList = myObjectList.Intermediate().Select(item => item.Item2.Date - item.Item1.Date);

OR (like Jon shows)

var newList = myObjectList.Intermediate().Select(item => new 
{ 
    ID = item.Item2.ID, 
    Date = item.Item2.Date, 
    DateDiff = (item.Item2.Date - item.Item1.Date).Days
});
Contagium answered 14/9, 2015 at 12:28 Comment(2)
Which Pair are you using? I don't see a public one in .NET. – Ridgepole
@Ridgepole My bad, you can replace it with Tuple. I've changed it. Thank you. – Contagium
L
2

Here is the refactored code with C# 7.2 using the readonly struct and the ValueTuple (also struct).

I use Zip() to create (CurrentID, PreviousID, CurrDate, PrevDate, DiffToPrev) tuple of 5 members. It is easily iterated with foreach:

foreach(var (CurrentID, PreviousID, CurrDate, PrevDate, DiffToPrev) in diffs)

The full code:

public readonly struct S
{
    public int ID { get; }
    public DateTime Date { get; }
    public int Value { get; }

    public S(S other) => this = other;

    public S(int id, DateTime date, int value)
    {
        ID = id;
        Date = date;
        Value = value;
    }

    public static void DumpDiffs(IEnumerable<S> list)
    {
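        // Zip (or compare) list with offset 1 - Skip(1) - vs the original list
        // this way the items compared are i[j+1] vs i[j]
        // Note: the resulting enumeration will include list.Count-1 items
        // Note: Date.Day is the day of the month, so DiffToPrev is only a true
        // day count when both dates fall in the same month; use
        // (curr.Date - prev.Date).Days for elapsed days across months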
        var diffs = list.Skip(1)
                        .Zip(list, (curr, prev) => 
                                    (CurrentID: curr.ID, PreviousID: prev.ID, 
                                    CurrDate: curr.Date, PrevDate: prev.Date, 
                                    DiffToPrev: curr.Date.Day - prev.Date.Day));

        foreach(var (CurrentID, PreviousID, CurrDate, PrevDate, DiffToPrev) in diffs)
            Console.WriteLine($"Current ID: {CurrentID}, Previous ID: {PreviousID} " +
                              $"Current Date: {CurrDate}, Previous Date: {PrevDate} " +
                              $"Diff: {DiffToPrev}");
    }
}

Unit test output:

// the list:

// ID   Date
// ---------------
// 233  17-Feb-19
// 122  31-Mar-19
// 412  03-Mar-19
// 340  05-May-19
// 920  15-May-19

// CurrentID PreviousID CurrentDate PreviousDate Diff (days)
// ---------------------------------------------------------
//    122       233     31-Mar-19   17-Feb-19      14
//    412       122     03-Mar-19   31-Mar-19      -28
//    340       412     05-May-19   03-Mar-19      2
//    920       340     15-May-19   05-May-19      10

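Note: a small struct (especially a readonly one) can avoid heap allocations and defensive copies, so it often performs better than a class here; it is worth measuring for your own workload.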

Thanks @FelixUngman and @DavidHuxtable for their Zip() ideas!

Lang answered 6/12, 2019 at 10:18 Comment(0)
