Does LINQ use DataRelations to optimize joins?
Asked Answered
O

3

6

I can't find the answer to this anywhere, and before I start pawing through generated code with Reflector I thought it'd be worth asking:

Suppose I have the following LINQ query run against DataTables in a DataSet:

var list = 
   from pr in parentTable.AsEnumerable()
   join cr in childTable.AsEnumerable() on cr.Field<int>("ParentID") equals pr.Field<int>("ID")
   where pr.Field<string>("Value") == "foo"
   select cr;

If there's a DataRelation between the parent table and the child table that uses the key fields shown, will LINQ use it? That is, will it find the rows in the parent table for which Value is "foo" and then call GetChildRows to project the child rows?

Or is this something that I have to specify explicitly? (And if so, how do I do this?)

Odle answered 22/12, 2008 at 23:32 Comment(0)
S
2

I don't think so. In this case, LINQ to Objects will probably just treat the two sides as regular enumerable objects, and do the join manually (without looking at the DataRelation).

Satyr answered 23/12, 2008 at 10:53 Comment(0)
O
3

Digging around in Reflector didn't seem to give any indication that LINQ uses DataRelations, but that code's awfully hard to read. So I wrote a little performance test, and unless there's something dumb about the test that I missed, the results are pretty definitive: no, LINQ doesn't use DataRelations and GetChildRows() to project joined rows. If you want to iterate over child rows, you have to formulate the LINQ query to call GetChildRows() explicitly. And neither approach is anywhere near as performant as writing code that iterates over the array returned by GetChildRows().

Kind of unfortunate, because the difference in performance on large DataSets is significant enough that replacing LINQ with explicitly-implemented code is often going to be worthwhile, which generally isn't true of LINQ.

My test code is below. The timing of the LINQ iteration using the join remains the same (about 580-590 ms on my machine) irrespective of whether the DataRelation is created before or after it. The LINQ iteration that uses GetChildRows() takes around 280 ms,

The iteration directly over GetChildRows() takes under a millisecond. That's pretty surprising to me - enough so that I assumed I had a bug in the code when I first ran the test. (That's why I'm writing out the count each time - to make sure that the loops haven't been optimized out of existence by the compiler.)

class Program
{
    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        DataSet ds = new DataSet();
        DataTable t1 = new DataTable();
        t1.Columns.Add(new DataColumn
                           {
                               ColumnName = "ID",
                               DataType = typeof (int),
                               AutoIncrement = true
                           });
        t1.PrimaryKey = new [] { t1.Columns["ID"]};
        ds.Tables.Add(t1);

        DataTable t2 = new DataTable();
        t2.Columns.Add(new DataColumn
        {
            ColumnName = "ID",
            DataType = typeof(int),
            AutoIncrement = true
        });
        t2.Columns.Add("ParentID", typeof(int));
        t2.PrimaryKey = new[] { t2.Columns["ID"] };
        ds.Tables.Add(t2);

        sw.Reset();
        sw.Start();
        PopulateTables(t1, t2);
        sw.Stop();
        Console.WriteLine("Populating tables took {0} ms.", sw.ElapsedMilliseconds);
        Console.WriteLine();

        var list1 = from r1 in t1.AsEnumerable()
                   join r2 in t2.AsEnumerable()
                       on r1.Field<int>("ID") equals r2.Field<int>("ParentID")
                   where r1.Field<int>("ID") == 1
                   select r2;

        sw.Reset();
        sw.Start();
        int count = 0;
        foreach (DataRow r in list1)
        {
            count += r.Field<int>("ID");
        }
        sw.Stop();
        Console.WriteLine("count = {0}.", count);
        Console.WriteLine("Completed LINQ iteration in {0} ms.", sw.ElapsedMilliseconds);
        Console.WriteLine();

        sw.Reset();
        sw.Start();
        ds.Relations.Add(new DataRelation("FK_t2_t1", t1.Columns["ID"], t2.Columns["ParentID"]));
        sw.Stop();
        Console.WriteLine("Creating DataRelation took {0} ms.", sw.ElapsedMilliseconds);

        sw.Reset();
        sw.Start();
        var list2 =
            from r1 in t1.AsEnumerable()
            from r2 in r1.GetChildRows("FK_t2_t1")
            where r1.Field<int>("ID") == 1
            select r2;

        count = 0;
        foreach (DataRow r in list2)
        {
            count += r.Field<int>("ID");
        }
        sw.Stop();
        Console.WriteLine("count = {0}.", count);
        Console.WriteLine("Completed LINQ iteration using nested query in {0} ms.", sw.ElapsedMilliseconds);
        Console.WriteLine();

        sw.Reset();
        sw.Start();
        DataRow parentRow = t1.Select("ID = 1")[0];
        count = 0;
        foreach (DataRow r in parentRow.GetChildRows("FK_t2_t1"))
        {
            count += r.Field<int>("ID");
        }
        sw.Stop();
        Console.WriteLine("count = {0}.", count);
        Console.WriteLine("Completed explicit iteration of child rows in {0} ms.", sw.ElapsedMilliseconds);
        Console.WriteLine();

        Console.ReadLine();
    }

    private static void PopulateTables(DataTable t1, DataTable t2)
    {
        for (int count1 = 0; count1 < 1000; count1++)
        {
            DataRow r1 = t1.NewRow();
            t1.Rows.Add(r1);
            for (int count2 = 0; count2 < 1000; count2++)
            {
                DataRow r2 = t2.NewRow();
                r2["ParentID"] = r1["ID"];
                t2.Rows.Add(r2);
            }
        }
    }
} 
Odle answered 23/12, 2008 at 20:9 Comment(2)
Well, you could always write your own LINQ implementation that does this - it isn't as much work as it sounds (Jon attempted it in 1 hour at a demo, and got most of the way).Satyr
Oh, that's a dangerous temptation.Odle
S
2

I don't think so. In this case, LINQ to Objects will probably just treat the two sides as regular enumerable objects, and do the join manually (without looking at the DataRelation).

Satyr answered 23/12, 2008 at 10:53 Comment(0)
J
1

This is an explaination: http://msdn.microsoft.com/en-us/library/bb386969.aspx

But in your example above you're telling linq about the parent child relationship with the "JOIN" syntax. I would recommend you make a strongly-typed dataset if possible, and all the relationships and joining will be handled for you.

Julianajuliane answered 23/12, 2008 at 2:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.