How to sort depended objects by dependency
Asked Answered
R

10

90

I have a collection:

List<VPair<Item, List<Item>> dependencyHierarchy;

The first item in pair is some object (item) and the second one is a collection of the same type objects that the first one depends on. I want to get a List<Item> in order of dependency, so there are not items that depend on the first element and so on (no cycled dependency!).

Input:

Item4 depends on Item3 and Item5
Item3 depends on Item1
Item1 does not depend on any one
Item2 depends on Item4 
Item5 does not depend on any one 

Result:

Item1
Item5
Item3
Item4
Item2

Thank you.

SOLUTION:

Topological Sorting (thanks to Loïc Février for idea)

and

example on C#, example on Java (thanks to xcud for great examples)

Rathskeller answered 5/11, 2010 at 14:34 Comment(1)
For anyone coming across this and looking for a C# nuget package, here's one I created: github.com/madelson/MedallionTopologicalSortKean
D
50

Perfect example to use a topological sort:

http://en.wikipedia.org/wiki/Topological_sorting

It will give you exactly what you need.

You can either use Kahn's algorithm:

L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edge

while S is not empty do
    remove a node n from S
    add n to L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S

if graph has edges then
    return error   (graph has at least one cycle)
else 
    return L   (a topologically sorted order)

...or you can use Depth-first search:

L ← Empty list that will contain the sorted nodes
while exists nodes without a permanent mark do
    select an unmarked node n
    visit(n)

function visit(node n)
    if n has a permanent mark then
        return
    if n has a temporary mark then
        stop   (not a DAG)

    mark n with a temporary mark

    for each node m with an edge from n to m do
        visit(m)

    remove temporary mark from n
    mark n with a permanent mark
    add n to head of L
Depilatory answered 5/11, 2010 at 14:45 Comment(1)
Found a C# impl of tsort: tawani.blogspot.com/2009/02/topological-sorting-and-cyclic.htmlGoal
C
102

Having struggled with this for a while, here's my attempt at a Linq style TSort extension method:

public static IEnumerable<T> TSort<T>( this IEnumerable<T> source, Func<T, IEnumerable<T>> dependencies, bool throwOnCycle = false )
{
    var sorted = new List<T>();
    var visited = new HashSet<T>();

    foreach( var item in source )
        Visit( item, visited, sorted, dependencies, throwOnCycle );

    return sorted;
}

private static void Visit<T>( T item, HashSet<T> visited, List<T> sorted, Func<T, IEnumerable<T>> dependencies, bool throwOnCycle )
{
    if( !visited.Contains( item ) )
    {
        visited.Add( item );

        foreach( var dep in dependencies( item ) )
            Visit( dep, visited, sorted, dependencies, throwOnCycle );

        sorted.Add( item );
    }
    else
    {
        if( throwOnCycle && !sorted.Contains( item ) )
            throw new Exception( "Cyclic dependency found" );
    }
}
Corazoncorban answered 14/6, 2012 at 5:23 Comment(9)
+1 Much simpler and seems to work for me. The only change I made was to use a Dictionary<T, object> instead of List<T> for visited - it should be faster for large collections.Littleton
Thanks E M - I've updated to use a HashSet for the visited collection.Corazoncorban
+1 I had a look at the pseudo code for the algorithm on Wikipedia and thought it would be easy enough to implement, but having the actual implementation is even easier!Bursary
Thanks DMM! That works for me with one modification : At the end of the if( !visited.Contains( item ) ), I added something like (in Java) else if(!sorted.contains(item)){throw new Exception("Invalid dependency cycle!");} to manage the case where A->B, B->C and C->A.Laterite
@DilshodTadjibaev it's quite simple. You would say something like var sortedItems = items.TSort(i => i.Dependencies) where items is a List of T where T would include a property returning a List of T's dependencies.Footprint
Updated the example as per electrotype's suggestion, but allowing the caller to choose whether to throw on cycle or not.Corazoncorban
Very helpful answer, however, imagine the scenario where sources = {A, B}, where dependencies(B) = {A}. The code as you have it would detect that as a 'cyclic dependency', which seems incorrect.Penmanship
Thanks for the feedback Supericy, I've just updated to better reflect the code given by electrotype which has fixed it.Corazoncorban
Very similar to the implementation @ codeproject.com/Articles/869059/Topological-sorting-in-CsharpHatchett
D
50

Perfect example to use a topological sort:

http://en.wikipedia.org/wiki/Topological_sorting

It will give you exactly what you need.

You can either use Kahn's algorithm:

L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edge

while S is not empty do
    remove a node n from S
    add n to L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S

if graph has edges then
    return error   (graph has at least one cycle)
else 
    return L   (a topologically sorted order)

...or you can use Depth-first search:

L ← Empty list that will contain the sorted nodes
while exists nodes without a permanent mark do
    select an unmarked node n
    visit(n)

function visit(node n)
    if n has a permanent mark then
        return
    if n has a temporary mark then
        stop   (not a DAG)

    mark n with a temporary mark

    for each node m with an edge from n to m do
        visit(m)

    remove temporary mark from n
    mark n with a permanent mark
    add n to head of L
Depilatory answered 5/11, 2010 at 14:45 Comment(1)
Found a C# impl of tsort: tawani.blogspot.com/2009/02/topological-sorting-and-cyclic.htmlGoal
A
19

There's a nuget for that.

For those of us who prefer not to re-invent the wheel: use nuget to install the QuickGraph .NET library, which includes multiple graph algorithms including topological sort.

To use it, you need to create an instance of AdjacencyGraph<,> such as AdjacencyGraph<String, SEdge<String>>. Then, if you include the appropriate extensions:

using QuickGraph.Algorithms;

You can call:

var sorted = myGraph.TopologicalSort();

To get a list of sorted nodes.

Agone answered 3/12, 2013 at 20:56 Comment(0)
N
14

I liked DMM's answer, but it assumes that the input nodes are leaves (which may or may not be what is expected).

I am posting an alternate solution using LINQ that does not make this assumption. In addition, this solution uses yield return to be able to quickly return the leaves (using e.g. TakeWhile).

public static IEnumerable<T> TopologicalSort<T>(this IEnumerable<T> nodes, 
                                                Func<T, IEnumerable<T>> connected)
{
    var elems = nodes.ToDictionary(node => node, 
                                   node => new HashSet<T>(connected(node)));
    while (elems.Count > 0)
    {
        var elem = elems.FirstOrDefault(x => x.Value.Count == 0);
        if (elem.Key == null)
        {
            throw new ArgumentException("Cyclic connections are not allowed");
        }
        elems.Remove(elem.Key);
        foreach (var selem in elems)
        {
            selem.Value.Remove(elem.Key);
        }
        yield return elem.Key;
    }
}
Nastassia answered 5/6, 2014 at 10:53 Comment(5)
This is the most compact topological sort implementation I have seen thus far.Paymar
"but it assumes that the input nodes are leaves" I do not understand. Can you explain?Once
@osexpert, that's how the algorithm works: we need to always work with leaves - nodes that do not depend on any other node(s). This code ensures that: var elem = elems.FirstOrDefault(x => x.Value.Count == 0);. Particularly, x.Value.Count == 0 ensures there are no dependencies for the given node.Associate
Also, this implementation is compact, but not optimal. We search for a leaf on every iteration, which makes it O(n^2). It could be improved by creating a queue with all leaves before a while loop and then adding new items to it as long as they become "independent".Associate
True, and IIRC the above was based on something I wrote for handling dependencies in a compiler with the order of hundreds of module, for which if performed reasonably well.Nastassia
L
6

This is my own re-implementation of Topological sorting, the idea is based on http://tawani.blogspot.com/2009/02/topological-sorting-and-cyclic.html (The ported Java source code consumes too much memory, checking 50k objects costs 50k*50k*4 = 10GB which is unacceptable. In addition, it still has Java coding convention some places)

using System.Collections.Generic;
using System.Diagnostics;

namespace Modules
{
    /// <summary>
    /// Provides fast-algorithm and low-memory usage to sort objects based on their dependencies. 
    /// </summary>
    /// <remarks>
    /// Definition: http://en.wikipedia.org/wiki/Topological_sorting
    /// Source code credited to: http://tawani.blogspot.com/2009/02/topological-sorting-and-cyclic.html    
    /// Original Java source code: http://www.java2s.com/Code/Java/Collections-Data-Structure/Topologicalsorting.htm
    /// </remarks>
    /// <author>ThangTran</author>
    /// <history>
    /// 2012.03.21 - ThangTran: rewritten based on <see cref="TopologicalSorter"/>.
    /// </history>
    public class DependencySorter<T>
    {
        //**************************************************
        //
        // Private members
        //
        //**************************************************

        #region Private members

        /// <summary>
        /// Gets the dependency matrix used by this instance.
        /// </summary>
        private readonly Dictionary<T, Dictionary<T, object>> _matrix = new Dictionary<T, Dictionary<T, object>>();

        #endregion


        //**************************************************
        //
        // Public methods
        //
        //**************************************************

        #region Public methods

        /// <summary>
        /// Adds a list of objects that will be sorted.
        /// </summary>
        public void AddObjects(params T[] objects)
        {
            // --- Begin parameters checking code -----------------------------
            Debug.Assert(objects != null);
            Debug.Assert(objects.Length > 0);
            // --- End parameters checking code -------------------------------

            // add to matrix
            foreach (T obj in objects)
            {
                // add to dictionary
                _matrix.Add(obj, new Dictionary<T, object>());
            }
        }

        /// <summary>
        /// Sets dependencies of given object.
        /// This means <paramref name="obj"/> depends on these <paramref name="dependsOnObjects"/> to run.
        /// Please make sure objects given in the <paramref name="obj"/> and <paramref name="dependsOnObjects"/> are added first.
        /// </summary>
        public void SetDependencies(T obj, params T[] dependsOnObjects)
        {
            // --- Begin parameters checking code -----------------------------
            Debug.Assert(dependsOnObjects != null);
            // --- End parameters checking code -------------------------------

            // set dependencies
            Dictionary<T, object> dependencies = _matrix[obj];
            dependencies.Clear();

            // for each depended objects, add to dependencies
            foreach (T dependsOnObject in dependsOnObjects)
            {
                dependencies.Add(dependsOnObject, null);
            }
        }

        /// <summary>
        /// Sorts objects based on this dependencies.
        /// Note: because of the nature of algorithm and memory usage efficiency, this method can be used only one time.
        /// </summary>
        public T[] Sort()
        {
            // prepare result
            List<T> result = new List<T>(_matrix.Count);

            // while there are still object to get
            while (_matrix.Count > 0)
            {
                // get an independent object
                T independentObject;
                if (!this.GetIndependentObject(out independentObject))
                {
                    // circular dependency found
                    throw new CircularReferenceException();
                }

                // add to result
                result.Add(independentObject);

                // delete processed object
                this.DeleteObject(independentObject);
            }

            // return result
            return result.ToArray();
        }

        #endregion


        //**************************************************
        //
        // Private methods
        //
        //**************************************************

        #region Private methods

        /// <summary>
        /// Returns independent object or returns NULL if no independent object is found.
        /// </summary>
        private bool GetIndependentObject(out T result)
        {
            // for each object
            foreach (KeyValuePair<T, Dictionary<T, object>> pair in _matrix)
            {
                // if the object contains any dependency
                if (pair.Value.Count > 0)
                {
                    // has dependency, skip it
                    continue;
                }

                // found
                result = pair.Key;
                return true;
            }

            // not found
            result = default(T);
            return false;
        }

        /// <summary>
        /// Deletes given object from the matrix.
        /// </summary>
        private void DeleteObject(T obj)
        {
            // delete object from matrix
            _matrix.Remove(obj);

            // for each object, remove the dependency reference
            foreach (KeyValuePair<T, Dictionary<T, object>> pair in _matrix)
            {
                // if current object depends on deleting object
                pair.Value.Remove(obj);
            }
        }


        #endregion
    }

    /// <summary>
    /// Represents a circular reference exception when sorting dependency objects.
    /// </summary>
    public class CircularReferenceException : Exception
    {
        /// <summary>
        /// Initializes a new instance of the <see cref="CircularReferenceException"/> class.
        /// </summary>
        public CircularReferenceException()
            : base("Circular reference found.")
        {            
        }
    }
}
Logogram answered 3/4, 2012 at 11:1 Comment(0)
O
5

DMM"s looks good but I don't like recursive methods. Krumelur looks good but seems to use a lot of memory? Made an alternative stack based method that seems to work. Uses same logic as DMM's and I used its result as template when making it:

    public static IEnumerable<T> TopologicalSequenceDFS<T>(this IEnumerable<T> source, Func<T, IEnumerable<T>> deps)
    {
        var yielded = new HashSet<T>();
        var visited = new HashSet<T>();
        var stack = new Stack<(T, IEnumerator<T>)>();

        foreach (T t in source)
        {
            if (visited.Add(t))
                stack.Push((t, deps(t).GetEnumerator()));

            while (stack.Any())
            {
                var p = stack.Peek();
                bool depsPushed = false;
                while (p.Item2.MoveNext())
                {
                    var curr = p.Item2.Current;
                    if (visited.Add(curr))
                    {
                        stack.Push((curr, deps(curr).GetEnumerator()));
                        depsPushed = true;
                        break;
                    }
                    else if (!yielded.Contains(curr))
                        throw new Exception("cyclic");
                }

                if (!depsPushed)
                {
                    p = stack.Pop();
                    p.Item2.Dispose();
                    if (!yielded.Add(p.Item1))
                        throw new Exception("bug");
                    yield return p.Item1;
                }
            }
        }
    }

For completeness, here is a stack based BFS variant:

    public static IEnumerable<T> TopologicalSequenceBFS<T>(this IEnumerable<T> source, Func<T, IEnumerable<T>> deps)
    {
        var yielded = new HashSet<T>();
        var visited = new HashSet<T>();
        var stack = new Stack<(T, bool)>();

        foreach (var t in source)
        {
            stack.Push((t, false));

            while (stack.Any())
            {
                var item = stack.Pop();
                if (item.Item2)
                {
                    if (!yielded.Add(item.Item1))
                        throw new Exception("bug");
                    yield return item.Item1;
                }
                else
                {
                    if (visited.Add(item.Item1))
                    {
                        stack.Push((item.Item1, true)); // yield after processing dependencies
                        foreach (var dep in deps(item.Item1))
                            stack.Push((dep, false));
                    }
                    else if (!yielded.Contains(item.Item1))
                        throw new Exception("cyclic");
                }
            }
        }
    }
Once answered 8/7, 2018 at 19:5 Comment(1)
love TopogicalSequenceDFS!Tramontane
M
1

I would make this easier on myself by storing the dependencies of an Item within the Item itself:

public class Item
{
    private List<Item> m_Dependencies = new List<Item>();

    protected AddDependency(Item _item) { m_Dependencies.Add(_item); }

    public Item()
    {
    }; // eo ctor

    public List<Item> Dependencies {get{return(m_Dependencies);};}
} // eo class Item

Then, given this you can implement a custom Sort delegate for List that sorts based on whether the given Item is contained within the other's list of dependencies:

int CompareItem(Item _1, Item _2)
{
    if(_2.Dependencies.Contains(_1))
        return(-1);
    else if(_1.Dependencies.Contains(_2))
        return(1);
    else
        return(0);
}
Monson answered 5/11, 2010 at 14:46 Comment(5)
The order is not complete so it won't work. You would need to have for each item the list of all items he or any of his descendants depends on. (ie build the complete directed acyclic graph) Easy to find a counter-example : 1 depends of 3 and 2, 3 of 4. [3 4 1 2] is sorted according to your algorithm. But 3 must be after 1.Unmoving
ah, thankyou. I didn't think of that. Much appreciated. Here come the downvotes! :)Monson
Loic, would you be so kind as to explain further why my suggestion doesn't work? Not trying to say it's right, but just so I can learn better. I just tried it here and both for the OP's example and your example, my resulting list was in order. By luck, perhaps? :) Given your example (1 depending on 3 & 2, 2 depending on 4), my resulting sort was [4, 3, 2, 1]Monson
To order a list every sorting algorithm will only check if any consecutive elements are sorted. In your case sorted means : the second one does not depend of the first one. [3 4 1 2] and [4, 3, 2, 1] are two possible orders. The algorithm suppose transitivity : if x <= y and y <= z then x <= z. In this case it's not true. You can however modify the data : if x depends of y and y depends of z then add z to x' dependency list. Your partial order is now a complete partial order and a sorting algorithm can work. But the complexity to "complete it" is O(N^2) where as is' O(N) for topological sort.Unmoving
Thanks for taking the time to explain this further. In further testing, my idea did indeed break and fall on it's ass. Happy programming!Monson
N
1

A different idea, for cases with only one "parent" depending:

Instead of deps, you'd store the parents.
So you can tell very easily whether an issue is a dependency of some other.
And then use Comparable<T>, which would claim the dependencies "lesser" and the dependency "greater".
And then simply call Collections.sort( List<T>, ParentComparator<T>);

For multi-parent scenario, a tree search would be needed which would result in slow execution. But that could be solved by a cache in a form of A* sort matrix.

Nicety answered 23/5, 2013 at 3:52 Comment(0)
M
1

I merged DMM's idea with the depth-first-search algorithm on Wikipedia. It works perfect for what I needed.

public static class TopologicalSorter
{
public static List<string> LastCyclicOrder = new List<string>(); //used to see what caused the cycle

sealed class ItemTag
{
  public enum SortTag
  {
    NotMarked,
    TempMarked,
    Marked
  }

  public string Item { get; set; }
  public SortTag Tag { get; set; }

  public ItemTag(string item)
  {
    Item = item;
    Tag = SortTag.NotMarked;
  }
}

public static IEnumerable<string> TSort(this IEnumerable<string> source, Func<string, IEnumerable<string>> dependencies)
{
  TopologicalSorter.LastCyclicOrder.Clear();

  List<ItemTag> allNodes = new List<ItemTag>();
  HashSet<string> sorted = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

  foreach (string item in source)
  {
    if (!allNodes.Where(n => string.Equals(n.Item, item, StringComparison.OrdinalIgnoreCase)).Any())
    {
      allNodes.Add(new ItemTag(item)); //don't insert duplicates
    }
    foreach (string dep in dependencies(item))
    {
      if (allNodes.Where(n => string.Equals(n.Item, dep, StringComparison.OrdinalIgnoreCase)).Any()) continue; //don't insert duplicates
      allNodes.Add(new ItemTag(dep));
    }
  }

  foreach (ItemTag tag in allNodes)
  {
    Visit(tag, allNodes, dependencies, sorted);
  }

  return sorted;
}

static void Visit(ItemTag tag, List<ItemTag> allNodes, Func<string, IEnumerable<string>> dependencies, HashSet<string> sorted)
{
  if (tag.Tag == ItemTag.SortTag.TempMarked)
  {
    throw new GraphIsCyclicException();
  }
  else if (tag.Tag == ItemTag.SortTag.NotMarked)
  {
    tag.Tag = ItemTag.SortTag.TempMarked;
    LastCyclicOrder.Add(tag.Item);

    foreach (ItemTag dep in dependencies(tag.Item).Select(s => allNodes.Where(t => string.Equals(s, t.Item, StringComparison.OrdinalIgnoreCase)).First())) //get item tag which falls with used string
      Visit(dep, allNodes, dependencies, sorted);

    LastCyclicOrder.Remove(tag.Item);
    tag.Tag = ItemTag.SortTag.Marked;
    sorted.Add(tag.Item);
  }
}
}
Mcgregor answered 13/6, 2013 at 20:49 Comment(0)
E
0

This is refactored code from post https://mcmap.net/q/235312/-how-to-sort-depended-objects-by-dependency.

// Version 1
public static class TopologicalSorter<T> where T : class {

    public struct Item {
        public readonly T Object;
        public readonly T Dependency;
        public Item(T @object, T dependency) {
            Object = @object;
            Dependency = dependency;
        }
    }


    public static T[] Sort(T[] objects, Func<T, T, bool> isDependency) {
        return Sort( objects.ToList(), isDependency ).ToArray();
    }

    public static T[] Sort(T[] objects, Item[] dependencies) {
        return Sort( objects.ToList(), dependencies.ToList() ).ToArray();
    }


    private static List<T> Sort(List<T> objects, Func<T, T, bool> isDependency) {
        return Sort( objects, GetDependencies( objects, isDependency ) );
    }

    private static List<T> Sort(List<T> objects, List<Item> dependencies) {
        var result = new List<T>( objects.Count );

        while (objects.Any()) {
            var obj = GetIndependentObject( objects, dependencies );
            RemoveObject( obj, objects, dependencies );
            result.Add( obj );
        }

        return result;
    }

    private static List<Item> GetDependencies(List<T> objects, Func<T, T, bool> isDependency) {
        var dependencies = new List<Item>();

        for (var i = 0; i < objects.Count; i++) {
            var obj1 = objects[i];
            for (var j = i + 1; j < objects.Count; j++) {
                var obj2 = objects[j];
                if (isDependency( obj1, obj2 )) dependencies.Add( new Item( obj1, obj2 ) ); // obj2 is dependency of obj1
                if (isDependency( obj2, obj1 )) dependencies.Add( new Item( obj2, obj1 ) ); // obj1 is dependency of obj2
            }
        }

        return dependencies;
    }


    private static T GetIndependentObject(List<T> objects, List<Item> dependencies) {
        foreach (var item in objects) {
            if (!GetDependencies( item, dependencies ).Any()) return item;
        }
        throw new Exception( "Circular reference found" );
    }

    private static IEnumerable<Item> GetDependencies(T obj, List<Item> dependencies) {
        return dependencies.Where( i => i.Object == obj );
    }

    private static void RemoveObject(T obj, List<T> objects, List<Item> dependencies) {
        objects.Remove( obj );
        dependencies.RemoveAll( i => i.Object == obj || i.Dependency == obj );
    }

}


// Version 2
public class TopologicalSorter {

    public static T[] Sort<T>(T[] source, Func<T, T, bool> isDependency) {
        var list = new LinkedList<T>( source );
        var result = new List<T>();

        while (list.Any()) {
            var obj = GetIndependentObject( list, isDependency );
            list.Remove( obj );
            result.Add( obj );
        }

        return result.ToArray();
    }

    private static T GetIndependentObject<T>(IEnumerable<T> list, Func<T, T, bool> isDependency) {
        return list.First( i => !GetDependencies( i, list, isDependency ).Any() );
    }

    private static IEnumerable<T> GetDependencies<T>(T obj, IEnumerable<T> list, Func<T, T, bool> isDependency) {
        return list.Where( i => isDependency( obj, i ) ); // i is dependency of obj
    }

}
Earp answered 7/5, 2018 at 11:1 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.