Find out which combinations of numbers in a set add up to a given total
Asked Answered
I

10

24

I've been tasked with helping some accountants solve a common problem they have - given a list of transactions and a total deposit, which transactions are part of the deposit? For example, say I have this list of numbers:

1.00
2.50
3.75
8.00

And I know that my total deposit is 10.50, I can easily see that it's made up of the 8.00 and 2.50 transaction. However, given a hundred transactions and a deposit in the millions, it quickly becomes much more difficult.

In testing a brute force solution (which takes way too long to be practical), I had two questions:

  1. With a list of about 60 numbers, it seems to find a dozen or more combinations for any total that's reasonable. I was expecting a single combination to satisfy my total, or maybe a few possibilities, but there always seem to be a ton of combinations. Is there a math principle that describes why this is? It seems that given a collection of random numbers of even a medium size, you can find a multiple combination that adds up to just about any total you want.

  2. I built a brute force solution for the problem, but it's clearly O(n!), and quickly grows out of control. Aside from the obvious shortcuts (exclude numbers larger than the total themselves), is there a way to shorten the time to calculate this?

Details on my current (super-slow) solution:

The list of detail amounts is sorted largest to smallest, and then the following process runs recursively:

  • Take the next item in the list and see if adding it to your running total makes your total match the target. If it does, set aside the current chain as a match. If it falls short of your target, add it to your running total, remove it from the list of detail amounts, and then call this process again

This way it excludes the larger numbers quickly, cutting the list down to only the numbers it needs to consider. However, it's still n! and larger lists never seem to finish, so I'm interested in any shortcuts I might be able to take to speed this up - I suspect that even cutting 1 number out of the list would cut the calculation time in half.

Thanks for your help!

Idel answered 21/10, 2010 at 16:39 Comment(2)
Look into the Knapsack Problem. There will be multiple solutions unless you have more information to start with.Sweatshop
double has 64 bits, and your numbers are from a small subset of that, so it is not much of a surprise that at 60 elements there are many solutions, there are 2^60 subsets whose sums map into the doubles.Eryneryngo
O
18

This special case of the Knapsack problem is called Subset Sum.

Orme answered 21/10, 2010 at 17:24 Comment(1)
That's exactly what it is, after reading the wiki article about it. Seems there are already some approaches to the problem that are more efficient than what I'm doing, but I was hoping there were some shortcuts that would let me cut the search space more - maybe recognizing which numbers add up to other numbers in the set, and then avoiding duplicating effort to check them. Thanks for the link!Idel
E
10

C# version

setup test:

using System;
using System.Collections.Generic;

public class Program
{
    public static void Main(string[] args)
    {
    // subtotal list
    List<double> totals = new List<double>(new double[] { 1, -1, 18, 23, 3.50, 8, 70, 99.50, 87, 22, 4, 4, 100.50, 120, 27, 101.50, 100.50 });

    // get matches
    List<double[]> results = Knapsack.MatchTotal(100.50, totals);

    // print results
    foreach (var result in results)
    {
        Console.WriteLine(string.Join(",", result));
    }

    Console.WriteLine("Done.");
    Console.ReadKey();
    }
}

code:

using System.Collections.Generic;
using System.Linq;

public class Knapsack
{
    internal static List<double[]> MatchTotal(double theTotal, List<double> subTotals)
    {
    List<double[]> results = new List<double[]>();

    while (subTotals.Contains(theTotal))
    {
        results.Add(new double[1] { theTotal });
        subTotals.Remove(theTotal);
    }

    // if no subtotals were passed
    // or all matched the Total
    // return
    if (subTotals.Count == 0)
        return results;

    subTotals.Sort();

    double mostNegativeNumber = subTotals[0];
    if (mostNegativeNumber > 0)
        mostNegativeNumber = 0;

    // if there aren't any negative values
    // we can remove any values bigger than the total
    if (mostNegativeNumber == 0)
        subTotals.RemoveAll(d => d > theTotal);

    // if there aren't any negative values
    // and sum is less than the total no need to look further
    if (mostNegativeNumber == 0 && subTotals.Sum() < theTotal)
        return results;

    // get the combinations for the remaining subTotals
    // skip 1 since we already removed subTotals that match
    for (int choose = 2; choose <= subTotals.Count; choose++)
    {
        // get combinations for each length
        IEnumerable<IEnumerable<double>> combos = Combination.Combinations(subTotals.AsEnumerable(), choose);

        // add combinations where the sum mathces the total to the result list
        results.AddRange(from combo in combos
                 where combo.Sum() == theTotal
                 select combo.ToArray());
    }

    return results;
    }
}

public static class Combination
{
    public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int choose)
    {
    return choose == 0 ?                        // if choose = 0
        new[] { new T[0] } :                    // return empty Type array
        elements.SelectMany((element, i) =>     // else recursively iterate over array to create combinations
        elements.Skip(i + 1).Combinations(choose - 1).Select(combo => (new[] { element }).Concat(combo)));
    }
}

results:

100.5
100.5
-1,101.5
1,99.5
3.5,27,70
3.5,4,23,70
3.5,4,23,70
-1,1,3.5,27,70
1,3.5,4,22,70
1,3.5,4,22,70
1,3.5,8,18,70
-1,1,3.5,4,23,70
-1,1,3.5,4,23,70
1,3.5,4,4,18,70
-1,3.5,8,18,22,23,27
-1,3.5,4,4,18,22,23,27
Done.

If subTotals are repeated, there will appear to be duplicate results (the desired effect). In reality, you will probably want to use the subTotal Tupled with some ID, so you can relate it back to your data.

Eba answered 27/6, 2011 at 19:8 Comment(0)
C
2

If I understand your problem correctly, you have a set of transactions, and you merely wish to know which of them could have been included in a given total. So if there are 4 possible transactions, then there are 2^4 = 16 possible sets to inspect. This problem is, for 100 possible transactions, the search space has 2^100 = 1267650600228229401496703205376 possible combinations to search over. For 1000 potential transactions in the mix, it grows to a total of

10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376

sets that you must test. Brute force will hardly be a viable solution on these problems.

Instead, use a solver that can handle knapsack problems. But even then, I'm not sure that you can generate a complete enumeration of all possible solutions without some variation of brute force.

Chemisette answered 21/10, 2010 at 17:4 Comment(2)
Might be a small problem with the math... Given a set of 4 amounts we need to sum all sub sets containing 1, 2, 3, and 4 amounts. This would be the sum of C(4,r) where r varies from 1 to 4. That is C(4,1) + C(4,2) + C(4,3) + C(4,4) = (4) + (6) + (4) + (1) = 15 <> 2^4. Anyhow, this still scales such that brute force solutions are doomed.Enneagon
The only other subset is the zero case, where no transactions at all are used. If you exclude that case from the search, then we have only 1 fewer elements in the set, for 2^n-1 sets to consider. For large n, the difference seems trivial to me.Chemisette
S
2

There is a cheap Excel Add-in that solves this problem: SumMatch

SumMatch in action

Selfcontrol answered 30/12, 2012 at 2:18 Comment(0)
S
2

The Excel Solver Addin as posted over on superuser.com has a great solution (if you have Excel) https://superuser.com/questions/204925/excel-find-a-subset-of-numbers-that-add-to-a-given-total

Stammel answered 30/7, 2013 at 14:50 Comment(0)
E
1

Its kind of like 0-1 Knapsack problem which is NP-complete and can be solved through dynamic programming in polynomial time.

http://en.wikipedia.org/wiki/Knapsack_problem

But at the end of the algorithm you also need to check that the sum is what you wanted.

Emblazonry answered 21/10, 2010 at 16:46 Comment(0)
Y
0

Depending on your data you could first look at the cents portion of each transaction. Like in your initial example you know that 2.50 has to be part of the total because it is the only set of non-zero cent transactions which add to 50.

Youlandayoulton answered 27/10, 2010 at 18:15 Comment(1)
But in order to figure this out, you had to scan all the other values to make sure that none of them could add up to have a cent value of "50", which would take just about as long as if you'd figured out the original problem in the first place.Faugh
B
0

Not a super efficient solution but heres an implementation in coffeescript

combinations returns all possible combinations of the elements in list

combinations = (list) ->
        permuations = Math.pow(2, list.length) - 1
        out = []
        combinations = []

        while permuations
            out = []

            for i in [0..list.length]
                y = ( 1 << i )
                if( y & permuations and (y isnt permuations))
                    out.push(list[i])

            if out.length <= list.length and out.length > 0
                combinations.push(out)

            permuations--

        return combinations

and then find_components makes use of it to determine which numbers add up to total

find_components = (total, list) ->
    # given a list that is assumed to have only unique elements

        list_combinations = combinations(list)

        for combination in list_combinations
            sum = 0
            for number in combination
                sum += number

            if sum is total
                return combination
        return []

Heres an example

list = [7.2, 3.3, 4.5, 6.0, 2, 4.1]
total = 7.2 + 2 + 4.1

console.log(find_components(total, list)) 

which returns [ 7.2, 2, 4.1 ]

Burmaburman answered 27/6, 2013 at 0:3 Comment(0)
B
0
#include <stdio.h>
#include <stdlib.h>

/* Takes at least 3 numbers as arguments.
 * First number is desired sum.
 * Find the subset of the rest that comes closest
 * to the desired sum without going over.
 */
static long *elements;
static int nelements;

/* A linked list of some elements, not necessarily all */
/* The list represents the optimal subset for elements in the range [index..nelements-1] */
struct status {
    long sum;                    /* sum of all the elements in the list */
    struct status *next;         /* points to next element in the list */
    int index;                   /* index into elements array of this element */
};

/*  find the subset of elements[startingat .. nelements-1]  whose sum is closest to but does not exceed desiredsum */
struct status *reportoptimalsubset(long desiredsum, int startingat) {
    struct status *sumcdr = NULL;
    struct status *sumlist = NULL;

    /* sum of zero elements or summing to zero */
    if (startingat == nelements || desiredsum == 0) {
        return NULL;
    }

    /* optimal sum using the current element */
    /* if current elements[startingat] too big, it won't fit, don't try it */
    if (elements[startingat] <= desiredsum) {
        sumlist = malloc(sizeof(struct status));
        sumlist->index = startingat;
        sumlist->next = reportoptimalsubset(desiredsum - elements[startingat], startingat + 1);
        sumlist->sum = elements[startingat] + (sumlist->next ? sumlist->next->sum : 0);
        if (sumlist->sum == desiredsum)
            return sumlist;
    }

    /* optimal sum not using current element */
    sumcdr = reportoptimalsubset(desiredsum, startingat + 1);

    if (!sumcdr) return sumlist;
    if (!sumlist) return sumcdr;

    return (sumcdr->sum < sumlist->sum) ?  sumlist : sumcdr;
}

int main(int argc, char **argv) {
  struct status *result = NULL;
  long desiredsum = strtol(argv[1], NULL, 10);

  nelements = argc - 2;
  elements = malloc(sizeof(long) * nelements);

  for (int i = 0; i < nelements; i++) {
      elements[i] = strtol(argv[i + 2], NULL , 10);
  }

  result = reportoptimalsubset(desiredsum, 0);
  if (result)
      printf("optimal subset = %ld\n", result->sum);

  while (result) {
      printf("%ld + ", elements[result->index]);
      result = result->next;
  }

  printf("\n");

}
Braiding answered 13/4, 2022 at 15:50 Comment(1)
With some memoing to prevent duplicate calls, this would become a dynamic programming solution. I use it to figure out which bills I can pay when I have 10 bills that add up to more than my bank account, or to figure out which mortgages I can pay off with a given amount of capital.Braiding
B
0

Best to avoid use of floats and doubles when doing arithmetic and equality comparisons btw.

Braiding answered 14/4, 2022 at 15:58 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.