Get the most efficient combination of a large List of objects based on a field

Asked 28/11, 2019 at 17:50 Answered 3/12, 2019 at 21:35

Solved python python-3.x combinations knapsack-problem

I'm looking to maximize the number of stars given a certain budget and max limit on the combination.

Example question:

With a budget of 500 euro, visiting only the maximum allowed restaurants or less, dine and collect the most stars possible.

I'm looking to write an efficient algorithm, that could potentially process 1 million Restaurant instances for up to 10 max Restaurants.

Note, this is a cross post from a question I asked yesterday: Java: Get the most efficient combination of a large List of objects based on a field

The solution below will assign 15$ per star to the r8 Restaurant, which means that when generating the list, it puts that into the list first, and with the remaining 70$ it can only get 2 more stars giving a total of 4 stars. However, if it was smart enough to skip the r8 restaurant ( even though it's the best dollar per star ratio ) the r1 restaurant would actually be a better choice for the budget, as it's 100$ cost and 5 stars.

Can anyone help attempt the problem and beat the current solution?

import itertools

class Restaurant():
  def __init__(self, cost, stars):
    self.cost = cost
    self.stars = stars
    self.ratio = cost / stars

  def display(self):
    print("Cost: $" + str(self.cost))
    print("Stars: " + str(self.stars))
    print()

r1 = Restaurant(100, 5)
r2 = Restaurant(140, 3)
r3 = Restaurant(90, 4)
r4 = Restaurant(140, 3)
r5 = Restaurant(120, 4)
r6 = Restaurant(60, 1)
r7 = Restaurant(40, 1)
r8 = Restaurant(30, 2)
r9 = Restaurant(70, 2)
r10 = Restaurant(250, 5)

print()
print("***************")
print("** Unsorted: **")
print("***************")
print()

restaurants = [r1, r2, r3, r4, r5, r6, r7, r8, r9, r10]

for restaurant in restaurants:
  print(restaurant.ratio, restaurant.stars)

print()
print("***************")
print("**  Sorted:  **")
print("***************")
print()

sorted_restaurants = sorted(restaurants, key = lambda x: x.ratio, reverse = True)

for restaurant in sorted_restaurants:
  print(restaurant.ratio, restaurant.stars)

print()
print("*********************")
print("** Begin Rucksack: **")
print("*********************")
print()

max = 5
budget = 100

spent = 0
quantity = 0

rucksack = []

for i in itertools.count():

  if len(rucksack) >= max or i == len(sorted_restaurants):
    break

  sorted_restaurants[i].display()

  if sorted_restaurants[i].cost + spent <= budget:
    spent = spent + sorted_restaurants[i].cost
    rucksack.append(sorted_restaurants[i])
  
print("Total Cost: $" + str(sum([x.cost for x in rucksack])))
print("Total Stars: " + str(sum([x.stars for x in rucksack])))

print()
print("*****************")
print("** Final List: **")
print("*****************")
print()

for restaurant in rucksack:
  restaurant.display()

Briones answered 28/11, 2019 at 17:50 Comment(4)

Is this knapsack? Forgive me, I skimmed. – Cockaleekie 28/11, 2019 at 20:54

It is the same concept of knapsack -- budget = max rucksack weight in kg, max = number of items the knapsack can hold, stars = some value on the item and cost = item weight in kg – Briones 28/11, 2019 at 23:0

And what's the issue with the code posted? – Moult 29/11, 2019 at 15:36

@cricket_007 based on the order, it assigns 15$ per star to the r8 Restaurant, which means that when generating the list, it puts that into the list first, and with the remaining 70$ it can only get 2 more stars. However, if it was smart enough to skip that ( even though it's the best dollar per star ratio, the r1 restaurant would actually be a better choice for the budget, as it's 100$ cost and 5 stars – Briones 29/11, 2019 at 20:56

Sounds like your problem is pretty much the same as the Knapsack problem: Maximize value given certain weight and volume constraints. Basically value = total stars, weight = price, rucksack limit = total budget. Now there's an additional constraint of total "items" (restaurant visits) but that doesn't change the gist.

As you may or may not know, the knapsack problem is NP hard, which means no algorithm with polynomial time scaling is known.

However, there may be efficient pseudopolynomial algorithms using dynamic programming, and of course there are efficient heuristics, such as the "greedy" heuristic you seem to have discovered. This heuristic involves starting to fill up with the highest "density" items (most stars per buck) first. As you have seen, this heuristic fails to find the true optimum in some cases.

The dynamic programming approach should be pretty good here. It's based on a recursion: Given a budget B and a number of remaining visits V, what is the best set of restaurants to visit out of a total set of restaurants R?

See here: https://en.wikipedia.org/wiki/Knapsack_problem#0/1_knapsack_problem

Basically we define an array m for "max stars", where m[i, b, v] is the maximum amount of stars we can get when we are allowed to visits restaurants up to (and including) restaurant number i, spending at most b, and visiting at most v restaurants (the limit).

Now, we bottom-up fill this array. For example, m[0, b, v] = 0 for all values of b and v because if we can't go to any restaurants, we can't get any stars.

Also, m[i, b, 0] = 0 for all values of i and b because if we used up all our visits, we can't get any more stars.

Next line isn't too hard either:

m[i, b, v] = m[i - 1, b, v] if p[i] > b where p[i] is the price of dining at restaurant i. What does this line say? Well, if restaurant i is more expensive than we have money left (b) then we can't go there. Which means the max amount of stars we can get is the same whether we include restaurants up to i or just up to i - 1.

Next line is a bit tricky:

m[i, b, v] = max(m[i-1, b, v]), m[i-1, b - p[i], v-1] + s[i]) if p[i] <= b

Phew. s[i] is the amount of stars you get from restaurant i btw.

What does this line say? It's the heart of the dynamic programming approach. When considering the max amount of stars we can get when looking at restaurants up to and including i, then in the resulting solution we either go there or we don't, and we "just" have to see which of these two paths leads to more stars:

If we don't go to restaurant i, then we keep the same amount of money and remaining visits. The max amount of stars we can get in this path is the same as if we didn't even look at restaurant i. That's the first part in the max.

But if we do go to restaurant i, then we're left with p[i] less money, one fewer visit, and s[i] more stars. That's the second part in the max.

Now the question is simple: which of the two is larger.

You can create this array and fill it with a relatively simple for loop (take inspiration from the wiki). This just gives you the amount of stars though, not the actual list of restaurants to visit. For that, add some extra bookkeeping to the calculation of w.

I hope that information is enough to set you off in the right direction.

Alternatively, you can write your problem in terms of binary variables and a quadratic objective function and solve it on the D-Wave quantum annelaer :-p Message me if you want to know more about that.

Stocktonontees answered 29/11, 2019 at 22:54 Comment(5)

Regarding polynomial time, the max of 10 restaurants means that the problem can be solved by brute force, iterating through all combinations of up to 10 restaurants, and keeping the best one, in O(n^10) time. Now, I don't want to be running an O(n^10) algorithm with n = 10^6 either, but it is polynomial time. – Whiplash 29/11, 2019 at 23:1

Is the "10 restaurants" a truly fixed number though, or just fixed in the above example, and could be larger for a different example? – Stocktonontees 2/12, 2019 at 16:17

That's a good question, and it's not clear which parameters of the problem out to be generalised over when analysing the running time. Of course, there's no known solution that's polynomial in k, I just mean that's quite a weak conclusion if we're only interested in the problem for small k. – Whiplash 2/12, 2019 at 16:43

The "max" number of restaurants could change. This iteration it may be 10, and next it may be 5. – Briones 3/12, 2019 at 14:54

@AK47 Regardless, the algorithm I sketched above should be pretty neat. The size of the multi-dimensional array is given by your budget, the number of restaurants, and the number of visits, and it takes O(1) to fill in one entry of the array, so the algo runs in time O(RBV). – Stocktonontees 3/12, 2019 at 18:47

Using the same idea as my answer here:

In a collection of n positive numbers that sum up to S, at least one of them will be less than S divided by n (S/n)

you could build the list starting from the potential "cheapest" restaurants.

The steps of the algorithm:

Find the 5 restaurants with cost < 500 / 10, each with different stars and the lowest cost for each star. eg r1, r2, r3, r4, r5
For each of the above values, find another 5 restaurants with cost < (500 - cost(x)) / 9 and different stars. Again select lowest cost for each star
do this until you reach 10 restaurants and you do not exceed your budget.
Rerun the 3 steps above for 1 - 9 restaurants limit.
Keep the solution that produces the most stars

Of course, you cannot reselect a restaurant.

I think the worst case, you will have to calculate 5x5x5... = 5^10 + 5^9 + ... + 5^2 + 5 (= about 12 million) solutions.

In javascript

function Restaurant(name, cost, stars) {
    this.name = name;
    this.cost = cost;
    this.stars = stars;
}

function RestaurantCollection() {
    var restaurants = [];
    var cost = 0;
    this.stars = 0;

    this.addRestaurant = function(restaurant) {
        restaurants.push(restaurant);
        cost += restaurant.cost;
        this.stars += restaurant.stars;
    };

    this.setRestaurants = function(clonedRestaurants, nCost, nStars) {
        restaurants = clonedRestaurants;
        cost = nCost;
        this.stars += nStars;
    };
    this.getAll = function() {
        return restaurants;
    };

    this.getCost = function() {
        return cost;
    };
    this.setCost = function(clonedCost) {
        cost = clonedCost;
    };

    this.findNext5Restaurants = function(restaurants, budget, totalGoal) {
        var existingRestaurants = this.getAll();
        var maxCost = (budget - cost) / (totalGoal - existingRestaurants.length);
        var cheapestRestaurantPerStarRating = [];
        for(var stars = 5; stars > 0; stars--) {
            var found = findCheapestRestaurant(restaurants, stars, maxCost, existingRestaurants);
            if(found) {
                cheapestRestaurantPerStarRating.push(found);
            }
        }
        return cheapestRestaurantPerStarRating;
    };

    this.clone = function() {
        var restaurantCollection = new RestaurantCollection();
        restaurantCollection.setRestaurants([...restaurants], this.getCost(), this.stars);
        return restaurantCollection;
    };
}

function findCheapestRestaurant(restaurants, stars, maxCost, excludeRestaurants) {
     var excludeRestaurantNames = excludeRestaurants.map(restaurant => restaurant.name);
     var found = restaurants.find(restaurant => restaurant.stars == stars && restaurant.cost <= maxCost && !excludeRestaurantNames.includes(restaurant.name));
     return found;
}

function calculateNextCollections(restaurants, collections, budget, totalGoal) {
    var newCollections = [];
    collections.forEach(collection => {
        var nextRestaurants = collection.findNext5Restaurants(restaurants, budget, totalGoal);
        nextRestaurants.forEach(restaurant => {
            var newCollection = collection.clone();
            newCollection.addRestaurant(restaurant);
            if(newCollection.getCost() <= budget) {
                 newCollections.push(newCollection);
            }
        });
    });
    return newCollections;
};

var restaurants = [];
restaurants.push(new Restaurant('r1', 100, 5));
restaurants.push(new Restaurant('r2',140, 3));
restaurants.push(new Restaurant('r3',90, 4));
restaurants.push(new Restaurant('r4',140, 3));
restaurants.push(new Restaurant('r5',120, 4));
restaurants.push(new Restaurant('r6',60, 1));
restaurants.push(new Restaurant('r7',40, 1));
restaurants.push(new Restaurant('r8',30, 2));
restaurants.push(new Restaurant('r9',70, 2));
restaurants.push(new Restaurant('r10',250, 5));

restaurants.sort((a, b) => a.cost - b.cost);
var max = 5;
var budget = 100;

var total = max;
var totalCollections = [];

for(var totalGoal = total; totalGoal > 0; totalGoal--) {
    var collections = [new RestaurantCollection()];

    for(var i = totalGoal; i > 0; i--) {
        collections = calculateNextCollections(restaurants, collections, budget, totalGoal);
    }
    totalCollections = totalCollections.concat(collections);
}

var totalCollections = totalCollections.map(collection => { 
      return {
          name: collection.getAll().map(restaurant => restaurant.name),
          stars: collection.stars,
          cost: collection.getCost()
      }
});

console.log("Solutions found:\n");
console.log(totalCollections);

totalCollections.sort((a, b) => b.stars - a.stars);
console.log("Best solution:\n");
console.log(totalCollections[0]);

Conlan answered 3/12, 2019 at 21:35 Comment(2)

Hey @Jannes Botis, It's taking 27 seconds for 100000 Restaurants: repl.it/repls/StripedMoralOptimization Do you think it's possible to optimize it to work with 1million records? – Briones 13/12, 2019 at 0:41

The bottleneck is .filter() function inside findCheapestRestaurant(), you could sort() the restaurants on cost after they are created and use .find() instead of filter() since only the 1st found will be the cheapest. I made the change in the link. But I think the best solution would be to use a database(e.g. mysql) for restaurants with an index on cost, so that you can replace .filter() with a conditional select. – Conlan 13/12, 2019 at 15:12

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Example question:

Recommended topics

Hot tags