Algorithm to find best dimensions combination
Asked Answered
A

2

7

I am looking for an algorithm to find the best dimension combination to accomplish a desired result.

Take the following as example:

|   A    |    B   |   C   |  y  |
|--------|--------|-------|-----|
| dog    | house1 | green | 30  |
| dog    | house1 | blue  | 15  |
| cat    | house1 | green | 20  |
| cat    | house2 | red   |  5  |
| turtle | house3 | green | 50  |

A, B, C are the measured dimensions. y is the measured result.

If I want to get all combinations of dimensions that accomplish y >= 50 so the results will be:

turtle, house3, green
turtle, any,    green
turtle, house3, any
turtle, any,    any
any,    house3, green
any,    house3, any
any,    any,    green
any,    house1, green
any,    house1, any

Maybe it's a easy problem but I was trying to figure an optimal solution in terms of O(n) and I didn't found it.

Apiculture answered 19/9, 2015 at 10:36 Comment(5)
Almost certainly related to Linear Programming. The solutions will be parts of (maybe "slices through"?) the simplex. Looking forward to see approaches for this. BTW: linear referring to the number of rows of the table? This may be difficult. My gut feeling is that it will be at least O(n*m), for 'n' rows and 'm' columns, and is likely to be even more expensive...Leboeuf
Can you explain the outputs? In what sense is any, house1, any a solution? Do you add the corresponding y values, getting 30 + 15 + 20 = 65 in that case? (Perhaps more background would be useful: what sort of quantity does y represent, and why does it make sense to be summing elements of the y column?)Posthorse
@MarkDickinson you are right, sum(y) when A=any, B=house1, C=anyApiculture
do you know the values of turtle, house1,2... - is this a sum of those values? Or all you've got is this example table it is your task to guess other combinations using this table only? If you know the basic values this seems like solvable by en.wikipedia.org/wiki/Simplex_algorithmSensor
@Sensor I don't know all possible values of each column but I can get them using some operations like distinct(A)Apiculture
A
0

Back here, 8 years later to answer the question using ClickHouse:

WITH table AS (
    SELECT 'dog' AS a, 'house1' AS b, 'green' AS c, 30 AS y
    UNION ALL
    SELECT 'dog' AS a, 'house1' AS b, 'blue' AS c, 15 AS y
    UNION ALL
    SELECT 'cat' AS a, 'house1' AS b, 'green' AS c, 20 AS y
    UNION ALL
    SELECT 'cat' AS a, 'house2' AS b, 'red' AS c,  5 AS y
    UNION ALL
    SELECT 'turtle' AS a, 'house3' AS b, 'green' AS c, 50 AS y
)
SELECT a, b, c, sum(y) y FROM table GROUP BY CUBE(a, b, c)
HAVING y >= 50
FORMAT PrettyCompactMonoBlock;

┌─a──────┬─b──────┬─c─────┬───y─┐
│ turtle │ house3 │ green │  50 │
│ turtle │ house3 │       │  50 │
│ turtle │        │ green │  50 │
│ turtle │        │       │  50 │
│        │ house3 │ green │  50 │
│        │ house1 │ green │  50 │
│        │ house3 │       │  50 │
│        │ house1 │       │  65 │
│        │        │ green │ 100 │
│        │        │       │ 120 │
└────────┴────────┴───────┴─────┘
Apiculture answered 1/2, 2023 at 8:49 Comment(0)
C
4

Start with a work queue containing (any, any, ..., any), 0. The elements of this queue will be pairs consisting of a combination and a number of elements on the left that cannot be changed from any (this will make more sense shortly). Until the work queue is empty, remove one element from it and compute the corresponding sum. If it doesn't meet the threshold, then discard it. Otherwise, report it as one of the sought combinations. For each any that can be changed, for each value in that column, enqueue a combination consisting of the current one with any replaced by that value, with the index locking down all previous any values.

Considering an output-sensitive bound, this is within a polynomial factor of optimal (in general, there can be exponentially many combinations).

In Python 3:

def overthreshold(data, threshold):
    queue = [(('any',) * len(data[0][0]), 0)]
    for combination, begin in queue:
        if sum(row[1] for row in data
               if all(x in {'any', y}
                      for x, y in zip(combination, row[0]))) < threshold:
            continue
        yield combination
        for i in range(begin, len(combination)):
            if combination[i] == 'any':
                queue.extend((combination[:i] + (x,) + combination[i+1:], i + 1)
                             for x in {row[0][i] for row in data})


def demo():
    data = [
        (('dog',    'house1', 'green'), 30),
        (('dog',    'house1', 'blue'),  15),
        (('cat',    'house1', 'green'), 20),
        (('cat',    'house2', 'red'),    5),
        (('turtle', 'house3', 'green'), 50),
    ]
    for combination in overthreshold(data, 50):
        print(combination)
Collective answered 22/9, 2015 at 12:18 Comment(0)
A
0

Back here, 8 years later to answer the question using ClickHouse:

WITH table AS (
    SELECT 'dog' AS a, 'house1' AS b, 'green' AS c, 30 AS y
    UNION ALL
    SELECT 'dog' AS a, 'house1' AS b, 'blue' AS c, 15 AS y
    UNION ALL
    SELECT 'cat' AS a, 'house1' AS b, 'green' AS c, 20 AS y
    UNION ALL
    SELECT 'cat' AS a, 'house2' AS b, 'red' AS c,  5 AS y
    UNION ALL
    SELECT 'turtle' AS a, 'house3' AS b, 'green' AS c, 50 AS y
)
SELECT a, b, c, sum(y) y FROM table GROUP BY CUBE(a, b, c)
HAVING y >= 50
FORMAT PrettyCompactMonoBlock;

┌─a──────┬─b──────┬─c─────┬───y─┐
│ turtle │ house3 │ green │  50 │
│ turtle │ house3 │       │  50 │
│ turtle │        │ green │  50 │
│ turtle │        │       │  50 │
│        │ house3 │ green │  50 │
│        │ house1 │ green │  50 │
│        │ house3 │       │  50 │
│        │ house1 │       │  65 │
│        │        │ green │ 100 │
│        │        │       │ 120 │
└────────┴────────┴───────┴─────┘
Apiculture answered 1/2, 2023 at 8:49 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.