Why is this CLP(FD) constraint solving slowly and how do I debug it?
Asked Answered
P

2

5

I'm learning Prolog by doing the Advent of Code challenges.

Spoilers for Advent of Code 2021 day 7 below:

The objective is: given a list of natural numbers n_1,..., n_k, find

min_(x\in \N) \sum_i=0^k |x - n_i|

In the below code norm1 computes the summand, cost computes the sum at a given x, and lowest_cost computes the minimum over all xs.

norm1(X, Y, N) :- N #= abs(X - Y).

cost(Nums, X, Cost) :-
    max_list(Nums, MaxX),
    min_list(Nums, MinX),
    X in MinX..MaxX,
    maplist(norm1(X), Nums, Costs),
    sum(Costs, #=, Cost).


lowest_cost(Nums, Cost) :-
    cost(Nums, X, Cost)
    once(labeling([min(Cost)], [X, Cost])).

Some sample queries:

?- lowest_cost([10,11,12], Cost).
Cost = 2.

?- cost([2,4,6,8], 4, Cost).
Cost = 8.

?- cost([2,4,6,8], 5, Cost).
Cost = 8.

?- cost([2,4,6,8], 8, Cost).
Cost = 12.

?- lowest_cost([2,4,6,8], Cost).
Cost = 8.

With short lists this solves it correctly and quickly, but when I try it on the full length 1,000 input it spins indefinitely.

The way I imagine it would solve this is by just trying all of the X in the domain, computing the cost for each, and taking the smallest. This doesn't seem to be the case though; the time seems to scale superlinearly with the length of the list and not at all with the range of the numbers in the list.

  1. How can I direct CLP(FD) to search in the manner I described, or another efficient way?
  2. How can I better understand what the solver is trying to do? trace is pretty messy and I'm not sure how to get any insight from it.
Philharmonic answered 7/12, 2021 at 21:39 Comment(5)
Which precise queries did you use?Polonaise
@Polonaise updated with the queryPhilharmonic
please show us several short queries as is and their results (for 3 to 5 elements in the input list so it is easier to understand). right now the question is unclear. on the one hand you ask "the number which minimizes the summed absolute differences of the number from the list elements". on the other, your code talks about "cost" and "positions"?? in particular, the number that you want, is it a member of the list, or just any number? (which is what your wording seems to imply) also, what is norm1/3?Ladyship
@WillNess thanks for the input. I've updated to hopefully make it clearer. I actually just want the value of the minimum "cost", the position (x) is not unique (it turns out that we can always chose it to be one of the input numbers, but that doesn't generalize to other cost functions).Philharmonic
@WillNess see adventofcode.com/2021/day/7 in the story, the numbers represent "positions" of military tanks along a 1-dimensional line. Moving a tank "costs" 1-fuel-unit-per-meter. The story asks: at which point on the line should all the tanks meet, to minimise total fuel cost of moving them all to one point? It's ((distance to move every tank from its start position to {point}) for every possible {point}). Site gives example [16,1,2,0,4,2,7,1,2,14] meet at position 2 for total cost 37.Frore
A
3

The way I imagine it would solve this is by just trying all of the X in the domain, computing the cost for each, and taking the smallest. This doesn't seem to be the case though; the time seems to scale superlinearly with the length of the list and not at all with the range of the numbers in the list.

What experiments did you do to come to this conclusion?

?- time(lowest_cost([0, 1000], Cost)).
% 596,377 inferences, 0.088 CPU in 0.088 seconds (100% CPU, 6799507 Lips)
Cost = 1000.

?- time(lowest_cost([0, 10000], Cost)).
% 5,933,218 inferences, 4.041 CPU in 4.041 seconds (100% CPU, 1468244 Lips)
Cost = 10000.

?- time(lowest_cost([0, 100000], Cost)).
^CAction (h for help) ? abort
% 31,000,962 inferences, 99.612 CPU in 100.751 seconds (99% CPU, 311218 Lips)
% Execution Aborted

How can I direct CLP(FD) to search in the manner I described, or another efficient way?

Don't withhold information that you know must hold. As you note in a comment, "we can always chose it to be one of the input numbers", so model that:

numbers_domain([N], N).
numbers_domain([N | Ns], N \/ Domain) :-
    numbers_domain(Ns, Domain).

cost_at_position(Nums, Pos, Cost) :-
    numbers_domain(Nums, Domain),
    Pos in Domain,
    maplist(norm1(Pos), Nums, Costs),
    sum(Costs, #=, Cost).

With this you get:

?- time(lowest_cost([0, 1000], Cost)).
% 5,252 inferences, 0.002 CPU in 0.002 seconds (100% CPU, 2439400 Lips)
Cost = 1000 .

?- time(lowest_cost([0, 10000], Cost)).
% 5,252 inferences, 0.002 CPU in 0.002 seconds (100% CPU, 2534299 Lips)
Cost = 10000 .

?- time(lowest_cost([0, 100000], Cost)).
% 5,252 inferences, 0.002 CPU in 0.002 seconds (100% CPU, 2952068 Lips)
Cost = 100000 .

How can I better understand what the solver is trying to do?

A technique I learned from this great answer is to add freeze/2 goals on your constraint variables. You will then get notified every time the solver tries to bind a variable to a concrete value.

For example:

cost_at_position(Nums, Pos, Cost) :-
    ...
    freeze(Pos, log('Pos', Pos)),
    freeze(Cost, log('Cost', Cost)).
    
log(Label, Value) :-
    write(Label), write(': '), writeln(Value).

?- lowest_cost([2,4,6,8], Cost).
Cost: 12
Pos: 2
Cost: 10
Pos: 3
Cost: 8
Pos: 4
Cost: 8
Pos: 4
Cost = 8.

I'm not sure it helps a lot in this particular case, though at least you can see that the solver tries position 3, which doesn't occur in the input positions, and which therefore (for this cost function) is useless.

Anceline answered 8/12, 2021 at 21:29 Comment(9)
I don't know if the OP knows it, but for the question Part 2 "we can always chose it to be one of the input numbers" no longer holds, and the solver does need to try points in between given numbers. Trying OP's code with bagof(X, between(0, 10, X), Input), time(lowest_cost(Input, Result)). 50 numbers takes 2 seconds and 11M inferences, if linear, doubling the quantity of numbers would roughly double the runtime, but 100 numbers takes 16 seconds and 81M inferences. 120 numbers 30 seconds and 145M inferences.Frore
(My unrelated brute-force non-clpfd works through all 1000 numbers in 10 seconds and 66M inferences, I'm here wondering if clpfd could dramatically cut that down, but I don't know enough to use it like this)Frore
Yes, I intentionally didn't use that fact because of part2. That's what I was referring to in my comment "it turns out that we can always chose it to be one of the input numbers, but that doesn't generalize to other cost functions". Maybe I should have been more specific.Philharmonic
@Frore the numbers you give show a cubic rate of growth. (also, thanks for the link in the other comment)Ladyship
@Frore re non-clpfd, I came up with what I think is a linear algorithm (in my head, i.e. not implemented). what time/infs does your non-clpfd code take for 500 numbers?Ladyship
@WillNess the optimal solution is in fact linear (both for part 1 and 2). Mine works in well under a second for the full solution.Philharmonic
@TjadenHess is it CLP(FD) based?Ladyship
No, just a simple manual solution.Philharmonic
@IsabelleNewbie thanks for the link to the other answer, it's very interestingPhilharmonic
C
3

There are a few things that could be better here. First, Cost is entirely determined by X so there is no reason to include Cost in the variables to be labeled. Second, avoid using once because it makes programs impure and hard to reason about. You can always stop searching for solutions interactively. Third, using the labeling option enum seems to be closer to how you wish to search.

So your labeling call should be labeling([enum, min(Cost)], X). Using the code below with input taken from Advent of Code Day 7:

:- use_module(library(clpfd)).

input([1101,1,29,67,1102,0,1,65,1008,65,35,66,1005,66,28,1,67,65,20,4,0,1001,65,1,65,1106,0,8,99,35,67,101,99,105,32,110,39,101,115,116,32,112,97,115,32,117,110,101,32,105,110,116,99,111,100,101,32,112,114,111,103,114,97,109,10,346,92,161,1,634,8,35,96,1341,1149,237,33,593,459,801,98,1160,322,67,98,1219,475,12,274,24,1111,1134,14,195,234,654,202,1057,598,15,471,1583,70,4,244,96,407,51,1158,275,962,1034,694,696,271,591,389,1067,317,99,1321,177,18,257,1569,238,492,1006,857,33,31,984,296,146,1910,214,367,600,62,1365,478,31,238,384,1013,732,445,214,645,123,1069,391,80,1052,70,886,18,1472,547,94,1483,729,1220,1246,694,615,775,581,1056,405,428,138,1227,23,0,273,466,963,1250,324,1628,1122,498,588,0,236,499,390,92,64,1190,387,47,501,62,269,470,720,567,694,666,280,0,57,203,377,1061,781,857,698,50,291,370,1494,8,1124,665,113,381,457,901,151,932,95,555,72,156,192,170,606,1033,39,542,19,453,1286,797,1055,190,1,1075,1007,932,1503,224,209,138,559,532,342,115,772,728,470,479,122,76,174,810,270,1284,210,1182,176,683,1548,73,605,252,879,180,482,544,479,755,282,1617,486,183,551,369,916,32,234,516,1,212,6,1094,224,1316,694,195,1256,371,413,287,916,250,1143,126,574,1523,14,659,152,90,76,333,15,122,222,165,1184,476,682,75,298,1630,285,777,1167,381,606,161,287,136,329,641,560,516,1491,142,39,203,1147,459,505,586,186,234,99,591,213,323,355,653,1030,586,29,136,1021,773,1241,1099,564,65,226,8,337,179,117,1599,29,871,503,189,1033,193,278,786,1270,517,427,93,43,35,66,77,128,9,3,1277,1564,1005,219,1205,1517,739,60,25,401,408,441,143,108,506,1638,588,406,828,11,147,1167,1434,458,678,244,214,42,67,129,121,280,563,445,216,712,158,914,981,454,362,775,582,409,1659,1236,9,408,519,885,163,918,1001,1044,109,451,805,114,1375,251,331,147,1580,14,368,3,723,21,1771,20,188,228,247,124,726,615,543,297,347,765,816,668,649,1061,1732,328,1197,690,497,367,1219,957,1206,188,133,196,222,47,479,1901,243,859,1331,976,541,556,584,1337,156,1675,349,1321,817,764,303,359,42,992,367,271,138,163,329,444,591,15,1137,1418,1051,24,254,1867,149,169,295,230,1243,1372,263,43,973,485,676,463,563,380,402,446,518,698,1318,49,172,479,215,377,1110,1774,1154,707,158,879,259,473,362,245,1466,1074,527,636,307,1522,1371,1228,556,522,423,161,39,1602,1135,437,89,273,320,1031,838,133,123,189,816,539,239,568,878,917,82,905,291,825,268,1391,326,26,793,55,1356,629,84,241,261,1220,295,23,642,403,809,168,28,86,434,1178,12,1145,106,1091,942,168,1761,788,666,376,1353,44,12,209,658,23,205,964,964,566,367,336,62,462,565,151,505,1264,23,40,251,140,104,20,328,222,734,296,14,89,199,715,382,200,246,34,3,549,173,650,1219,52,626,23,65,802,334,286,1039,254,408,528,608,1554,516,83,429,1454,384,709,414,384,397,27,706,586,125,91,81,278,178,111,263,102,190,1235,287,1113,34,50,258,126,191,268,1262,745,1205,217,16,45,829,263,52,229,822,639,955,729,1251,98,112,94,550,247,269,1001,22,1282,420,700,41,445,493,19,44,75,1518,36,943,68,1697,172,558,1232,1229,122,234,755,499,845,3,1448,100,662,654,983,92,126,89,391,1672,1546,324,149,136,412,649,288,633,1226,10,1725,717,88,50,890,820,1114,1519,15,162,1769,963,610,241,350,502,73,249,263,143,324,180,615,426,139,94,5,954,117,657,418,170,635,5,276,8,723,344,32,822,3,323,11,22,471,811,51,52,49,1,575,20,1,287,664,277,252,551,366,836,181,559,35,27,28,28,856,1057,349,447,602,447,258,1874,1493,452,138,846,1530,40,482,60,101,840,120,23,115,281,389,44,1170,37,558,467,357,1090,18,120,526,284,930,885,152,169,674,14,97,639,1935,61,320,1275,1009,13,672,351,194,872,30,214,158,658,302,70,1404,137,3,818,162,910,199,987,392,310,922,962,751,1772,260,108,686,932,204,312,130,794,6,82,49,24,167,188,905,772,422,735,54,931,1040,723,16,640,858,428,81,119,85,45,1550,138,142,508,899,626,9,401,957,158,24,132,548,139,376,115,1661,203,485,1334,860,213,93,128,8,799,611,1470,2,800,353,75,166,26,120,14,869,222,21,1146,78,1500,321,0,738,309,1337,323,460,301,1025,205,717,436,125,166,1282,265,482,373,1247,173,228,729,78,125,366]).

norm1(X, Y, N) :- N #= abs(X - Y).

cost_at_position(Nums, Pos, Cost) :-
    max_list(Nums, MaxPos),
    min_list(Nums, MinPos),
    Pos in MinPos..MaxPos,
    maplist(norm1(Pos), Nums, Costs),
    sum(Costs, #=, Cost).

lowest_cost(Nums, Cost) :-
    cost_at_position(Nums, Pos, Cost),
    labeling([enum, min(Cost)], [Pos]).

The query:

?- time((input(N), lowest_cost(N, C))).
% 6,698,130,873 inferences, 335.580 CPU in 335.663 seconds (100% CPU, 19959861 Lips)
N = [1101, 1, 29, 67, 1102, 0, 1, 65, 1008|...],
C = [Redacted for spoiler] 

Five and a half minutes on a midrange laptop is not great, but it's not unbearable.

The performance was confusing to me, so I tried removing the min(Cost) optimization to see how quickly solutions were generated. To my surprise, they are generated almost instantly, so fast you could exhaust the domain of X manually in this case by typing ; a few thousand times.

I've been looking for why this is a while, and this sort of optimization does not appear to work as intended in SWI-Prolog. If I had to guess, the min optimization causes lots temporary variables to stay around and take up time in constraint propagation when they are no longer relevant.

Crustal answered 9/12, 2021 at 2:29 Comment(2)
If you add a Pos argument to lowest_cost when you remove the min(Cost), you can watch how solutions are generated: Pos is counted up 0, 1, 2, ... . It shouldn't be surprising that these solutions are quick to compute. But they have horrible costs, and the search is pure brute force.Anceline
Thanks, enum is helpful - it lets me achieve the naieve brute-force search that I figured CLP(FD) should be able to do in the worst case. I would still love to see an answer that gives a result in time linear in the number of inputs (not the range of inputs). I know how to do this manually, but it seems like CLP(FD) should have some interesting tricks up it's sleeve.Philharmonic

© 2022 - 2024 — McMap. All rights reserved.