For any Linux Distro's packaging system, find the maximum number of simultaneously installable packages
Asked Answered
E

1

13

Some packages conflict, so it's impossible to install all available packages at once. What's the maximum possible number of installable packages for a given system? A brute-force trial and error method would be:

  1. make a list of every possible of package name, i.e. dglob -a > list
  2. from that, make sublists slist1 slist2 slist3 ... of every possible package combination. On my system dglob -a | wc -l returns 91327, which would require an unfeasibly large number (1.467×10^27492) of files.
  3. run apt-get install on each list, and rm those that produce conflicts.
  4. sort remaining lists by number of lines, and show the longest one. wc -l slist* | head -n -1 | sort -g | tail -1.

Easy, but too heavy on resources, so maybe there's some more feasible method.

Various related questions follow from that, such as:

  • given a package 'foo', how to find the maximum number of installable packages that don't conflict with 'foo'?

  • For all possible packages, which has the smallest such maximum, (making it the most "quarrelsome" package)?

(Note: The question applies to Debian, Red Hat, and any distro or OS with a packaging system that maps out package conflicts. Answers for any applicable platform would be valid.)


Background:

Debian has tens of thousands of packages. dglob (from the debian-goodies package) is handy for getting a quick count:

# show how many packages installed and available on the current system
echo $(dglob    | wc -l) packages installed of \
     $(dglob -a | wc -l) packages.

Example output, (both numbers may periodically fluctuate after updates and upgrades, and will vary between systems):

5107 packages installed of 91327 packages.

The number 5107's not a maximum of course, but there must be a maximum.

Emptyhanded answered 1/4, 2016 at 13:51 Comment(17)
Questions about general software are off-topic, since every package isn't focused on tools for software development. Your question might be a better fit for the Unix and Linux stackexchange site. However, how many mp3 players do you really need to install?Cinchonism
Oddly enough, for this question the contents of the packages are wholly irrelevant. Since it makes no difference, then if you like, let us suppose a Debian distro exists consisting exclusively of that subset of Debian packages specifically intended for software development. The question of how to feasibly find a maximum number of installable packages would be unchanged. (Also, see the first line of the OP, which begins with "Note:...", where your question about mp3 players was anticipated, and in part answered.)Emptyhanded
The question of how to feasibly find a maximum number of installable packages would be unchanged And still entirely general use.Cinchonism
Think of it as more of a graph theory and logical problem, which maps to a given package dataset, and whether or not the problem can be solved with the programming tools at hand. It's not a strictly Debian puzzle, it could be asked of Red Hat .rpm files, or any distro's packaging system, or on any OS with a conflicting package tree, and any platform whatsoever. Solve one in the abstract, solve them all. Perhaps this question needs clarifying with regards to that...Emptyhanded
as you rightly claim that this is a general problem of logic and graph-theory, you might want to ask the question on Mathematics (or similar)Fiat
What about packages that don't conflict, but are superceeded by others? Kernels for example... you only need one, but there are lots available in apt...Goffer
@ivanivan, For the sake of the question assume that a superseded status is not relevant, and consider only whether packages conflict or do not conflict. (In other words, yes, install a thousand kernels if possible.)Emptyhanded
Considering only the conflicts, you have an instance of the NP-complete minimum vertex cover problem. You want to choose a minimum set of packages to not install, which includes at least one member of each conflicting pair. The problem would be much more interesting if you also consider Depends relationships: A conflicts with B, but 10 packages depend on A and 7 depend on B, but maybe eliminating those 10 packages solves some other conflicts too, while eliminating the 7 doesn't gain you much elsewhere...Indene
@WumpusQ.Wumbley, Starting with a package list and only their respective individual Depends, we can construct a complete Conflicts list. (OTOH starting with a package list and only a their respective individual Conflicts, inferring their Depends could be quite ambiguous.) For this question, it seems like the Depends and Conflicts relationships are kind of the same thing.Emptyhanded
That makes sense... construct the complete downstream dependency list for each package in a conflicting pair, and take the Cartesian product of those lists to form a bigger conflict list. Then you have a regular old minimum vertex cover problem. But the graph becomes much bigger so this may not be the most efficient method.Indene
@MartijnPieters, Sixteen months ago Elliott Frisch already attempted to close this Q. for the same reason, which has already been addressed in the comments. (i.e. dependencies and package management are even more relevant to programmers than users.) If you are aware of additional reasons for closing, please specify.Emptyhanded
@agc: and they were quite right, this question is simply not on topic for this site. There is no programming problem here, nor would a software developer need to be able to know this information in order to develop software, a package management system is not a tool in that sense. If you want to discuss what is and is not a tool in that respect, please post a question on Meta Stack Overflow.Jacquelynnjacquenetta
@MartijnPieters, Thanks for the prompt reply. While it's obvious you feel strongly about the relevance of this question to SE, your comment is unclear. It seems difficult to imagine generating a minimum conflicting packages number for Debian without programming. Re: "There is no programming problem here, nor does would a software developer need to be able to know this information in order to develop software, it is not a tool in that sense.": very sorry, the sense of that sentence is unclear -- the pronoun "it" refers to which noun?Emptyhanded
I've corrected my comment to make it clearer. If you have a specific piece of code to discuss, then discuss that. Otherwise, if this is really a 'how do I code this up' question, then the too broad close reason applies too.Jacquelynnjacquenetta
@MartijnPieters, Thanks. Re "specific piece of code": the Q. does outline a reasonably specific piece of brute-force code, "(i.e. if it's a Debian system make lists of every possible combination of package names, and run apt-get install on each list, then sort for the longest conflict free list) -- if you insist, I could write actual code that does that, but it should be obvious that it'll virtually never finish, so the description alone seems sufficient.Emptyhanded
You don't need to apt-get install anything, just analyze the dependency graph as outlined in previous comments. Stack Overflow specifically isn't for "what if..." questions; apart from idle curiosity, is there a reason a developer would need to solve this problem? If graph analysis to find the largest subgraphs without conflicts is a problem you want to solve, you can remove all the package management stuff from the question (or, more realistically, ask a better-scoped new question).Immigrate
@tripleee, I'm not sure how to analyze the dependency graph here, hence the question. Re "...developer would need...": see a prior comment on Meta which seems like a better venue for that discussion.Emptyhanded
H
1

The brute force option is the only option in this case. Here is a paper that will describe why in depth but the issue is that package installation and dependency/conflict resolution is a NP-Complete problem.

A problem is in NP if every TRUE answer has an easily-checked polynomial-size explanation. In this case this can be done by listing the installed packages and available packages.

Debian package installation is in NP-hard if an efficient solution for the problem can be adapted to efficient solutions to every other problem in NP. I will defer to the above listed paper as it is a bit complex to prove here but it can be encoded as 3-SAT.

As Debian package installation is in NP and NP-hard, thus it is NP-complete.

Here are some ways the default solver in APT tries to avoid the NP-completeness:

  • Use of heuristics
  • A preference for the first element in an or-group
  • A strict package version convention
  • Giving up when it encounters a major conflict.

Basically the constraints have to be specifically designed to get the problem to fall into known tractable classes for a NP-Complete problem like HORN-SAT

Unfortunately find the maximum possible number of installable packages for a given system pretty much excludes all of the known tractable classes that I am aware of.

So brute force is the only option and an expensive one until a suitable tractable class is discovered or if someone proves that P=NP.

Hamid answered 4/5, 2021 at 18:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.