Update a specific R package and its dependencies

I have around 4000 R packages installed on my system (a server), and most of them are outdated because they were built before R 3.0.0. I know that

update.packages(checkBuilt=TRUE, ask=FALSE)

would update all my packages, but that's too slow. The users do not use most of the packages; now and then they ask me to update one package (say fields) that they want to use. If I run

install.packages("fields")

it only updates fields and not maps, even though fields depends on maps. So when I try to load fields:

library("fields")

I get an error message

Error: package ‘maps’ was built before R 3.0.0: please re-install it

Is there a way to upgrade fields so that it would also automatically update the packages fields depends on?

Toole answered 9/1, 2014 at 2:40 Comment(6)
Instead of attempting to re-engineer or re-write R's package system, you really truly would be better off to bite the bullet and run update.packages(checkBuilt=TRUE, ask=FALSE). - Elbow
I would start with ap <- available.packages(); pkgs <- tools::package_dependencies("fields",db=ap,recursive=TRUE). Then you need to filter out built-in and recommended packages, and install the rest. (This doesn't deal with the order in which the dependency graph goes, but it might work for your case.) - Auntie
Please don't undo the edits I made to use the correct markdown for code! You are using the blockquote markdown > and you should be using the code/pre markdown which is to indent by 4 spaces. - Callow
Aiya! Did I undo your edits? I just wanted to put double quotes around fields in install.packages(fields). - Toole
@Toole Ah, yes, you did. Never mind. I'll edit in the quotes now. Hope the Answer was useful? May need a bit of work to make it bomb proof, but is a start. Also, be careful with the which argument. If I do which = "most" with fields you'll need to install almost 400 packages! For some more popular packages you could end up installing big chunks of CRAN, in which case you might just update all from CRAN at the weekend. - Callow
@DirkEddelbuettel: Why is it better to use the existing update.packages() function, with all its limitations? Why do you consider writing new package installation functions to be an attempt to "re-engineer or re-write R's package system"? Isn't this rather an attempt to improve R's package system? And haven't the functions we use, like package_dependencies() and installed.packages(), been made available for just this purpose? - Pseudaxis

As Ben indicated in his comment, you need to get the dependencies for fields, then filter out the packages with Priority "base" or "recommended", and then pass that list of packages to install.packages() to deal with the installation. Something like:

instPkgPlusDeps <- function(pkg, install = FALSE,
                            which = c("Depends", "Imports", "LinkingTo"),
                            inc.pkg = TRUE) {
  stopifnot(require("tools")) ## load tools
  ap <- available.packages() ## takes a minute on first use
  ## get dependencies for pkg recursively through all dependencies
  deps <- package_dependencies(pkg, db = ap, which = which, recursive = TRUE)
  ## the next line can generate warnings; I think these are harmless
  ## returns the Priority field. `NA` indicates not Base or Recommended
  pri <- sapply(deps[[1]], packageDescription, fields = "Priority")
  ## filter out Base & Recommended pkgs - we want the `NA` entries
  deps <- deps[[1]][is.na(pri)]
  ## install pkg too?
  if (inc.pkg) {
    deps = c(pkg, deps)
  }
  ## are we installing?
  if (install) {
    install.packages(deps)
  }
  deps ## return dependencies
}

This gives:

R> instPkgPlusDeps("fields")
Loading required package: tools
[1] "fields" "spam"   "maps"

which matches with

R> packageDescription("fields", fields = "Depends")
[1] "R (>= 2.13), methods, spam, maps"

You get warnings from the sapply() line if a dependency in deps is not actually installed. I think these are harmless as the returned value in that case is NA and we use that to indicate packages we want to install. I doubt it will affect you if you have 4000 packages installed.
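
If the warnings bother you, one alternative is to read the Priority column from installed.packages() rather than calling packageDescription() on each dependency. This is only a sketch and not part of the function above; priority_of is a made-up helper name and "notARealPackage123" is a placeholder for a package that is not installed.

```r
# Sketch: look up Priority for a vector of package names without warnings.
# priority_of() is a hypothetical helper, not a function from any package.
priority_of <- function(pkgs, ip = installed.packages()) {
  # match() returns NA for packages that are not installed,
  # and indexing a matrix row with NA yields NA
  pri <- ip[match(pkgs, ip[, "Package"]), "Priority"]
  names(pri) <- pkgs
  pri
}

pri <- priority_of(c("stats", "notARealPackage123"))
# Packages with NA priority are the ones we would want to (re)install
to_install <- names(pri)[is.na(pri)]
```

As in the function above, NA covers both "installed but neither base nor recommended" and "not installed at all".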

The default is not to install packages but just return the list of dependencies. I figured this was safest as you may not realise the chain of dependencies implied and end up installing hundreds of packages accidentally. Pass in install = TRUE if you are happy to install the packages indicated.
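
Since the original problem is packages built under an old R, you could also trim the returned list down to the dependencies that actually need rebuilding before installing anything. A rough sketch, assuming the "Built" column of installed.packages() holds the R version each package was built under; needs_rebuild is a hypothetical helper name and the "3.0.0" cutoff comes from the question.

```r
# Sketch: keep only packages that are missing or were built under an older R.
# needs_rebuild() is a hypothetical helper, not part of the answer's function.
needs_rebuild <- function(pkgs, r_version = "3.0.0", ip = installed.packages()) {
  built <- unname(ip[match(pkgs, ip[, "Package"]), "Built"])
  not_installed <- is.na(built)
  stale <- not_installed
  if (any(!not_installed)) {
    # compare as versions, not as strings
    stale[!not_installed] <-
      package_version(built[!not_installed]) < package_version(r_version)
  }
  pkgs[stale]
}

# "stats" is built with the running R, so only the placeholder name is returned
todo <- needs_rebuild(c("stats", "notARealPackage123"))
```

You would then pass the result of something like needs_rebuild(instPkgPlusDeps("fields")) to install.packages() yourself.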

Note that I restrict the types of dependencies searched for. Things balloon if you use which = "most": fields has over 300 such dependencies once you recursively resolve them (this includes the Suggests: fields too). which = "all" will look for everything, including Enhances:, which will be a bigger list of packages again. See ?tools::package_dependencies for valid inputs for the which argument.
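
To see the effect of which without downloading the CRAN index, here is a small sketch on a toy database. The packages pkgA to pkgE are invented, and the matrix only mimics the dependency columns that available.packages() returns.

```r
library(tools)

# Toy package database (hypothetical packages):
# pkgA Depends on pkgB and Suggests pkgC; pkgB Imports pkgD; pkgC Imports pkgE
toy_db <- rbind(
  c(Package = "pkgA", Depends = "pkgB", Imports = NA, LinkingTo = NA,
    Suggests = "pkgC", Enhances = NA),
  c(Package = "pkgB", Depends = NA, Imports = "pkgD", LinkingTo = NA,
    Suggests = NA, Enhances = NA),
  c(Package = "pkgC", Depends = NA, Imports = "pkgE", LinkingTo = NA,
    Suggests = NA, Enhances = NA),
  c(Package = "pkgD", Depends = NA, Imports = NA, LinkingTo = NA,
    Suggests = NA, Enhances = NA),
  c(Package = "pkgE", Depends = NA, Imports = NA, LinkingTo = NA,
    Suggests = NA, Enhances = NA)
)

# Default which: Depends, Imports, LinkingTo only
strong <- package_dependencies("pkgA", db = toy_db, recursive = TRUE)[[1]]

# which = "most" also follows Suggests, so pkgC and its import pkgE come along
most <- package_dependencies("pkgA", db = toy_db, which = "most",
                             recursive = TRUE)[[1]]

sort(strong)
sort(most)
```

Even in this tiny graph a single Suggests entry doubles the list, which is the same effect that makes the real fields list explode.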

Callow answered 9/1, 2014 at 5:27 Comment(2)
This worked! Thanks Gavin. BTW I edited the command install.packages(deps) to install.packages(c(pkg,deps)). - Toole
I guess, given the name I gave the function, then that change should be made. I'll make it as two other users rejected it. I'll see what I can do about that too, they shouldn't have. - Callow

My answer builds on Gavin's answer... Note that the original poster, user3175783, asked for a more intelligent version of update.packages(). That function skips installing packages that are already up-to-date. But Gavin's solution installs a package and all its dependencies, whether they are up-to-date or not. I used Gavin's tip of skipping base packages (which are not actually installable), and coded up a solution which also skips up-to-date packages.
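
The up-to-date check comes down to comparing the installed version with the available one. A minimal illustration of why that comparison should go through package_version() rather than plain strings, using the digest version numbers from the example output further down:

```r
installed <- package_version("0.6.9")
available <- package_version("0.6.10")

# Version objects compare component by component: 9 < 10, so this is TRUE
installed < available

# Plain strings compare character by character: "9" sorts after "1",
# so this is FALSE and the package would wrongly look current
"0.6.9" < "0.6.10"
```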

The main function is installPackages(). This function and its helpers perform a topological sort of the dependency graph rooted at a given set of packages. The packages in the resulting list are checked for staleness and installed one by one. Here's some example output:

> remove.packages("tibble")
Removing package from ‘/home/frederik/.local/lib/x86_64/R/packages’
(as ‘lib’ is unspecified)
> installPackages(c("ggplot2","stringr","Rcpp"), dry_run=T)
##  Package  digest  is out of date ( 0.6.9 < 0.6.10 )
Would have installed package  digest 
##  Package  gtable  is up to date ( 0.2.0 )
##  Package  MASS  is up to date ( 7.3.45 )
##  Package  Rcpp  is out of date ( 0.12.5 < 0.12.8 )
Would have installed package  Rcpp 
##  Package  plyr  is out of date ( 1.8.3 < 1.8.4 )
Would have installed package  plyr 
##  Package  stringi  is out of date ( 1.0.1 < 1.1.2 )
Would have installed package  stringi 
##  Package  magrittr  is up to date ( 1.5 )
##  Package  stringr  is out of date ( 1.0.0 < 1.1.0 )
Would have installed package  stringr 
...
##  Package  lazyeval  is out of date ( 0.1.10 < 0.2.0 )
Would have installed package  lazyeval 
##  Package  tibble  is not currently installed, installing
Would have installed package  tibble 
##  Package  ggplot2  is out of date ( 2.1.0 < 2.2.0 )
Would have installed package  ggplot2 

Here's the code, sorry about the length:

library(tools)

# Helper: a "functional" interface depth-first-search
fdfs = function(get.children) {
  rec = function(root) {
    cs = get.children(root);
    out = c();
    for(c in cs) {
      l = rec(c);
      out = c(out, setdiff(l, out));
    }
    c(out, root);
  }
  rec
}

# Entries in the package "Priority" field which indicate the
# package can't be upgraded. Not sure why we would exclude
# recommended packages, since they can be upgraded...
#excl_prio = c("base","recommended")
excl_prio = c("base")

# Find the non-"base" dependencies of a package.
nonBaseDeps = function(packages,
  ap=available.packages(),
  ip=installed.packages(), recursive=T) {

  stopifnot(is.character(packages));
  all_deps = c();
  for(p in packages) {
    # Get package dependencies. Note we are ignoring version
    # information
    deps = package_dependencies(p, db = ap, recursive = recursive)[[1]];
    ipdeps = match(deps,ip[,"Package"])
    # We want dependencies which are either not installed, or not part
    # of Base (e.g. not installed with R)
    deps = deps[is.na(ipdeps) | !(ip[ipdeps,"Priority"] %in% excl_prio)];
    # Now check that these are in the "available.packages()" database
    apdeps = match(deps,ap[,"Package"])
    notfound = is.na(apdeps)
    if(any(notfound)) {
      notfound=deps[notfound]
      stop("Package ",p," has dependencies not in database: ",paste(notfound,collapse=" "));
    }
    all_deps = union(deps,all_deps);
  }
  all_deps
}

# Return a topologically-sorted list of dependencies for a given list
# of packages. The output vector contains the "packages" argument, and
# recursive dependencies, with each dependency occurring before any
# package depending on it.
packageOrderedDeps = function(packages, ap=available.packages()) {

  # get ordered dependencies
  odeps = sapply(packages,
    fdfs(function(p){nonBaseDeps(p,ap=ap,recursive=F)}))
  # "unique" preserves the order of its input
  odeps = unique(unlist(odeps));

  # sanity checks
  stopifnot(length(setdiff(packages,odeps))==0);
  seen = list();
  for(d in odeps) {
    ddeps = nonBaseDeps(d,ap=ap,recursive=F)
    stopifnot(all(ddeps %in% seen));
    seen = c(seen,d);
  }

  as.vector(odeps)
}

# Checks if a package is up-to-date. 
isPackageCurrent = function(p,
  ap=available.packages(),
  ip=installed.packages(),
  verbose=T) {

    if(verbose) msg = function(...) cat("## ",...)
    else msg = function(...) NULL;

    aprow = match(p, ap[,"Package"]);
    iprow = match(p, ip[,"Package"]);
    if(!is.na(iprow) && (ip[iprow,"Priority"] %in% excl_prio)) {
      msg("Package ",p," is a ",ip[iprow,"Priority"]," package\n");
      return(T);
    }
    if(is.na(aprow)) {
      stop("Couldn't find package ",p," among available packages");
    }
    if(is.na(iprow)) {
      msg("Package ",p," is not currently installed, installing\n");
      F;
    } else {
      iv = package_version(ip[iprow,"Version"]);
      av = package_version(ap[aprow,"Version"]);
      if(iv < av) {
        msg("Package ",p," is out of date (",
            as.character(iv),"<",as.character(av),")\n");
        F;
      } else {
        msg("Package ",p," is up to date (",
            as.character(iv),")\n");
        T;
      }
    }
}

# Like install.packages, but skips packages which are already
# up-to-date. Specify dry_run=T to just see what would be done.
installPackages =
    function(packages,
             ap=available.packages(), dry_run=F,
             want_deps=T) {

  stopifnot(is.character(packages));

  ap=tools:::.remove_stale_dups(ap)
  ip=installed.packages();
  ip=tools:::.remove_stale_dups(ip)

  if(want_deps) {
    packages = packageOrderedDeps(packages, ap);
  }

  for(p in packages) {
    curr = isPackageCurrent(p,ap,ip);
    if(!curr) {
      if(dry_run) {
        cat("Would have installed package ",p,"\n");
      } else {
        install.packages(p,dependencies=F);
      }
    }
  }
}

# Convenience function to make sure all the libraries we have loaded
# in the current R session are up-to-date (and to update them if they
# are not)
updateAttachedLibraries = function(dry_run=F) {
  s=search();
  s=s[grep("^package:",s)];
  s=gsub("^package:","",s)
  installPackages(s,dry_run=dry_run);
}
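
For anyone who wants to see the ordering idea in isolation, here is a minimal sketch of a depth-first post-order on a toy dependency graph. The package names and the install_order helper are invented for illustration, and the sketch assumes the graph has no cycles.

```r
# Toy graph (hypothetical names): each entry lists a package's direct dependencies
toy_deps <- list(
  app     = c("plotlib", "strlib"),
  plotlib = c("core"),
  strlib  = c("core"),
  core    = character(0)
)

# Visit every dependency before the package itself, skipping ones already seen.
# Assumes an acyclic graph; a cycle would recurse forever.
install_order <- function(pkg, deps = toy_deps, seen = character(0)) {
  for (d in deps[[pkg]]) {
    if (!(d %in% seen)) seen <- install_order(d, deps, seen)
  }
  union(seen, pkg)
}

ord <- install_order("app")
ord
# [1] "core"    "plotlib" "strlib"  "app"
```

Each package appears after everything it depends on, which is the property the sanity check in packageOrderedDeps() verifies.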
Pseudaxis answered 27/11, 2016 at 5:39 Comment(3)
If you downvote, please comment before or after so I know what to fix... - Pseudaxis
Probably someone downvoted because of the complexity of this answer, but I find it a very good one, though still complex. - Homiletics
I got this error while trying to run this function:
Error in nonBaseDeps(p, ap = ap, recursive = F) :
  Package dplyr has dependencies not in database: methods utils
Calls: installPackages ... sapply -> lapply -> FUN -> get.children -> nonBaseDeps
Execution halted
Do you know what could have caused it? Running the version from @Gavin works well though. - Shortcut
