Is it good practice to call functions in a package via ::?

I'm writing some R functions that use helpful functions from other packages like stringr and base64enc. Is it good practice to skip calling library(...) or require(...) to load these packages first, and instead use :: to refer directly to the function I need, like stringr::str_match(...)?

Is this good practice in general, or what problems might it cause?

Meson answered 23/4, 2014 at 0:38
require is generally used within a function in a package, and this SO post does a good job distinguishing between it and library. If you're sure you're only going to need one (or two) functions from a package, :: is fine, but I only gravitate towards it when there are namespace collisions. And don't forget about the ::: operator, too. - Embroidery

It all depends on context.

:: is primarily necessary when there are namespace collisions, that is, functions from different packages with the same name. When I load the dplyr package, it provides a function filter, which collides with (and masks) the filter function loaded by default from the stats package. So if I want to use the stats version after loading dplyr, I need to call it as stats::filter.
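To see masking without installing dplyr, the sketch below defines a hypothetical filter() of its own in the global environment, which masks stats::filter in the same way dplyr's version would. The bare name then reaches the new definition, while stats::filter still reaches the original.

```r
# A hypothetical user-defined `filter` stands in for dplyr::filter, so this
# runs without dplyr installed. Defined at top level, it masks stats::filter.
filter <- function(x, ...) "user-defined filter"

# The bare name now resolves to the masking definition...
masked_result <- filter(1:5)

# ...while the package-qualified name always reaches the stats version:
# a centred 3-point moving average, returned as a "ts" object.
smoothed <- stats::filter(1:5, rep(1 / 3, 3))

# utils::find() lists every environment on the search path defining the name.
where <- utils::find("filter")
print(where)
```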

This is also a reason not to load lots of packages. If you really only want one function from a package, it can be better to use :: than to attach the whole package with library(), especially if you know the package will mask other functions you want to use.
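A small illustration of :: without attaching anything, using the tools package (which ships with R but isn't attached by default). As far as I know, :: loads the package's namespace on demand but does not put the package on the search path, which is why it can't mask anything.

```r
# tools is installed with R but not attached by default.
# `::` loads its namespace if needed and returns the exported function.
ext <- tools::file_ext(c("report.csv", "archive.tar.gz"))
print(ext)

attached <- "package:tools" %in% search()   # tools was never attached
loaded   <- isNamespaceLoaded("tools")      # but its namespace is now loaded
```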

Not in code, but in text, I do find :: very useful. It's much more concise to type stats::filter than "the filter function from the stats package".

From a performance perspective, there is a (very) small price for using ::. Long-time R-Core development team member Martin Maechler wrote on the r-devel mailing list (Sept 2017):

Many people seem to forget that every use of :: is an R function call and using it is inefficient compared to just using the already imported name.

The performance penalty is very small, on the order of a few microseconds, so it's only a concern when you need highly optimized code. Running a line of code that uses :: one million times will take a second or two longer than code that doesn't use ::.
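If you want to check the overhead on your own machine, here is a rough sketch in the spirit of Maechler's remark: it compares calling stats::setNames through :: inside a loop with calling a copy bound once to a local name (roughly what an import gives you). The function name time_double_colon and the default iteration count are arbitrary choices; timings vary with hardware and R version and can be noisy.

```r
time_double_colon <- function(n = 1e5) {
  # "Imported" style: resolve the function once, then call it by a local name.
  set_names <- stats::setNames
  direct <- system.time(
    for (i in seq_len(n)) set_names(i, "a")
  )[["elapsed"]]

  # Qualified style: stats::setNames goes through `::` on every iteration.
  qualified <- system.time(
    for (i in seq_len(n)) stats::setNames(i, "a")
  )[["elapsed"]]

  c(direct_s = direct,
    qualified_s = qualified,
    # Approximate extra cost per `::` call in microseconds (noisy; can be negative).
    overhead_us = (qualified - direct) / n * 1e6)
}

time_double_colon()
```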

As far as portability goes, it's nice to load packages explicitly at the top of a script: a glance at the first few lines shows which packages are needed, so they can be installed if necessary before anything else runs. Otherwise you might get halfway through a long process only to find that it can't be completed without installing a package and starting over.
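A minimal fail-early sketch for the top of a script, assuming you'd rather get one clear error listing everything that's missing than a failure partway through. check_packages is a hypothetical helper; requireNamespace() checks that a package can be loaded without attaching it.

```r
# Hypothetical helper: stop immediately if any required package is unavailable.
check_packages <- function(needed) {
  # requireNamespace() returns TRUE/FALSE instead of attaching the package.
  available <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
  if (!all(available)) {
    stop("Install these packages before running the script: ",
         paste(needed[!available], collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

check_packages(c("stats", "tools"))   # both ship with R, so this passes
```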

Aside: a similar argument can be made for preferring library() over require(). library() throws an error and stops if the package isn't installed, whereas require() warns but lets the code continue. If your code has a contingency plan in case the package isn't there, then by all means use if (require(package)) ..., but if your code will fail without a package, use library(package) at the top so it fails early and clearly.
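The difference is easy to see with a package that isn't installed. notARealPkg below is a made-up name assumed not to exist on your system.

```r
missing_pkg <- "notARealPkg"   # hypothetical package, assumed not installed

# require() warns and returns FALSE, so execution continues.
ok <- suppressWarnings(require(missing_pkg, character.only = TRUE))

# library() signals an error, which stops a script unless it is caught.
library_failed <- tryCatch({
  library(missing_pkg, character.only = TRUE)
  FALSE
}, error = function(e) TRUE)

# The contingency pattern: branch on require()'s logical return value.
if (!ok) message("Package '", missing_pkg, "' is unavailable; using a fallback")
```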

Within your own package

The general solution is to make your own package that lists the packages you need under Imports in its DESCRIPTION file. Those packages will then be installed automatically when your package is installed, so you can use pkg::fun internally. Alternatively, by also importing them in the NAMESPACE file, either a whole package with import() or specific functions with importFrom(), you can call those functions without ::. Opinions differ on this. Martin Maechler (same r-devel source as above) says:

Personally I've got the impression that :: is much "overused" nowadays, notably in packages where I'd strongly advocate using importFrom() in NAMESPACE, so all this happens at package load time, and then not using :: in the package sources itself.
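For reference, the importFrom() style Maechler describes lives in the package's NAMESPACE file. Here is a sketch for a hypothetical package using the two packages from the question; both would also need to be listed under Imports in DESCRIPTION so they get installed. If you use roxygen2, the tags #' @import stringr and #' @importFrom base64enc base64encode generate these lines for you.

```r
# NAMESPACE of a hypothetical package (R syntax, processed at package load time).

# Import every exported function of stringr:
import(stringr)

# Or import only what you call, then use the bare name in the R/ sources:
importFrom(base64enc, base64encode)
```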

On the other hand, RStudio Chief Scientist Hadley Wickham says in his R Packages book:

It's common for packages to be listed in Imports in DESCRIPTION, but not in NAMESPACE. In fact, this is what I recommend: list the package in DESCRIPTION so that it’s installed, then always refer to it explicitly with pkg::fun(). Unless there is a strong reason not to, it's better to be explicit.
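And a sketch of the explicit style Wickham recommends, assuming a hypothetical package whose DESCRIPTION lists tools under Imports, with no import()/importFrom() entry for tools in NAMESPACE. file_stem_ext is a made-up function name.

```r
# Hypothetical package function: every external call names its package,
# so a reader can see where each function comes from without checking NAMESPACE.
file_stem_ext <- function(path) {
  c(stem = tools::file_path_sans_ext(basename(path)),
    ext  = tools::file_ext(path))
}

file_stem_ext("data/report.csv")
```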

With two esteemed R experts giving opposite recommendations, I think it's fair to say that you should pick whichever style suits you best and meets your needs for clarity, efficiency, and maintainability.


If you frequently find yourself using just one function from another package, you can copy the code and add it to your own package. For example, I have a package for personal use that borrows %nin% from the Hmisc package because I think it's a great function, but I don't often use anything else from Hmisc. With roxygen2, it's easy to add @author and @references to properly attribute the code for a borrowed function. Also make sure the package licenses are compatible when doing this.
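A sketch of what the borrowed operator might look like with roxygen2 attribution. The one-line definition matches the one Hmisc has used as far as I know, but check the current Hmisc source (and its license) before copying; the documentation text is my own.

```r
#' Negated value matching
#'
#' Returns TRUE for each element of `x` that does not appear in `table`.
#'
#' @param x Values to look up.
#' @param table Values to match against.
#' @author Adapted from the Hmisc package; see citation("Hmisc") for its authors.
#' @references Hmisc package, operator %nin%.
#' @export
`%nin%` <- function(x, table) match(x, table, nomatch = 0) == 0

c(1, 5) %nin% 1:3
```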

Isometry answered 23/4, 2014 at 0:49
Another advantage to using library(...) at the top of your script is that if someone tries to source your file without the package installed, the source command will fail early (and not after potentially long data loads or manipulations). - Tachycardia
require is written to return a logical value, making it easy to use in the construction: if( !require(pkg) ){ cat("informative error message")} - Archivolt
@BondedDust In a case like that it'd be better to use stop or warning instead of cat. - Isometry
If you're writing a package, I think it's better to use :: rather than namespace imports or importFrom (unless you're using a lot of functions from the other package). - Tawana
@Tawana I'll defer to you on this, but I'd like some clarification. If the primary functionality of your package needs pkg::foo (even if foo is the only function needed from pkg), shouldn't pkg be in Imports or Depends to guarantee that it is installed? If it's relegated to Suggests or Enhances, there is no guarantee that pkg is there. - Isometry
@shujaa Yes, it should be in DESCRIPTION Imports, but it shouldn't be in NAMESPACE as import(pkg) or importFrom(pkg, foo). (I think my original comment was a bit confusing.) - Tawana
A short comment on readability might be added: prefixing every function call with the package name can make the code messy, especially when one function call is an argument of another (common in tidyr). - Drought
