Finding objects from other packages' namespaces in package code

I'm refactoring a package that imports the full namespaces of many other packages. I believe that many of these dependencies are used for only a single function call, which would be better handled with importFrom, or are orphaned dependencies that are no longer used at all.

There's enough code in the package that it would be tedious to manually examine every line looking for unfamiliar function calls.

How can I determine where and how many times objects from imported namespaces are being used in the package? Please note that this package does not include unit tests.

Here is a reproducible example:

DESCRIPTION file:

Package: my_package
Title: title
Version: 0.0.1
Authors@R: person(
  given = "A",
  family = "Person",
  role = c("aut", "cre"),
  email = "[email protected]"
)
Description: Something
License: Some license
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.1
Imports: 
  dplyr,
  purrr,
  stringr

NAMESPACE file:

import(dplyr)
import(purrr)
import(stringr)

my_package.R file:

#' my_package
#' @docType package
#' @name my_package
NULL
#' @import dplyr
#' @import purrr
#' @import stringr
NULL

functions.R file:

#' add 1 to "banana" column and call it "apple"
#' @description demonstrate a variety of dplyr functions
#' @param x a data.frame object
#' @return a data.frame object with columns "apple" and "banana"
#' @examples
#' my_fruit <- data.frame(banana = c(1,2,3), pear = c(4,5,6))
#' my_function(my_fruit)
#' @export
my_function <- function(x) {
  x %>%
    mutate(apple = banana + 1) %>%
    select(apple, banana)
}

I am looking for a solution that would identify that %>%, mutate and select are exports from dplyr, that %>% is also an export from purrr, and that no exports from the imported namespace stringr are used. For functions like %>% that are exported from multiple namespaces, it's not that important to me to distinguish which namespace the export is coming from (in the example, both %>% are re-exports from the magrittr dependency), since a warning is generated when the package is loaded wherever actual masking occurs.

Willms answered 11/5, 2021 at 17:55 Comment(3)
You should consider using awk/perl/sed to go over each file and check for specific function calls from the other packages. – Partial
It's easier to help you if you include a simple reproducible example with sample input and desired output that can be used to test and verify possible solutions. You could remove the import and run the package check to find all the errors. Or some variant of this answer about finding free variables might work. – Size
@Size I have added a reprex. – Willms

Here's a base R solution:

pkgs <- readLines("NAMESPACE")
pattern <- "^import\\((.*?)\\)$"
pkgs <- pkgs[grepl(pattern, pkgs)]
pkgs <- sub(pattern, "\\1", pkgs)
pkgs
#> [1] "dplyr"   "purrr"   "stringr"

# lapply() always returns a list, even if two packages export the same number of objects
exports <- lapply(pkgs, getNamespaceExports)
exports <- do.call(rbind, Map(data.frame, package = pkgs, fun = exports))
rownames(exports) <- NULL
head(exports)
#>   package         fun
#> 1   dplyr rows_upsert
#> 2   dplyr   src_local
#> 3   dplyr  db_analyze
#> 4   dplyr    n_groups
#> 5   dplyr    distinct
#> 6   dplyr  summarise_

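# simplify = FALSE keeps one parsed expression vector per file, named by file path
code <- sapply(list.files("R", full.names = TRUE), parse, simplify = FALSE)
# names in call position = all names minus names used as variables
funs <- lapply(code, function(x) setdiff(all.names(x), all.vars(x)))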
funs <- funs[lengths(funs) > 0]
funs <- do.call(rbind, Map(data.frame, fun = funs, file = names(funs)))
rownames(funs) <- NULL
funs
#>        fun          file
#> 1       <- R/functions.R
#> 2 function R/functions.R
#> 3        { R/functions.R
#> 4      %>% R/functions.R
#> 5   mutate R/functions.R
#> 6        + R/functions.R
#> 7   select R/functions.R

Final output:

merge(exports, funs)
#>      fun package          file
#> 1    %>% stringr R/functions.R
#> 2    %>%   purrr R/functions.R
#> 3    %>%   dplyr R/functions.R
#> 4 mutate   dplyr R/functions.R
#> 5 select   dplyr R/functions.R

It is not 100% robust, because it matches on names only: a locally defined function that shares its name with an import (say, a helper called select) cannot be reliably told apart from the {dplyr} export.

It will also miss functions that are not used in fun() form, as in lapply(my_list, fun).
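To see why such references are missed, here is a minimal base R sketch applied to a single quoted call. The names `my_list` and `fun` are the placeholders from the sentence above, and nothing is evaluated. Because `fun` is passed as an argument rather than called directly, all.vars() treats it as a variable and setdiff() removes it.

```r
# A call where `fun` is passed as an argument rather than invoked directly
expr <- quote(lapply(my_list, fun))

# all.names() returns every symbol, including names in call position
every_name <- all.names(expr)

# all.vars() returns only the symbols that are not in call position
variables <- all.vars(expr)

# The set difference keeps only names used in call position,
# so `fun` is dropped even though lapply() would call it
called <- setdiff(every_name, variables)
print(called)
```

Only `lapply` survives the filter, which is why functions handed to apply-style helpers do not show up in the merged table.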

We can't really detect those robustly from static code. A workaround that might get us there, or at least closer if we had 100% test coverage, is to wrap the imported functions so that they report when they are called, then run the tests.
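A rough sketch of the idea of making functions report their own calls, in base R only. The helper `record_calls()` and the registry `call_counts` are hypothetical names introduced for illustration, and base `toupper()` stands in for an imported function. Installing such wrappers into a real package namespace and then running its tests is not shown here.

```r
# Registry of call counts, keyed by function name (hypothetical)
call_counts <- new.env()

# Hypothetical helper: wrap `f` so that every call increments its counter
record_calls <- function(name, f) {
  force(name)
  force(f)
  function(...) {
    previous <- get0(name, envir = call_counts, inherits = FALSE, ifnotfound = 0L)
    assign(name, previous + 1L, envir = call_counts)
    f(...)
  }
}

# Stand-in for an imported function; base toupper() is used for illustration
toupper_counted <- record_calls("toupper", toupper)

result <- toupper_counted("a")
result <- toupper_counted("b")

# Functions that were actually called, with their counts
counts <- mget(ls(call_counts), envir = call_counts)
print(counts)
```

A wrapper like this only sees the code paths that are actually executed, so the result is only as complete as whatever exercises the package.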

You probably don't need this though.

Equipment answered 15/5, 2021 at 13:20 Comment(0)

You could use getParseData to get all the function calls used in the package, and join them with the functions made available by the imports in NAMESPACE to find out their origin.

Tested on reproducible example my_package:

library(dplyr)
library(purrr)
library(stringr)

# List functions used in Package
path <- "./my_package"
files <- file.path(path,list.files(path= path, recursive = TRUE, pattern ='\\.R$'))

functions <- files %>% map_dfr(~{
  getParseData(parse(.x, keep.source=TRUE)) %>% 
          filter(token %in% c("SYMBOL_FUNCTION_CALL","SPECIAL")) %>%
          mutate(file = .x) %>%
          rename(fctname = text) %>%
          select(file, fctname) %>% unique })

# List of all possible functions imports
imports <- readLines(file.path(path,"NAMESPACE"))
imports <- str_match(imports, "import\\(\\s*(.*?)\\s*\\)")[,2]
imports <- imports[!is.na(imports)]

possible.imported.functions <- imports %>% map_dfr(~{
  data.frame(package.import = .x,fctname = getNamespaceExports(.x)) })

# Imported functions in use
inner_join(functions,possible.imported.functions, by = c('fctname'='fctname')) %>%
  arrange(package.import,fctname) %>%
  select(file,package.import,fctname)
#>                             file package.import fctname
#> 1 my_package/R/functions.R          dplyr     %>%
#> 2 my_package/R/functions.R          dplyr  mutate
#> 3 my_package/R/functions.R          dplyr  select
#> 4 my_package/R/functions.R          purrr     %>%
#> 5 my_package/R/functions.R        stringr     %>%
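The pipeline above keeps one row per file and function name (because of `unique`), so it does not say how many times each function is called. A minimal base R sketch of counting instead, run on a made-up code string rather than on the package files; only the `SYMBOL_FUNCTION_CALL` token is tabulated here, so operators such as %>% (token `SPECIAL`) would have to be added to the filter.

```r
# Made-up source text standing in for one file of the package
src <- "f <- function(x) {\n  y <- sqrt(x)\n  sqrt(y) + abs(x)\n}"

# Parse with keep.source = TRUE so that token-level parse data is retained
parsed <- parse(text = src, keep.source = TRUE)
tokens <- utils::getParseData(parsed)

# Keep tokens in call position and tabulate them without de-duplicating
called <- tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"]
counts <- table(called)
print(counts)
```

Joining such counts with the exports table, as in the answer, would then give a per-package usage count.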

Cutis answered 14/5, 2021 at 16:59 Comment(0)
