Finding out which functions are called within a given function [duplicate]
Asked Answered
L

3

9

Possible Duplicate:
Generating a Call Graph in R

I'd like to systematically analyze a given function to find out which other functions are called within that very function. If possible, recursively.

I came across this function in a blog post by milktrader with which I can do something similar for packages (or namespaces)

listFunctions <- function(
    name,
    ...
){ 
    name.0  <- name
    name    <- paste("package", ":", name, sep="")
    if (!name %in% search()) {
        stop(paste("Invalid namespace: '", name.0, "'"))
    }
    # KEEP AS REFERENCE       
#    out <- ls(name)
    funlist <- lsf.str(name)
    out     <- head(funlist, n=length(funlist))
    return(out)
}

> listFunctions("stats")
  [1] "acf"                  "acf2AR"               "add.scope"           
  [4] "add1"                 "addmargins"           "aggregate"           
  [7] "aggregate.data.frame" "aggregate.default"    "aggregate.ts"        
 [10] "AIC"                  "alias"                "anova"               
....
[499] "xtabs"   

Yet, I'd like a function where name would be the name of a function and the return value is a character vector (or a list, if done recursively) of functions that are called within name.

Motivation

I actually need some sort of character based output (vector or list). The reason for this is that I'm working on a generic wrapper function for parallelizing an abitrary "inner function" where you don't have to go through a time consuming trial-and-error process in order to find out which other functions the inner function depends on. So the output of the function I'm after will directly be used in snowfall::sfExport() and/or snowfall::sfSouce.

EDIT 2012-08-08

As there's been some close-votes due to duplicity, I'll check how answers can be merged with the other question tomorrow.

Luciusluck answered 8/8, 2012 at 20:44 Comment(9)
See Generating a Call Graph in R. The keyword being call graphMonoceros
I don't know the answer, but listFunctions <- function(name) ls(paste("package", name, sep=":")) works too.Annelid
@GuyCoder: thanks for the pointer. AFAIU, all answers are will generate a call graph. That's nice, but not exactly what I need (see updated post). I'm sure the functions mentioned will give me valuable insights on how to come up with a function that fits my needs, but maybe there's already something out there?Luciusluck
@KarstenW.: thanks, always nice to see alternative ways! I really just copied it from the post and didn't give it much more thought.Luciusluck
While foodweb will generate a graphic by default, it also returns (invisibly) an object that contains the calling information as a matrix (along with other things). Check out the Value section of the foodweb help page as well as callers.of and callees.of documented in the same page.Boelter
@BrianDiggs: cool, that sounds more like it. Thanks!Luciusluck
It is a call graph even if there is no graphical output. Graph in this sense is in a mathematical sense of nodes and (directional) connecting edges. That is, graph theory.Boelter
@BrianDiggs: ah, okay get it ;-)Luciusluck
@guycoder: hey no problem, didn't want to sound ungreatful, just didn't see the direct link after your first post. Much appreciated!Luciusluck
R
6

There must be better ways out there, but here's my attempt:

listFunctions <- function(function.name, recursive = FALSE, 
                          checked.functions = NULL){

    # Get the function's code:
    function.code <- deparse(get(function.name))

    # break code up into sections preceding left brackets:
    left.brackets <- c(unlist(strsplit(function.code, 
                                       split="[[:space:]]*\\(")))

    called.functions <- unique(c(unlist(sapply(left.brackets, 
                                               function (x) {

        # Split up according to anything that can't be in a function name.
        # split = not alphanumeric, not '_', and not '.'
        words <- c(unlist(strsplit(x, split="[^[:alnum:]_.]")))

        last.word <- tail(words, 1)
        last.word.is.function <- tryCatch(is.function(get(last.word)),
                                      error=function(e) return(FALSE))
        return(last.word[last.word.is.function])
    }))))

    if (recursive){

        # checked.functions: We need to keep track of which functions 
        # we've checked to avoid infinite loops.
        functs.to.check <- called.functions[!(called.functions %in%
                                          checked.functions)]

        called.functions <- unique(c(called.functions,
            do.call(c, lapply(functs.to.check, function(x) {
                listFunctions(x, recursive = T,
                              checked.functions = c(checked.functions,          
                                                    called.functions))
                }))))
    }
    return(called.functions)
}

And the results:

> listFunctions("listFunctions", recursive = FALSE)
 [1] "function"      "deparse"       "get"           "c"            
 [5] "unlist"        "strsplit"      "unique"        "sapply"       
 [9] "tail"          "tryCatch"      "is.function"   "return"       
[13] "if"            "do.call"       "lapply"        "listFunctions"

> system.time(all.functions <- listFunctions("listFunctions", recursive = TRUE))
   user  system elapsed 
  92.31    0.08   93.49 

> length(all.functions)
  [1] 518

As you can see, the recursive version returns a lot of functions. The problem with this is it returns every function called in the process, which obviously adds up as you go. In any case, I hope you can use this (or modify it) to suit your needs.

Runion answered 9/8, 2012 at 3:40 Comment(2)
Thanks man, I worked with your approach a little today. Changed a couple of things (mainly a "global" buffer for functions already checked; think that's the reason your recursive code took so long) and added some functionality (see my answer).Luciusluck
Nice, your approach has a processing time that's at 0.6870 of the one relying on codetools functionality!Luciusluck
B
8

try this example:

library(codetools)

ff <- function(f) {
  leaf <- function (e, w) {
    r <- try(eval(e), silent = TRUE)
    if(!is.null(r) && is.function(r)) ret <<- c(ret, as.character(e))
  }
  call <- function (e, w) {
    walkCode(e[[1]], w)
    for (a in as.list(e[-1])) if (!missing(a)) walkCode(a, w)
  }
  ret <- c()
  walkCode(body(f), makeCodeWalker(call = call, leaf = leaf, write = cat))
  unique(ret)
}

then,

> ff(data.frame)
 [1] "{"               "<-"              "if"              "&&"              "is.null"         "row.names"       "function"        "is.character"   
 [9] "new"             "as.character"    "anyDuplicated"   "return"          "||"              "all"             "=="              "stop"           
[17] "gettextf"        "warning"         "paste"           "which"           "duplicated"      "["               "as.list"         "substitute"     
[25] "list"            "-"               "missing"         "length"          "<"               "!"               "is.object"       "is.integer"     
[33] "any"             "is.na"           "unique"          "integer"         "structure"       "character"       "names"           "!="             
[41] "nzchar"          "for"             "seq_len"         "[["              "is.list"         "as.data.frame"   ".row_names_info" ">"              
[49] "deparse"         "substr"          "nchar"           "attr"            "abs"             "max"             "("               "%%"             
[57] "unclass"         "seq_along"       "is.vector"       "is.factor"       "rep"             "class"           "inherits"        "break"          
[65] "next"            "unlist"          "make.names"      "match"           ".set_row_names" 
> ff(read.table)
 [1] "{"              "if"             "&&"             "missing"        "file"           "!"              "text"           "<-"             "textConnection"
[10] "on.exit"        "close"          "is.character"   "nzchar"         "inherits"       "stop"           "isOpen"         "open"           ">"             
[19] "readLines"      "<"              "min"            "("              "+"              "lines"          ".Internal"      "quote"          "length"        
[28] "all"            "=="             "pushBack"       "c"              "stdin"          "scan"           "col"            "numeric"        "-"             
[37] "for"            "seq_along"      "["              "max"            "!="             "warning"        "paste0"         ":"              "make.names"    
[46] "names"          "is.null"        "rep"            "match"          "any"            "<="             "rep.int"        "list"           "%in%"          
[55] "sapply"         "do.call"        "data"           "flush"          "[["             "which"          "is.logical"     "is.numeric"     "|"             
[64] "gettextf"       "&"              "is.na"          "type.convert"   "character"      "as.factor"      "as.Date"        "as.POSIXct"     "::"            
[73] "methods"        "as"             "row.names"      ".set_row_names" "as.integer"     "||"             "is.object"      "is.integer"     "as.character"  
[82] "anyDuplicated"  "class"          "attr"          
Branca answered 9/8, 2012 at 7:51 Comment(2)
Thanks a lot for your answer and the pointer to codetools! I didn't have the time to check both of the answers so far, so I started out with Edward's answer as I like to understand the details of how to do it with base R functionality before turning to contrib packages. But codetools seems pretty promising.Luciusluck
Seems like the codetools approach is a little slower compared to Edwards approach which I actually wouldn't have thought (see own answer).Luciusluck
R
6

There must be better ways out there, but here's my attempt:

listFunctions <- function(function.name, recursive = FALSE, 
                          checked.functions = NULL){

    # Get the function's code:
    function.code <- deparse(get(function.name))

    # break code up into sections preceding left brackets:
    left.brackets <- c(unlist(strsplit(function.code, 
                                       split="[[:space:]]*\\(")))

    called.functions <- unique(c(unlist(sapply(left.brackets, 
                                               function (x) {

        # Split up according to anything that can't be in a function name.
        # split = not alphanumeric, not '_', and not '.'
        words <- c(unlist(strsplit(x, split="[^[:alnum:]_.]")))

        last.word <- tail(words, 1)
        last.word.is.function <- tryCatch(is.function(get(last.word)),
                                      error=function(e) return(FALSE))
        return(last.word[last.word.is.function])
    }))))

    if (recursive){

        # checked.functions: We need to keep track of which functions 
        # we've checked to avoid infinite loops.
        functs.to.check <- called.functions[!(called.functions %in%
                                          checked.functions)]

        called.functions <- unique(c(called.functions,
            do.call(c, lapply(functs.to.check, function(x) {
                listFunctions(x, recursive = T,
                              checked.functions = c(checked.functions,          
                                                    called.functions))
                }))))
    }
    return(called.functions)
}

And the results:

> listFunctions("listFunctions", recursive = FALSE)
 [1] "function"      "deparse"       "get"           "c"            
 [5] "unlist"        "strsplit"      "unique"        "sapply"       
 [9] "tail"          "tryCatch"      "is.function"   "return"       
[13] "if"            "do.call"       "lapply"        "listFunctions"

> system.time(all.functions <- listFunctions("listFunctions", recursive = TRUE))
   user  system elapsed 
  92.31    0.08   93.49 

> length(all.functions)
  [1] 518

As you can see, the recursive version returns a lot of functions. The problem with this is it returns every function called in the process, which obviously adds up as you go. In any case, I hope you can use this (or modify it) to suit your needs.

Runion answered 9/8, 2012 at 3:40 Comment(2)
Thanks man, I worked with your approach a little today. Changed a couple of things (mainly a "global" buffer for functions already checked; think that's the reason your recursive code took so long) and added some functionality (see my answer).Luciusluck
Nice, your approach has a processing time that's at 0.6870 of the one relying on codetools functionality!Luciusluck
L
2

Disclaimer

This answer is based on answers by Edward and Kohske. I will not consider this for the answer finally accepted, its main purpose is simply to document another/extended approach and some benchmarks for other users.

Inner Function 1

Courtesy of Edward.

listFunctions_inner <- function(
    name, 
    do.recursive=FALSE,
    .do.verbose=FALSE,
    .buffer=new.env()
){
    ..name  <- "listFunctions_inner"
    if (!is.character(name) | missing(name)) {
        stop(paste(..name, " // expecting 'name' of class 'character'", sep=""))
    }
    name.0 <- name
    if (tryCatch(is.function(get(name)), error=function(e) FALSE)) {
    # PROCESS FUNCTIONS       
        if (.do.verbose) {
            message(paste(..name, " // processing function: '", name, "'", sep=""))
        } 
        # Get the function's code:
        code <- deparse(get(name))  
        # break code up into sections preceding left brackets:
        left.brackets <- c(unlist(strsplit(code, split="[[:space:]]*\\(")))  
        out <- sort(unique(unlist(lapply(left.brackets, function (x) {
            # Split up according to anything that can't be in a function name.
            # split = not alphanumeric, not '_', and not '.'
            words <- c(unlist(strsplit(x, split="[^[:alnum:]_.]")))

            last.word <- tail(words, 1)
            last.word.is.function <- tryCatch(is.function(get(last.word)),
                                          error=function(e) return(FALSE))
            out <- last.word[last.word.is.function]
            return(out)
        }))))
        if (do.recursive){           
            # funs.checked: We need to keep track of which functions 
            # we've checked to avoid infinite loops.
            .buffer$funs.checked <- c(.buffer$funs.checked, name)
            funs.next <- out[!(out %in% .buffer$funs.checked)]        
            if (length(funs.next)) {
                out <- sort(unique(unlist(c(out, do.call(c,
                    lapply(funs.next, function(x) {
                        if (x == ".Primitive") {
                            return(NULL)
                        }
                        listFunctions_inner(
                            name=x, 
                            do.recursive=TRUE,
                            .buffer=.buffer
                        )
                    })
                )))))            
            }
        } 
        out <- sort(unique(unlist(out)))
    } else {
    # PROCESS NAMESPACES
        if (.do.verbose) {
            message(paste(..name, " // processing namespace: '", name, "'", sep=""))
        }
        name    <- paste("package", ":", name, sep="")
        if (!name %in% search()) {
            stop(paste(..name, " // invalid namespace: '", name.0, "'"))
        }
        # KEEP AS REFERENCE       
#        out <- ls(name)
        funlist <- lsf.str(name)
        out     <- head(funlist, n=length(funlist))
    }
    out
}

Inner Function 2

Courtesy of Kohske

listFunctions2_inner <- function(
    name,
    do.recursive=FALSE,
    .do.verbose=FALSE,
    .buffer=new.env()
) {
    ..name <- "listFunctions2_inner"
    if (!is.character(name) | missing(name)) {
        stop(paste(..name, " // expecting 'name' of class 'character'", sep=""))
    }
    name.0 <- name
    if (tryCatch(is.function(get(name)), error=function(e) FALSE)) {
    # PROCESS FUNCTIONS       
        leaf <- function (e, w) {
            r <- try(eval(e), silent = TRUE)
            if(!is.null(r) && is.function(r)) out <<- c(out, as.character(e))
        }
        call <- function (e, w) {
            walkCode(e[[1]], w)
            for (a in as.list(e[-1])) if (!missing(a)) walkCode(a, w)
        }
        out <- c()
        walkCode(
            body(name), 
            makeCodeWalker(call=call, leaf=leaf, write=cat)
        )
        if (do.recursive){           
            # funs.checked: We need to keep track of which functions 
            # we've checked to avoid infinite loops.
            .buffer$funs.checked <- c(.buffer$funs.checked, name)
            funs.next <- out[!(out %in% .buffer$funs.checked)]        
            if (length(funs.next)) {
                out <- sort(unique(unlist(c(out, do.call(c,
                    lapply(funs.next, function(x) {
                        if (x == ".Primitive") {
                            return(NULL)
                        }
                        listFunctions_inner(
                            name=x, 
                            do.recursive=TRUE,
                            .buffer=.buffer
                        )
                    })
                )))))            
            }
        }
        out <- sort(unique(out))
    } else {
    # PROCESS NAMESPACES
        if (.do.verbose) {
            message(paste(..name, " // processing namespace: '", name, "'", sep=""))
        }
        name    <- paste("package", ":", name, sep="")
        if (!name %in% search()) {
            stop(paste(..name, " // invalid namespace: '", name.0, "'"))
        }
        # KEEP AS REFERENCE       
#        out <- ls(name)
        funlist <- lsf.str(name)
        out     <- head(funlist, n=length(funlist))
    }
}

Wrapper Function

This wrapper let's you choose the actual inner function used and allows to specify namespaces that should or should not be considered. That's important for my use case (see section Motivation above), as I'm usually only interested in "own" functions (in .GlobalEnv) that have not yet been moved to a package.

listFunctions <- function(
    name, 
    ns,
    innerFunction=listFunctions,
    do.inverse=FALSE,
    do.table=FALSE,
    do.recursive=FALSE,
    .do.verbose=FALSE
){
    ..name  <- "listFunctions_inner"
    if (!is.character(name) | missing(name)) {
        stop(paste(..name, " // expecting 'name' of class 'character'", sep=""))
    }
    out <- innerFunction(name, do.recursive=do.recursive, 
        .do.verbose=.do.verbose)

    if (do.table) {
        x.ns <- sapply(out, function(x) {
            out <- environmentName(environment(get(x)))
            if (out == "") {
                out <- ".Primitive"
            }
            out
        })
        if (!missing(ns)) {
            if (!do.inverse) {
                idx <- which(x.ns %in% ns)
            } else {
                idx <- which(!x.ns %in% ns)
            }
            if (!length(idx)) {
                return(NULL)
            }
            out <- out[idx]
            x.ns  <- x.ns[idx]
        }
        out <- data.frame(name=out, ns=x.ns, stringsAsFactors=FALSE)
        rownames(out) <- NULL
    }
    out
}

Application

# Character vector
listFunctions("install.packages")

# Data Frame (table)
> listFunctions("install.packages", do.table=TRUE)
                 name         ns
1           .libPaths .Primitive
2   .standard_regexps       base
3                 any .Primitive
4  available.packages      utils
...
84          winDialog      utils

# Consider 'base' only
> listFunctions("install.packages", ns="base", do.table=TRUE)
                name   ns
1  .standard_regexps base
2           basename base
3       capabilities base
...
56           warning base

# Consider all except 'base'
> listFunctions("install.packages", ns="base", do.inverse=TRUE, do.table=TRUE)
                 name         ns
1           .libPaths .Primitive
2                 any .Primitive
3  available.packages      utils
...
28          winDialog      utils

# Recursively, no table
listFunctions("install.packages", do.recursive=TRUE)

# Recursively table
listFunctions("install.packages", do.table=TRUE, do.recursive=TRUE)
                                name         ns
1                     .amatch_bounds       base
2                      .amatch_costs       base
3                                 .C .Primitive
...
544                           xzfile       base

# List functions inside a namespace
listFunctions("utils")
listFunctions("utils", do.table=TRUE)

Benchmark Inner Function 1

> bench <- microbenchmark(listFunctions("install.packages"))
bench
> Unit: milliseconds
                               expr      min       lq   median       uq
1 listFunctions("install.packages") 152.9654 157.2805 160.5019 165.4688
       max
1 244.6589

> bench <- microbenchmark(listFunctions("install.packages", do.recursive=TRUE), times=3)
bench
> Unit: seconds
                                                    expr      min      lq
1 listFunctions("install.packages", do.recursive = TRUE) 6.272732 6.30164
    median       uq      max
1 6.330547 6.438158 6.545769

Benchmark Inner Function 2

> bench <- microbenchmark(listFunctions("install.packages",
+         innerFunction=listFunctions2_inner))
bench
> Unit: milliseconds
                                                                     expr
1 listFunctions("install.packages", innerFunction = listFunctions2_inner)
       min       lq   median       uq      max
1 207.0299 212.3286 222.6448 324.6399 445.4154

> bench <- microbenchmark(listFunctions("install.packages", 
+     innerFunction=listFunctions2_inner, do.recursive=TRUE), times=3)
bench
Warning message:
In nm[nm == ""] <- exprnm[nm == ""] :
  number of items to replace is not a multiple of replacement length
> Unit: seconds
                                                                      expr
1 listFunctions("install.packages", innerFunction = listFunctions2_inner, 
       min       lq   median       uq      max
1 7.673281 8.065561 8.457841 8.558259 8.658678
Luciusluck answered 9/8, 2012 at 16:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.