I am running a parallelized calculation using `foreach` to work on a lot of time series simultaneously. Among those calculations (within a function called `compute_slope()`), I do something like this:
lBd <- floor(TMax^delta) # lower bound
uBd <- ceiling(m * TMax^delta) # upper bound
# process is a tibble with columns `n` and `variance`
process %>%
dplyr::filter(between(n, lBd, uBd)) %>%
lm(data = ., log(variance) ~ log(n)) %>%
coefficients() %>%
.[2]
So, this is pretty straightforward: with the parameters `TMax`, `delta` and `m`, I truncate a time series on the left and on the right (using `filter()`) and then run a linear regression on the truncated time series.
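For reference, the truncate-and-regress step described above can be sketched in base R only, which makes it easier to check whether the sporadic error is tied to the dplyr data mask. This is a minimal sketch, not the original `compute_slope()`: the name `compute_slope_base()` and the synthetic `process` data are assumptions made for illustration.

```r
# Hypothetical base-R variant of the slope calculation (no dplyr, no pipe).
# `process` is assumed to have numeric columns `n` and `variance`.
compute_slope_base <- function(process, m, delta, TMax) {
  lBd <- floor(TMax^delta)        # lower bound
  uBd <- ceiling(m * TMax^delta)  # upper bound

  # Truncate the series on the left and on the right
  keep <- process$n >= lBd & process$n <= uBd
  truncated <- process[keep, , drop = FALSE]

  # Log-log regression; the second coefficient is the slope
  fit <- lm(log(variance) ~ log(n), data = truncated)
  unname(coef(fit)[2])
}

# Synthetic data (assumption): variance = 2 * n^0.5, so the log-log slope is 0.5
process <- data.frame(n = 1:1000)
process$variance <- 2 * process$n^0.5

slope <- compute_slope_base(process, m = 5, delta = 0.5, TMax = 1000)
```

If a variant like this runs cleanly on the workers while the dplyr version still fails intermittently, that would point towards the tidy evaluation machinery rather than the data.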
For some strange reason, everything works out nicely most of the time, but sometimes I get the following error (I suspect it happens more often for longer time series, i.e. when `TMax` is larger, but that has been irregular too):
✖ Problem with `filter()` input `..1`.
ℹ Input `..1` is `between(n, lBd, uBd)`.
✖ `ancestor` must be an environment"
I really have no clue how to interpret this error. I have also tried to replicate this "ancestor" error, but so far without luck. For instance, I have tried:
library(tidyverse)
# This is the straightforward use-case and should work (it does here)
mpg %>% filter(between(hwy, 30, 31))
#> # A tibble: 11 x 11
#> manufacturer model displ year cyl trans drv cty hwy fl class
#> <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
#> 1 audi a4 2 2008 4 manual~ f 20 31 p comp~
#> 2 audi a4 2 2008 4 auto(a~ f 21 30 p comp~
#> 3 chevrolet malibu 2.4 2008 4 auto(l~ f 22 30 r mids~
#> 4 hyundai sonata 2.4 2008 4 auto(l~ f 21 30 r mids~
#> 5 hyundai sonata 2.4 2008 4 manual~ f 21 31 r mids~
#> 6 nissan altima 2.5 2008 4 auto(a~ f 23 31 r mids~
#> 7 toyota camry 2.4 2008 4 manual~ f 21 31 r mids~
#> 8 toyota camry 2.4 2008 4 auto(l~ f 21 31 r mids~
#> 9 toyota camry s~ 2.4 2008 4 manual~ f 21 31 r comp~
#> 10 toyota camry s~ 2.4 2008 4 auto(s~ f 22 31 r comp~
#> 11 toyota corolla 1.8 1999 4 auto(l~ f 24 30 r comp~
# bounds are undefined
mpg %>% filter(between(hwy, x, 31))
#> Error: Problem with `filter()` input `..1`.
#> i Input `..1` is `between(hwy, x, 31)`.
#> x object 'x' not found
# bounds are functions
mpg %>% filter(between(hwy, slice, 31))
#> Error: Problem with `filter()` input `..1`.
#> i Input `..1` is `between(hwy, slice, 31)`.
#> x cannot coerce type 'closure' to vector of type 'double'
In each case, a different (interpretable) error message was produced. I suspect that the ancestor error results from something weird happening during the parallel processing, but I am not sure what that could be. In any case, examples that produce this ancestor error would be appreciated. Maybe from there I can work my way back to what goes awry in my calculations.
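One way to narrow down a sporadic failure on the workers is to catch the condition inside each task and return it together with the task index, so that the failing inputs can be replayed sequentially afterwards. The following is only a sketch under assumptions: `risky_task()` is a hypothetical stand-in for the real per-task work, and the simulated failure has nothing to do with the actual cause of the ancestor error.

```r
library(foreach)
library(doParallel)

# Small local cluster just for the illustration
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Hypothetical task: fails for one input to mimic a sporadic error
risky_task <- function(i) {
  if (i == 3) stop("simulated failure in task ", i)
  sqrt(i)
}

# Catch errors per task instead of letting the first one abort the whole loop
results <- foreach(i = 1:5) %dopar% {
  tryCatch(
    list(task = i, value = risky_task(i), error = NA_character_),
    error = function(e) {
      list(task = i, value = NA_real_, error = conditionMessage(e))
    }
  )
}

parallel::stopCluster(cl)

# Tasks that failed, with their messages, ready to be re-run sequentially
failed <- Filter(function(r) !is.na(r$error), results)
```

Re-running only the entries in `failed` with `%do%` instead of `%dopar%` would show whether the error depends on the input or only appears on the workers.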
Update
I still cannot figure out what is going on with the parallelization, even after adding a traceback to the script. This is what it delivers:
Error in { :
task 34 failed - "Problem with `mutate()` column `grid_estimates`.
ℹ `grid_estimates = map(data, ~estimate_var_on_grid(process = ., TMax = TMax, grid = grid))`.
✖ Problem with `mutate()` column `slope`.
ℹ `slope = map2_dbl(m, delta, ~compute_slope(process, .x, .y, TMax))`.
✖ could not find function "::""
Calls: compute_metrics_on_stable_splits ... tibble -> tibble_quos -> eval_tidy -> %dopar% -> <Anonymous>
11: (function ()
traceback(2))()
10: stop(simpleError(msg, call = expr))
9: e$fun(obj, substitute(ex), parent.frame(), e$data)
8: foreach(i = itx, .packages = c("tidyverse", "yardstick", "rsample"),
.export = #vector of exports removed for legibility
) %dopar% {
i %>%
pull(splits) %>%
.[[1]] %>%
train_and_test(., train_grid = grid, my_mset = my_mset,
method = method, TMax = TMax_eval)
}
}
7: eval_tidy(xs[[j]], mask)
6: tibble_quos(xs, .rows, .name_repair)
5: tibble(metrics = .)
4: list2(...)
3: bind_cols(select(splits, alpha), .)
2: foreach(i = itx, .packages = c("tidyverse", "yardstick", "rsample"),
.export = #vector of exports removed for legibility
) %dopar% {
i %>%
pull(splits) %>%
.[[1]] %>%
train_and_test(., train_grid = grid, my_mset = my_mset,
method = method, TMax = TMax_eval)
}
} %>%
tibble(metrics = .) %>%
bind_cols(select(splits, alpha), .)
1: compute_metrics_on_stable_splits(method = method, grid = grid,
my_mset = metric_set(accuracy, mcc, sens, spec), TMax_eval = TMax_eval,
v = 40)
The error is now `could not find function "::"`, which is as weird as the ancestor error. At other times I also received
'rho' must be an environment not pairlist: detected in C-level eval
Apparently, the error can differ even though the code in the script stays the same. At this point any clue would be appreciated. What is weird is that the exact same code sometimes fails with a changing error message and sometimes completes (and if I didn't need to run more computations with this script, I would already be happy with the results I get when the code finishes successfully).
Session Info
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.2 (Ootpa)
Matrix products: default
BLAS/LAPACK: /pfs/data5/software_uc2/all/toolkit/Intel_OneAPI/mkl/2021.4.0/lib/intel64/libmkl_intel_lp64.so.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] yardstick_0.0.9 doParallel_1.0.16 iterators_1.0.13 foreach_1.5.1
[5] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
[9] readr_2.1.1 tidyr_1.1.4 tibble_3.1.6 ggplot2_3.3.5
[13] tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] tidyselect_1.1.1 haven_2.4.3 colorspace_2.0-2 vctrs_0.3.8
[5] generics_0.1.1 utf8_1.2.2 rlang_0.4.12 pillar_1.6.4
[9] glue_1.5.1 withr_2.4.3 DBI_1.1.1 dbplyr_2.1.1
[13] modelr_0.1.8 readxl_1.3.1 lifecycle_1.0.1 plyr_1.8.6
[17] munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0 rvest_1.0.2
[21] codetools_0.2-18 tzdb_0.2.0 fansi_0.5.0 broom_0.7.10
[25] Rcpp_1.0.7 scales_1.1.1 backports_1.4.0 jsonlite_1.7.2
[29] fs_1.5.1 hms_1.1.1 stringi_1.7.6 grid_4.1.2
[33] cli_3.1.0 tools_4.1.2 magrittr_2.0.1 crayon_1.4.2
[37] pkgconfig_2.0.3 ellipsis_0.3.2 xml2_1.3.3 pROC_1.18.0
[41] reprex_2.0.1 lubridate_1.8.0 assertthat_0.2.1 httr_1.4.2
[45] rstudioapi_0.13 R6_2.5.1 compiler_4.1.2
Comments
[...] `sessionInfo()`). Perhaps agrep on their source code might find the culprit. Also, it's useful to add a traceback – Hyaena
`x` is not defined in your environment, and in the last error you have used a function `slice` instead of a value. – Solidarity
[...] `sessionInfo()` to reduce the size of the haystack a bit. – Hyaena
[...] `bad generic call environment` error message. Don't know if that helps... – Kemberlykemble