How to make purrr map function run faster?

Asked 6/12, 2016 at 22:14 Answered 11/4, 2023 at 12:14

I am using the map function from the purrr package to apply the segmented function (from the segmented package) as follows:

library(purrr)
library(dplyr)
library(segmented)

# Data frame is nested to create list column
by_veh28_101 <- df101 %>% 
  filter(LCType=="CFonly", Lane %in% c(1,2,3)) %>% 
  group_by(Vehicle.ID2) %>% 
  nest() %>% 
  ungroup()

# Functions:
segf2 <- function(df){
  try(segmented(lm(svel ~ Time, data=df), seg.Z = ~Time,
                psi = list(Time = df$Time[which(df$dssvel != 0)]),
                control = seg.control(seed=2)),
      silent=TRUE)
}


segf2p <- function(df){
  try(segmented(lm(PrecVehVel ~ Time, data=df), seg.Z = ~Time,
                psi = list(Time = df$Time[which(df$dspsvel != 0)]),
                control = seg.control(seed=2)),
      silent=TRUE)
}  

# map function:
models8_101 <- by_veh28_101 %>% 
  mutate(segs = map(data, segf2),
         segsp = map(data, segf2p))

The object by_veh28_101 contains 2457 tibbles, and the last step, where the map function is used, takes 16 minutes to complete. Is there any way to make this faster?

Backstay answered 6/12, 2016 at 22:14 Comment(1)

It's not really purrr that's slow here, it's segmented. You're running thousands of models, which takes a while. Profile your code to see exactly what the bottlenecks are. – Olgaolguin 6/12, 2016 at 22:27
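To illustrate the point of the comment, here is a rough timing sketch rather than a full profile (Rprof() or the profvis package would give a per-call breakdown). The function slow_fit and the toy groups list are hypothetical stand-ins, not the segmented models from the question. The idea is to compare the total time of map with the time map takes when the mapped function does nothing.

```r
library(purrr)

# Hypothetical stand-in for an expensive per-group model fit
slow_fit <- function(df) {
  Sys.sleep(0.01)            # simulate the cost of fitting
  lm(y ~ x, data = df)
}

# Toy data: 20 small data frames, standing in for the nested list column
set.seed(1)
groups <- map(1:20, ~ data.frame(x = 1:50, y = rnorm(50)))

# Total time: iteration plus model fitting
total <- system.time(fits <- map(groups, slow_fit))[["elapsed"]]

# Time of the iteration alone, with a function that does no work
overhead <- system.time(map(groups, identity))[["elapsed"]]

cat("total:", total, "s; map overhead:", overhead, "s\n")
```

If the overhead is a tiny fraction of the total, as it should be here, the place to look for savings is the fitted function or running the fits in parallel, not the choice of iteration function.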

You can use the function future_map instead of map.

This function comes from the furrr package and is a parallel counterpart to the map family. See the package README for details.

Because the code in your question is not reproducible, I can't prepare a benchmark comparing map and future_map.

Your code with future_map would look like this:

library(tidyverse)
library(segmented)
library(furrr)


# Data frame stuff....

# Your functions....

# future_map function

# This distributes the work over the cores of your computer.
# You set a "plan" for how the code should run. At the time of writing the
# easiest was `multiprocess`, which picked plan(multicore) on Mac and
# plan(multisession) on Windows. Recent releases of future deprecate
# `multiprocess`, so name the backend explicitly:

plan(multisession)

models8_101 <- by_veh28_101 %>% 
  mutate(segs = future_map(data, segf2),
         segsp = future_map(data, segf2p))
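
Since the question's data is not available, the following is only a sketch of how such a comparison could be timed on a toy workload. The function slow_fit, the groups list, and the choice of two workers are assumptions made for illustration; the speed-up you see with segmented will depend on your data and the number of cores.

```r
library(purrr)
library(furrr)

# Two background R sessions; adjust `workers` to your machine
plan(multisession, workers = 2)

# Hypothetical stand-in for a slow model fit (not the segmented call above)
slow_fit <- function(df) {
  Sys.sleep(0.05)            # simulate the cost of fitting
  coef(lm(y ~ x, data = df))
}

# Toy data standing in for the nested list column
set.seed(1)
groups <- map(1:20, ~ data.frame(x = 1:50, y = rnorm(50)))

# Same function, sequential versus parallel
t_seq <- system.time(res_seq <- map(groups, slow_fit))[["elapsed"]]
t_par <- system.time(res_par <- future_map(groups, slow_fit))[["elapsed"]]

cat("map:", t_seq, "s; future_map:", t_par, "s\n")

# Return to sequential processing, which also shuts the workers down
plan(sequential)
```

Starting the workers and shipping data to them has a cost, so the parallel version pays off mainly when each individual fit is slow, as in the question.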

Mandi answered 9/6, 2018 at 21:0 Comment(1)

Just to note that 'multicore' does not work from within RStudio any more. Forked processing is considered unstable when running R from certain environments, such as the RStudio environment. Because of this, 'multicore' futures have been disabled in those cases since future 1.13.0. https://cran.case.edu/web/packages/future/NEWS – Conjoin 11/11, 2021 at 19:12

For reference: I just rewrote a purrr::map loop (filtering more than 40,000 vector elements in a list with some logical tests) in some simple Rcpp. The purrr version had not finished after more than 2 minutes; the Rcpp version runs in about 2 to 3 seconds.

Maypole answered 11/4, 2023 at 12:14 Comment(1)

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center. – Preemption 13/4, 2023 at 14:52
