Download all files and subdirectories from a Google Drive directory from R
r

There are some prior related questions (1, 2, 3), but none of them is quite what I want, and I can't get the code example that Jenny Bryan posted in 2018 to work.

I have a folder shared with me that contains some large files. The files are nested, so I want to recurse into the sub-directories and get all files from each. In my case there are only two layers, but it would be nice to have an approach that works for an arbitrary number of layers.

The most obvious command to try is simply telling it to download the folder, hoping it will figure out the substructure:

#load the libraries
library(tidyverse)
library(googledrive)

#folder link to id
#hidden for privacy reasons
jp_folder = "https://drive.google.com/drive/folders/XXXXX" 
folder_id = drive_get(as_id(jp_folder))

#download in entirety
drive_download(folder_id)

Unfortunately, this doesn't work because it apparently cannot deal with folders:

> drive_download(folder_id)
Error: Not a recognized Google MIME type:
  * application/vnd.google-apps.folder

Here's my attempt at avoiding this issue by going into each subdir:

#load the libraries
library(tidyverse)
library(googledrive)

#folder link to id
#hidden for privacy reasons
jp_folder = "https://drive.google.com/drive/folders/XXXXX" 

#get the id data frame
folder_id = drive_get(as_id(jp_folder))

#find files in folder
files = drive_ls(folder_id)

#loop dirs and download files inside them
for (i in seq_along(files$name)) {
  i_dir = drive_ls(files$id[i])
  
  #download files
  walk(i_dir$id, ~ drive_download(as_id(.x)))
}

The files object seems fine (replacing the strings with fillers):

# A tibble: 6 x 3
  name   id                                drive_resource   
* <chr>  <chr>                             <list>           
1 A      AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA <named list [32]>
2 B      BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB <named list [32]>
3 C      CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC <named list [31]>
4 D      DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD <named list [31]>
5 E      EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE <named list [31]>
6 F      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF <named list [31]>

However, when one attempts to get the contents of the subdir, it throws this error:

> i_dir = drive_ls(files$id[i])
Error: 'path' does not identify at least one Drive file.

What's wrong here?

Weevily answered 4/11, 2020 at 20:57 Comment(0)

Actually, it is simple: drive_ls() treats a bare character string as a path (a file name), not as a file ID, which is why it reports that 'path' does not identify a Drive file. Give it a one-row data frame (a dribble) instead, or mark the ID with as_id(). The error message could be clearer about this. With that change and the required loops, one can automatically download the contents of the sub-dirs. The code below will fail if there are sub-sub-dirs; handling those needs a proper recursive function, which would ideally be implemented in the package.
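As an alternative to passing a one-row data frame, the ID string can be marked with as_id() before it goes to drive_ls(). A minimal sketch, assuming the IDs come from files$id as in the question; list_by_id is a hypothetical helper name, not part of googledrive:

```r
# Hypothetical helper (not part of googledrive): list a folder's contents
# given its bare ID string, e.g. one element of files$id.
list_by_id <- function(id) {
  # as_id() marks the string as a Drive file ID, so drive_ls() does not
  # interpret it as a path (file name) to look up.
  googledrive::drive_ls(googledrive::as_id(id))
}

# Usage inside the loop from the question (needs network access and auth):
# i_dir <- list_by_id(files$id[i])
```

Newer googledrive releases may already return the id column with a dedicated ID class, in which case the original call might work unchanged; wrapping in as_id() should be harmless either way.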

This code works for me:

#load the libraries
library(stringr)
library(googledrive)
  
#folder link to id
jp_folder = "https://drive.google.com/drive/folders/XXXXX"
folder_id = drive_get(as_id(jp_folder))

#find files in folder
files = drive_ls(folder_id)

#loop dirs and download files inside them
for (i in seq_along(files$name)) {
  #list files
  i_dir = drive_ls(files[i, ])
  
  #mkdir
  dir.create(files$name[i])
  
  #download files
  for (file_i in seq_along(i_dir$name)) {
    #fails if already exists
    try({
      drive_download(
        as_id(i_dir$id[file_i]),
        path = str_c(files$name[i], "/", i_dir$name[file_i])
      )
    })
  }
}

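This version skips files that have already been downloaded: drive_download() raises an error when the local file exists, and try() lets the loop carry on with the next file.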
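Because try() hides every error, not only the "file already exists" one, it may be preferable to check the local disk explicitly. A small sketch in base R; needs_download is a hypothetical helper name, and the commented usage assumes the files and i_dir objects from the loop above:

```r
# Hypothetical helper: for each file name, report whether it is still
# missing from local_dir (TRUE means it has to be downloaded).
needs_download <- function(local_dir, file_names) {
  !file.exists(file.path(local_dir, file_names))
}

# Usage inside the outer loop (needs network access and auth):
# todo <- needs_download(files$name[i], i_dir$name)
# for (file_i in which(todo)) {
#   googledrive::drive_download(
#     i_dir[file_i, ],
#     path = file.path(files$name[i], i_dir$name[file_i])
#   )
# }
```

With this check, genuine download failures (network errors, permission problems) surface as errors instead of being silently skipped.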

Weevily answered 4/11, 2020 at 20:57 Comment(1)
Hey, this is excellent! Just a note: you don't need the full tidyverse for this. You are only using stringr, so library(stringr) keeps the load to a minimum, since tidyverse is huge (especially if you need to install it). - Phage

If you want to recurse into the sub-directories and download all files and folders (i.e. copy down the whole parent folder), you need a recursive, user-defined function:

library(googledrive)
setwd('YOUR DOWNLOADING DESTINATION HERE')
# Authenticate with Google Drive
drive_auth()

# Define the recursive download function
download_folder <- function(folder_id, local_path) {
  # Create the local directory if it doesn't exist
  dir.create(local_path, showWarnings = FALSE, recursive = TRUE)
  
  # List all items in the folder
  items <- drive_ls(as_id(folder_id))
  
  # Loop through each item
  for (i in seq_len(nrow(items))) {
    item <- items[i, ]
    
    # Check if the item is a folder
    if (item$drive_resource[[1]]$mimeType == "application/vnd.google-apps.folder") {
      # If it's a folder, recursively download its contents
      subfolder_local_path <- file.path(local_path, item$name)
      download_folder(item$id, subfolder_local_path)
      # The recursion happens here: download_folder() calls itself
      # on the subfolder.
    } else {
      # If it's a file, download it to the local path
      cat(sprintf("Downloading file: %s\n", file.path(local_path, item$name)))
      drive_download(
        file = item,
        path = file.path(local_path, item$name),
        overwrite = TRUE
      )
    }
  }
}

# Specify the URL of the Google Drive folder that you want to download
folder_url <- "https://drive.google.com/drive/u/0/folders/REPLACE THIS"
folder_id <- as_id(folder_url)

# Specify the local path where you want to save the folder
local_folder <- "YOUR FOLDER NAME"

# Start the recursive download
download_folder(folder_id, local_folder)

Lumpy answered 1/10 at 13:46 Comment(0)
