Folders not showing up in Bucket storage

So my problem is that I have a few files not showing up in gcsfuse when mounted. I can see them in the online console and when I 'ls' with gsutil. Also, if I manually create the folder in the bucket, I can then see the files inside it, but I need to create it first. Any suggestions?

gs://mybucket/
  dir1/
    ok.txt
    dir2
      lafu.txt

If I mount mybucket with gcsfuse and run 'ls', it only returns dir1/ok.txt. If I then create the folder dir2 inside dir1 under the mount point, 'lafu.txt' suddenly shows up.

Synthesis answered 11/7, 2016 at 15:42 Comment(1)
What incredibly odd behavior. Sure enough, after I re-created the three layers of parent directories by hand, the last layer had my file inside. Poor form, Google. :/ - Delicacy

By default, gcsfuse won't show a directory "implicitly" defined by a file with a slash in its name. For example, if your bucket contains an object named dir/foo.txt, you won't be able to find it unless there is also an object named dir/.
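
To make the rule concrete, here is a minimal sketch (not part of gcsfuse) that reads object names on stdin and prints every parent prefix that has no placeholder object of its own. The function name list_implicit_dirs is hypothetical, and the names are assumed to be given without the gs://bucket/ prefix.

```shell
# Hypothetical helper: reads object names (one per line, without gs://bucket/)
# on stdin and prints each "directory" prefix that has no object named exactly "prefix/".
list_implicit_dirs() {
    awk '
        {
            names[$0] = 1                      # remember every object name seen
            n = split($0, parts, "/")
            prefix = ""
            # every leading path component is a directory prefix: "dir1/", "dir1/dir2/", ...
            for (i = 1; i < n; i++) {
                prefix = prefix parts[i] "/"
                dirs[prefix] = 1
            }
        }
        END {
            for (d in dirs)
                if (!(d in names)) print d     # no placeholder object for this prefix
        }
    ' | LC_ALL=C sort
}
```

For the bucket in the question, `printf '%s\n' dir1/ dir1/ok.txt dir1/dir2/lafu.txt | list_implicit_dirs` prints `dir1/dir2/`, which is the directory that stays hidden until it is created explicitly.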

You can work around this by setting the --implicit-dirs flag, but there are good reasons why this is not the default. See the documentation for more information.
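
As a rough sketch of that workaround, the flag goes before the bucket name and the mount point. The wrapper function name, bucket name, and mount path below are placeholders, not anything prescribed by gcsfuse.

```shell
# Hypothetical wrapper; the bucket name and mount point are supplied by the caller.
mount_with_implicit_dirs() {
    bucket=$1
    mount_pt=$2
    mkdir -p "$mount_pt"                           # the mount point must exist before mounting
    gcsfuse --implicit-dirs "$bucket" "$mount_pt"  # infer directories from object names
}
```

For example, `mount_with_implicit_dirs mybucket "$HOME/mnt"` followed by `ls "$HOME/mnt/dir1/dir2"` should then list lafu.txt without creating dir2 by hand, at the cost of the extra latency the documentation describes.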

Conformable answered 12/7, 2016 at 3:51 Comment(6)
Thank you very much!! This is what I was searching for. Latency is not that big of a problem, so this solves everything :) - Synthesis
Done :) I didn't know that was a thing. (My first Stack Overflow question) - Synthesis
I appreciate the documentation link explanation, but that's still a questionable UI. Perhaps detection of 'invisible' 'directories' leading to a notification pointing to the appropriate documentation (or suggestion of the --implicit-dirs flag) would be appropriate. I shouldn't have to waste an hour of my time trying to figure out what's going on. - Delicacy
I file this one under "things I would have never ever ever solved without Stack Overflow" ;) - Multilateral
Can someone explain the drawbacks (besides the latency part)? I have read it and don't really understand it. The third point seems scary; should we avoid using the flag, then? - Nebuchadnezzar
There simply MUST be an option to "--pre-create-implicit-dirs". It just makes sense and matches so many use-cases, I imagine! Why isn't it there? - Lonnalonnard

Google Cloud Storage doesn't have folders. The various interfaces use different tricks to pretend that folders exist, but ultimately there's just an object whose name contains a bunch of slashes. For example, "pictures/january/0001.jpg" is the full name of a single object.

If you need to be sure that a "folder" exists, put an object inside it.
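
On a gcsfuse mount, one way to do this for a single object is to create its parent directories through the mount, which is what made lafu.txt appear in the question. The sketch below assumes the bucket is already mounted; ensure_parent_dirs is a hypothetical helper name.

```shell
# Hypothetical helper: make the parent "folders" of one object explicit on a gcsfuse mount.
# mount_pt is where the bucket is mounted; object_name is the name without gs://bucket/.
ensure_parent_dirs() {
    mount_pt=$1
    object_name=$2
    case $object_name in
        */*) mkdir -p "$mount_pt/${object_name%/*}" ;;  # dir1/dir2 for dir1/dir2/lafu.txt
        *)   : ;;                                       # top-level object: nothing to create
    esac
}
```

Calling `ensure_parent_dirs "$HOME/mnt" pictures/january/0001.jpg` would create pictures/ and pictures/january/ under the mount point if they are missing.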

Ode answered 11/7, 2016 at 16:5 Comment(1)
Thanks for the clarification, already helps. I think I didn't explain myself too well then, I'll modify the question. - Synthesis

@Brandon Yarbrough suggests creating needed directory entries in the GCS bucket. This avoids the performance penalty described by @jacobsa.

Here is a bash script for doing so:

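#!/usr/bin/env bash
# Usage: mk_bucket_dirs.sh MOUNT_PT BUCKET_NAME [DEL_OUTFILE]
# 1.  Mount $BUCKET_NAME at $MOUNT_PT
# 2.  Run this script
MOUNT_PT=${1:-$HOME/mnt}
BUCKET_NAME=${2:?bucket name is required}
DEL_OUTFILE=${3:-y}    # Set to y or n

echo "Reading objects in $BUCKET_NAME"
OUTFILE=dir_names.txt
# List every object, reduce each name to its parent "directory", and de-duplicate.
# The URL is quoted so that gsutil, not the shell, expands the ** wildcard.
gsutil ls -r "gs://$BUCKET_NAME/**" | while IFS= read -r BUCKET_OBJ
do
    dirname "$BUCKET_OBJ"
done | sort -u > "$OUTFILE"
echo "Processing directories found"
while IFS= read -r DIR_NAME
do
    # Strip the gs://bucket prefix, with or without a trailing slash
    LOCAL_DIR=${DIR_NAME#"gs://$BUCKET_NAME"}
    LOCAL_DIR=${LOCAL_DIR#/}
    TARG_DIR="$MOUNT_PT/$LOCAL_DIR"
    if ! [ -d "$TARG_DIR" ]
    then
        echo "Creating $TARG_DIR"
        # -p also creates any missing intermediate directories
        mkdir -p "$TARG_DIR"
    fi
done < "$OUTFILE"
if [ "$DEL_OUTFILE" = "y" ]
then
    rm "$OUTFILE"
fi
echo "Process complete"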

I wrote this script, and have shared it at https://github.com/mherzog01/util/blob/main/sh/mk_bucket_dirs.sh.

This script assumes that you have mounted a GCS bucket locally on a Linux (or similar) system. It takes the mount point and the bucket name as arguments, identifies all "directories" in the GCS bucket which are not visible locally, and creates them.

This (for me) fixed the issue with folders (and associated objects) not showing up in the mounted folder structure.

Christoforo answered 27/10, 2020 at 19:2 Comment(3)
If you are linking to your own script then please add a proper affiliation in your answer. Otherwise, it will be considered spam. - Polyzoan
Just a link to your GitHub repo doesn't make for an answer on Stack Overflow. Answers must actually answer the question, without the requirement that the user click through to some other site to get the answer. Please add context around links. Always quote the most relevant part of an important link, in case the target site is unreachable or goes permanently offline. Take into account that being barely more than a link to an external site is a reason as to "Why and how are some answers deleted?". - Orelia
Thank you for adding affiliation. However, to get the real answer (your script), one still has to go off-site. That might be reasonable, if the code required exceeded the capacity of an answer (then only major parts would need to be in the answer), but in this case, the script fits in an answer. In cases where I have something like this, I've both included the code in the answer and provided a link to it on GitHub, perhaps mentioning that the GitHub version is going to be the most current. As it is, this is still just an announcement that your script exists, rather than the actual answer. - Orelia
