How to count the number of files in a bucket folder with gsutil

Is there an option to count the number of files in bucket-folders?

Like:

gsutil ls -count -recursive gs://bucket/folder

Result:   666 files

I just want a total number of files, to compare the count against the sync folder on my server.

I can't find anything about this in the manual.

Pennypennyaliner answered 24/9, 2013 at 15:50 Comment(0)

Newer Approach


gsutil now has a du command. This makes it even easier to get a count:

$ gsutil du gs://pub | wc -l
232
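
Note that du also prints a rollup line for each "subdirectory" under the path (a commenter points this out below), which inflates the count. A minimal workaround, assuming those rollup lines are the ones ending in a slash (the same filter used in the bucket-looping answer further down):

$ gsutil du gs://pub | grep -v '/$' | wc -l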

Older Approach


The gsutil ls command with options -l (long listing) and -R (recursive listing) will list the entire bucket recursively and then produce a total count of all objects, both files and directories, at the end:

$ gsutil ls -lR gs://pub
    104413  2011-04-03T20:58:02Z  gs://pub/SomeOfTheTeam.jpg
       172  2012-06-18T21:51:01Z  gs://pub/cloud_storage_storage_schema_v0.json
      1379  2012-06-18T21:51:01Z  gs://pub/cloud_storage_usage_schema_v0.json
   1767691  2013-09-18T07:57:42Z  gs://pub/gsutil.tar.gz
   2445111  2013-09-18T07:57:44Z  gs://pub/gsutil.zip
      1136  2012-07-19T16:01:05Z  gs://pub/gsutil_2.0.ReleaseNotes.txt
... <snipped> ...

gs://pub/apt/pool/main/p/python-socksipy-branch/:
     10372  2013-06-10T22:52:58Z  gs://pub/apt/pool/main/p/python-socksipy-branch/python-socksipy-branch_1.01_all.deb

gs://pub/shakespeare/:
        84  2010-05-07T23:36:25Z  gs://pub/shakespeare/rose.txt
TOTAL: 144 objects, 102723169 bytes (97.96 MB)

If you really just want the total, you can pipe the output to the tail command:

$ gsutil ls -lR gs://pub | tail -n 1
TOTAL: 144 objects, 102723169 bytes (97.96 MB)
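
If you want just the number by itself, say for use in a script, you can additionally pull the second field out of that summary line. This assumes the TOTAL line format shown above:

$ gsutil ls -lR gs://pub | tail -n 1 | awk '{print $2}'
144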
Slunk answered 24/9, 2013 at 16:20 Comment(12)
Great, thanks ... just a little bit slow for 4 million files. Is this a single operation, or is it counted per bucket element? ... could become expensive .. :-)Pennypennyaliner
It does an object listing on the bucket, and pages through the results, I think 1000 at a time, so it will make N/1000 calls, where N is the number of objects you have. This is a class A operation per the pricing page.Slunk
Hello, just logged in to say thanks, this helped. I was trying to use find but that was not supported, so while searching for an alternative I stumbled upon your answer. It's been a great help.Disorganization
the gsutil solution works great in gsutil v4.15, @jterrace, but only if there are no "subdirectories" in the bucket/path you are listing. If there are subdirectories, du will roll up the size of the files below each directory and print a line to stdout for that directory (making the file count incorrect). Sorry for the late update to an old question.Florri
While gsutil ls -l works, is there a way in Windows (no tail or wc) to get a summary without needing to list the entire bucket contents?Massimo
du and ls aren't counting as much as wc -l is.Gies
@Slunk Great, thanks. It also counts each directory as an object, which adds to the count. Can we somehow count only files, excluding directories?Prescript
@Slunk looks like du is giving file sizes, not counts!Industrials
@Industrials - yes, but it prints one per line, which is why I pipe to wc -lSlunk
@Massimo If you have Git for Windows, you have Git Bash. Use that.Bellarmine
Clearly GCP is using this to get more money from us. They clearly know the size and count. It should be available in the API. We should not accept less.Maltase
@YogeshPatil A minor trick to ignore the directory itself: gsutil du gs://folder/* | wc -lAct

If you have the option to not use gsutil, the easiest way is to check it on Google Cloud Platform. Go to Monitoring > Metrics explorer:

  • Resource type : GCS Bucket
  • Metric : Object count

Then, in the table below, you can see, for each bucket, the number of objects it contains.
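
If you'd rather read the same metric from a script, the Monitoring API exposes it as storage.googleapis.com/storage/object_count. A rough sketch, not an exact recipe (PROJECT_ID and the time window are placeholders to fill in):

$ curl -s -G "https://monitoring.googleapis.com/v3/projects/PROJECT_ID/timeSeries" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    --data-urlencode 'filter=metric.type="storage.googleapis.com/storage/object_count"' \
    --data-urlencode 'interval.startTime=2021-11-16T00:00:00Z' \
    --data-urlencode 'interval.endTime=2021-11-17T00:00:00Z'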
Presently answered 17/11, 2021 at 9:41 Comment(5)
this is an underappreciated answer.Distillate
This is WAY faster than using gsutil if you aren't doing something programmatically and you just need the count, AND it doesn't dip into your Class A Operations quota.Dagnah
Especially helpful when your bucket has more than a million objects and the total size exceeds a few GBs.Primogenitor
Of course, this only works if you want to count the amount of files in the entire bucket. You can't use this to check the amount of files in a specific folder inside the bucket.Sundew
The downside to this great solution is that the metric is only calculated once per day. This means that any results shown are stale and may not reflect the current state.Infinite

You want gsutil ls -count -recursive in gs://bucket/folder? Alright: gsutil ls gs://bucket/folder/** will list just the full URLs of the files under gs://bucket/folder, without the footer or the lines ending in a colon. Piping that to wc -l will give you the line count of the result.

gsutil ls gs://bucket/folder/** | wc -l
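
Since ** matches across directory levels (see the comments below), the same trick can be narrowed with a more specific glob. For instance, a hypothetical filter for .js files only:

gsutil ls gs://bucket/folder/**/*.js | wc -l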

Gies answered 24/5, 2017 at 1:27 Comment(2)
Why use ** not just *?Sundsvall
@Sundsvall I think in this case they might be equivalent, but ** works across multiple levels at once: /folder/**/*.js would find all .js files at any depth below folder (except in folder itself), while /folder/*/*.js would only match .js files exactly one directory down.Gies

As someone who had 4.5M objects in a bucket, I used gsutil du gs://bucket/folder | wc -l, which took ~24 minutes.

Wanitawanneeickel answered 25/10, 2021 at 23:54 Comment(0)

This doesn't work recursively, but you can also get the count of a single large folder from the console. This method has the advantage of being very fast.

  1. In your bucket, select Sort and filter from the filter menu.

  2. Reverse the sort order to let Google Cloud Storage calculate the number of files/folders.

  3. View the count of files/folders in the current folder.

Achlamydeous answered 27/12, 2022 at 20:5 Comment(2)
This is a wise solution!Bumble
It will only work if you don't have a lot of files. We have a lot of files in the bucket and it shows: "1 – 100 of many"Dilemma
# Recursively list everything under the folder and keep only the final summary line:
gsutil ls -lR gs://Folder1/Folder2/Folder3/** | tail -n 1
Celestinacelestine answered 11/2, 2020 at 11:17 Comment(0)

This gist shows how to iterate through all Cloud Storage buckets and list the number of objects in each. Compliments of @vinoaj

# For each bucket in the project, print its object count followed by its name.
for VARIABLE in $(gsutil ls)
do
  # du prints one line per object; drop the per-directory rollup lines, then count.
  echo $(gsutil du "$VARIABLE" | grep -v '/$' | wc -l) "$VARIABLE"
done

To filter buckets, add a grep such as for VARIABLE in $(gsutil ls | grep "^gs://bucketname")

In the console, you can click Activate Cloud Shell in the top right and paste this in to get results. If you save the commands as a bash script, then run chmod u+x program_name so the script can run in the GCP Cloud Shell.

NOTE: When you do gsutil du gs://my-bucket/logs | wc -l the result includes an "extra" result for each bucket and sub-directory. For example, 3 files in a top-level bucket will be 4. 3 files in a sub-directory will be 5.
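
A sketch of a workaround, reusing the grep filter from the loop above to drop those directory lines before counting:

gsutil du gs://my-bucket/logs | grep -v '/$' | wc -l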

Obi answered 28/7, 2021 at 9:32 Comment(0)

The official way to show the count of files (or the size of a bucket): https://console.cloud.google.com/monitoring/metrics-explorer

To see it split by bucket name, change the aggregation at the top to Unaggregated.

Parrotfish answered 11/1 at 10:50 Comment(0)
