On a Linux machine I would like to traverse a folder hierarchy and get a list of all of the distinct file extensions within it.
What would be the best way to achieve this from a shell?
Try this (not sure if it's the best way, but it works):
find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u
It works as follows: find lists every file, the Perl filter prints whatever follows the final dot of each name (the character class [^.\/]+ excludes dots and slashes, so dots in directory names don't count, and it also swallows the trailing newline so each extension lands on its own line), and sort -u removes duplicates.
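To see what the filter extracts, here it is run over a few made-up paths:
$ printf '%s\n' ./a/b.txt ./c.tar.gz ./.bashrc ./noext | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u
bashrc
gz
txt
Note that ./noext produces nothing, and only the gz of .tar.gz is reported.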
If you use this in an alias in .bashrc, you have to escape $1 as \$1. In fact, it seems escaping $1 doesn't do harm for console usage either. – Agency
In a Git repository, use git ls-tree -r HEAD --name-only instead of find. – Agency
Note that this prints only the last component of a compound extension, e.g. inc rather than theme.inc from page_manager.theme.inc. – Ci
To sort by frequency as well: find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort | uniq -c | sort -n – Ypres
This also matches files like configs-0.1.6 which don't have extensions but have dots in their name. – Amimia
A shorter Perl filter: perl -ne 's/.+\.// && print' – Vandavandal
No need for the pipe to sort; awk can do it all:
find . -type f | awk -F. '!a[$NF]++{print $NF}'
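The !a[$NF]++ idiom prints a line only the first time its last field is seen, which is what removes the duplicates. For example, on a few made-up names:
$ printf '%s\n' a.txt b.txt c.md | awk -F. '!a[$NF]++{print $NF}'
txt
md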
I wanted to use this in an alias command, but the command itself already uses quotes in the find command. To fix this, use bash's literal string syntax, like so: alias file_ext=$'find . -type f -name "*.*" | awk -F. \'!a[$NF]++{print $NF}\'' – Sturgeon
This misreports files without an extension that sit in directories with dots in their names, e.g. maindir/test.dir/myfile. – Eudemonics
Add -printf "%f\n" to the end of the 'find' command and re-run your test. – Sturgeon
My awk-less, sed-less, Perl-less, Python-less POSIX-compliant alternative:
find . -name '*.?*' -type f | rev | cut -d. -f1 | rev | tr '[:upper:]' '[:lower:]' | sort | uniq --count | sort -rn
The trick is that it reverses each line, cuts the extension from what is then the front, and reverses it back.
It also converts the extensions to lower case.
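Here is the reversal trick applied to a single made-up filename:
$ echo 'Holiday.Video.MP4' | rev | cut -d. -f1 | rev | tr '[:upper:]' '[:lower:]'
mp4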
Example output:
3689 jpg
1036 png
610 mp4
90 webm
90 mkv
57 mov
12 avi
10 txt
3 zip
2 ogv
1 xcf
1 trashinfo
1 sh
1 m4v
1 jpeg
1 ini
1 gqv
1 gcs
1 dv
On my system uniq doesn't have the full flag --count, but -c works just fine. – Guck
find . -type f -name '*.?* .... ', not fully tested but should work. – Rosalynrosalynd
My uniq also lacks --count, but it does have -c. – Hershel
find . -name '*.*' -type f | rev | cut -d. -f1 | rev | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn (to limit results to files that have an extension) – Unknowable
Recursive version:
find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u
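To see what the two sed passes do, here they are on two made-up paths (note that an extensionless file still falls through as its bare name):
$ echo './some.dir/archive.tar.gz' | sed -e 's/.*\.//' | sed -e 's/.*\///'
gz
$ echo './some.dir/README' | sed -e 's/.*\.//' | sed -e 's/.*\///'
README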
If you want totals (how many times each extension was seen):
find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort | uniq -c | sort -rn
Non-recursive (single folder):
for f in *.*; do printf "%s\n" "${f##*.}"; done | sort -u
I based this on a forum post; credit should go there.
The same sed chain works on git output: git show --name-only --pretty="" | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u – Hammer
Powershell:
dir -recurse | select-object extension -unique
Thanks to http://kevin-berridge.blogspot.com/2007/11/windows-powershell.html
Note this also includes directories with a . in them (e.g. jquery-1.3.4 will show up as .4 in the output). Change to dir -file -recurse | select-object extension -unique to get only file extensions. – Aguilera
Adding my own variation to the mix. I think it's the simplest of the lot and can be useful when efficiency is not a big concern.
find . -type f | grep -oE '\.(\w+)$' | sort -u
$ find . -type f | grep -o -E '\.[^.\/]+$' | sort -u – Coom
$ find . -type f | grep -Eo '\.(\w+)$' | sort -u. The original one showed files without an extension in my case, which was not what I needed. – Aiden
Find everything with a dot and show only the suffix.
find . -type f -name "*.*" | awk -F. '{print $NF}' | sort -u
If you know all suffixes have 3 characters, then:
find . -type f -name "*.???" | awk -F. '{print $NF}' | sort -u
Or, with sed, show all suffixes with one to four characters. Change {1,4} to the range of characters you are expecting in the suffix:
find . -type f | sed -n 's/.*\.\(.\{1,4\}\)$/\1/p'| sort -u
I tried a bunch of the answers here, even the "best" answer. They all came up short of what I was specifically after. So after the past 12 hours of sitting in regex code for multiple programs, and reading and testing these answers, this is what I came up with. It works EXACTLY like I want.
find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{2,16}" | awk '{print tolower($0)}' | sort -u
If you need a count of the file extensions, use the code below:
find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{2,16}" | awk '{print tolower($0)}' | sort | uniq -c | sort -rn
While these methods will take some time to complete and probably aren't the best ways to go about the problem, they work.
Update: Per @alpha_989, long file extensions will cause an issue. That's due to the original regex, "[[:alpha:]]{3,6}". I have updated the answer to use the regex "[[:alpha:]]{2,16}". Anyone using this code should be aware that those numbers are the minimum and maximum length allowed for an extension in the final output; anything outside that range will be split into multiple lines in the output.
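For example, a made-up 20-letter extension is split at the 16-character cap:
$ echo '.abcdefghijklmnopqrst' | grep -o -E "[[:alpha:]]{2,16}"
abcdefghijklmnop
qrst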
Note: Original post did read "- Greps for file extensions between 3 and 6 characters (just adjust the numbers if they don't fit your need). This helps avoid cache files and system files (system file bit is to search jail)."
Idea: Could be used to find file extensions over a specific length via:
find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{4,}" | awk '{print tolower($0)}' | sort -u
Where 4 is the minimum extension length to include; this then also finds any extensions beyond that length.
find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{2,16}" | awk '{print tolower($0)}' | sort | uniq -c | sort -rn - this works well, but is there a way to get the total file size of each php extension? – Castorena
In Python, using generators for very large directories, including blank extensions, and getting the number of times each extension shows up:
import json
import collections
import itertools
import os

root = '/home/andres'

# Lazily chain together the file lists from every directory under root
files = itertools.chain.from_iterable(
    files for _, _, files in os.walk(root)
)

# Count each extension ('' is counted for files without one)
counter = collections.Counter(
    os.path.splitext(file_)[1] for file_ in files
)

print(json.dumps(counter, indent=2))
Since there's already another solution which uses Perl:
If you have Python installed you could also do (from the shell):
python -c "import os;e=set();[[e.add(os.path.splitext(f)[-1]) for f in fn] for _,_,fn in os.walk('/home')];print('\n'.join(e))"
Another way:
find . -type f -name "*.*" -printf "%f\n" | while IFS= read -r; do echo "${REPLY##*.}"; done | sort -u
You can drop the -name "*.*", but this ensures we are dealing only with files that do have an extension of some sort.
The -printf is find's print, not bash's. -printf "%f\n" prints only the filename, stripping the path (and adds a newline).
Then we use parameter expansion, ${REPLY##*.}, to remove everything up to and including the last dot, leaving just the extension.
Note that $REPLY is simply read's built-in default variable. We could just as well use our own, as in while IFS= read -r file, and then $file would be the variable.
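A minimal illustration of that parameter expansion, on a made-up name:
f='archive.tar.gz'
echo "${f##*.}"   # gz - longest *. prefix stripped
echo "${f#*.}"    # tar.gz - shortest match, shown for comparison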
None of the replies so far deal with filenames with newlines properly (except for ChristopheD's, which just came in as I was typing this). The following is not a shell one-liner, but works, and is reasonably fast.
import os, sys

def names(roots):
    # Yield every file's basename under each given root, recursively
    for root in roots:
        for _, _, basenames in os.walk(root):
            for basename in basenames:
                yield basename

sufs = set(os.path.splitext(x)[1] for x in names(sys.argv[1:]))
for suf in sufs:
    if suf:
        print(suf)
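For comparison, a newline-safe shell pipeline is also possible. This is just a sketch; it relies on find's -print0 and bash's read -d '', which are common but not strictly POSIX:
find . -type f -print0 |
while IFS= read -r -d '' path; do
  name=${path##*/}                        # drop the directory part
  case $name in
    *.*) printf '%s\n' "${name##*.}" ;;   # print only the extension
  esac
done | sort -u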
I think the simplest and most straightforward way is:
for f in *.*; do echo "${f##*.}"; done | sort -u
It's a modification of ChristopheD's third suggestion.
I don't think this one was mentioned yet:
find . -type f -exec sh -c 'echo "${0##*.}"' {} \; | sort | uniq -c
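Note that this spawns one shell per file, which is slow on large trees. A batched variant (a sketch using the same expansion) passes many files to each shell:
find . -type f -exec sh -c 'for f; do echo "${f##*.}"; done' sh {} + | sort | uniq -c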
The accepted answer uses a regex, and you cannot create an alias command containing that regex directly; you have to put it into a shell script. I'm using Amazon Linux 2 and did the following:
I put the accepted answer's code into a file:
sudo vim find.sh
Add this code:
find ./ -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u
Save the file by typing :wq!
Then open your profile, sudo vim ~/.bash_profile, and add the alias:
alias getext=". /path/to/your/find.sh"
Save with :wq! and reload the profile:
. ~/.bash_profile
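Alternatively, a shell function in ~/.bash_profile avoids the separate script file entirely. A sketch (the name getext is just carried over from above, and the directory argument is optional):
getext() {
  find "${1:-.}" -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u
}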
You could also do this (PATHTOAPP is a placeholder for your program):
find . -type f -name "*.php" -exec PATHTOAPP {} +
I've found it simple and fast...
# find . -type f -exec basename {} \; | awk -F"." '{print $NF}' > /tmp/outfile.txt
# cat /tmp/outfile.txt | sort | uniq -c | sort -n > /tmp/outfile_sorted.txt
If you are looking for an answer that respects .gitignore, use the following:
git ls-tree -r HEAD --name-only | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u
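If you also want untracked files that are not ignored, git ls-files can list those too; a sketch using its standard --others and --exclude-standard flags:
git ls-files --cached --others --exclude-standard | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u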
Another version of Ondra Žižka's answer:
find . -name '*.?*' -type f | rev | cut -d. -f1 | rev | sort | uniq
On case-sensitive file systems, different cases should, in my opinion, not be treated as the same extension. Also, I don't think counting files is necessary to answer the OP's question.
To skip version-control directories (like .svn), use: find . -type f -path '*/.svn*' -prune -o -print | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u – Moan