I need to loop over all CSV files in a Hadoop file system (HDFS). I can list the files in an HDFS directory with
> hadoop fs -ls /path/to/directory
Found 2 items
drwxr-xr-x - hadoop hadoop 2 2016-10-12 16:20 /path/to/directory/tmp
-rwxr-xr-x 3 hadoop hadoop 4691945927 2016-10-12 19:37 /path/to/directory/myfile.csv
and I can loop over all CSV files in a local directory with
for filename in /path/to/another/directory/*.csv; do echo $filename; done
but how can I combine the two? I've tried
for filename in `hadoop fs -ls /path/to/directory | grep csv`; do echo $filename; done
but that gives me some nonsense like
Found
2
items
drwxr-xr-x
hadoop
hadoop
2
2016-10-12
....
hadoop fs -ls /path/to/directory | grep csv
gives you whole lines of standard output, not just filenames; the unquoted backticks in your for loop then word-split each of those lines on whitespace, which is why you see one token per iteration. – Nichols
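As the comment points out, each `-ls` line carries permissions, owner, size, and date before the path, so you need to extract the last field and filter for `.csv` before looping. A minimal sketch, using the listing from the question as stand-in data (in practice, replace the `printf` with the real `hadoop fs -ls /path/to/directory`):

```shell
# Stand-in for the output of: hadoop fs -ls /path/to/directory
listing='Found 2 items
drwxr-xr-x - hadoop hadoop 2 2016-10-12 16:20 /path/to/directory/tmp
-rwxr-xr-x 3 hadoop hadoop 4691945927 2016-10-12 19:37 /path/to/directory/myfile.csv'

# The path is the last whitespace-separated field of each line;
# keep only paths ending in .csv, then read one path per iteration.
printf '%s\n' "$listing" | awk '{print $NF}' | grep '\.csv$' |
while read -r filename; do
  echo "$filename"
done
```

Reading line by line with `while read -r` avoids the word-splitting problem entirely (it does assume paths contain no newlines). If your Hadoop version supports it, `hadoop fs -ls -C` prints only the paths, which lets you drop the `awk` step.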