Adding HdfsFindTool as an alias in .bash_profile makes it easy to use every time.
--add the lines below to your profile:
alias hdfsfind='hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-job.jar org.apache.solr.hadoop.HdfsFindTool'
alias hdfs='hadoop fs'
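--note: the second alias shadows the standard hdfs binary in interactive shells, which is why
--"hdfs -cat" works in the loop below. After editing, reload the profile and sanity-check the
--tool (the parcel path above is the CDH default; adjust it if your installation differs):
$> source ~/.bash_profile
$> hdfsfind -help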
--you can now use it as follows (here I use the find tool to get the file name and record count for each HDFS source folder):
$> cnt=1; for ff in $(hdfsfind -find /dev/abc/*/2018/02/16/*.csv -type f); do
     pp=$(echo ${ff} | awk -F"/" '{print $7}')    # 7th "/"-separated path component (adjust the field number for your layout)
     fn=$(basename ${ff})                         # file name
     fcnt=$(hdfs -cat ${ff} | wc -l)              # record (line) count
     echo "${cnt}=${pp}=${fn}=${fcnt}"
     cnt=$(expr ${cnt} + 1)
   done
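--each output line has the form <counter>=<path component>=<file name>=<line count>,
--e.g. (values are hypothetical): 1=src1=part-00000.csv=52340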
--simple queries to get folder/file details:
$> hdfsfind -find /dev/abc/ -type f -name "*.csv"
$> hdfsfind -find /dev/abc/ -type d -name "toys"
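--HdfsFindTool mimics GNU find, and the CDH search-mr builds document further tests such as
---mtime and -size; for example, .csv files modified within the last day (path is illustrative):
$> hdfsfind -find /dev/abc/ -type f -name "*.csv" -mtime -1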
--comments on this answer:
"find is what I expect most people use." – Sapphera
"On hadoop 2.6.0-cdh5.4.1, it seems that this doesn't work: hadoop fs -ls -R <pattern>, but a reasonable solution is this: hadoop fs -ls -R <filepath> | egrep <regex_pattern>" – Morningglory
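--for instance, to list all .csv files recursively without HdfsFindTool (path and
--pattern are illustrative):
$> hadoop fs -ls -R /dev/abc/ | egrep '\.csv$'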