Checking whether a directory in HDFS already exists
F

6

17

I have the following directory structure in HDFS:

/analysis/alertData/logs/YEAR/MONTH/DATE/HOURS

That is, data arrives on an hourly basis and is stored in year/month/day/hour format.

I have written a shell script to which I pass the path up to

"/analysis/alertData/logs"   (this varies depending on which product's data I am handling)

The shell script then goes through the year/month/date/hour folders and returns the most recent path.

For example:

 Directories present in HDFS have the following structure:

 /analysis/alertData/logs/2014/10/22/01
 /analysis/alertData/logs/2013/5/14/04

 The shell script is given the path up to:   "/analysis/alertData/logs"

 It outputs the most recent directory:    /analysis/alertData/logs/2014/10/22/01

My question is: how can I validate whether the HDFS directory path passed to the shell script is valid? Say I pass a wrong path, or a path that does not exist. How should the shell script handle that?

Sample wrong paths:

  wrong path   :  /analysis/alertData ( correct path :  /analysis/alertData/logs/ )
  wrong path   :  /abc/xyz/  ( path does not exist in HDFS )

I tried the hadoop dfs -test -z/-d/-e options, but they did not work for me. Any suggestions?

NOTE: I am not posting my original code here, as the solution to my problem does not depend on it.

Thanks in advance.

Fusspot answered 22/10, 2014 at 17:50 Comment(0)
Q
33

Try it without the test command []:

if hadoop fs -test -d "$yourdir"; then echo "ok"; else echo "not ok"; fi
Quarrelsome answered 1/2, 2016 at 11:7 Comment(2)
I would like to add that I tried this, and the command worked only without []. Betseybetsy
Why do we have to omit the [] operator to make it work? Juline
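On the question in the comment above: `[ ... ]` evaluates a test expression and does not look at the exit status of a command placed inside it, whereas `if` on its own branches on the exit status of whatever command follows. The sketch below illustrates the difference. `fake_test` is a hypothetical stand-in for `hadoop fs -test -d` (it prints nothing and only sets an exit status) so the example runs without a cluster.

```shell
# fake_test is a hypothetical stand-in for `hadoop fs -test -d`:
# it prints nothing and reports its result only through the exit status.
fake_test() {
    [ "$1" = "/analysis/alertData/logs" ]
}

# `if` branches directly on the command's exit status.
direct_check() {
    if fake_test "$1"; then echo "ok"; else echo "not ok"; fi
}

# $(...) expands to an empty string here, so the test becomes `[ ]`,
# which is false no matter what exit status fake_test returned.
bracket_check() {
    if [ $(fake_test "$1") ]; then echo "ok"; else echo "not ok"; fi
}
```

With these definitions, `direct_check` distinguishes the two cases, while `bracket_check` reports "not ok" even for the path that exists.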
C
19

Since

hdfs dfs -test -d "$yourdir"

returns 0 if the directory exists, check $? immediately after running it:

if [ $? -eq 0 ]; then
    echo "exists"
else
    echo "dir does not exist"
fi
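One caveat with this pattern: `$?` holds the status of the most recent command only, so any command that runs between the test and the `if` overwrites it. A minimal sketch that saves the status first is shown below. It assumes the `hdfs` command is on the PATH; `require_hdfs_dir` and `status` are hypothetical names introduced for illustration.

```shell
# Sketch: report whether an HDFS directory exists and pass the status on.
# Assumes `hdfs` is on PATH; require_hdfs_dir is a hypothetical helper name.
require_hdfs_dir() {
    hdfs dfs -test -d "$1"
    status=$?                  # save immediately; the echo below resets $?
    echo "checked $1" >&2      # logging goes to stderr
    if [ "$status" -eq 0 ]; then
        echo "exists"
    else
        echo "dir does not exist"
    fi
    return "$status"
}
```

A calling script could then stop early with something like `require_hdfs_dir "$1" >/dev/null || exit 1`.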
Cattery answered 10/9, 2015 at 10:30 Comment(0)
O
7

hadoop dfs is deprecated; use hdfs dfs instead. Usage: hdfs dfs -test -[ezd] URI

Options:
-e checks whether the path exists, returning 0 if true.
-z checks whether the file is zero length, returning 0 if true.
-d checks whether the path is a directory, returning 0 if true.

Example: hdfs dfs -test -d $yourdir

For more information, see: https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html
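The -e and -d options can be combined to tell a missing path apart from a path that exists but is not a directory. This is only a sketch under the assumption that `hdfs` is on the PATH; `hdfs_path_kind` is a hypothetical helper name, not part of Hadoop.

```shell
# Sketch: classify an HDFS path using the -e and -d options.
# Assumes `hdfs` is on PATH; hdfs_path_kind is a hypothetical helper name.
hdfs_path_kind() {
    if ! hdfs dfs -test -e "$1"; then
        echo "missing"         # path does not exist at all
    elif hdfs dfs -test -d "$1"; then
        echo "directory"       # path exists and is a directory
    else
        echo "file"            # path exists but is not a directory
    fi
}
```

A script could refuse to continue unless `hdfs_path_kind "$1"` prints `directory`. Note that this only checks existence and type; it cannot tell that /analysis/alertData is the wrong level of an otherwise valid tree.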

Opsonize answered 29/3, 2016 at 9:29 Comment(0)
S
5

Hi, I have used the following script to test whether an HDFS directory exists. I saw in your question that you tried this test command and it did not work. Could you please provide a trace showing why it is not working?

hadoop fs -test -d "$dirpath"
# A non-zero status means the directory does not exist, so create it.
if [ $? -ne 0 ]; then
    hadoop fs -mkdir "$dirpath"
else
    echo "Directory already present in HDFS"
fi
Scipio answered 23/10, 2014 at 6:21 Comment(0)
B
0

This works in Scala with Spark.

import org.apache.hadoop.fs.{FileSystem, Path}
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
// "<HDFSPath>" is a placeholder for the path to check; exists returns true or false
val fileExists = fs.exists(new Path("<HDFSPath>"))
Bandoline answered 9/6, 2021 at 8:46 Comment(0)
S
-1

In Java, we can verify this by using the FileSystem class.

FileSystem

Sellers answered 9/8, 2016 at 5:14 Comment(0)
