What's the difference between "hadoop fs" shell commands and "hdfs dfs" shell commands?
O

8

128

Are they supposed to be equal?

But why do the "hadoop fs" commands show the HDFS files while the "hdfs dfs" commands show the local files?

Here is the Hadoop version information:

Hadoop 2.0.0-mr1-cdh4.2.1 Subversion git://ubuntu-slave07.jenkins.cloudera.com/var/lib/jenkins/workspace/CDH4.2.1-Packaging-MR1/build/cdh4/mr1/2.0.0-mr1-cdh4.2.1/source -r Compiled by jenkins on Mon Apr 22 10:48:26 PDT 2013

Ossicle answered 9/8, 2013 at 8:37 Comment(2)
It was my mistake to ask this question. hdfs dfs shows the HDFS files too. - Ossicle
Possible duplicate of Difference between `hadoop dfs` and `hadoop fs` - Wilona
B
162

The following three commands appear the same but have minute differences:

  1. hadoop fs <args>
  2. hadoop dfs <args>
  3. hdfs dfs <args>

  hadoop fs <args>

FS relates to a generic file system, which can point to any file system such as local, HDFS, etc. So this can be used when you are dealing with different file systems such as Local FS, (S)FTP, S3, and others.


  hadoop dfs <args>

dfs is very specific to HDFS and works for operations related to HDFS. It has been deprecated, and we should use hdfs dfs instead.


  hdfs dfs <args>

Same as the second, i.e. it works for all operations related to HDFS, and it is the recommended command instead of hadoop dfs.

Below is the list of subcommands categorized as hdfs commands:

  namenode|secondarynamenode|datanode|dfs|dfsadmin|fsck|balancer|fetchdt|oiv|dfsgroups

So even if you use hadoop dfs, it will locate hdfs and delegate the command to hdfs dfs.
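The delegation described above can be pictured with a small sketch. This is not the real bin/hadoop script: hadoop_dispatch is a hypothetical helper, the warning text is made up, and the function only prints the command line that would be run, using the hdfs subcommand list quoted above.

```shell
# Sketch only: models the delegation described above, not the real launcher.
# hadoop_dispatch is a hypothetical name; it prints the command line that
# would be run instead of running it.
hadoop_dispatch() {
  sub="$1"
  shift
  case "$sub" in
    namenode|secondarynamenode|datanode|dfs|dfsadmin|fsck|balancer|fetchdt|oiv|dfsgroups)
      # Subcommands from the hdfs list are handed to the hdfs launcher.
      echo "warning: 'hadoop $sub' is deprecated, use 'hdfs $sub'" >&2
      echo "hdfs $sub $*"
      ;;
    *)
      # Everything else (for example fs) stays with the hadoop launcher.
      echo "hadoop $sub $*"
      ;;
  esac
}

hadoop_dispatch dfs -ls /    # prints: hdfs dfs -ls /
hadoop_dispatch fs -ls /     # prints: hadoop fs -ls /
```

Under this reading, hadoop dfs and hdfs dfs end up running the same thing, which is why only the hdfs form is recommended.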

Britain answered 25/6, 2014 at 8:49 Comment(6)
Interesting :-). So, if hadoop fs relates to any file system, such as local or HDFS, how does hadoop choose to show the HDFS root directory content when I run hadoop fs -ls / ? Also, how can I tell hadoop to show my local root directory content when I run the hadoop fs -ls / command? - Montcalm
You can refer to the local FS by using the file scheme in the URIs passed as arguments to hadoop fs commands (e.g. hadoop fs -ls file:///). If nothing is specified, it defaults to the hdfs scheme, AFAIK (hadoop fs -ls / == hadoop fs -ls hdfs:///). - Caitiff
And why would I need hadoop fs -ls file:///, when there are more traditional ways of listing local files? - Textualist
Why was 'hadoop' deprecated in favor of 'hdfs'? Is there any functional difference or is it just a change in syntax? - Clyte
@Britain @OneCricketeer In which version of Hadoop was hadoop dfs deprecated? - Clique
@Puru I'm guessing 3.x, or maybe a later version of 2.x - Jaffa
O
50


https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html

The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS) as well as other file systems that Hadoop supports, such as Local FS, WebHDFS, S3 FS, and others.

bin/hadoop fs <args>

All FS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the Local FS the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration is set to point to hdfs://namenodehost).

Most of the commands in FS shell behave like corresponding Unix commands. Differences are described with each of the commands. Error information is sent to stderr and the output is sent to stdout.

If HDFS is being used,

hdfs dfs

is a synonym.
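The URI rule quoted above (scheme and authority are optional, and the configured default is used when they are missing) can be sketched roughly as follows. qualify_path is a hypothetical helper, not part of Hadoop, and it only handles absolute paths and fully qualified URIs; relative paths are not modeled.

```shell
# Sketch only: qualify_path is a hypothetical helper, not part of Hadoop.
# $1 = default file system URI (what the configuration points to)
# $2 = path argument as it would be passed to the FS shell
qualify_path() {
  default_fs="$1"
  path="$2"
  case "$path" in
    *://*)
      # A scheme is already given (hdfs://..., file:///...): use it as is.
      printf '%s\n' "$path"
      ;;
    /*)
      # Absolute path without a scheme: prepend the default file system,
      # dropping one trailing slash so the result has no doubled slash.
      printf '%s%s\n' "${default_fs%/}" "$path"
      ;;
    *)
      # Relative paths are outside the scope of this sketch.
      echo "qualify_path: relative paths are not handled" >&2
      return 1
      ;;
  esac
}

qualify_path hdfs://namenodehost /parent/child   # prints: hdfs://namenodehost/parent/child
qualify_path hdfs://namenodehost file:///tmp     # prints: file:///tmp
```

With a default of file:/// instead, the same /parent/child argument would come out as file:///parent/child, which is one way the same command could end up listing local files.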

Oxpecker answered 28/8, 2017 at 1:26 Comment(0)
H
8

fs refers to any file system, which could be local or HDFS, but dfs refers only to the HDFS file system. So if you need to access or transfer data between different file systems, fs is the way to go.

Hallmark answered 9/8, 2013 at 8:45 Comment(0)
T
5

From what I can tell, there is no difference between hdfs dfs and hadoop fs. They're simply different naming conventions based on which version of Hadoop you're using. For example, the notes in 1.2.1 use hdfs dfs while 0.19 uses hadoop fs. Notice that the separate commands are described verbatim. They are used identically.

Also note that both commands can refer to different file systems depending on what you specify (hdfs, file, s3, etc). If no file system is listed, they fall back to the default which is specified in your configuration.
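To see which default applies, it may help to look at the configuration itself. The sketch below assumes the default is stored under the fs.defaultFS property in a core-site.xml style file (that property name is not mentioned in this thread, so treat it as an assumption for your version). default_fs_from_xml is a hypothetical helper, and the awk matching is deliberately naive rather than a real XML parser.

```shell
# Sketch only: default_fs_from_xml is a hypothetical helper, and the awk
# matching is naive (it is not a real XML parser).
# $1 = path to a core-site.xml style file
default_fs_from_xml() {
  awk '
    /<name>fs\.defaultFS<\/name>/ { found = 1 }
    found && /<value>/ {
      # Strip everything up to <value> and everything from </value> on.
      gsub(/.*<value>|<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

# Usage with a made-up sample file; namenodehost is a placeholder.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenodehost</value>
  </property>
</configuration>
EOF
default_fs_from_xml "$sample"   # prints: hdfs://namenodehost
rm -f "$sample"
```

On a working installation, hdfs getconf -confKey fs.defaultFS should report the effective value without any parsing. If the two commands pick up different configuration directories, that could plausibly explain why they disagree about the default.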

You're using Hadoop 2.0.0, and it looks like (based on the 2.0.5 documentation) the alpha versions use hadoop fs, with HDFS set as the default scheme in your configuration. The hdfs dfs command might be left over from before and, since nothing is specified for it in the configuration, could just be defaulting to the local file system.

So I would just stick with hadoop fs and not worry too much, since the documentation describes them identically.

Trisomic answered 9/8, 2013 at 16:16 Comment(0)
A
5

fs = file system
dfs = distributed file system

fs = other file systems + distributed file systems

FS relates to a generic file system, which can point to any file system such as local, HDFS, etc., but dfs is very specific to HDFS. So when we use FS, it can perform operations from or to the local file system or the Hadoop distributed file system, whereas specifying DFS restricts the operation to HDFS.

It all depends on the configured scheme. When these two commands are used with an absolute URI, i.e. scheme://a/b, the behavior is identical. Only the default configured scheme value (file:// for fs and hdfs:// for dfs) causes the difference in behavior.

Apolitical answered 3/9, 2017 at 2:42 Comment(0)
C
3

FS relates to a generic file system, which can point to any file system such as local, HDFS, etc., but dfs is very specific to HDFS. So when we use FS, it can perform operations from or to the local file system or the Hadoop distributed file system, whereas specifying DFS restricts the operation to HDFS.

Below are excerpts from the Hadoop documentation that describe these two as different shells.

FS Shell:

The FileSystem (FS) shell is invoked by bin/hadoop fs. All the FS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration is set to point to hdfs://namenodehost). Most of the commands in FS shell behave like corresponding Unix commands.

DFShell:

The HDFS shell is invoked by bin/hadoop dfs. All the HDFS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenode:namenodeport/parent/child or simply as /parent/child (given that your configuration is set to point to namenode:namenodeport). Most of the commands in HDFS shell behave like corresponding Unix commands.

From the above it can be concluded that it all depends on the configured scheme. When these two commands are used with an absolute URI, i.e. scheme://a/b, the behavior is identical. Only the default configured scheme value (file for fs and hdfs for dfs) causes the difference in behavior.

Celt answered 17/10, 2015 at 13:34 Comment(2)
Why does hdfs dfs point to a different location than hdfs dfs /? - Nanceenancey
It would be nice if there were an interactive shell like bash for hadoop. - Phung
C
1

The term “fs” refers to a generic file system, which by definition can point to any file system (including HDFS), whereas “dfs” refers specifically to Hadoop Distributed File System access. So fs can perform operations related to the local or the Hadoop distributed file system, while dfs can perform operations related to the Hadoop distributed file system only.

So,

  1. hadoop fs

It is used when we are dealing with different file systems such as Local FS, HDFS, etc.

  2. hdfs dfs

It is used for operations related to HDFS.

Another command, which looks similar to these two, is

  3. hadoop dfs

This command should not be used, as it is deprecated. Even if you use it, it will send the command to hdfs dfs.

Cosentino answered 21/10, 2021 at 19:6 Comment(0)
M
-1

hadoop fs and hdfs dfs are basically the same. Both give the same result with Linux-like commands such as ls and rm. You should use the commands like this:

hadoop fs -ls <path>
hdfs dfs -ls <path>
Musick answered 15/7, 2021 at 5:11 Comment(1)
There is a difference, though, and this answer doesn't explain it. - Jaffa
