How to find the file for a given block name in HDFS (Hadoop)
H

3

10

What's the easiest way to find the file associated with a block in HDFS, given a block name/ID?

Harlot answered 4/6, 2012 at 12:40 Comment(0)
N
11

The long and painful way, assuming you have read access to all the files (and execute for the directories):

hadoop fsck / -files -blocks | grep blk_520275863902385418_1002 -B 20

Then scan back up from your block match to the previous file name:

/hadoop/mapred/system/jobtracker.info 4 bytes, 1 block(s):  OK
0. blk_520275863902385418_1002 len=4 repl=1

In this case blk_5202... is part of the /hadoop/mapred/system/jobtracker.info file
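The manual scan back up through the output can be automated. The following is a minimal sketch rather than part of the original answer: `find_file_for_block` is a hypothetical helper name, and it assumes the fsck output format shown above, where each file line starts with `/` and is followed by that file's block lines.

```shell
# Hypothetical helper: reads `hadoop fsck / -files -blocks` output on stdin
# and prints the path of the file that owns the given block.
# Assumes file lines look like "/path N bytes, ... block(s): ..." as in the
# sample above; other Hadoop versions may format these lines differently.
find_file_for_block() {
  awk -v blk="$1" '
    /^\// {                                # a path line: remember it
      file = $0
      sub(/ [0-9]+ bytes, .*$/, "", file)  # drop the size/status suffix
      next
    }
    index($0, blk) { print file; exit }    # first block line mentioning blk
  '
}

# Example (requires a running cluster):
#   hadoop fsck / -files -blocks | find_file_for_block blk_520275863902385418_1002
```

Unlike `grep -B 20`, this does not depend on the file line being within a fixed number of lines of the block match, which matters for files with many blocks.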

Programmatically, there isn't an interface to the name node that allows you to search by block ID, but you could look into the source for the secondary name node and see how it consolidates the edits, then experiment on the saved output from the secondary name node (rather than risking working on the live name node file).

Good luck!

Nightcap answered 5/6, 2012 at 1:22 Comment(1)
This may take a long time if you run it on a relatively large cluster. However, it gives you the block size and the full block identifier, including which DataNode the block resides on. - Irritating
J
18

I'm not sure when this was introduced, but you can do this:

hdfs fsck -blockId <block_id>

hdfs fsck -blockId blk_1100790203
Connecting to namenode 
FSCK started by hdfs 

Block Id: blk_1100790203
Block belongs to: /tmp/1447685899336.txt
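
If only the file path is needed, for example in a script, the owning file can be pulled out of that output. This is a small sketch assuming the output contains a `Block belongs to: <path>` line exactly as in the sample above; `block_owner` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `hdfs fsck -blockId <block_id>` output on stdin
# and prints only the path from the "Block belongs to:" line.
# Assumes the line format shown in the sample output above.
block_owner() {
  sed -n 's/^Block belongs to: //p'
}

# Example (requires a running cluster):
#   hdfs fsck -blockId blk_1100790203 | block_owner
```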
Janiuszck answered 6/3, 2017 at 21:38 Comment(1)
This works faster than Chris' answer. My HDFS version is 3.1.2. - Irritating
C
2

Option 1: if you use the block ID together with its generation stamp, the .meta suffix is required:

$ hdfs fsck -blockId blk_1073823706_82968.meta

Option 2: use the block ID without the generation stamp:

$ hdfs fsck -blockId blk_1073823706
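
Block names copied from fsck listings or DataNode directories usually carry the generation stamp, so a script may want to normalize them to the Option 2 form first. A minimal sketch, assuming names of the form `blk_<id>`, `blk_<id>_<generationStamp>`, or `blk_<id>_<generationStamp>.meta`; `strip_genstamp` is a hypothetical helper name.

```shell
# Hypothetical helper: reduces a block name to the bare blk_<id> form.
# Names that do not match the assumed patterns are printed unchanged.
strip_genstamp() {
  printf '%s\n' "$1" | sed -E 's/^(blk_-?[0-9]+)_[0-9]+(\.meta)?$/\1/'
}

# Example (requires a running cluster):
#   hdfs fsck -blockId "$(strip_genstamp blk_1073823706_82968)"
```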
Cyclist answered 22/9, 2021 at 12:55 Comment(0)
