Locking a directory in HDFS
C

2

7

Is there a way to acquire a lock on a directory in HDFS? Here's what I am trying to do:

I've a directory called ../latest/...

Every day I need to add fresh data to this directory, but before I copy the new data in, I want to acquire a lock so that no one is using the directory while the copy is in progress.

Is there a way to do this in HDFS?

Cheung answered 19/2, 2014 at 0:20 Comment(0)
B
8

No, there is no way to do this through HDFS.

In general, when I have this problem, I copy the data into a random temp location and then move it into place once the copy is complete. This works well because mv is nearly instantaneous, while copying takes longer. That way, if you check whether anyone else is writing and then mv, the window during which the "lock" is effectively held is much shorter.

  1. Generate a random number
  2. Put the data into a new folder in HDFS under /tmp/$randomnumber
  3. Check to see if the destination is OK (hadoop fs -ls perhaps)
  4. hadoop fs -mv the data to the latest directory.
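
The four steps above could be scripted roughly as follows. This is only a sketch: it shells out to `hadoop fs` with the standard `-put`, `-test -e`, and `-mv` subcommands, and it assumes "destination is OK" means "the destination path does not exist yet", which may not match your setup. The function names, the `run` parameter, and the `/tmp` staging root are made up for illustration.

```python
import subprocess
import uuid


def run_hadoop(args):
    """Run `hadoop fs <args>` and return its exit code."""
    return subprocess.run(["hadoop", "fs", *args]).returncode


def publish(local_src, dest, run=run_hadoop, tmp_root="/tmp"):
    """Upload to a unique staging path, then move it into place.

    `run` is injectable so the flow can be exercised without a cluster.
    """
    # Steps 1-2: a random, collision-resistant staging location.
    staging = f"{tmp_root}/{uuid.uuid4().hex}"
    if run(["-put", local_src, staging]) != 0:
        raise RuntimeError("upload to staging failed")

    # Step 3 (assumption): the destination must not exist yet.
    # `-test -e` exits with 0 when the path exists.
    if run(["-test", "-e", dest]) == 0:
        raise RuntimeError(f"{dest} already exists; staged copy is at {staging}")

    # Step 4: the move is a quick rename, unlike the copy.
    if run(["-mv", staging, dest]) != 0:
        raise RuntimeError("move failed")
    return staging
```

Something like `publish("today.csv", "/data/latest/today.csv")` would then do the slow copy out of everyone's way and only touch the live directory for the final move. The gap between the check and the move is still there, as noted below.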

There is a slim chance that between 3 and 4 you might have someone clobber something. If that really makes you nervous, perhaps you can implement a simple lock in ZooKeeper. Curator can help you with that.

Bluebottle answered 19/2, 2014 at 3:10 Comment(2)
Right! Creating the data in a 'temp' location and moving it is not bulletproof, because some user might be running a (long) MR job. I'm not sure how a simple lock in ZooKeeper would help. There's no guarantee that a user will first acquire a lock before running an MR job against my data, right? Am I missing something? Somehow I think the lock has to be at the NameNode level. Please clarify the ZooKeeper approach. Thanks. - Cheung
Yeah, you are right. The ZooKeeper approach assumes that you trust that everyone uses ZK to acquire a lock. Nothing is stopping a user from just ignoring that. In my opinion, you're going to have to find a nontechnical or design approach to solving your problem. - Bluebottle
L
0

As described in the Hadoop FileSystem introduction, creating a file in HDFS is an atomic operation.

There are some operations that MUST be atomic...

  1. Creating a file. If the overwrite parameter is false, the check and creation MUST be atomic.
  2. Deleting a file.
    ...

We can create a LOCK file in the folder as an exclusive lock, and delete it after we finish the operations.

But be aware that the lock could be left "dead" if the current process (or job) goes down while holding it, so we should add a lock expiry mechanism to avoid that.
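
Here is a rough sketch of the lock-file-plus-expiry idea. To keep it runnable without a cluster, it uses the local filesystem as a stand-in: `os.open` with `O_CREAT | O_EXCL` plays the role of an HDFS create with overwrite set to false (in the Java API that would be `FileSystem.create(path, false)`). The function names and the TTL parameter are made up for illustration, and the file's modification time is assumed to be a good enough measure of the lock's age.

```python
import os
import time


def try_acquire(lock_path, ttl_seconds, now=time.time):
    """Return True if we created the lock file, False if a live lock exists."""
    for _ in range(2):  # the second pass only runs after a stale lock is cleared
        try:
            # O_EXCL makes the existence check and the creation one atomic step.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                age = now() - os.path.getmtime(lock_path)
            except FileNotFoundError:
                continue  # the holder released it in the meantime; retry
            if age <= ttl_seconds:
                return False  # someone holds a lock that has not expired
            try:
                os.remove(lock_path)  # expired: assume the holder died
            except FileNotFoundError:
                pass
            continue
        os.close(fd)
        return True
    return False


def release(lock_path):
    """Delete the lock file once the protected operation is finished."""
    os.remove(lock_path)
```

One caveat: clearing an expired lock is not atomic here. Two processes could both decide the lock is stale, and one could delete the lock the other has just re-created, so the TTL should be comfortably longer than any real job. And as with the ZooKeeper suggestion, this only protects against readers and writers that actually check the lock.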

Lisa answered 8/5, 2023 at 3:47 Comment(0)
