I have a namenode that had to be brought down in an emergency. It has not had an FSImage taken in 9 months and has about 5TB worth of edit files to process on its next restart. The secondary namenode has not been running (or performed any checkpoint operations) for about 9 months, hence the 9-month-old FSImage.
There are about 7.8 million inodes in the HDFS cluster. The machine has about 260GB of total memory.
We've tried a few different combinations of Java heap size, GC algorithm, and related settings, but have not found a combination that lets the restart complete without eventually slowing to a crawl due to full GCs.
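For illustration, a representative variant of the hadoop-env.sh settings we've been trying looks roughly like the following; the exact heap sizes, pause target, and log path are placeholders and varied between attempts:

    # hadoop-env.sh (representative attempt; heap sizes, pause target, and log path are placeholders)
    # Give the namenode most of the 260GB box and use G1 to try to limit full-GC pauses,
    # with GC logging enabled so we can see where the edit replay bogs down.
    # (On Hadoop 3 the variable is HDFS_NAMENODE_OPTS instead.)
    export HADOOP_NAMENODE_OPTS="-Xms200g -Xmx200g \
      -XX:+UseG1GC -XX:MaxGCPauseMillis=400 \
      -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
      -Xloggc:/var/log/hadoop/namenode-gc.log \
      ${HADOOP_NAMENODE_OPTS}"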
I have 2 questions:

1. Has anyone found a namenode configuration that allows an edit-file backlog this large to be processed successfully?

2. An alternate approach I've considered is restarting the namenode with only a manageable subset of the edit files present. Once the namenode comes up and writes a new FSImage, bring it down, copy the next subset of edit files into place, and restart it; repeat until the entire set of edit files has been processed (a rough sketch of this procedure follows the questions). Would this approach work? Is it safe, in terms of the overall stability of the system and the file system?
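To make question 2 concrete, here is a rough, untested sketch of the procedure I have in mind; the paths, batch layout, and daemon script names are placeholders for our environment, not a verified recipe:

    # Rough, untested sketch of phased edit replay (paths and batch layout are placeholders).
    # Each batch of edits_* segments must be contiguous in transaction-ID order,
    # starting from the transaction the current FSImage ends at.
    NN_DIR=/data/dfs/name/current        # dfs.namenode.name.dir/current
    STAGING=/data/edits-staging          # where the full 5TB edits backlog is parked

    for batch in "$STAGING"/batch-*; do
        # 1. Stage the next contiguous batch of edit segments.
        cp "$batch"/edits_* "$NN_DIR"/

        # 2. Start the namenode and wait for it to finish replaying and leave safemode.
        hadoop-daemon.sh start namenode
        hdfs dfsadmin -safemode wait

        # 3. Force a checkpoint so the replayed transactions are baked into a new FSImage.
        hdfs dfsadmin -safemode enter
        hdfs dfsadmin -saveNamespace
        hdfs dfsadmin -safemode leave

        # 4. Stop the namenode before staging the next batch.
        hadoop-daemon.sh stop namenode
    done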
The SNN will checkpoint every dfs.namenode.checkpoint.txns edits. See github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/…. You should bump up dfs.namenode.num.checkpoints.retained to something higher (defaults to 2) just in case, too. – Waken
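For reference, the properties mentioned in the comment live in hdfs-site.xml; a minimal fragment might look like the following, where the non-default retention value is purely illustrative:

    <!-- hdfs-site.xml: checkpoint tuning (values shown are illustrative) -->
    <property>
      <name>dfs.namenode.checkpoint.txns</name>
      <!-- SNN checkpoints after this many un-checkpointed transactions (default 1000000) -->
      <value>1000000</value>
    </property>
    <property>
      <name>dfs.namenode.checkpoint.period</name>
      <!-- ...or after this many seconds, whichever comes first (default 3600) -->
      <value>3600</value>
    </property>
    <property>
      <name>dfs.namenode.num.checkpoints.retained</name>
      <!-- keep more old FSImages around than the default of 2 -->
      <value>5</value>
    </property>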