Getting "No space left on device" for approx. 10 GB of data on EMR m1.large instances

I am getting a "No space left on device" error when running my Amazon EMR jobs with m1.large as the instance type for the Hadoop instances created by the job flow. The job generates at most approx. 10 GB of data, and an m1.large instance is supposed to come with 2 x 420 GB of instance storage (according to: EC2 instance types), so I am confused how just 10 GB of data could lead to a "disk space full" kind of message.

I am aware that this error can also occur when the filesystem has run out of inodes, but that limit is in the millions and I am pretty sure my job does not produce that many files. I have also noticed that when I launch an m1.large EC2 instance on its own, it gets an 8 GB root volume by default. Could the same thing be happening for the instances EMR provisions? If so, when do the 420 GB disks get attached to an instance?
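
For reference, these are the quick checks that distinguish a filesystem that is genuinely out of blocks from one that has run out of inodes (standard tools only, nothing EMR-specific):

$ df -h    # per-filesystem block usage; 100% under Use% means it really is out of space
$ df -i    # per-filesystem inode usage; 100% under IUse% would point to inode exhaustion instead
$ lsblk    # block devices attached to the instance, whether mounted or not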

Also, here is the output of "df -hi" and "mount":

$ df -hi
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/xvda1              640K    100K    541K   16% /
tmpfs                   932K       3    932K    1% /lib/init/rw
udev                    930K     454    929K    1% /dev
tmpfs                   932K       3    932K    1% /dev/shm
ip-10-182-182-151.ec2.internal:/mapr
                        100G     50G     50G   50% /mapr

$ mount
/dev/xvda1 on / type ext3 (rw,noatime)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/var/run on /run type none (rw,bind)
/var/lock on /run/lock type none (rw,bind)
/dev/shm on /run/shm type none (rw,bind)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
ip-10-182-182-151.ec2.internal:/mapr on /mapr type nfs (rw,addr=10.182.182.151)

$ lsblk
NAME  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
xvda1 202:1    0    10G  0 disk /
xvdb  202:16   0   420G  0 disk 
xvdc  202:32   0   420G  0 disk

Reinhardt answered 24/10, 2013 at 9:7 Comment(6)
Could you provide the output of df -hi and mount? - Playmate
@Playmate - I have added the output to the question as you asked. - Reinhardt
Do the two 420 GB drives show up in fdisk -l? If yes, they are probably attached to your instance but not yet formatted and mounted anywhere. Also, does df -h show anything that is 100% used? - Playmate
@Playmate - I think you are right that they are not mounted by default. I have pasted the output in the question itself. - Reinhardt
Yeah, so your default partition under "/" is 10 GB while the other two of 420 GB each are not mounted. Now, what does df -h say? - Playmate
@Playmate - That cluster has terminated. I will reproduce and update in a while. Thanks for helping out. - Reinhardt

With the help of @slayedbylucifer I was able to identify the problem: by default, the complete disk space (the two 420 GB instance-store volumes) is made available to HDFS on the cluster, so only the default 10 GB mounted on / is left for local use by the machine. When using the MapR distribution of Hadoop, there is an option called --mfs-percentage that specifies how the disk space is split between the local filesystem and HDFS; the local share is mounted at /var/tmp. Make sure that mapred.local.dir points to a directory inside /var/tmp, because that is where the logs of the tasktracker attempts go, and those can get huge for big jobs. In my case it was this logging that was filling the disk. I set --mfs-percentage to 60 and was able to run the job successfully thereafter.
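
For reference, this is roughly how the two pieces fit together when creating the job flow. It is only a sketch from memory using the legacy elastic-mapreduce CLI; the exact --args syntax, the MapR edition, the instance count and the /var/tmp/mapred/local path are assumptions, so verify them against the MapR-on-EMR documentation for your versions:

$ elastic-mapreduce --create --alive \
      --instance-type m1.large --num-instances 5 \
      --supported-product mapr \
      --args "--edition,m5,--mfs-percentage,60"
# --mfs-percentage,60 is the split that worked for me; the local (non-HDFS)
# share is then mounted under /var/tmp.
#
# Point the tasktracker's scratch/log directory at that local area, e.g. in
# mapred-site.xml (the exact subdirectory name is up to you):
#   mapred.local.dir = /var/tmp/mapred/local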

Reinhardt answered 11/12, 2013 at 9:38 Comment(0)
