Discussion:
No space when running a hadoop job
Abdul Navaz
2014-09-26 14:37:09 UTC
Hi

I am facing a space issue when saving files into HDFS and/or running a MapReduce job.

***@nn:~# df -h

Filesystem                                       Size  Used Avail Use% Mounted on
/dev/xvda2                                       5.9G  5.9G     0 100% /
udev                                              98M  4.0K   98M   1% /dev
tmpfs                                             48M  192K   48M   1% /run
none                                             5.0M     0  5.0M   0% /run/lock
none                                             120M     0  120M   0% /run/shm
overflow                                         1.0M  4.0K 1020K   1% /tmp
/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59% /groups/ch-geni-net/Hadoop-NET
172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59% /proj/ch-geni-net

***@nn:~#



I can see there is no space left on /dev/xvda2.

How can I make Hadoop see the newly mounted /dev/xvda4? Or do I need to move
the files manually from /dev/xvda2 to /dev/xvda4?



Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388
Matt Narrell
2014-09-26 14:54:05 UTC
You can add a comma-separated list of paths to the “dfs.datanode.data.dir” property in your hdfs-site.xml.
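
For concreteness, a minimal sketch of what that could look like; both paths below are examples only (the second one sits on the freshly mounted /dev/xvda4 from the original post), and the DataNode has to be restarted to pick up the change:

<!-- hdfs-site.xml: example paths, adjust to your environment -->
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- existing data dir first, then a directory on the new /mnt volume -->
  <value>/var/lib/hadoop-hdfs/data,/mnt/hdfs/data</value>
</property>

Each entry should live on a separate physical volume; the DataNode treats every listed directory as an independent storage location.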

mn
Susheel Kumar Gadalay
2014-09-27 14:27:55 UTC
Correct me if I am wrong.

Adding multiple directories will not balance the file distribution
across these locations.

Hadoop will exhaust the first directory and then start using the
next, and the next after that.

How can I tell Hadoop to balance evenly across these directories?
Alexander Pivovarov
2014-09-27 17:11:36 UTC
It can read and write to all drives in parallel; more disks means more aggregate I/O throughput.
Susheel Kumar Gadalay
2014-09-29 04:53:40 UTC
You mean that if multiple directory locations are given, Hadoop will
balance the distribution of files across these different directories.

But normally we start with one directory location, and once it is
reaching its maximum we add a new directory.

In this case, how can we balance the distribution of files?

One way is to list the files and move them.

Will running the balancer script work?
Aitor Cedres
2014-09-29 11:53:45 UTC
Hi Susheel,

Adding a new directory to “dfs.datanode.data.dir” will not balance your
disks straight away. Eventually, through normal HDFS activity (deleting/invalidating
some blocks, writing new ones), the disks will become balanced. If you want
to balance them right after adding the new disk and changing the
“dfs.datanode.data.dir” value, you have to shut down the DN and manually
move (mv) some files from the old directory to the new one.
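
A rough sketch of that manual move, assuming the new directory has already been added to dfs.datanode.data.dir and formatted by a previous DataNode start; the paths and the hadoop-daemon.sh control script are illustrative (names vary by distribution), and the Hadoop 2.x block layout (current/BP-*/current/finalized/subdir*) is assumed:

# stop the DataNode before touching block files
hadoop-daemon.sh stop datanode

OLD=/data/1/dfs/dn            # full volume (example path)
NEW=/data/2/dfs/dn            # new volume (example path)
BP=$(ls "$OLD/current" | grep '^BP-')   # block-pool directory name

# move a few finalized subtrees, preserving the relative layout
mkdir -p "$NEW/current/$BP/current/finalized"
mv "$OLD/current/$BP/current/finalized/subdir0" \
   "$NEW/current/$BP/current/finalized/"

# restart; the DataNode rescans its volumes on startup
hadoop-daemon.sh start datanode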

The balancer will try to balance usage between HDFS nodes, but it won't
care about the "internal" disk utilization within a node. For your particular
case, the balancer won't fix your issue.
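
For reference, a minimal invocation of that inter-node balancer; the threshold is the allowed deviation of each node's utilization from the cluster average, in percent:

# balances blocks ACROSS DataNodes only; it never moves blocks
# between the disks of a single DataNode
hdfs balancer -threshold 10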

Hope it helps,
Aitor
Susheel Kumar Gadalay
2014-09-29 12:15:23 UTC
Thanks, Aitor.

That matches my observation too.

I added a new disk location and manually moved some files.

But if two locations are given for dfs.datanode.data.dir at the very
beginning, will Hadoop balance the disk usage, even if not perfectly,
given that file sizes may differ?
Aitor Cedres
2014-09-29 12:53:43 UTC
I think the way it works when HDFS has a list in dfs.datanode.data.dir
is basically round-robin between the disks. And yes, it may not be perfectly
balanced because of differing file sizes.
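
Worth noting: Hadoop 2.1+ also ships an alternative volume-choosing policy (HDFS-1804) that prefers volumes with more free space, which targets exactly this uneven-disk situation; a minimal hdfs-site.xml sketch:

<!-- switch the DataNode from the default round-robin policy
     to the available-space policy -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>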