Discussion:
No space when running a hadoop job
Abdul Navaz
2014-09-26 14:37:09 UTC
Hi

I am facing a space issue when saving files into HDFS and/or running a MapReduce job.

***@nn:~# df -h

Filesystem                                       Size  Used Avail Use% Mounted on
/dev/xvda2                                       5.9G  5.9G     0 100% /
udev                                              98M  4.0K   98M   1% /dev
tmpfs                                             48M  192K   48M   1% /run
none                                             5.0M     0  5.0M   0% /run/lock
none                                             120M     0  120M   0% /run/shm
overflow                                         1.0M  4.0K 1020K   1% /tmp
/dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59% /groups/ch-geni-net/Hadoop-NET
172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59% /proj/ch-geni-net

***@nn:~#



I can see there is no space left on /dev/xvda2.

How can I make Hadoop see the newly mounted /dev/xvda4? Or do I need to move
the files manually from /dev/xvda2 to /dev/xvda4?



Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388
Matt Narrell
2014-09-26 14:54:05 UTC
You can add a comma-separated list of paths to the “dfs.datanode.data.dir” property in your hdfs-site.xml.
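
For concreteness, a minimal sketch of what that could look like; both paths below are examples only (the second one sits on the freshly mounted /dev/xvda4 from the original post), and the DataNode has to be restarted to pick up the change:

<!-- hdfs-site.xml: example paths, adjust to your environment -->
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- existing data dir first, then a directory on the new /mnt volume -->
  <value>/var/lib/hadoop-hdfs/data,/mnt/hdfs/data</value>
</property>

Each entry should live on a separate physical volume; the DataNode treats every listed directory as an independent storage location.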

mn
Susheel Kumar Gadalay
2014-09-27 14:27:55 UTC
Correct me if I am wrong.

Adding multiple directories will not balance the file distribution
across these locations.

Hadoop will exhaust the first directory and then start using the
next, and the next after that.

How can I tell Hadoop to balance evenly across these directories?
Alexander Pivovarov
2014-09-27 17:11:36 UTC
It can read and write to all drives in parallel; more disks means more aggregate I/O throughput.
Susheel Kumar Gadalay
2014-09-29 04:53:40 UTC
You mean that if multiple directory locations are given, Hadoop will
balance the distribution of files across these different directories.

But normally we start with one directory location, and once it is
reaching its maximum we add a new directory.

In this case, how can we balance the distribution of files?

One way is to list the files and move them.

Will running the balancer script work?
Aitor Cedres
2014-09-29 11:53:45 UTC
Hi Susheel,

Adding a new directory to “dfs.datanode.data.dir” will not balance your
disks straight away. Eventually, through normal HDFS activity (deleting/invalidating
some blocks, writing new ones), the disks will become balanced. If you want
to balance them right after adding the new disk and changing the
“dfs.datanode.data.dir” value, you have to shut down the DN and manually
move (mv) some files from the old directory to the new one.
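
A rough sketch of that manual move, assuming the new directory has already been added to dfs.datanode.data.dir and formatted by a previous DataNode start; the paths and the hadoop-daemon.sh control script are illustrative (names vary by distribution), and the Hadoop 2.x block layout (current/BP-*/current/finalized/subdir*) is assumed:

# stop the DataNode before touching block files
hadoop-daemon.sh stop datanode

OLD=/data/1/dfs/dn            # full volume (example path)
NEW=/data/2/dfs/dn            # new volume (example path)
BP=$(ls "$OLD/current" | grep '^BP-')   # block-pool directory name

# move a few finalized subtrees, preserving the relative layout
mkdir -p "$NEW/current/$BP/current/finalized"
mv "$OLD/current/$BP/current/finalized/subdir0" \
   "$NEW/current/$BP/current/finalized/"

# restart; the DataNode rescans its volumes on startup
hadoop-daemon.sh start datanode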

The balancer will try to balance usage between HDFS nodes, but it won't
care about the "internal" disk utilization within a node. For your particular
case, the balancer won't fix your issue.
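
For reference, a minimal invocation of that inter-node balancer; the threshold is the allowed deviation of each node's utilization from the cluster average, in percent:

# balances blocks ACROSS DataNodes only; it never moves blocks
# between the disks of a single DataNode
hdfs balancer -threshold 10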

Hope it helps,
Aitor
Susheel Kumar Gadalay
2014-09-29 12:15:23 UTC
Thanks, Aitor.

That matches my observation too.

I added a new disk location and manually moved some files.

But if two locations are given for dfs.datanode.data.dir at the very
beginning, will Hadoop balance the disk usage, even if not perfectly,
given that file sizes may differ?
Aitor Cedres
2014-09-29 12:53:43 UTC
I think the way it works when HDFS has a list in dfs.datanode.data.dir
is basically round-robin between the disks. And yes, it may not be perfectly
balanced because of differing file sizes.
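
Worth noting: Hadoop 2.1+ also ships an alternative volume-choosing policy (HDFS-1804) that prefers volumes with more free space, which targets exactly this uneven-disk situation; a minimal hdfs-site.xml sketch:

<!-- switch the DataNode from the default round-robin policy
     to the available-space policy -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>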