Discussion:
Finalized dfs.replication and fsck
Patai Sangbutsarakum
2012-10-15 19:01:32 UTC
Permalink
Hi Hadoopers,

I have
<property>
<name>dfs.replication</name>
<value>2</value>
<final>true</final>
</property>

set in hdfs-site.xml on the staging environment cluster. The staging
cluster runs the code that will later be deployed to production, and
that code tries to set dfs.replication to values other than 2 (3, 10,
50, and so on), the numbers the developers thought would fit the
production environment.

Even though I marked dfs.replication as final on the staging cluster,
every time I run fsck there it still reports under-replication.
I thought the final keyword meant that values in the job config would
not be honored, but that does not seem to be the case, judging by fsck.

I am on CDH3u4.

Please suggest.
Patai
Chris Nauroth
2012-10-15 19:18:19 UTC
Permalink
Hello Patai,

Has your configuration file change been copied to all nodes in the cluster?

Are there applications connecting from outside of the cluster? If so, then
those clients could have separate configuration files or code setting
dfs.replication (and other configuration properties). These would not be
limited by final declarations in the cluster's configuration files.
<final>true</final> controls configuration file resource loading, but it
does not necessarily block different nodes or different applications from
running with completely different configurations.

Hope this helps,
--Chris

On Mon, Oct 15, 2012 at 12:01 PM, Patai Sangbutsarakum wrote:
Harsh J
2012-10-15 19:23:41 UTC
Permalink
Hey Chris,

The dfs.replication parameter is an exception to the <final> config
feature. A client using the FileSystem API can pass in any short value
it wants as the replication factor. This bypasses the configuration
entirely, and the configuration (replication being per-file) is also
client-side.

The right way for an administrator to enforce a maximum replication
value at create/setrep time is to set dfs.replication.max to the
desired value on the NameNode and restart it.
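To make the suggestion concrete, here is a minimal sketch of the setting described above, as it could appear in hdfs-site.xml on the NameNode (the value 2 matches this staging cluster's target; a client call such as FileSystem.setReplication(path, (short) 10) would then be rejected server-side):

```xml
<!-- hdfs-site.xml on the NameNode: hard cap on the replication factor,
     enforced server-side regardless of what client configs request -->
<property>
  <name>dfs.replication.max</name>
  <value>2</value>
</property>
```

Unlike a finalized dfs.replication, which only constrains how configuration files are merged on whichever host loads them, this cap is checked by the NameNode itself on every create and setReplication call.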

On Tue, Oct 16, 2012 at 12:48 AM, Chris Nauroth wrote:
--
Harsh J
Chris Nauroth
2012-10-15 20:19:49 UTC
Permalink
Thank you, Harsh. I did not know about dfs.replication.max.
Patai Sangbutsarakum
2012-10-15 20:57:13 UTC
Permalink
Thanks Harsh, dfs.replication.max does the magic!
Patai Sangbutsarakum
2012-10-16 00:02:08 UTC
Permalink
Just want to share and check whether this makes sense.

Jobs failed to run after I restarted the NameNode, even though the
cluster stopped complaining about under-replication.

This is what I found in the log file:

Requested replication 10 exceeds maximum 2
java.io.IOException: file
/tmp/hadoop-apps/mapred/staging/apps/.staging/job_201210151601_0494/job.jar.
Requested replication 10 exceeds maximum 2
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyReplication(FSNamesystem.java:1126)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplicationInternal(FSNamesystem.java:1074)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:1059)
at org.apache.hadoop.hdfs.server.namenode.NameNode.setReplication(NameNode.java:629)
at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:143


So I scanned through the XML config files and guessed that I should change
mapred.submit.replication from 10 to 2, then restarted again.

That's when jobs started running again.
Hopefully that change makes sense.
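For anyone hitting the same "Requested replication N exceeds maximum" error, here is a sketch of the change described above, as it could look in mapred-site.xml (property name as it appears in CDH3/MR1; the value must not exceed dfs.replication.max):

```xml
<!-- mapred-site.xml: replication factor used for job submission files
     (job.jar, job.xml in the .staging directory); the default of 10
     exceeds a dfs.replication.max of 2, so job submission fails -->
<property>
  <name>mapred.submit.replication</name>
  <value>2</value>
</property>
```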


Thanks
Patai

On Mon, Oct 15, 2012 at 1:57 PM, Patai Sangbutsarakum wrote:
Harsh J
2012-10-16 04:25:24 UTC
Permalink
Patai,

My bad: that was on my mind, but I missed noting it down in my earlier
reply. Yes, you'd have to control that as well. 2 should be fine for
smaller clusters.

On Tue, Oct 16, 2012 at 5:32 AM, Patai Sangbutsarakum wrote:
--
Harsh J
Patai Sangbutsarakum
2012-10-16 07:27:40 UTC
Permalink
Thank you so much for confirming that.
Harsh J
2012-10-15 19:18:21 UTC
Permalink
Hi Patai,

Set the dfs.replication.max parameter to 2 to achieve what you want.

On Tue, Oct 16, 2012 at 12:31 AM, Patai Sangbutsarakum wrote:
--
Harsh J