Tucker
2013-03-25 18:14:48 UTC
Does anyone have a working example of a node manager health checker script
using "yarn.nodemanager.health-checker.script.opts"? I wrote a health
checker that works fine, but one of the items being checked is a little too
sensitive. I wrote it so that modules can be loaded and unloaded by
passing various flags. Unfortunately, adding these flags to my config
doesn't seem to have had any effect, and we've had to disable the health
check entirely.
For reference:
$ health_checker -h
Usage: health_checker [options]
        --default-disabled        Default all checks disabled.
    -e, --enable-checks CHECKS    Command separated list of checks to enable.
    -d, --disable-checks CHECKS   Command separated list of checks to disable.
    -l, --list                    List available checks.
Settings used:
<property>
<name>yarn.nodemanager.health-checker.script.path</name>
<value>/usr/bin/health_checker</value>
</property>
...
<property>
<name>yarn.nodemanager.health-checker.script.opts</name>
<value>-d Network</value>
</property>
If the flag were actually being passed, I would expect the script to
report healthy. This is what I see on the command line:
# health_checker
ERROR(s): ["Errors found on interface eth2."]
# health_checker -d Network
Healthy
# echo $?
0
Unfortunately, even with opts set, I continue to get the interface error
warning after cluster start and well past the run interval. I assume I'm
missing something, but I can't seem to find any good docs on the matter.
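One way to narrow this down would be to check whether the flags reach the
script at all. The following is only a sketch of a debugging wrapper, not
anything shipped with Hadoop: it records the arguments it was called with and
then runs the real checker with those same arguments. The log location, the
environment variable names, and the function names are all made up for
illustration.

```shell
#!/bin/sh
# Hypothetical debugging wrapper: record the arguments this script receives,
# then hand them unchanged to the real health checker.

# Assumed locations; both can be overridden from the environment.
LOG_FILE="${HEALTH_WRAPPER_LOG:-/tmp/health_checker_args.log}"
REAL_CHECKER="${REAL_CHECKER:-/usr/bin/health_checker}"

# Append one line per invocation: UTC timestamp, argument count, arguments.
log_args() {
    printf '%s argc=%d argv=[%s]\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$#" "$*" >> "$LOG_FILE"
}

# Log the arguments, then run the real checker so that its output and
# exit status are what the caller sees.
run_checker() {
    log_args "$@"
    "$REAL_CHECKER" "$@"
}

# Dispatch only when the real checker is present and executable.
if [ -x "$REAL_CHECKER" ]; then
    run_checker "$@"
fi
```

To try it, yarn.nodemanager.health-checker.script.path would point at wherever
the wrapper is saved, with yarn.nodemanager.health-checker.script.opts left as
"-d Network". If the log then shows argc=0 after the next run interval, the
opts are presumably not being handed to the script; if it shows the expected
flags, the problem is more likely in how the script's output is interpreted.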
--
--tucker