mirror of
http://git.haproxy.org/git/haproxy.git/
synced 2025-04-01 22:48:25 +00:00
MEDIUM: checks: Add supplementary agent checks

Allow an auxiliary agent check to be run independently of a regular
health check. This is enabled by the agent-check server setting.

The agent-port, which specifies the TCP port to use for the agent's
connections, is required.

The agent-inter, which specifies the interval between agent checks and
the timeout of agent checks, is optional. If not set, the value for
regular checks is used.

e.g.
server web1_1 127.0.0.1:80 check agent-check agent-port 10000

If either the health or agent check determines that a server is down
then it is marked as being down, otherwise it is marked as being up.

An agent health check is performed by opening a TCP socket and reading
an ASCII string. The string should have one of the following forms:

* An ASCII representation of a positive integer percentage.
  e.g. "75%"

  Values in this format will set the weight proportional to the initial
  weight of a server as configured when haproxy starts.

* The string "drain".

  This will cause the weight of a server to be set to 0, and thus it will
  not accept any new connections other than those that are accepted via
  persistence.

* The string "down", optionally followed by a description string.

  Mark the server as down and log the description string as the reason.

* The string "stopped", optionally followed by a description string.

  This currently has the same behaviour as "down".

* The string "fail", optionally followed by a description string.

  This currently has the same behaviour as "down".

Signed-off-by: Simon Horman <horms@verge.net.au>
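
The reply forms listed in the commit message can be illustrated with a small classifier. This is a minimal sketch, not the code added by this commit: the names `agent_action`, `agent_reply`, `parse_agent_reply` and `scaled_weight` are hypothetical, and the sketch assumes the reply has already been read into a NUL-terminated string with any trailing newline removed.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes; these names are not taken from HAProxy. */
enum agent_action {
    AGENT_SET_WEIGHT, /* "<n>%": scale the configured weight */
    AGENT_DRAIN,      /* "drain": weight 0, persistence only */
    AGENT_DOWN,       /* "down", "stopped" or "fail" */
    AGENT_UNKNOWN     /* anything else */
};

struct agent_reply {
    enum agent_action action;
    int percent;        /* meaningful only for AGENT_SET_WEIGHT */
    const char *reason; /* points into the input string, or NULL */
};

/* True if <s> starts with <kw> followed by end of string or blank. */
static int has_keyword(const char *s, const char *kw)
{
    size_t n = strlen(kw);

    return strncmp(s, kw, n) == 0 &&
           (s[n] == '\0' || s[n] == ' ' || s[n] == '\t');
}

/* Return the description following a keyword, or NULL if there is none. */
static const char *reply_reason(const char *s, size_t kwlen)
{
    s += kwlen;
    while (*s == ' ' || *s == '\t')
        s++;
    return *s ? s : NULL;
}

struct agent_reply parse_agent_reply(const char *s)
{
    static const char *down_kw[] = { "down", "stopped", "fail" };
    struct agent_reply r = { AGENT_UNKNOWN, 0, NULL };
    char *end;
    long v;
    size_t i;

    assert(s != NULL);

    if (strcmp(s, "drain") == 0) {
        r.action = AGENT_DRAIN;
        return r;
    }
    for (i = 0; i < sizeof(down_kw) / sizeof(down_kw[0]); i++) {
        if (has_keyword(s, down_kw[i])) {
            r.action = AGENT_DOWN;
            r.reason = reply_reason(s, strlen(down_kw[i]));
            return r;
        }
    }
    /* A positive integer immediately followed by '%' and nothing else. */
    if (*s >= '0' && *s <= '9') {
        errno = 0;
        v = strtol(s, &end, 10);
        if (errno == 0 && v > 0 && v <= INT_MAX &&
            end[0] == '%' && end[1] == '\0') {
            r.action = AGENT_SET_WEIGHT;
            r.percent = (int)v;
        }
    }
    return r;
}

/* Weight proportional to the weight configured at startup. */
int scaled_weight(int initial_weight, int percent)
{
    return (int)((long long)initial_weight * percent / 100);
}
```

With this sketch, `parse_agent_reply("down disk full")` yields `AGENT_DOWN` with the reason `"disk full"`, and `scaled_weight(100, 75)` yields 75. Treating "stopped" and "fail" like "down" follows the commit message; how unknown replies should be handled is not stated there, so the sketch only flags them.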
This commit is contained in:
parent
afc47ee7fc
commit
d60d69138b
@ -777,11 +777,12 @@ nosplice
   "option splice-response".

 spread-checks <0..50, in percent>
-  Sometimes it is desirable to avoid sending health checks to servers at exact
-  intervals, for instance when many logical servers are located on the same
-  physical server. With the help of this parameter, it becomes possible to add
-  some randomness in the check interval between 0 and +/- 50%. A value between
-  2 and 5 seems to show good results. The default value remains at 0.
+  Sometimes it is desirable to avoid sending agent and health checks to
+  servers at exact intervals, for instance when many logical servers are
+  located on the same physical server. With the help of this parameter, it
+  becomes possible to add some randomness in the check interval between 0
+  and +/- 50%. A value between 2 and 5 seems to show good results. The
+  default value remains at 0.

 tune.bufsize <number>
   Sets the buffer size to this size (in bytes). Lower values allow more
@ -7669,6 +7670,66 @@ addr <ipv4|ipv6>

   Supported in default-server: No

+agent-check
+  Enable an auxiliary agent check which is run independently of a regular
+  health check. An agent health check is performed by making a TCP
+  connection to the port set by the "agent-port" parameter and reading
+  an ASCII string. The string should have one of the following forms:
+
+  * An ASCII representation of a positive integer percentage.
+    e.g. "75%"
+
+    Values in this format will set the weight proportional to the initial
+    weight of a server as configured when haproxy starts.
+
+  * The string "drain".
+
+    This will cause the weight of a server to be set to 0, and thus it will
+    not accept any new connections other than those that are accepted via
+    persistence.
+
+  * The string "down", optionally followed by a description string.
+
+    Mark the server as down and log the description string as the reason.
+
+  * The string "stopped", optionally followed by a description string.
+
+    This currently has the same behaviour as "down".
+
+  * The string "fail", optionally followed by a description string.
+
+    This currently has the same behaviour as "down".
+
+  Requires the "agent-port" parameter to be set.
+  See also the "agent-inter" parameter.
+
+  Supported in default-server: No
+
+agent-inter <delay>
+  The "agent-inter" parameter sets the interval between two agent checks
+  to <delay> milliseconds. If left unspecified, the delay defaults to 2000 ms.
+
+  Just as with every other time-based parameter, it may be entered in any
+  other explicit unit among { us, ms, s, m, h, d }. The "agent-inter"
+  parameter also serves as a timeout for agent checks if "timeout check" is
+  not set. In order to reduce "resonance" effects when multiple servers are
+  hosted on the same hardware, the agent and health checks of all servers
+  are started with a small time offset between them. It is also possible to
+  add some random noise in the agent and health checks interval using the
+  global "spread-checks" keyword. This makes sense for instance when a lot
+  of backends use the same servers.
+
+  See also the "agent-check" and "agent-port" parameters.
+
+  Supported in default-server: Yes
+
+agent-port <port>
+  The "agent-port" parameter sets the TCP port used for agent checks.
+
+  See also the "agent-check" and "agent-inter" parameters.
+
+  Supported in default-server: Yes
+
 backup
   When "backup" is present on a server line, the server is only used in load
   balancing when all other non-backup servers are unavailable. Requests coming
@ -7844,11 +7905,11 @@ downinter <delay>
   other explicit unit among { us, ms, s, m, h, d }. The "inter" parameter also
   serves as a timeout for health checks sent to servers if "timeout check" is
   not set. In order to reduce "resonance" effects when multiple servers are
-  hosted on the same hardware, the health-checks of all servers are started
-  with a small time offset between them. It is also possible to add some random
-  noise in the health checks interval using the global "spread-checks"
-  keyword. This makes sense for instance when a lot of backends use the same
-  servers.
+  hosted on the same hardware, the agent and health checks of all servers
+  are started with a small time offset between them. It is also possible to
+  add some random noise in the agent and health checks interval using the
+  global "spread-checks" keyword. This makes sense for instance when a lot
+  of backends use the same servers.

   Supported in default-server: Yes

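
The "spread-checks" text above describes adding random noise of up to +/- N percent to the agent and health check interval. The following is a minimal sketch of that arithmetic only; it is not HAProxy's implementation. The function name `spread_interval` is hypothetical, and the random source is passed in as a fraction in [0, 1) so that the result is reproducible.

```c
#include <assert.h>

/*
 * Return inter_ms moved by at most +/- spread_pct percent.
 * <unit> is a caller-supplied random fraction in [0, 1); 0.5 means
 * "no deviation". All names here are illustrative assumptions.
 */
int spread_interval(int inter_ms, int spread_pct, double unit)
{
    long long max_dev;

    assert(inter_ms > 0);
    assert(spread_pct >= 0 && spread_pct <= 50);
    assert(unit >= 0.0 && unit < 1.0);

    /* largest allowed deviation in milliseconds */
    max_dev = (long long)inter_ms * spread_pct / 100;

    /* map [0, 1) onto [-max_dev, +max_dev) */
    return (int)(inter_ms - max_dev + (long long)(2.0 * (double)max_dev * unit));
}
```

For example, with a 2000 ms interval and a spread of 5, the result stays between 1900 ms and just under 2100 ms; with a spread of 0 the interval is returned unchanged.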
@ -55,6 +55,9 @@
 /* unused: 0x0100, 0x0200, 0x0400 */
 #define SRV_SEND_PROXY 0x0800 /* this server talks the PROXY protocol */
 #define SRV_NON_STICK 0x1000 /* never add connections allocated to this server to a stick table */
+#define SRV_AGENT_CHECKED 0x2000 /* this server needs to be checked using an agent check.
+                                  * This is run independently of the main check whose
+                                  * presence is indicated by the SRV_CHECKED flag */

 /* function which act on servers need to return various errors */
 #define SRV_STATUS_OK 0 /* everything is OK. */
@ -190,6 +193,7 @@ struct server {
 	} check_common;

 	struct check check; /* health-check specific configuration */
+	struct check agent; /* agent specific configuration */

 #ifdef USE_OPENSSL
 	int use_ssl; /* ssl enabled */
@ -1325,9 +1325,13 @@ void init_default_instance()
 	defproxy.defsrv.check.inter = DEF_CHKINTR;
 	defproxy.defsrv.check.fastinter = 0;
 	defproxy.defsrv.check.downinter = 0;
+	defproxy.defsrv.agent.inter = DEF_CHKINTR;
+	defproxy.defsrv.agent.fastinter = 0;
+	defproxy.defsrv.agent.downinter = 0;
 	defproxy.defsrv.rise = DEF_RISETIME;
 	defproxy.defsrv.fall = DEF_FALLTIME;
 	defproxy.defsrv.check.port = 0;
+	defproxy.defsrv.agent.port = 0;
 	defproxy.defsrv.maxqueue = 0;
 	defproxy.defsrv.minconn = 0;
 	defproxy.defsrv.maxconn = 0;
@ -4172,7 +4176,7 @@ stats_error_parsing:
 	else if (!strcmp(args[0], "server") || !strcmp(args[0], "default-server")) { /* server address */
 		int cur_arg;
 		short realport = 0;
-		int do_check = 0, defsrv = (*args[0] == 'd');
+		int do_agent = 0, do_check = 0, defsrv = (*args[0] == 'd');

 		if (!defsrv && curproxy == &defproxy) {
 			Alert("parsing [%s:%d] : '%s' not allowed in 'defaults' section.\n", file, linenum, args[0]);
@ -4219,6 +4223,7 @@ stats_error_parsing:
 			LIST_INIT(&newsrv->actconns);
 			LIST_INIT(&newsrv->pendconns);
 			do_check = 0;
+			do_agent = 0;
 			newsrv->state = SRV_RUNNING; /* early server setup */
 			newsrv->last_change = now.tv_sec;
 			newsrv->id = strdup(args[1]);
@ -4272,11 +4277,16 @@ stats_error_parsing:
 				goto out;
 			}

-			newsrv->check.use_ssl = curproxy->defsrv.check.use_ssl;
+			newsrv->check.use_ssl = curproxy->defsrv.check.use_ssl;
 			newsrv->check.port = curproxy->defsrv.check.port;
 			newsrv->check.inter = curproxy->defsrv.check.inter;
 			newsrv->check.fastinter = curproxy->defsrv.check.fastinter;
 			newsrv->check.downinter = curproxy->defsrv.check.downinter;
+			newsrv->agent.use_ssl = curproxy->defsrv.agent.use_ssl;
+			newsrv->agent.port = curproxy->defsrv.agent.port;
+			newsrv->agent.inter = curproxy->defsrv.agent.inter;
+			newsrv->agent.fastinter = curproxy->defsrv.agent.fastinter;
+			newsrv->agent.downinter = curproxy->defsrv.agent.downinter;
 			newsrv->rise = curproxy->defsrv.rise;
 			newsrv->fall = curproxy->defsrv.fall;
 			newsrv->maxqueue = curproxy->defsrv.maxqueue;
@ -4296,6 +4306,10 @@ stats_error_parsing:
 			newsrv->check.health = newsrv->rise; /* up, but will fall down at first failure */
 			newsrv->check.server = newsrv;

+			newsrv->agent.status = HCHK_STATUS_INI;
+			newsrv->agent.health = newsrv->rise; /* up, but will fall down at first failure */
+			newsrv->agent.server = newsrv;
+
 			cur_arg = 3;
 		} else {
 			newsrv = &curproxy->defsrv;
@ -4303,7 +4317,33 @@ stats_error_parsing:
 		}

 		while (*args[cur_arg]) {
-			if (!defsrv && !strcmp(args[cur_arg], "cookie")) {
+			if (!strcmp(args[cur_arg], "agent-check")) {
+				global.maxsock++;
+				do_agent = 1;
+				cur_arg += 1;
+			} else if (!strcmp(args[cur_arg], "agent-inter")) {
+				const char *err = parse_time_err(args[cur_arg + 1], &val, TIME_UNIT_MS);
+				if (err) {
+					Alert("parsing [%s:%d] : unexpected character '%c' in 'agent-inter' argument of server %s.\n",
+					      file, linenum, *err, newsrv->id);
+					err_code |= ERR_ALERT | ERR_FATAL;
+					goto out;
+				}
+				if (val <= 0) {
+					Alert("parsing [%s:%d]: invalid value %d for argument '%s' of server %s.\n",
+					      file, linenum, val, args[cur_arg], newsrv->id);
+					err_code |= ERR_ALERT | ERR_FATAL;
+					goto out;
+				}
+				newsrv->agent.inter = val;
+				cur_arg += 2;
+			}
+			else if (!strcmp(args[cur_arg], "agent-port")) {
+				global.maxsock++;
+				newsrv->agent.port = atol(args[cur_arg + 1]);
+				cur_arg += 2;
+			}
+			else if (!defsrv && !strcmp(args[cur_arg], "cookie")) {
 				newsrv->cookie = strdup(args[cur_arg + 1]);
 				newsrv->cklen = strlen(args[cur_arg + 1]);
 				cur_arg += 2;
@ -4331,6 +4371,8 @@ stats_error_parsing:

 				if (newsrv->check.health)
 					newsrv->check.health = newsrv->rise;
+				if (newsrv->agent.health)
+					newsrv->agent.health = newsrv->rise;
 				cur_arg += 2;
 			}
 			else if (!strcmp(args[cur_arg], "fall")) {
@ -4512,6 +4554,7 @@ stats_error_parsing:
 				newsrv->state |= SRV_MAINTAIN;
 				newsrv->state &= ~SRV_RUNNING;
 				newsrv->check.health = 0;
+				newsrv->agent.health = 0;
 				cur_arg += 1;
 			}
 			else if (!defsrv && !strcmp(args[cur_arg], "observe")) {
@ -4913,6 +4956,28 @@ stats_error_parsing:
 			newsrv->state |= SRV_CHECKED;
 		}

+		if (do_agent) {
+			int ret;
+
+			if (!newsrv->agent.port) {
+				Alert("parsing [%s:%d] : server %s does not have agent port. Agent check has been disabled.\n",
+				      file, linenum, newsrv->id);
+				err_code |= ERR_ALERT | ERR_FATAL;
+				goto out;
+			}
+
+			if (!newsrv->agent.inter)
+				newsrv->agent.inter = newsrv->check.inter;
+
+			ret = init_check(&newsrv->agent, PR_O2_LB_AGENT_CHK, file, linenum);
+			if (ret) {
+				err_code |= ret;
+				goto out;
+			}
+
+			newsrv->state |= SRV_AGENT_CHECKED;
+		}
+
 		if (!defsrv) {
 			if (newsrv->state & SRV_BACKUP)
 				curproxy->srv_bck++;
@ -6802,6 +6867,7 @@ out_uri_auth_compat:
 					newsrv->state |= SRV_MAINTAIN;
 					newsrv->state &= ~SRV_RUNNING;
 					newsrv->check.health = 0;
+					newsrv->agent.health = 0;
 				}

 				newsrv->track = srv;
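
The `do_agent` block in the configuration parser above enforces two rules: an enabled agent check must have a port, and an unset agent interval falls back to the health check interval. A minimal sketch of those two rules follows. The struct `agent_cfg` and the function `finalize_agent_cfg` are hypothetical simplifications for illustration, not HAProxy types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the fields touched by the diff. */
struct agent_cfg {
    int enabled; /* non-zero once "agent-check" was seen */
    int port;    /* "agent-port", 0 if unset */
    int inter;   /* "agent-inter" in milliseconds, 0 if unset */
};

/*
 * Apply the post-parsing rules. Returns 0 on success, or -1 when the
 * agent check is enabled without a port, which the parser above treats
 * as a fatal configuration error.
 */
int finalize_agent_cfg(struct agent_cfg *a, int check_inter)
{
    assert(a != NULL);

    if (!a->enabled)
        return 0;           /* nothing to validate */
    if (a->port == 0)
        return -1;          /* "agent-port" is mandatory */
    if (a->inter == 0)
        a->inter = check_inter; /* inherit the health check interval */
    return 0;
}
```

Returning an error code rather than printing mirrors the parser's `err_code` handling only loosely; the real code also initialises the check and sets `SRV_AGENT_CHECKED`, which this sketch leaves out.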
30
src/checks.c
@ -398,7 +398,7 @@ void set_server_down(struct check *check)
 		check->health = s->rise;
 	}

-	if (check->health == s->rise || s->track) {
+	if ((s->state & SRV_RUNNING && check->health == s->rise) || s->track) {
 		int srv_was_paused = s->state & SRV_GOINGDOWN;
 		int prev_srv_count = s->proxy->srv_bck + s->proxy->srv_act;

@ -465,7 +465,8 @@ void set_server_up(struct check *check) {
 		check->health = s->rise;
 	}

-	if (check->health == s->rise || s->track) {
+	if ((s->check.health >= s->rise && s->agent.health >= s->rise &&
+	     check->health == s->rise) || s->track) {
 		if (s->proxy->srv_bck == 0 && s->proxy->srv_act == 0) {
 			if (s->proxy->last_change < now.tv_sec) // ignore negative times
 				s->proxy->down_time += now.tv_sec - s->proxy->last_change;
@ -1314,8 +1315,11 @@ static struct task *process_chk(struct task *t)
 		check->bo->p = check->bo->data;
 		check->bo->o = 0;

-		/* prepare the check buffer */
-		if (check->type) {
+		/* prepare the check buffer
+		 * This should not be used if check is the secondary agent check
+		 * of a server as s->proxy->check_req will relate to the
+		 * configuration of the primary check */
+		if (check->type && check != &s->agent) {
 			bo_putblk(check->bo, s->proxy->check_req, s->proxy->check_len);

 			/* we want to check if this host replies to HTTP or SSLv3 requests
@ -1584,12 +1588,20 @@ int start_checks() {
 	 */
 	for (px = proxy; px; px = px->next) {
 		for (s = px->srv; s; s = s->next) {
-			if (!(s->state & SRV_CHECKED))
-				continue;
+			/* A task for the main check */
+			if (s->state & SRV_CHECKED) {
+				if (!start_check_task(&s->check, mininter, nbcheck, srvpos))
+					return -1;
+				srvpos++;
+			}

-			if (!start_check_task(&s->check, mininter, nbcheck, srvpos))
-				return -1;
-			srvpos++;
+			/* A task for an auxiliary agent check */
+			if (s->state & SRV_AGENT_CHECKED) {
+				if (!start_check_task(&s->agent, mininter, nbcheck, srvpos)) {
+					return -1;
+				}
+				srvpos++;
+			}
 		}
 	}
 	return 0;
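
The `set_server_up()` hunk above only lets a server come up when both the health check and the agent check have reached the rise threshold. A minimal sketch of that condition follows. The `*_sketch` types and `both_checks_up` are hypothetical stand-ins for the much larger `struct check` and `struct server`, and the `s->track` branch of the real condition is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced types: only the fields the condition reads. */
struct check_sketch {
    int health; /* consecutive-success counter, 0 means failed */
};

struct server_sketch {
    int rise;                  /* successes needed to be considered up */
    struct check_sketch check; /* regular health check */
    struct check_sketch agent; /* auxiliary agent check */
};

/* Non-zero only if neither check is holding the server down. */
int both_checks_up(const struct server_sketch *s)
{
    assert(s != NULL);
    return s->check.health >= s->rise && s->agent.health >= s->rise;
}
```

Because the parser initialises `agent.health` to `rise` for every server, a server without an agent check passes the agent half of this test and is governed by its health check alone.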
@ -1120,6 +1120,10 @@ void deinit(void)
 				task_delete(s->check.task);
 				task_free(s->check.task);
 			}
+			if (s->agent.task) {
+				task_delete(s->agent.task);
+				task_free(s->agent.task);
+			}

 			if (s->warmup) {
 				task_delete(s->warmup);
@ -1130,6 +1134,8 @@ void deinit(void)
 			free(s->cookie);
 			free(s->check.bi);
 			free(s->check.bo);
+			free(s->agent.bi);
+			free(s->agent.bo);
 			free(s);
 			s = s_next;
 		}/* end while(s) */