srv_is_usable() is broader than srv_is_usable() as it not only considers
the weight but the server's state as well. Future changes will allow a
server to be in drain mode with a non-zero weight, so we should migrate
to use that function instead.
Servers used to have 3 flags to store a state, now they have 4 states
instead. This avoids lots of confusion for the 4 remaining undefined
states.
The encoding from the previous to the new states can be represented
this way :
SRV_STF_RUNNING
| SRV_STF_GOINGDOWN
| | SRV_STF_WARMINGUP
| | |
0 x x SRV_ST_STOPPED
1 0 0 SRV_ST_RUNNING
1 0 1 SRV_ST_STARTING
1 1 x SRV_ST_STOPPING
Note that the case where all bits were set used to exist and was randomly
dealt with. For example, the task was not stopped, the throttle value was
still updated and reported in the stats and in the http_server_state header.
It was the same if the server was stopped by the agent or for maintenance.
It's worth noting that the internal function names are still quite confusing.
Till now, the server's state and flags were all saved as a single bit
field. It causes some difficulties because we'd like to have an enum
for the state and separate flags.
This commit starts by splitting them in two distinct fields. The first
one is srv->state (with its counter-part srv->prev_state) which are now
enums, but which still contain bits (SRV_STF_*).
The flags now lie in their own field (srv->flags).
The function srv_is_usable() was updated to use the enum as input, since
it already used to deal only with the state.
Note that currently, the maintenance mode is still in the state for
simplicity, but it must move as well.
We used to call srv_is_usable() with either the current state and weights
or the previous ones. This causes trouble for future changes, so let's first
split it in two variants :
- srv_is_usable(srv) considers the current status
- srv_was_usable(srv) considers the previous status
Detecting that a server's status has changed is a bit messy, as well
as it is to commit the status changes. We'll have to add new conditions
soon and we'd better avoid to multiply the number of touched locations
with the high risk of forgetting them.
This commit introduces :
- srv_lb_status_changed() to report if the status changed from the
previously committed one ;
- svr_lb_commit_status() to commit the current status
The function is now used by all load-balancing algorithms.
A crash was reported by Igor at owind when changing a server's weight
on the CLI. Lukas Tribus could reproduce a related bug where setting
a server's weight would result in the new weight being multiplied by
the initial one. The two bugs are the same.
The incorrect weight calculation results in the total farm weight being
larger than what was initially allocated, causing the map index to be out
of bounds on some hashes. It's easy to reproduce using "balance url_param"
with a variable param, or with "balance static-rr".
It appears that the calculation is made at many places and is not always
right and not always wrong the same way. Thus, this patch introduces a
new function "server_recalc_eweight()" which is dedicated to this task
of computing ->eweight from many other elements including uweight and
current time (for slowstart), and all users now switch to use this
function.
The patch is a bit large but the code was not trivially fixable in a way
that could guarantee this situation would not occur anymore. The fix is
much more readable and has been verified to work with all algorithms,
with both consistent and map-based hashes, and even with static-rr.
Slowstart was tested as well, just like enable/disable server.
The same bug is very likely present in 1.4 as well, so the patch will
probably need to be backported eventhough it will not apply as-is.
Thanks to Lukas and Igor for the information they provided to reproduce it.
Such load balance algorithms as roundrobin, leastconn and first will check the
server after being selected with the following condition:
if (!s->maxconn || (!s->nbpend && s->served < srv_dynamic_maxconn(s)))
But static-rr uses the different one in map_get_server_rr() as below:
if (!srv->maxconn || srv->cur_sess < srv_dynamic_maxconn(srv))
After viewing this difference, it is a better choice for static-rr to use the
same check condition as other algorithms.
This change will only affect static-rr. Though all hash algorithms with type
map-based will use the same server map as static-rr, they call another function
map_get_server_hash() to get server.
Signed-off-by: Godbach <nylzhaowei@gmail.com>
Jozef Hovan reported a bug sometimes causing a down server to be
used in url_param hashing mode.
This happens if the following conditions are met :
- the backend contains more than one server with at least two
of different weights
- all servers but one are down
- the server which is not down has a weight which does not divide
all the other ones
Example: 3 servers with 20,20,10, the first one remains up.
The problem is caused by an optimisation in recalc_server_map()
which only fills the first map slot when only one server is up,
because all LB algorithms are optimized to use entry zero when
only one server is up... All but url_param. When doing the modulus,
we can return a position which is greater than zero and use an
entry which still refers to a server which has since been stopped.
One solution could be to optimize the url_param algo to proceed
as the other ones, but the fact that was wrong implies that we
can repeat the same bug later. So let's first correctly initialize
the map in order to avoid that trap.
All files referencing the previous ebtree code were changed to point
to the new one in the ebtree directory. A makefile variable (EBTREE_DIR)
is also available to use files from another directory.
The ability to build the libebtree library temporarily remains disabled
because it can have an impact on some existing toolchains and does not
appear worth it in the medium term if we add support for multi-criteria
stickiness for instance.
The lbprm structure has moved to backend.h, where it should be, and
all algo-specific types and declarations have moved to their specific
files. The proxy struct is now much more readable.
We need to remove hash map accesses out of backend.c if we want to
later support new hash methods. This patch separates the hash computation
method from the server lookup. It leaves the lookup function to lb_map.c
and calls it with the result of the hash.
It was becoming painful to have all the LB algos in backend.c.
Let's move them to their own files. A few hashing functions still
need be broken in two parts, one for the contents and one for the
map position.