I'm working on helping Arnaud update haproxy in Debian, and one of the
package build warnings I received was about "hyphen where a minus sign
was intended" in the man page - details:
http://lintian.debian.org/tags/hyphen-used-as-minus-sign.html
Patch included in my 1.3.20 Debian package is attached.
This can ensure that data is readily available on a socket when
we accept it, but a bug in the kernel ignores the timeout so the
socket can remain pending as long as the client does not talk.
Use with care.
This patch makes stats page about 30% smaller and
"CSS 2.1" + "HTML 4.01 Transitional" compliant.
There should be no visible differences.
Changes:
- add DOCTYPE for HTML 4.01 Transitional
- add missing </ul>
- remove cols=, AFAIK no modern browser support this property and
it prevents validation to pass.
- remove "align: center": there is no such property in css. There is
however "text-align: center" but it is definitely not what we would
like to see here.
- by default align .titre to center
- by default align .td to right
- remove all align=right, no longer necessary
- add class=ac (align center): shorter than "align=center" and use it when
necessary
- remove nowrap from td, instead use "white-space: nowrap" in css
Now stats page passes W3C validators for HTML & CSS. We may consider adding
"validated" icons from www.w3.org. ;)
Released version 1.4-dev4 with the following main changes :
- [DOC] add missing rate_lim and rate_max
- [MAJOR] struct chunk rework
- [MEDIUM] Health check reporting code rework + health logging, v3
- [BUG] check if rise/fall has an argument and it is > 0
- [MINOR] health checks logging unification
- [MINOR] add "description", "node" and show-node"/"show-desc", remove "node-name", v2
- [MINOR] Allow dots in show-node & add "white-space: nowrap" in th.pxname.
- [DOC] Add information about http://haproxy.1wt.eu/contrib.html
- [MINOR] Introduce include/types/counters.h
- [CLEANUP] Move counters to dedicated structures
- [MINOR] Add "clear counters" to clear statistics counters
- [MEDIUM] Collect & provide separate statistics for sockets, v2
- [BUG] Fix NULL pointer dereference in stats_check_uri_auth(), v2
- [MINOR] acl: don't report valid acls as potential mistakes
- [MINOR] Add cut_crlf(), ltrim(), rtrim() and alltrim()
- [MINOR] Add chunk_htmlencode and chunk_asciiencode
- [MINOR] Capture & display more data from health checks, v2
- [BUG] task.c: don't assing last_timer to node-less entries
- [BUG] http stats: large outputs sometimes got some parts chopped off
- [MINOR] backend: export some functions to recount servers
- [MINOR] backend: uninline some LB functions
- [MINOR] include time.h from freq_ctr.h as is uses "now".
- [CLEANUP] backend: move LB algos to individual files
- [MINOR] lb_map: reorder code in order to ease integration of new hash functions
- [CLEANUP] proxy: move last lb-specific bits to their respective files
- [MINOR] backend: separate declarations of LB algos from their lookup method
- [MINOR] backend: reorganize the LB algorithm selection
- [MEDIUM] backend: introduce the "static-rr" LB algorithm
- [MINOR] report list of supported pollers with -vv
- [DOC] log-health-checks is an option, not a directive
- [MEDIUM] new option "independant-streams" to stop updating read timeout on writes
- [BUG] stats: don't call buffer_shutw(), but ->shutw() instead
- [MINOR] stats: strip CR and LF from the input command line
- [BUG] don't refresh timeouts late after detected activity
- [MINOR] stats_dump_errors_to_buffer: use buffer_feed_chunk()
- [MINOR] stats_dump_sess_to_buffer: use buffer_feed_chunk()
- [MINOR] stats: make stats_dump_raw_to_buffer() use buffer_feed_chunk
- [MEDIUM] stats: don't use s->ana_state anymore
- [MINOR] remove now obsolete ana_state from the session struct
- [MEDIUM] stats: make HTTP stats use an I/O handler
- [MEDIUM] stream_int: adjust WAIT_ROOM handling
- [BUG] config: look for ID conflicts in all sockets, not only last ones.
- [MINOR] config: reference file and line with any listener/proxy/server declaration
- [MINOR] config: report places of duplicate names or IDs
- [MINOR] config: add pointer to file name in block/redirect/use_backend/monitor rules
- [MINOR] tools: add a new get_next_id() function
- [MEDIUM] config: automatically find unused IDs for proxies, servers and listeners
- [OPTIM] counters: move some max numbers to the counters struct
- [BUG] counters: fix segfault on missing counters for a listener
- [MEDIUM] backend: implement consistent hashing variation
- [MINOR] acl: add fe_conn, be_conn, queue, avg_queue
- [MINOR] stats: use 'clear counters all' to clear all values
- [MEDIUM] add access restrictions to the stats socket
- [MINOR] buffers: add buffer_feed2() and make buffer_feed() measure string length
- [MINOR] proxy: provide function to retrieve backend/server pointers
- [MINOR] add the "initial weight" to the server struct.
- [MEDIUM] stats: add the "get weight" command to report a server's weight
- [MEDIUM] stats: add the "set weight" command
- [BUILD] add a 'make tags' target
- [MINOR] stats: add support for numeric IDs in set weight/get weight
- [MINOR] stats: use a dedicated state to output static data
- [OPTIM] stats: check free space before trying to print
This alone makes a typical HTML stats dump consume 10% CPU less,
because we avoid doing complex printf calls to drop them later.
Only a few common cases have been checked, those which are very
likely to run for nothing.
It is a bit expensive and complex to use to call buffer_feed()
directly from the request parser, and there are risks that some
output messages are lost in case of buffer full. Since most of
these messages are static, let's have a state dedicated to print
these messages and store them in a specific area shared with the
stats in the session. This both reduces code size and risks of
losing output data.
Krzysztof reported that using names only for get weight/set weight
was not enough because it's still possible to have multiple servers
with the same name (and my test config is one of those). He suggested
to be able to designate them by their unique numeric IDs by prefixing
the ID with a dash.
That way we can have :
set weight #120/#2
as well as
get weight static/srv1 10
Capture & display more data from health checks, like
strerror(errno) for L4 failed checks or a first line
from a response for L7 successes/failed checks.
Non ascii or control characters are masked with
chunk_htmlencode() (html stats) or chunk_asciiencode() (logs).
Add two functions to encode input chunk replacing
non-printable, non ascii or special characters
with:
"&#%u;" - chunk_htmlencode
"<%02X>" - chunk_asciiencode
Above functions should be used when adding strings, received
from possible unsafe sources, to html stats or logs.
int get_backend_server(const char *bk_name, const char *sv_name,
struct proxy **bk, struct server **sv);
This function scans the list of backends and servers to retrieve the first
backend and the first server with the given names, and sets them in both
parameters. It returns zero if either is not found, or non-zero and sets
the ones it did not found to NULL. If a NULL pointer is passed for the
backend, only the pointer to the server will be updated.
The stats socket can now run at 3 different levels :
- user
- operator (default one)
- admin
These levels are used to restrict access to some information
and commands. Only the admin can clear all stats. A user cannot
clear anything nor access sensible data such as sessions or
errors.
The most common use of "clear counters" should be to only clear
max values without affecting cumulated values, for instance,
after an incident. So we change "clear counters" to only clear
max values, and add "clear counters all" to clear all counters.
I noticed that in __eb32_insert , if the tree is empty
(root->b[EB_LEFT] == NULL) , the node.bit is not defined.
However in __task_queue there are checks:
- if (last_timer->node.bit < 0)
- if (task->wq.node.bit < last_timer->node.bit)
which might rely upon an undefined value.
This is how I see it:
1. We insert eb32_node in an empty wait queue tree for a task (called by
process_runnable_tasks() ):
Inserting into empty wait queue &task->wq = 0x72a87c8, last_timer
pointer: (nil)
2. Then, we set the last timer to the same address:
Setting last_timer: (nil) to: 0x72a87c8
3. We get a new task to be inserted in the queue (again called by
process_runnable_tasks()) , before the __task_unlink_wq() is called for
the previous task.
4. At this point, we still have last_timer set to 0x72a87c8 , but since
it was inserted in an empty tree, it doesn't have node.bit and the
values above get dereferenced with undefined value.
The bug has no effect right now because the check for equality is still
made, so the next timer will still be queued at the right place anyway,
without any possible side-effect. But it's a pending bug waiting for a
small change somewhere to strike.
Iliya Polihronov
These ACLs are used to check the number of active connections on the
frontend, backend or in a backend's queue. The avg_queue returns the
average number of queued connections per server, and for this, divides
the total number of queued connections by the number of alive servers.
The dst_conn ACL has been slightly changed to more reflect its name and
original usage, which is to return the number of connections on the
destination address/port (the socket) and not the whole frontend.
Consistent hashing provides some interesting advantages over common
hashing. It avoids full redistribution in case of a server failure,
or when expanding the farm. This has a cost however, the hashing is
far from being perfect, as we associate a server to a request by
searching the server with the closest key in a tree. Since servers
appear multiple times based on their weights, it is recommended to
use weights larger than approximately 10-20 in order to smoothen
the distribution a bit.
In some cases, playing with weights will be the only solution to
make a server appear more often and increase chances of being picked,
so stats are very important with consistent hashing.
In order to indicate the type of hashing, use :
hash-type map-based (default, old one)
hash-type consistent (new one)
Consistent hashing can make sense in a cache farm, in order not
to redistribute everyone when a cache changes state. It could also
probably be used for long sessions such as terminal sessions, though
that has not be attempted yet.
More details on this method of hashing here :
http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/
Commit 404e8ab461 introduced
smart checking for stupid acl typos. However, now haproxy shows
the warning even for valid acls, like this one:
acl Cookie-X-NoAccel hdr_reg(cookie) (^|\ |;)X-NoAccel=1(;|$)
If a frontend does not set 'option socket-stats', a 'clear counters'
on the stats socket could segfault because li->counters is NULL. The
correct fix is to check for NULL before as this is a valid situation.
Recent "struct chunk rework" introduced a NULL pointer dereference
and now haproxy segfaults if auth is required for stats but not found.
The reason is that size_t cannot store negative values, but current
code assumes that "len < 0" == uninitialized.
This patch fixes it.
There are a few remaining max values that need to move to counters.
Also, the counters are more often used than some config information,
so get them closer to the other useful struct members for better cache
efficiency.
Until now it was required that every custom ID was above 1000 in order to
avoid conflicts. Now we have the list of all assigned IDs and can automatically
pick the first unused one. This means that it is perfectly possible to interleave
automatic IDs with persistent IDs and the parser will automatically allocate
unused values starting with 1.
When a name or ID conflict is detected, it is sometimes useful to know
where the other one was declared. Now that we have this information,
report it in error messages.
This patch allows to collect & provide separate statistics for each socket.
It can be very useful if you would like to distinguish between traffic
generate by local and remote users or between different types of remote
clients (peerings, domestic, foreign).
Currently no "Session rate" is supported, but adding it should be possible
if we found it useful.
Doing this, we can remove the last BF_HIJACK user and remove
produce_content(). s->data_source could also be removed but
it is currently used to detect if the stats or a server was
used.
The stats handler used to store internal states in s->ana_state. Now
we only rely on si->st0 in which we can store as many states as we
have possible outputs. This cleans up the stats code a lot and makes
it more maintainable. It has also reduced code size by a few hundred
bytes.
We can simplify the code in the stats functions using buffer_feed_chunk()
instead of buffer_write_chunk(). Let's start with this function. This
patch also fixed an issue where we could dump past the end of the capture
buffer if it is shorter than the captured request.
In old versions, before 1.3.16, we had to refresh the timeouts after
each call to process_session() because the stream socket handler did
not do it. Now that the sockets can exchange data for a long period
without calling process_session(), we can detect an old activity and
refresh a timeout long after the last activity, causing too late a
detection of some timeouts.
The fix simply consists in not checking for activity anymore in
stream_sock_data_finish() but only set a timeout if it was not
previously set.