mirror of
http://git.haproxy.org/git/haproxy.git/
synced 2024-12-29 08:02:08 +00:00
BUG/MINOR: stream: make the call_rate only count the no-progress calls
We have an anti-looping protection in process_stream() that detects bugs that used to affect a few filters like compression in the past, which sometimes forgot to handle a read0 or a particular error, leaving a thread looping at 100% CPU forever. When such a condition is detected, an alert is emitted and the process is killed so that it can be replaced by a sane one:

  [ALERT] (19061) : A bogus STREAM [0x274abe0] is spinning at 2057156 calls per second and refuses to die, aborting now! Please report this error to developers [strm=0x274abe0,3 src=unix fe=MASTER be=MASTER dst=<MCLI> txn=(nil),0 txn.req=-,0 txn.rsp=-,0 rqf=c02000 rqa=10000 rpf=88000021 rpa=8000000 sif=EST,40008 sib=DIS,84018 af=(nil),0 csf=0x274ab90,8600 ab=0x272fd40,1 csb=(nil),0 cof=0x25d5d80,1300:PASS(0x274aaf0)/RAW((nil))/unix_stream(9) cob=(nil),0:NONE((nil))/NONE((nil))/NONE(0) filters={}]

  call trace(11):
  | 0x4dbaab [c7 04 25 01 00 00 00 00]: stream_dump_and_crash+0x17b/0x1b4
  | 0x4df31f [e9 bd c8 ff ff 49 83 7c]: process_stream+0x382f/0x53a3
  (...)

One problem with this detection is that it used to only count the call rate because we weren't sure how to make it more accurate, but the threshold was high enough to prevent accidental false positives.

There is actually one case that manages to trigger it, which is when sending huge amounts of requests pipelined on the master CLI. Some short requests such as "show version" are sufficient to be handled extremely fast and to cause a wake up of an analyser to parse the next request, then an applet to handle it, back and forth. But this condition is not an error, since some data are being forwarded by the stream, and it's easy to detect it.

This patch modifies the detection so that update_freq_ctr() only applies to calls made with neither CF_READ_PARTIAL nor CF_WRITE_PARTIAL set on any of the channels, which really indicates that nothing is happening at all.
This is largely sufficient and extremely effective, as the call above is still caught (shutr being ignored by an analyser) while a loop on the master CLI now has no effect. The "call_rate" field in the detailed "show sess" output will now be much lower, except for bogus streams, which may help spot them. This field is only there for developers anyway, so it's fine to slightly adjust its meaning.

This patch could be backported to stable versions in case of reports of such an issue, but as that's unlikely, it's not really needed.
parent a4e4d66f70
commit 6c539c4b8c
@@ -143,7 +143,7 @@ struct stream {
 	struct list back_refs;                  /* list of users tracking this stream */
 	struct buffer_wait buffer_wait;         /* position in the list of objects waiting for a buffer */
 
-	struct freq_ctr call_rate;              /* stream task call rate */
+	struct freq_ctr call_rate;              /* stream task call rate without making progress */
 
 	short store_count;
 	enum obj_type obj_type;                 /* object type == OBJ_TYPE_STREAM */

src/stream.c (11 lines changed)
@@ -1628,9 +1628,14 @@ struct task *process_stream(struct task *t, void *context, unsigned int state)
 	si_sync_recv(si_f);
 	si_sync_recv(si_b);
 
-	rate = update_freq_ctr(&s->call_rate, 1);
-	if (rate >= 100000 && s->call_rate.prev_ctr) { // make sure to wait at least a full second
-		stream_dump_and_crash(&s->obj_type, read_freq_ctr(&s->call_rate));
+	/* Let's check if we're looping without making any progress, e.g. due
+	 * to a bogus analyser or the fact that we're ignoring a read0. The
+	 * call_rate counter only counts calls with no progress made.
+	 */
+	if (!((req->flags | res->flags) & (CF_READ_PARTIAL|CF_WRITE_PARTIAL))) {
+		rate = update_freq_ctr(&s->call_rate, 1);
+		if (rate >= 100000 && s->call_rate.prev_ctr) // make sure to wait at least a full second
+			stream_dump_and_crash(&s->obj_type, read_freq_ctr(&s->call_rate));
 	}
 
 	/* this data may be no longer valid, clear it */
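
The gating introduced by this change can be illustrated outside of HAProxy with a small standalone sketch. It is not the project's code: struct sk_rate, sk_rate_update(), sk_stream_is_spinning() and the SK_* flag values are hypothetical names invented for the illustration, and the counter is a naive per-second one rather than HAProxy's freq_ctr. It only shows the idea that calls which moved data are never counted, and that the threshold is only honoured once a previous period has been recorded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch only; the real
 * CF_READ_PARTIAL / CF_WRITE_PARTIAL are defined by HAProxy.
 */
#define SK_READ_PARTIAL  0x1u
#define SK_WRITE_PARTIAL 0x2u

/* Naive per-second event counter (assumption: NOT HAProxy's freq_ctr). */
struct sk_rate {
	unsigned int curr_sec; /* second currently being counted */
	unsigned int curr_ctr; /* events seen during curr_sec */
	unsigned int prev_ctr; /* events seen during the previous second */
};

/* Account one event at time <now_sec> and return the number of events
 * seen so far during that second.
 */
static unsigned int sk_rate_update(struct sk_rate *r, unsigned int now_sec)
{
	assert(r != NULL);
	if (now_sec != r->curr_sec) {
		/* rotate; the previous period only counts if it is adjacent */
		r->prev_ctr = (now_sec == r->curr_sec + 1) ? r->curr_ctr : 0;
		r->curr_ctr = 0;
		r->curr_sec = now_sec;
	}
	return ++r->curr_ctr;
}

/* Return 1 when the stream looks like it is spinning without progress.
 * Calls where either channel reports a partial read or write are
 * skipped entirely, so they never feed the counter.
 */
static int sk_stream_is_spinning(struct sk_rate *r,
                                 unsigned int req_flags,
                                 unsigned int res_flags,
                                 unsigned int now_sec,
                                 unsigned int threshold)
{
	unsigned int rate;

	if ((req_flags | res_flags) & (SK_READ_PARTIAL | SK_WRITE_PARTIAL))
		return 0; /* some data moved: this call made progress */

	rate = sk_rate_update(r, now_sec);
	/* require a non-empty previous period before trusting the rate */
	return rate >= threshold && r->prev_ctr != 0;
}
```

With this model, a fast request/response ping-pong such as the pipelined master CLI case leaves the counter at zero, while a stream woken up repeatedly with no partial read or write still reaches the threshold.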