haproxy/include/proto/session.h

301 lines
9.4 KiB
C
Raw Normal View History

2006-06-15 19:48:13 +00:00
/*
* include/proto/session.h
* This file defines everything related to sessions.
*
* Copyright (C) 2000-2010 Willy Tarreau - w@1wt.eu
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation, version 2.1
* exclusively.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this library; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
*/
2006-06-15 19:48:13 +00:00
#ifndef _PROTO_SESSION_H
#define _PROTO_SESSION_H
#include <common/config.h>
#include <common/memory.h>
#include <types/session.h>
MAJOR: session: only wake up as many sessions as available buffers permit We've already experimented with three wake up algorithms when releasing buffers : the first naive one used to wake up far too many sessions, causing many of them not to get any buffer. The second approach which was still in use prior to this patch consisted in waking up either 1 or 2 sessions depending on the number of FDs we had released. And this was still inaccurate. The third one tried to cover the accuracy issues of the second and took into consideration the number of FDs the sessions would be willing to use, but most of the time we ended up waking up too many of them for nothing, or deadlocking by lack of buffers. This patch completely removes the need to allocate two buffers at once. Instead it splits allocations into critical and non-critical ones and implements a reserve in the pool for this. The deadlock situation happens when all buffers are be allocated for requests pending in a maxconn-limited server queue, because then there's no more way to allocate buffers for responses, and these responses are critical to release the servers's connection in order to release the pending requests. In fact maxconn on a server creates a dependence between sessions and particularly between oldest session's responses and latest session's requests. Thus, it is mandatory to get a free buffer for a response in order to release a server connection which will permit to release a request buffer. Since we definitely have non-symmetrical buffers, we need to implement this logic in the buffer allocation mechanism. What this commit does is implement a reserve of buffers which can only be allocated for responses and that will never be allocated for requests. This is made possible by the requester indicating how much margin it wants to leave after the allocation succeeds. Thus it is a cooperative allocation mechanism : the requester (process_session() in general) prefers not to get a buffer in order to respect other's need for response buffers. The session management code always knows if a buffer will be used for requests or responses, so that is not difficult : - either there's an applet on the initiator side and we really need the request buffer (since currently the applet is called in the context of the session) - or we have a connection and we really need the response buffer (in order to support building and sending an error message back) This reserve ensures that we don't take all allocatable buffers for requests waiting in a queue. The downside is that all the extra buffers are really allocated to ensure they can be allocated. But with small values it is not an issue. With this change, we don't observe any more deadlocks even when running with maxconn 1 on a server under severely constrained memory conditions. The code becomes a bit tricky, it relies on the scheduler's run queue to estimate how many sessions are already expected to run so that it doesn't wake up everyone with too few resources. A better solution would probably consist in having two queues, one for urgent requests and one for normal requests. A failed allocation for a session dealing with an error, a connection event, or the need for a response (or request when there's an applet on the left) would go to the urgent request queue, while other requests would go to the other queue. Urgent requests would be served from 1 entry in the pool, while the regular ones would be served only according to the reserve. Despite not yet having this, it works remarkably well. This mechanism is quite efficient, we don't perform too many wake up calls anymore. For 1 million sessions elapsed during massive memory contention, we observe about 4.5M calls to process_session() compared to 4.0M without memory constraints. Previously we used to observe up to 16M calls, which rougly means 12M failures. During a test run under high memory constraints (limit enforced to 27 MB instead of the 58 MB normally needed), performance used to drop by 53% prior to this patch. Now with this patch instead it *increases* by about 1.5%. The best effect of this change is that by limiting the memory usage to about 2/3 to 3/4 of what is needed by default, it's possible to increase performance by up to about 18% mainly due to the fact that pools are reused more often and remain hot in the CPU cache (observed on regular HTTP traffic with 20k objects, buffers.limit = maxconn/10, buffers.reserve = limit/2). Below is an example of scenario which used to cause a deadlock previously : - connection is received - two buffers are allocated in process_session() then released - one is allocated when receiving an HTTP request - the second buffer is allocated then released in process_session() for request parsing then connection establishment. - poll() says we can send, so the request buffer is sent and released - process session gets notified that the connection is now established and allocates two buffers then releases them - all other sessions do the same till one cannot get the request buffer without hitting the margin - and now the server responds. stream_interface allocates the response buffer and manages to get it since it's higher priority being for a response. - but process_session() cannot allocate the request buffer anymore => We could end up with all buffers used by responses so that none may be allocated for a request in process_session(). When the applet processing leaves the session context, the test will have to be changed so that we always allocate a response buffer regardless of the left side (eg: H2->H1 gateway). A final improvement would consists in being able to only retry the failed I/O operation without waking up a task, but to date all experiments to achieve this have proven not to be reliable enough.
2014-11-27 00:11:56 +00:00
#include <proto/fd.h>
#include <proto/freq_ctr.h>
#include <proto/stick_table.h>
MAJOR: session: only wake up as many sessions as available buffers permit We've already experimented with three wake up algorithms when releasing buffers : the first naive one used to wake up far too many sessions, causing many of them not to get any buffer. The second approach which was still in use prior to this patch consisted in waking up either 1 or 2 sessions depending on the number of FDs we had released. And this was still inaccurate. The third one tried to cover the accuracy issues of the second and took into consideration the number of FDs the sessions would be willing to use, but most of the time we ended up waking up too many of them for nothing, or deadlocking by lack of buffers. This patch completely removes the need to allocate two buffers at once. Instead it splits allocations into critical and non-critical ones and implements a reserve in the pool for this. The deadlock situation happens when all buffers are be allocated for requests pending in a maxconn-limited server queue, because then there's no more way to allocate buffers for responses, and these responses are critical to release the servers's connection in order to release the pending requests. In fact maxconn on a server creates a dependence between sessions and particularly between oldest session's responses and latest session's requests. Thus, it is mandatory to get a free buffer for a response in order to release a server connection which will permit to release a request buffer. Since we definitely have non-symmetrical buffers, we need to implement this logic in the buffer allocation mechanism. What this commit does is implement a reserve of buffers which can only be allocated for responses and that will never be allocated for requests. This is made possible by the requester indicating how much margin it wants to leave after the allocation succeeds. Thus it is a cooperative allocation mechanism : the requester (process_session() in general) prefers not to get a buffer in order to respect other's need for response buffers. The session management code always knows if a buffer will be used for requests or responses, so that is not difficult : - either there's an applet on the initiator side and we really need the request buffer (since currently the applet is called in the context of the session) - or we have a connection and we really need the response buffer (in order to support building and sending an error message back) This reserve ensures that we don't take all allocatable buffers for requests waiting in a queue. The downside is that all the extra buffers are really allocated to ensure they can be allocated. But with small values it is not an issue. With this change, we don't observe any more deadlocks even when running with maxconn 1 on a server under severely constrained memory conditions. The code becomes a bit tricky, it relies on the scheduler's run queue to estimate how many sessions are already expected to run so that it doesn't wake up everyone with too few resources. A better solution would probably consist in having two queues, one for urgent requests and one for normal requests. A failed allocation for a session dealing with an error, a connection event, or the need for a response (or request when there's an applet on the left) would go to the urgent request queue, while other requests would go to the other queue. Urgent requests would be served from 1 entry in the pool, while the regular ones would be served only according to the reserve. Despite not yet having this, it works remarkably well. This mechanism is quite efficient, we don't perform too many wake up calls anymore. For 1 million sessions elapsed during massive memory contention, we observe about 4.5M calls to process_session() compared to 4.0M without memory constraints. Previously we used to observe up to 16M calls, which rougly means 12M failures. During a test run under high memory constraints (limit enforced to 27 MB instead of the 58 MB normally needed), performance used to drop by 53% prior to this patch. Now with this patch instead it *increases* by about 1.5%. The best effect of this change is that by limiting the memory usage to about 2/3 to 3/4 of what is needed by default, it's possible to increase performance by up to about 18% mainly due to the fact that pools are reused more often and remain hot in the CPU cache (observed on regular HTTP traffic with 20k objects, buffers.limit = maxconn/10, buffers.reserve = limit/2). Below is an example of scenario which used to cause a deadlock previously : - connection is received - two buffers are allocated in process_session() then released - one is allocated when receiving an HTTP request - the second buffer is allocated then released in process_session() for request parsing then connection establishment. - poll() says we can send, so the request buffer is sent and released - process session gets notified that the connection is now established and allocates two buffers then releases them - all other sessions do the same till one cannot get the request buffer without hitting the margin - and now the server responds. stream_interface allocates the response buffer and manages to get it since it's higher priority being for a response. - but process_session() cannot allocate the request buffer anymore => We could end up with all buffers used by responses so that none may be allocated for a request in process_session(). When the applet processing leaves the session context, the test will have to be changed so that we always allocate a response buffer regardless of the left side (eg: H2->H1 gateway). A final improvement would consists in being able to only retry the failed I/O operation without waking up a task, but to date all experiments to achieve this have proven not to be reliable enough.
2014-11-27 00:11:56 +00:00
#include <proto/task.h>
2006-06-15 19:48:13 +00:00
extern struct pool_head *pool2_session;
extern struct list sessions;
extern struct list buffer_wq;
extern struct data_cb sess_conn_cb;
int session_accept(struct listener *l, int cfd, struct sockaddr_storage *addr);
2006-06-15 19:48:13 +00:00
/* perform minimal intializations, report 0 in case of error, 1 if OK. */
int init_session();
/* kill a session and set the termination flags to <why> (one of SN_ERR_*) */
void session_shutdown(struct session *session, int why);
void session_process_counters(struct session *s);
void sess_change_server(struct session *sess, struct server *newsrv);
struct task *process_session(struct task *t);
void default_srv_error(struct session *s, struct stream_interface *si);
struct stkctr *smp_fetch_sc_stkctr(struct session *l4, const struct arg *args, const char *kw);
int parse_track_counters(char **args, int *arg,
int section_type, struct proxy *curpx,
struct track_ctr_prm *prm,
struct proxy *defpx, char **err);
/* Update the session's backend and server time stats */
void session_update_time_stats(struct session *s);
MAJOR: session: only wake up as many sessions as available buffers permit We've already experimented with three wake up algorithms when releasing buffers : the first naive one used to wake up far too many sessions, causing many of them not to get any buffer. The second approach which was still in use prior to this patch consisted in waking up either 1 or 2 sessions depending on the number of FDs we had released. And this was still inaccurate. The third one tried to cover the accuracy issues of the second and took into consideration the number of FDs the sessions would be willing to use, but most of the time we ended up waking up too many of them for nothing, or deadlocking by lack of buffers. This patch completely removes the need to allocate two buffers at once. Instead it splits allocations into critical and non-critical ones and implements a reserve in the pool for this. The deadlock situation happens when all buffers are be allocated for requests pending in a maxconn-limited server queue, because then there's no more way to allocate buffers for responses, and these responses are critical to release the servers's connection in order to release the pending requests. In fact maxconn on a server creates a dependence between sessions and particularly between oldest session's responses and latest session's requests. Thus, it is mandatory to get a free buffer for a response in order to release a server connection which will permit to release a request buffer. Since we definitely have non-symmetrical buffers, we need to implement this logic in the buffer allocation mechanism. What this commit does is implement a reserve of buffers which can only be allocated for responses and that will never be allocated for requests. This is made possible by the requester indicating how much margin it wants to leave after the allocation succeeds. Thus it is a cooperative allocation mechanism : the requester (process_session() in general) prefers not to get a buffer in order to respect other's need for response buffers. The session management code always knows if a buffer will be used for requests or responses, so that is not difficult : - either there's an applet on the initiator side and we really need the request buffer (since currently the applet is called in the context of the session) - or we have a connection and we really need the response buffer (in order to support building and sending an error message back) This reserve ensures that we don't take all allocatable buffers for requests waiting in a queue. The downside is that all the extra buffers are really allocated to ensure they can be allocated. But with small values it is not an issue. With this change, we don't observe any more deadlocks even when running with maxconn 1 on a server under severely constrained memory conditions. The code becomes a bit tricky, it relies on the scheduler's run queue to estimate how many sessions are already expected to run so that it doesn't wake up everyone with too few resources. A better solution would probably consist in having two queues, one for urgent requests and one for normal requests. A failed allocation for a session dealing with an error, a connection event, or the need for a response (or request when there's an applet on the left) would go to the urgent request queue, while other requests would go to the other queue. Urgent requests would be served from 1 entry in the pool, while the regular ones would be served only according to the reserve. Despite not yet having this, it works remarkably well. This mechanism is quite efficient, we don't perform too many wake up calls anymore. For 1 million sessions elapsed during massive memory contention, we observe about 4.5M calls to process_session() compared to 4.0M without memory constraints. Previously we used to observe up to 16M calls, which rougly means 12M failures. During a test run under high memory constraints (limit enforced to 27 MB instead of the 58 MB normally needed), performance used to drop by 53% prior to this patch. Now with this patch instead it *increases* by about 1.5%. The best effect of this change is that by limiting the memory usage to about 2/3 to 3/4 of what is needed by default, it's possible to increase performance by up to about 18% mainly due to the fact that pools are reused more often and remain hot in the CPU cache (observed on regular HTTP traffic with 20k objects, buffers.limit = maxconn/10, buffers.reserve = limit/2). Below is an example of scenario which used to cause a deadlock previously : - connection is received - two buffers are allocated in process_session() then released - one is allocated when receiving an HTTP request - the second buffer is allocated then released in process_session() for request parsing then connection establishment. - poll() says we can send, so the request buffer is sent and released - process session gets notified that the connection is now established and allocates two buffers then releases them - all other sessions do the same till one cannot get the request buffer without hitting the margin - and now the server responds. stream_interface allocates the response buffer and manages to get it since it's higher priority being for a response. - but process_session() cannot allocate the request buffer anymore => We could end up with all buffers used by responses so that none may be allocated for a request in process_session(). When the applet processing leaves the session context, the test will have to be changed so that we always allocate a response buffer regardless of the left side (eg: H2->H1 gateway). A final improvement would consists in being able to only retry the failed I/O operation without waking up a task, but to date all experiments to achieve this have proven not to be reliable enough.
2014-11-27 00:11:56 +00:00
void __session_offer_buffers(int rqlimit);
static inline void session_offer_buffers();
int session_alloc_work_buffer(struct session *s);
void session_release_buffers(struct session *s);
int session_alloc_recv_buffer(struct channel *chn);
/* sets the stick counter's entry pointer */
static inline void stkctr_set_entry(struct stkctr *stkctr, struct stksess *entry)
{
stkctr->entry = caddr_from_ptr(entry, 0);
}
/* returns the entry pointer from a stick counter */
static inline struct stksess *stkctr_entry(struct stkctr *stkctr)
{
return caddr_to_ptr(stkctr->entry);
}
/* returns the two flags from a stick counter */
static inline unsigned int stkctr_flags(struct stkctr *stkctr)
{
return caddr_to_data(stkctr->entry);
}
/* sets up to two flags at a time on a composite address */
static inline void stkctr_set_flags(struct stkctr *stkctr, unsigned int flags)
{
stkctr->entry = caddr_set_flags(stkctr->entry, flags);
}
/* returns the two flags from a stick counter */
static inline void stkctr_clr_flags(struct stkctr *stkctr, unsigned int flags)
{
stkctr->entry = caddr_clr_flags(stkctr->entry, flags);
}
/* Remove the refcount from the session to the tracked counters, and clear the
* pointer to ensure this is only performed once. The caller is responsible for
* ensuring that the pointer is valid first.
*/
static inline void session_store_counters(struct session *s)
{
void *ptr;
int i;
for (i = 0; i < MAX_SESS_STKCTR; i++) {
if (!stkctr_entry(&s->stkctr[i]))
continue;
ptr = stktable_data_ptr(s->stkctr[i].table, stkctr_entry(&s->stkctr[i]), STKTABLE_DT_CONN_CUR);
if (ptr)
stktable_data_cast(ptr, conn_cur)--;
stkctr_entry(&s->stkctr[i])->ref_cnt--;
stksess_kill_if_expired(s->stkctr[i].table, stkctr_entry(&s->stkctr[i]));
stkctr_set_entry(&s->stkctr[i], NULL);
}
}
BUG/MEDIUM: counters: flush content counters after each request One year ago, commit 5d5b5d8 ("MEDIUM: proto_tcp: add support for tracking L7 information") brought support for tracking L7 information in tcp-request content rules. Two years earlier, commit 0a4838c ("[MEDIUM] session-counters: correctly unbind the counters tracked by the backend") used to flush the backend counters after processing a request. While that earliest patch was correct at the time, it became wrong after the second patch was merged. The code does what it says, but the concept is flawed. "TCP request content" rules are evaluated for each HTTP request over a single connection. So if such a rule in the frontend decides to track any L7 information or to track L4 information when an L7 condition matches, then it is applied to all requests over the same connection even if they don't match. This means that a rule such as : tcp-request content track-sc0 src if { path /index.html } will count one request for index.html, and another one for each of the objects present on this page that are fetched over the same connection which sent the initial matching request. Worse, it is possible to make the code do stupid things by using multiple counters: tcp-request content track-sc0 src if { path /foo } tcp-request content track-sc1 src if { path /bar } Just sending two requests first, one with /foo, one with /bar, shows twice the number of requests for all subsequent requests. Just because both of them persist after the end of the request. So the decision to flush backend-tracked counters was not the correct one. In practice, what is important is to flush countent-based rules since they are the ones evaluated for each request. Doing so requires new flags in the session however, to keep track of which stick-counter was tracked by what ruleset. A later change might make this easier to maintain over time. This bug is 1.5-specific, no backport to stable is needed.
2014-01-28 20:40:28 +00:00
/* Remove the refcount from the session counters tracked at the content level if
* any, and clear the pointer to ensure this is only performed once. The caller
* is responsible for ensuring that the pointer is valid first.
*/
BUG/MEDIUM: counters: flush content counters after each request One year ago, commit 5d5b5d8 ("MEDIUM: proto_tcp: add support for tracking L7 information") brought support for tracking L7 information in tcp-request content rules. Two years earlier, commit 0a4838c ("[MEDIUM] session-counters: correctly unbind the counters tracked by the backend") used to flush the backend counters after processing a request. While that earliest patch was correct at the time, it became wrong after the second patch was merged. The code does what it says, but the concept is flawed. "TCP request content" rules are evaluated for each HTTP request over a single connection. So if such a rule in the frontend decides to track any L7 information or to track L4 information when an L7 condition matches, then it is applied to all requests over the same connection even if they don't match. This means that a rule such as : tcp-request content track-sc0 src if { path /index.html } will count one request for index.html, and another one for each of the objects present on this page that are fetched over the same connection which sent the initial matching request. Worse, it is possible to make the code do stupid things by using multiple counters: tcp-request content track-sc0 src if { path /foo } tcp-request content track-sc1 src if { path /bar } Just sending two requests first, one with /foo, one with /bar, shows twice the number of requests for all subsequent requests. Just because both of them persist after the end of the request. So the decision to flush backend-tracked counters was not the correct one. In practice, what is important is to flush countent-based rules since they are the ones evaluated for each request. Doing so requires new flags in the session however, to keep track of which stick-counter was tracked by what ruleset. A later change might make this easier to maintain over time. This bug is 1.5-specific, no backport to stable is needed.
2014-01-28 20:40:28 +00:00
static inline void session_stop_content_counters(struct session *s)
{
void *ptr;
int i;
for (i = 0; i < MAX_SESS_STKCTR; i++) {
if (!stkctr_entry(&s->stkctr[i]))
continue;
if (!(stkctr_flags(&s->stkctr[i]) & STKCTR_TRACK_CONTENT))
continue;
ptr = stktable_data_ptr(s->stkctr[i].table, stkctr_entry(&s->stkctr[i]), STKTABLE_DT_CONN_CUR);
if (ptr)
stktable_data_cast(ptr, conn_cur)--;
stkctr_entry(&s->stkctr[i])->ref_cnt--;
stksess_kill_if_expired(s->stkctr[i].table, stkctr_entry(&s->stkctr[i]));
stkctr_set_entry(&s->stkctr[i], NULL);
}
}
/* Increase total and concurrent connection count for stick entry <ts> of table
* <t>. The caller is responsible for ensuring that <t> and <ts> are valid
* pointers, and for calling this only once per connection.
*/
static inline void session_start_counters(struct stktable *t, struct stksess *ts)
{
void *ptr;
ptr = stktable_data_ptr(t, ts, STKTABLE_DT_CONN_CUR);
if (ptr)
stktable_data_cast(ptr, conn_cur)++;
ptr = stktable_data_ptr(t, ts, STKTABLE_DT_CONN_CNT);
if (ptr)
stktable_data_cast(ptr, conn_cnt)++;
ptr = stktable_data_ptr(t, ts, STKTABLE_DT_CONN_RATE);
if (ptr)
update_freq_ctr_period(&stktable_data_cast(ptr, conn_rate),
t->data_arg[STKTABLE_DT_CONN_RATE].u, 1);
if (tick_isset(t->expire))
ts->expire = tick_add(now_ms, MS_TO_TICKS(t->expire));
}
/* Enable tracking of session counters as <stkctr> on stksess <ts>. The caller is
* responsible for ensuring that <t> and <ts> are valid pointers. Some controls
* are performed to ensure the state can still change.
*/
static inline void session_track_stkctr(struct stkctr *ctr, struct stktable *t, struct stksess *ts)
{
if (stkctr_entry(ctr))
return;
ts->ref_cnt++;
ctr->table = t;
stkctr_set_entry(ctr, ts);
session_start_counters(t, ts);
}
/* Increase the number of cumulated HTTP requests in the tracked counters */
static void inline session_inc_http_req_ctr(struct session *s)
{
void *ptr;
int i;
for (i = 0; i < MAX_SESS_STKCTR; i++) {
if (!stkctr_entry(&s->stkctr[i]))
continue;
ptr = stktable_data_ptr(s->stkctr[i].table, stkctr_entry(&s->stkctr[i]), STKTABLE_DT_HTTP_REQ_CNT);
if (ptr)
stktable_data_cast(ptr, http_req_cnt)++;
ptr = stktable_data_ptr(s->stkctr[i].table, stkctr_entry(&s->stkctr[i]), STKTABLE_DT_HTTP_REQ_RATE);
if (ptr)
update_freq_ctr_period(&stktable_data_cast(ptr, http_req_rate),
s->stkctr[i].table->data_arg[STKTABLE_DT_HTTP_REQ_RATE].u, 1);
}
}
/* Increase the number of cumulated HTTP requests in the backend's tracked counters */
static void inline session_inc_be_http_req_ctr(struct session *s)
{
void *ptr;
int i;
for (i = 0; i < MAX_SESS_STKCTR; i++) {
if (!stkctr_entry(&s->stkctr[i]))
continue;
if (!(stkctr_flags(&s->stkctr[i]) & STKCTR_TRACK_BACKEND))
continue;
ptr = stktable_data_ptr(s->stkctr[i].table, stkctr_entry(&s->stkctr[i]), STKTABLE_DT_HTTP_REQ_CNT);
if (ptr)
stktable_data_cast(ptr, http_req_cnt)++;
ptr = stktable_data_ptr(s->stkctr[i].table, stkctr_entry(&s->stkctr[i]), STKTABLE_DT_HTTP_REQ_RATE);
if (ptr)
update_freq_ctr_period(&stktable_data_cast(ptr, http_req_rate),
s->stkctr[i].table->data_arg[STKTABLE_DT_HTTP_REQ_RATE].u, 1);
}
}
/* Increase the number of cumulated failed HTTP requests in the tracked
* counters. Only 4xx requests should be counted here so that we can
* distinguish between errors caused by client behaviour and other ones.
* Note that even 404 are interesting because they're generally caused by
* vulnerability scans.
*/
static void inline session_inc_http_err_ctr(struct session *s)
{
void *ptr;
int i;
for (i = 0; i < MAX_SESS_STKCTR; i++) {
if (!stkctr_entry(&s->stkctr[i]))
continue;
ptr = stktable_data_ptr(s->stkctr[i].table, stkctr_entry(&s->stkctr[i]), STKTABLE_DT_HTTP_ERR_CNT);
if (ptr)
stktable_data_cast(ptr, http_err_cnt)++;
ptr = stktable_data_ptr(s->stkctr[i].table, stkctr_entry(&s->stkctr[i]), STKTABLE_DT_HTTP_ERR_RATE);
if (ptr)
update_freq_ctr_period(&stktable_data_cast(ptr, http_err_rate),
s->stkctr[i].table->data_arg[STKTABLE_DT_HTTP_ERR_RATE].u, 1);
}
}
static void inline session_add_srv_conn(struct session *sess, struct server *srv)
{
sess->srv_conn = srv;
LIST_ADD(&srv->actconns, &sess->by_srv);
}
static void inline session_del_srv_conn(struct session *sess)
{
if (!sess->srv_conn)
return;
sess->srv_conn = NULL;
LIST_DEL(&sess->by_srv);
}
static void inline session_init_srv_conn(struct session *sess)
{
sess->srv_conn = NULL;
LIST_INIT(&sess->by_srv);
}
MAJOR: session: only wake up as many sessions as available buffers permit We've already experimented with three wake up algorithms when releasing buffers : the first naive one used to wake up far too many sessions, causing many of them not to get any buffer. The second approach which was still in use prior to this patch consisted in waking up either 1 or 2 sessions depending on the number of FDs we had released. And this was still inaccurate. The third one tried to cover the accuracy issues of the second and took into consideration the number of FDs the sessions would be willing to use, but most of the time we ended up waking up too many of them for nothing, or deadlocking by lack of buffers. This patch completely removes the need to allocate two buffers at once. Instead it splits allocations into critical and non-critical ones and implements a reserve in the pool for this. The deadlock situation happens when all buffers are be allocated for requests pending in a maxconn-limited server queue, because then there's no more way to allocate buffers for responses, and these responses are critical to release the servers's connection in order to release the pending requests. In fact maxconn on a server creates a dependence between sessions and particularly between oldest session's responses and latest session's requests. Thus, it is mandatory to get a free buffer for a response in order to release a server connection which will permit to release a request buffer. Since we definitely have non-symmetrical buffers, we need to implement this logic in the buffer allocation mechanism. What this commit does is implement a reserve of buffers which can only be allocated for responses and that will never be allocated for requests. This is made possible by the requester indicating how much margin it wants to leave after the allocation succeeds. Thus it is a cooperative allocation mechanism : the requester (process_session() in general) prefers not to get a buffer in order to respect other's need for response buffers. The session management code always knows if a buffer will be used for requests or responses, so that is not difficult : - either there's an applet on the initiator side and we really need the request buffer (since currently the applet is called in the context of the session) - or we have a connection and we really need the response buffer (in order to support building and sending an error message back) This reserve ensures that we don't take all allocatable buffers for requests waiting in a queue. The downside is that all the extra buffers are really allocated to ensure they can be allocated. But with small values it is not an issue. With this change, we don't observe any more deadlocks even when running with maxconn 1 on a server under severely constrained memory conditions. The code becomes a bit tricky, it relies on the scheduler's run queue to estimate how many sessions are already expected to run so that it doesn't wake up everyone with too few resources. A better solution would probably consist in having two queues, one for urgent requests and one for normal requests. A failed allocation for a session dealing with an error, a connection event, or the need for a response (or request when there's an applet on the left) would go to the urgent request queue, while other requests would go to the other queue. Urgent requests would be served from 1 entry in the pool, while the regular ones would be served only according to the reserve. Despite not yet having this, it works remarkably well. This mechanism is quite efficient, we don't perform too many wake up calls anymore. For 1 million sessions elapsed during massive memory contention, we observe about 4.5M calls to process_session() compared to 4.0M without memory constraints. Previously we used to observe up to 16M calls, which rougly means 12M failures. During a test run under high memory constraints (limit enforced to 27 MB instead of the 58 MB normally needed), performance used to drop by 53% prior to this patch. Now with this patch instead it *increases* by about 1.5%. The best effect of this change is that by limiting the memory usage to about 2/3 to 3/4 of what is needed by default, it's possible to increase performance by up to about 18% mainly due to the fact that pools are reused more often and remain hot in the CPU cache (observed on regular HTTP traffic with 20k objects, buffers.limit = maxconn/10, buffers.reserve = limit/2). Below is an example of scenario which used to cause a deadlock previously : - connection is received - two buffers are allocated in process_session() then released - one is allocated when receiving an HTTP request - the second buffer is allocated then released in process_session() for request parsing then connection establishment. - poll() says we can send, so the request buffer is sent and released - process session gets notified that the connection is now established and allocates two buffers then releases them - all other sessions do the same till one cannot get the request buffer without hitting the margin - and now the server responds. stream_interface allocates the response buffer and manages to get it since it's higher priority being for a response. - but process_session() cannot allocate the request buffer anymore => We could end up with all buffers used by responses so that none may be allocated for a request in process_session(). When the applet processing leaves the session context, the test will have to be changed so that we always allocate a response buffer regardless of the left side (eg: H2->H1 gateway). A final improvement would consists in being able to only retry the failed I/O operation without waking up a task, but to date all experiments to achieve this have proven not to be reliable enough.
2014-11-27 00:11:56 +00:00
static inline void session_offer_buffers()
{
int avail;
if (LIST_ISEMPTY(&buffer_wq))
return;
/* all sessions will need 1 buffer, so we can stop waking up sessions
* once we have enough of them to eat all the buffers. Note that we
* don't really know if they are sessions or just other tasks, but
* that's a rough estimate. Similarly, for each cached event we'll need
* 1 buffer. If no buffer is currently used, always wake up the number
* of tasks we can offer a buffer based on what is allocated, and in
* any case at least one task per two reserved buffers.
*/
avail = pool2_buffer->allocated - pool2_buffer->used - global.tune.reserved_bufs / 2;
if (avail > (int)run_queue)
__session_offer_buffers(avail);
}
#endif /* _PROTO_SESSION_H */
/*
* Local variables:
* c-indent-level: 8
* c-basic-offset: 8
* End:
*/