Commit Graph

10527 Commits

Author SHA1 Message Date
Willy Tarreau
4ac9d064d2 MEDIUM: fd: mark the FD as ready when it's inserted
Given that all our I/Os are now directed from top to bottom and not the
opposite way around, and the FD cache was removed, it doesn't make sense
anymore to create FDs that are marked not ready since this would prevent
the first accesses unless the caller explicitly does an fd_may_recv()
which is not expected to be its job (which conn_ctrl_init() has to do
by the way). Let's move this into fd_insert() instead, and have a single
atomic operation for both directions via fd_may_both().
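
As a minimal sketch of the idea (not the actual fd_insert() code: the
struct, the flag values and every sketch_ name are assumptions made for
illustration), marking both directions ready at insertion time can be a
single atomic OR:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flag values, chosen for this sketch only. */
#define SKETCH_READY_R 0x1u
#define SKETCH_READY_W 0x2u

struct sketch_fd {
    atomic_uint state;
};

/* Set both "ready" bits with one atomic OR. */
void sketch_may_both(struct sketch_fd *fd)
{
    atomic_fetch_or(&fd->state, SKETCH_READY_R | SKETCH_READY_W);
}

/* Register an FD: start from a clean state, then mark it ready. */
void sketch_insert(struct sketch_fd *fd)
{
    assert(fd != NULL);
    atomic_init(&fd->state, 0);
    sketch_may_both(fd);
}
```

With this shape the caller no longer needs a separate "may receive" call
before the first access.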
2019-09-06 17:50:36 +02:00
Emeric Brun
5762a0db0a BUG/MAJOR: ssl: ssl_sock was not fully initialized.
'ssl_sock' wasn't fully initialized so a new session can inherit some
flags from an old one.

This causes some fetches, related to client's certificate presence or
its verify status and errors, returning erroneous values.

This issue could generate other unexpected behaviors because a new
session could also inherit other flags such as SSL_SOCK_ST_FL_16K_WBFSIZE,
SSL_SOCK_SEND_UNLIMITED, or SSL_SOCK_RECV_HEARTBEAT from an old session.
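
The commit text does not show how the fix was done. One common way to
avoid this class of bug, sketched here with a hypothetical context struct
and hypothetical field names, is to zero the whole object before setting
any field:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-connection context; field names are illustrative. */
struct sketch_ssl_ctx {
    unsigned int flags;
    int          early_data_sent;
    void        *owner;
};

/* Reset every field so nothing leaks from a previous user of the memory. */
void sketch_ssl_ctx_init(struct sketch_ssl_ctx *ctx, void *owner)
{
    assert(ctx != NULL);
    memset(ctx, 0, sizeof(*ctx));
    ctx->owner = owner;
}
```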

This must be backported to 2.0 but it's useless for previous versions.
2019-09-06 17:33:33 +02:00
Willy Tarreau
ed5ac9c786 BUG/MINOR: lb/leastconn: ignore the server weights for empty servers
As discussed in issue #178, the change brought around 1.9-dev11 by commit
1eb6c55808 ("MINOR: lb: make the leastconn algorithm more accurate")
causes some harm in the situation it tried to improve. By always applying
the server's weight even for no connection, we end up always picking the
same servers for the first connections, so under a low load, if servers
only have either 0 or 1 connections, in practice the same servers will
always be picked.

This patch partially restores the original behaviour while still keeping
the spirit of the aforementioned patch. Now what is done is that servers
with no connections will always be picked first, regardless of their
weight, so they will effectively follow round-robin. Only servers with
one connection or more will see an accurate weight applied.
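
The selection rule can be sketched as follows. This is an illustration
only: the struct, the scaling factor of 256 and the linear scan are
assumptions, and ties are simply resolved by picking the first entry,
whereas the round-robin effect described above depends on how the real
code orders equal entries.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_srv {
    unsigned int conns;   /* current connections */
    unsigned int weight;  /* must be > 0 */
};

/* Sort key: 0 for an empty server, weighted load (always >= 1) otherwise. */
unsigned long sketch_key(const struct sketch_srv *s)
{
    assert(s->weight > 0);
    if (s->conns == 0)
        return 0;
    return 1 + (unsigned long)s->conns * 256UL / s->weight;
}

/* Return the index of the server with the smallest key; first one wins ties. */
size_t sketch_pick(const struct sketch_srv *srv, size_t n)
{
    size_t best = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (sketch_key(&srv[i]) < sketch_key(&srv[best]))
            best = i;
    return best;
}
```

An empty server with a weight of 1 is therefore chosen before a loaded
server with a weight of 100.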

This patch was developed and tested by @malsumis and @jaroslawr who
reported the initial issue. It should be backported to 2.0 and 1.9.
2019-09-06 17:13:44 +02:00
Christopher Faulet
d45d105428 MINOR: contrib/prometheus-exporter: Report DRAIN/MAINT/NOLB status for servers
Now, the following statuses are reported for servers: 0=DOWN, 1=UP, 2=MAINT,
3=DRAIN, 4=NOLB.
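
For a consumer of the metric, the mapping above could be mirrored like
this. The numeric values come from the commit message; the enum and the
function names are hypothetical and are not taken from the exporter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Numeric values as listed in the commit message. */
enum sketch_srv_status {
    SKETCH_SRV_DOWN  = 0,
    SKETCH_SRV_UP    = 1,
    SKETCH_SRV_MAINT = 2,
    SKETCH_SRV_DRAIN = 3,
    SKETCH_SRV_NOLB  = 4
};

static const char *const sketch_names[] = {
    "DOWN", "UP", "MAINT", "DRAIN", "NOLB"
};

#define SKETCH_NB_STATUS ((int)(sizeof(sketch_names) / sizeof(sketch_names[0])))

/* Map a reported value to its name, or "UNKNOWN" when out of range. */
const char *sketch_srv_status_name(int value)
{
    if (value < 0 || value >= SKETCH_NB_STATUS)
        return "UNKNOWN";
    return sketch_names[value];
}

/* Map a name back to its value, or -1 when the name is not known. */
int sketch_srv_status_value(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < SKETCH_NB_STATUS; i++)
        if (strcmp(name, sketch_names[i]) == 0)
            return i;
    return -1;
}
```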

It is linked to the github issue #255. Thanks to Mickaël Martin. If needed, this
patch may be backported to 2.0.
2019-09-06 16:15:07 +02:00
Ilya Shipitsin
c886e5911b BUILD: CI: add basic CentOS 6 cirrus build
2019-09-06 11:44:08 +02:00
Christopher Faulet
cac5c094d1 BUG/MINOR: mux-h1: Fix a UAF in cfg_h1_headers_case_adjust_postparser()
When an error occurs in the post-parser callback which checks configuration
validity of the option outgoing-headers-case-adjust-file, the error message is
freed too early, before being used.

No backport needed. It fixes the github issue #258.
2019-09-06 08:59:23 +02:00
Willy Tarreau
c594039225 BUG/MINOR: checks: do not uselessly poll for reads before the connection is up
It's pointless to start to perform a recv() call on a connection that is
not yet established. The only purpose used to be to subscribe but that
causes many extra syscalls when we know we can do it later.

This patch only attempts a read if the connection is established or if
there is no write planned, since we want to be certain to be called. And
in wake_srv_chk() we continue to attempt to read if the reader was not
subscribed, so as to perform the first read attempt. In case a first
result is provided, __event_srv_chk_r() will not do anything anyway so
this is totally harmless in this case.

This fix requires that commit "BUG/MINOR: checks: make __event_chk_srv_r()
report success before closing" is applied before, otherwise it will break
some checks (notably SSL) by doing them again after the connection is shut
down. This completes the fixes on the checks described in issue #253 by
roughly cutting the number of syscalls in half. It must be backported to
2.0.
2019-09-06 08:13:15 +02:00
Willy Tarreau
4c1a2b30a3 BUG/MINOR: checks: make __event_chk_srv_r() report success before closing
On a plain TCP check, this function will do nothing except shutting the
connection down and will not even update the status. This prevents it
from being called again, which is the reason why we attempt to do it
once too early. Let's first fix this function to make it report success
on plain TCP checks before closing, as it does for all other ones.

This must be backported to 2.0. It should be safe to backport to older
versions but it doesn't seem it would fix anything there.
2019-09-06 08:13:15 +02:00
Willy Tarreau
cc705a6b61 BUG/MINOR: checks: start sending the request right after connect()
Since the change of I/O direction, we must not wait for an empty connect
callback before sending the request, we must attempt to send it as soon
as possible so that we don't uselessly poll. This is what this patch
does. This reduces the total check duration by a complete poll loop
compared to what is described in issue #253.

This must be backported to 2.0.
2019-09-06 08:13:15 +02:00
Willy Tarreau
5909380c05 BUG/MINOR: checks: stop polling for write when we have nothing left to send
Since the change of I/O direction, we perform the connect() call and
the send() call together from the top. But the send call must at least
disable polling for writes once it does not have anything left to send.

This bug is partially responsible for the waste of resources described
in issue #253.

This must be backported to 2.0.
2019-09-06 08:13:15 +02:00
Willy Tarreau
616c1cf774 CONTRIB: debug: add new program "poll" to test poll() events
This simple program prepares a TCP connection between two ends and
allows various operations to be performed on them such as send, recv,
poll, shutdown, close, reset, etc. It takes care of remaining particularly
silent to help inspection via strace, though it can also be verbose
and report status, errno, and poll events. It delays acceptance of
the incoming server-side connection so that it's even possible to
test the poll status on a listener with a pending connection, or
to close the connection without accepting it and inspect the effect
on the client.

Actions are executed in the command line order as they are parsed,
they may be grouped using commas when they are performed on the same
socket.

Example showing a successful recv() of pending data before a pending error:
   $ ./poll -v -l pol,acc,pol -c snd,shw -s pol,rcv,pol,rcv,pol,snd,lin,clo -c pol,rcv,pol,rcv,pol

   #### BEGIN ####
   cmd #1 stp #1: do_pol(3): ret=1 ev=0x1 (IN)
   cmd #1 stp #2: do_acc(3): ret=5
   cmd #1 stp #3: do_pol(3): ret=0 ev=0
   cmd #2 stp #1: do_snd(4): ret=3
   cmd #2 stp #2: do_shw(4): ret=0
   cmd #3 stp #1: do_pol(5): ret=1 ev=0x2005 (IN OUT RDHUP)
   cmd #3 stp #2: do_rcv(5): ret=3
   cmd #3 stp #3: do_pol(5): ret=1 ev=0x2005 (IN OUT RDHUP)
   cmd #3 stp #4: do_rcv(5): ret=0
   cmd #3 stp #5: do_pol(5): ret=1 ev=0x2005 (IN OUT RDHUP)
   cmd #3 stp #6: do_snd(5): ret=3
   cmd #3 stp #7: do_lin(5): ret=0
   cmd #3 stp #8: do_clo(5): ret=0
   cmd #4 stp #1: do_pol(4): ret=1 ev=0x201d (IN OUT ERR HUP RDHUP)
   cmd #4 stp #2: do_rcv(4): ret=3
   cmd #4 stp #3: do_pol(4): ret=1 ev=0x201d (IN OUT ERR HUP RDHUP)
   cmd #4 stp #4: do_rcv(4): ret=-1 (Connection reset by peer)
   cmd #4 stp #5: do_pol(4): ret=1 ev=0x2015 (IN OUT HUP RDHUP)
   #### END ####
2019-09-05 09:31:18 +02:00
Willy Tarreau
dbe3060e81 MINOR: fd: make updt_fd_polling() a normal function
It's called from many places, better use a real function than an inline.
2019-09-05 09:31:18 +02:00
Willy Tarreau
f8ecc7f667 MEDIUM: fd: simplify the fd_*_{recv,send} functions using BTS/BTR
Now that we don't have to update FD_EV_POLLED_* at the same time as
FD_EV_ACTIVE_*, we don't need to use a CAS anymore, a bit-test-and-set
operation is enough. Doing so reduces the code size by a bit more than
1 kB. One function was special, fd_done_recv(), whose comments and doc
were inaccurate for the part related to the lack of polling.
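
The difference with a CAS loop can be sketched with C11 atomics. This is
an illustration, not the HAProxy macros: a fetch-or or fetch-and returns
the previous value, so the old bit is known without any retry loop. The
sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically set bit <bit>; return true if it was already set. */
bool sketch_bts(atomic_uint *v, unsigned int bit)
{
    assert(bit < 32);
    unsigned int mask = 1u << bit;

    return (atomic_fetch_or(v, mask) & mask) != 0;
}

/* Atomically clear bit <bit>; return true if it was previously set. */
bool sketch_btr(atomic_uint *v, unsigned int bit)
{
    assert(bit < 32);
    unsigned int mask = 1u << bit;

    return (atomic_fetch_and(v, ~mask) & mask) != 0;
}
```

The returned boolean tells the caller whether the state really changed,
which is enough to decide whether a polling update is needed.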
2019-09-05 09:31:18 +02:00
Willy Tarreau
5bee3e2f47 MEDIUM: fd: remove the FD_EV_POLLED status bit
Since commit 7ac0e35f2 in 1.9-dev1 ("MAJOR: fd: compute the new fd polling
state out of the fd lock") we've started to update the FD POLLED bit a
bit more aggressively. Lately with the removal of the FD cache, this bit
is always equal to the ACTIVE bit. There's no point continuing to watch
it and update it anymore, all it does is create confusion and complicate
the code. One interesting side effect is that it now becomes visible that
all fd_*_{send,recv}() operations systematically call updt_fd_polling(),
except fd_cant_recv()/fd_cant_send() which never saw it change.
2019-09-05 09:31:18 +02:00
Christopher Faulet
51bb185618 BUG/MINOR: mux-h1: Fix a possible null pointer dereference in h1_subscribe()
This patch fixes the github issue #243. No backport needed.
2019-09-04 10:30:11 +02:00
Christopher Faulet
b066747107 BUG/MEDIUM: cache: Don't cache objects if the size of headers is too big
HTTP responses with headers that impinge upon the reserve must not be
cached. Otherwise, there is no guarantee that there will be enough space to add
the "Age" header when such cached responses are delivered.

This patch must be backported to 2.0 and 1.9. For these versions, the same must
be done for the legacy HTTP mode.
2019-09-04 10:30:11 +02:00
Christopher Faulet
15a4ce870a BUG/MEDIUM: cache: Properly copy headers splitted on several shctx blocks
In the cache, huge HTTP headers will use several shctx blocks. When a response
is returned from the cache, these headers must be properly copied into the
corresponding HTX message by updating the pointer to where each header part
is copied.

This patch must be backported to 2.0 and 1.9.
2019-09-04 10:30:11 +02:00
Christopher Faulet
f1ef7f641d BUG/MINOR: mux-h1: Be sure to update the count before adding EOM after trailers
Otherwise, an EOM may be added in a full buffer.

This patch must be backported to 2.0.
2019-09-04 10:30:11 +02:00
Christopher Faulet
6b32192cfb BUG/MINOR: mux-h1: Don't stop anymore input processing when the max is reached
The loop is now stopped only when nothing else is consumed from the input buffer
or if a parsing error is encountered. This gives us a chance to detect cases
where we fail to add the EOM.

This happens, for instance, when the max is reached after the headers are parsed
and the whole message has been received. In this case, we may have the flag
H1S_F_REOS set without the flag H1S_F_APPEND_EOM and no pending input data,
leading to an error because we think it is an abort.

This patch must be backported to 2.0. This bug does not affect 1.9.
2019-09-04 10:30:11 +02:00
Christopher Faulet
8427d0d6f8 BUG/MINOR: mux-h1: Fix size evaluation of HTX messages after headers parsing
The block size of the start-line was not counted.

This patch must be backported to 2.0.
2019-09-04 10:30:11 +02:00
Christopher Faulet
84f06533e1 BUG/MINOR: h1: Properly reset h1m when parsing is restarted
Otherwise some processing may be performed twice. For instance, if the header
"Content-Length" is parsed on the first pass, when the parsing is restarted, we
skip it because we think another header with the same value was already seen. In
fact, it is currently the only existing bug that can be encountered. But it is
safer to reset all the h1m on restart to avoid any future bugs.

This patch must be backported to 2.0 and 1.9.
2019-09-04 10:30:11 +02:00
Christopher Faulet
3499f62b59 BUG/MINOR: http-ana: Reset response flags when 1xx messages are handled
Otherwise, the following final response could inherit some of these
flags. For instance, because informational responses have no body, the flag
HTTP_MSGF_BODYLESS is set for 1xx messages. If it is not reset, this flag will
be kept for the final response.

One visible effect of this bug concerns HTTP compression. When the final
response is preceded by a 1xx message, compression is not performed. This
was reported in github issue #229.

This patch must be backported to 2.0 and 1.9. Note that the file http_ana.c does
not exist for these branches, the patch must be applied on proto_htx.c instead.
2019-09-04 10:29:55 +02:00
Jerome Magnin
78891c7e71 BUILD: connection: silence gcc warning with extra parentheses
Commit 8a4ffa0a ("MINOR: send-proxy-v2: sends authority TLV according
to TLV received") is missing parentheses around a variable assignment
used as condition in an if statement, and gcc isn't happy about it.
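
The construct in question looks like the following generic example, which
is not the code from the commit. gcc's -Wparentheses warning suggests
parentheses around an assignment used as a truth value, and the extra
pair marks the assignment as intentional. The function name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the length of <s> if it is non-empty, otherwise <fallback>. */
size_t sketch_len_or(const char *s, size_t fallback)
{
    size_t len;

    assert(s != NULL);
    /* The extra parentheses tell gcc that the assignment is intentional. */
    if ((len = strlen(s)))
        return len;
    return fallback;
}
```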
2019-09-02 16:59:32 +02:00
Frédéric Lécaille
9c3a0ceeac BUG/MEDIUM: peers: local peer socket not bound.
This bug came with commit 015e4d7 ("MINOR: stick-tables: Add peers process
binding computing"), where the "stick" rules cases were missing when computing
the peer local listener process binding. At parsing time we store in the
stick-table struct ->proxies_list the proxies which refer to this stick-table.
The process binding is computed after having parsed the entire configuration file
with this simple loop in cfgparse.c:

     /* compute the required process bindings for the peers from <stktables_list>
      * for all the stick-tables, the ones coming with "peers" sections included.
      */
     for (t = stktables_list; t; t = t->next) {
             struct proxy *p;

             for (p = t->proxies_list; p; p = p->next_stkt_ref) {
                     if (t->peers.p && t->peers.p->peers_fe) {
                             t->peers.p->peers_fe->bind_proc |= p->bind_proc;
                     }
             }
     }

Note that if this process binding is not correctly initialized, the child forked
by the master-worker stops the peer local listener. This should also be the case
when daemonizing haproxy.

Must be backported to 2.0.
2019-09-02 14:39:38 +02:00
Lukas Tribus
cc1eb1619f MINOR: build: add linux-glibc-legacy build TARGET
As discussed in issue #128, introduce a new build TARGET
linux-glibc-legacy to allow the build on old, legacy OS.

Should be backported to 2.0.
2019-09-01 17:28:10 +02:00
Emmanuel Hocdet
8a4ffa0aab MINOR: send-proxy-v2: sends authority TLV according to TLV received
Since patch "7185b789", the authority TLV in a PROXYv2 header from a
client connection is stored. The received authority TLV should be taken
into account when sending PROXYv2, to allow chaining PROXYv2 without
dropping it.
2019-08-31 12:28:33 +02:00
Willy Tarreau
c046d167e4 MEDIUM: log: add support for logging to a ring buffer
Now by prefixing a log server with "ring@<name>" it's possible to send
the logs to a ring buffer. One nice thing is that it allows multiple
sessions to consult the logs in real time in parallel over the CLI, and
without requiring file system access. At the moment, ring0 is created as
a default sink for tracing purposes and is available. No option is
provided to create new rings though this is trivial to add to the global
section.
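
Recognizing such a prefix can be sketched as below. The enum, the function
name and the fallback to a plain address are assumptions made for this
example; only the "ring@" prefix itself comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical target kinds for this sketch. */
enum sketch_log_target { SKETCH_TGT_ADDR, SKETCH_TGT_RING };

/* If <arg> starts with "ring@", report a ring target and make *name point
 * just past the prefix. Otherwise treat <arg> as a plain address.
 */
enum sketch_log_target sketch_parse_log_target(const char *arg,
                                               const char **name)
{
    static const char prefix[] = "ring@";
    size_t plen = sizeof(prefix) - 1;

    assert(arg != NULL && name != NULL);
    if (strncmp(arg, prefix, plen) == 0) {
        *name = arg + plen;
        return SKETCH_TGT_RING;
    }
    *name = arg;
    return SKETCH_TGT_ADDR;
}
```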
2019-08-30 15:24:59 +02:00
Willy Tarreau
f3dc30f6de MINOR: log: add a target type instead of hacking the address family
Instead of detecting an AF_UNSPEC address family for a log server and
deducing a file descriptor from it, let's create a target type field and
explicitly mention that the socket is of type FD.
2019-08-30 15:07:25 +02:00
Willy Tarreau
d52a7f8c8d MEDIUM: log: use the new generic fd_write_frag_line() function
When logging to a file descriptor, we'd rather use the unified
fd_write_frag_line() which uses the FD's lock than perform the
writev() ourselves and use a per-server lock, because if several
loggers point to the same output (e.g. stdout) they are still
not locked and their logs may interleave. The function above
instead relies on the fd's lock so this is safer and will even
protect against concurrent accesses from other areas (e.g traces).
The function also deals with the FD's non-blocking mode so we do
not have to keep specific code for this anymore in the logs.
2019-08-30 15:07:25 +02:00
Willy Tarreau
7e9776ad7b MINOR: fd/log/sink: make the non-blocking initialization depend on the initialized bit
Logs and sinks were resorting to dirty hacks to initialize an FD to
non-blocking mode. Now we have a bit for this in the fd tab so we can
do it on the fly on first use of the file descriptor. Previously it was
set per log server by writing value 1 to the port, or during a sink
initialization regardless of the usage of the fd.
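
A minimal sketch of this "initialize on first use" pattern, assuming a
hypothetical per-FD table and function name (the real fdtab layout is not
shown here), could look like this:

```c
#include <fcntl.h>

#define SKETCH_MAX_FD 1024

/* Hypothetical per-FD table holding only the "initialized" bit. */
static struct { unsigned char initialized; } sketch_fdtab[SKETCH_MAX_FD];

/* Switch <fd> to non-blocking mode on first use only. Returns 0 or -1. */
int sketch_fd_prepare(int fd)
{
    int flags;

    if (fd < 0 || fd >= SKETCH_MAX_FD)
        return -1;
    if (sketch_fdtab[fd].initialized)
        return 0;               /* already done, nothing to guess */

    flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    sketch_fdtab[fd].initialized = 1;
    return 0;
}
```

Every writer can call the helper before writing, and only the first call
pays for the two fcntl() system calls.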
2019-08-30 15:07:25 +02:00
Willy Tarreau
d660990cee MINOR: fd: add a new "initialized" bit in the fdtab struct
The purpose is to be able to remember that initialization was already
done for a file descriptor. This will make it possible to get rid of some
dirty hacks performed in the logs or fd sinks where the init state of the
fd has to be guessed.
2019-08-30 15:07:25 +02:00
Willy Tarreau
76913d3ef4 CLEANUP: fd: remove leftovers of the fdcache
The "cache" entry was still present in the fdtab struct and it was
reported in "show sess". Removing it broke the cache-line alignment
on 64-bit machines which is important for threads, so it was fixed
by adding an attribute(aligned()) when threads are in use. Doing it
only in this case allows 32-bit thread-less platforms to see the
struct fit into 32 bytes.
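
The effect of the attribute can be sketched with two hypothetical structs
that have the same fields. The field list and the 64-byte cache line are
assumptions, and the attribute syntax is the GCC/Clang one. In the real
code a single struct would receive the attribute only when threads are
enabled.

```c
#define SKETCH_CACHELINE 64   /* assumed cache-line size */

/* Thread-less variant: natural alignment, so the entry stays compact. */
struct sketch_fdtab_plain {
    void         *owner;
    void        (*iocb)(int fd);
    unsigned long thread_mask;
    unsigned char state;
};

/* Threaded variant: each entry starts on its own cache line. */
struct sketch_fdtab_aligned {
    void         *owner;
    void        (*iocb)(int fd);
    unsigned long thread_mask;
    unsigned char state;
} __attribute__((aligned(SKETCH_CACHELINE)));

/* The size of an aligned struct is always a multiple of its alignment. */
_Static_assert(sizeof(struct sketch_fdtab_aligned) % SKETCH_CACHELINE == 0,
               "entries must not share a cache line");
```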
2019-08-30 15:07:25 +02:00
Willy Tarreau
30362908d8 BUG/MINOR: ring: b_peek_varint() returns a uint64_t, not a size_t
The difference matters when building on 32-bit architectures and a
warning was rightfully emitted.

No backport is needed.
2019-08-30 15:07:25 +02:00
Willy Tarreau
e7bbbca781 BUG/MEDIUM: mux-h2/trace: fix missing braces added with traces
Ilya reported in issue #242 that h2c_handle_priority() was having
unreachable code...  Obviously, I missed the braces around the "if",
leaving an unconditional return.
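
The pitfall is easy to reproduce in isolation. In the hypothetical
function below, which is not the real h2c_handle_priority(), removing the
braces would leave only the trace call conditional and make the return
unconditional:

```c
int sketch_trace_calls;   /* counts how often the trace hook ran */

void sketch_trace(void)
{
    sketch_trace_calls++;
}

/* Return -1 for an unexpected frame length, 0 otherwise. The 5-byte
 * payload is the PRIORITY frame size defined by HTTP/2.
 */
int sketch_handle_priority(unsigned int len)
{
    if (len != 5) {       /* braces keep both statements conditional */
        sketch_trace();
        return -1;
    }
    return 0;             /* reachable again */
}
```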

No backport is needed.
2019-08-30 15:03:58 +02:00
Willy Tarreau
fe1c908744 BUG/MEDIUM: mux-h2/trace: do not dereference h2c->conn after failed idle
In h2_detach(), if session_check_idle_conn() returns <0 we must not
dereference it since it has been freed.

No backport is needed.
2019-08-30 15:00:42 +02:00
Willy Tarreau
1d181e489c MEDIUM: ring: implement a wait mode for watchers
Now it is possible for a reader to subscribe and wait for new events
sent to a ring buffer. When new events are written to a ring buffer,
the applets that are subscribed are woken up to display new events.
For now we only support this with the CLI applet called by "show events"
since the I/O handler is indeed a CLI I/O handler. But it's not
complicated to add other mechanisms to consume events and forward them
to external log servers for example. The wait mode is enabled by adding
"-w" after "show events <sink>". An extra "-n" was added to directly
seek to new events only.
2019-08-30 11:58:58 +02:00
Willy Tarreau
70b1e50feb MINOR: mux-h2/trace: report the connection pointer and state before FRAME_H
Initially we didn't report anything before FRAME_H but at least the
connection's pointer and its state are desirable.
2019-08-30 11:58:58 +02:00
Willy Tarreau
300decc8d9 MINOR: cli: extend the CLI context with a list and two offsets
Some CLI parsers are currently abusing the CLI context types such as
pointers to stuff longs into them for lack of room. But the context is
80 bytes while cli is only 48, thus there's some room left. This patch
adds a list element and two size_t usable as various offsets. The list
element is initialized.
2019-08-30 11:58:58 +02:00
Willy Tarreau
13696ffba2 BUG/MINOR: ring: fix the way watchers are counted
There are two problems with the way we count watchers on a ring:
  - the test for >=255 was accidentally left at the value 1 used during QA
  - if the producer deletes all data up to the reader's position
    and the reader is called, cannot write, and is called again,
    it will have a zero offset, causing it to initialize itself
    multiple times and each time adding a new refcount.

Let's change this and instead use ~0 as the initial offset as it's
not possible to have it as a valid one. We now store it into the
CLI context's size_t o0 instead of casting it to a void*.
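
The sentinel approach can be sketched as follows, with hypothetical
structs and function names. Because 0 is a legitimate offset, only the
all-ones value marks a reader that has not attached yet, so being woken
up twice never takes a second reference.

```c
#include <stddef.h>

/* ~0 can never be a valid offset, so it marks "not attached yet". */
#define SKETCH_OFS_UNSET (~(size_t)0)

struct sketch_ring   { unsigned int readers; size_t head; };
struct sketch_reader { size_t ofs; };

void sketch_reader_init(struct sketch_reader *r)
{
    r->ofs = SKETCH_OFS_UNSET;
}

/* Called each time the reader is scheduled. It attaches exactly once,
 * even if the stored offset is legitimately zero afterwards.
 */
void sketch_reader_wake(struct sketch_ring *ring, struct sketch_reader *r)
{
    if (r->ofs == SKETCH_OFS_UNSET) {
        ring->readers++;
        r->ofs = ring->head;
    }
}
```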

No backport needed.
2019-08-30 11:58:58 +02:00
Willy Tarreau
99282ddb2c MINOR: trace: extend default event names to 12 chars
With "tx_settings" the 10-chars limit was already passed, thus it sounds
reasonable to push this slightly.
2019-08-30 07:39:59 +02:00
Willy Tarreau
8795194f79 CLEANUP: mux-h2/trace: lower-case event names
I wanted to do it before pushing and forgot. It's easier to type lower-
case event names and more consistent with the "none" and "any" keywords.
2019-08-30 07:39:59 +02:00
Willy Tarreau
8fecec2839 CLEANUP: mux-h2/trace: reformat the "received" messages for better alignment
User-level traces are more readable when visually aligned. This is easily
done by writing "rcvd" instead of "received" to align with "sent" :

  $ socat - /tmp/sock1 <<< "show events buf0"
  [00|h2|0|mux_h2.c:2465] rcvd H2 request  : [1] H2 REQ: GET /?s=10k HTTP/2.0
  [00|h2|0|mux_h2.c:4563] sent H2 response : [1] H2 RES: HTTP/1.1 200
2019-08-30 07:39:59 +02:00
Willy Tarreau
c067a3ac8f MINOR: mux-h2/trace: report h2s->id before h2c->dsi for the stream ID
h2c->dsi is only for demuxing, and needed while decoding a new request.
But if we already have a valid stream ID (e.g. response or outgoing
request), we should use it instead. This avoids seeing [0] in front of
the responses at user level.
2019-08-30 07:39:59 +02:00
Willy Tarreau
17104d46be MINOR: mux-h2/trace: always report the h2c/h2s state and flags
There's no limitation to just "state" trace level anymore, we're
expected to always show these internal states at verbosity levels
above "clean".
2019-08-30 07:39:59 +02:00
Willy Tarreau
94f1dcf119 MINOR: mux-h2/trace: only decode the start-line at verbosity other than "minimal"
This is as documented in "trace h2 verbosity", level "minimal" only
features flags and doesn't perform any decoding at all, "simple" does,
just like "clean" which is the default for end users.
2019-08-30 07:39:59 +02:00
Willy Tarreau
f7dd5191cd MINOR: mux-h2/trace: add a new verbosity level "clean"
The "clean" output will be suitable for user and proto-level output
where the internal stuff (state, pointers, etc) is not desired but
just the basic protocol elements.
2019-08-30 07:38:42 +02:00
Willy Tarreau
ab2ec45403 MINOR: mux-h2: add functions to convert an h2c/h2s state to a string
We need this all the time in traces, let's have it now. For the sake
of compact outputs, the strings are all 3-chars long. The "show fd"
output was improved to make use of this.
2019-08-30 07:10:46 +02:00
Willy Tarreau
7838a79bac MEDIUM: mux-h2/trace: add lots of traces all over the code
All functions of the h2 data path were updated to receive one or multiple
TRACE() calls, at least one pair of TRACE_ENTER()/TRACE_LEAVE(), and those
manipulating protocol elements have been improved to report frame types,
special state transitions or detected errors. Even with careful tests, no
performance impact was measured when traces are disabled.

They are not completely exploited yet, the callback function tries to
dump a lot about them, but still doesn't provide buffer dumps, nor does
it indicate the stream or connection error codes.

The first argument is always set to the connection when known. The second
argument is set to the h2s when known, sometimes a 3rd argument is set to
a buffer, generally the rxbuf or htx, and occasionally the 4th argument
points to an integer (number of bytes read/sent, error code).

Retrieving a 10kB object produces roughly 240 lines when at developer
level, 35 lines at data level, 27 at state level, and 10 at proto level
and 2 at user level.

For now the headers are not dumped, but the start lines are emitted in
each direction at user level.

The patch is marked medium because it touches lots of places, though
it takes care not to change the execution path.
2019-08-29 18:22:12 +02:00
Willy Tarreau
db3cfff200 MINOR: mux-h2/trace: add the default decoding callback
The new function h2_trace() is called when relevant by the trace subsystem
in order to provide extra information about the trace being produced. It
can for example display the connection pointer, the stream pointer, etc.
It is declared in the trace source as the default callback as we expect
it to be versatile enough to enrich most traces.

In addition, for requests and responses, if we have a buffer and we can
decode it as an HTX buffer, we can extract user-friendly information from
the start line.
2019-08-29 18:19:11 +02:00
Willy Tarreau
12ae212837 MINOR: mux-h2/trace: register a new trace source with its events
For now the traces are not used. Supported events are categorized by
where the activity comes from (h2c, h2s, stream, etc), a direction
(send/recv/wake), and a list of possibilities for each of them (frame
types, errors, shut, ...). This results in ~50 different events that
try to cover a lot of possibilities when it's needed to filter on
something specific. Special events like protocol error are handled.
A few aggregate events like "rx_frame" or "tx_frame" are planned to
cover all frame types at once by being placed all the time with any
of the other ones.

We also state that the first argument is always the connection. This way
the trace subsystem will be able to safely retrieve some useful info, and
we'll still be able to get the h2c from there (conn->ctx) in a pretty
print function. The second argument will always be an h2s, and in order
to propose it for tracking, we add its description. We also define 4
verbosity levels, which seems more than enough.
2019-08-29 17:14:35 +02:00