From f909c91e8a739b9ef7409b399259201fe883771c Mon Sep 17 00:00:00 2001
From: Willy Tarreau <w@1wt.eu>
Date: Thu, 22 Aug 2019 20:06:04 +0200
Subject: [PATCH] DOC: management: document the "trace" and "show trace"
 commands

At the moment the subsystem is still not complete and the various modules
do not yet produce traces (some dirty experimental code for H2 exists) but
this aims at easing a broad adoption.

Among the missing elements, we can enumerate the lack of configuration
of the sinks (e.g. it's still not possible to change their output format
nor enable/disable timestamps) and since timestamps are not available in
the sinks, they are not collected nor passed by the traces.
---
 doc/management.txt | 155 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 155 insertions(+)
diff --git a/doc/management.txt b/doc/management.txt
index 41d0e82a43..53d8f1d9e1 100644
--- a/doc/management.txt
+++ b/doc/management.txt
@@ -2563,6 +2563,16 @@ show schema json
   verifiers may be used to verify the output of "show info json" and "show
   stat json" against the schema.
 
+show trace [<source>]
+  Show the current trace status. For each source a line is displayed with a
+  single-character status indicating if the trace is stopped, waiting, or
+  running. The output sink used by the trace is indicated (or "none" if none
+  was set), as well as the number of dropped events in this sink, followed by a
+  brief description of the source. If a source name is specified, a detailed
+  list of all events supported by the source is shown along with their status
+  for each action (report, start, pause, stop), indicated by a "+" if enabled
+  or a "-" otherwise. All these events are independent: an event might trigger
+  a start without being reported, and conversely.
 
 shutdown frontend <frontend>
   Completely delete the specified frontend. All the ports it was bound to will
@@ -2593,6 +2603,151 @@ shutdown sessions server <backend>/<server>
   maintenance mode, for instance. Such terminated sessions are reported with a
   'K' flag in the logs.
 
+trace
+  The "trace" command alone lists the trace sources, their current status, and
+  their brief descriptions. It is only meant as a menu to enter next levels;
+  see the other "trace" commands below.
+
+trace 0
+  Immediately stops all traces. This is meant to be used as a quick solution
+  to terminate a debugging session or as an emergency action to be used in case
+  complex traces were enabled on multiple sources and impact the service.
+
+trace <source> event [ [+|-|!]<name> ]
+  Without argument, this will list all the events supported by the designated
+  source. They are prefixed with a "-" if they are not enabled, or a "+" if
+  they are enabled. It is important to note that a single trace may be labelled
+  with multiple events, and as long as any of the enabled events matches one of
+  the events labelled on the trace, the event will be passed to the trace
+  subsystem. For example, receiving an HTTP/2 frame of type HEADERS may trigger
+  a frame event and a stream event since the frame creates a new stream. If
+  either the frame event or the stream event is enabled for this source, the
+  frame will be passed to the trace framework.
+
+  With an argument, it is possible to toggle the state of each event and
+  individually enable or disable them. Two special keywords are supported,
+  "none", which matches no event, and is used to disable all events at once,
+  and "any" which matches all events, and is used to enable all events at
+  once. Other events are specific to the event source. It is possible to
+  enable one event by specifying its name, optionally prefixed with '+' for
+  better readability. It is possible to disable one event by specifying its
+  name prefixed by a '-' or a '!'.
+
+  One way to completely disable a trace source is to pass "event none", and
+  this source will instantly be totally ignored.
+
+trace <source> level [<level>]
+  Without argument, this will list all detail levels for this source, and the
+  current one will be indicated by a star ('*') prepended in front of it. With
+  an argument, this will change the detail level to the specified level. Detail
+  levels are a form of filters that are applied before reporting the events.
+  These filters are used to report a level of detail suitable for the use case.
+  For example a developer might need to know precisely where in the code an
+  HTTP header was considered invalid while the end user may not even care about
+  this header's validity at all. There are currently 5 distinct levels for a
+  trace :
+
+      user         this will report information that is suitable for use by a
+                   regular haproxy user who wants to observe their traffic.
+                   Typically some HTTP requests and responses will be reported
+                   without much detail. Most sources will set this as the
+                   default level to ease operations.
+
+      payload      in addition to what is reported at the "user" level, it will
+                   also display more detailed information about the contents,
+                   which may be HTTP headers, or unencoded contents.
+
+      proto        in addition to what is reported at the "payload" level, it
+                   will also display protocol-level information. This can for
+                   example be the raw data exchanged over the wire after
+                   encoding or frames received before decoding.
+
+      state        in addition to what is reported at the "proto" level, it
+                   will also display state transitions (or failed transitions)
+                   which happen in parsers, so this will show attempts to
+                   perform an operation while the "proto" level only shows
+                   the final operation.
+
+      developer    it reports everything available, which can include advanced
+                   information such as "breaking out of this loop" that is
+                   only relevant to a developer trying to understand a bug that
+                   only happens once in a while in the field.
+
+  It is highly recommended to always use the "user" level only and switch to
+  other levels only if instructed to do so by a developer. Also it is a good
+  idea to first configure the events before switching to higher levels, as
+  this avoids the flood of lines that is dumped when no filter is applied.
+
+trace <source> lock [<criterion>]
+  Without argument, this will list all the criteria supported by this source
+  for lock-on processing, and display the current choice by a star ('*') in
+  front of it. Lock-on means that the source will focus on the first matching
+  event and only stick to the criterion which triggered this event, and ignore
+  all other ones until the trace stops. This makes it possible, for example,
+  to trace a single connection or a single stream. The following criteria are
+  supported by some traces, though not necessarily all, since some of them
+  might not be available to the source :
+
+      backend      lock on the backend that started the trace
+      connection   lock on the connection that started the trace
+      frontend     lock on the frontend that started the trace
+      listener     lock on the listener that started the trace
+      nothing      do not lock on anything
+      server       lock on the server that started the trace
+      session      lock on the session that started the trace
+      thread       lock on the thread that started the trace
+
+  In addition to this, each source may provide up to 4 specific criteria such
+  as internal states or connection IDs. For example in HTTP/2 it is possible
+  to lock on the H2 stream and ignore other streams once a trace starts.
+
+  When a criterion is passed as an argument, it is used instead of the
+  other ones and any existing tracking is immediately terminated so that it can
+  restart with the new criterion. The special keyword "nothing" is supported by
+  all sources to permanently disable tracking.
+
+trace <source> { pause | start | stop } [ [+|-|!]<event> ]
+  Without argument, this will list the events enabled to automatically pause,
+  start, or stop a trace for this source. These events are specific to each
+  trace source. With an argument, this will either enable the event for the
+  specified action (name optionally prefixed by a '+') or disable it (name
+  prefixed by a '-' or '!'). The special keyword "now" is not an event and
+  requests that the action be taken immediately. The keywords "none" and "any"
+  are supported just like in "trace event".
+
+  The 3 supported actions are "pause", "start" and "stop". The "pause" action
+  enumerates events which cause a running trace to stop and wait for a new
+  start event to restart it. The "start" action enumerates the events which
+  make a waiting trace run: the trace stays in waiting mode until one of them
+  appears. The "stop" action enumerates the events which definitively stop
+  the trace until it is manually enabled again. In practice it makes sense
+  to manually start a trace using "start now" without caring about events, and
+  to stop it using "stop now". In order to capture more subtle event sequences,
+  setting "start" to a normal event (like receiving an HTTP request) and "stop"
+  to a very rare event like emitting a certain error, will ensure that the last
+  captured events will match the desired criteria. And the pause event is
+  useful to detect the end of a sequence, disable the lock-on and wait for
+  another opportunity to take a capture. In this case it can make sense to
+  enable lock-on to spot only one specific criterion (e.g. a stream), and have
+  "start" set to anything that starts this criterion (e.g. all events which
+  create a stream), "stop" set to the expected anomaly, and "pause" to anything
+  that ends that criterion (e.g. any end of stream event). In this case the
+  trace log will contain complete sequences of perfectly clean series affecting
+  a single object, until the last sequence containing everything from the
+  beginning to the anomaly.
+
+trace <source> sink [<sink>]
+  Without argument, this will list all event sinks available for this source,
+  and the currently configured one will have a star ('*') prepended in front
+  of it. Sink "none" is always available and means that all events are simply
+  dropped, though their processing is not ignored (e.g. lock-on does occur).
+  Other sinks are available depending on configuration and build options, but
+  typically "stdout" and "stderr" will be usable in debug mode, and in-memory
+  ring buffers should be available as well. When a name is specified, the sink
+  instantly changes for the specified source. Events are not changed during a
+  sink change. In the worst case some may be lost if an invalid sink is used
+  (or "none"), but operations do continue to a different destination.
+
 
 9.4. Master CLI
 ---------------