ceph/qa/suites/rgw/multisite
Sage Weil cf352c3ac0 osd: add osd_fast_shutdown option (default true)
If we get a SIGINT or SIGTERM or are deleted from the OSDMap, do a fast
shutdown by exiting immediately.  This has a few important benefits:

 - We immediately stop responding (binding) to any sockets, which means
   other OSDs will immediately decide we are down (and dead!).  This
   minimizes IO interruption.
 - We avoid the complex "clean" shutdown process, which is historically a
   source of bugs.

In reality, the only purpose of the "clean" shutdown is to try to tear down
everything in memory so we can do memory leak checking with valgrind.  Set
this option to false for valgrind QA runs so we can still do that.

Not that with the new read leases in octopus, we rely on the default
behavior that a ECONNREFUSED is taken to mean that the OSD is fully dead,
so that we don't have to wait for any leases to time out.  This works in
sane environments with normal IP networks, but that behavior could
conceivably be a bad idea if there are some weird network shenanigans
going on.  If osd_fast_fail_on_connection_refused were disabled, then this
fast shutdown procedure might be *worse* than the clean shutdown because
we would have to wait for the heartbeat timeout.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-15 09:31:50 -06:00
..
realms
tasks Added radosgw_admin_rest task to multisite yaml. 2019-08-22 13:44:48 +05:30
.qa
%
clusters.yaml
frontend
omap_limits.yaml
overrides.yaml qa/rgw: use 'testing' kms backend for multisite tests 2019-11-04 12:49:07 -05:00
valgrind.yaml osd: add osd_fast_shutdown option (default true) 2019-11-15 09:31:50 -06:00