Mirror of https://github.com/ceph/ceph (synced 2025-03-11 02:39:05 +00:00)

Merge pull request #13968 from dzafman/wip-15912-followon

osd,mon: misc full fixes and cleanups

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: John Spray <john.spray@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>

commit 37e9a874af
@@ -34,8 +34,8 @@ the typical process.
 
 Once the primary has its local reservation, it requests a remote
 reservation from the backfill target. This reservation CAN be rejected,
-for instance if the OSD is too full (osd_backfill_full_ratio config
-option). If the reservation is rejected, the primary drops its local
+for instance if the OSD is too full (backfillfull_ratio osd setting).
+If the reservation is rejected, the primary drops its local
 reservation, waits (osd_backfill_retry_interval), and then retries. It
 will retry indefinitely.
 
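The rejection threshold described above can be inspected and adjusted from the CLI; a minimal sketch, assuming a running cluster that already carries the new OSDMap-based ratio (values are examples only)::

    ceph osd dump | grep backfillfull_ratio     # current threshold that triggers rejection
    ceph osd set-backfillfull-ratio .90         # example value, not a recommendation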
@@ -62,9 +62,10 @@ to the monitor. The state chart can set:
 
 - recovery_wait: waiting for local/remote reservations
 - recovering: recovering
+- recovery_toofull: recovery stopped, OSD(s) above full ratio
 - backfill_wait: waiting for remote backfill reservations
 - backfilling: backfilling
-- backfill_toofull: backfill reservation rejected, OSD too full
+- backfill_toofull: backfill stopped, OSD(s) above backfillfull ratio
 
 
--------
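These states surface through the normal health commands; a quick sketch::

    ceph pg stat            # state counts, e.g. backfill_wait/backfill_toofull
    ceph health detail      # per-PG notes such as recovery_toofull or backfill_toofull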
@@ -1166,6 +1166,12 @@ Usage::
 
     ceph pg set_full_ratio <float[0.0-1.0]>
 
+Subcommand ``set_backfillfull_ratio`` sets ratio at which pgs are considered too full to backfill.
+
+Usage::
+
+    ceph pg set_backfillfull_ratio <float[0.0-1.0]>
+
 Subcommand ``set_nearfull_ratio`` sets ratio at which pgs are considered nearly
 full.
 
@@ -400,6 +400,7 @@ a reasonable number for a near full ratio.
 
   [global]
 
           mon osd full ratio = .80
+          mon osd backfillfull ratio = .75
           mon osd nearfull ratio = .70
 
 
@@ -412,6 +413,15 @@ a reasonable number for a near full ratio.
 
 :Default: ``.95``
 
 
+``mon osd backfillfull ratio``
+
+:Description: The percentage of disk space used before an OSD is
+              considered too ``full`` to backfill.
+
+:Type: Float
+:Default: ``.90``
+
+
 ``mon osd nearfull ratio``
 
 :Description: The percentage of disk space used before an OSD is
@@ -560,15 +560,6 @@ priority than requests to read or write data.
 
 :Default: ``512``
 
 
-``osd backfill full ratio``
-
-:Description: Refuse to accept backfill requests when the Ceph OSD Daemon's
-              full ratio is above this value.
-
-:Type: Float
-:Default: ``0.85``
-
-
 ``osd backfill retry interval``
 
 :Description: The number of seconds to wait before retrying backfill requests.
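The retained retry option can still be tuned in ``ceph.conf``; a minimal sketch (the 30-second value mirrors the new default adopted in ``config_opts.h`` later in this change)::

    [osd]
        osd backfill retry interval = 30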
@@ -673,13 +664,6 @@ perform well in a degraded state.
 
 :Default: ``8 << 20``
 
 
-``osd recovery threads``
-
-:Description: The number of threads for recovering data.
-:Type: 32-bit Integer
-:Default: ``1``
-
-
 ``osd recovery thread timeout``
 
 :Description: The maximum time in seconds before timing out a recovery thread.
@@ -468,8 +468,7 @@ Ceph provides a number of settings to balance the resource contention between
 new service requests and the need to recover data objects and restore the
 placement groups to the current state. The ``osd recovery delay start`` setting
 allows an OSD to restart, re-peer and even process some replay requests before
-starting the recovery process. The ``osd recovery threads`` setting limits the
-number of threads for the recovery process (1 thread by default). The ``osd
+starting the recovery process. The ``osd
 recovery thread timeout`` sets a thread timeout, because multiple OSDs may fail,
 restart and re-peer at staggered rates. The ``osd recovery max active`` setting
 limits the number of recovery requests an OSD will entertain simultaneously to
@@ -497,8 +496,9 @@ placement group can't be backfilled, it may be considered ``incomplete``.
 
 Ceph provides a number of settings to manage the load spike associated with
 reassigning placement groups to an OSD (especially a new OSD). By default,
 ``osd_max_backfills`` sets the maximum number of concurrent backfills to or from
-an OSD to 10. The ``osd backfill full ratio`` enables an OSD to refuse a
-backfill request if the OSD is approaching its full ratio (85%, by default).
+an OSD to 10. The ``backfill full ratio`` enables an OSD to refuse a
+backfill request if the OSD is approaching its full ratio (90%, by default);
+it can be changed with the ``ceph osd set-backfillfull-ratio`` command.
 If an OSD refuses a backfill request, the ``osd backfill retry interval``
 enables an OSD to retry the request (after 10 seconds, by default). OSDs can
 also set ``osd backfill scan min`` and ``osd backfill scan max`` to manage scan
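For example, the ratio mentioned above can be raised at runtime (a sketch; 0.90 is just the shipped default)::

    ceph osd set-backfillfull-ratio .90
    ceph osd dump | grep '^backfillfull_ratio'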
@@ -206,7 +206,9 @@ Ceph prevents you from writing to a full OSD so that you don't lose data.
 In an operational cluster, you should receive a warning when your cluster
 is getting near its full ratio. The ``mon osd full ratio`` defaults to
 ``0.95``, or 95% of capacity before it stops clients from writing data.
-The ``mon osd nearfull ratio`` defaults to ``0.85``, or 85% of capacity
+The ``mon osd backfillfull ratio`` defaults to ``0.90``, or 90% of
+capacity when it blocks backfills from starting. The
+``mon osd nearfull ratio`` defaults to ``0.85``, or 85% of capacity
 when it generates a health warning.
 
 Full cluster issues usually arise when testing how Ceph handles an OSD
@@ -214,20 +216,21 @@ failure on a small cluster. When one node has a high percentage of the
 cluster's data, the cluster can easily eclipse its nearfull and full ratio
 immediately. If you are testing how Ceph reacts to OSD failures on a small
 cluster, you should leave ample free disk space and consider temporarily
-lowering the ``mon osd full ratio`` and ``mon osd nearfull ratio``.
+lowering the ``mon osd full ratio``, ``mon osd backfillfull ratio`` and
+``mon osd nearfull ratio``.
 
 Full ``ceph-osds`` will be reported by ``ceph health``::
 
   ceph health
-  HEALTH_WARN 1 nearfull osds
-  osd.2 is near full at 85%
+  HEALTH_WARN 1 nearfull osd(s)
 
 Or::
 
-  ceph health
-  HEALTH_ERR 1 nearfull osds, 1 full osds
-  osd.2 is near full at 85%
+  ceph health detail
+  HEALTH_ERR 1 full osd(s); 1 backfillfull osd(s); 1 nearfull osd(s)
+  osd.3 is full at 97%
+  osd.4 is backfill full at 91%
+  osd.2 is near full at 87%
 
 The best way to deal with a full cluster is to add new ``ceph-osds``, allowing
 the cluster to redistribute data to the newly available storage.
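Before adding OSDs, it helps to confirm which devices are actually filling up; a short triage sketch::

    ceph health detail      # lists full/backfillfull/nearfull OSDs with percentages
    ceph osd df             # per-OSD utilization, to pick reweight or expansion targets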
@@ -696,7 +696,7 @@ class Thrasher:
        """
        Test backfills stopping when the replica fills up.

-        First, use osd_backfill_full_ratio to simulate a now full
+        First, use injectfull admin command to simulate a now full
        osd by setting it to 0 on all of the OSDs.

        Second, on a random subset, set
@@ -705,13 +705,14 @@ class Thrasher:

        Then, verify that all backfills stop.
        """
-        self.log("injecting osd_backfill_full_ratio = 0")
+        self.log("injecting backfill full")
        for i in self.live_osds:
            self.ceph_manager.set_config(
                i,
                osd_debug_skip_full_check_in_backfill_reservation=
-                random.choice(['false', 'true']),
-                osd_backfill_full_ratio=0)
+                random.choice(['false', 'true']))
+            self.ceph_manager.osd_admin_socket(i, command=['injectfull', 'backfillfull'],
+                                               check_status=True, timeout=30, stdout=DEVNULL)
        for i in range(30):
            status = self.ceph_manager.compile_pg_status()
            if 'backfill' not in status.keys():
@@ -724,8 +725,9 @@ class Thrasher:
        for i in self.live_osds:
            self.ceph_manager.set_config(
                i,
-                osd_debug_skip_full_check_in_backfill_reservation='false',
-                osd_backfill_full_ratio=0.85)
+                osd_debug_skip_full_check_in_backfill_reservation='false')
+            self.ceph_manager.osd_admin_socket(i, command=['injectfull', 'none'],
+                                               check_status=True, timeout=30, stdout=DEVNULL)

    def test_map_discontinuity(self):
        """
@@ -400,6 +400,7 @@ EOF
    if test -z "$(get_config mon $id mon_initial_members)" ; then
        ceph osd pool delete rbd rbd --yes-i-really-really-mean-it || return 1
        ceph osd pool create rbd $PG_NUM || return 1
+        ceph osd set-backfillfull-ratio .99
    fi
 }
 
@@ -634,7 +635,6 @@ function activate_osd() {
        ceph_disk_args+=" --prepend-to-path="

    local ceph_args="$CEPH_ARGS"
-    ceph_args+=" --osd-backfill-full-ratio=.99"
    ceph_args+=" --osd-failsafe-full-ratio=.99"
    ceph_args+=" --osd-journal-size=100"
    ceph_args+=" --osd-scrub-load-threshold=2000"
@@ -1419,9 +1419,44 @@ function test_mon_pg()
 
  ceph osd set-full-ratio .962
  ceph osd dump | grep '^full_ratio 0.962'
+  ceph osd set-backfillfull-ratio .912
+  ceph osd dump | grep '^backfillfull_ratio 0.912'
  ceph osd set-nearfull-ratio .892
  ceph osd dump | grep '^nearfull_ratio 0.892'
 
+  # Check health status
+  ceph osd set-nearfull-ratio .913
+  ceph health | grep 'HEALTH_ERR Full ratio(s) out of order'
+  ceph health detail | grep 'backfill_ratio (0.912) < nearfull_ratio (0.913), increased'
+  ceph osd set-nearfull-ratio .892
+  ceph osd set-backfillfull-ratio .963
+  ceph health detail | grep 'full_ratio (0.962) < backfillfull_ratio (0.963), increased'
+  ceph osd set-backfillfull-ratio .912
+
+  # Check injected full results
+  WAITFORFULL=10
+  ceph --admin-daemon $CEPH_OUT_DIR/osd.0.asok injectfull nearfull
+  sleep $WAITFORFULL
+  ceph health | grep "HEALTH_WARN.*1 nearfull osd(s)"
+  ceph --admin-daemon $CEPH_OUT_DIR/osd.1.asok injectfull backfillfull
+  sleep $WAITFORFULL
+  ceph health | grep "HEALTH_WARN.*1 backfillfull osd(s)"
+  ceph --admin-daemon $CEPH_OUT_DIR/osd.2.asok injectfull failsafe
+  sleep $WAITFORFULL
+  # failsafe and full are the same as far as the monitor is concerned
+  ceph health | grep "HEALTH_ERR.*1 full osd(s)"
+  ceph --admin-daemon $CEPH_OUT_DIR/osd.0.asok injectfull full
+  sleep $WAITFORFULL
+  ceph health | grep "HEALTH_ERR.*2 full osd(s)"
+  ceph health detail | grep "osd.0 is full at.*%"
+  ceph health detail | grep "osd.2 is full at.*%"
+  ceph health detail | grep "osd.1 is backfill full at.*%"
+  ceph --admin-daemon $CEPH_OUT_DIR/osd.0.asok injectfull none
+  ceph --admin-daemon $CEPH_OUT_DIR/osd.1.asok injectfull none
+  ceph --admin-daemon $CEPH_OUT_DIR/osd.2.asok injectfull none
+  sleep $WAITFORFULL
+  ceph health | grep HEALTH_OK
+
  ceph pg stat | grep 'pgs:'
  ceph pg 0.0 query
  ceph tell 0.0 query
@@ -359,10 +359,14 @@ if __name__ == '__main__':
    r = expect('osd/dump', 'GET', 200, 'json', JSONHDR)
    assert(float(r.myjson['output']['full_ratio']) == 0.90)
    expect('osd/set-full-ratio?ratio=0.95', 'PUT', 200, '')
+    expect('osd/set-backfillfull-ratio?ratio=0.88', 'PUT', 200, '')
+    r = expect('osd/dump', 'GET', 200, 'json', JSONHDR)
+    assert(float(r.myjson['output']['backfillfull_ratio']) == 0.88)
+    expect('osd/set-backfillfull-ratio?ratio=0.90', 'PUT', 200, '')
    expect('osd/set-nearfull-ratio?ratio=0.90', 'PUT', 200, '')
    r = expect('osd/dump', 'GET', 200, 'json', JSONHDR)
    assert(float(r.myjson['output']['nearfull_ratio']) == 0.90)
    expect('osd/set-full-ratio?ratio=0.85', 'PUT', 200, '')
    expect('osd/set-nearfull-ratio?ratio=0.85', 'PUT', 200, '')

    r = expect('pg/stat', 'GET', 200, 'json', JSONHDR)
    assert('num_pgs' in r.myjson['output'])
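Outside the test harness the same endpoints can be exercised directly; a sketch, assuming a local ``ceph-rest-api`` instance on its default port and the usual ``/api/v0.1`` prefix (both assumptions)::

    curl -X PUT 'http://localhost:5000/api/v0.1/osd/set-backfillfull-ratio?ratio=0.88'
    curl 'http://localhost:5000/api/v0.1/osd/dump'    # backfillfull_ratio should now read 0.88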
@@ -42,6 +42,8 @@ const char *ceph_osd_state_name(int s)
    return "full";
  case CEPH_OSD_NEARFULL:
    return "nearfull";
+  case CEPH_OSD_BACKFILLFULL:
+    return "backfillfull";
  default:
    return "???";
  }
@@ -308,6 +308,7 @@ OPTION(mon_pg_warn_min_pool_objects, OPT_INT, 1000) // do not warn on pools bel
 OPTION(mon_pg_check_down_all_threshold, OPT_FLOAT, .5) // threshold of down osds after which we check all pgs
 OPTION(mon_cache_target_full_warn_ratio, OPT_FLOAT, .66) // position between pool cache_target_full and max where we start warning
 OPTION(mon_osd_full_ratio, OPT_FLOAT, .95) // what % full makes an OSD "full"
+OPTION(mon_osd_backfillfull_ratio, OPT_FLOAT, .90) // what % full makes an OSD backfill full (backfill halted)
 OPTION(mon_osd_nearfull_ratio, OPT_FLOAT, .85) // what % full makes an OSD near full
 OPTION(mon_allow_pool_delete, OPT_BOOL, false) // allow pool deletion
 OPTION(mon_globalid_prealloc, OPT_U32, 10000) // how many globalids to prealloc
@@ -626,11 +627,11 @@ OPTION(osd_max_backfills, OPT_U64, 1)
 // Minimum recovery priority (255 = max, smaller = lower)
 OPTION(osd_min_recovery_priority, OPT_INT, 0)
 
-// Refuse backfills when OSD full ratio is above this value
-OPTION(osd_backfill_full_ratio, OPT_FLOAT, 0.85)
-
 // Seconds to wait before retrying refused backfills
-OPTION(osd_backfill_retry_interval, OPT_DOUBLE, 10.0)
+OPTION(osd_backfill_retry_interval, OPT_DOUBLE, 30.0)
+
+// Seconds to wait before retrying refused recovery
+OPTION(osd_recovery_retry_interval, OPT_DOUBLE, 30.0)
 
 // max agent flush ops
 OPTION(osd_agent_max_ops, OPT_INT, 4)
@@ -742,7 +743,6 @@ OPTION(osd_op_pq_min_cost, OPT_U64, 65536)
 OPTION(osd_disk_threads, OPT_INT, 1)
 OPTION(osd_disk_thread_ioprio_class, OPT_STR, "") // rt realtime be best effort idle
 OPTION(osd_disk_thread_ioprio_priority, OPT_INT, -1) // 0-7
-OPTION(osd_recovery_threads, OPT_INT, 1)
 OPTION(osd_recover_clone_overlap, OPT_BOOL, true) // preserve clone_overlap during recovery/migration
 OPTION(osd_op_num_threads_per_shard, OPT_INT, 2)
 OPTION(osd_op_num_shards, OPT_INT, 5)
@@ -871,6 +871,7 @@ OPTION(osd_debug_skip_full_check_in_backfill_reservation, OPT_BOOL, false)
 OPTION(osd_debug_reject_backfill_probability, OPT_DOUBLE, 0)
 OPTION(osd_debug_inject_copyfrom_error, OPT_BOOL, false) // inject failure during copyfrom completion
 OPTION(osd_debug_misdirected_ops, OPT_BOOL, false)
+OPTION(osd_debug_skip_full_check_in_recovery, OPT_BOOL, false)
 OPTION(osd_enxio_on_misdirected_op, OPT_BOOL, false)
 OPTION(osd_debug_verify_cached_snaps, OPT_BOOL, false)
 OPTION(osd_enable_op_tracker, OPT_BOOL, true) // enable/disable OSD op tracking
@@ -116,6 +116,7 @@ struct ceph_eversion {
 #define CEPH_OSD_NEW      (1<<3)  /* osd is new, never marked in */
 #define CEPH_OSD_FULL     (1<<4)  /* osd is at or above full threshold */
 #define CEPH_OSD_NEARFULL (1<<5)  /* osd is at or above nearfull threshold */
+#define CEPH_OSD_BACKFILLFULL (1<<6) /* osd is at or above backfillfull threshold */
 
 extern const char *ceph_osd_state_name(int s);
 
@@ -592,6 +592,10 @@ COMMAND("osd set-full-ratio " \
        "name=ratio,type=CephFloat,range=0.0|1.0", \
        "set usage ratio at which OSDs are marked full",
        "osd", "rw", "cli,rest")
+COMMAND("osd set-backfillfull-ratio " \
+        "name=ratio,type=CephFloat,range=0.0|1.0", \
+        "set usage ratio at which OSDs are marked too full to backfill",
+        "osd", "rw", "cli,rest")
 COMMAND("osd set-nearfull-ratio " \
        "name=ratio,type=CephFloat,range=0.0|1.0", \
        "set usage ratio at which OSDs are marked near-full",
@@ -164,7 +164,11 @@ void OSDMonitor::create_initial()
   if (!g_conf->mon_debug_no_require_luminous) {
     newmap.set_flag(CEPH_OSDMAP_REQUIRE_LUMINOUS);
     newmap.full_ratio = g_conf->mon_osd_full_ratio;
+    if (newmap.full_ratio > 1.0) newmap.full_ratio /= 100;
+    newmap.backfillfull_ratio = g_conf->mon_osd_backfillfull_ratio;
+    if (newmap.backfillfull_ratio > 1.0) newmap.backfillfull_ratio /= 100;
     newmap.nearfull_ratio = g_conf->mon_osd_nearfull_ratio;
+    if (newmap.nearfull_ratio > 1.0) newmap.nearfull_ratio /= 100;
   }
 
   // encode into pending incremental
@@ -784,8 +788,17 @@ void OSDMonitor::create_pending()
   OSDMap::clean_temps(g_ceph_context, osdmap, &pending_inc);
   dout(10) << "create_pending did clean_temps" << dendl;
 
+  // On upgrade OSDMap has new field set by mon_osd_backfillfull_ratio config
+  // instead of osd_backfill_full_ratio config
+  if (osdmap.backfillfull_ratio <= 0) {
+    pending_inc.new_backfillfull_ratio = g_conf->mon_osd_backfillfull_ratio;
+    if (pending_inc.new_backfillfull_ratio > 1.0)
+      pending_inc.new_backfillfull_ratio /= 100;
+    dout(1) << __func__ << " setting backfillfull_ratio = "
+            << pending_inc.new_backfillfull_ratio << dendl;
+  }
   if (!osdmap.test_flag(CEPH_OSDMAP_REQUIRE_LUMINOUS)) {
-    // transition nearfull ratios from PGMap to OSDMap (on upgrade)
+    // transition full ratios from PGMap to OSDMap (on upgrade)
     PGMap *pg_map = &mon->pgmon()->pg_map;
     if (osdmap.full_ratio != pg_map->full_ratio) {
       dout(10) << __func__ << " full_ratio " << osdmap.full_ratio
@@ -800,14 +813,18 @@ void OSDMonitor::create_pending()
   } else {
     // safety check (this shouldn't really happen)
     if (osdmap.full_ratio <= 0) {
-      dout(1) << __func__ << " setting full_ratio = "
-              << g_conf->mon_osd_full_ratio << dendl;
       pending_inc.new_full_ratio = g_conf->mon_osd_full_ratio;
+      if (pending_inc.new_full_ratio > 1.0)
+        pending_inc.new_full_ratio /= 100;
+      dout(1) << __func__ << " setting full_ratio = "
+              << pending_inc.new_full_ratio << dendl;
     }
     if (osdmap.nearfull_ratio <= 0) {
-      dout(1) << __func__ << " setting nearfull_ratio = "
-              << g_conf->mon_osd_nearfull_ratio << dendl;
       pending_inc.new_nearfull_ratio = g_conf->mon_osd_nearfull_ratio;
+      if (pending_inc.new_nearfull_ratio > 1.0)
+        pending_inc.new_nearfull_ratio /= 100;
+      dout(1) << __func__ << " setting nearfull_ratio = "
+              << pending_inc.new_nearfull_ratio << dendl;
     }
   }
 }
@@ -1048,8 +1065,8 @@ void OSDMonitor::encode_pending(MonitorDBStore::TransactionRef t)
   tmp.apply_incremental(pending_inc);
 
   if (tmp.test_flag(CEPH_OSDMAP_REQUIRE_LUMINOUS)) {
-    int full, nearfull;
-    tmp.count_full_nearfull_osds(&full, &nearfull);
+    int full, backfill, nearfull;
+    tmp.count_full_nearfull_osds(&full, &backfill, &nearfull);
     if (full > 0) {
       if (!tmp.test_flag(CEPH_OSDMAP_FULL)) {
        dout(10) << __func__ << " setting full flag" << dendl;
@@ -2287,7 +2304,7 @@ bool OSDMonitor::preprocess_full(MonOpRequestRef op)
   MOSDFull *m = static_cast<MOSDFull*>(op->get_req());
   int from = m->get_orig_source().num();
   set<string> state;
-  unsigned mask = CEPH_OSD_NEARFULL | CEPH_OSD_FULL;
+  unsigned mask = CEPH_OSD_NEARFULL | CEPH_OSD_BACKFILLFULL | CEPH_OSD_FULL;
 
   // check permissions, ignore if failed
   MonSession *session = m->get_session();
@@ -2337,7 +2354,7 @@ bool OSDMonitor::prepare_full(MonOpRequestRef op)
   const MOSDFull *m = static_cast<MOSDFull*>(op->get_req());
   const int from = m->get_orig_source().num();
 
-  const unsigned mask = CEPH_OSD_NEARFULL | CEPH_OSD_FULL;
+  const unsigned mask = CEPH_OSD_NEARFULL | CEPH_OSD_BACKFILLFULL | CEPH_OSD_FULL;
   const unsigned want_state = m->state & mask;  // safety first
 
   unsigned cur_state = osdmap.get_state(from);
@@ -3342,18 +3359,83 @@ void OSDMonitor::get_health(list<pair<health_status_t,string> >& summary,
   }
 
   if (osdmap.test_flag(CEPH_OSDMAP_REQUIRE_LUMINOUS)) {
-    int full, nearfull;
-    osdmap.count_full_nearfull_osds(&full, &nearfull);
-    if (full > 0) {
+    // An osd could configure failsafe ratio, to something different
+    // but for now assume it is the same here.
+    float fsr = g_conf->osd_failsafe_full_ratio;
+    if (fsr > 1.0) fsr /= 100;
+    float fr = osdmap.get_full_ratio();
+    float br = osdmap.get_backfillfull_ratio();
+    float nr = osdmap.get_nearfull_ratio();
+
+    bool out_of_order = false;
+    // These checks correspond to how OSDService::check_full_status() in an OSD
+    // handles the improper setting of these values.
+    if (br < nr) {
+      out_of_order = true;
+      if (detail) {
+        ostringstream ss;
+        ss << "backfill_ratio (" << br << ") < nearfull_ratio (" << nr << "), increased";
+        detail->push_back(make_pair(HEALTH_ERR, ss.str()));
+      }
+      br = nr;
+    }
+    if (fr < br) {
+      out_of_order = true;
+      if (detail) {
+        ostringstream ss;
+        ss << "full_ratio (" << fr << ") < backfillfull_ratio (" << br << "), increased";
+        detail->push_back(make_pair(HEALTH_ERR, ss.str()));
+      }
+      fr = br;
+    }
+    if (fsr < fr) {
+      out_of_order = true;
+      if (detail) {
+        ostringstream ss;
+        ss << "osd_failsafe_full_ratio (" << fsr << ") < full_ratio (" << fr << "), increased";
+        detail->push_back(make_pair(HEALTH_ERR, ss.str()));
+      }
+    }
+    if (out_of_order) {
       ostringstream ss;
-      ss << full << " full osd(s)";
+      ss << "Full ratio(s) out of order";
       summary.push_back(make_pair(HEALTH_ERR, ss.str()));
     }
-    if (nearfull > 0) {
+
+    map<int, float> full, backfillfull, nearfull;
+    osdmap.get_full_osd_util(mon->pgmon()->pg_map.osd_stat, &full, &backfillfull, &nearfull);
+    if (full.size()) {
       ostringstream ss;
-      ss << nearfull << " nearfull osd(s)";
+      ss << full.size() << " full osd(s)";
+      summary.push_back(make_pair(HEALTH_ERR, ss.str()));
+    }
+    if (backfillfull.size()) {
+      ostringstream ss;
+      ss << backfillfull.size() << " backfillfull osd(s)";
       summary.push_back(make_pair(HEALTH_WARN, ss.str()));
     }
+    if (nearfull.size()) {
+      ostringstream ss;
+      ss << nearfull.size() << " nearfull osd(s)";
+      summary.push_back(make_pair(HEALTH_WARN, ss.str()));
+    }
+    if (detail) {
+      for (auto& i: full) {
+        ostringstream ss;
+        ss << "osd." << i.first << " is full at " << roundf(i.second * 100) << "%";
+        detail->push_back(make_pair(HEALTH_ERR, ss.str()));
+      }
+      for (auto& i: backfillfull) {
+        ostringstream ss;
+        ss << "osd." << i.first << " is backfill full at " << roundf(i.second * 100) << "%";
+        detail->push_back(make_pair(HEALTH_WARN, ss.str()));
+      }
+      for (auto& i: nearfull) {
+        ostringstream ss;
+        ss << "osd." << i.first << " is near full at " << roundf(i.second * 100) << "%";
+        detail->push_back(make_pair(HEALTH_WARN, ss.str()));
+      }
+    }
   }
   // note: we leave it to ceph-mgr to generate details health warnings
   // with actual osd utilizations
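The new out-of-order check is easy to trip from the CLI; a sketch with arbitrary values::

    ceph osd set-backfillfull-ratio .90
    ceph osd set-nearfull-ratio .95     # nearfull above backfillfull
    ceph health                         # expected: HEALTH_ERR Full ratio(s) out of order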
@@ -6929,6 +7011,7 @@ bool OSDMonitor::prepare_command_impl(MonOpRequestRef op,
     return true;
 
   } else if (prefix == "osd set-full-ratio" ||
+             prefix == "osd set-backfillfull-ratio" ||
              prefix == "osd set-nearfull-ratio") {
     if (!osdmap.test_flag(CEPH_OSDMAP_REQUIRE_LUMINOUS)) {
       ss << "you must complete the upgrade and set require_luminous_osds before"
@@ -6945,6 +7028,8 @@ bool OSDMonitor::prepare_command_impl(MonOpRequestRef op,
     }
     if (prefix == "osd set-full-ratio")
       pending_inc.new_full_ratio = n;
+    else if (prefix == "osd set-backfillfull-ratio")
+      pending_inc.new_backfillfull_ratio = n;
     else if (prefix == "osd set-nearfull-ratio")
       pending_inc.new_nearfull_ratio = n;
     ss << prefix << " " << n;
@@ -1878,6 +1878,17 @@ int64_t PGMap::get_rule_avail(const OSDMap& osdmap, int ruleno) const
     return 0;
   }
 
+  float fratio;
+  if (osdmap.test_flag(CEPH_OSDMAP_REQUIRE_LUMINOUS) && osdmap.get_full_ratio() > 0) {
+    fratio = osdmap.get_full_ratio();
+  } else if (full_ratio > 0) {
+    fratio = full_ratio;
+  } else {
+    // this shouldn't really happen
+    fratio = g_conf->mon_osd_full_ratio;
+    if (fratio > 1.0) fratio /= 100;
+  }
+
   int64_t min = -1;
   for (map<int,float>::iterator p = wm.begin(); p != wm.end(); ++p) {
     ceph::unordered_map<int32_t,osd_stat_t>::const_iterator osd_info =
@@ -1892,7 +1903,7 @@ int64_t PGMap::get_rule_avail(const OSDMap& osdmap, int ruleno) const
       continue;
     }
     double unusable = (double)osd_info->second.kb *
-      (1.0 - g_conf->mon_osd_full_ratio);
+      (1.0 - fratio);
     double avail = MAX(0.0, (double)osd_info->second.kb_avail - unusable);
     avail *= 1024.0;
     int64_t proj = (int64_t)(avail / (double)p->second);
@@ -1316,6 +1316,8 @@ void PGMonitor::get_health(list<pair<health_status_t,string> >& summary,
       note["backfilling"] += p->second;
     if (p->first & PG_STATE_BACKFILL_TOOFULL)
       note["backfill_toofull"] += p->second;
+    if (p->first & PG_STATE_RECOVERY_TOOFULL)
+      note["recovery_toofull"] += p->second;
   }
 
   ceph::unordered_map<pg_t, pg_stat_t> stuck_pgs;
@@ -1403,6 +1405,7 @@ void PGMonitor::get_health(list<pair<health_status_t,string> >& summary,
          PG_STATE_REPAIR |
          PG_STATE_RECOVERING |
          PG_STATE_RECOVERY_WAIT |
+          PG_STATE_RECOVERY_TOOFULL |
          PG_STATE_INCOMPLETE |
          PG_STATE_BACKFILL_WAIT |
          PG_STATE_BACKFILL |
@@ -3952,7 +3952,7 @@ void FileStore::sync_entry()
          derr << "ioctl WAIT_SYNC got " << cpp_strerror(err) << dendl;
          assert(0 == "wait_sync got error");
        }
-        dout(20) << " done waiting for checkpoint" << cid << " to complete" << dendl;
+        dout(20) << " done waiting for checkpoint " << cid << " to complete" << dendl;
       }
     } else
     {
@@ -282,6 +282,11 @@ void ECBackend::handle_recovery_push(
   const PushOp &op,
   RecoveryMessages *m)
 {
+  ostringstream ss;
+  if (get_parent()->check_failsafe_full(ss)) {
+    dout(10) << __func__ << " Out of space (failsafe) processing push request: " << ss.str() << dendl;
+    ceph_abort();
+  }
+
   bool oneshot = op.before_progress.first && op.after_progress.data_complete;
   ghobject_t tobj;
src/osd/OSD.cc (146 changed lines)
@@ -255,8 +255,8 @@ OSDService::OSDService(OSD *osd) :
   watch_lock("OSDService::watch_lock"),
   watch_timer(osd->client_messenger->cct, watch_lock),
   next_notif_id(0),
-  backfill_request_lock("OSDService::backfill_request_lock"),
-  backfill_request_timer(cct, backfill_request_lock, false),
+  recovery_request_lock("OSDService::recovery_request_lock"),
+  recovery_request_timer(cct, recovery_request_lock, false),
   reserver_finisher(cct),
   local_reserver(&reserver_finisher, cct->_conf->osd_max_backfills,
                 cct->_conf->osd_min_recovery_priority),
@@ -495,8 +495,8 @@ void OSDService::shutdown()
   objecter_finisher.stop();
 
   {
-    Mutex::Locker l(backfill_request_lock);
-    backfill_request_timer.shutdown();
+    Mutex::Locker l(recovery_request_lock);
+    recovery_request_timer.shutdown();
   }
 
   {
@@ -716,13 +716,7 @@ void OSDService::check_full_status(const osd_stat_t &osd_stat)
 {
   Mutex::Locker l(full_status_lock);
 
-  // We base ratio on kb_avail rather than kb_used because they can
-  // differ significantly e.g. on btrfs volumes with a large number of
-  // chunks reserved for metadata, and for our purposes (avoiding
-  // completely filling the disk) it's far more important to know how
-  // much space is available to use than how much we've already used.
-  float ratio = ((float)(osd_stat.kb - osd_stat.kb_avail)) /
-    ((float)osd_stat.kb);
+  float ratio = ((float)osd_stat.kb_used) / ((float)osd_stat.kb);
   cur_ratio = ratio;
 
   // The OSDMap ratios take precendence. So if the failsafe is .95 and
@@ -735,28 +729,38 @@ void OSDService::check_full_status(const osd_stat_t &osd_stat)
     return;
   }
   float nearfull_ratio = osdmap->get_nearfull_ratio();
-  float full_ratio = std::max(osdmap->get_full_ratio(), nearfull_ratio);
+  float backfillfull_ratio = std::max(osdmap->get_backfillfull_ratio(), nearfull_ratio);
+  float full_ratio = std::max(osdmap->get_full_ratio(), backfillfull_ratio);
   float failsafe_ratio = std::max(get_failsafe_full_ratio(), full_ratio);
 
   if (!osdmap->test_flag(CEPH_OSDMAP_REQUIRE_LUMINOUS)) {
     // use the failsafe for nearfull and full; the mon isn't using the
     // flags anyway because we're mid-upgrade.
     full_ratio = failsafe_ratio;
+    backfillfull_ratio = failsafe_ratio;
     nearfull_ratio = failsafe_ratio;
   } else if (full_ratio <= 0 ||
+             backfillfull_ratio <= 0 ||
              nearfull_ratio <= 0) {
-    derr << __func__ << " full_ratio or nearfull_ratio is <= 0" << dendl;
+    derr << __func__ << " full_ratio, backfillfull_ratio or nearfull_ratio is <= 0" << dendl;
     // use failsafe flag. ick. the monitor did something wrong or the user
     // did something stupid.
     full_ratio = failsafe_ratio;
+    backfillfull_ratio = failsafe_ratio;
     nearfull_ratio = failsafe_ratio;
   }
 
-  enum s_names new_state;
-  if (ratio > failsafe_ratio) {
+  string inject;
+  s_names new_state;
+  if (injectfull_state > NONE && injectfull) {
+    new_state = injectfull_state;
+    inject = "(Injected)";
+  } else if (ratio > failsafe_ratio) {
     new_state = FAILSAFE;
   } else if (ratio > full_ratio) {
     new_state = FULL;
+  } else if (ratio > backfillfull_ratio) {
+    new_state = BACKFILLFULL;
   } else if (ratio > nearfull_ratio) {
     new_state = NEARFULL;
   } else {
@@ -764,9 +768,11 @@ void OSDService::check_full_status(const osd_stat_t &osd_stat)
   }
   dout(20) << __func__ << " cur ratio " << ratio
           << ". nearfull_ratio " << nearfull_ratio
+          << ". backfillfull_ratio " << backfillfull_ratio
           << ", full_ratio " << full_ratio
           << ", failsafe_ratio " << failsafe_ratio
           << ", new state " << get_full_state_name(new_state)
+          << " " << inject
           << dendl;
 
   // warn
@@ -791,6 +797,8 @@ bool OSDService::need_fullness_update()
   if (osdmap->exists(whoami)) {
     if (osdmap->get_state(whoami) & CEPH_OSD_FULL) {
       cur = FULL;
+    } else if (osdmap->get_state(whoami) & CEPH_OSD_BACKFILLFULL) {
+      cur = BACKFILLFULL;
     } else if (osdmap->get_state(whoami) & CEPH_OSD_NEARFULL) {
       cur = NEARFULL;
     }
@@ -798,41 +806,80 @@ bool OSDService::need_fullness_update()
   s_names want = NONE;
   if (is_full())
     want = FULL;
+  else if (is_backfillfull())
+    want = BACKFILLFULL;
   else if (is_nearfull())
     want = NEARFULL;
   return want != cur;
 }
 
-bool OSDService::check_failsafe_full()
+bool OSDService::_check_full(s_names type, ostream &ss) const
 {
   Mutex::Locker l(full_status_lock);
-  if (cur_state == FAILSAFE)
+
+  if (injectfull && injectfull_state >= type) {
+    // injectfull is either a count of the number of times to return failsafe full
+    // or if -1 then always return full
+    if (injectfull > 0)
+      --injectfull;
+    ss << "Injected " << get_full_state_name(type) << " OSD ("
+       << (injectfull < 0 ? "set" : std::to_string(injectfull)) << ")";
     return true;
-  return false;
+  }
+
+  ss << "current usage is " << cur_ratio;
+  return cur_state >= type;
 }
 
-bool OSDService::is_nearfull()
+bool OSDService::check_failsafe_full(ostream &ss) const
+{
+  return _check_full(FAILSAFE, ss);
+}
+
+bool OSDService::check_full(ostream &ss) const
+{
+  return _check_full(FULL, ss);
+}
+
+bool OSDService::check_backfill_full(ostream &ss) const
+{
+  return _check_full(BACKFILLFULL, ss);
+}
+
+bool OSDService::check_nearfull(ostream &ss) const
+{
+  return _check_full(NEARFULL, ss);
+}
+
+bool OSDService::is_failsafe_full() const
 {
   Mutex::Locker l(full_status_lock);
-  return cur_state == NEARFULL;
+  return cur_state == FAILSAFE;
 }
 
-bool OSDService::is_full()
+bool OSDService::is_full() const
 {
   Mutex::Locker l(full_status_lock);
   return cur_state >= FULL;
 }
 
-bool OSDService::too_full_for_backfill(double *_ratio, double *_max_ratio)
+bool OSDService::is_backfillfull() const
 {
   Mutex::Locker l(full_status_lock);
-  double max_ratio;
-  max_ratio = cct->_conf->osd_backfill_full_ratio;
-  if (_ratio)
-    *_ratio = cur_ratio;
-  if (_max_ratio)
-    *_max_ratio = max_ratio;
-  return cur_ratio >= max_ratio;
+  return cur_state >= BACKFILLFULL;
+}
+
+bool OSDService::is_nearfull() const
+{
+  Mutex::Locker l(full_status_lock);
+  return cur_state >= NEARFULL;
+}
+
+void OSDService::set_injectfull(s_names type, int64_t count)
+{
+  Mutex::Locker l(full_status_lock);
+  injectfull_state = type;
+  injectfull = count;
 }
 
 void OSDService::update_osd_stat(vector<int>& hb_peers)
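Per the comment in ``_check_full()``, the injected state is armed with an optional count; a sketch of driving it over the admin socket (socket path assumed)::

    # fail the next 2 checks at backfillfull severity, then behave normally
    ceph --admin-daemon /var/run/ceph/osd.0.asok injectfull backfillfull 2
    ceph --admin-daemon /var/run/ceph/osd.0.asok injectfull none    # clear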
@@ -868,6 +915,16 @@ void OSDService::update_osd_stat(vector<int>& hb_peers)
   check_full_status(osd_stat);
 }
 
+bool OSDService::check_osdmap_full(const set<pg_shard_t> &missing_on)
+{
+  OSDMapRef osdmap = get_osdmap();
+  for (auto shard : missing_on) {
+    if (osdmap->get_state(shard.osd) & CEPH_OSD_FULL)
+      return true;
+  }
+  return false;
+}
+
 void OSDService::send_message_osd_cluster(int peer, Message *m, epoch_t from_epoch)
 {
   OSDMapRef next_map = get_nextmap_reserved();
@@ -2147,7 +2204,7 @@ int OSD::init()
 
   tick_timer.init();
   tick_timer_without_osd_lock.init();
-  service.backfill_request_timer.init();
+  service.recovery_request_timer.init();
 
   // mount.
   dout(2) << "mounting " << dev_path << " "
@@ -2632,6 +2689,14 @@ void OSD::final_init()
     test_ops_hook,
     "Trigger a scheduled scrub ");
   assert(r == 0);
+  r = admin_socket->register_command(
+    "injectfull",
+    "injectfull " \
+    "name=type,type=CephString,req=false " \
+    "name=count,type=CephInt,req=false ",
+    test_ops_hook,
+    "Inject a full disk (optional count times)");
+  assert(r == 0);
 }
 
 void OSD::create_logger()
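Once registered, the command is driven over the OSD admin socket, as the ``test_mon_pg`` changes above exercise; a sketch (socket path assumed)::

    ceph --admin-daemon /var/run/ceph/osd.0.asok injectfull nearfull
    ceph health     # after the next stat report: HEALTH_WARN ... 1 nearfull osd(s)
    ceph --admin-daemon /var/run/ceph/osd.0.asok injectfull none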
@@ -2839,6 +2904,7 @@ void OSD::create_recoverystate_perf()
   rs_perf.add_time_avg(rs_down_latency, "down_latency", "Down recovery state latency");
   rs_perf.add_time_avg(rs_getmissing_latency, "getmissing_latency", "Getmissing recovery state latency");
   rs_perf.add_time_avg(rs_waitupthru_latency, "waitupthru_latency", "Waitupthru recovery state latency");
+  rs_perf.add_time_avg(rs_notrecovering_latency, "notrecovering_latency", "Notrecovering recovery state latency");
 
   recoverystate_perf = rs_perf.create_perf_counters();
   cct->get_perfcounters_collection()->add(recoverystate_perf);
@@ -4854,6 +4920,24 @@ void TestOpsSocketHook::test_ops(OSDService *service, ObjectStore *store,
     pg->unlock();
     return;
   }
+  if (command == "injectfull") {
+    int64_t count;
+    string type;
+    OSDService::s_names state;
+    cmd_getval(service->cct, cmdmap, "type", type, string("full"));
+    cmd_getval(service->cct, cmdmap, "count", count, (int64_t)-1);
+    if (type == "none" || count == 0) {
+      type = "none";
+      count = 0;
+    }
+    state = service->get_full_state(type);
+    if (state == OSDService::s_names::INVALID) {
+      ss << "Invalid type use (none, nearfull, backfillfull, full, failsafe)";
+      return;
+    }
+    service->set_injectfull(state, count);
+    return;
+  }
   ss << "Internal error - command=" << command;
 }
 
@@ -5185,6 +5269,8 @@ void OSD::send_full_update()
   unsigned state = 0;
   if (service.is_full()) {
     state = CEPH_OSD_FULL;
+  } else if (service.is_backfillfull()) {
+    state = CEPH_OSD_BACKFILLFULL;
   } else if (service.is_nearfull()) {
     state = CEPH_OSD_NEARFULL;
   }
@@ -202,6 +202,7 @@ enum {
   rs_down_latency,
   rs_getmissing_latency,
   rs_waitupthru_latency,
+  rs_notrecovering_latency,
   rs_last,
 };
 
@@ -917,9 +918,9 @@ public:
     return (((uint64_t)cur_epoch) << 32) | ((uint64_t)(next_notif_id++));
   }
 
-  // -- Backfill Request Scheduling --
-  Mutex backfill_request_lock;
-  SafeTimer backfill_request_timer;
+  // -- Recovery/Backfill Request Scheduling --
+  Mutex recovery_request_lock;
+  SafeTimer recovery_request_timer;
 
   // -- tids --
   // for ops i issue
|
||||
Mutex::Locker l(recovery_lock);
|
||||
_maybe_queue_recovery();
|
||||
}
|
||||
void clear_queued_recovery(PG *pg, bool front = false) {
|
||||
void clear_queued_recovery(PG *pg) {
|
||||
Mutex::Locker l(recovery_lock);
|
||||
for (list<pair<epoch_t, PGRef> >::iterator i = awaiting_throttle.begin();
|
||||
i != awaiting_throttle.end();
|
||||
@@ -1137,26 +1138,51 @@ public:
 
   // -- OSD Full Status --
 private:
-  Mutex full_status_lock;
-  enum s_names { NONE, NEARFULL, FULL, FAILSAFE } cur_state;  // ascending
-  const char *get_full_state_name(s_names s) {
+  friend TestOpsSocketHook;
+  mutable Mutex full_status_lock;
+  enum s_names { INVALID = -1, NONE, NEARFULL, BACKFILLFULL, FULL, FAILSAFE } cur_state;  // ascending
+  const char *get_full_state_name(s_names s) const {
     switch (s) {
     case NONE: return "none";
     case NEARFULL: return "nearfull";
+    case BACKFILLFULL: return "backfillfull";
     case FULL: return "full";
     case FAILSAFE: return "failsafe";
     default: return "???";
     }
   }
+  s_names get_full_state(string type) const {
+    if (type == "none")
+      return NONE;
+    else if (type == "failsafe")
+      return FAILSAFE;
+    else if (type == "full")
+      return FULL;
+    else if (type == "backfillfull")
+      return BACKFILLFULL;
+    else if (type == "nearfull")
+      return NEARFULL;
+    else
+      return INVALID;
+  }
   double cur_ratio;  ///< current utilization
+  mutable int64_t injectfull = 0;
+  s_names injectfull_state = NONE;
   float get_failsafe_full_ratio();
   void check_full_status(const osd_stat_t &stat);
+  bool _check_full(s_names type, ostream &ss) const;
 public:
-  bool check_failsafe_full();
-  bool is_nearfull();
-  bool is_full();
-  bool too_full_for_backfill(double *ratio, double *max_ratio);
+  bool check_failsafe_full(ostream &ss) const;
+  bool check_full(ostream &ss) const;
+  bool check_backfill_full(ostream &ss) const;
+  bool check_nearfull(ostream &ss) const;
+  bool is_failsafe_full() const;
+  bool is_full() const;
+  bool is_backfillfull() const;
+  bool is_nearfull() const;
   bool need_fullness_update();   ///< osdmap state needs update
+  void set_injectfull(s_names type, int64_t count);
+  bool check_osdmap_full(const set<pg_shard_t> &missing_on);
 
 
   // -- epochs --
@@ -450,7 +450,7 @@ void OSDMap::Incremental::encode(bufferlist& bl, uint64_t features) const
   }
 
   {
-    uint8_t target_v = 3;
+    uint8_t target_v = 4;
     if (!HAVE_FEATURE(features, SERVER_LUMINOUS)) {
       target_v = 2;
     }
@@ -470,6 +470,7 @@ void OSDMap::Incremental::encode(bufferlist& bl, uint64_t features) const
     if (target_v >= 3) {
       ::encode(new_nearfull_ratio, bl);
       ::encode(new_full_ratio, bl);
+      ::encode(new_backfillfull_ratio, bl);
     }
     ENCODE_FINISH(bl); // osd-only data
   }
@@ -654,7 +655,7 @@ void OSDMap::Incremental::decode(bufferlist::iterator& bl)
   }
 
   {
-    DECODE_START(3, bl); // extended, osd-only data
+    DECODE_START(4, bl); // extended, osd-only data
     ::decode(new_hb_back_up, bl);
     ::decode(new_up_thru, bl);
     ::decode(new_last_clean_interval, bl);
@@ -677,6 +678,11 @@ void OSDMap::Incremental::decode(bufferlist::iterator& bl)
       new_nearfull_ratio = -1;
       new_full_ratio = -1;
     }
+    if (struct_v >= 4) {
+      ::decode(new_backfillfull_ratio, bl);
+    } else {
+      new_backfillfull_ratio = -1;
+    }
     DECODE_FINISH(bl); // osd-only data
   }
 
@@ -720,6 +726,7 @@ void OSDMap::Incremental::dump(Formatter *f) const
   f->dump_int("new_flags", new_flags);
   f->dump_float("new_full_ratio", new_full_ratio);
   f->dump_float("new_nearfull_ratio", new_nearfull_ratio);
+  f->dump_float("new_backfillfull_ratio", new_backfillfull_ratio);
 
   if (fullmap.length()) {
     f->open_object_section("full_map");
@@ -1022,20 +1029,57 @@ int OSDMap::calc_num_osds()
   return num_osd;
 }
 
-void OSDMap::count_full_nearfull_osds(int *full, int *nearfull) const
+void OSDMap::count_full_nearfull_osds(int *full, int *backfill, int *nearfull) const
 {
   *full = 0;
+  *backfill = 0;
   *nearfull = 0;
   for (int i = 0; i < max_osd; ++i) {
     if (exists(i) && is_up(i) && is_in(i)) {
       if (osd_state[i] & CEPH_OSD_FULL)
        ++(*full);
+      else if (osd_state[i] & CEPH_OSD_BACKFILLFULL)
+        ++(*backfill);
       else if (osd_state[i] & CEPH_OSD_NEARFULL)
        ++(*nearfull);
     }
   }
 }
 
+static bool get_osd_utilization(const ceph::unordered_map<int32_t,osd_stat_t> &osd_stat,
+                                int id, int64_t* kb, int64_t* kb_used, int64_t* kb_avail) {
+  auto p = osd_stat.find(id);
+  if (p == osd_stat.end())
+    return false;
+  *kb = p->second.kb;
+  *kb_used = p->second.kb_used;
+  *kb_avail = p->second.kb_avail;
+  return *kb > 0;
+}
+
+void OSDMap::get_full_osd_util(const ceph::unordered_map<int32_t,osd_stat_t> &osd_stat,
+                               map<int, float> *full, map<int, float> *backfill, map<int, float> *nearfull) const
+{
+  full->clear();
+  backfill->clear();
+  nearfull->clear();
+  for (int i = 0; i < max_osd; ++i) {
+    if (exists(i) && is_up(i) && is_in(i)) {
+      int64_t kb, kb_used, kb_avail;
+      if (osd_state[i] & CEPH_OSD_FULL) {
+        if (get_osd_utilization(osd_stat, i, &kb, &kb_used, &kb_avail))
+          full->emplace(i, (float)kb_used / (float)kb);
+      } else if (osd_state[i] & CEPH_OSD_BACKFILLFULL) {
+        if (get_osd_utilization(osd_stat, i, &kb, &kb_used, &kb_avail))
+          backfill->emplace(i, (float)kb_used / (float)kb);
+      } else if (osd_state[i] & CEPH_OSD_NEARFULL) {
+        if (get_osd_utilization(osd_stat, i, &kb, &kb_used, &kb_avail))
+          nearfull->emplace(i, (float)kb_used / (float)kb);
+      }
+    }
+  }
+}
+
 void OSDMap::get_all_osds(set<int32_t>& ls) const
 {
   for (int i=0; i<max_osd; i++)
@@ -1575,6 +1619,9 @@ int OSDMap::apply_incremental(const Incremental &inc)
   if (inc.new_nearfull_ratio >= 0) {
     nearfull_ratio = inc.new_nearfull_ratio;
   }
+  if (inc.new_backfillfull_ratio >= 0) {
+    backfillfull_ratio = inc.new_backfillfull_ratio;
+  }
   if (inc.new_full_ratio >= 0) {
     full_ratio = inc.new_full_ratio;
   }
@@ -2148,7 +2195,7 @@ void OSDMap::encode(bufferlist& bl, uint64_t features) const
   }
 
   {
-    uint8_t target_v = 2;
+    uint8_t target_v = 3;
     if (!HAVE_FEATURE(features, SERVER_LUMINOUS)) {
       target_v = 1;
     }
@@ -2173,6 +2220,7 @@ void OSDMap::encode(bufferlist& bl, uint64_t features) const
     if (target_v >= 2) {
       ::encode(nearfull_ratio, bl);
       ::encode(full_ratio, bl);
+      ::encode(backfillfull_ratio, bl);
     }
     ENCODE_FINISH(bl); // osd-only data
   }
@@ -2390,7 +2438,7 @@ void OSDMap::decode(bufferlist::iterator& bl)
   }
 
   {
-    DECODE_START(2, bl); // extended, osd-only data
+    DECODE_START(3, bl); // extended, osd-only data
     ::decode(osd_addrs->hb_back_addr, bl);
     ::decode(osd_info, bl);
     ::decode(blacklist, bl);
@@ -2407,6 +2455,11 @@ void OSDMap::decode(bufferlist::iterator& bl)
       nearfull_ratio = 0;
       full_ratio = 0;
     }
+    if (struct_v >= 3) {
+      ::decode(backfillfull_ratio, bl);
+    } else {
+      backfillfull_ratio = 0;
+    }
     DECODE_FINISH(bl); // osd-only data
   }
 
@@ -2480,6 +2533,7 @@ void OSDMap::dump(Formatter *f) const
   f->dump_stream("modified") << get_modified();
   f->dump_string("flags", get_flag_string());
   f->dump_float("full_ratio", full_ratio);
+  f->dump_float("backfillfull_ratio", backfillfull_ratio);
   f->dump_float("nearfull_ratio", nearfull_ratio);
   f->dump_string("cluster_snapshot", get_cluster_snapshot());
   f->dump_int("pool_max", get_pool_max());
@@ -2701,6 +2755,7 @@ void OSDMap::print(ostream& out) const
 
   out << "flags " << get_flag_string() << "\n";
   out << "full_ratio " << full_ratio << "\n";
+  out << "backfillfull_ratio " << backfillfull_ratio << "\n";
   out << "nearfull_ratio " << nearfull_ratio << "\n";
   if (get_cluster_snapshot().length())
     out << "cluster_snapshot " << get_cluster_snapshot() << "\n";
@@ -155,6 +155,7 @@ public:
   string cluster_snapshot;
 
   float new_nearfull_ratio = -1;
+  float new_backfillfull_ratio = -1;
   float new_full_ratio = -1;
 
   mutable bool have_crc;      ///< crc values are defined
@@ -254,7 +255,7 @@ private:
   string cluster_snapshot;
   bool new_blacklist_entries;
 
-  float full_ratio = 0, nearfull_ratio = 0;
+  float full_ratio = 0, backfillfull_ratio = 0, nearfull_ratio = 0;
 
   mutable uint64_t cached_up_osd_features;
 
@@ -336,10 +337,15 @@ public:
   float get_full_ratio() const {
     return full_ratio;
   }
+  float get_backfillfull_ratio() const {
+    return backfillfull_ratio;
+  }
   float get_nearfull_ratio() const {
     return nearfull_ratio;
   }
-  void count_full_nearfull_osds(int *full, int *nearfull) const;
+  void count_full_nearfull_osds(int *full, int *backfill, int *nearfull) const;
+  void get_full_osd_util(const ceph::unordered_map<int32_t,osd_stat_t> &osd_stat,
+                         map<int, float> *full, map<int, float> *backfill, map<int, float> *nearfull) const;
 
   /***** cluster state *****/
   /* osds */
@@ -3809,14 +3809,24 @@ void PG::reject_reservation()
 
 void PG::schedule_backfill_full_retry()
 {
-  Mutex::Locker lock(osd->backfill_request_lock);
-  osd->backfill_request_timer.add_event_after(
+  Mutex::Locker lock(osd->recovery_request_lock);
+  osd->recovery_request_timer.add_event_after(
     cct->_conf->osd_backfill_retry_interval,
     new QueuePeeringEvt<RequestBackfill>(
       this, get_osdmap()->get_epoch(),
       RequestBackfill()));
 }
 
+void PG::schedule_recovery_full_retry()
+{
+  Mutex::Locker lock(osd->recovery_request_lock);
+  osd->recovery_request_timer.add_event_after(
+    cct->_conf->osd_recovery_retry_interval,
+    new QueuePeeringEvt<DoRecovery>(
+      this, get_osdmap()->get_epoch(),
+      DoRecovery()));
+}
+
 void PG::clear_scrub_reserved()
 {
   scrubber.reserved_peers.clear();
@@ -5237,6 +5247,7 @@ void PG::start_peering_interval(
   state_clear(PG_STATE_PEERED);
   state_clear(PG_STATE_DOWN);
   state_clear(PG_STATE_RECOVERY_WAIT);
+  state_clear(PG_STATE_RECOVERY_TOOFULL);
   state_clear(PG_STATE_RECOVERING);
 
   peer_purged.clear();
@@ -6488,6 +6499,24 @@ void PG::RecoveryState::NotBackfilling::exit()
   pg->osd->recoverystate_perf->tinc(rs_notbackfilling_latency, dur);
 }
 
+/*----NotRecovering------*/
+PG::RecoveryState::NotRecovering::NotRecovering(my_context ctx)
+  : my_base(ctx),
+    NamedState(context< RecoveryMachine >().pg->cct, "Started/Primary/Active/NotRecovering")
+{
+  context< RecoveryMachine >().log_enter(state_name);
+  PG *pg = context< RecoveryMachine >().pg;
+  pg->publish_stats_to_osd();
+}
+
+void PG::RecoveryState::NotRecovering::exit()
+{
+  context< RecoveryMachine >().log_exit(state_name, enter_time);
+  PG *pg = context< RecoveryMachine >().pg;
+  utime_t dur = ceph_clock_now() - enter_time;
+  pg->osd->recoverystate_perf->tinc(rs_notrecovering_latency, dur);
+}
+
 /*---RepNotRecovering----*/
 PG::RecoveryState::RepNotRecovering::RepNotRecovering(my_context ctx)
   : my_base(ctx),
@@ -6554,18 +6583,17 @@ boost::statechart::result
 PG::RecoveryState::RepNotRecovering::react(const RequestBackfillPrio &evt)
 {
   PG *pg = context< RecoveryMachine >().pg;
-  double ratio, max_ratio;
+  ostringstream ss;
 
   if (pg->cct->_conf->osd_debug_reject_backfill_probability > 0 &&
       (rand()%1000 < (pg->cct->_conf->osd_debug_reject_backfill_probability*1000.0))) {
     ldout(pg->cct, 10) << "backfill reservation rejected: failure injection"
                       << dendl;
     post_event(RemoteReservationRejected());
-  } else if (pg->osd->too_full_for_backfill(&ratio, &max_ratio) &&
-             !pg->cct->_conf->osd_debug_skip_full_check_in_backfill_reservation) {
-    ldout(pg->cct, 10) << "backfill reservation rejected: full ratio is "
-                       << ratio << ", which is greater than max allowed ratio "
-                       << max_ratio << dendl;
+  } else if (!pg->cct->_conf->osd_debug_skip_full_check_in_backfill_reservation &&
+             pg->osd->check_backfill_full(ss)) {
+    ldout(pg->cct, 10) << "backfill reservation rejected: "
+                       << ss.str() << dendl;
     post_event(RemoteReservationRejected());
   } else {
     pg->osd->remote_reserver.request_reservation(
@@ -6590,7 +6618,7 @@ PG::RecoveryState::RepWaitBackfillReserved::react(const RemoteBackfillReserved &
 {
   PG *pg = context< RecoveryMachine >().pg;
 
-  double ratio, max_ratio;
+  ostringstream ss;
   if (pg->cct->_conf->osd_debug_reject_backfill_probability > 0 &&
       (rand()%1000 < (pg->cct->_conf->osd_debug_reject_backfill_probability*1000.0))) {
     ldout(pg->cct, 10) << "backfill reservation rejected after reservation: "
@@ -6598,11 +6626,10 @@ PG::RecoveryState::RepWaitBackfillReserved::react(const RemoteBackfillReserved &
     pg->osd->remote_reserver.cancel_reservation(pg->info.pgid);
     post_event(RemoteReservationRejected());
     return discard_event();
-  } else if (pg->osd->too_full_for_backfill(&ratio, &max_ratio) &&
-             !pg->cct->_conf->osd_debug_skip_full_check_in_backfill_reservation) {
-    ldout(pg->cct, 10) << "backfill reservation rejected after reservation: full ratio is "
-                       << ratio << ", which is greater than max allowed ratio "
-                       << max_ratio << dendl;
+  } else if (!pg->cct->_conf->osd_debug_skip_full_check_in_backfill_reservation &&
+             pg->osd->check_backfill_full(ss)) {
+    ldout(pg->cct, 10) << "backfill reservation rejected after reservation: "
+                       << ss.str() << dendl;
     pg->osd->remote_reserver.cancel_reservation(pg->info.pgid);
     post_event(RemoteReservationRejected());
     return discard_event();
@@ -6673,6 +6700,15 @@ PG::RecoveryState::WaitLocalRecoveryReserved::WaitLocalRecoveryReserved(my_conte
 {
   context< RecoveryMachine >().log_enter(state_name);
   PG *pg = context< RecoveryMachine >().pg;
 
+  // Make sure all nodes that are part of the recovery aren't full
+  if (!pg->cct->_conf->osd_debug_skip_full_check_in_recovery &&
+      pg->osd->check_osdmap_full(pg->actingbackfill)) {
+    post_event(RecoveryTooFull());
+    return;
+  }
+
+  pg->state_clear(PG_STATE_RECOVERY_TOOFULL);
   pg->state_set(PG_STATE_RECOVERY_WAIT);
   pg->osd->local_reserver.request_reservation(
     pg->info.pgid,
@@ -6683,6 +6719,15 @@ PG::RecoveryState::WaitLocalRecoveryReserved::WaitLocalRecoveryReserved(my_conte
   pg->publish_stats_to_osd();
 }
 
+boost::statechart::result
+PG::RecoveryState::WaitLocalRecoveryReserved::react(const RecoveryTooFull &evt)
+{
+  PG *pg = context< RecoveryMachine >().pg;
+  pg->state_set(PG_STATE_RECOVERY_TOOFULL);
+  pg->schedule_recovery_full_retry();
+  return transit<NotRecovering>();
+}
+
 void PG::RecoveryState::WaitLocalRecoveryReserved::exit()
 {
   context< RecoveryMachine >().log_exit(state_name, enter_time);
@@ -6739,6 +6784,7 @@ PG::RecoveryState::Recovering::Recovering(my_context ctx)
 
   PG *pg = context< RecoveryMachine >().pg;
   pg->state_clear(PG_STATE_RECOVERY_WAIT);
+  pg->state_clear(PG_STATE_RECOVERY_TOOFULL);
   pg->state_set(PG_STATE_RECOVERING);
   pg->publish_stats_to_osd();
   pg->queue_recovery();
@@ -7187,6 +7233,7 @@ void PG::RecoveryState::Active::exit()
   pg->state_clear(PG_STATE_BACKFILL_TOOFULL);
   pg->state_clear(PG_STATE_BACKFILL_WAIT);
   pg->state_clear(PG_STATE_RECOVERY_WAIT);
+  pg->state_clear(PG_STATE_RECOVERY_TOOFULL);
   utime_t dur = ceph_clock_now() - enter_time;
   pg->osd->recoverystate_perf->tinc(rs_active_latency, dur);
   pg->agent_stop();
src/osd/PG.h (14 changed lines)
@@ -1340,6 +1340,7 @@ public:
 
   void reject_reservation();
   void schedule_backfill_full_retry();
+  void schedule_recovery_full_retry();
 
   // -- recovery state --
 
@@ -1505,6 +1506,7 @@ public:
   TrivialEvent(RequestRecovery)
   TrivialEvent(RecoveryDone)
   TrivialEvent(BackfillTooFull)
+  TrivialEvent(RecoveryTooFull)
 
   TrivialEvent(AllReplicasRecovered)
   TrivialEvent(DoRecovery)
@@ -1850,6 +1852,14 @@ public:
     boost::statechart::result react(const RemoteReservationRejected& evt);
   };
 
+  struct NotRecovering : boost::statechart::state< NotRecovering, Active>, NamedState {
+    typedef boost::mpl::list<
+      boost::statechart::transition< DoRecovery, WaitLocalRecoveryReserved >
+      > reactions;
+    explicit NotRecovering(my_context ctx);
+    void exit();
+  };
+
   struct RepNotRecovering;
   struct ReplicaActive : boost::statechart::state< ReplicaActive, Started, RepNotRecovering >, NamedState {
     explicit ReplicaActive(my_context ctx);
@ -1938,10 +1948,12 @@ public:

struct WaitLocalRecoveryReserved : boost::statechart::state< WaitLocalRecoveryReserved, Active >, NamedState {
  typedef boost::mpl::list <
    boost::statechart::transition< LocalRecoveryReserved, WaitRemoteRecoveryReserved >
    boost::statechart::transition< LocalRecoveryReserved, WaitRemoteRecoveryReserved >,
    boost::statechart::custom_reaction< RecoveryTooFull >
  > reactions;
  explicit WaitLocalRecoveryReserved(my_context ctx);
  void exit();
  boost::statechart::result react(const RecoveryTooFull &evt);
};

struct Activating : boost::statechart::state< Activating, Active >, NamedState {
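The PG.h and PG.cc hunks above are the two halves of boost::statechart's custom-reaction idiom: the state lists RecoveryTooFull in its reactions typedef, and the react() overload decides at runtime to transit to NotRecovering, whose DoRecovery transition lets the reservation be retried later. A minimal, self-contained sketch of that wiring (a toy machine with stand-in events, not Ceph's RecoveryMachine)::

    #include <boost/statechart/state_machine.hpp>
    #include <boost/statechart/simple_state.hpp>
    #include <boost/statechart/custom_reaction.hpp>
    #include <boost/statechart/transition.hpp>
    #include <boost/statechart/event.hpp>
    #include <boost/mpl/list.hpp>
    #include <iostream>

    namespace sc = boost::statechart;

    // Toy events standing in for the PG recovery events.
    struct RecoveryTooFull : sc::event<RecoveryTooFull> {};
    struct DoRecovery : sc::event<DoRecovery> {};

    struct WaitLocalRecoveryReserved;
    struct Machine : sc::state_machine<Machine, WaitLocalRecoveryReserved> {};

    struct NotRecovering;

    struct WaitLocalRecoveryReserved
      : sc::simple_state<WaitLocalRecoveryReserved, Machine> {
      typedef boost::mpl::list<
        sc::custom_reaction<RecoveryTooFull>      // handled by react() below
      > reactions;
      sc::result react(const RecoveryTooFull &);  // defined after NotRecovering
    };

    struct NotRecovering : sc::simple_state<NotRecovering, Machine> {
      typedef boost::mpl::list<
        sc::transition<DoRecovery, WaitLocalRecoveryReserved>  // retry path
      > reactions;
    };

    sc::result WaitLocalRecoveryReserved::react(const RecoveryTooFull &) {
      std::cout << "recovery_toofull -> NotRecovering\n";
      return transit<NotRecovering>();
    }

    int main() {
      Machine m;
      m.initiate();                       // starts in WaitLocalRecoveryReserved
      m.process_event(RecoveryTooFull()); // -> NotRecovering
      m.process_event(DoRecovery());      // -> WaitLocalRecoveryReserved again
    }

In the real code, schedule_recovery_full_retry() arranges for a later DoRecovery event, which is what makes NotRecovering's transition back to WaitLocalRecoveryReserved useful.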
@ -261,6 +261,10 @@ typedef ceph::shared_ptr<const OSDMap> OSDMapRef;

  virtual LogClientTemp clog_error() = 0;

  virtual bool check_failsafe_full(ostream &ss) = 0;

  virtual bool check_osdmap_full(const set<pg_shard_t> &missing_on) = 0;

  virtual ~Listener() {}
};
Listener *parent;
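Both new Listener virtuals follow the same reporting idiom used throughout this change: the predicate returns true when the check trips and writes a human-readable reason into the caller-supplied ostream, so call sites can branch and log from a single call. A trimmed, hypothetical sketch of the idiom (FakePG and the ratios are stand-ins, not Ceph types)::

    #include <iostream>
    #include <ostream>
    #include <sstream>

    struct Listener {
      // True when the failsafe trips; appends the reason to ss.
      virtual bool check_failsafe_full(std::ostream &ss) = 0;
      virtual ~Listener() {}
    };

    // Stand-in for PrimaryLogPG, which in the diff simply forwards the
    // check to its OSD service.
    struct FakePG : Listener {
      double used_ratio = 0.0;        // hypothetical usage figure
      double failsafe_ratio = 0.97;   // hypothetical failsafe cutoff
      bool check_failsafe_full(std::ostream &ss) override {
        if (used_ratio >= failsafe_ratio) {
          ss << "usage " << used_ratio << " >= failsafe " << failsafe_ratio;
          return true;
        }
        return false;
      }
    };

    int main() {
      FakePG pg;
      pg.used_ratio = 0.98;
      std::ostringstream ss;
      if (pg.check_failsafe_full(ss))   // same call pattern as the diff
        std::cout << "dropping request: " << ss.str() << "\n";
    }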
@ -1888,8 +1888,13 @@ void PrimaryLogPG::do_op(OpRequestRef& op)
      << *m << dendl;
    return;
  }
  if (!(m->get_source().is_mds()) && osd->check_failsafe_full() && write_ordered) {
  // mds should have stopped writing before this point.
  // We can't allow OSD to become non-startable even if mds
  // could be writing as part of file removals.
  ostringstream ss;
  if (write_ordered && osd->check_failsafe_full(ss)) {
    dout(10) << __func__ << " fail-safe full check failed, dropping request"
             << ss.str()
             << dendl;
    return;
  }
@ -3328,10 +3333,9 @@ void PrimaryLogPG::do_scan(
  switch (m->op) {
  case MOSDPGScan::OP_SCAN_GET_DIGEST:
    {
      double ratio, full_ratio;
      if (osd->too_full_for_backfill(&ratio, &full_ratio)) {
        dout(1) << __func__ << ": Canceling backfill, current usage is "
                << ratio << ", which exceeds " << full_ratio << dendl;
      ostringstream ss;
      if (osd->check_backfill_full(ss)) {
        dout(1) << __func__ << ": Canceling backfill, " << ss.str() << dendl;
        queue_peering_event(
          CephPeeringEvtRef(
            std::make_shared<CephPeeringEvt>(
@ -13027,6 +13031,11 @@ void PrimaryLogPG::_scrub_finish()
  }
}

bool PrimaryLogPG::check_osdmap_full(const set<pg_shard_t> &missing_on)
{
  return osd->check_osdmap_full(missing_on);
}

/*---SnapTrimmer Logging---*/
#undef dout_prefix
#define dout_prefix *_dout << pg->gen_prefix()
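check_osdmap_full() fans the question out over the OSDs hosting the PG's acting/backfill shards; the OSDService side of the check is not part of this hunk, so the sketch below is only a plausible approximation under that assumption: report true if any shard's OSD is flagged full in the current map::

    #include <iostream>
    #include <map>
    #include <set>

    struct pg_shard_t {              // trimmed stand-in for Ceph's type
      int osd;
      bool operator<(const pg_shard_t &o) const { return osd < o.osd; }
    };

    // Stand-in for the per-OSD full flags: osd id -> "flagged full".
    std::map<int, bool> osd_full = {{0, false}, {1, true}, {2, false}};

    // True if any OSD hosting one of the given shards is full.
    bool check_osdmap_full(const std::set<pg_shard_t> &missing_on) {
      for (const auto &shard : missing_on)
        if (osd_full[shard.osd])
          return true;
      return false;
    }

    int main() {
      std::set<pg_shard_t> acting = {{0}, {1}, {2}};
      std::cout << (check_osdmap_full(acting) ? "too full\n" : "ok\n");
    }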
@ -13268,6 +13277,10 @@ int PrimaryLogPG::getattrs_maybe_cache(
  return r;
}

bool PrimaryLogPG::check_failsafe_full(ostream &ss) {
  return osd->check_failsafe_full(ss);
}

void intrusive_ptr_add_ref(PrimaryLogPG *pg) { pg->get("intptr"); }
void intrusive_ptr_release(PrimaryLogPG *pg) { pg->put("intptr"); }
@ -1731,6 +1731,8 @@ public:
  void on_flushed() override;
  void on_removal(ObjectStore::Transaction *t) override;
  void on_shutdown() override;
  bool check_failsafe_full(ostream &ss) override;
  bool check_osdmap_full(const set<pg_shard_t> &missing_on) override;

  // attr cache handling
  void setattr_maybe_cache(
@ -807,6 +807,11 @@ void ReplicatedBackend::_do_push(OpRequestRef op)

  vector<PushReplyOp> replies;
  ObjectStore::Transaction t;
  ostringstream ss;
  if (get_parent()->check_failsafe_full(ss)) {
    dout(10) << __func__ << " Out of space (failsafe) processing push request: " << ss.str() << dendl;
    ceph_abort();
  }
  for (vector<PushOp>::const_iterator i = m->pushes.begin();
       i != m->pushes.end();
       ++i) {

@ -862,6 +867,13 @@ void ReplicatedBackend::_do_pull_response(OpRequestRef op)
  op->mark_started();

  vector<PullOp> replies(1);

  ostringstream ss;
  if (get_parent()->check_failsafe_full(ss)) {
    dout(10) << __func__ << " Out of space (failsafe) processing pull response (push): " << ss.str() << dendl;
    ceph_abort();
  }

  ObjectStore::Transaction t;
  list<pull_complete_info> to_continue;
  for (vector<PushOp>::const_iterator i = m->pushes.begin();
@ -789,6 +789,8 @@ std::string pg_state_string(int state)
    oss << "clean+";
  if (state & PG_STATE_RECOVERY_WAIT)
    oss << "recovery_wait+";
  if (state & PG_STATE_RECOVERY_TOOFULL)
    oss << "recovery_toofull+";
  if (state & PG_STATE_RECOVERING)
    oss << "recovering+";
  if (state & PG_STATE_DOWN)

@ -869,6 +871,8 @@ int pg_string_state(const std::string& state)
    type = PG_STATE_BACKFILL_TOOFULL;
  else if (state == "recovery_wait")
    type = PG_STATE_RECOVERY_WAIT;
  else if (state == "recovery_toofull")
    type = PG_STATE_RECOVERY_TOOFULL;
  else if (state == "undersized")
    type = PG_STATE_UNDERSIZED;
  else if (state == "activating")
@ -971,6 +971,7 @@ inline ostream& operator<<(ostream& out, const osd_stat_t& s) {
#define PG_STATE_PEERED           (1<<25) // peered, cannot go active, can recover
#define PG_STATE_SNAPTRIM         (1<<26) // trimming snaps
#define PG_STATE_SNAPTRIM_WAIT    (1<<27) // queued to trim snaps
#define PG_STATE_RECOVERY_TOOFULL (1<<28) // recovery can't proceed: too full

std::string pg_state_string(int state);
std::string pg_vector_string(const vector<int32_t> &a);
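pg_state_string() and pg_string_state() have to stay in lockstep whenever a PG_STATE_* bit such as PG_STATE_RECOVERY_TOOFULL is added, which is why the two osd_types.cc hunks above come as a pair. A compressed sketch of that round trip (flag values are illustrative; only RECOVERY_TOOFULL's 1<<28 comes from the header)::

    #include <iostream>
    #include <sstream>
    #include <string>

    #define PG_STATE_RECOVERY_WAIT    (1<<1)   // illustrative value
    #define PG_STATE_RECOVERY_TOOFULL (1<<28)  // matches the hunk above

    // Render the set bits as a '+'-joined state string.
    std::string pg_state_string(int state) {
      std::ostringstream oss;
      if (state & PG_STATE_RECOVERY_WAIT)
        oss << "recovery_wait+";
      if (state & PG_STATE_RECOVERY_TOOFULL)
        oss << "recovery_toofull+";
      std::string s = oss.str();
      if (!s.empty())
        s.pop_back();   // drop the trailing '+'
      return s;
    }

    // Map a single state name back to its bit.
    int pg_string_state(const std::string &state) {
      if (state == "recovery_wait")    return PG_STATE_RECOVERY_WAIT;
      if (state == "recovery_toofull") return PG_STATE_RECOVERY_TOOFULL;
      return 0;
    }

    int main() {
      int st = PG_STATE_RECOVERY_TOOFULL;
      std::cout << pg_state_string(st) << "\n";                       // recovery_toofull
      std::cout << (pg_string_state("recovery_toofull") == st) << "\n"; // 1
    }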
@ -20,6 +20,7 @@
  modified \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ (re)
  flags
  full_ratio 0
  backfillfull_ratio 0
  nearfull_ratio 0

  pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 192 pgp_num 192 last_change 0 flags hashpspool stripe_width 0

@ -43,6 +44,7 @@
  modified \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ (re)
  flags
  full_ratio 0
  backfillfull_ratio 0
  nearfull_ratio 0

  pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 0 flags hashpspool stripe_width 0

@ -77,6 +77,7 @@
  modified \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ (re)
  flags
  full_ratio 0
  backfillfull_ratio 0
  nearfull_ratio 0

  pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 192 pgp_num 192 last_change 0 flags hashpspool stripe_width 0

@ -790,6 +790,7 @@
  modified \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ (re)
  flags
  full_ratio 0
  backfillfull_ratio 0
  nearfull_ratio 0

  pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 15296 pgp_num 15296 last_change 0 flags hashpspool stripe_width 0
@ -183,21 +183,6 @@ class TestPG(TestArgparse):
    def test_force_create_pg(self):
        self.one_pgid('force_create_pg')

    def set_ratio(self, command):
        self.assert_valid_command(['pg',
                                   command,
                                   '0.0'])
        assert_equal({}, validate_command(sigdict, ['pg', command]))
        assert_equal({}, validate_command(sigdict, ['pg',
                                                    command,
                                                    '2.0']))

    def test_set_full_ratio(self):
        self.set_ratio('set_full_ratio')

    def test_set_nearfull_ratio(self):
        self.set_ratio('set_nearfull_ratio')


class TestAuth(TestArgparse):

@ -1153,6 +1138,24 @@ class TestOSD(TestArgparse):
                                                    'poolname',
                                                    'toomany']))

    def set_ratio(self, command):
        self.assert_valid_command(['osd',
                                   command,
                                   '0.0'])
        assert_equal({}, validate_command(sigdict, ['osd', command]))
        assert_equal({}, validate_command(sigdict, ['osd',
                                                    command,
                                                    '2.0']))

    def test_set_full_ratio(self):
        self.set_ratio('set-full-ratio')

    def test_set_backfillfull_ratio(self):
        self.set_ratio('set-backfillfull-ratio')

    def test_set_nearfull_ratio(self):
        self.set_ratio('set-nearfull-ratio')


class TestConfigKey(TestArgparse):
@ -654,6 +654,14 @@ static int update_pgmap_meta(MonitorDBStore& st)
    ::encode(full_ratio, bl);
    t->put(prefix, "full_ratio", bl);
  }
  {
    auto backfillfull_ratio = g_ceph_context->_conf->mon_osd_backfillfull_ratio;
    if (backfillfull_ratio > 1.0)
      backfillfull_ratio /= 100.0;
    bufferlist bl;
    ::encode(backfillfull_ratio, bl);
    t->put(prefix, "backfillfull_ratio", bl);
  }
  {
    auto nearfull_ratio = g_ceph_context->_conf->mon_osd_nearfull_ratio;
    if (nearfull_ratio > 1.0)
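The ``> 1.0`` guard lets the mon_osd_*_ratio options be given either as a fraction (0.90) or as a percentage (90): anything above 1.0 is treated as a percent and divided by 100 before being encoded. The rule in isolation::

    #include <iostream>

    // Normalize a full-ratio option to a fraction in [0.0, 1.0].
    double normalize_ratio(double r) {
      if (r > 1.0)
        r /= 100.0;   // treat e.g. 90 as 90%
      return r;
    }

    int main() {
      std::cout << normalize_ratio(0.90) << "\n"; // 0.9
      std::cout << normalize_ratio(90)   << "\n"; // 0.9
    }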
@ -2906,7 +2906,7 @@ int main(int argc, char **argv)
    throw std::runtime_error(ss.str());
  }
  vector<json_spirit::Value>::iterator i = array.begin();
  //if (i == array.end() || i->type() != json_spirit::str_type) {
  assert(i != array.end());
  if (i->type() != json_spirit::str_type) {
    ss << "Object '" << object
       << "' must be a JSON array with the first element a string";