One of our customers wants to verify the data safety of Ceph while scaling
the cluster up, and the test case looks like this:
- keep checking the status of a specified pg, whose up set is [1, 2, 3]
- add more osds: up [1, 2, 3] -> up [1, 4, 5], acting = [1, 2, 3], backfill_targets = [4, 5],
pg is remapped
- stop osd.2: up [1, 4, 5], acting = [1, 3], backfill_targets = [4, 5], pg is undersized
- restart osd.2: acting stays unchanged because osd.2 belongs to neither the current up set nor
the acting set, leaving the corresponding pg stuck undersized for a long time, until all
backfill targets complete
This does not pose any critical problem -- we'll end up getting that pg back into active + clean --
except that the long-lived DEGRADED warnings keep bothering our customer, who cares about data
safety more than anything else.
The right way to achieve the above goal is for:
boost::statechart::result PeeringState::Active::react(const MNotifyRec& notevt)
to check whether the newly booted node could be validly chosen for the acting set and
request a new temp mapping. The new temp mapping would then trigger a real interval change
that will get rid of the DEGRADED warning.
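The decision described above can be illustrated with a toy model. This is only a sketch in Python, not the PeeringState code: the function name `request_temp_mapping`, the `pool_size` parameter, and the `is_valid_candidate` predicate are all hypothetical stand-ins for whatever checks the real reaction performs.

```python
def request_temp_mapping(acting, booted_osd, pool_size, is_valid_candidate):
    """Toy model: return the acting set to request after an OSD boots.

    Hypothetical helper, not Ceph code. If the PG is undersized and the
    newly booted OSD is a valid candidate, it is appended to the acting
    set; otherwise the acting set is returned unchanged.
    """
    if len(acting) >= pool_size:
        return list(acting)          # not undersized, nothing to do
    if booted_osd in acting:
        return list(acting)          # already serving this PG
    if not is_valid_candidate(booted_osd):
        return list(acting)          # e.g. the OSD's copy is unusable
    return list(acting) + [booted_osd]


# Scenario from the test case: up = [1, 4, 5], acting = [1, 3], osd.2 reboots.
new_acting = request_temp_mapping(
    acting=[1, 3],
    booted_osd=2,
    pool_size=3,
    is_valid_candidate=lambda osd: True,  # assumption: osd.2 is usable
)
print(new_acting)  # [1, 3, 2]
```

In this model the changed acting set stands in for the new temp mapping; in Ceph it is the resulting interval change that clears the undersized state.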
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Signed-off-by: Yan Jun <yan.jun8@zte.com.cn>
Make sure PGs peer (simply flushing state to mon isn't enough).
Fixes: https://tracker.ceph.com/issues/43721
Signed-off-by: Sage Weil <sage@redhat.com>
To avoid confusion, rename the functions in osd-backfill-space.sh so that
their names match what they actually do.
Fixes: https://tracker.ceph.com/issues/43592
Signed-off-by: David Zafman <dzafman@redhat.com>
There were a couple of problems found by flake8 in the qa/
directory (most of them fixed now). Enabling flake8 during the usual
check runs hopefully avoids adding new issues in the future.
Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
* refs/pull/32138/head:
ceph-daemon: combine SUDO and ARGS into a single var
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Sage Weil <sage@redhat.com>
- reduce the amount of typing/noise for each CEPH_DAEMON invocation
- ensure the `--image` param is passed to each test invocation
- allow passing additional args to ceph-daemon via CEPH_DAEMON_ARGS
Signed-off-by: Michael Fritch <mfritch@suse.com>
The py2 ConfigParser doesn't like whitespace before the config option
name. (The py3 version doesn't care.) Filter it out before parsing.
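A minimal sketch of this kind of filtering, written for Python 3 and assuming the config arrives as a string; the helper name `read_conf` is hypothetical and this is not the code from the commit:

```python
import configparser
import io


def read_conf(text):
    """Strip leading whitespace from every line, then parse the result.

    Hypothetical helper. Note the trade-off: stripping also flattens any
    genuinely indented continuation lines, so it assumes one option per line.
    """
    cleaned = "\n".join(line.lstrip() for line in text.splitlines())
    parser = configparser.ConfigParser()
    parser.read_file(io.StringIO(cleaned))
    return parser


conf = read_conf("[global]\n\tfsid = 1234\n\tmon_host = 10.0.0.1\n")
print(conf["global"]["fsid"])  # 1234
```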
Signed-off-by: Sage Weil <sage@redhat.com>
I thought I took this out of the PR but somehow it got merged in... must
have repushed an old branch and not realized. :/
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/32039/head:
test: Improve races by using kill_daemons which waits for OSDs terminate
test: run-standalone.sh: Only run execs in the subdirectories of qa/standalone
test: Use activate_osd() when restarting OSDs
test: osd-scrub-snaps.sh: Fix race with osd restart and doing a scrub
Reviewed-by: Neha Ojha <nojha@redhat.com>
Allow passing CEPH_DAEMON via the environment or default to using the
script from the standard location.
Signed-off-by: Michael Fritch <mfritch@suse.com>
We need to account for CRUSH_ITEM_NONE entries in the EC PG acting set.
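As a rough illustration, sketched in Python rather than taken from the fix itself: in an EC pool each position in the acting set is a shard, and an unfilled position holds CRUSH_ITEM_NONE (0x7fffffff) instead of an OSD id. The helper `live_shards` below is hypothetical and simply skips those placeholders.

```python
CRUSH_ITEM_NONE = 0x7fffffff  # placeholder for a shard with no OSD mapped


def live_shards(acting):
    """Map shard index -> OSD id, ignoring empty (CRUSH_ITEM_NONE) slots."""
    return {
        shard: osd
        for shard, osd in enumerate(acting)
        if osd != CRUSH_ITEM_NONE
    }


acting = [3, CRUSH_ITEM_NONE, 7, 1]
shards = live_shards(acting)
print(shards)  # {0: 3, 2: 7, 3: 1}
```

Code that treats every entry as a real OSD id would otherwise try to look up osd.2147483647.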
Fixes: https://tracker.ceph.com/issues/43151
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/31869/head:
ceph-daemon: bootstrap: deploy initial mon via deploy_daemon()
qa/standalone/test_ceph_daemon.sh: more $SUDO
ceph-daemon: configure firewalld for new daemon deploys
ceph-daemon: name mgr the same way mgr/ssh does
Reviewed-by: Michael Fritch <mfritch@suse.com>
* refs/pull/31913/head:
ceph-daemon: Allow env var for setting the used image
Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Instead of always adding "--image my-custom-image" when calling
ceph-daemon with a non-standard image, allow setting the
CEPH_DAEMON_IMAGE environment variable, which adjusts the --image
default.
That way, the command line arguments when using ceph-daemon with a
custom image are a bit shorter.
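One common way to wire up such a default is sketched below with argparse. This is an assumption about the general pattern, not a copy of ceph-daemon's code; `DEFAULT_IMAGE`, its value, and `build_parser` are hypothetical.

```python
import argparse
import os

DEFAULT_IMAGE = "example.invalid/ceph:latest"  # hypothetical placeholder


def build_parser(environ=None):
    """Build a parser whose --image default comes from CEPH_DAEMON_IMAGE."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="ceph-daemon-sketch")
    parser.add_argument(
        "--image",
        default=environ.get("CEPH_DAEMON_IMAGE", DEFAULT_IMAGE),
        help="container image to use",
    )
    return parser


env = {"CEPH_DAEMON_IMAGE": "my-custom-image"}
from_env = build_parser(env).parse_args([]).image
explicit = build_parser(env).parse_args(["--image", "other"]).image
print(from_env, explicit)  # my-custom-image other
```

An explicit --image on the command line still wins, since the environment variable only changes the default.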
Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
The deepsea.tgz tar contains actual device nodes for the OSD block devices
(not symlinks or files). Must be root to untar.
Signed-off-by: Sage Weil <sage@redhat.com>
mktemp creates these files, so we have to pass --allow-overwrite (or
delete them after we get the unique name but before we write to them--this
is easier).
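The underlying behaviour can be reproduced in a few lines of Python, as a sketch only: `tempfile.mkstemp` plays the role of mktemp, and the two hypothetical writer functions stand in for a tool run without and with an overwrite option.

```python
import os
import tempfile


def write_exclusive(path, data):
    """Mimic a tool that refuses to overwrite an existing file."""
    with open(path, "x") as f:   # "x" fails if the file already exists
        f.write(data)


def write_overwrite(path, data):
    """Mimic the same tool with overwriting allowed."""
    with open(path, "w") as f:
        f.write(data)


fd, path = tempfile.mkstemp()    # like mktemp, this creates the file
os.close(fd)
try:
    try:
        write_exclusive(path, "data")
        refused = False
    except FileExistsError:
        refused = True
    write_overwrite(path, "data")
    with open(path) as f:
        content = f.read()
finally:
    os.remove(path)

print(refused, content)  # True data
```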
Broken by c7fe27a72a
Signed-off-by: Sage Weil <sage@redhat.com>