464 Commits

Author SHA1 Message Date
xie xingguo
023524a26d osd/PeeringState: restart peering on any previous down acting member coming back
One of our customers wants to verify the data safety of Ceph while scaling
the cluster up, and the test case looks like this:
- keep checking the status of a specified pg, whose up set is [1, 2, 3]
- add more osds: up [1, 2, 3] -> up [1, 4, 5], acting = [1, 2, 3], backfill_targets = [4, 5],
  pg is remapped
- stop osd.2: up [1, 4, 5], acting = [1, 3], backfill_targets = [4, 5], pg is undersized
- restart osd.2: acting stays unchanged because osd.2 belongs to neither the current up set
  nor the acting set, leaving the pg stuck undersized for a long time until all backfill
  targets complete

It does not pose any critical problem -- we'll end up getting that pg back into active + clean,
except that the long-lived DEGRADED warnings keep bothering our customer, who cares about data
safety more than anything else.

The right way to achieve the above goal is for:

	boost::statechart::result PeeringState::Active::react(const MNotifyRec& notevt)

to check whether the newly booted node could be validly chosen for the acting set and
request a new temp mapping. The new temp mapping would then trigger a real interval change
that will get rid of the DEGRADED warning.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Signed-off-by: Yan Jun <yan.jun8@zte.com.cn>
2020-02-21 17:52:52 +08:00
Sage Weil
f10cc22c60 Merge PR #32961 into master
* refs/pull/32961/head:
	qa/standalone/osd/osd-bench: debug bluestore

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-01-30 10:42:17 -06:00
Sage Weil
b99e506a3f qa/standalone/osd/osd-bench: debug bluestore
Looking for https://tracker.ceph.com/issues/43888

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-29 07:43:41 -06:00
David Zafman
e18519ad09 test: Update pg log test for new trimming behavior
Fixes: https://tracker.ceph.com/issues/43864

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-28 15:23:45 -08:00
Neha
b20817795a qa/standalone/osd/osd-backfill-recovery-log.sh: fix TEST_backfill_log_2
Fixes: https://tracker.ceph.com/issues/43807
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-01-24 22:42:04 +00:00
Neha
994698277b qa/standalone/osd/osd-backfill-recovery-log.sh: fix TEST_backfill_log_1
Fixes: https://tracker.ceph.com/issues/43807
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-01-24 22:20:21 +00:00
Sage Weil
76ea774c10 qa/standalone/misc/ok-to-stop: improve test
Make sure PGs peer (simply flushing state to mon isn't enough).

Fixes: https://tracker.ceph.com/issues/43721
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-20 13:24:30 -06:00
Sage Weil
78ec6aec90 qa/standalone/ceph-helpers: add wait_for_peered
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-20 13:23:56 -06:00
Sage Weil
c5710bc8fb Merge PR #32628 into master
* refs/pull/32628/head:
	test: Fix wait_for_state() to wait for a PG to get into a state

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-01-18 14:39:19 -06:00
Sage Weil
65fbc620b6 qa/standalone/mon/osd-create-pool: fix utf-8 grep LANG
This needs en_US.UTF-8... en_US does not work.

Fixes: https://tracker.ceph.com/issues/43422
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-17 14:19:53 -06:00
David Zafman
9f7aabbe9f test: Fix wait_for_state() to wait for a PG to get into a state
To avoid confusion, rename functions in osd-backfill-space.sh to match how
they actually work.

Fixes: https://tracker.ceph.com/issues/43592

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-13 18:39:38 -08:00
David Zafman
c65d5c8d14 test: Sort pool list because the order isn't guaranteed from "balancer pool ls"
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-06 21:35:19 -08:00
David Zafman
b0a1b758d0 mgr: Change default upmap_max_deviation to 5
Fixes: https://tracker.ceph.com/issues/43312

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-06 21:35:19 -08:00
David Zafman
8e46bbbf36 test: Fix test case for pool based balancing instead of rule batched
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-06 21:35:19 -08:00
Sage Weil
acd4f5bc43 qa/standalone: python -> python3
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-20 13:33:21 -06:00
Sage Weil
e47526e152 qa/standalone/special/ceph_objectstore_tool: python3
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-20 13:32:53 -06:00
Thomas Bechtold
0127cd1e88 qa: Enable flake8 tox and fix failures
There were a couple of problems found by flake8 in the qa/
directory (most of them fixed now). Enabling flake8 during the usual
check runs hopefully avoids adding new issues in the future.

Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
2019-12-12 10:21:01 +01:00
Sage Weil
137fa64e12 qa: rename ceph-daemon tests -> cephadm
Also move the workunit to a better location.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-11 19:14:09 -06:00
Sage Weil
c8750b7066 files,rpm,deb: rename ceph-daemon -> cephadm
This is just renaming the files and adjusting the packages.  Lots of
cleanup to do still.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-11 19:14:09 -06:00
Sage Weil
80cbe97e7b qa/standalone/test_ceph_daemon.sh: disable adoption for the moment
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-11 07:32:29 -06:00
Sage Weil
6d3a035b26 qa/standalone/test_ceph_daemon.sh: clone corpus explicitly
When this is run by teuthology we don't have a full ceph source tree
checkout with submodules.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-10 11:34:42 -07:00
Michael Fritch
4aa7d5582b ceph-daemon: re-enable the OSD standalone test
Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-12-10 11:34:42 -07:00
Michael Fritch
a0eed4cb84 ceph-daemon: move standalone test tgz to corpus
Fixes: https://tracker.ceph.com/issues/42876
Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-12-10 11:32:18 -07:00
Sage Weil
0e981c4c30 Merge PR #32138 into master
* refs/pull/32138/head:
	ceph-daemon: combine SUDO and ARGS into a single var

Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2019-12-10 12:16:07 -06:00
Kefu Chai
a11ae900e9 Merge pull request #32052 from mgfritch/wip-cd-standalone-tempfiles
ceph-daemon: clean-up tempfiles on EXIT

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
2019-12-10 16:57:48 +08:00
Sage Weil
dcb5e9b6d8 Merge PR #32098 into master
* refs/pull/32098/head:
	ceph-daemon: py2: tolerate whitespace before config key name

Reviewed-by: Sebastian Wagner <swagner@suse.com>
2019-12-09 18:29:41 -06:00
Sage Weil
bffe2dd9e9 Merge PR #32046 into master
* refs/pull/32046/head:
	mgr/DaemonServer: fix 'osd ok-to-stop' for EC pools

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-12-09 15:34:57 -06:00
Michael Fritch
8c355898f6 ceph-daemon: combine SUDO and ARGS into a single var
- reduce the amount of typing/noise for each CEPH_DAEMON invocation
- ensure the `--image` param is passed to each test invocation
- allow passing additional args to ceph-daemon via CEPH_DAEMON_ARGS

Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-12-09 09:53:31 -07:00
Sage Weil
3036d11c60 ceph-daemon: py2: tolerate whitespace before config key name
The py2 ConfigParser doesn't like whitespace before the config option
name.  (The py3 version doesn't care.)  Filter it out before parsing.
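
A minimal sketch of the filtering idea, written against the Python 3 `configparser` module for illustration. The helper name `read_config_tolerant` and the sample keys are hypothetical and this is not the ceph-daemon code itself. Stripping leading whitespace also flattens indented multi-line values, so it only suits configs that do not use continuation lines.

```python
import configparser


def read_config_tolerant(text):
    """Parse INI text after removing leading whitespace from every line."""
    # Drop indentation so that "\tkey = value" is seen as "key = value".
    cleaned = "\n".join(line.lstrip() for line in text.splitlines())
    parser = configparser.ConfigParser()
    parser.read_string(cleaned)
    return parser


# Illustrative input with tab- and space-indented option names.
conf = read_config_tolerant("[global]\n\tfsid = abc\n   mon_host = 10.0.0.1\n")
```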

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-09 07:15:13 -06:00
Sage Weil
ade391513c ceph-daemon: remove prepare-host
I thought I took this out of the PR but somehow it got merged in... must
have repushed an old branch and not realized.  :/

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-08 11:26:14 -06:00
Sage Weil
86950ce9aa Merge PR #32039 into master
* refs/pull/32039/head:
	test: Improve races by using kill_daemons which waits for OSDs terminate
	test: run-standalone.sh: Only run execs in the subdirectories of qa/standalone
	test: Use activate_osd() when restarting OSDs
	test: osd-scrub-snaps.sh: Fix race with osd restart and doing a scrub

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-12-07 12:28:15 -06:00
David Zafman
676d882649 test: Improve races by using kill_daemons which waits for OSDs terminate
osd-backfill-space.sh: More sleep time to make sure the backfill gets started

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-12-06 19:44:06 -08:00
Michael Fritch
995e5c3209 ceph-daemon: remove guesswork to find script file
Allow passing CEPH_DAEMON via the environment or default to using the
script from the standard location.

Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-12-05 21:13:56 -07:00
Michael Fritch
9e03530441 ceph-daemon: trap on EXIT
tempfiles were not being removed after a standalone test failure

Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-12-05 21:13:42 -07:00
David Zafman
43f6218993 test: Use activate_osd() when restarting OSDs
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-12-05 15:13:31 -08:00
David Zafman
cca541d0f9 test: osd-scrub-snaps.sh: Fix race with osd restart and doing a scrub
Fixes: https://tracker.ceph.com/issues/43150

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-12-05 15:12:43 -08:00
Sage Weil
66690ea314 mgr/DaemonServer: fix 'osd ok-to-stop' for EC pools
We need to account for CRUSH_ITEM_NONE entries in the
EC PG acting set.
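
Why the placeholder entries matter can be shown with a small Python sketch. It is an illustration only, not the mgr/DaemonServer logic: the function `pg_ok_after_stop` and its `min_size` comparison are assumptions made for the example. `CRUSH_ITEM_NONE` is the value CRUSH uses for an unfilled slot.

```python
CRUSH_ITEM_NONE = 0x7FFFFFFF  # placeholder for an unmapped shard slot


def pg_ok_after_stop(acting, to_stop, min_size):
    """Hypothetical check: would the PG keep at least min_size live shards?"""
    remaining = [
        osd for osd in acting
        if osd != CRUSH_ITEM_NONE and osd not in to_stop
    ]
    return len(remaining) >= min_size


# An EC PG with one shard already missing: only three real OSDs are acting.
acting = [0, CRUSH_ITEM_NONE, 2, 3]
naive_count = len(acting) - 1                      # wrongly counts the hole
safe = pg_ok_after_stop(acting, to_stop={3}, min_size=3)
```

Counting the raw length of the acting set would suggest three shards survive stopping osd.3, while only two real OSDs remain.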

Fixes: https://tracker.ceph.com/issues/43151
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-05 14:31:24 -06:00
Sage Weil
91cb6eb613 ceph-daemon: add check-host and prepare-host
Check for (and/or install/configure):

- podman | docker
- systemctl
- LVM2
- chrony (or ntp or timesyncd)

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-04 09:22:06 -06:00
Sage Weil
1770270af6 Merge PR #31869 into master
* refs/pull/31869/head:
	ceph-daemon: bootstrap: deploy initial mon via deploy_daemon()
	qa/standalone/test_ceph_daemon.sh: more $SUDO
	ceph-daemon: configure firewalld for new daemon deploys
	ceph-daemon: name mgr the same way mgr/ssh does

Reviewed-by: Michael Fritch <mfritch@suse.com>
2019-12-03 16:00:14 -06:00
Sage Weil
8aadba15bf qa/standalone/test_ceph_daemon.sh: more $SUDO
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-03 10:13:37 -06:00
Sage Weil
2c1235ba69 Merge PR #31913 into master
* refs/pull/31913/head:
	ceph-daemon: Allow env var for setting the used image

Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2019-12-02 16:26:33 -06:00
Thomas Bechtold
45bae8219a ceph-daemon: Allow env var for setting the used image
Instead of always adding "--image my-custom-image" when calling
ceph-daemon with a non-standard image, allow setting the environment
variable CEPH_DAEMON_IMAGE, which adjusts the --image
default.
That way, the command line arguments when using ceph-daemon with a
custom image are a bit shorter.
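
The pattern can be sketched with `argparse` as follows. Only the `--image` flag and the CEPH_DAEMON_IMAGE variable come from the commit message; the `DEFAULT_IMAGE` value and the `build_parser` helper are placeholders for illustration, not the script's actual code.

```python
import argparse
import os

DEFAULT_IMAGE = "example/ceph:latest"  # hypothetical built-in default


def build_parser(env=None):
    """Build a parser whose --image default comes from the environment."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--image",
        default=env.get("CEPH_DAEMON_IMAGE", DEFAULT_IMAGE),
        help="container image to use",
    )
    return parser


# Environment variable set, no flag given: the variable wins over the default.
from_env = build_parser({"CEPH_DAEMON_IMAGE": "my-custom-image"}).parse_args([])
# An explicit flag still overrides the environment.
from_flag = build_parser({"CEPH_DAEMON_IMAGE": "my-custom-image"}).parse_args(
    ["--image", "other-image"]
)
```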

Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
2019-11-28 09:06:18 +01:00
David Zafman
9d2e0267e1 test: Add test case based on Xie script in commit comment
Other test fixes to reflect changes

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-11-27 16:29:29 -08:00
David Zafman
0af7e25620 mgr: Fix balancer print
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-11-27 16:29:29 -08:00
Sage Weil
61ba2d7b66 Merge PR #31677 into master
* refs/pull/31677/head:
	qa/standalone/ceph-helpers.sh: remove osd down check
	qa/standalone/ceph-helpers.sh: destroy_osd: mark osd down
	osd: add osd_fast_shutdown option (default true)

Reviewed-by: Sébastien Han <seb@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-11-25 08:54:45 -06:00
Sage Weil
3a62d166a7 qa/standalone/ceph-helpers.sh: remove osd down check
A kill doesn't induce a mark-down of the OSD with osd_fast_shutdown=true.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-24 12:19:33 -06:00
Sage Weil
07193aec3a qa/standalone/test_ceph_daemon.sh: remove old vg before creating
Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-20 18:27:31 -06:00
Sage Weil
fd6bfad498 qa/standalone/test_ceph_daemon.sh: sudo for untar
The deepsea.tgz tar contains actual device nodes for the OSD block devices
(not symlinks or files).  Must be root to untar.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-20 18:27:31 -06:00
Sage Weil
723fdb111a qa/standalone/test_ceph_daemon.sh: sudo for losetup etc
Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-20 18:27:31 -06:00
Sage Weil
cb67545e99 qa/standalone/test_ceph_daemon.sh: fix overwrites of temp files
mktemp creates these files, so we have to pass --allow-overwrite (or
delete them after we get the unique name but before we write to them--this
is easier).
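
The same situation arises in Python, where `tempfile.mkstemp` also creates the file it names. The sketch below is an analogy to the shell fix, not the test script itself: `reserve_unique_path` is a hypothetical helper, and the exclusive-create `open` stands in for a tool that refuses to overwrite existing files. Deleting and reusing the name leaves a small race window, which is usually acceptable in a test.

```python
import os
import tempfile


def reserve_unique_path(suffix=""):
    """Get a unique file name, then delete the file mkstemp created."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    os.unlink(path)  # the name is now free for a tool that will not overwrite
    return path


path = reserve_unique_path(".keyring")
# Mode "x" fails if the file already exists, like a no-overwrite tool would.
with open(path, "x") as f:
    f.write("placeholder\n")
```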

Broken by c7fe27a72a

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-20 18:27:31 -06:00