
528 Commits

Author SHA1 Message Date
Sage Weil
3212932ba1 Merge PR #33809 into octopus
* refs/pull/33809/head:
	qa/standalone/scrub/osd-scrub-repair: force osdmap prop to osds
	qa/standalone/scrub/osd-scrub-test: wait longer for update

Reviewed-by: David Zafman <dzafman@redhat.com>
2020-03-09 15:28:19 -05:00
Deepika Upadhyay
21508bd9dd mon/OSDMonitor: add flag --yes-i-really-mean-it for setting pool size 1
Adds option `mon_allow_pool_size_one`, which is disabled by default
to ensure pools are not configured without replicas.
If the user still wants to use pool size 1, they have to set
`mon_allow_pool_size_one` to true and then pass the
`--yes-i-really-mean-it` flag to the CLI command:

Example:
`ceph osd pool set test size 1 --yes-i-really-mean-it`

Fixes: https://tracker.ceph.com/issues/44025
Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
2020-03-09 23:27:36 +05:30
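The two-step gate described in 21508bd9dd can be modeled as a small decision function. This is a hypothetical Python sketch, not the OSDMonitor C++ code: `check_pool_size_change`, the EPERM return code, and the message strings are assumptions; only the option name `mon_allow_pool_size_one` and the `--yes-i-really-mean-it` flag come from the commit.

```python
import errno
from typing import Tuple


def check_pool_size_change(new_size: int,
                           mon_allow_pool_size_one: bool,
                           yes_i_really_mean_it: bool) -> Tuple[int, str]:
    """Model of the gate on 'ceph osd pool set <pool> size <n>'.

    Returns (return_code, message). The EPERM code and the messages are
    illustrative assumptions, not the monitor's actual output.
    """
    if new_size != 1:
        return 0, ""  # only size 1 is gated
    if not mon_allow_pool_size_one:
        # Step 1: the option must be enabled first.
        return -errno.EPERM, "pool size 1 is disabled; set mon_allow_pool_size_one to true"
    if not yes_i_really_mean_it:
        # Step 2: even with the option enabled, explicit confirmation is required.
        return -errno.EPERM, "pass --yes-i-really-mean-it to set pool size 1"
    return 0, ""
```

Requiring both conditions means an operator cannot drop replicas by accident with a single mistyped command.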
Sage Weil
0447ed0ff9 qa/standalone/scrub/osd-scrub-repair: force osdmap prop to osds
flush_pg_stats isn't sufficient to ensure that OSDs have the latest
OSDMap.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-08 14:52:10 -05:00
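Commit 0447ed0ff9 notes that flush_pg_stats alone does not guarantee that every OSD has the latest OSDMap. One way to express the stronger condition is to poll each OSD until it reports at least a target epoch. The sketch is hypothetical: `wait_for_osdmap_epoch` and the `get_osd_epoch` callable (a stand-in for however a test queries an OSD's current map epoch) are not part of ceph-helpers.sh.

```python
import time
from typing import Callable, Iterable


def wait_for_osdmap_epoch(osd_ids: Iterable[int],
                          target_epoch: int,
                          get_osd_epoch: Callable[[int], int],
                          timeout: float = 30.0,
                          interval: float = 0.1) -> bool:
    """Poll until every OSD reports an OSDMap epoch >= target_epoch.

    Returns True on success, False if the timeout expires first.
    """
    pending = set(osd_ids)
    deadline = time.monotonic() + timeout
    while True:
        # Keep only the OSDs that are still behind the target epoch.
        pending = {osd for osd in pending if get_osd_epoch(osd) < target_epoch}
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```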
Sage Weil
ac9befd450 qa/standalone/scrub/osd-scrub-test: wait longer for update
Fixes: https://tracker.ceph.com/issues/43865
Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-08 14:45:00 -05:00
David Zafman
e509b7c7d0 test: Add flush_pg_stats to avoid race with getting num_shards_repaired
Fixes: https://tracker.ceph.com/issues/44439

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-03-06 04:25:37 +00:00
Kefu Chai
c6088bdd26 Merge pull request #33593 from dzafman/wip-cot-fix
test: Fix failing ceph_objectstore_tool.py test

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-03-02 18:58:19 +08:00
Kefu Chai
7b0e18c09e Merge pull request #33566 from dzafman/wip-44296
test: Expect being off by up to 2 and make sure all PGs are active+clean

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-02-28 11:42:47 +08:00
David Zafman
08f7e7980f test: Fix failing ceph_objectstore_tool.py test
The -N option to vstart.sh was removed; use -k instead.

The old hinfo_key binary happened to be UTF-8 decodable; the new one
throws an exception when the test tries to decode it. Use the new
ceph-objectstore-tool option that treats stdout as a terminal and
converts binary data to base64.

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-02-27 18:14:36 -08:00
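The failure mode in 08f7e7980f is easy to reproduce in Python: arbitrary binary data is usually not valid UTF-8, while its base64 form is plain ASCII and round-trips exactly. In this sketch `render_value` is a hypothetical stand-in for the tool's output path; the actual ceph-objectstore-tool option name is not given in the commit and is not assumed here.

```python
import base64


def render_value(raw: bytes, stdout_is_tty: bool) -> str:
    """Hypothetical renderer for an attribute value such as hinfo_key."""
    if stdout_is_tty:
        # Terminal mode: emit base64 so binary data is always printable.
        return base64.b64encode(raw).decode("ascii")
    # Raw mode: decoding binary data as text can fail.
    return raw.decode("utf-8")  # may raise UnicodeDecodeError
```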
David Zafman
49d9c7d664 test: Expect being off by up to 2 and make sure all PGs are active+clean
Fixes: https://tracker.ceph.com/issues/44296

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-02-27 18:12:25 -08:00
David Zafman
587cd64207 Merge pull request #32342 from dzafman/wip-43126
mon: Improvements to slow heartbeat health messages

Reviewed-by: Sage Weil <sage@redhat.com>
2020-02-25 17:42:00 -08:00
Sage Weil
4d42b4c5a0 common/TextTable: default to 2 spaces separating columns
This is what other projects and libraries default to, and it is more
legible.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-02-23 15:46:30 -06:00
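To illustrate the effect of 4d42b4c5a0, here is a minimal Python column formatter whose separator defaults to two spaces. It is not the C++ `TextTable` class; `format_table` and its `sep` parameter are hypothetical.

```python
from typing import Sequence


def format_table(header: Sequence[str],
                 rows: Sequence[Sequence[str]],
                 sep: int = 2) -> str:
    """Left-align each column and join columns with `sep` spaces."""
    table = [list(header)] + [list(r) for r in rows]
    # Column width is the longest cell in that column, header included.
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    lines = []
    for row in table:
        cells = [cell.ljust(widths[i]) for i, cell in enumerate(row)]
        lines.append((" " * sep).join(cells).rstrip())
    return "\n".join(lines)
```

For example, `format_table(["ID", "NAME"], [["0", "osd.0"], ["10", "osd.10"]])` produces `ID  NAME`, `0   osd.0` and `10  osd.10`; with `sep=1` adjacent columns such as `10 osd.10` are noticeably harder to scan.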
Sage Weil
5afec0fbfb Merge PR #33091 into master
* refs/pull/33091/head:
	qa/suites/rados: disable device scraping
	qa/standalone/ceph-helpers: disable device monitoring
	qa/tasks/ceph.py: add pre-mgr-commands option for ceph task
	mgr/devicehealth: set default monitoring to 'on'

Reviewed-by: Sage Weil <sage@redhat.com>
2020-02-22 12:05:55 -06:00
xie xingguo
023524a26d osd/PeeringState: restart peering on any previous down acting member coming back
One of our customers wants to verify the data safety of Ceph while scaling
the cluster up, and the test case looks like this:
- keep checking the status of a specified pg, whose up set is [1, 2, 3]
- add more osds: up [1, 2, 3] -> up [1, 4, 5], acting = [1, 2, 3], backfill_targets = [4, 5],
  pg is remapped
- stop osd.2: up [1, 4, 5], acting = [1, 3], backfill_targets = [4, 5], pg is undersized
- restart osd.2: acting stays unchanged because osd.2 belongs to neither the current up set nor
  the acting set, leaving the corresponding pg stuck undersized for a long time until all backfill
  targets complete

This does not pose any critical problem; we will eventually get that pg back to active+clean.
However, the long-lived DEGRADED warnings keep bothering our customer, who cares about data
safety more than anything else.

The right way to achieve the above goal is for:

	boost::statechart::result PeeringState::Active::react(const MNotifyRec& notevt)

to check whether the newly booted node could be validly chosen for the acting set and
request a new temp mapping. The new temp mapping would then trigger a real interval change
that will get rid of the DEGRADED warning.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Signed-off-by: Yan Jun <yan.jun8@zte.com.cn>
2020-02-21 17:52:52 +08:00
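The change proposed in 023524a26d can be reduced to a decision rule applied when a notify arrives from a booting OSD. The Python below is a deliberately simplified, hypothetical model: `maybe_request_pg_temp` and its parameters are illustrative names, and it assumes any returning previous acting member is a valid candidate, whereas `PeeringState::Active::react(const MNotifyRec&)` performs real eligibility checks.

```python
from typing import List, Optional


def maybe_request_pg_temp(booted_osd: int,
                          up: List[int],
                          acting: List[int],
                          prior_acting: List[int],
                          pool_size: int) -> Optional[List[int]]:
    """Return a proposed temp acting set, or None if no change is needed.

    A new temp mapping would trigger an interval change, re-peering the PG
    with the returning OSD and clearing the DEGRADED state sooner.
    """
    if len(acting) >= pool_size:
        return None  # PG is not undersized
    if booted_osd in up or booted_osd in acting:
        return None  # already mapped; normal peering handles it
    if booted_osd not in prior_acting:
        return None  # assumption: only previous acting members qualify
    # Keep the existing order (primary first) and append the returning OSD.
    return acting + [booted_osd]
```

With the scenario from the commit message (up `[1, 4, 5]`, acting `[1, 3]`, osd.2 restarting), the model proposes `[1, 3, 2]` instead of waiting for backfill to osd.4 and osd.5 to finish.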
Sage Weil
455cdcf89a qa/standalone/ceph-helpers: disable device monitoring
Signed-off-by: Sage Weil <sage@redhat.com>
2020-02-19 15:31:26 -06:00
Sage Weil
f10cc22c60 Merge PR #32961 into master
* refs/pull/32961/head:
	qa/standalone/osd/osd-bench: debug bluestore

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-01-30 10:42:17 -06:00
Sage Weil
b99e506a3f qa/standalone/osd/osd-bench: debug bluestore
Looking for https://tracker.ceph.com/issues/43888

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-29 07:43:41 -06:00
David Zafman
e18519ad09 test: Update pg log test for new trimming behavior
Fixes: https://tracker.ceph.com/issues/43864

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-28 15:23:45 -08:00
Neha
b20817795a qa/standalone/osd/osd-backfill-recovery-log.sh: fix TEST_backfill_log_2
Fixes: https://tracker.ceph.com/issues/43807
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-01-24 22:42:04 +00:00
Neha
994698277b qa/standalone/osd/osd-backfill-recovery-log.sh: fix TEST_backfill_log_1
Fixes: https://tracker.ceph.com/issues/43807
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-01-24 22:20:21 +00:00
Sage Weil
76ea774c10 qa/standalone/misc/ok-to-stop: improve test
Make sure PGs peer (simply flushing state to mon isn't enough).

Fixes: https://tracker.ceph.com/issues/43721
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-20 13:24:30 -06:00
Sage Weil
78ec6aec90 qa/standalone/ceph-helpers: add wait_for_peered
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-20 13:23:56 -06:00
Sage Weil
c5710bc8fb Merge PR #32628 into master
* refs/pull/32628/head:
	test: Fix wait_for_state() to wait for a PG to get into a state

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-01-18 14:39:19 -06:00
Sage Weil
65fbc620b6 qa/standalone/mon/osd-create-pool: fix utf-8 grep LANG
This needs en_US.UTF-8... en_US does not work.

Fixes: https://tracker.ceph.com/issues/43422
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-17 14:19:53 -06:00
David Zafman
886475b5fe mon: Improvements to slow heartbeat health messages
Include crush parentage for each osd

Fixes: https://tracker.ceph.com/issues/43126

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-14 18:06:44 +00:00
David Zafman
9f7aabbe9f test: Fix wait_for_state() to wait for a PG to get into a state
To avoid confusion, rename functions in osd-backfill-space.sh to reflect
how they actually work.

Fixes: https://tracker.ceph.com/issues/43592

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-13 18:39:38 -08:00
David Zafman
c65d5c8d14 test: Sort pool list because the order isn't guaranteed from "balancer pool ls"
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-06 21:35:19 -08:00
David Zafman
b0a1b758d0 mgr: Change default upmap_max_deviation to 5
Fixes: https://tracker.ceph.com/issues/43312

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-06 21:35:19 -08:00
David Zafman
8e46bbbf36 test: Fix test case for pool based balancing instead of rule batched
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-06 21:35:19 -08:00
Sage Weil
acd4f5bc43 qa/standalone: python -> python3
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-20 13:33:21 -06:00
Sage Weil
e47526e152 qa/standalone/special/ceph_objectstore_tool: python3
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-20 13:32:53 -06:00
Thomas Bechtold
0127cd1e88 qa: Enable flake8 tox and fix failures
There were a couple of problems found by flake8 in the qa/
directory (most of them fixed now). Enabling flake8 during the usual
check runs hopefully avoids adding new issues in the future.

Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
2019-12-12 10:21:01 +01:00
Sage Weil
137fa64e12 qa: rename ceph-daemon tests -> cephadm
Also move the workunit to a better location.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-11 19:14:09 -06:00
Sage Weil
c8750b7066 files,rpm,deb: rename ceph-daemon -> cephadm
This is just renaming the files and adjusting the packages.  Lots of
cleanup to do still.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-11 19:14:09 -06:00
Sage Weil
80cbe97e7b qa/standalone/test_ceph_daemon.sh: disable adoption for the moment
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-11 07:32:29 -06:00
Sage Weil
6d3a035b26 qa/standalone/test_ceph_daemon.sh: clone corpus explicitly
When this is run by teuthology we don't have a full ceph source tree
checkout with submodules.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-10 11:34:42 -07:00
Michael Fritch
4aa7d5582b ceph-daemon: re-enable the OSD standalone test
Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-12-10 11:34:42 -07:00
Michael Fritch
a0eed4cb84 ceph-daemon: move standalone test tgz to corpus
Fixes: https://tracker.ceph.com/issues/42876
Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-12-10 11:32:18 -07:00
Sage Weil
0e981c4c30 Merge PR #32138 into master
* refs/pull/32138/head:
	ceph-daemon: combine SUDO and ARGS into a single var

Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2019-12-10 12:16:07 -06:00
Kefu Chai
a11ae900e9 Merge pull request #32052 from mgfritch/wip-cd-standalone-tempfiles
ceph-daemon: clean-up tempfiles on EXIT

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
2019-12-10 16:57:48 +08:00
Sage Weil
dcb5e9b6d8 Merge PR #32098 into master
* refs/pull/32098/head:
	ceph-daemon: py2: tolerate whitespace before config key name

Reviewed-by: Sebastian Wagner <swagner@suse.com>
2019-12-09 18:29:41 -06:00
Sage Weil
bffe2dd9e9 Merge PR #32046 into master
* refs/pull/32046/head:
	mgr/DaemonServer: fix 'osd ok-to-stop' for EC pools

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-12-09 15:34:57 -06:00
Michael Fritch
8c355898f6 ceph-daemon: combine SUDO and ARGS into a single var
- reduce the amount of typing/noise for each CEPH_DAEMON invocation
- ensure the `--image` param is passed to each test invocation
- allow passing additional args to ceph-daemon via CEPH_DAEMON_ARGS

Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-12-09 09:53:31 -07:00
Sage Weil
3036d11c60 ceph-daemon: py2: tolerate whitespace before config key name
The py2 ConfigParser doesn't like whitespace before the config option
name.  (The py3 version doesn't care.)  Filter it out before parsing.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-09 07:15:13 -06:00
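The filtering step from 3036d11c60 can be sketched with the Python 3 `configparser` module: strip leading whitespace from every line, then parse. `parse_ceph_conf` is a hypothetical helper, not the ceph-daemon function. Note the trade-off: stripping indentation also disables indentation-based continuation lines, which is acceptable only if the input never relies on them.

```python
import configparser


def parse_ceph_conf(text: str) -> configparser.ConfigParser:
    """Parse ceph.conf-style text after removing leading whitespace."""
    cleaned = "\n".join(line.lstrip() for line in text.splitlines())
    # interpolation=None so '%' in values is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cleaned)
    return parser
```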
Sage Weil
ade391513c ceph-daemon: remove prepare-host
I thought I took this out of the PR but somehow it got merged in... must
have repushed an old branch and not realized.  :/

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-08 11:26:14 -06:00
Sage Weil
86950ce9aa Merge PR #32039 into master
* refs/pull/32039/head:
	test: Improve races by using kill_daemons which waits for OSDs terminate
	test: run-standalone.sh: Only run execs in the subdirectories of qa/standalone
	test: Use activate_osd() when restarting OSDs
	test: osd-scrub-snaps.sh: Fix race with osd restart and doing a scrub

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-12-07 12:28:15 -06:00
David Zafman
676d882649 test: Improve races by using kill_daemons which waits for OSDs terminate
osd-backfill-space.sh: More sleep time to make sure the backfill gets started

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-12-06 19:44:06 -08:00
Michael Fritch
995e5c3209 ceph-daemon: remove guesswork to find script file
Allow passing CEPH_DAEMON via the environment or default to using the
script from the standard location.

Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-12-05 21:13:56 -07:00
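The lookup described in 995e5c3209 (take CEPH_DAEMON from the environment, else a standard location) can be paired with an explicit existence check, so a missing script fails immediately instead of hanging later, the problem fixed in a004d92ae0. This Python sketch is hypothetical; in particular `DEFAULT_CEPH_DAEMON` is a placeholder path, not the location the test script uses.

```python
import os
from typing import Mapping, Optional

DEFAULT_CEPH_DAEMON = "/usr/sbin/ceph-daemon"  # placeholder "standard location"


def resolve_ceph_daemon(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the ceph-daemon path from $CEPH_DAEMON or the default.

    Raises FileNotFoundError right away rather than letting a later step
    block on a missing file.
    """
    env = os.environ if env is None else env
    path = env.get("CEPH_DAEMON", DEFAULT_CEPH_DAEMON)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"ceph-daemon not found at {path}; set CEPH_DAEMON")
    return path
```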
Michael Fritch
9e03530441 ceph-daemon: trap on EXIT
tempfiles were not being removed after a standalone test failure

Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-12-05 21:13:42 -07:00
David Zafman
43f6218993 test: Use activate_osd() when restarting OSDs
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-12-05 15:13:31 -08:00
David Zafman
cca541d0f9 test: osd-scrub-snaps.sh: Fix race with osd restart and doing a scrub
Fixes: https://tracker.ceph.com/issues/43150

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-12-05 15:12:43 -08:00
Sage Weil
66690ea314 mgr/DaemonServer: fix 'osd ok-to-stop' for EC pools
We need to account for CRUSH_ITEM_NONE entries in the EC PG acting set.

Fixes: https://tracker.ceph.com/issues/43151
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-05 14:31:24 -06:00
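The bug fixed in 66690ea314 is that an EC acting set can contain CRUSH_ITEM_NONE placeholders for missing shards, so counting list entries overstates how many OSDs are actually serving. The Python sketch below is a simplified, hypothetical model of the check (the real DaemonServer logic considers more than min_size); `pgs_unsafe_to_stop` is an illustrative name, while the value 0x7fffffff for CRUSH_ITEM_NONE matches Ceph's CRUSH headers.

```python
from typing import Dict, List, Set

CRUSH_ITEM_NONE = 0x7FFFFFFF  # placeholder for a missing shard in an acting set


def pgs_unsafe_to_stop(pg_acting: Dict[str, List[int]],
                       min_size: Dict[str, int],
                       stopping: Set[int]) -> List[str]:
    """Return PGs that would fall below min_size if `stopping` went down."""
    unsafe = []
    for pgid, acting in pg_acting.items():
        # Ignore placeholders: they are not live OSDs.
        remaining = [o for o in acting if o != CRUSH_ITEM_NONE and o not in stopping]
        if len(remaining) < min_size[pgid]:
            unsafe.append(pgid)
    return unsafe
```

Without the CRUSH_ITEM_NONE filter, PG `1.0` with acting `[0, NONE, 2]` would appear to keep two members after osd.0 stops and be wrongly reported as safe.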
Sage Weil
91cb6eb613 ceph-daemon: add check-host and prepare-host
Check for (and/or install/configure):

- podman | docker
- systemctl
- LVM2
- chrony (or ntp or timesyncd)

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-04 09:22:06 -06:00
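A presence check for the requirements listed in 91cb6eb613 could look like the following Python sketch, where each requirement is met if any candidate binary is found on PATH. The binary names are assumptions chosen for illustration, and binary presence is only a rough proxy (for example, systemd-timesyncd usually lives outside PATH); the real check-host may inspect services instead.

```python
import shutil
from typing import Callable, List, Optional

# (requirement, candidate binaries); names are illustrative assumptions.
REQUIREMENTS = [
    ("container runtime", ["podman", "docker"]),
    ("systemd", ["systemctl"]),
    ("LVM2", ["lvcreate"]),
    ("time sync", ["chronyd", "ntpd", "systemd-timesyncd"]),
]


def check_host(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the names of unmet requirements (empty list means OK)."""
    missing = []
    for name, candidates in REQUIREMENTS:
        if not any(which(c) for c in candidates):
            missing.append(name)
    return missing
```

Injecting `which` keeps the check testable without touching the real host.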
Sage Weil
1770270af6 Merge PR #31869 into master
* refs/pull/31869/head:
	ceph-daemon: bootstrap: deploy initial mon via deploy_daemon()
	qa/standalone/test_ceph_daemon.sh: more $SUDO
	ceph-daemon: configure firewalld for new daemon deploys
	ceph-daemon: name mgr the same way mgr/ssh does

Reviewed-by: Michael Fritch <mfritch@suse.com>
2019-12-03 16:00:14 -06:00
Sage Weil
8aadba15bf qa/standalone/test_ceph_daemon.sh: more $SUDO
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-03 10:13:37 -06:00
Sage Weil
2c1235ba69 Merge PR #31913 into master
* refs/pull/31913/head:
	ceph-daemon: Allow env var for setting the used image

Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2019-12-02 16:26:33 -06:00
Thomas Bechtold
45bae8219a ceph-daemon: Allow env var for setting the used image
Instead of always adding "--image my-custom-image" when calling
ceph-daemon with a non-standard image, allow setting the environment
variable CEPH_DAEMON_IMAGE, which adjusts the --image default.
That way, the command line arguments when using ceph-daemon with a
custom image are a bit shorter.

Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
2019-11-28 09:06:18 +01:00
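The pattern in 45bae8219a, an environment variable that only changes the default of a command-line option, maps naturally onto `argparse`. In this hypothetical sketch `make_parser` and `DEFAULT_IMAGE` are placeholders, not ceph-daemon's actual parser or default image; an explicit `--image` still takes precedence.

```python
import argparse
import os
from typing import Mapping, Optional

DEFAULT_IMAGE = "example.invalid/ceph:placeholder"  # placeholder, not the real default


def make_parser(env: Optional[Mapping[str, str]] = None) -> argparse.ArgumentParser:
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="ceph-daemon")
    # The environment variable only changes the default; --image still wins.
    parser.add_argument("--image",
                        default=env.get("CEPH_DAEMON_IMAGE", DEFAULT_IMAGE),
                        help="container image (default: $CEPH_DAEMON_IMAGE if set)")
    return parser
```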
David Zafman
9d2e0267e1 test: Add test case based on Xie script in commit comment
Other test fixes to reflect changes

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-11-27 16:29:29 -08:00
David Zafman
0af7e25620 mgr: Fix balancer print
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-11-27 16:29:29 -08:00
Sage Weil
61ba2d7b66 Merge PR #31677 into master
* refs/pull/31677/head:
	qa/standalone/ceph-helpers.sh: remove osd down check
	qa/standalone/ceph-helpers.sh: destroy_osd: mark osd down
	osd: add osd_fast_shutdown option (default true)

Reviewed-by: Sébastien Han <seb@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-11-25 08:54:45 -06:00
Sage Weil
3a62d166a7 qa/standalone/ceph-helpers.sh: remove osd down check
A kill doesn't induce a mark-down of the OSD with osd_fast_shutdown=true.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-24 12:19:33 -06:00
Sage Weil
07193aec3a qa/standalone/test_ceph_daemon.sh: remove old vg before creating
Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-20 18:27:31 -06:00
Sage Weil
fd6bfad498 qa/standalone/test_ceph_daemon.sh: sudo for untar
The deepsea.tgz tar contains actual device nodes for the OSD block devices
(not symlinks or files).  Must be root to untar.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-20 18:27:31 -06:00
Sage Weil
723fdb111a qa/standalone/test_ceph_daemon.sh: sudo for losetup etc
Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-20 18:27:31 -06:00
Sage Weil
cb67545e99 qa/standalone/test_ceph_daemon.sh: fix overwrites of temp files
mktemp creates these files, so we have to pass --allow-overwrite (or
delete them after we get the unique name but before we write to them;
the latter is easier).

Broken by c7fe27a72a

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-20 18:27:31 -06:00
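The workaround chosen in cb67545e99 (get a unique name from mktemp, then delete the file before the writer creates it) has a direct Python analogue. In this hypothetical sketch, `write_new` stands in for a tool that refuses to overwrite existing files, and `unique_path_for_writer` mirrors the delete-after-mktemp step with `tempfile.mkstemp`; neither is ceph-daemon code.

```python
import os
import tempfile


def write_new(path: str, data: str) -> None:
    """Hypothetical writer that refuses to overwrite an existing file."""
    with open(path, "x") as f:  # 'x' raises FileExistsError if path exists
        f.write(data)


def unique_path_for_writer(suffix: str = "") -> str:
    """Get a unique temp name, then remove the file mkstemp created."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    # The name is now free again; another process could claim it in the
    # short window before we write, which is acceptable for a test script.
    os.remove(path)
    return path
```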
Sage Weil
ede1d36773 qa/standalone/ceph-helpers.sh: destroy_osd: mark osd down
Stopping the OSD doesn't guarantee that it will be marked down.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-19 20:05:16 -06:00
Michael Fritch
5cb5e77f50 ceph-daemon: add osd create test(s)
Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-11-18 22:30:22 -07:00
Michael Fritch
479e9be91c ceph-daemon: add standalone adopt tests
Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-11-13 16:51:59 -07:00
Sridhar Seshasayee
8819c3c37a Merge pull request #31416 from sseshasa/wip-41666-replicaSizeWarn
osd/OSDMap: Show health warning if a pool is configured with size 1

Reviewed-by: Sage Weil <sweil@redhat.com>
Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-11-12 12:06:46 +05:30
Sridhar Seshasayee
33c647e811 osd/OSDMap: Show health warning if a pool is configured with size 1
Introduce a config option called 'mon_warn_on_pool_no_redundancy' that is
used to show a health warning if any pool in the ceph cluster is
configured with a size of 1. The user can mute/unmute the warning using
'ceph health mute/unmute POOL_NO_REDUNDANCY'.

Add standalone test to verify warning on setting pool size=1. Set the
associated option to 'false' in ceph.conf.template under qa/tasks so
that existing tests do not break.

Fixes: https://tracker.ceph.com/issues/41666
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2019-11-11 10:36:35 +05:30
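The health check introduced in 33c647e811 can be modeled as a function from pool sizes to an optional warning. This Python sketch is hypothetical: `pool_no_redundancy_warning` and the message wording are illustrative; only the check code POOL_NO_REDUNDANCY, the `mon_warn_on_pool_no_redundancy` option, and the mute behavior come from the commit.

```python
from typing import Dict, Optional, Set


def pool_no_redundancy_warning(pool_sizes: Dict[str, int],
                               warn_enabled: bool = True,
                               muted: Optional[Set[str]] = None) -> Optional[dict]:
    """Return a health-check dict if any pool has size 1, else None.

    warn_enabled models mon_warn_on_pool_no_redundancy; `muted` models
    codes silenced with 'ceph health mute'.
    """
    muted = muted or set()
    if not warn_enabled or "POOL_NO_REDUNDANCY" in muted:
        return None
    pools = sorted(name for name, size in pool_sizes.items() if size == 1)
    if not pools:
        return None
    return {
        "code": "POOL_NO_REDUNDANCY",
        "summary": f"{len(pools)} pool(s) have no replicas configured",
        "detail": [f"pool '{p}' has no replicas configured" for p in pools],
    }
```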
Thomas Bechtold
4258c4772a ceph-daemon: Move ceph-daemon executable to own directory
Moving ceph-daemon into src/ceph-daemon/ makes it simpler to add extra
code (e.g. tox.ini, README, unit tests, ...) specific to ceph-daemon.
That way related files are in a single directory.

Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
2019-11-08 17:05:57 +01:00
Sage Weil
5def1df5e8 Merge PR #31064 into master
* refs/pull/31064/head:
	test: Test balancer module commands
	mgr: Improve balancer module status
	mgr: Release GIL before calling OSDMap::calc_pg_upmaps()

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-11-07 10:57:56 -06:00
Thomas Bechtold
5eaf133038 qa/standalone/test_ceph_daemon: Make container images configurable
Instead of hardcoding the images, make them configurable via
environment variables.
That way, downstream can use the script with custom images.

Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
2019-11-06 16:24:17 +01:00
Thomas Bechtold
b04b8f7398 qa/standalone/test_ceph_daemon: Allow running from root dir
Allow running the script from root directory via:

./qa/standalone/test_ceph_daemon.sh

Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
2019-11-06 16:11:46 +01:00
Thomas Bechtold
a004d92ae0 qa/standalone/test_ceph_daemon: Fix hang when CEPH_DAEMON is not set
When running test_ceph_daemon.sh from the root dir and not setting
$CEPH_DAEMON manually, the call hangs at:

$ ./qa/standalone/test_ceph_daemon.sh
[...]
+ for p in $PYTHONS
+ echo '=== re-running with python3 ==='
=== re-running with python3 ===
++ which python3
+ ln -s /usr/bin/python3 /tmp/tmp.6hneCsNMio/python
+ echo '#!/tmp/tmp.6hneCsNMio/python'
+ cat

Check that a ceph-daemon executable is found before continuing.

Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
2019-11-06 16:09:55 +01:00
Sage Weil
f5c7a8c986 qa/standalone/test_ceph_daemon: fix multi-version python test
We have to rewrite the shebang line, since it is no longer just
'#!/usr/bin/env python' (as of e12ad1b016).

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-03 10:09:06 -06:00
Sage Weil
df40a49eb8 ceph-daemon: use client.admin keyring during bootstrap
It's usually okay to use the mon. key for CLI commands, except that once we
had a mgr, it prevented you from issuing mgr commands correctly.  We have
the new client.admin key available, so use that instead.

Update tests to not --skip-ssh (now that it doesn't hang).

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-30 14:07:52 -05:00
Sage Weil
debde146d2 qa/standalone/test_ceph_damon.sh: test with python2 and python3
Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-28 12:15:47 -05:00
Sage Weil
b830870bbb Merge PR #31130 into master
* refs/pull/31130/head:
	ceph-daemon: only set up crash dir mount if it exists

Reviewed-by: Sebastian Wagner <swagner@suse.com>
2019-10-25 09:18:20 -05:00
David Zafman
3a0e2c8ff1 test: Test balancer module commands
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-10-24 18:56:19 -07:00
Sage Weil
f56a8db34d ceph-daemon: only set up crash dir mount if it exists
Sometimes we run containers on a host that doesn't have a crash dir set
up (because no daemon has been deployed).  Examples include shell and
ceph-volume.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-24 20:06:23 -05:00
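The conditional mount from f56a8db34d can be sketched as building a host-to-container mount map and adding the crash directory only when it exists. Everything here is an assumption for illustration: `container_mounts`, the `<data_dir>/<fsid>/crash` host layout, and the container-side paths are not taken from ceph-daemon.

```python
import os
from typing import Dict


def container_mounts(fsid: str, data_dir: str = "/var/lib/ceph") -> Dict[str, str]:
    """Map host paths to container paths (hypothetical layout)."""
    mounts = {os.path.join("/var/log/ceph", fsid): "/var/log/ceph"}
    crash_dir = os.path.join(data_dir, fsid, "crash")
    # Only mount the crash dir if a deployed daemon has created it; shell or
    # ceph-volume containers may run on a host without one.
    if os.path.isdir(crash_dir):
        mounts[crash_dir] = "/var/lib/ceph/crash"
    return mounts
```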
David Zafman
4ea43f7342 Merge pull request #31133 from dzafman/wip-42476
ceph-objectstore-tool: call collection_bits() crashes on the meta col…

Reviewed-by: Sage Weil <sage@redhat.com>
2019-10-24 17:23:48 -07:00
David Zafman
2d79e77b6a ceph-objectstore-tool: call collection_bits() crashes on the meta collection
Skip new check for meta collection
test:
    Turn off osd_pool_default_pg_autoscale_mode just like bash tests do
    Fix test by checking for new error message

Caused by: f88b353454

Fixes: https://tracker.ceph.com/issues/42476

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-10-24 11:37:30 -07:00
Sage Weil
29c97547a9 Merge PR #30859 into master
* refs/pull/30859/head:
	auth: EACCES, not EPERM
	mon: shunt old tell commands from cli interface to asok
	mon: allow mgr to tell mon.foo smart
	mon: include quorum features in quorum_status
	qa/workunits/mon/caps.sh: fix test
	ceph_test_rados_api_cmd: fix MonDescribe test
	Merge branch 'vstart-fs-auth' of git://github.com/batrick/ceph into wip-cleanup-mon-asok
	test/pybind/test_ceph_argparse: fix tests
	vstart: add volume client keys to keyring
	vstart: use fs authorize to create master client key
	vstart: redirect some output to stderr
	vstart: output command strings to stderr
	qa/workunits/cephtool/test.sh: fix 'quorum enter' caller
	qa: change mon_status calls to quorum_status or tell commands
	mon: fix 'heap ...' command
	mon: consolidate 'sync force' commands
	mon: allow asok commands to return an error code
	mon: move 'quorum enter|exit' and 'mon_status' to asok
	mon: fix 'smart' asok command
	mon: remove old 'config set' and 'injectargs'

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-10-23 21:05:42 -05:00
Sage Weil
d7bd029b51 test_ceph_daemon: test unit, enter, shell
Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-23 15:08:55 -05:00
Sage Weil
86b2c8dd60 ceph-daemon: drop exec
It's not identical to enter.  enter seems more intuitive to me, but that
may be because I'm not a longtime docker user.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-23 15:08:55 -05:00
Sage Weil
202d615d38 qa/standalone/test_ceph_daemon.sh: add new functional tests
- sudo as needed
- clean up afterward

There is still a bit of missing coverage, but this captures most of it.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-23 15:08:55 -05:00
Sage Weil
70367de903 qa: change mon_status calls to quorum_status or tell commands
The tests were doing lots of 'ceph mon_status'; change that to
quorum_status or tell.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-12 12:05:36 -05:00
Sage Weil
1e44d86b2c osd: change trigger_[deep_]scrub tommands to a pg tell command
This is cleaner.  All users are currently standalone tests; updated.

It also means that *all* commands that have a name=pgid arg are pg tell
commands.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-04 09:07:02 -05:00
Sage Weil
d8d2b71db5 qa/standalone/mon/health-mute: use power of 2 for pg_num
Signed-off-by: Sage Weil <sage@redhat.com>
2019-09-26 09:29:32 -05:00
Sage Weil
ab594b9b31 Merge PR #30475 into master
* refs/pull/30475/head:
	qa/standalone/ceph-helpers: default pg autoscale mode off for standalone
	os/bluestore: fix objectstore_blackhole read-after-write
	test,misc: do not specify pg_num per pool
	mgr/volumes: do not specify pg_num
	pybind/ceph_volume_client: do not specify pg_num for new pools
	doc: remove all pg_num arguments to 'osd pool create'
	mon: do not require pg_num to 'osd pool create'
	common: default pg_autoscale_mode=on for new pools

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-09-23 09:12:42 -05:00
Sage Weil
f71672c6ad qa/standalone/ceph-helpers: default pg autoscale mode off for standalone
Signed-off-by: Sage Weil <sage@redhat.com>
2019-09-22 16:59:07 -05:00
Sage Weil
8994a65242 qa/standalone/osd/divergent-priors: add reproducer for bug 41816
Reproducer for https://tracker.ceph.com/issues/41816

Signed-off-by: Sage Weil <sage@redhat.com>
2019-09-21 10:09:15 -05:00
David Zafman
b3e1c58b0e osd: Replace active/pending scrub tracking for local/remote
This is similar to how recovery reservations are split between
local and remote.

It was the case that scrubs_pending was used for reservations at
the replicas as well as at the primary while requesting reservations
from the replicas.  There was no need for scrubs_pending to turn
into scrubs_active at the primary as nothing treated that value
as special.  scrubber.active = true when scrubbing is
actually going.

Now scrubber.local_reserved indicates scrubs_local incremented
Now scrubber.remote_reserved indicates scrubs_remote incremented

Fixes: https://tracker.ceph.com/issues/41669

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-09-10 13:33:27 -07:00
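The local/remote split in b3e1c58b0e can be modeled as two counters sharing one scrub budget per OSD, with each PG's `local_reserved`/`remote_reserved` flags recording which counter it incremented. This is a hypothetical Python model: the class and method names are illustrative, and the assumption that both counters share a single `osd_max_scrubs` limit is labeled in the code.

```python
class ScrubReservations:
    """Hypothetical per-OSD scrub slots, split into local and remote."""

    def __init__(self, osd_max_scrubs: int = 1):
        self.osd_max_scrubs = osd_max_scrubs
        self.scrubs_local = 0   # scrubs this OSD runs as primary
        self.scrubs_remote = 0  # reservations granted to other primaries

    def can_inc_scrubs(self) -> bool:
        # Assumption: both kinds draw from one osd_max_scrubs budget.
        return self.scrubs_local + self.scrubs_remote < self.osd_max_scrubs

    def inc_scrubs_local(self) -> bool:
        if not self.can_inc_scrubs():
            return False
        self.scrubs_local += 1  # a PG would then set local_reserved = True
        return True

    def inc_scrubs_remote(self) -> bool:
        if not self.can_inc_scrubs():
            return False
        self.scrubs_remote += 1  # a PG would then set remote_reserved = True
        return True

    def dec_scrubs_local(self) -> None:
        assert self.scrubs_local > 0
        self.scrubs_local -= 1

    def dec_scrubs_remote(self) -> None:
        assert self.scrubs_remote > 0
        self.scrubs_remote -= 1
```

Keeping the counters separate means a release always decrements the counter that was actually incremented, mirroring how recovery reservations are split.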
David Zafman
b98950e707 osd: Rename dump_reservations to dump_recovery_reservations
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-09-10 13:32:29 -07:00
David Zafman
6d2e4cb109 test: Allow fractional milliseconds to make test possible
Fixes: https://tracker.ceph.com/issues/41689

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-09-06 11:23:52 -07:00
David Zafman
336b6b66ca Merge pull request #28755 from dzafman/wip-network
feature: Health warnings on long network ping times, add "dump_osd_network" to get a report

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-09-05 07:54:43 -07:00
David Zafman
5f83a6158b osd doc mon mgr: To milliseconds for config value, user input and threshold out
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-09-04 17:13:32 +00:00
David Zafman
87d80eb417 test: ceph-objectstore-tool add remove --force with bad snapset test
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-08-27 22:30:02 +00:00
David Zafman
4fb42ea27e test: Add basic test for network ping tracking
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-08-26 15:25:34 +00:00
Sage Weil
2dca76ac84 Merge PR #29774 into master
* refs/pull/29774/head:
	qa/standalone/scrub/osd-scrub-snaps: snapmapper omap is now 'm'

Reviewed-by: David Zafman <dzafman@redhat.com>
2019-08-22 12:27:26 -05:00