Commit Graph

658 Commits

Author SHA1 Message Date
David Zafman
a4fd1d650e Revert "qa/standalone/scrub/osd-recovery-scrub: fix unnoticed recovery state"
This reverts commit 1323bdb839e77fb27cba36ef2725bb7f163b1db4.

The test needs to scrub while recovery is in progress, so catching
recovery from the logs after the fact isn't the proper setup.
We can use osd_recovery_sleep config.

Signed-off-by: David Zafman <dzafman@redhat.com>
2021-03-13 11:40:55 -08:00
David Zafman
dd63577ab3 test: Add test for scrub parallelism
Signed-off-by: David Zafman <dzafman@redhat.com>
2021-03-05 11:41:26 -08:00
Sage Weil
5e197a21e6 Merge PR #39455 into master
* refs/pull/39455/head:
	doc/man/8/ceph: document --max option
	src/test/osd/safe-to-destroy: adjust test
	ceph: print command output to stdout even on error
	mgr/DaemonServer: include details in 'osd ok-to-stop' output
	mgr: add --max <n> to 'osd ok-to-stop' command
	mgr: relax osd ok-to-stop condition on degraded pgs

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-02-27 10:15:27 -05:00
Sage Weil
33dee7d7bf crush/CrushWrapper: update shadow trees on update_item()
insert_item() already does this, but update_item did not.

Fixes: https://tracker.ceph.com/issues/48065
Signed-off-by: Sage Weil <sage@newdream.net>
2021-02-22 14:21:04 -06:00
Sage Weil
722f57dee1 mgr: add --max <n> to 'osd ok-to-stop' command
Given an initial (set of) osd(s), provide up to N OSDs that can be
stopped together without making PGs become unavailable.

This can be used to quickly identify large(r) batches of OSDs that can be
stopped together to (for example) upgrade.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-02-20 09:53:51 -05:00
Kefu Chai
8dc097ff46 qa/standalone/mon/misc: verify that len(monmap.features.persistent) == 9
in beb62c029a, FEATURE_QUINCY was added to
ceph::features::mon::get_persistent(), so update the test accordingly.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-01-30 22:45:20 +08:00
Sage Weil
7bbc92eda3 mon: updates for quincy
Signed-off-by: Sage Weil <sage@newdream.net>
2021-01-28 13:29:28 -06:00
Neha Ojha
5c11f40c12
Merge pull request #38856 from dzafman/wip-48789
test: Fix osd-scrub-snaps.sh to handle DB format change

Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-01-15 16:27:59 -08:00
Neha Ojha
6fc9166af4
Merge pull request #38726 from ronen-fr/wip-ronenf-48720
qa/standalone/scrub/osd-recovery-scrub: handle primary change when waiting for scrub

Reviewed-by: David Zafman <dzafman@redhat.com>
2021-01-15 13:46:30 -08:00
David Zafman
af9befb0f4 test: Fix osd-scrub-snaps.sh to handle DB format change
Caused by: f9c95fa7fc

Fixes: https://tracker.ceph.com/issues/48789

Signed-off-by: David Zafman <dzafman@redhat.com>
2021-01-15 10:35:30 -08:00
David Zafman
4814648155 test: osd-recovery-prio.sh replace sleep with wait for both PGs
recovering

fixes: https://tracker.ceph.com/issues/48842

Signed-off-by: David Zafman <dzafman@redhat.com>
2021-01-11 17:30:00 -08:00
Ronen Friedman
1323bdb839 qa/standalone/scrub/osd-recovery-scrub: fix unnoticed recovery state
The 'recovering' state is transitory. Existing code looks for it by
polling 'pg stat', missing it from time to time.
New version searches the tails of the relevant OSDs' logs.

Fixes: https://tracker.ceph.com/issues/48719
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2021-01-04 13:29:41 +02:00
Ronen Friedman
bb848cfd90 qa/standalone/ceph-helpers.sh: log meaningful PIDs for run_in_background()
While the relevant comment says:
'# Execute the command and prepend the output with its pid'
the actual PID logged is the same for all background processes,
which isn't very helpful.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2020-12-28 10:47:02 +02:00
Ronen Friedman
445db7f171 qa/standalone/scrub/osd-recovery-scrub: handle a Primary change
Stop waiting for a scrub to happen if the Primary for the target
PG changes.

Fixes: https://tracker.ceph.com/issues/48720
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2020-12-28 10:42:41 +02:00
Ronen Friedman
dff7faaf3c qa/standalone/scrub/osd-scrub-snaps.sh: fix Python print syntax
Fixes: https://tracker.ceph.com/issues/48690
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2020-12-21 16:52:27 +02:00
Kefu Chai
694ed23e9d qa/standalone/misc/ver-health.sh: include the bootup-time
in my test bed, it takes 11 seconds to boot the 3 OSDs and to restart
one of them, which fails the test.

so we need to take the time into consideration. in this change, the
delay is added to the total "warn_older_version_delay", so the monitor
does not start sending warning earlier than expected.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-12-11 16:14:03 +08:00
Kefu Chai
4bcfa139ab mon/HealthMonitor: use timespan for mon_warn_older_version_delay
for better user experience

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-12-11 16:12:47 +08:00
Kefu Chai
1f5406a752 src/*: do not pass cct to ceph_version_to_str()
in e5b1ae5554, a new option named
"debug_version_for_testing" is introduced to override the version so
we can test version check.

in crimson, we have two families of shared functions.

- one of them is used by alien store. they are compiled with
  -DWITH_SEASTAR and -DWITH_ALIEN, to enable the shim code between
  seastar and POSIX thread.
- the other is used by crimson in general, where no lock is allowed.

currently, we use the "crimson" and "ceph" namespace to differentiate
these two families of functions, so they can colocate in the same
executable without violating the ODR. see src/include/common_fwd.h for
more details.

the functions defined in src/common/version.cc are also shared by
alien store and crimson code. and because we have different
implementations of `CephContext` in crimson and in classic OSD (i.e.
alienstore), we have to have different implementations of this function
as well, if we follow the same approach. but since these functions are
very simple and are non-blocking, there is not much value in
differentiating them, it is better to inject the test settings using
environment variable instead of using ceph option subsystem.

in this change, "ceph_debug_version_for_testing" environment variable is
checked instead, so that crimson and alienstore can share the same
compilation unit of version.cc. and "debug_version_for_testing" option
is removed.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-12-10 18:26:39 +08:00
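As a rough illustration of the approach described in this commit, here is a Python sketch. The real code is C++ in src/common/version.cc; the function name `version_to_str` and the `COMPILED_VERSION` placeholder are hypothetical, and the sketch assumes that the environment variable, when set, simply replaces the built-in version string.

```python
import os

# Hypothetical stand-in for the version string baked in at build time
COMPILED_VERSION = "16.0.0"


def version_to_str(environ=os.environ):
    """Return the version, honouring the test-only override if present."""
    # No configuration object is needed: only the process environment is read
    override = environ.get("ceph_debug_version_for_testing")
    return override if override else COMPILED_VERSION
```

Because the lookup depends only on the process environment, the same function can be shared by callers that have different configuration contexts.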
Ronen Friedman
43b1129030 test: cancelling both noscrub *and* nodeep-scrub
as part of osd-scrub-test.sh.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2020-12-09 20:16:23 +02:00
haoyixing
0e7e036aa7 doc/dev: use http://docs.ceph.com/en/latest/ instead of /docs/master/ for docs
Several links under http://docs.ceph.com/docs/master/ were inaccessible.
Change them to http://docs.ceph.com/en/latest so we can access them directly.

Signed-off-by: haoyixing <haoyixing@kuaishou.com>
2020-11-24 12:49:47 +08:00
David Zafman
89af82bf4f
Merge pull request #38054 from dzafman/wip-test-fixes
test: Fix osd-scrub-test.sh and ver-health.sh tests

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-11-18 08:52:28 -08:00
David Zafman
38c3130654 test: Fix TEST_scrub_extended_sleep test (corrected test name)
Didn't really test extended sleep in original code:
Caused by: 3bfb5c2621cf9b5e602bc37724b20c18eb852aea

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-16 18:30:14 -08:00
David Zafman
0a0ed890c2 test: Improve version checking test, to improve reliability
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-16 18:30:14 -08:00
Kefu Chai
0463a774c9
Merge pull request #37908 from dzafman/wip-47930
test: Fix race in TEST_recovery_scrub test

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-11-16 01:00:56 +08:00
David Zafman
870bde04a5 test: Changes based on code review comments
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-11 15:31:26 -08:00
David Zafman
93373746f5 osd test: Delay reporting until mon_warn_older_version_delay has passed
Move release notes description to 16.0.0 and update
Update documentation

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-11 15:10:11 -08:00
David Zafman
9d988c3dbc test: Simple test case for version health warning
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-11 15:10:11 -08:00
David Zafman
410e230d09 test: Fix race in TEST_recovery_scrub test
Fixes: https://tracker.ceph.com/issues/47930

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-10 00:45:13 +00:00
David Zafman
d3cc647583 osd: Eliminate day of week 7 and hour 24
Add test case for permitted hours to make sure scrub doesn't start
Remove permitted hours in extended sleep test

Fixes: https://tracker.ceph.com/issues/48077

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-09 22:47:00 +00:00
David Zafman
ef47a3e708 test: set mon_allow_pool_size_one for consistency with original test intention
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-03 21:49:00 +00:00
Neha Ojha
343107766e
Merge pull request #37483 from dzafman/wip-46405
osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning:  return 1

Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-10-08 11:44:00 -07:00
David Zafman
3ba7ebd3e2 test: Avoid races by waiting for PGs to go clean before query
Fixes: https://tracker.ceph.com/issues/46405

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-10-01 19:43:57 +00:00
David Zafman
b20a277f05 test: Inconsequential change to get object names as desired
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-09-29 18:01:24 +00:00
Prashant D
f8b7fddc4c mon: validate crush-failure-domain
While creating erasure-coded profile make sure
that user is specifying valid crush-failure-domain.

Fixes: https://tracker.ceph.com/issues/47452

Signed-off-by: Prashant Dhange <pdhange@redhat.com>
2020-09-22 07:27:22 -04:00
Patrick Donnelly
7eceaf45de
Merge PR #37202 into master
* refs/pull/37202/head:
	mon: allow overriding the initial mon_host

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-09-18 18:54:57 -07:00
Neha Ojha
8ba0a61a51
Merge pull request #35906 from gregsfortytwo/wip-stretch-mode
Add a new stretch mode for 2-site Ceph clusters

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2020-09-18 14:31:45 -07:00
Patrick Donnelly
ed3782e60a
mon: allow overriding the initial mon_host
This overrides what the CephContext believes to be the current quorum of
monitors (retrieved from other instances of the MonClient), introduced
by [1]. Tests need to be able to target a specific monitor for
exercising forwarding and other things.

[1] 731e2db9fb4611f767446a3c8e778a097ce70d35
Fixes: https://tracker.ceph.com/issues/47180
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2020-09-16 18:34:23 -07:00
Greg Farnum
9506d09e3b Merge remote-tracking branch 'origin/master' into wip-stretch-mode
Conflicts:
	src/include/ceph_features.h

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
2020-09-15 02:25:07 +00:00
David Zafman
5b0ba0e5a8 test: Modify test to check new feature might_have_unfound added to list_unfound
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-09-14 13:06:29 -07:00
Greg Farnum
d02625331c Merge remote-tracking branch 'origin/master' into wip-stretch-mode
2020-09-14 02:32:19 +00:00
Kefu Chai
e5b9b08cc4
Merge pull request #36962 from tchaikov/wip-qa-py3-cleanup
qa: py3 cleanups

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-09-10 09:39:20 +08:00
Neha Ojha
21c08f0be2 qa/*/mon/mon-last-epoch-clean.sh: mark osd out instead of down
The test should mark the OSD out to check if only "in" OSDs are considered by
the osdmap trimming logic.

Fixes: https://tracker.ceph.com/issues/47309
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-09-04 22:09:05 +00:00
Kefu Chai
5c758f63aa qa/standalone: always decode output from check_output()
we could pass `text=True` for better readability, but that's introduced
in python3.7, or pass `errors="ignore"` but it's too long.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-09-03 13:09:16 +08:00
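A minimal sketch of the pattern this commit describes, assuming a Python 3.6-compatible call: `subprocess.check_output()` returns `bytes`, so the result is decoded explicitly instead of relying on `text=True` (added in Python 3.7). The helper name `run_and_decode` and the child command are illustrative only and are not taken from the qa/standalone scripts.

```python
import subprocess
import sys


def run_and_decode(cmd):
    """Run cmd and return its stdout as str (hypothetical helper)."""
    raw = subprocess.check_output(cmd, stderr=subprocess.DEVNULL)
    # check_output() returns bytes unless a text mode is requested
    return raw.decode()


# Use the current interpreter as a portable child process
output = run_and_decode([sys.executable, "-c", "print('hello')"])
```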
Kefu Chai
eda90040ad qa: always use subprocess.{DEVNULL,check_output}
no need to check for their existence and prepare a replacement,
because we've migrated to python3 and only support python3.6 and up.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-09-03 13:09:16 +08:00
Kefu Chai
4f6443737e
Merge pull request #30838 from ifed01/wip-ifed-single-alloc
os/bluestore: use single allocator for shared bluestore/bluefs device

Reviewed-by: Sage Weil <sage@redhat.com>
2020-08-03 18:00:16 +08:00
Igor Fedotov
9a8f1ae492 os/bluestore: fix bluefs migrate/expand to match single allocator.
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2020-07-31 15:36:47 +03:00
Dan van der Ster
b550112dba qa/standalone/osd: add bad-inc-map.sh
Test that the osd doesn't crash when it gets a bad incremental osdmap.

Related-to: https://tracker.ceph.com/issues/46443
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
2020-07-28 23:15:42 +02:00
David Zafman
365e48d6ec test: Check for interruption of scrubs with noscrub/nodeep_scrub
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-07-24 11:41:20 -07:00
David Zafman
f272768802 test: mon-last-epoch-clean.sh fixed to avoid shell globbing
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-07-24 11:40:24 -07:00
Greg Farnum
3ee09571dc qa: update the mon/misc.sh script for the new feature count
I have absolutely no idea why it's counting features, but
apparently it is and bumping the value to 7 makes it pass.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
2020-07-20 07:08:50 +00:00
Kefu Chai
0ac787be2a qa/standalone: drop py2 support
Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-07-05 10:58:28 +08:00
Kefu Chai
48f0e02d76 qa/standalone: flake8 fixes
Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-06-23 23:01:27 +08:00
Neha Ojha
64bcd436cc
Merge pull request #35632 from dzafman/wip-46064
tools: Add statfs operation to ceph-objectstore-tool

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-06-18 16:25:04 -07:00
David Zafman
19054ceb43 tools: Add statfs operation to ceph-objectstore-tool
Fixes: https://tracker.ceph.com/issues/46064

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-06-18 10:07:38 -07:00
David Zafman
41322eaa62 test: flush_pg_stats() ignore OSDs that don't respond to getting sequence
This eliminates bogus errors in the logs and returned from flush_pg_stats()

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-06-16 17:45:26 -07:00
David Zafman
661996d434 mgr: Warn when too many reads are repaired on an OSD
Include test case
Configurable by setting mon_osd_warn_num_repaired (default 10)
Ignore new health warning with random eio injection test

Fixes: https://tracker.ceph.com/issues/41564

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-06-16 17:45:27 -07:00
David Zafman
1efa5ca0a6
Merge pull request #35425 from dzafman/wip-44314
test: osd-backfill-stats.sh use nobackfill to avoid races in remainin…

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-06-09 17:15:52 -07:00
David Zafman
92f970cbed test: osd-backfill-stats.sh use nobackfill to avoid races in remaining test
Fixes: https://tracker.ceph.com/issues/44314

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-06-05 17:48:10 -07:00
Yuri Weinstein
b8f632327f
Merge pull request #35279 from badone/wip-py2-fix-osd-scrub-repair.sh
qa/*/osd-scrub-repair.sh: Convert to python3 print syntax

Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-06-03 11:12:21 -07:00
Neha Ojha
3a06af5af5 qa/standalone/scrub/osd-scrub-snaps.sh: fix grep pattern
The error looks like this:

2020-05-28T20:56:30.214+0000 7f66cdecf700 -1 log_channel(cluster) log [ERR] : scrub 1.0 1:ab946124:::obj15:head : can't decode 'snapset' attr void SnapSet::decode(ceph::buffer::v15_2_0::list::const_iterator&) no longer understand old encoding version 3 < 97: Malformed input

Fixes: https://tracker.ceph.com/issues/45760
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-05-28 22:41:38 +00:00
Neha Ojha
f72b19d09c qa/standalone/scrub/osd-scrub-repair.sh: fix grep pattern to match decode exception
We fail because the error message in the log looks like:

2020-05-27T21:02:48.447+0000 7fbfc4e60700 -1 log_channel(cluster) log [ERR] : scrub 3.0 3:5c7b2c47:::ROBJ16:head : can't decode 'snapset' attr void SnapSet::decode(ceph::buffer::v15_2_0::list::const_iterator&) no longer understand old encoding version 3 < 97: Malformed input

Fixes: https://tracker.ceph.com/issues/45660
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-05-28 00:38:17 +00:00
Brad Hubbard
80e7b7c19b qa/*/osd-scrub-repair.sh: Convert to python3 print syntax
Fixes: https://tracker.ceph.com/issues/45733

Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
2020-05-28 08:32:54 +10:00
Neha Ojha
7c8b627eaa qa/*/osd-scrub-repair.sh: don't fail if PG is in active+clean+wait
a0b453ad33 added the wait state, which can
make PGs stay in active+clean+wait for a while instead of going into
active+clean directly. As far as TEST_auto_repair_bluestore_failed is
concerned, we only care about the repair state being cleared.

Fixes: https://tracker.ceph.com/issues/45075
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-04-23 20:24:28 +00:00
Neha Ojha
4f82ebf41b qa/standalone/scrub/osd-scrub-repair.sh: fix race in TEST_auto_repair_bluestore_failed
We need to flush_pg_stats before checking for active+clean.

Fixed: https://tracker.ceph.com/issues/45075
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-04-20 18:29:51 +00:00
Neha Ojha
61ad12e6ad
Merge pull request #34541 from neha-ojha/wip-balancer-on
mgr: turn on balancer in upmap mode by default

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2020-04-15 15:03:28 -07:00
Kefu Chai
eff9d0fc9a
Merge pull request #19076 from jecluis/wip-mon-fix-osdmap-lec-trim
mon/OSDMonitor: allow trimming maps even if osds are down

Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-04-15 08:02:51 +08:00
Neha Ojha
ec85af5b19 qa/standalone/mon/osd-pool-df.sh: flush_pg_stats explicitly
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-04-14 19:09:45 +00:00
Neha Ojha
321faa9c6b qa/standalone/mon/osd-pool-df.sh: fix test to check for the right values
Though the test passed, we weren't checking for the correct values:

.../qa/standalone/mon/osd-pool-df.sh:62: TEST_ceph_df:  ceph df -f json
.../qa/standalone/mon/osd-pool-df.sh:62: TEST_ceph_df:  jq .stats.total_avail_bytes
../qa/standalone/mon/osd-pool-df.sh:62: TEST_ceph_df:  local global_avail=0
.../qa/standalone/mon/osd-pool-df.sh:63: TEST_ceph_df:  ceph df -f json
.../qa/standalone/mon/osd-pool-df.sh:63: TEST_ceph_df:  jq '.pools | map(select(.name == "$rep_poolname"))[0].stats.max_avail'
../qa/standalone/mon/osd-pool-df.sh:63: TEST_ceph_df:  local rep_avail=null
.../qa/standalone/mon/osd-pool-df.sh:64: TEST_ceph_df:  ceph df -f json
.../qa/standalone/mon/osd-pool-df.sh:64: TEST_ceph_df:  jq '.pools | map(select(.name == "$ec_poolname"))[0].stats.max_avail'
../qa/standalone/mon/osd-pool-df.sh:64: TEST_ceph_df:  local ec_avail=null
../qa/standalone/mon/osd-pool-df.sh:66: TEST_ceph_df:  echo '0 >= null*3'
../qa/standalone/mon/osd-pool-df.sh:66: TEST_ceph_df:  bc
1
../qa/standalone/mon/osd-pool-df.sh:67: TEST_ceph_df:  echo '0 >= null*1.5'
../qa/standalone/mon/osd-pool-df.sh:67: TEST_ceph_df:  bc
1

Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-04-14 00:05:02 +00:00
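The trace above shows why the test passed vacuously: inside single quotes the shell never expands `$rep_poolname`, so the jq filter compares pool names against the literal text, finds nothing, and yields `null`. A rough Python analogue of that selection follows; the pool names and numbers are made up for illustration and are not values from the test.

```python
def max_avail(pools, name):
    """Mimic jq's map(select(.name == name))[0].stats.max_avail."""
    matches = [p for p in pools if p["name"] == name]
    # jq yields null for a missing element; None plays that role here
    return matches[0]["stats"]["max_avail"] if matches else None


# Hypothetical sample of the 'pools' array from `ceph df -f json`
pools = [
    {"name": "rep", "stats": {"max_avail": 100}},
    {"name": "ec", "stats": {"max_avail": 200}},
]
rep_poolname = "rep"

unexpanded = max_avail(pools, "$rep_poolname")  # what single quotes produced
expanded = max_avail(pools, rep_poolname)       # what the test intended
```

Only the second call returns a real value, which is what the comparison in the test needs.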
Neha Ojha
480afa61b6 qa/standalone/mgr/balancer.sh: adapt test
Now that the balancer is on by default the test needs these changes.

Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-04-14 00:05:02 +00:00
Sage Weil
731e508bbe qa/standalone/mon/msgr-v2-transition: remove test
v2 was introduced in nautilus, and we don't support mimic -> pacific
upgrades (only mimic -> octopus).  This test can be removed!

Signed-off-by: Sage Weil <sage@redhat.com>
2020-04-08 08:10:32 -05:00
Sage Weil
279c437994 qa/standalone/mon/misc: update TEST_mon_features
Signed-off-by: Sage Weil <sage@redhat.com>
2020-04-08 08:10:32 -05:00
Kefu Chai
b1738cd1ef qa/standalone/scrub: s/$(pgid)/${pgid}/
to address the test failures like
```
2020-04-07T15:44:58.693 INFO:tasks.workunit.client.0.smithi049.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:498: TEST_auto_repair_bluestore_failed:  ceph pg dump pgs
2020-04-07T15:44:58.694 INFO:tasks.workunit.client.0.smithi049.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:498: TEST_auto_repair_bluestore_failed:  pgid
2020-04-07T15:44:58.694 INFO:tasks.workunit.client.0.smithi049.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh: line 498: pgid: command not found
```

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-04-08 00:54:46 +08:00
Sage Weil
04e0b9c2f8 Merge PR #34126 into master
* refs/pull/34126/head:
	qa/*/osd-backfill-recovery-log.sh: flush_pg_stats before checking log length

Reviewed-by: Sage Weil <sage@redhat.com>
2020-03-23 13:55:16 -05:00
Neha
cfebec1b12 qa/*/osd-backfill-recovery-log.sh: flush_pg_stats before checking log length
It is possible for the pg dump to not be the latest when we check for newprimary
in _common_test(). This is because mgr_stats_period is 5 seconds, and we may not
have fetched the latest stats just yet. This causes the test to look at the same
stats before and after wait_for_clean.

Fixes: https://tracker.ceph.com/issues/43807 (2)
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-03-23 15:37:12 +00:00
Joao Eduardo Luis
3d682c21f6 qa/standalone: exercise osdmon's last epoch clean
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
2020-03-23 14:58:59 +00:00
Kefu Chai
b0dca75a59
Merge pull request #34056 from xiexingguo/wip-44662
qa/*/osd-markdown.sh: propagate map to osd before testing its reaction

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-03-21 14:27:51 +08:00
xie xingguo
afdff0cd3f qa/*/osd-markdown.sh: propagate map to osd before testing its reaction
Mon might fail to share the newest map with any of up osds, e.g.,
due to an injected broken pipe. Since we don't have any client
activities during the osd-markdown tests, osds might be unaware of
the map changes made through CLI. Make sure osds have pulled the
newest map down before we can test its reaction correctly.

Fixes: https://tracker.ceph.com/issues/44662
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2020-03-19 18:17:28 +08:00
Neha
6edd1cb686 qa/standalone/osd/osd-backfill-stats.sh: get_latest_osdmap to propagate map change
Fixes: https://tracker.ceph.com/issues/44518
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-03-18 22:57:41 +00:00
Sage Weil
603383605f Merge PR #33885 into master
* refs/pull/33885/head:
	Merge pull request #33848 from mchangir/octopus-tests-remove-suprious-whitespace
	Merge PR #33746 into octopus
	Merge PR #33830 into octopus
	Merge PR #33732 into octopus
	Merge PR #33620 into octopus
	Merge pull request #33876 from tchaikov/octopus-cephadm-mypy
	cephadm: add "assert foo is not None" for mypy check
	Merge pull request #33067 from tspmelo/wip-rbd-delete-with-snapshot
	cephadm: add grafana adopt
	Merge PR #33771 into octopus
	Merge PR #33850 into octopus
	Merge PR #33853 into octopus
	Merge PR #33857 into octopus
	Merge PR #32990 into octopus
	Merge PR #33713 into octopus
	Merge PR #33838 into octopus
	qa/tasks/cephadm: no default mon|mgr|crash service specs
	qa/suites/rados/cephadm/upgrade: upgrade start point that supports the no-spec option
	Merge PR #33832 into octopus
	cephadm: bootstrap: wait for mgr to restart after enabling a module
	mgr: add 'mgr_status' tell command
	Merge pull request #33839 from rhcs-dashboard/44538-fix-rgw-grafana-get-put-latencies
	Merge pull request #33743 from votdev/issue_43869_fix_qa_test
	cephadm: create initial mon and mgr service specs too
	cephadm: no need to pregenerate a crash key for the bootstrap host
	mgr/cephadm: do not complain when we don't have enough hosts
	mgr/cephadm: remove orphan daemons
	mgr/cephadm: report size=0 for fabricated ServiceDescription
	mgr/cephadm: safety check to prevent removing all mon|mgr daemons
	mgr/cephadm: prevent scaling mon|mgr below count=1
	mgr/cephadm: do not remove daemons from remove_service
	Merge pull request #33805 from tchaikov/wip-44500
	spec: Podman (temporarily) requires apparmor-abstractions on suse
	mgr/cephadm: Make sure we don't co-locate the same daemon
	monitoring: fix RGW grafana chart 'Average GET/PUT Latencies'
	tests: remove spurious whitespace
	mgr/cephadm: fix service list filtering
	Merge PR #33825 into octopus
	Merge PR #33811 into octopus
	Revert "Merge pull request #33673 from cbodley/wip-denc-enum"
	mgr/cephadm: fix upgrade order
	Merge PR #33801 into octopus
	Merge PR #33822 into octopus
	cephadm: bootstrap: tolerate error return from -h
	Merge PR #33809 into octopus
	Merge PR #32678 into octopus
	cephadm: use `sh` instead of `bash` during enter
	ceph.in: only shut down rados on clean exit
	common/ceph_timer: Pass reference to waited time on stack
	common/ceph_timer: Add test
	common/ceph_timer: Use unique_function, allowing noncopyable events
	common/ceph_timer: Couple cleanups
	common/ceph_timer: Fix namespaces
	common/ceph_timer: Add missing includes
	common/ceph_timer.h: Don't indent contents of a namespace
	mgr/dashboard: Crush rule modal
	mgr/dashboard: Preserve rule selection on pool type change
	mgr/dashboard: Crush rule is only send during replicated pool creation
	mgr/dashboard: Explicit returns in pool form
	mgr/dashboard: Removes fork join in pool form
	mgr/dashboard: Hide ECP actions during ec pool edit
	mgr/dashboard: Pool form erasure/replicated boolean
	mgr/dashboard: Change pool info API endpoint
	mgr/dashboard: Moves ECP info endpoint to UI-API
	mgr/cephadm: add _remove_osds_bg back to main loop
	mgr/cephadm/osd: update removal report immediately
	qa/tasks/ceph_manager: use StringIO for capturing COT output
	qa/standalone/scrub/osd-scrub-repair: force osdmap prop to osds
	qa/standalone/scrub/osd-scrub-test: wait longer for update
	qa/tasks/ceph_manager: capture stderr for COT
	qa/suites/rados/ceph: drop opensuse for now
	mon/MonClient: send logs to mon on separate schedule than pings
	mgr/dashboard: Fix missing ImageSpec usage
	mgr/dashboard: Allow removing RBD with snapshots
	mgr/dashboard: Refactor and cleanup tasks.mgr.dashboard.test_user
	mgr/dashboard: support multiple DriveGroups when creating OSDs
	mon/MonClient: send logs to mon even if we have no keepalive2
	cephadm: flag dashboard user to change password

Reviewed-by: Sebastian Wagner <swagner@suse.com>
2020-03-11 17:38:59 -05:00
Neha Ojha
6117a0d4db
Merge pull request #33281 from ideepika/wip-set-osd-pool-size-extra-param-check
mon/OSDMonitor: add flag `--yes-i-really-mean-it` for setting pool size 1

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-03-09 19:14:50 -07:00
Sage Weil
3212932ba1 Merge PR #33809 into octopus
* refs/pull/33809/head:
	qa/standalone/scrub/osd-scrub-repair: force osdmap prop to osds
	qa/standalone/scrub/osd-scrub-test: wait longer for update

Reviewed-by: David Zafman <dzafman@redhat.com>
2020-03-09 15:28:19 -05:00
Deepika Upadhyay
21508bd9dd mon/OSDMonitor: add flag --yes-i-really-mean-it for setting pool size 1
Adds option `mon_allow_pool_size_one` which will be disabled by default
to ensure pools are not configured without replicas.
If the user still wants to use pool size 1, they will have to change the
value of `mon_allow_pool_size_one` to true and then have to pass flag
`--yes-i-really-mean-it` to cli command:

Example:
`ceph osd pool test set size 1 --yes-i-really-mean-it`

Fixes: https://tracker.ceph.com/issues/44025
Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
2020-03-09 23:27:36 +05:30
Sage Weil
0447ed0ff9 qa/standalone/scrub/osd-scrub-repair: force osdmap prop to osds
flush_pg_stats isn't sufficient to ensure that OSDs have the latest
OSDMap.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-08 14:52:10 -05:00
Sage Weil
ac9befd450 qa/standalone/scrub/osd-scrub-test: wait longer for update
Fixes: https://tracker.ceph.com/issues/43865
Signed-off-by: Sage Weil <sage@redhat.com>
2020-03-08 14:45:00 -05:00
David Zafman
e509b7c7d0 test: Add flush_pg_stats to avoid race with getting num_shards_repaired
Fixes: https://tracker.ceph.com/issues/44439

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-03-06 04:25:37 +00:00
Kefu Chai
c6088bdd26
Merge pull request #33593 from dzafman/wip-cot-fix
test: Fix failing ceph_objectstore_tool.py test

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-03-02 18:58:19 +08:00
Kefu Chai
7b0e18c09e
Merge pull request #33566 from dzafman/wip-44296
test: Expect being off by up to 2 and make sure all PGs are active+clean

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2020-02-28 11:42:47 +08:00
David Zafman
08f7e7980f test: Fix failing ceph_objectstore_tool.py test
The -N option to vstart.sh was removed, use -k

The old hinfo_key binary happened to be utf-8 decodable; now decoding
it throws an exception. Use the new option to ceph-objectstore-tool
to treat stdout as a terminal and convert binary data to base64.

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-02-27 18:14:36 -08:00
David Zafman
49d9c7d664 test: Expect being off by up to 2 and make sure all PGs are active+clean
Fixes: https://tracker.ceph.com/issues/44296

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-02-27 18:12:25 -08:00
David Zafman
587cd64207
Merge pull request #32342 from dzafman/wip-43126
mon: Improvements to slow heartbeat health messages

Reviewed-by: Sage Weil <sage@redhat.com>
2020-02-25 17:42:00 -08:00
Sage Weil
4d42b4c5a0 common/TextTable: default to 2 spaces separating columns
This is what other projects and libraries default to, and it is more
legible.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-02-23 15:46:30 -06:00
Sage Weil
5afec0fbfb Merge PR #33091 into master
* refs/pull/33091/head:
	qa/suites/rados: disable device scraping
	qa/standalone/ceph-helpers: disable device monitoring
	qa/tasks/ceph.py: add pre-mgr-commands option for ceph task
	mgr/devicehealth: set default monitoring to 'on'

Reviewed-by: Sage Weil <sage@redhat.com>
2020-02-22 12:05:55 -06:00
xie xingguo
023524a26d osd/PeeringState: restart peering on any previous down acting member coming back
One of our customers wants to verify the data safety of Ceph during scaling
the cluster up, and the test case looks like:
- keep checking the status of a specified pg, whose up set is [1, 2, 3]
- add more osds: up [1, 2, 3] -> up [1, 4, 5], acting = [1, 2, 3], backfill_targets = [4, 5],
  pg is remapped
- stop osd.2: up [1, 4, 5], acting = [1, 3], backfill_targets = [4, 5], pg is undersized
- restart osd.2, acting will stay unchanged as 2 belongs to neither current up nor acting set,
  hence leaving the corresponding pg pinned undersized for a long time until all backfill
  targets complete

It does not pose any critical problem -- we'll end up getting that pg back into active + clean,
except that the long-lived DEGRADED warnings keep bothering our customer who cares about data
safety more than anything else.

The right way to achieve the above goal is for:

	boost::statechart::result PeeringState::Active::react(const MNotifyRec& notevt)

to check whether the newly booted node could be validly chosen for the acting set and
request a new temp mapping. The new temp mapping would then trigger a real interval change
that will get rid of the DEGRADED warning.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Signed-off-by: Yan Jun <yan.jun8@zte.com.cn>
2020-02-21 17:52:52 +08:00
Sage Weil
455cdcf89a qa/standalone/ceph-helpers: disable device monitoring
Signed-off-by: Sage Weil <sage@redhat.com>
2020-02-19 15:31:26 -06:00
Sage Weil
f10cc22c60 Merge PR #32961 into master
* refs/pull/32961/head:
	qa/standalone/osd/osd-bench: debug bluestore

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-01-30 10:42:17 -06:00
Sage Weil
b99e506a3f qa/standalone/osd/osd-bench: debug bluestore
Looking for https://tracker.ceph.com/issues/43888

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-29 07:43:41 -06:00
David Zafman
e18519ad09 test: Update pg log test for new trimming behavior
Fixes: https://tracker.ceph.com/issues/43864

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-28 15:23:45 -08:00
Neha
b20817795a qa/standalone/osd/osd-backfill-recovery-log.sh: fix TEST_backfill_log_2
Fixes: https://tracker.ceph.com/issues/43807
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-01-24 22:42:04 +00:00
Neha
994698277b qa/standalone/osd/osd-backfill-recovery-log.sh: fix TEST_backfill_log_1
Fixes: https://tracker.ceph.com/issues/43807
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-01-24 22:20:21 +00:00
Sage Weil
76ea774c10 qa/standalone/misc/ok-to-stop: improve test
Make sure PGs peer (simply flushing state to mon isn't enough).

Fixes: https://tracker.ceph.com/issues/43721
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-20 13:24:30 -06:00
Sage Weil
78ec6aec90 qa/standalone/ceph-helpers: add wait_for_peered
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-20 13:23:56 -06:00
Sage Weil
c5710bc8fb Merge PR #32628 into master
* refs/pull/32628/head:
	test: Fix wait_for_state() to wait for a PG to get into a state

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-01-18 14:39:19 -06:00
Sage Weil
65fbc620b6 qa/standalone/mon/osd-create-pool: fix utf-8 grep LANG
This needs en_US.UTF-8... en_US does not work.

Fixes: https://tracker.ceph.com/issues/43422
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-17 14:19:53 -06:00
David Zafman
886475b5fe mon: Improvements to slow heartbeat health messages
Include crush parentage for each osd

Fixes: https://tracker.ceph.com/issues/43126

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-14 18:06:44 +00:00
David Zafman
9f7aabbe9f test: Fix wait_for_state() to wait for a PG to get into a state
To avoid confusion, rename the functions in osd-backfill-space.sh to match
how they actually work.

Fixes: https://tracker.ceph.com/issues/43592

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-13 18:39:38 -08:00
David Zafman
c65d5c8d14 test: Sort pool list because the order isn't guaranteed from "balancer pool ls"
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-06 21:35:19 -08:00
David Zafman
b0a1b758d0 mgr: Change default upmap_max_deviation to 5
Fixes: https://tracker.ceph.com/issues/43312

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-06 21:35:19 -08:00
David Zafman
8e46bbbf36 test: Fix test case for pool based balancing instead of rule batched
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-06 21:35:19 -08:00
Sage Weil
acd4f5bc43 qa/standalone: python -> python3
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-20 13:33:21 -06:00
Sage Weil
e47526e152 qa/standalone/special/ceph_objectstore_tool: python3
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-20 13:32:53 -06:00
Thomas Bechtold
0127cd1e88 qa: Enable flake8 tox and fix failures
There were a couple of problems found by flake8 in the qa/
directory (most of them fixed now). Enabling flake8 during the usual
check runs hopefully avoids adding new issues in the future.

Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
2019-12-12 10:21:01 +01:00
Sage Weil
137fa64e12 qa: rename ceph-daemon tests -> cephadm
Also move the workunit to a better location.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-11 19:14:09 -06:00
Sage Weil
c8750b7066 files,rpm,deb: rename ceph-daemon -> cephadm
This is just renaming the files and adjusting the packages.  Lots of
cleanup to do still.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-11 19:14:09 -06:00
Sage Weil
80cbe97e7b qa/standalone/test_ceph_daemon.sh: disable adoption for the moment
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-11 07:32:29 -06:00
Sage Weil
6d3a035b26 qa/standalone/test_ceph_daemon.sh: clone corpus explicitly
When this is run by teuthology we don't have a full ceph source tree
checkout with submodules.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-10 11:34:42 -07:00
Michael Fritch
4aa7d5582b ceph-daemon: re-enable the OSD standalone test
Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-12-10 11:34:42 -07:00
Michael Fritch
a0eed4cb84 ceph-daemon: move standalone test tgz to corpus
Fixes: https://tracker.ceph.com/issues/42876
Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-12-10 11:32:18 -07:00
Sage Weil
0e981c4c30 Merge PR #32138 into master
* refs/pull/32138/head:
	ceph-daemon: combine SUDO and ARGS into a single var

Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2019-12-10 12:16:07 -06:00
Kefu Chai
a11ae900e9 Merge pull request #32052 from mgfritch/wip-cd-standalone-tempfiles
ceph-daemon: clean-up tempfiles on EXIT

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
2019-12-10 16:57:48 +08:00
Sage Weil
dcb5e9b6d8 Merge PR #32098 into master
* refs/pull/32098/head:
	ceph-daemon: py2: tolerate whitespace before config key name

Reviewed-by: Sebastian Wagner <swagner@suse.com>
2019-12-09 18:29:41 -06:00
Sage Weil
bffe2dd9e9 Merge PR #32046 into master
* refs/pull/32046/head:
	mgr/DaemonServer: fix 'osd ok-to-stop' for EC pools

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-12-09 15:34:57 -06:00
Michael Fritch
8c355898f6 ceph-daemon: combine SUDO and ARGS into a single var
- reduce the amount of typing/noise for each CEPH_DAEMON invocation
- ensure the `--image` param is passed to each test invocation
- allow passing additional args to ceph-daemon via CEPH_DAEMON_ARGS

Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-12-09 09:53:31 -07:00
Sage Weil
3036d11c60 ceph-daemon: py2: tolerate whitespace before config key name
The py2 ConfigParser doesn't like whitespace before the config option
name.  (The py3 version doesn't care.)  Filter it out before parsing.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-09 07:15:13 -06:00
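The filtering described in 3036d11c60 above can be sketched as follows. This is a minimal illustration using Python 3's `configparser`, not the actual ceph-daemon code; the helper name `read_config_tolerant` and the sample keys are made up, and it assumes the input does not rely on indented continuation lines.

```python
import configparser
import io


def read_config_tolerant(text):
    """Parse ini-style text after stripping leading whitespace per line.

    Hypothetical helper: it assumes no value uses indented continuation
    lines, since stripping indentation would turn those into new keys.
    """
    cleaned = "\n".join(line.lstrip() for line in text.splitlines())
    parser = configparser.ConfigParser()
    parser.read_file(io.StringIO(cleaned))
    return parser


# Option names indented with a tab and with spaces.
sample = "[global]\n\tfsid = 1234\n  mon_host = 10.0.0.1\n"
conf = read_config_tolerant(sample)
print(conf.get("global", "fsid"))
```

Stripping before parsing keeps the parser itself untouched, so the same input is handled the same way by a parser that tolerates the indentation and by one that does not.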
Sage Weil
ade391513c ceph-daemon: remove prepare-host
I thought I took this out of the PR but somehow it got merged in... must
have repushed an old branch and not realized.  :/

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-08 11:26:14 -06:00
Sage Weil
86950ce9aa Merge PR #32039 into master
* refs/pull/32039/head:
	test: Improve races by using kill_daemons which waits for OSDs to terminate
	test: run-standalone.sh: Only run execs in the subdirectories of qa/standalone
	test: Use activate_osd() when restarting OSDs
	test: osd-scrub-snaps.sh: Fix race with osd restart and doing a scrub

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-12-07 12:28:15 -06:00
David Zafman
676d882649 test: Improve races by using kill_daemons which waits for OSDs to terminate
osd-backfill-space.sh: More sleep time to make sure the backfill gets started

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-12-06 19:44:06 -08:00
Michael Fritch
995e5c3209 ceph-daemon: remove guesswork to find script file
Allow passing CEPH_DAEMON via the environment or default to using the
script from the standard location.

Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-12-05 21:13:56 -07:00
Michael Fritch
9e03530441 ceph-daemon: trap on EXIT
tempfiles were not being removed after a standalone test failure

Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-12-05 21:13:42 -07:00
David Zafman
43f6218993 test: Use activate_osd() when restarting OSDs
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-12-05 15:13:31 -08:00
David Zafman
cca541d0f9 test: osd-scrub-snaps.sh: Fix race with osd restart and doing a scrub
Fixes: https://tracker.ceph.com/issues/43150

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-12-05 15:12:43 -08:00
Sage Weil
66690ea314 mgr/DaemonServer: fix 'osd ok-to-stop' for EC pools
We need to account for CRUSH_ITEM_NONE entries in the EC PG acting set.

Fixes: https://tracker.ceph.com/issues/43151
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-05 14:31:24 -06:00
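The point made in 66690ea314 above can be illustrated with a small model. This is a sketch under stated assumptions, not the mgr/DaemonServer implementation: `pg_stays_available` and its parameters are hypothetical, and the availability rule is simplified to "remaining live shards >= min_size". The value of `CRUSH_ITEM_NONE` matches the constant defined in Ceph's CRUSH headers.

```python
# Placeholder value Ceph uses for an empty slot in an EC acting set.
CRUSH_ITEM_NONE = 0x7FFFFFFF


def pg_stays_available(acting, to_stop, min_size):
    """Return True if enough live shards remain after stopping `to_stop`.

    Hypothetical helper: slots holding CRUSH_ITEM_NONE are not real OSDs,
    so they must not be counted as surviving shards.
    """
    remaining = [
        osd for osd in acting
        if osd != CRUSH_ITEM_NONE and osd not in to_stop
    ]
    return len(remaining) >= min_size


# A 4-slot EC acting set that already has one empty slot.
acting = [0, CRUSH_ITEM_NONE, 2, 3]
print(pg_stays_available(acting, to_stop={3}, min_size=3))
```

A naive count of `len(acting) - len(to_stop)` would report three shards left here, while only two real OSDs remain once the empty slot is excluded.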
Sage Weil
91cb6eb613 ceph-daemon: add check-host and prepare-host
Check for (and/or install/configure):

- podman | docker
- systemctl
- LVM2
- chrony (or ntp or timesyncd)

Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-04 09:22:06 -06:00
Sage Weil
1770270af6 Merge PR #31869 into master
* refs/pull/31869/head:
	ceph-daemon: bootstrap: deploy initial mon via deploy_daemon()
	qa/standalone/test_ceph_daemon.sh: more $SUDO
	ceph-daemon: configure firewalld for new daemon deploys
	ceph-daemon: name mgr the same way mgr/ssh does

Reviewed-by: Michael Fritch <mfritch@suse.com>
2019-12-03 16:00:14 -06:00
Sage Weil
8aadba15bf qa/standalone/test_ceph_daemon.sh: more $SUDO
Signed-off-by: Sage Weil <sage@redhat.com>
2019-12-03 10:13:37 -06:00
Sage Weil
2c1235ba69 Merge PR #31913 into master
* refs/pull/31913/head:
	ceph-daemon: Allow env var for setting the used image

Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2019-12-02 16:26:33 -06:00
Thomas Bechtold
45bae8219a ceph-daemon: Allow env var for setting the used image
Instead of always adding "--image my-custom-image" when calling
ceph-daemon with a non-standard image, allow setting the
CEPH_DAEMON_IMAGE environment variable, which adjusts the --image
default.
That way, the command line when using ceph-daemon with a custom image
is a bit shorter.

Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
2019-11-28 09:06:18 +01:00
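The behaviour described in 45bae8219a above could look roughly like this with `argparse`. It is a minimal sketch, not the ceph-daemon source: `build_parser`, the `environ` parameter, and the `DEFAULT_IMAGE` value are hypothetical; only the `--image` option and the `CEPH_DAEMON_IMAGE` variable come from the commit message.

```python
import argparse
import os

# Hypothetical placeholder; the real default image name is not given here.
DEFAULT_IMAGE = "example.invalid/ceph:latest"


def build_parser(environ=None):
    """Build a parser whose --image default comes from the environment."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="ceph-daemon")
    parser.add_argument(
        "--image",
        default=environ.get("CEPH_DAEMON_IMAGE", DEFAULT_IMAGE),
        help="container image to use",
    )
    return parser


# The environment variable sets the default; an explicit flag still wins.
env = {"CEPH_DAEMON_IMAGE": "my-custom-image"}
print(build_parser(env).parse_args([]).image)
print(build_parser(env).parse_args(["--image", "other"]).image)
```

Reading the variable when the default is computed keeps the precedence simple: command-line flag first, then the environment, then the built-in default.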
David Zafman
9d2e0267e1 test: Add test case based on Xie script in commit comment
Other test fixes to reflect changes

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-11-27 16:29:29 -08:00
David Zafman
0af7e25620 mgr: Fix balancer print
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-11-27 16:29:29 -08:00
Sage Weil
61ba2d7b66 Merge PR #31677 into master
* refs/pull/31677/head:
	qa/standalone/ceph-helpers.sh: remove osd down check
	qa/standalone/ceph-helpers.sh: destroy_osd: mark osd down
	osd: add osd_fast_shutdown option (default true)

Reviewed-by: Sébastien Han <seb@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-11-25 08:54:45 -06:00
Sage Weil
3a62d166a7 qa/standalone/ceph-helpers.sh: remove osd down check
A kill doesn't induce a mark-down of the OSD with osd_fast_shutdown=true.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-24 12:19:33 -06:00
Sage Weil
07193aec3a qa/standalone/test_ceph_daemon.sh: remove old vg before creating
Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-20 18:27:31 -06:00
Sage Weil
fd6bfad498 qa/standalone/test_ceph_daemon.sh: sudo for untar
The deepsea.tgz tar contains actual device nodes for the OSD block devices
(not symlinks or files).  Must be root to untar.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-20 18:27:31 -06:00
Sage Weil
723fdb111a qa/standalone/test_ceph_daemon.sh: sudo for losetup etc
Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-20 18:27:31 -06:00
Sage Weil
cb67545e99 qa/standalone/test_ceph_daemon.sh: fix overwrites of temp files
mktemp creates these files, so we have to pass --allow-overwrite (or
delete them after we get the unique name but before we write to them--this
is easier).

Broken by c7fe27a72a61d1345a66b8830fd17e7b922abd44

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-20 18:27:31 -06:00
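The workaround chosen in cb67545e99 above (take the unique name, delete the file, then write) can be sketched in Python. The test script itself is shell; this is only an analogous illustration, and the helper names `reserve_unique_path` and `write_without_overwrite` are hypothetical.

```python
import os
import tempfile


def reserve_unique_path(suffix=""):
    """Get a unique temp path, then delete the file that was created.

    Like mktemp, mkstemp creates the file; removing it lets a later
    writer that refuses to overwrite create it itself. There is a small
    race window between the unlink and the later create.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    os.unlink(path)
    return path


def write_without_overwrite(path, data):
    """Write data, failing with FileExistsError if the path exists."""
    with open(path, "x") as handle:
        handle.write(data)


path = reserve_unique_path(suffix=".conf")
write_without_overwrite(path, "[global]\n")
os.unlink(path)  # clean up the demo file
```

The `"x"` open mode stands in for a tool that refuses to overwrite existing files, which is why the pre-created temp file has to be removed first.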
Sage Weil
ede1d36773 qa/standalone/ceph-helpers.sh: destroy_osd: mark osd down
Stopping the OSD doesn't guarantee that it will be marked down.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-19 20:05:16 -06:00
Michael Fritch
5cb5e77f50 ceph-daemon: add osd create test(s)
Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-11-18 22:30:22 -07:00
Michael Fritch
479e9be91c ceph-daemon: add standalone adopt tests
Signed-off-by: Michael Fritch <mfritch@suse.com>
2019-11-13 16:51:59 -07:00
Sridhar Seshasayee
8819c3c37a Merge pull request #31416 from sseshasa/wip-41666-replicaSizeWarn
osd/OSDMap: Show health warning if a pool is configured with size 1

Reviewed-by: Sage Weil <sweil@redhat.com>
Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-11-12 12:06:46 +05:30
Sridhar Seshasayee
33c647e811 osd/OSDMap: Show health warning if a pool is configured with size 1
Introduce a config option called 'mon_warn_on_pool_no_redundancy' that is
used to show a health warning if any pool in the ceph cluster is
configured with a size of 1. The user can mute/unmute the warning using
'ceph health mute/unmute POOL_NO_REDUNDANCY'.

Add standalone test to verify warning on setting pool size=1. Set the
associated warning to 'false' in ceph.conf.template under qa/tasks so
that existing tests do not break.

Fixes: https://tracker.ceph.com/issues/41666
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2019-11-11 10:36:35 +05:30
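The check described in 33c647e811 above amounts to flagging every pool whose size is 1 unless the warning is disabled. The following is a toy sketch of that rule, not the osd/OSDMap code: `pools_without_redundancy` is a hypothetical helper, and its `warn_enabled` parameter stands in for the 'mon_warn_on_pool_no_redundancy' option.

```python
def pools_without_redundancy(pools, warn_enabled=True):
    """Return the sorted names of pools configured with size 1.

    `pools` maps pool name to replica size. Hypothetical helper for
    illustration; returns nothing when the warning is switched off.
    """
    if not warn_enabled:
        return []
    return sorted(name for name, size in pools.items() if size == 1)


pools = {"rbd": 3, "scratch": 1, "cephfs_data": 2}
print(pools_without_redundancy(pools))
print(pools_without_redundancy(pools, warn_enabled=False))
```

Setting `warn_enabled=False` mirrors what the commit does for existing tests, where the option is set to 'false' in ceph.conf.template so they do not break.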
Thomas Bechtold
4258c4772a ceph-daemon: Move ceph-daemon executable to own directory
Moving ceph-daemon into src/ceph-daemon/ makes it simpler to add extra
code (e.g. tox.ini, README, unittests, ...) specific to ceph-daemon.
That way related files are in a single directory.

Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
2019-11-08 17:05:57 +01:00