One of our customers wants to verify the data safety of Ceph while scaling
the cluster up, and the test case looks like:
- keep checking the status of a specified pg, whose up set is [1, 2, 3]
- add more osds: up [1, 2, 3] -> up [1, 4, 5], acting = [1, 2, 3], backfill_targets = [4, 5],
pg is remapped
- stop osd.2: up [1, 4, 5], acting = [1, 3], backfill_targets = [4, 5], pg is undersized
- restart osd.2; acting will stay unchanged as 2 belongs to neither the current up nor acting set,
hence leaving the corresponding pg stuck undersized for a long time until all backfill
targets complete
It does not pose any critical problem -- we'll end up getting that pg back into active+clean,
except that the long-lived DEGRADED warnings keep bothering our customer, who cares about data
safety more than anything else.
The right way to achieve the above goal is for:
boost::statechart::result PeeringState::Active::react(const MNotifyRec& notevt)
to check whether the newly booted node could be validly chosen for the acting set and
request a new temp mapping. The new temp mapping would then trigger a real interval change
that will get rid of the DEGRADED warning.
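To make the intent concrete, here is a toy Python model of that check (illustrative only -- the real change lives in the C++ peering code, and all names below are made up):
```python
# Toy model of the acting-set check described above. A freshly booted
# OSD that sits in neither `up` nor `acting` will never rejoin on its
# own, so peering should ask for a new temp mapping if the OSD held
# data for this pg in a prior interval.
def should_request_pg_temp(up, acting, prior_acting, booted_osd):
    if booted_osd in up or booted_osd in acting:
        return False  # normal peering will pick it up anyway
    return booted_osd in prior_acting

# The scenario from the test case: osd.2 restarts after the scale-up.
up, acting, prior_acting = [1, 4, 5], [1, 3], [1, 2, 3]
assert should_request_pg_temp(up, acting, prior_acting, booted_osd=2)
```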
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Signed-off-by: Yan Jun <yan.jun8@zte.com.cn>
This is fixed in RHEL 8.1.1 (and, by extension, CentOS/RHEL 8.2+).
No fix for el7 yet.
Partially-fixes: https://tracker.ceph.com/issues/43703
Signed-off-by: Sage Weil <sage@redhat.com>
this test will end with a failure like
```
2020-01-30T18:15:15.870 INFO:tasks.ceph.mgr.x.smithi042.stderr:Warning: Permanently added 'smithi042.front.sepia.ceph.com,172.21.15.42' (ECDSA) to the list of known hosts.
2020-01-30T18:15:15.925 INFO:tasks.ceph.mgr.x.smithi042.stderr:Permission denied, please try again.
2020-01-30T18:15:15.932 INFO:tasks.ceph.mgr.x.smithi042.stderr:Permission denied, please try again.
2020-01-30T18:15:15.939 INFO:tasks.ceph.mgr.x.smithi042.stderr:root@smithi042.front.sepia.ceph.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
```
because the mgr is not able to establish an ssh connection to that host as
"root". Please note, the teuthology worker acts as the "ubuntu" user on the
test node, and by default "root" does not have its pubkey; `qa/tasks/cephadm.py`,
by contrast, does push the pubkey to all the managed hosts before testing
cephadm.
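A rough sketch of that idea (plain Python; not the actual `qa/tasks/cephadm.py` code, and the helper name and arguments are made up):
```python
# Push the cluster's SSH pubkey into root's authorized_keys on every
# managed host, going through the "ubuntu" account that teuthology
# already controls. Hypothetical sketch only.
import subprocess

def push_pubkey_to_root(hosts, pubkey_path):
    pubkey = open(pubkey_path).read().strip()
    for host in hosts:
        subprocess.run(
            ["ssh", f"ubuntu@{host}",
             "sudo mkdir -p /root/.ssh && "
             "sudo tee -a /root/.ssh/authorized_keys >/dev/null"],
            input=pubkey + "\n", text=True, check=True)
```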
Since `qa/tasks/cephadm.py` is a better test for cephadm, and
suites/rados/cephadm already covers it, let's just drop this one.
Signed-off-by: Kefu Chai <kchai@redhat.com>
* refs/pull/31633/head:
cephfs-shell: Instead of assert use stat for tests in rmdir
cephfs-shell: Add function for common rmdir test code
cephfs-shell: Add rmdir test for non empty directory
cephfs-shell: Add rmdir -p test for non empty directory
cephfs-shell: Add rmdir -p test for non existing dir
cephfs-shell: Add rmdir -p test to delete all dirs in given path
cephfs-shell: Add rmdir -p test for root directory with empty directories
cephfs-shell: Add rmdir test for valid file
cephfs-shell: Add rmdir test for invalid directory
cephfs-shell: Add rmdir test for valid directory
cephfs-shell: Fix rmdir '-p' issues
Reviewed-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
This is harmless if logging is low, but adds useful info when it is turned
up.
Hunting bug https://tracker.ceph.com/issues/43914
Signed-off-by: Sage Weil <sage@redhat.com>
If we haven't scrubbed everything, we occasionally re-request scrub in case
the request was missed by the OSD (this can happen). But we were
re-requesting scrub on ALL pgs, and if they are done in a
semi-deterministic order and are slow, then we may never get to the final
ones.
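A minimal sketch of the narrower loop (a sketch only, assuming the ceph_manager-style helpers `get_pg_stats()` and `raw_cluster_cmd()`, and ISO-format scrub stamps that compare lexicographically):
```python
# Re-request scrub only for pgs that have not completed one since this
# round started, rather than re-requesting on ALL pgs.
def rerequest_pending_scrubs(manager, round_started_stamp):
    for stat in manager.get_pg_stats():        # one dict per pg
        if stat['last_scrub_stamp'] < round_started_stamp:
            # the earlier request may have been dropped by the OSD
            manager.raw_cluster_cmd('pg', 'scrub', stat['pgid'])
```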
Signed-off-by: Sage Weil <sage@redhat.com>
Fixes:
```
2020-01-30T04:41:24.697 INFO:tasks.thrashosds.thrasher:fixing pg num pool None
2020-01-30T04:41:24.698 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 1070, in wrapper
    return func(self)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 1200, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 768, in fix_pgp_num
    if self.ceph_manager.set_pool_pgpnum(pool, force):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-29-1034/qa/tasks/ceph_manager.py", line 2088, in set_pool_pgpnum
    assert isinstance(pool_name, six.string_types)
AssertionError
```
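The traceback shows `fix_pgp_num` passing a `None` pool through to `set_pool_pgpnum`. One plausible guard (a sketch, not necessarily the actual fix; `get_pool()` is assumed to return a pool name or `None`):
```python
# Bail out of fix_pgp_num when no pool was chosen, so
# set_pool_pgpnum() never sees None.
def fix_pgp_num(self, pool=None):
    force = pool is not None
    if pool is None:
        pool = self.ceph_manager.get_pool()  # may be None: no pools yet
    if pool is None:
        return                               # nothing to fix
    self.ceph_manager.set_pool_pgpnum(pool, force)
```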
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/32972/head:
python-common/ceph/deployment/translate: use 'prepare' instead of 'batch' for trivial case
qa/tasks/cephadm: pass short dev name to osd prepare
mgr/cephadm: fix detection of just-created OSDs
mgr/cephadm: properly indent raise conditions
mgr/cephadm: add warning to other orchestrators
mgr/cephadm: separate acceptance criterias for Devices
mgr/cephadm: fix typos
mgr/cephadm: move utils in test/utils.py
mgr/ssh: increase disk size to 20G
drivegroups: add support for drivegroups + tests
mgr/orch_cli: allow multiple drivegroups
drivegroups: translate disk spec to ceph-volume call
Reviewed-by: Jan Fajerski <jfajerski@suse.com>
Reviewed-by: Joshua Schmid <jschmid@suse.de>
Include _latest.yaml in a few cases here to be a bit future-proof.
cephadm-smoke/ is *just* a cephadm bring-up, and includes el7. cephadm/
installs packages and runs a real workload.
Signed-off-by: Sage Weil <sage@redhat.com>
Zap needs a full path, but create/prepare need the bare VG/LV, and
only if it is an existing LV.
We'll make c-v more friendly later.
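A tiny illustration of the distinction (a hypothetical helper, not actual ceph-volume or cephadm code):
```python
# Choose the device argument format each ceph-volume subcommand expects
# for an existing LV, per the note above.
def cv_device_arg(vg: str, lv: str, subcommand: str) -> str:
    if subcommand == "zap":
        return f"/dev/{vg}/{lv}"    # zap wants the full path
    return f"{vg}/{lv}"             # create/prepare want bare VG/LV

# e.g. cv_device_arg("ceph-vg", "osd-lv", "zap") -> "/dev/ceph-vg/osd-lv"
```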
Signed-off-by: Sage Weil <sage@redhat.com>