One of our customers wants to verify the data safety of Ceph while scaling
the cluster up, and the test case looks like this:
- keep checking the status of a specified pg whose up set is [1, 2, 3]
- add more osds: up [1, 2, 3] -> up [1, 4, 5], acting = [1, 2, 3], backfill_targets = [4, 5],
  pg is remapped
- stop osd.2: up [1, 4, 5], acting = [1, 3], backfill_targets = [4, 5], pg is undersized
- restart osd.2: acting stays unchanged because osd.2 belongs to neither the current up set
  nor the acting set, so the pg stays stuck undersized for a long time, until all backfill
  targets complete
This does not pose any critical problem: the pg eventually gets back to active+clean.
However, the long-lived DEGRADED warnings keep bothering our customer, who cares about
data safety more than anything else.
The right way to fix this is for
boost::statechart::result PeeringState::Active::react(const MNotifyRec& notevt)
to check whether the newly booted OSD could validly be chosen for the acting set and, if
so, request a new temp mapping. The new temp mapping then triggers a real interval change,
which gets rid of the DEGRADED warning.
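As a rough illustration of the decision described above, here is a hypothetical, heavily simplified Python model. The function name `propose_pg_temp` and the `complete_osds` input are assumptions for illustration only: the real peering code decides whether an OSD is a valid acting-set candidate from its peer info (log bounds, completeness), not from a precomputed set, and it builds the new acting set through its own selection logic.

```python
from typing import List, Optional, Set


def propose_pg_temp(up: List[int], acting: List[int], pool_size: int,
                    booted_osd: int, complete_osds: Set[int]) -> Optional[List[int]]:
    """Hypothetical model: return a new acting set to request as a temp
    mapping, or None if the booted OSD should not change anything."""
    # An OSD already in up or acting is handled by the normal peering path.
    if booted_osd in up or booted_osd in acting:
        return None
    # Only worth doing while the PG is undersized.
    if len(acting) >= pool_size:
        return None
    # Assumption: the booted OSD must hold a complete copy of the PG,
    # otherwise adding it would not reduce degradation.
    if booted_osd not in complete_osds:
        return None
    return acting + [booted_osd]


# The scenario from the test case: osd.2 comes back with intact data.
print(propose_pg_temp([1, 4, 5], [1, 3], 3, 2, {1, 2, 3}))
```

Requesting the larger acting set is what forces a new interval in this model; backfill to osd.4 and osd.5 would continue as before.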
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Signed-off-by: Yan Jun <yan.jun8@zte.com.cn>
* refs/pull/33114/head:
cephadm: Fix name argument parsing during image check for non-ceph components
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>
Fix a parsing bug introduced in 97def7c: args.name may exist but will be
None if the flag is not used, so check its value in addition to checking
whether it exists.
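A minimal sketch of the presence-plus-value check, using only the standard `argparse` module. The helper `get_daemon_name` and the `--name` default are hypothetical; the point is that `'name' in args` is true even when the flag was omitted, because the attribute is then set to `None`.

```python
import argparse
from typing import Optional


def get_daemon_name(args: argparse.Namespace) -> Optional[str]:
    # Hypothetical helper. Checking only "'name' in args" is not enough:
    # argparse sets args.name = None when --name is not passed.
    if 'name' in args and args.name:
        return args.name
    return None


parser = argparse.ArgumentParser()
parser.add_argument('--name', default=None)

print(get_daemon_name(parser.parse_args([])))
print(get_daemon_name(parser.parse_args(['--name', 'prometheus.host1'])))
```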
Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
A single report on a non-LVM device now works.
The format was cleaned up, and the LVM journal, WAL, and DB are reported only once.
Fixes: https://tracker.ceph.com/issues/44009
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
Also drop the unused sep argument from get_lvs and its siblings.
Introduce LV_CMD_OPTIONS to unify the options passed to lvs.
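The idea of a single shared option list can be sketched as follows. This is an illustration under assumptions, not the ceph-volume code: the exact contents of `LV_CMD_OPTIONS`, the `LV_FIELDS` string, and the helpers `build_lvs_cmd` and `parse_lvs_output` are hypothetical. Once every caller uses the same fixed separator, there is nothing left for a per-call `sep` argument to choose.

```python
from typing import Dict, Iterable, List

# Hypothetical shared options; the real LV_CMD_OPTIONS may differ.
LV_CMD_OPTIONS = ['--noheadings', '--readonly', '--separator=;', '-a',
                  '--units=b', '--nosuffix']
LV_FIELDS = 'lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size'


def build_lvs_cmd(fields: str = LV_FIELDS) -> List[str]:
    # All callers share the same options, so no per-call separator.
    return ['lvs'] + LV_CMD_OPTIONS + ['-o', fields]


def parse_lvs_output(lines: Iterable[str], fields: str = LV_FIELDS) -> List[Dict[str, str]]:
    # Map each ';'-separated output row onto the requested field names.
    keys = fields.split(',')
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rows.append(dict(zip(keys, line.split(';'))))
    return rows
```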
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
This was broken by d8debba782
because the 'images' json output works with podman but not with
docker. (Also, the inspect command is more explicit and cleaner.)
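A minimal sketch of the inspect-based approach. It assumes the engine is invoked as `<engine> image inspect <image>` and that both docker and podman print a JSON array of objects carrying an `Id` field; the helper names are hypothetical and this is not the actual cephadm code.

```python
import json
from typing import List


def image_inspect_cmd(container_path: str, image: str) -> List[str]:
    # 'image inspect' is accepted by both docker and podman.
    return [container_path, 'image', 'inspect', image]


def image_id_from_inspect(output: str) -> str:
    # Assumed output shape: a JSON array with one object per image.
    info = json.loads(output)
    return info[0]['Id']
```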
Signed-off-by: Sage Weil <sage@redhat.com>
This is fixed in RHEL 8.1.1 (and, by extension, CentOS/RHEL 8.2+).
There is no fix for EL7 yet.
Partially-fixes: https://tracker.ceph.com/issues/43703
Signed-off-by: Sage Weil <sage@redhat.com>
This test ends with a failure like:
```
2020-01-30T18:15:15.870 INFO:tasks.ceph.mgr.x.smithi042.stderr:Warning: Permanently added 'smithi042.front.sepia.ceph.com,172.21.15.42' (ECDSA) to the list of known hosts.
2020-01-30T18:15:15.925 INFO:tasks.ceph.mgr.x.smithi042.stderr:Permission denied, please try again.
2020-01-30T18:15:15.932 INFO:tasks.ceph.mgr.x.smithi042.stderr:Permission denied, please try again.
2020-01-30T18:15:15.939 INFO:tasks.ceph.mgr.x.smithi042.stderr:root@smithi042.front.sepia.ceph.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
```
because the mgr is not able to establish an SSH connection to that host as "root".
Note that the teuthology worker acts as the "ubuntu" account on the
test node, and by default "root" does not have its pubkey, whereas
`qa/tasks/cephadm.py` does push the pubkey to all the managed hosts before
testing cephadm.
Since `qa/tasks/cephadm.py` is a better test for cephadm, and
suites/rados/cephadm already covers cephadm, let's just drop this one.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Since the rexec module has been removed in Python 3, we cannot use it
anymore.
Fixes: https://tracker.ceph.com/issues/43657
Signed-off-by: Kefu Chai <kchai@redhat.com>
* refs/pull/33058/head:
mgr/cephadm: enforce that a host is a valid DNS name
mgr/cephadm: verify host's hostname matches our host name
cephadm: check-host: add optional --expect-hostname
Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
* refs/pull/33069/head:
cephadm: use appropriate default image for non-ceph components
Reviewed-by: Patrick Seidensal <pseidensal@suse.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>
* refs/pull/33039/head:
osd/OSD: prevent down osds from immediately rejoining the cluster
osd/OSD: trim osd_markdown_log in tick() thread
Reviewed-by: yanjun <yan.jun8@zte.com.cn>
Reviewed-by: Sage Weil <sage@redhat.com>
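To illustrate what trimming a markdown log means, here is a hypothetical Python model of a sliding window of markdown timestamps. The class name and defaults are assumptions; `max_count` and `period` only mirror the intent of Ceph's `osd_max_markdown_count` and `osd_max_markdown_period` options, and the exact comparison Ceph uses is not claimed here.

```python
from collections import deque


class MarkdownTracker:
    """Hypothetical model of an OSD's markdown log (not Ceph code)."""

    def __init__(self, max_count: int = 5, period: float = 600.0):
        self.max_count = max_count
        self.period = period
        self.log = deque()  # timestamps of recent markdowns, oldest first

    def trim(self, now: float) -> None:
        # Run periodically (cf. tick()): drop markdowns outside the window.
        while self.log and self.log[0] < now - self.period:
            self.log.popleft()

    def record_markdown(self, now: float) -> bool:
        """Record a markdown; return True if there have now been more than
        max_count markdowns within the last `period` seconds."""
        self.log.append(now)
        self.trim(now)
        return len(self.log) > self.max_count
```

Trimming from a periodic tick keeps the log bounded even when no new markdown arrives to trigger a trim.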
This combines the hostname restrictions
* 1-63 chars
* a-z, A-Z, 0-9, -
and the DNS name restrictions
* .-delimited
* no empty components (or leading or trailing .)
* 250 chars total max
Note that this allows bare IPv4 addresses (which are indistinguishable from
a valid DNS name, AFAICS), but disallows bare IPv6 addresses.
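The rules listed above can be sketched as a small validator. This is a hypothetical illustration, not the actual mgr/cephadm implementation; `is_valid_host_name` is an assumed name. `re.fullmatch` is used so that a trailing newline cannot slip past a `$` anchor.

```python
import re

# One label: 1-63 chars drawn from a-z, A-Z, 0-9 and '-'.
_LABEL_RE = re.compile(r'[a-zA-Z0-9-]{1,63}')


def is_valid_host_name(name: str) -> bool:
    # Hypothetical validator for the combined hostname/DNS rules.
    if len(name) > 250:
        return False
    # A leading, trailing or doubled '.' produces an empty label,
    # which the {1,63} length bound rejects.
    return all(_LABEL_RE.fullmatch(label) for label in name.split('.'))
```

With these rules `192.168.0.1` passes as four numeric labels, while `::1` fails because `:` is not an allowed character, matching the IPv4/IPv6 note above.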
Signed-off-by: Sage Weil <sage@redhat.com>
This makes the daemon disappear immediately from 'service ls', and also
avoids a temporary health warning about a stray service.
Signed-off-by: Sage Weil <sage@redhat.com>
This fixes an error caused by using the latest version of Angular CLI
with Node.js v10.16.0.
Fixes: https://tracker.ceph.com/issues/43961
Signed-off-by: Tiago Melo <tmelo@suse.com>
mon/MgrMonitor.cc: warn about missing mgr in a cluster with osds
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>