pre-single-major.yaml kernel doesn't have any of the monitor client
fixes that came in 4.6. If the connection is closed, it closes the
session and retries only after 10 seconds. On top of that, there is
nothing to prevent it from picking the same monitor when reconnecting.
This means that when given both v1 and v2 ports (which look like two
different monitors), it is susceptible to mount_timeout (60 seconds):
$ sudo rbd map img
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (5) Input/output error
[ 822.242313] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
[ 832.265494] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
[ 842.296175] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
[ 852.326924] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
[ 862.357611] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
[ 872.388373] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
[ 882.676136] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
Unlike newer kernels that return ETIMEDOUT, it returns EIO.
Newer kernels are much more aggressive about retries and will pick
a different monitor when reconnecting, hence they are always able to
establish the session in time.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
krbd: retry on transient errors from udev_enumerate_scan_devices()
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
* refs/pull/31063/head:
qa: disable too few PG warning during Mimic deploy
Reviewed-by: Nathan Cutler <ncutler@suse.com>
Reviewed-by: Sage Weil <sage@redhat.com>
* refs/pull/31168/head:
ceph-daemon: try py2 import before py3
qa/suites/rados/singleton-nomsgr/ceph-daemon: make sure python3 is installed
qa/standalone/test_ceph_damon.sh: test with python2 and python3
mgr/ssh: python, not python3
ceph-daemon: python, not python3
ceph-daemon: os.makedirs
ceph-daemon: configparser is ConfigParser on py2
ceph-daemon: avoid py3-isms
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Alfredo Deza <adeza@redhat.com>
qa: enable dashboard tests to be run with "--suite rados/dashboard"
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
stop-all.sh will work if the right deps are there (currently we lack 'nc')
also killall -9 java to be sure.
Fixes: https://tracker.ceph.com/issues/42496
Signed-off-by: Sage Weil <sage@redhat.com>
Sometimes we run containers on a host that doesn't have a crash dir set
up (becuase no daemon has been deployed). Examples include shell and
ceph-volume.
Signed-off-by: Sage Weil <sage@redhat.com>
Mimic will raise this warning when we use 8 PGs for CephFS metadata/data
pools.
Fixes: fc88e6c6c5
Fixes: https://tracker.ceph.com/issues/42434
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Skip new check for meta collection
test:
Turn off osd_pool_default_pg_autoscale_mode just like bash tests do
Fix test by checking for new error message
Caused by: f88b353454
Fixes: https://tracker.ceph.com/issues/42476
Signed-off-by: David Zafman <dzafman@redhat.com>
Ceph luminous is known to run on SLE-12-SP3 and nautilus on SLE-15-SP1, so add
these two to qa/distros/all.
Signed-off-by: Nathan Cutler <ncutler@suse.com>
* refs/pull/30859/head:
auth: EACCES, not EPERM
mon: shunt old tell commands from cli interface to asok
mon: allow mgr to tell mon.foo smart
mon: include quorum features in quorum_status
qa/workunits/mon/caps.sh: fix test
ceph_test_rados_api_cmd: fix MonDescribe test
Merge branch 'vstart-fs-auth' of git://github.com/batrick/ceph into wip-cleanup-mon-asok
test/pybind/test_ceph_argparse: fix tests
vstart: add volume client keys to keyring
vstart: use fs authorize to create master client key
vstart: redirect some output to stderr
vstart: output command strings to stderr
qa/workunits/cephtool/test.sh: fix 'quorum enter' caller
qa: change mon_status calls to quorum_status or tell commands
mon: fix 'heap ...' command
mon: consolidate 'sync force' commands
mon: allow asok commands to return an error code
mon: move 'quorum enter|exit' and 'mon_status' to asok
mon: fix 'smart' asok command
mon: remove old 'config set' and 'injectargs'
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
* refs/pull/31094/head:
ceph-daemon: remove redundant --privileged
test_ceph_daemon: test unit, enter, shell
ceph-daemon: drop exec
ceph-daemon: fix exit code for run, shell, enter, exec
ceph-daemon: allow optional command for 'enter'
ceph-daemon: fix LANG for 'enter' command
ceph-daemon: allow shell to take optional command
qa/suites/rados/singleton-nomsgr/ceph-daemon: run test_ceph_daemon.sh
qa/standalone/test_ceph_daemon.sh: add new functional tests
test_ceph_daemon.sh: use newer image
ceph-daemon: unconditionally enable and start crash unit
ceph-daemon: fix crash unit cleanup
ceph-daemon: include 'crash' unit/item in 'ls' output
ceph-daemon: fix 'ls'
mgr/orchestrator: s/sdd/ssd/
mgr/ssh: remove stdout/stderr kludges
ceph-daemon: fix ceph-volume command to write stdout to stdout
Reviewed-by: Sebastian Wagner <swagner@suse.com>
The consensus seems to be that PrioritizedQueue is strictly worse than
WeightedPriorityQueue.
mClockClientQueue and mClockClassQueue are superceded by
mClockScheduler.
Signed-off-by: Samuel Just <sjust@redhat.com>
It's not identical to enter. enter seems more intuitive to me, but that
may be because I'm not a longtime docker user.
Signed-off-by: Sage Weil <sage@redhat.com>
- sudo as needed
- clean up afterward
There is still a bit of missing coverage, but this captures most of it.
Signed-off-by: Sage Weil <sage@redhat.com>
We don't need to specify the number of PGs for a new data pool anymore
since b1b821f608 and other related
changes. The related health warnings are also deprecated/gone. So this
no longer needs to be done.
Fixes: b1b821f608
Fixes: https://tracker.ceph.com/issues/42436
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/30813/head:
qa: get rid of iteritems for python3 compatibility
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
I'm not really sure why this test expected EPERM before when it expects 0
a bit earlier, but it should certainly expect EPERM after the user is
deleted.
Signed-off-by: Sage Weil <sage@redhat.com>
... without this there is a sutle race where it takes a
bit of time after a hard reset of a client mount causing
further checks to fail as the (still up) client is still
connected to the MDS.
Fixes: http://tracker.ceph.com/issues/42213
Signed-off-by: Venky Shankar <vshankar@redhat.com>
This moves dashboard.yaml from rados/mgr into a new, separate rados/dashboard
suite. The common elements it uses are moved from rados/mgr into qa/ and
replaced with symlinks.
Fixes: https://tracker.ceph.com/issues/41820
Signed-off-by: Nathan Cutler <ncutler@suse.com>
we don't have "pg_num_target" in "osd dump" back in mimic, so we don't
need to check it if it is missing when performing upgrade test.
Signed-off-by: Kefu Chai <kchai@redhat.com>
since it requires a running ceph cluster, it can't run in 'make check'
as a unittest. add it to the rgw/verify suite instead
Signed-off-by: Casey Bodley <cbodley@redhat.com>