This was using an obscure syntax that worked at one time and wasn't
documented (AFAIK).
Fixes: https://tracker.ceph.com/issues/51182
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Right now scrub_test picks any pg in ceph. Unfortunately, it picked the
.mgr pool's only pg in [1]:
2021-05-16T11:36:35.035 DEBUG:teuthology.orchestra.run.smithi049:> adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage rados --cluster ceph --pool rbd setomapval main.db-journal.0000000000000000 key val
Instead, only pick a pg in the rbd pool.
[1] /ceph/teuthology-archive/kchai-2021-05-16_11:19:39-rados-wip-kefu-testing-2021-05-16-1043-distro-basic-smithi/6117396/teuthology.log
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
"device_health_metrics" pool is gone -- .mgr pool is in.
I don't think the pool removal code in some test cases is necessary any
longer with recent changes to remove those warnings; so that code is
gone too.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/39505/head:
qa: test nowsync option in kernel client workflows
qa: deep merge top level overrides for fuse/kclient
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
mon_tick_interval is 5 seconds by default. monitors update their
rotating keys every mon_tick_interval. before monitors forms a
quorum, the auth requests from clients are put into the wait list.
these requests are re-enqueued once the monitors form a quorum. but
there is a small window of mon_tick_interval, before they are able
to serve the auth requests even after their claim to be able to
server requests. if these re-enqueued requests happen to be served
in this window, and if authx is enabled, they will be greeted with
errors like
handle_auth_bad_method server allowed_methods [2] but i only support [2]
in the case of ceph cli, the error would look like:
[errno 13] RADOS permission denied (error connecting to the cluster)
so, to address this issue, the EACCES error is ignored when waiting
for a quorum.
Signed-off-by: Kefu Chai <kchai@redhat.com>
The global recovery event progress calculations only
takes into account pgs with `reported_epoch < start_epoch_of_event`
but sometimes the pgs doesn't get move before or after the creation
of the global recovery event, therefore this might result in a bug
where the global event gets stuck forever unless there is another
event that specifically makes the pgs that get stuck moves and updates
its `reported_epoch`.
Therefore, we decided to disregard pgs that are in active+clean state
but has `reported_epoch < start_epoch_of_event`.
Fixes: https://tracker.ceph.com/issues/49988
Signed-off-by: Kamoltat <ksirivad@redhat.com>
The documentation specifies this in [1] and yet we were using (I
believe) an older syntax:
ceph tell mds.foo:0 scrub start / recursive force
instead of
ceph tell mds.foo:0 scrub start / recursive,force
Oddly the former works at least as recently as in [2]:
2021-06-03T07:11:42.071 DEBUG:teuthology.orchestra.run.smithi025:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub start / recursive force
...
2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout:{
2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout: "return_code": 0,
2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout: "scrub_tag": "cf7a74b2-3eb2-4657-9274-ea504b1ebf8f",
2021-06-03T07:11:42.269 INFO:teuthology.orchestra.run.smithi025.stdout: "mode": "asynchronous"
2021-06-03T07:11:42.269 INFO:teuthology.orchestra.run.smithi025.stdout:}
[1] https://docs.ceph.com/en/latest/cephfs/scrub/
[2] /ceph/teuthology-archive/pdonnell-2021-06-03_03:40:33-fs-wip-pdonnell-testing-20210603.020013-distro-basic-smithi/6148097/teuthology.log
Fixes: https://tracker.ceph.com/issues/51146
See-also: https://tracker.ceph.com/issues/51145
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mgr/dashboard: Include Network address and labels on Host Creation form
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
* refs/pull/41509/head:
common/cmdparse: fix CephBool validation for tell commands
mgr/nfs: fix 'nfs export create' argument order
common/cmdparse: emit proper json
mon/MonCommands: add -- seperator to example
qa/tasks/cephfs/test_nfs: fix export create test
mgr: make mgr commands compat with pre-quincy mon
doc/_ext/ceph_commands: handle non-positional args in docs
mgr: fix reweight-by-utilization cephbool flag
mon/MonCommands: convert some CephChoices to CephBool
mgr/k8sevents: fix help strings
pybind/mgr/mgr_module: fix help desc formatting
mgr/orchestrator: clean up 'orch {daemon add,apply} rgw' args
mgr/orchestrator: add end_positional to a few methods
mgr/orchestrator: reformat a few methods
pybind/ceph_argparse: stop parsing when we run out of positional args
pybind/ceph_argparse: remove dead code
pybind/mgr/mgr_module: infer non-positional args
pybind/mgr/mgr_module: add separator for non-positional args
command/cmdparse: use -- to separate positional from non-positional args
pybind/ceph_argparse: adjust help text for non-positional args
pybind/ceph_argparse: track a 'positional' property on cli args
Reviewed-by: Kefu Chai <kchai@redhat.com>
The ability to create host by specifying network address and also create
labels.
https://tracker.ceph.com/issues/50318
Signed-off-by: Nizamudeen A <nia@redhat.com>
This allows for array/dict configs like mntopts to accumulate changes
from multiple yaml fragments.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
We have to reap connections promptly for this test to work.
This test was broken indirectly by d51d80b323,
which moved the counter decrement to reap time instead of mark_down/stop
time.
The reaping is asynchronous, so allow for a delay in the count change.
Fixes: https://tracker.ceph.com/issues/50622
Signed-off-by: Sage Weil <sage@newdream.net>
* refs/pull/39910/head:
test: Add test for mgr hang when osd is full
mgr: Set client_check_pool_perm to false
mds: Add full caps to avoid osd full check
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
osd: Run osd bench test to override default max osd capacity for mclock
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
so we can cater the needs of different implementation of osd, i.e.,
classic osd and crimson osd. they offer different set of asock commands.
Signed-off-by: Kefu Chai <kchai@redhat.com>
With mclock scheduler enabled, the recovery throughput is throttled based
on factors like the type of mclock profile enabled, the OSD capacity among
others. Due to this the recovery times may vary and therefore the existing
timeout of 120 secs may not be sufficient.
To address the above, a new method called _is_inprogress_or_complete() is
introduced in the TestProgress Class that checks if the event with the
specified 'id' is in progress by checking the 'progress' key of the
progress command response. This method also handles the corner case where
the event completes just before it's called.
The existing wait_until_true() method in the CephTestCase Class is
modified to accept another function argument called "check_fn". This is
set to the _is_inprogress_or_complete() function described earlier in the
"test_turn_off_module" test that has been observed to fail due to the
reasons already described above. A retry mechanism of a maximum of 5
attempts is introduced after the first timeout is hit. This means that
the wait can extend up to a maximum of 600 secs (120 secs * 5) as long as
there is recovery progress reported by the 'ceph progress' command result.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
This commit majorly consists of the RabbitMQ task which is a required and supported endpoint in bucket notification tests.
And some related changes in the AMQP tests. Major changes are:
1. Addition of RabbitMQ task
2. Documentation update for the steps to execute AMQP tests
3. Addition of attributes to the tests
4. Tox dependency removal from kafka.py
Signed-off-by: Kalpesh Pandya <kapandya@redhat.com>
- registry-url, registry-username and registry-password bootstrap options are
supported now. This is needed to access monitoring service container images.
- usage of RHEL distribution based cephadm in download_cephadm task.
Signed-off-by: sunilkumarn417 <sunnagar@redhat.com>
Add fs suite for tests requiring one node as well.
Fixes: https://tracker.ceph.com/issues/50532
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Kotresh HR <khiremat@redhat.com>
For teuthology runs, we set log_to_stderr=false, so that we only see
derr-level events in the container log (and teuthology.log). Now that we
log directly to journald, set log_to_journald=false too, so that we don't
see level-20 logs in teuthology.log.
Signed-off-by: Sage Weil <sage@newdream.net>
Added the config 'delay_snapshot_clone' to insert delay at the beginning
of the clone to avoid races in tests. The default value is set to 0.
Fixes: https://tracker.ceph.com/issues/48231
Signed-off-by: Kotresh HR <khiremat@redhat.com>
* refs/pull/41084/head:
test: test to verify dir path removal when no mirror daemons are running
pybind/mirroring: advance state machine from stalled state
pybind/mirroring: start from correct state during policy init
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/41286/head:
qa/suites/orch/rook: disable centos for now
qa/suites/orch/rook/smoke: initial smoke suite
qa/tasks/rook: ROOK_HOSTPATH_REQUIRES_PRIVILEGED=true on centos
qa/tasks/rook: simplify shutdown
qa/tasks/rook: archive logs
qa/tasks/rook: more orderly cluster teardown
qa/tasks/rook: deploy ceph via rook on top of kubernetes
qa/tasks/kubeadm: install kubernetes with kubeadm
qa/suites: move rados/cephadm -> orch/cephadm; symlink
qa/tasks/cephadm: add whitespace between functions
qa/tasks/cephadm: clean up ctx.manager setup
Reviewed-by: Sébastien Han <seb@redhat.com>