This was using an obscure syntax that worked at one time and wasn't
documented (AFAIK).
Fixes: https://tracker.ceph.com/issues/51182
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Right now scrub_test picks an arbitrary pg in the cluster.
Unfortunately, it picked the .mgr pool's only pg in [1]:
2021-05-16T11:36:35.035 DEBUG:teuthology.orchestra.run.smithi049:> adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage rados --cluster ceph --pool rbd setomapval main.db-journal.0000000000000000 key val
Instead, only pick a pg in the rbd pool.
[1] /ceph/teuthology-archive/kchai-2021-05-16_11:19:39-rados-wip-kefu-testing-2021-05-16-1043-distro-basic-smithi/6117396/teuthology.log
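A minimal sketch of the intended selection, assuming the JSON output of
`ceph osd dump` and `ceph pg dump` (the helper name and exact field
layout here are illustrative, not the actual scrub_test code):

    import json

    def pick_rbd_pg(manager):
        # map the 'rbd' pool name to its pool id so we never pick .mgr's pg
        osd_dump = json.loads(
            manager.raw_cluster_cmd('osd', 'dump', '--format=json'))
        rbd_pool_id = next(p['pool'] for p in osd_dump['pools']
                           if p['pool_name'] == 'rbd')
        # pgids look like '<pool_id>.<seed>'; keep only pgs of the rbd pool
        pg_dump = json.loads(
            manager.raw_cluster_cmd('pg', 'dump', '--format=json'))
        return next(pg for pg in pg_dump['pg_map']['pg_stats']
                    if int(pg['pgid'].split('.')[0]) == rbd_pool_id)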
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This is mostly for testing: a lot of tests assume that there are no
existing pools. These tests relied on a config option to turn off
creating the "device_health_metrics" pool, which generally exists in
any new Ceph cluster. It would be better to make these tests tolerant
of the new .mgr pool, but there are clearly a lot of them, so just
convert the config option to make them work.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
"device_health_metrics" pool is gone -- .mgr pool is in.
I don't think the pool removal code in some test cases is necessary any
longer with recent changes to remove those warnings; so that code is
gone too.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/39505/head:
qa: test nowsync option in kernel client workflows
qa: deep merge top level overrides for fuse/kclient
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
mon_tick_interval is 5 seconds by default. monitors update their
rotating keys every mon_tick_interval. before the monitors form a
quorum, the auth requests from clients are put on a wait list, and
are re-enqueued once the monitors form a quorum. but there is a
small window, up to mon_tick_interval long, during which the
monitors claim to be able to serve requests but are not yet able
to serve auth requests. if the re-enqueued requests happen to be
served in this window, and cephx is enabled, they are greeted with
errors like
handle_auth_bad_method server allowed_methods [2] but i only support [2]
in the case of the ceph cli, the error looks like:
[errno 13] RADOS permission denied (error connecting to the cluster)
so, to address this issue, the EACCES error is ignored while waiting
for a quorum.
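a minimal sketch of the workaround, in the style of a qa/tasks helper
(names here are illustrative, not the exact code):

    import errno
    import time
    from teuthology.exceptions import CommandFailedError

    def wait_for_quorum(manager, quorum_size, timeout=300):
        # quorum checks can transiently fail with EACCES in the small
        # window after the monitors form a quorum but before they have
        # updated their rotating keys; retry instead of bailing out
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                if len(manager.get_mon_quorum()) == quorum_size:
                    return
            except CommandFailedError as e:
                if e.exitstatus != errno.EACCES:  # errno 13
                    raise
            time.sleep(3)
        raise RuntimeError('timed out waiting for mon quorum')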
Signed-off-by: Kefu Chai <kchai@redhat.com>
qa/standalone: Use osd op queue = wpq in activate_osd() within ceph-helpers.sh.
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
The global recovery event progress calculation only takes into
account pgs with `reported_epoch < start_epoch_of_event`, but
sometimes the pgs don't get updated before or after the creation
of the global recovery event. This can result in a bug where the
global event gets stuck forever, unless another event comes along
that happens to make the stuck pgs move and update their
`reported_epoch`.
Therefore, we decided to disregard pgs that are in the active+clean
state but have `reported_epoch < start_epoch_of_event`.
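A rough sketch of the adjusted accounting (a hypothetical helper; the
real logic lives in the mgr progress module):

    def pg_keeps_event_open(pg, start_epoch_of_event):
        # an active+clean pg has nothing left to recover, so it should
        # not keep the global recovery event open even if its stats
        # were last reported before the event started
        states = pg['state'].split('+')
        if 'active' in states and 'clean' in states:
            return False
        return pg['reported_epoch'] < start_epoch_of_event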
Fixes: https://tracker.ceph.com/issues/49988
Signed-off-by: Kamoltat <ksirivad@redhat.com>
The documentation specifies this in [1] and yet we were using (I
believe) an older syntax:
ceph tell mds.foo:0 scrub start / recursive force
instead of
ceph tell mds.foo:0 scrub start / recursive,force
Oddly, the former still worked at least as recently as [2]:
2021-06-03T07:11:42.071 DEBUG:teuthology.orchestra.run.smithi025:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub start / recursive force
...
2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout:{
2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout: "return_code": 0,
2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout: "scrub_tag": "cf7a74b2-3eb2-4657-9274-ea504b1ebf8f",
2021-06-03T07:11:42.269 INFO:teuthology.orchestra.run.smithi025.stdout: "mode": "asynchronous"
2021-06-03T07:11:42.269 INFO:teuthology.orchestra.run.smithi025.stdout:}
[1] https://docs.ceph.com/en/latest/cephfs/scrub/
[2] /ceph/teuthology-archive/pdonnell-2021-06-03_03:40:33-fs-wip-pdonnell-testing-20210603.020013-distro-basic-smithi/6148097/teuthology.log
Fixes: https://tracker.ceph.com/issues/51146
See-also: https://tracker.ceph.com/issues/51145
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This change is a follow-up to commit b6e9c0903d, which set the
scheduler to wpq in run_osd() and run_osd_filestore(). activate_osd()
must also set the scheduler type to 'wpq' in order to be consistent
and avoid test failures.
The above is a temporary measure until all the standalone tests are
modified to run well with the mclock_scheduler.
Fixes: https://tracker.ceph.com/issues/51074
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
mgr/dashboard: Include Network address and labels on Host Creation form
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
* refs/pull/41509/head:
common/cmdparse: fix CephBool validation for tell commands
mgr/nfs: fix 'nfs export create' argument order
common/cmdparse: emit proper json
mon/MonCommands: add -- separator to example
qa/tasks/cephfs/test_nfs: fix export create test
mgr: make mgr commands compat with pre-quincy mon
doc/_ext/ceph_commands: handle non-positional args in docs
mgr: fix reweight-by-utilization cephbool flag
mon/MonCommands: convert some CephChoices to CephBool
mgr/k8sevents: fix help strings
pybind/mgr/mgr_module: fix help desc formatting
mgr/orchestrator: clean up 'orch {daemon add,apply} rgw' args
mgr/orchestrator: add end_positional to a few methods
mgr/orchestrator: reformat a few methods
pybind/ceph_argparse: stop parsing when we run out of positional args
pybind/ceph_argparse: remove dead code
pybind/mgr/mgr_module: infer non-positional args
pybind/mgr/mgr_module: add separator for non-positional args
command/cmdparse: use -- to separate positional from non-positional args
pybind/ceph_argparse: adjust help text for non-positional args
pybind/ceph_argparse: track a 'positional' property on cli args
Reviewed-by: Kefu Chai <kchai@redhat.com>
Add the ability to create a host by specifying a network address, and
also to create labels.
https://tracker.ceph.com/issues/50318
Signed-off-by: Nizamudeen A <nia@redhat.com>
This allows array/dict configs like mntopts to accumulate changes
from multiple yaml fragments.
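For illustration, the accumulation behaves like this simplified
stand-in for teuthology's deep merge (names and shapes are
illustrative):

    def deep_merge(base, overrides):
        # merge dicts recursively and concatenate lists so that, e.g.,
        # mntopts from several yaml fragments accumulate instead of
        # the last fragment clobbering the earlier ones
        if isinstance(base, dict) and isinstance(overrides, dict):
            merged = dict(base)
            for key, value in overrides.items():
                merged[key] = (deep_merge(base[key], value)
                               if key in base else value)
            return merged
        if isinstance(base, list) and isinstance(overrides, list):
            return base + overrides
        return overrides

    a = {'kclient': {'mntopts': ['nowsync']}}
    b = {'kclient': {'mntopts': ['wsync']}}
    assert deep_merge(a, b) == {'kclient': {'mntopts': ['nowsync', 'wsync']}}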
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
even though rook does not install ceph packages on the host directly
(it uses the ceph container image), teuthology insists on checking the
existence of debian packages by querying the shaman server when it
sees a teuthology facet file which includes:
os_type: ubuntu
os_version: "18.04"
but since we've stopped building ubuntu/bionic packages, teuthology
just complains when we schedule test suites composed from the facets
in qa/suites/orch/rook/smoke.
in this change, ubuntu_18.04.yaml is dropped: ubuntu/bionic does not
really increase the test coverage of ceph, even though it does help
to test rook and the container runtime.
Signed-off-by: Kefu Chai <kchai@redhat.com>
We have to reap connections promptly for this test to work.
This test was broken indirectly by d51d80b323,
which moved the counter decrement to reap time instead of mark_down/stop
time.
The reaping is asynchronous, so allow for a delay in the count change.
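In other words, the check now polls for the expected count instead of
asserting it immediately. The actual test is C++, but the idea is
roughly this (Python sketch, illustrative only):

    import time

    def wait_for_count(get_count, expected, timeout=30.0, interval=0.1):
        # reaping happens asynchronously, so the counter may lag behind
        # mark_down()/stop(); poll until it catches up or we time out
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if get_count() == expected:
                return True
            time.sleep(interval)
        return get_count() == expected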
Fixes: https://tracker.ceph.com/issues/50622
Signed-off-by: Sage Weil <sage@newdream.net>
* refs/pull/39910/head:
test: Add test for mgr hang when osd is full
mgr: Set client_check_pool_perm to false
mds: Add full caps to avoid osd full check
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
osd: Run osd bench test to override default max osd capacity for mclock
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
"get_heap_property *" asock commands are exposed to operators
to check the tcmalloc internals for understanding the performance
of the memory subsystem. but crimson uses the builtin seastar allocator
which is not backed by tcmalloc. but we can dump the metrics using
the "dump_metrics" asock command which is only available from
crimson-osd.
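a sketch of how a qa helper might dispatch per osd flavor (the helper
name and the tcmalloc property used here are illustrative):

    def dump_memory_stats(manager, osd_id, is_crimson):
        # classic osd: tcmalloc internals via get_heap_property;
        # crimson-osd: seastar allocator metrics via dump_metrics
        if is_crimson:
            return manager.admin_socket('osd', osd_id, ['dump_metrics'])
        return manager.admin_socket(
            'osd', osd_id,
            ['get_heap_property', 'tcmalloc.max_total_thread_cache_bytes'])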
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
so we can cater to the needs of the different implementations of the
osd, i.e., the classic osd and the crimson osd, which offer different
sets of asock commands.
Signed-off-by: Kefu Chai <kchai@redhat.com>
With the mclock scheduler enabled, the recovery throughput is
throttled based on factors like the type of mclock profile enabled
and the OSD capacity, among others. Due to this, the recovery times
may vary, and the existing timeout of 120 secs may not be sufficient.
To address the above, a new method called _is_inprogress_or_complete()
is introduced in the TestProgress class. It checks whether the event
with the specified 'id' is in progress by examining the 'progress' key
of the progress command response, and also handles the corner case
where the event completes just before it is called.
The existing wait_until_true() method in the CephTestCase class is
modified to accept another function argument called "check_fn". This
is set to the _is_inprogress_or_complete() function described above in
the "test_turn_off_module" test, which has been observed to fail for
the reasons already described. A retry mechanism with a maximum of 5
attempts is introduced after the first timeout is hit. This means the
wait can extend up to a maximum of 600 secs (120 secs * 5), as long as
recovery progress is reported by the 'ceph progress' command.
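A condensed sketch of the retry behavior described above (simplified;
the real change lives in CephTestCase):

    import time

    def wait_until_true(condition, timeout, check_fn=None, period=5):
        # if the timeout expires but check_fn reports that the event is
        # still making progress, grant another timeout window, up to 5
        # attempts in total (e.g. 120 secs * 5 = 600 secs)
        for _ in range(5):
            deadline = time.time() + timeout
            while time.time() < deadline:
                if condition():
                    return
                time.sleep(period)
            if check_fn is None or not check_fn():
                break
        raise RuntimeError('timed out waiting for condition')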
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>