When you add a host in maintenance mode and then exit the maintenance
mode, a 500 server error pops up, which interrupts the whole
exit-maintenance process and leaves the host in an unknown/offline state.
It happened because I was setting the status of the host through
HostSpec(). With this change, I am using the orchestrator's
enter_maintenance API to enable the maintenance.
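A minimal sketch of the intended flow, assuming the `ceph orch host
maintenance enter/exit` CLI is available and shelling out from Python (this
is an illustration of the approach, not the dashboard's actual code):

    import subprocess

    def enter_maintenance(hostname: str) -> None:
        # Let the orchestrator drive the maintenance transition instead of
        # flipping the host status directly through HostSpec().
        subprocess.run(
            ["ceph", "orch", "host", "maintenance", "enter", hostname],
            check=True)

    def exit_maintenance(hostname: str) -> None:
        # Take the host back out of maintenance via the orchestrator CLI.
        subprocess.run(
            ["ceph", "orch", "host", "maintenance", "exit", hostname],
            check=True)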
Fixes: https://tracker.ceph.com/issues/51218
Signed-off-by: Nizamudeen A <nia@redhat.com>
let's avoid getting new versions of those packages by accident.
Unfortunately this means we have to manually update those
packages regularly.
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
This was using an obscure syntax that worked at one time and wasn't
documented (AFAIK).
Fixes: https://tracker.ceph.com/issues/51182
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This PR fixes a "Directory not empty" error that was encountered during
removal of the downloaded keycloak package in keycloak.py.
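A minimal sketch of the usual fix for this error, assuming the cleanup was
removing the directory with os.rmdir()-style calls while it still had
contents (the path below is a placeholder, not the one used by keycloak.py):

    import os
    import shutil

    def remove_downloaded_package(path: str) -> None:
        # os.rmdir() raises "Directory not empty" for a populated tree;
        # shutil.rmtree() removes the directory and everything under it.
        if os.path.isdir(path):
            shutil.rmtree(path)
        elif os.path.exists(path):
            os.remove(path)

    # Example with a placeholder path:
    # remove_downloaded_package("/tmp/keycloak-download")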
Signed-off-by: Kalpesh Pandya <kapandya@redhat.com>
Right now scrub_test picks any pg in the cluster. Unfortunately, it picked the
.mgr pool's only pg in [1]:
2021-05-16T11:36:35.035 DEBUG:teuthology.orchestra.run.smithi049:> adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage rados --cluster ceph --pool rbd setomapval main.db-journal.0000000000000000 key val
Instead, only pick a pg in the rbd pool.
[1] /ceph/teuthology-archive/kchai-2021-05-16_11:19:39-rados-wip-kefu-testing-2021-05-16-1043-distro-basic-smithi/6117396/teuthology.log
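A minimal sketch of the idea, assuming we can shell out to the ceph CLI and
that `ceph pg ls-by-pool` returns its pgs under a "pg_stats" key in JSON
output (the helper name and the exact JSON layout are assumptions):

    import json
    import subprocess

    def pick_pg_in_pool(pool: str = "rbd") -> str:
        # List only the pgs belonging to the given pool, so the test can
        # never land on the .mgr pool's pg.
        out = subprocess.check_output(
            ["ceph", "pg", "ls-by-pool", pool, "--format=json"])
        pg_stats = json.loads(out)["pg_stats"]
        assert pg_stats, "no pgs found in pool %s" % pool
        return pg_stats[0]["pgid"]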
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This is mostly for testing: a lot of tests assume that there are no
existing pools. These tests relied on a config to turn off creation of the
"device_health_metrics" pool, which generally exists for any new Ceph
cluster. It would be better to make these tests tolerant of the new .mgr
pool, but clearly there are a lot of them. So just convert the config to
make it work.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
"device_health_metrics" pool is gone -- .mgr pool is in.
I don't think the pool removal code in some test cases is necessary any
longer with recent changes to remove those warnings; so that code is
gone too.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/39505/head:
qa: test nowsync option in kernel client workflows
qa: deep merge top level overrides for fuse/kclient
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
mon_tick_interval is 5 seconds by default. monitors update their
rotating keys every mon_tick_interval. before monitors form a
quorum, the auth requests from clients are put into the wait list.
these requests are re-enqueued once the monitors form a quorum. but
there is a small window of up to mon_tick_interval before they are
able to serve the auth requests, even after they claim to be able to
serve requests. if these re-enqueued requests happen to be served
in this window, and if cephx is enabled, they will be greeted with
errors like
handle_auth_bad_method server allowed_methods [2] but i only support [2]
in the case of ceph cli, the error would look like:
[errno 13] RADOS permission denied (error connecting to the cluster)
so, to address this issue, the EACCES error is ignored when waiting
for a quorum.
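A minimal sketch of the retry idea on the client side, assuming a connect()
callable that surfaces the transient failure as EACCES (the helper, timeout
and interval below are illustrative, not the actual change, which simply
ignores EACCES in the wait-for-quorum loop):

    import errno
    import time

    def connect_ignoring_transient_eacces(connect, timeout=60, interval=1):
        # right after the monitors form a quorum there is a window of up to
        # mon_tick_interval in which auth requests can still fail with
        # EACCES, so treat that error as transient and retry until the
        # deadline expires.
        deadline = time.monotonic() + timeout
        while True:
            try:
                return connect()
            except OSError as e:
                if e.errno != errno.EACCES or time.monotonic() >= deadline:
                    raise
                time.sleep(interval)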
Signed-off-by: Kefu Chai <kchai@redhat.com>
qa/standalone: Use osd op queue = wpq in activate_osd() within ceph-helpers.sh.
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
The global recovery event progress calculation only
takes into account pgs with `reported_epoch < start_epoch_of_event`,
but sometimes the pgs don't get moved before or after the creation
of the global recovery event. This can result in a bug
where the global event gets stuck forever unless there is another
event that specifically makes the stuck pgs move and updates
their `reported_epoch`.
Therefore, we decided to disregard pgs that are in the active+clean state
but have `reported_epoch < start_epoch_of_event`.
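A minimal sketch of the resulting filter, assuming simple pg records with a
state string and a reported_epoch field (these names are illustrative, not
the progress module's actual data structures):

    def pgs_blocking_event(pgs, start_epoch_of_event):
        # A pg whose reported_epoch is still older than the event's start
        # epoch would normally keep the event from completing; if it is
        # already active+clean, disregard it instead of waiting for an
        # epoch bump that may never come.
        blocking = []
        for pg in pgs:
            stale = pg["reported_epoch"] < start_epoch_of_event
            if stale and "active+clean" in pg["state"]:
                continue  # treat as recovered for progress purposes
            if stale:
                blocking.append(pg)
        return blocking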
Fixes: https://tracker.ceph.com/issues/49988
Signed-off-by: Kamoltat <ksirivad@redhat.com>
The documentation specifies this in [1] and yet we were using (I
believe) an older syntax:
ceph tell mds.foo:0 scrub start / recursive force
instead of
ceph tell mds.foo:0 scrub start / recursive,force
Oddly the former works at least as recently as in [2]:
2021-06-03T07:11:42.071 DEBUG:teuthology.orchestra.run.smithi025:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub start / recursive force
...
2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout:{
2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout: "return_code": 0,
2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout: "scrub_tag": "cf7a74b2-3eb2-4657-9274-ea504b1ebf8f",
2021-06-03T07:11:42.269 INFO:teuthology.orchestra.run.smithi025.stdout: "mode": "asynchronous"
2021-06-03T07:11:42.269 INFO:teuthology.orchestra.run.smithi025.stdout:}
[1] https://docs.ceph.com/en/latest/cephfs/scrub/
[2] /ceph/teuthology-archive/pdonnell-2021-06-03_03:40:33-fs-wip-pdonnell-testing-20210603.020013-distro-basic-smithi/6148097/teuthology.log
Fixes: https://tracker.ceph.com/issues/51146
See-also: https://tracker.ceph.com/issues/51145
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This change is a follow-up to commit
b6e9c0903d that set the scheduler to wpq in
run_osd() and run_osd_filestore(). In addition, activate_osd() has to
set the scheduler type to 'wpq' in order to be consistent and avoid test
failures.
The above is a temporary measure until all the standalone tests are
modified to run well with the mclock_scheduler.
Fixes: https://tracker.ceph.com/issues/51074
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
mgr/dashboard: Include Network address and labels on Host Creation form
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
* refs/pull/41509/head:
common/cmdparse: fix CephBool validation for tell commands
mgr/nfs: fix 'nfs export create' argument order
common/cmdparse: emit proper json
mon/MonCommands: add -- seperator to example
qa/tasks/cephfs/test_nfs: fix export create test
mgr: make mgr commands compat with pre-quincy mon
doc/_ext/ceph_commands: handle non-positional args in docs
mgr: fix reweight-by-utilization cephbool flag
mon/MonCommands: convert some CephChoices to CephBool
mgr/k8sevents: fix help strings
pybind/mgr/mgr_module: fix help desc formatting
mgr/orchestrator: clean up 'orch {daemon add,apply} rgw' args
mgr/orchestrator: add end_positional to a few methods
mgr/orchestrator: reformat a few methods
pybind/ceph_argparse: stop parsing when we run out of positional args
pybind/ceph_argparse: remove dead code
pybind/mgr/mgr_module: infer non-positional args
pybind/mgr/mgr_module: add separator for non-positional args
command/cmdparse: use -- to separate positional from non-positional args
pybind/ceph_argparse: adjust help text for non-positional args
pybind/ceph_argparse: track a 'positional' property on cli args
Reviewed-by: Kefu Chai <kchai@redhat.com>
Adds the ability to create a host by specifying a network address and
also to create labels.
https://tracker.ceph.com/issues/50318
Signed-off-by: Nizamudeen A <nia@redhat.com>
This allows for array/dict configs like mntopts to accumulate changes
from multiple yaml fragments.
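A minimal sketch of the deep-merge behaviour described here, assuming plain
nested dicts and lists like those produced by yaml fragments (this is an
illustration of the idea, not teuthology's own deep_merge helper):

    def deep_merge(base, overrides):
        # Dicts are merged key by key, lists are concatenated, and scalar
        # overrides win, so mntopts from several yaml fragments accumulate
        # instead of the last fragment clobbering the others.
        if isinstance(base, dict) and isinstance(overrides, dict):
            merged = dict(base)
            for key, value in overrides.items():
                merged[key] = deep_merge(merged[key], value) if key in merged else value
            return merged
        if isinstance(base, list) and isinstance(overrides, list):
            return base + overrides
        return base if overrides is None else overrides

    # Example: two fragments both adding kclient mntopts.
    frag1 = {"kclient": {"mntopts": ["nowsync"]}}
    frag2 = {"kclient": {"mntopts": ["wsync"]}}
    assert deep_merge(frag1, frag2) == {"kclient": {"mntopts": ["nowsync", "wsync"]}}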
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
even though rook does not really install ceph packages on the host
directly (it uses the ceph container image), teuthology insists on
checking the existence of debian packages by querying the shaman server
when it sees a teuthology facet file which includes:
os_type: ubuntu
os_version: "18.04"
but since we've stopped building ubuntu/bionic packages, teuthology
just complains when we schedule test suites composed from the
facets in qa/suites/orch/rook/smoke.
in this change, ubuntu_18.04.yaml is dropped, because ubuntu/bionic
does not really increase the test coverage of ceph; it does help to test
rook and the container runtime, though.
Signed-off-by: Kefu Chai <kchai@redhat.com>
We have to reap connections promptly for this test to work.
This test was broken indirectly by d51d80b323,
which moved the counter decrement to reap time instead of mark_down/stop
time.
The reaping is asynchronous, so allow for a delay in the count change.
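A minimal sketch of tolerating the asynchronous reap in a test, assuming a
callable that reads the current connection count (the helper and its
parameters are illustrative, not the test's actual code):

    import time

    def wait_for_count(get_count, expected, timeout=30, interval=0.5):
        # The counter is now decremented when the connection is reaped,
        # which happens asynchronously after mark_down/stop, so poll for
        # the expected value instead of asserting it immediately.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if get_count() == expected:
                return True
            time.sleep(interval)
        return get_count() == expected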
Fixes: https://tracker.ceph.com/issues/50622
Signed-off-by: Sage Weil <sage@newdream.net>