Currently, to recover a file system after recovering the monitor store,
you need to stop all the MDSs; create the FSMap with defaults using the
`fs new` command; execute the `fs reset` command to get the file system's
rank 0 into the existing but failed state; and then restart the MDSs.
Add a 'recover' flag to the `fs new` command that sets the file system's
rank 0 to the existing but failed state, and sets the file system's
'joinable' setting to False. Using the `fs new` command with the 'recover'
flag removes the need to stop all the MDSs and execute the `fs reset`
command when recovering the file system after recovering the monitor store.
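The two workflows can be compared as command sequences. This is only a sketch under stated assumptions: the helper names and the file system and pool arguments are hypothetical, the `--force` and `--yes-i-really-mean-it` arguments and the final `joinable` step come from the usual disaster-recovery procedure rather than from this commit, and nothing is executed.

```python
from typing import List


def legacy_recovery_steps(fs: str, meta_pool: str, data_pool: str) -> List[str]:
    """Steps needed without the 'recover' flag (hypothetical helper)."""
    return [
        "(stop all MDS daemons)",
        # Assumption: --force is needed because the pools already hold data.
        f"ceph fs new {fs} {meta_pool} {data_pool} --force",
        # Puts rank 0 into the existing but failed state.
        f"ceph fs reset {fs} --yes-i-really-mean-it",
        "(restart MDS daemons)",
    ]


def recover_flag_steps(fs: str, meta_pool: str, data_pool: str) -> List[str]:
    """Steps with the 'recover' flag (hypothetical helper)."""
    return [
        # Marks rank 0 as existing but failed and sets joinable to False.
        f"ceph fs new {fs} {meta_pool} {data_pool} --force --recover",
        # Assumption: the operator re-enables joining once recovery is done.
        f"ceph fs set {fs} joinable true",
    ]
```

With the flag, the sequence has no MDS stop/restart and no `fs reset` step.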
Fixes: https://tracker.ceph.com/issues/51716
Signed-off-by: Ramana Raja <rraja@redhat.com>
IMO the number of symlinks we have to maintain manually
is tedious and error-prone. Any ideas on improving things?
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
Force a subset of tests that explicitly employ the filestore backend to
use WPQ scheduler. This is because mclock scheduler will not be
optimized for filestore.
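As a rough illustration of the intent, the choice can be expressed as a per-backend config override. This is a sketch, not how the suite implements it (the suite uses YAML fragments): the function name and the dict layout are hypothetical; only the `osd_op_queue` option and its `wpq` value are real Ceph settings.

```python
def scheduler_override(objectstore: str) -> dict:
    """Return a conf override for the given objectstore (hypothetical helper)."""
    if objectstore == "filestore":
        # mclock is not tuned for filestore, so pin the WPQ scheduler.
        return {"osd": {"osd_op_queue": "wpq"}}
    # Other backends keep whatever scheduler is the default.
    return {}
```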
Fixes: https://tracker.ceph.com/issues/52025
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
* refs/pull/42687/head:
qa: test the "ms_mode" options in kclient workflows
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
not really fixing anything, but moves the failures out of the normal
upgrade suite
Fixes: https://tracker.ceph.com/issues/49955
Signed-off-by: Casey Bodley <cbodley@redhat.com>
This commit adds the device ls command to the rook qa task
since that command should be working from now on.
Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
Note that I didn't bother adding the prefer-* options, as I figure it's
better to be definite.
Fixes: https://tracker.ceph.com/issues/52068
Signed-off-by: Jeff Layton <jlayton@redhat.com>
* refs/pull/42691/head:
mgr/nfs: add --port to 'nfs cluster create' and port to 'nfs cluster info'
qa/suites/orch/cephadm/smoke-roleless: test taking ganeshas offline
qa/tasks/vip: exec with bash -ex
qa/suites/orch/cephadm: separate test_nfs from test_orch_cli
Reviewed-by: Varsha Rao <varao@redhat.com>
This is no longer required because we removed cosbench workloads in
fd350fd015. Removing it also prevents
failures like the following, or breakage from any other change that breaks the rgw task:
```
2021-08-06T20:13:25.812 INFO:teuthology.orchestra.run.smithi060.stderr:curl: (7) Failed to connect to smithi060.front.sepia.ceph.com port 80: Connection refused
2021-08-06T20:15:33.813 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_04c2febe7099917d97a71271f17abb5710030132/teuthology/contextutil.py", line 31, in nested
    vars.append(enter())
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_3c0f8c8164075af7aac4d1f2805d3f4580709461/qa/tasks/rgw.py", line 191, in start_rgw
    wait_for_radosgw(url, remote)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_3c0f8c8164075af7aac4d1f2805d3f4580709461/qa/tasks/util/rgw.py", line 94, in wait_for_radosgw
    assert exit_status == 0
AssertionError
```
Signed-off-by: Neha Ojha <nojha@redhat.com>
* refs/pull/42349/head:
mon/MDSMonitor: propose if FSMap struct_v is too old
mon/MDSMonitor: give a proper error message if FSMap struct_v is too old
mds/FSMap: use DECODE_OLDEST to gate FSMap version
qa: add tests for fs dump of epoch and trimming
qa: add file system support for dumping epoch
mon/MDSMonitor: return mon_mds_force_trim_to even if equal to current epoch
mon: add debugging for trimming methods
mon: fix debug spacing
qa: add nofs upgrade suite
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
* refs/pull/41025/head:
qa: wait pgs to be clean before using the pools
qa: ignore PG_RECOVERY_FULL and PG_DEGRADED for mds-full
qa: wait more time since there have many more pgs than before
qa: do not multiple the full ratio twice
qa: do not raise for kclient for _fsync test
qa: use the pg autoscale mode to calcuate the pg_num
qa: set the object_size to 1M
qa: move the is_full() to parent class
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
This adds an upgrade suite to ensure that a Ceph cluster without a
CephFS file system does not blow up on upgrade (in particular, that the
MDSMonitor does not trip). This was developed to potentially reproduce
tracker 51673 but the actual cause for that issue was an old encoding
for the MDSMap which was obsoleted in Pacific. You must create a cluster
older than the FSMap (~Hammer or Infernalis) to reproduce. In any case,
this upgrade suite may be useful in the future so let's keep it!
Related-to: https://tracker.ceph.com/issues/51673
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
These overrides are standard for all configurations. The config to
enable fragmentation is also long removed.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
We can use pacific features when installing pacific.
Otherwise, we end up with the default keyring rule for client.admin,
which uses mode 0600, which makes teuthology jobs fail.
Signed-off-by: Sage Weil <sage@newdream.net>
Otherwise, we install the new podman at the end, and the
container-selinux-policy package install triggers a bunch of selinux
errors.
Fixes: https://tracker.ceph.com/issues/50151
Signed-off-by: Sage Weil <sage@newdream.net>
Change some of the tests in teuthology to make
them more deterministic.
Use `ceph osd set norecover` and
`ceph osd set nobackfill` when marking OSDs in
or out. This delays recovery and gives the
test cases the chance to check that events
actually pop up in the progress module.
Remove test_osd_cannot_recover from
tasks/mgr/test_progress.py, since it is no longer
a relevant test case: recovery gets triggered
regardless of whether the PG is unmoved.
Ignore `OSDMAP_FLAGS` in teuthology:
because we use norecover and nobackfill
to delay the recovery process, a health
warning is raised that would otherwise fail the
teuthology test.
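The flag handling can be sketched as a context manager. This is an illustration under assumptions, not the code in test_progress.py: `recovery_paused` and the `run` callback are hypothetical names, and unsetting the flags on exit is assumed rather than stated above. The example records commands instead of executing them.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List

RECOVERY_FLAGS = ("norecover", "nobackfill")


@contextmanager
def recovery_paused(run: Callable[[List[str]], None]) -> Iterator[None]:
    """Set norecover/nobackfill on entry and unset them on exit."""
    for flag in RECOVERY_FLAGS:
        run(["ceph", "osd", "set", flag])
    try:
        yield
    finally:
        # Assumption: recovery is allowed to proceed again after the check.
        for flag in RECOVERY_FLAGS:
            run(["ceph", "osd", "unset", flag])


# Dry run: record the commands instead of running them.
issued: List[List[str]] = []
with recovery_paused(issued.append):
    # Mark an OSD out while recovery is held back, so a progress
    # event has time to show up before any PG finishes recovering.
    issued.append(["ceph", "osd", "out", "0"])
```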
Signed-off-by: Kamoltat <ksirivad@redhat.com>
In 8b95c4b7c5 we set log_to_journald=false
in the cephadm config. However, that's not present in pre-quincy builds,
which means that when we upgrade, the new daemons start spamming the
teuthology.log. Set this (with --force, since it's not valid pre-quincy)
in the config before we start the upgrade.
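A minimal sketch of the command this amounts to, built as an argument list. The helper name is hypothetical and the `global` target is an assumption; only `log_to_journald` and `--force` come from the text above.

```python
from typing import List


def config_set_cmd(who: str, option: str, value: str, force: bool = False) -> List[str]:
    """Build a 'ceph config set' command line (hypothetical helper)."""
    cmd = ["ceph", "config", "set", who, option, value]
    if force:
        # Needed when the running (pre-quincy) mon does not know the option.
        cmd.append("--force")
    return cmd


# Assumption: the option is set at the 'global' level before the upgrade.
pre_upgrade = config_set_cmd("global", "log_to_journald", "false", force=True)
```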
Signed-off-by: Sage Weil <sage@newdream.net>
this cephadm task was merged without testing in
https://github.com/ceph/ceph/pull/39855/ and fails consistently with an
error in kernel.py. the teuthology issue
https://tracker.ceph.com/issues/50338 has gone unfixed for months, so
removing rgw_cephadm.yaml to clean up the rgw suite
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Add a workunit for testing the rgw object cache
by using s3cmd to write objects and then
verify the objects in the cache.
Also move the 0-install.yaml file out of tasks and
into the main dir for the rgw/verify subsuite.
Signed-off-by: Ali Maredia <amaredia@redhat.com>
qa: d3n: add debug logs
Signed-off-by: Ali Maredia <amaredia@redhat.com>
rgw: s3n: qa: fix netstat search for rgw process
Signed-off-by: Mark Kogan <mkogan@redhat.com>
it's a regression introduced by the restructuring of the test suites;
let's pin the test to CentOS8.
See-also: https://tracker.ceph.com/issues/49638
Signed-off-by: Kefu Chai <kchai@redhat.com>
* refs/pull/41574/head:
qa/tasks/vstart_runner: add LocalCluster.run
qa/tasks/cephfs/test_nfs: fiddle with sudo
mgr/nfs/export: some cleanup, minor refactoring
mgr/nfs/cluster: remove unused @cluster_setter
nfs/mgr: fix help message case
doc/cephfs/fs-nfs-export: add note about export update behavior
mgr/nfs: move user create/delete into helper
mgr/nfs: refactor _delete_user helper
mgr/nfs: refactor create_export_from_dict() helper
mgr/nfs: keep 'nfs export get' around for backward-compat
mgr/nfs: rename method
qa/tasks/cephfs/test_nfs: test new export via apply
doc/cephfs/fs-nfs-export: be consistent with cluster_id and _ vs -
mgr/nfs: addr -> client_addr for 'nfs export create ...'
mgr/nfs: fix tests
mgr/nfs: 'nfs export get' -> 'nfs export info'
mgr/nfs: binding -> pseudo_path
mgr/nfs: more revisions based on review
mgr/nfs: adjust NFSExceptoin errno arg
doc/cephfs: update 'nfs export {get,apply}' docs
mgr/nfs: merge FSExport back into ExportMgr
doc/radosgw/nfs: document mgr/nfs way to add/remove rgw exports
mgr/nfs: merge 'nfs export {update,import}' -> 'nfs export apply'
mgr/nfs: test export creation and list
mgr/nfs: test export_update (+ fixes)
mgr/nfs: test Export.validate(); several fixes
mgr/nfs: test that export <-> block+dict conversions go both ways
mgr/nfs: clean up test a bit
mgr/nfs/export: fix export validation
mgr/nfs/export: fix tests
mgr/nfs: handle option addr/client block in create_export()
mgr/nfs: allow multiple addrs for new exports
mgr/nfs: fix/finish rgw export
mgr/nfs/module: clusterid -> cluster_id
mgr/nfs/export: fix export_update_1 to type check
mgr/nfs/cluster: fix type error
mgr/nfs/export: wrap long lines
mgr/nfs: ExportMgr._delete_export only works for cephfs for now
mgr/nfs: Remove pool_ns from NFSCluster
mgr/nfs: Remove ExportMgr.rados_namespace
mgr/nfs: flake8
mgr/nfs: Add type checking
mgr/nfs: Add __eq__ method to Export
mgr/nfs: Add some compatibility to mgr/dashboard
mgr/nfs: Fix whitespace handling
mgr/nfs: Copy unit tests from mgr/dashboard
mgr/nfs: partially implement rgw export support
mgr/nfs: abstract FSAL; add RGWFSAL
mgr/nfs: refactor to merge 'update' and 'import' code
mgr/nfs: add 'nfs export import' command
mgr/nfs: refactor 'nfs export update' and export validation
mgr/nfs: fix _fetch_export to distinguish between clusters
mgr/nfs: move export ganesha conf translation into caller
mgr/nfs: name nfs cephfs client key 'nfs.{cluster_id}.{export_id}'
mgr/nfs: add --addr to 'nfs export create'
mgr/nfs: add --squash to 'nfs export create'
mgr/nfs/export_utils: include false but non-None items in config
vstart.sh: enable nfs module
mgr/cephadm: nfs: drop attr_expiration_time from top-level config
mgr/cephadm: remove Dir_Chunk = 0
Reviewed-by: Michael Fritch <mfritch@suse.com>
This is mostly for testing: a lot of tests assume that there are no
existing pools. These tests relied on a config to turn off creating the
"device_health_metrics" pool, which generally exists for any new Ceph
cluster. It would be better to make these tests tolerant of the new .mgr
pool, but clearly there are a lot of them. So just convert the config to
make it work.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/39505/head:
qa: test nowsync option in kernel client workflows
qa: deep merge top level overrides for fuse/kclient
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
even though rook does not really install ceph packages on the host
directly (it uses the ceph container image), teuthology insists on
checking the existence of debian packages by querying the shaman server
when it sees a teuthology facet file which includes:
  os_type: ubuntu
  os_version: "18.04"
but since we've stopped building ubuntu/bionic packages, teuthology
just complains when we are scheduling test suites which are composed
from facets in qa/suites/orch/rook/smoke.
in this change, ubuntu_18.04.yaml is dropped because ubuntu/bionic
does not really increase the test coverage of ceph. it helps to test
rook and the container runtime though.
Signed-off-by: Kefu Chai <kchai@redhat.com>
* refs/pull/39910/head:
test: Add test for mgr hang when osd is full
mgr: Set client_check_pool_perm to false
mds: Add full caps to avoid osd full check
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
"get_heap_property *" asock commands are exposed to operators
to check the tcmalloc internals for understanding the performance
of the memory subsystem. crimson, however, uses the builtin seastar
allocator, which is not backed by tcmalloc. instead, we can dump the
metrics using the "dump_metrics" asock command, which is only available
from crimson-osd.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
This commit mainly adds the RabbitMQ task; RabbitMQ is a required and supported endpoint in bucket notification tests.
It also makes some related changes in the AMQP tests. Major changes are:
1. Addition of RabbitMQ task
2. Documentation update for the steps to execute AMQP tests
3. Addition of attributes to the tests
4. Tox dependency removal from kafka.py
Signed-off-by: Kalpesh Pandya <kapandya@redhat.com>