Uses the client's global id to get the metrics, instead of using the index.
This ensures that test_perf_stats_stale_metrics checks only the clients mounted for
the tests.
Fixes: https://tracker.ceph.com/issues/54971
Signed-off-by: Jos Collin <jcollin@redhat.com>
Add test to verify that the NFS servers don't restart when the
access type of a CephFS NFS export is updated.
And check the NFS servers are restarted when the pseudo path of
a CephFS NFS export is updated.
Signed-off-by: Ramana Raja <rraja@redhat.com>
That `ceph fs perf stats` doesn't output stale metrics
after the rank0 MDS failover.
Fixes: https://tracker.ceph.com/issues/50033
Signed-off-by: Jos Collin <jcollin@redhat.com>
The `fs volume rename` command renames the volume, i.e.,
orchestrator MDS service, file system, and the data and
metadata pool of the file system.
Fixes: https://tracker.ceph.com/issues/51162
Signed-off-by: Ramana Raja <rraja@redhat.com>
* refs/pull/44342/head:
mds: trigger stray reintegration when loading dentry
qa: test that scrub causes reintegration
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Otherwise, certain upgrade tests fail which install pacific
or earlier releases since the mount helper does not understand
this mount option, thereby passing it to the kernel which would
does not handle this config causing mount to fail in tests.
Note that this mount config is only used during teuthology tests
to catch v2-style syntax implementation bugs in the kernel.
Fixes: http://tracker.ceph.com/issues/53487
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Validate the symlink xattr is stored on first data
object of the symlink along with backtrace if
'mds_symlink_recovery' option is enabled and
vice-versa.
Also add 'string_wrapper' class to decode bufferlist
to string. This helps 'ceph-dencoder' tool to decode
the symlink target stored, which is used in tests to
validate.
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Fixes: https://tracker.ceph.com/issues/46166
Validates that the 'cephfs-data-scan' tool recovers
symlink during disaster recovery of metadata pool
from data pool correctly as symlink
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Fixes: https://tracker.ceph.com/issues/46166
TestFragmentation.test_deep_split relies on `num_strays`
to reach zero expecting that the purge threads would
have deleted the directory entries. However, checking
`num_strays` cannot be relied on since PurqeQueue merely
journals the purge item (see PurgeQueue::push) followed
by the StrayManager marking the stray as removed thereby
accounting `num_strays`.
So, add an additional condition to check if the purge
threads have finished processing items.
Fixes: http://tracker.ceph.com/issues/52487
Signed-off-by: Venky Shankar <vshankar@redhat.com>
But, do not throw away the old style mount syntax since we would
want to continue testing it since users (scripts) might still be
using it.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Test if the number of snaps on the file-system and the stats on created
snaps in the DB match.
NOTE:
Since it is difficult to get the snapshot created on the exact second,
the timestamp comparison has been limited up to the last 'minute' as the
comparison granularity.
Signed-off-by: Milind Changire <mchangir@redhat.com>
Recently, Luis posted a patch to turn the metrics debugfs file into a
directory with separate files for the different sections in the old
metrics file.
Account for this change in get_op_read_count().
Fixes: https://tracker.ceph.com/issues/53214
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Sometimes the OpenFileTable::commit() will just come after the 30
seconds' waiting.
Fixes: https://tracker.ceph.com/issues/52887
Signed-off-by: Xiubo Li <xiubli@redhat.com>
* refs/pull/43590/head:
qa: test that new mounts of same fs function after old mount is evicted
qa: remove REQUIRE_KCLIENT_REMOTE
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
* refs/pull/38752/head:
qa: enable dynamic debug support to kclient
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/43425/head:
qa: import CommandFailedError from exceptions not run
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Jos Collin <jcollin@redhat.com>
pybind/mgr/cephadm: set allow_standby_replay during CephFS upgrade
Reviewed-by: Sage Weil <sage@newdream.net>
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Use the enhanced create_n_files to dedup code. Also split the large test
into three.
Fixes: https://tracker.ceph.com/issues/52606
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Few things:
- Allow calling fsync on directory (to support async create kernel).
- Allow immediately unlinking the created file (for stray testing).
- Close any file descriptors created.
- Write unique content (the i variable) to each file.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The test is not needing to check that the new MDS becomes active, only
that a replacement occurs.
Fixes: https://tracker.ceph.com/issues/52677
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Stop importing CommandFailedError from teuthology.orchestra.run, it is
actually defined in teuthology.exception.
Fixes: https://tracker.ceph.com/issues/51226
Signed-off-by: Rishabh Dave <ridave@redhat.com>
This allows hooks for `cephadm shell` to function so that this code
works with cephadm deployments.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/42719/head:
mgr/volumes: Fix permission during subvol creation with mode
Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/42584/head:
doc: fix `daemon status` interface (exclude file system name)
test: adjust mirroring tests for `daemon status` change
mgr/mirroring: `daemon status` command does not require file system name
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Add a 'kmount_count' counter in ctx to make sure the dynamic debug
log won't be disabled until the last kernel mounter is unmounted.
Fixes: https://tracker.ceph.com/issues/48736
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Currently, to recover a file system after recovering monitor store, you
need to stop all the MDSs; create FSMap with defaults using `fs new`
command; execute `fs reset` command to get the file system's rank 0 into
existing but failed state; and then restart MDSs.
Add 'recover' flag to the `fs new` command that sets the file system's
rank 0 to existing but failed state, and sets the file system's
'joinable' setting to False. Using the `fs new` command with 'recover'
flag gets rid of the steps to stop all the MDSs and execute `fs reset`
command when recovering the file system after recoving monitor store.
Fixes: https://tracker.ceph.com/issues/51716
Signed-off-by: Ramana Raja <rraja@redhat.com>
The subvolume creation with specific mode leads to
creation of parent directories ('/volumes/_no_group') with
the same mode if it's not already created. Fixed the same.
Similarly, the subvolumegroup creation with specific mode
leads to creation of parent directory ('/volumes') with
same mode if it's not already created. Fixed the same.
Fixes: https://tracker.ceph.com/issues/51870
Signed-off-by: Kotresh HR <khiremat@redhat.com>
* refs/pull/42371/head:
mgr/volumes: Fix a race during clone cancel
mgr/volumes: Fail subvolume removal if it's in progress
Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Removing an in-progress subvolume clone with force doesn't
remove the clone index (tracker). This results in the cloner
thread to stuck in loop trying to clone the deleted one.
This patch addresses the issue by not allowing the subvolume clone
to be removed if it's not complete/cancelled/failed even with force option.
It throws the error EAGAIN, asking the user to cancel the pending clone
and retry.
Fixes: https://tracker.ceph.com/issues/51707
Signed-off-by: Kotresh HR <khiremat@redhat.com>
* refs/pull/42349/head:
mon/MDSMonitor: propose if FSMap struct_v is too old
mon/MDSMonitor: give a proper error message if FSMap struct_v is too old
mds/FSMap: use DECODE_OLDEST to gate FSMap version
qa: add tests for fs dump of epoch and trimming
qa: add file system support for dumping epoch
mon/MDSMonitor: return mon_mds_force_trim_to even if equal to current epoch
mon: add debugging for trimming methods
mon: fix debug spacing
qa: add nofs upgrade suite
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
The cluster has already multiple the full ratio before returning
the "max_avail".
Fixes: https://tracker.ceph.com/issues/50984
Signed-off-by: Xiubo Li <xiubli@redhat.com>
For kclient, the write() will return -ENOSPC instead of the fsync().
Fixes: https://tracker.ceph.com/issues/45434
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Setting the pg_num to 8 is too small that some osds maybe not covered by the
pools, some osds maybe overloaded. Remove the hardcodeing pg_num here and let
the pg autoscale mode to calculate it as needed, and at the same time set the
pg_num_min to 64 to avoid the pg_num to small.
If ec pool is used, for the test cases most datas will go to the ec pool and
the primary replicated pool will store a small amount of metadata for all the
files only, so set the target size ratio to 0.05 should be enough.
Fixes: https://tracker.ceph.com/issues/45434
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Set the object_size to 1MB to make the objects destributed more even
among the OSDs.
Fixes: https://tracker.ceph.com/issues/45434
Signed-off-by: Xiubo Li <xiubli@redhat.com>
It may take a moment for a ganesha to (re)configure itself with a new
export. If a mount fails, retry a couple times.
Signed-off-by: Sage Weil <sage@newdream.net>
This is a new pool that we can migrate all past NFS configuration to,
simplifying the migration process (and also allowing us to pick a
.-prefixed name).
Signed-off-by: Sage Weil <sage@newdream.net>
File system will need to be recreated when monitor databases are lost
and rebuilt. Some applications (e.g., CSI) expect that the recovered
file system have the same ID as before. Allow creating a file system
with a specific ID to help in such scenarios. This can now be done by
the `fs new` command using the argument 'fscid' and 'force' flag.
Newer file systems will no longer have increasing IDs as a corollary.
Fixes: https://tracker.ceph.com/issues/51340
Signed-off-by: Ramana Raja <rraja@redhat.com>
* refs/pull/42081/head:
qa: use kclient xattr to lookup client id
qa: refactor reading debug file code
qa: get mount id before failing fs
Reviewed-by: Xiubo Li <xiubli@redhat.com>
These tests want to immediately use the mount anyway. But the main
problem is, without waiting for the mount to complete, the command:
chmod 1777 /path/to/mount
is not run so the mount cannot be written to by normal users without
sudo.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
"run_shell" adds 'sudo' which runs afoul of new security protections on
Ubuntu 20.04.
Fixes: https://tracker.ceph.com/issues/51417
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/41574/head:
qa/tasks/vstart_runner: add LocalCluster.run
qa/tasks/cephfs/test_nfs: fiddle with sudo
mgr/nfs/export: some cleanup, minor refactoring
mgr/nfs/cluster: remove unused @cluster_setter
nfs/mgr: fix help message case
doc/cephfs/fs-nfs-export: add note about export update behavior
mgr/nfs: move user create/delete into helper
mgr/nfs: refactor _delete_user helper
mgr/nfs: refactor create_export_from_dict() helper
mgr/nfs: keep 'nfs export get' around for backward-compat
mgr/nfs: rename method
qa/tasks/cephfs/test_nfs: test new export via apply
doc/cephfs/fs-nfs-export: be consistent with cluster_id and _ vs -
mgr/nfs: addr -> client_addr for 'nfs export create ...'
mgr/nfs: fix tests
mgr/nfs: 'nfs export get' -> 'nfs export info'
mgr/nfs: binding -> pseudo_path
mgr/nfs: more revisions based on review
mgr/nfs: adjust NFSExceptoin errno arg
doc/cephfs: update 'nfs export {get,apply}' docs
mgr/nfs: merge FSExport back into ExportMgr
doc/radosgw/nfs: document mgr/nfs way to add/remove rgw exports
mgr/nfs: merge 'nfs export {update,import}' -> 'nfs export apply'
mgr/nfs: test export creation and list
mgr/nfs: test export_update (+ fixes)
mgr/nfs: test Export.validate(); several fixes
mgr/nfs: test that export <-> block+dict conversions go both ways
mgr/nfs: clean up test a bit
mgr/nfs/export: fix export validation
mgr/nfs/export: fix tests
mgr/nfs: handle option addr/client block in create_export()
mgr/nfs: allow multiple addrs for new exports
mgr/nfs: fix/finish rgw export
mgr/nfs/module: clusterid -> cluster_id
mgr/nfs/export: fix export_update_1 to type check
mgr/nfs/cluster: fix type error
mgr/nfs/export: wrap long lines
mgr/nfs: ExportMgr._delete_export only works for cephfs for now
mgr/nfs: Remove pool_ns from NFSCluster
mgr/nfs: Remove ExportMgr.rados_namespace
mgr/nfs: flake8
mgr/nfs: Add type checking
mgr/nfs: Add __eq__ method to Export
mgr/nfs: Add some compatibility to mgr/dashboard
mgr/nfs: Fix whitespace handling
mgr/nfs: Copy unit tests from mgr/dashboard
mgr/nfs: partially implement rgw export support
mgr/nfs: abstract FSAL; add RGWFSAL
mgr/nfs: refactor to merge 'update' and 'import' code
mgr/nfs: add 'nfs export import' command
mgr/nfs: refactor 'nfs export update' and export validation
mgr/nfs: fix _fetch_export to distinguish between clusters
mgr/nfs: move export ganesha conf translation into caller
mgr/nfs: name nfs cephfs client key 'nfs.{cluster_id}.{export_id}'
mgr/nfs: add --addr to 'nfs export create'
mgr/nfs: add --squash to 'nfs export create'
mgr/nfs/export_utils: include false but non-None items in config
vstart.sh: enable nfs module
mgr/cephadm: nfs: drop attr_expiration_time from top-level config
mgr/cephadm: remove Dir_Chunk = 0
Reviewed-by: Michael Fritch <mfritch@suse.com>
- Do user create or delete via a helper
- Defer until after we have validated the Export (on create or update)
- Support updates to user_id, which is needed to keep the naming consistent
and to also support changing the bucket, since the user_id is derived
from that.
Signed-off-by: Sage Weil <sage@newdream.net>
* refs/pull/41908/head:
qa: always format the pgid in hex
Reviewed-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/41385/head:
mon/FSCommands: add command to rename a file system
qa/cephfs: split test_admin.TestAdminCommands
mds: remove 'fs_name' from MDSRank
Reviewed-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/41860/head:
qa: log messages when falling back to force/lazy umount
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
* refs/pull/40997/head:
test: add test to verify adding an active peer back to source
pybind/mirroring: disallow adding a active peer back to source
pybind/cephfs: interface to fetch file system id
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/36823/head:
qa : add a test for the cmd, dump cache
mds : add timeout to the command, dump cache, to prevent it from running too long and affecting the service
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
The fs_name of the relevant MDSMap is set to the new name. Also,
the application tags of the data pools and the meta data pool of
the file system is set to the new name.
Fixes: https://tracker.ceph.com/issues/47276
Signed-off-by: Ramana Raja <rraja@redhat.com>
Splitting TestAdminCommands makes it easier to exercise a particular
"ceph fs" subcommand. Also, rename class TestSubCmdFsAuthorize to
TestFsAuthorize.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
If the pg number is larger than 9, this won't match the array index,
which was in dec just before this.
Fixes: https://tracker.ceph.com/issues/50808
Signed-off-by: Xiubo Li <xiubli@redhat.com>
We suspect that we may be hitting races against lazy umounts that are
causing problems when enumerating debugfs files. Currently though we
don't log anything special when falling back to a lazy unmount. Log
a message when killing processes on a mount and when force+lazy
unmounting.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
This was using an obscure syntax that worked at one time and wasn't
documented (AFAIK).
Fixes: https://tracker.ceph.com/issues/51182
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The documentation specifies this in [1] and yet we were using (I
believe) an older syntax:
ceph tell mds.foo:0 scrub start / recursive force
instead of
ceph tell mds.foo:0 scrub start / recursive,force
Oddly the former works at least as recently as in [2]:
2021-06-03T07:11:42.071 DEBUG:teuthology.orchestra.run.smithi025:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub start / recursive force
...
2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout:{
2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout: "return_code": 0,
2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout: "scrub_tag": "cf7a74b2-3eb2-4657-9274-ea504b1ebf8f",
2021-06-03T07:11:42.269 INFO:teuthology.orchestra.run.smithi025.stdout: "mode": "asynchronous"
2021-06-03T07:11:42.269 INFO:teuthology.orchestra.run.smithi025.stdout:}
[1] https://docs.ceph.com/en/latest/cephfs/scrub/
[2] /ceph/teuthology-archive/pdonnell-2021-06-03_03:40:33-fs-wip-pdonnell-testing-20210603.020013-distro-basic-smithi/6148097/teuthology.log
Fixes: https://tracker.ceph.com/issues/51146
See-also: https://tracker.ceph.com/issues/51145
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/41509/head:
common/cmdparse: fix CephBool validation for tell commands
mgr/nfs: fix 'nfs export create' argument order
common/cmdparse: emit proper json
mon/MonCommands: add -- seperator to example
qa/tasks/cephfs/test_nfs: fix export create test
mgr: make mgr commands compat with pre-quincy mon
doc/_ext/ceph_commands: handle non-positional args in docs
mgr: fix reweight-by-utilization cephbool flag
mon/MonCommands: convert some CephChoices to CephBool
mgr/k8sevents: fix help strings
pybind/mgr/mgr_module: fix help desc formatting
mgr/orchestrator: clean up 'orch {daemon add,apply} rgw' args
mgr/orchestrator: add end_positional to a few methods
mgr/orchestrator: reformat a few methods
pybind/ceph_argparse: stop parsing when we run out of positional args
pybind/ceph_argparse: remove dead code
pybind/mgr/mgr_module: infer non-positional args
pybind/mgr/mgr_module: add separator for non-positional args
command/cmdparse: use -- to separate positional from non-positional args
pybind/ceph_argparse: adjust help text for non-positional args
pybind/ceph_argparse: track a 'positional' property on cli args
Reviewed-by: Kefu Chai <kchai@redhat.com>
We have to reap connections promptly for this test to work.
This test was broken indirectly by d51d80b323,
which moved the counter decrement to reap time instead of mark_down/stop
time.
The reaping is asynchronous, so allow for a delay in the count change.
Fixes: https://tracker.ceph.com/issues/50622
Signed-off-by: Sage Weil <sage@newdream.net>
Added the config 'delay_snapshot_clone' to insert delay at the beginning
of the clone to avoid races in tests. The default value is set to 0.
Fixes: https://tracker.ceph.com/issues/48231
Signed-off-by: Kotresh HR <khiremat@redhat.com>
* refs/pull/41084/head:
test: test to verify dir path removal when no mirror daemons are running
pybind/mirroring: advance state machine from stalled state
pybind/mirroring: start from correct state during policy init
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>