* refs/pull/42719/head:
mgr/volumes: Fix permission during subvol creation with mode
Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/42584/head:
doc: fix `daemon status` interface (exclude file system name)
test: adjust mirroring tests for `daemon status` change
mgr/mirroring: `daemon status` command does not require file system name
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Add a 'kmount_count' counter in ctx to make sure the dynamic debug
log won't be disabled until the last kernel mounter is unmounted.
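A minimal sketch of the counter pattern, with placeholder helpers standing
in for whatever actually toggles the kernel dynamic debug log:

    def enable_dynamic_debug():
        pass  # placeholder for the real toggle

    def disable_dynamic_debug():
        pass  # placeholder for the real toggle

    def kernel_mount(ctx):
        if getattr(ctx, 'kmount_count', 0) == 0:
            enable_dynamic_debug()      # first kernel mount: turn logging on
        ctx.kmount_count = getattr(ctx, 'kmount_count', 0) + 1

    def kernel_umount(ctx):
        ctx.kmount_count -= 1
        if ctx.kmount_count == 0:
            disable_dynamic_debug()     # last kernel mount gone: turn it off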
Fixes: https://tracker.ceph.com/issues/48736
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Currently, to recover a file system after recovering the monitor store, you
need to stop all the MDSs; create an FSMap with defaults using the `fs new`
command; execute the `fs reset` command to get the file system's rank 0 into
the existing-but-failed state; and then restart the MDSs.
Add a 'recover' flag to the `fs new` command that sets the file system's
rank 0 to the existing-but-failed state and sets the file system's
'joinable' setting to False. Using the `fs new` command with the 'recover'
flag removes the need to stop all the MDSs and execute the `fs reset`
command when recovering the file system after recovering the monitor store.
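Illustrative only (the '--recover' spelling and the follow-up 'joinable'
step are assumptions based on the description above, driven here via
subprocess rather than the real qa helpers):

    import subprocess

    def recover_fs(name, metadata_pool, data_pool):
        # Recreate the FSMap entry with rank 0 existing-but-failed and
        # 'joinable' set to False; no need to stop MDSs or run 'fs reset'.
        subprocess.run(['ceph', 'fs', 'new', name, metadata_pool, data_pool,
                        '--recover'], check=True)
        # Once the metadata has been recovered, let MDSs take rank 0 again.
        subprocess.run(['ceph', 'fs', 'set', name, 'joinable', 'true'],
                       check=True)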
Fixes: https://tracker.ceph.com/issues/51716
Signed-off-by: Ramana Raja <rraja@redhat.com>
mon/MonCap: Update osd profile to allow cmd to set iops capacity on mon db
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Subvolume creation with a specific mode led to the parent directories
('/volumes/_no_group') being created with the same mode if they did not
already exist. This is now fixed.
Similarly, subvolumegroup creation with a specific mode led to the parent
directory ('/volumes') being created with the same mode if it did not
already exist. This is now fixed.
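A minimal sketch of the corrected behaviour (helper and constant names are
illustrative): only the leaf directory gets the caller-supplied mode, the
parent directories keep the default.

    import os

    DEFAULT_MODE = 0o755

    def create_subvolume(root, group, name, mode):
        # Parent directories ('/volumes', '/volumes/<group>') use the default
        # mode regardless of what the caller asked for.
        group_path = os.path.join(root, 'volumes', group)
        os.makedirs(group_path, mode=DEFAULT_MODE, exist_ok=True)
        # The requested mode applies to the subvolume directory only.
        subvol_path = os.path.join(group_path, name)
        os.mkdir(subvol_path, mode=mode)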
Fixes: https://tracker.ceph.com/issues/51870
Signed-off-by: Kotresh HR <khiremat@redhat.com>
This commit adds OSD creation to the Rook QA tasks. The Rook task will
explicitly wait for the mgr to start and the CLI to work (instead of
implicitly doing so while waiting for 'ceph osd dump' to work).
Then it will do `ceph orch apply osd --all-available-devices` to create
OSDs on the rest of the PVs.
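A rough sketch of that ordering using plain subprocess (the real Rook task
uses teuthology helpers; the function names here are assumptions):

    import subprocess, time

    def wait_for_ceph_cli(timeout=300):
        # Explicitly wait for the mgr/CLI instead of relying on 'ceph osd dump'.
        end = time.time() + timeout
        while time.time() < end:
            if subprocess.run(['ceph', 'status']).returncode == 0:
                return
            time.sleep(5)
        raise RuntimeError('ceph CLI did not become available')

    def create_osds():
        wait_for_ceph_cli()
        # Let the orchestrator consume the remaining PVs as OSDs.
        subprocess.run(['ceph', 'orch', 'apply', 'osd', '--all-available-devices'],
                       check=True)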
Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
Assign the default caps for osds to be the same as what the AuthMonitor
sets for a new osd. See AuthMonitor::validate_osd_new() which sets the
following caps for a new osd:
mon='allow profile osd'
mgr='allow profile osd'
osd='allow *'
When an actual real-world cluster is deployed, the above caps are applied.
Unless the user modifies the defaults, a cluster will operate with the
above caps. Therefore, it makes sense to use the defaults when testing
Ceph so that any issues due to the default settings may be caught and
fixed.
Accordingly, the caps for the 'osd' type are reset to the defaults in
generate_caps(). The caps for 'mgr' already reflect the system defaults.
The caps for the 'mds' type are not changed in this commit and will be
investigated and changed if necessary later.
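For reference, a sketch of the reset defaults (the cap strings come from
AuthMonitor::validate_osd_new() as quoted above; the function shape is an
illustration, not the actual qa code):

    DEFAULT_OSD_CAPS = {
        'mon': 'allow profile osd',
        'mgr': 'allow profile osd',
        'osd': 'allow *',
    }

    def generate_caps(entity_type):
        # Hedged sketch: the 'osd' type now returns the system defaults;
        # other entity types are out of scope here.
        if entity_type == 'osd':
            return DEFAULT_OSD_CAPS
        return {}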
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
* refs/pull/38481/head:
qa/vstart_runner: inherit methods instead of duplicating them
qa/ceph_manager: make it possible to reuse few methods
qa/vstart_runner: don't use "shell=False" in run_ceph_w()
qa/ceph_manager: minor refactor
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/42371/head:
mgr/volumes: Fix a race during clone cancel
mgr/volumes: Fail subvolume removal if it's in progress
Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
* refs/pull/42691/head:
mgr/nfs: add --port to 'nfs cluster create' and port to 'nfs cluster info'
qa/suites/orch/cephadm/smoke-roleless: test taking ganeshas offline
qa/tasks/vip: exec with bash -ex
qa/suites/orch/cephadm: separate test_nfs from test_orch_cli
Reviewed-by: Varsha Rao <varao@redhat.com>
- Rename the dashboard command to better reflect its behavior.
- Rename the '_radosgw_admin' method to 'send_rgwadmin_command' for consistency
with 'send_mon_command' and move it to mgr_module.py.
- Cleanup: remove unneeded rgw settings.
- Better error handling and test coverage.
Fixes: https://tracker.ceph.com/issues/44605
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
Downloading 461087a514/cryptography-3.4.7.tar.gz (546kB)
Complete output from command python setup.py egg_info:
=============================DEBUG ASSISTANCE==========================
If you are seeing an error here please try the following to
successfully install cryptography:
Upgrade to the latest pip and try again. This will fix errors for most
users. See: https://pip.pypa.io/en/stable/installing/#upgrading-pip
=============================DEBUG ASSISTANCE==========================
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-7fhnk5us/cryptography/setup.py", line 14, in <module>
from setuptools_rust import RustExtension
ModuleNotFoundError: No module named 'setuptools_rust'
Fixes: https://tracker.ceph.com/issues/52070
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Inherit methods run_ceph_w(), run_cluster_cmd(), raw_cluster_cmd() and
raw_cluster_cmd_result() from ceph_manager.CephManager in
vstart_runner.LocalCephManager instead of duplicating them.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Make minor adjustments to ceph_manager.CephManager so that the methods
run_ceph_w(), run_cluster_cmd(), raw_cluster_cmd() and
raw_cluster_cmd_result() can be reused in subclasses instead of being
duplicated. The adjustments are (see the sketch below) -
* Add variables that hold the arguments to be prepended to every
command received by the methods above.
* Group the variables that need to be overridden together so that they
are easy for users to spot and override.
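A rough sketch of that pattern; the class layout and the variable name
'run_cluster_cmd_prefix' are placeholders, not the exact upstream code:

    import subprocess

    class CephManager:
        # Overridable knobs grouped together so subclasses only touch these.
        run_cluster_cmd_prefix = ['sudo', 'ceph']

        def run_cluster_cmd(self, *args):
            # Prepend the (overridable) prefix to every cluster command.
            return subprocess.run(list(self.run_cluster_cmd_prefix) + list(args),
                                  check=True)

    class LocalCephManager(CephManager):
        # vstart clusters inherit the method above and only swap the prefix.
        run_cluster_cmd_prefix = ['./bin/ceph']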
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Instead, prepend "exec sudo" to the command arguments of
LocalCephManager.run_ceph_w(). This makes the default parameter
"shell=False" redundant in the case of
ceph_manager.CephManager.run_ceph_w(), so get rid of it too and update
calls to run_ceph_w() accordingly.
The reason for using any of these workarounds is that running "ceph
-w" with "shell" set to True leads to a crash in the Ceph API CI job. See
this ticket for more details: https://tracker.ceph.com/issues/49644.
The reason for switching workarounds is that, in the following
commits that reduce duplication, LocalCephManager.run_ceph_w() will be
deleted and CephManager.run_ceph_w() will be used by LocalCephManager
via inheritance. However, due to the issue described above, the Ceph API
test would fail since "shell" is set to "True" for the command issued by
CephManager.run_ceph_w(). Prepending "exec sudo" to the command when it
is used in LocalCephManager makes this duplication unnecessary and also
prevents the Ceph API test from failing.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Save the return value of method "teuthology.get_testdir()" instead of
calling it repeatedly in the same class.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
so we don't need to use the virtualenv Python package for creating a
virtualenv; the "venv" module in Python 3 suffices.
see also https://docs.python.org/3/library/venv.html
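For example, the standard library can create the environment directly (a
generic illustration, not the exact call site in this change):

    # Create a virtualenv with the Python 3 stdlib instead of the
    # third-party 'virtualenv' package.
    import venv

    venv.create('/tmp/my-venv', with_pip=True)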
Signed-off-by: Kefu Chai <kchai@redhat.com>
Removing an in-progress subvolume clone with force doesn't
remove the clone index (tracker). This results in the cloner
thread getting stuck in a loop trying to clone the deleted one.
This patch addresses the issue by not allowing the subvolume clone
to be removed if it's not complete/cancelled/failed, even with the force
option. It returns the error EAGAIN, asking the user to cancel the pending
clone and retry.
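A minimal sketch of the guard (state names, exception type and message are
assumptions for illustration; the real check lives in mgr/volumes):

    import errno

    TERMINAL_CLONE_STATES = {'complete', 'cancelled', 'failed'}

    def ensure_clone_removable(clone_state):
        # Even with --force, refuse to remove an in-progress clone so the
        # cloner thread never loops over a deleted clone index.
        if clone_state not in TERMINAL_CLONE_STATES:
            raise OSError(errno.EAGAIN,
                          "clone in progress -- cancel it and retry removal")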
Fixes: https://tracker.ceph.com/issues/51707
Signed-off-by: Kotresh HR <khiremat@redhat.com>
* refs/pull/42349/head:
mon/MDSMonitor: propose if FSMap struct_v is too old
mon/MDSMonitor: give a proper error message if FSMap struct_v is too old
mds/FSMap: use DECODE_OLDEST to gate FSMap version
qa: add tests for fs dump of epoch and trimming
qa: add file system support for dumping epoch
mon/MDSMonitor: return mon_mds_force_trim_to even if equal to current epoch
mon: add debugging for trimming methods
mon: fix debug spacing
qa: add nofs upgrade suite
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
* refs/pull/41025/head:
qa: wait pgs to be clean before using the pools
qa: ignore PG_RECOVERY_FULL and PG_DEGRADED for mds-full
qa: wait more time since there have many more pgs than before
qa: do not multiple the full ratio twice
qa: do not raise for kclient for _fsync test
qa: use the pg autoscale mode to calcuate the pg_num
qa: set the object_size to 1M
qa: move the is_full() to parent class
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
The cluster has already applied the full ratio before returning
the "max_avail".
Fixes: https://tracker.ceph.com/issues/50984
Signed-off-by: Xiubo Li <xiubli@redhat.com>
For kclient, it is write() that will return -ENOSPC instead of fsync().
Fixes: https://tracker.ceph.com/issues/45434
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Setting the pg_num to 8 is too small: some OSDs may not be covered by the
pools while others may be overloaded. Remove the hardcoded pg_num here and let
the pg autoscale mode calculate it as needed, and at the same time set
pg_num_min to 64 to keep the pg_num from being too small.
If an EC pool is used, most data in the test cases will go to the EC pool and
the primary replicated pool will only store a small amount of metadata for all
the files, so setting the target size ratio to 0.05 should be enough.
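Hedged illustration of the pool tuning described above via the CLI (the
pool name and call site are examples only):

    import subprocess

    def tune_pool(pool, pg_num_min=64, target_size_ratio=None):
        def pool_set(key, value):
            subprocess.run(['ceph', 'osd', 'pool', 'set', pool, key, str(value)],
                           check=True)
        # Let the autoscaler pick pg_num, but keep a floor of 64.
        pool_set('pg_autoscale_mode', 'on')
        pool_set('pg_num_min', pg_num_min)
        # Small ratio for the primary replicated pool when an EC pool holds the data.
        if target_size_ratio is not None:
            pool_set('target_size_ratio', target_size_ratio)

    tune_pool('primary-data-pool', target_size_ratio=0.05)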
Fixes: https://tracker.ceph.com/issues/45434
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Set the object_size to 1MB to make the objects distributed more evenly
among the OSDs.
Fixes: https://tracker.ceph.com/issues/45434
Signed-off-by: Xiubo Li <xiubli@redhat.com>
* refs/pull/42431/head:
cmake: add "mypy" back to tox envlist of "qa""
qa/tasks/vstart_runner: add optional "sudo" param to _run_python()
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
to silence mypy warnings like:
tasks/vstart_runner.py:691: error: Definition of "_run_python" in base class "LocalCephFSMount" is incompatible with definition in base class "CephFSMount"
tasks/vstart_runner.py:705: error: Definition of "_run_python" in base class "LocalCephFSMount" is incompatible with definition in base class "CephFSMount"
Signed-off-by: Kefu Chai <kchai@redhat.com>
otherwise we have the following warning in the health report:
{"status":"HEALTH_WARN","checks":{"RECENT_MGR_MODULE_CRASH":{"severity":"HEALTH_WARN","summary":{"message":"1 mgr modules have recently crashed","count":1},"muted":false}},"mutes":[]}
and it does not disappear after the test waits for 30 seconds,
so the tasks.mgr.test_module_selftest.TestModuleSelftest test
fails like:
2021-07-21T09:59:52.560 INFO:tasks.cephfs_test_runner:======================================================================
2021-07-21T09:59:52.561 INFO:tasks.cephfs_test_runner:ERROR: test_module_commands (tasks.mgr.test_module_selftest.TestModuleSelftest)
2021-07-21T09:59:52.561 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2021-07-21T09:59:52.561 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2021-07-21T09:59:52.562 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_6a5d5abc027f706687dec92f92ff6fc6f074d2ae/qa/tasks/mgr/test_module_selftest.py", line 201, in test_module_commands
2021-07-21T09:59:52.562 INFO:tasks.cephfs_test_runner: self.wait_for_health_clear(timeout=30)
2021-07-21T09:59:52.562 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_6a5d5abc027f706687dec92f92ff6fc6f074d2ae/qa/tasks/ceph_test_case.py", line 172, in wait_for_health_clear
2021-07-21T09:59:52.563 INFO:tasks.cephfs_test_runner: self.wait_until_true(is_clear, timeout)
2021-07-21T09:59:52.563 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_6a5d5abc027f706687dec92f92ff6fc6f074d2ae/qa/tasks/ceph_test_case.py", line 209, in wait_until_true
2021-07-21T09:59:52.563 INFO:tasks.cephfs_test_runner: raise TestTimeoutError("Timed out after {0}s and {1} retries".format(elapsed, retry_count))
2021-07-21T09:59:52.564 INFO:tasks.cephfs_test_runner:tasks.ceph_test_case.TestTimeoutError: Timed out after 30s and 0 retries
In this change, the crash reports are nuked right after
we see the warning, so that we can have a clean health
report.
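One way to nuke them, shown with the standard 'ceph crash' commands
(whether the change uses exactly these calls is an assumption):

    import json, subprocess

    def nuke_crash_reports():
        # List the new crash reports and remove them so the
        # RECENT_MGR_MODULE_CRASH warning clears before wait_for_health_clear().
        out = subprocess.run(['ceph', 'crash', 'ls-new', '-f', 'json'],
                             check=True, capture_output=True, text=True).stdout
        for crash in json.loads(out or '[]'):
            subprocess.run(['ceph', 'crash', 'rm', crash['crash_id']], check=True)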
Fixes: https://tracker.ceph.com/issues/51743
Signed-off-by: Kefu Chai <kchai@redhat.com>
It may take a moment for a ganesha daemon to (re)configure itself with a new
export. If a mount fails, retry a couple of times.
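A sketch of the retry, assuming a generic mount callable; the number of
attempts and the delay are illustrative:

    import time

    def mount_with_retries(do_mount, retries=3, delay=5):
        for attempt in range(retries):
            try:
                return do_mount()
            except Exception:
                if attempt == retries - 1:
                    raise
                time.sleep(delay)   # give ganesha a moment, then try again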
Signed-off-by: Sage Weil <sage@newdream.net>
This is a new pool that we can migrate all past NFS configuration to,
simplifying the migration process (and also allowing us to pick a
.-prefixed name).
Signed-off-by: Sage Weil <sage@newdream.net>
* refs/pull/42041/head:
mgr/restful: ignore min/max_size
test/crush: drop min/max_size refs
qa/workunits/mon/pool_ops: remove test for min/max_size check
qa: scrub a few remaining mentions of ruleset
qa/standalone/mon/osd-*: fix tests
PendingReleaseNotes: note min/max_size removal
mgr/dashboard: remove max/min_size and ruleset
mon/OSDMonitor: fix calls to CrushTester
crush: eliminate min_size and max_size
test/cli/crushtool: reunumber rulesets in test maps
crushtool: require min/max or num-rep for --test
crush: remove last traces of 'ruleset'
test/cli/crushtool: use 'id' instead of 'ruleset' in crush inputs
crushtool: take --min-rep and --max-rep explicitly
crush/CrushTester: drop --ruleset
doc: scrub 'ruleset' from docs
src/erasure-code: rule, not ruleset
mon/OSDMonitor: remove check_crush_rule() callers
mon/OSDMonitor: rule, not ruleset
crushtool: remove check for overlapped ruels
crush/CrushWrapper: get_osd_pool_default_crush_replicated_ruleset -> rule
crush: remove find_rule()
mon/OSDMonitor: use pool's crush rule directly
osd/OSDMap: drop checks for ruleset == ruleid
osd/OSDMap: use pool's crush rule_id directly
mon/PGMap: use pool's crush_rule directly
mon/OSDMonitor: remove crush ruleset->rule rewrite
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Change some of the tests in teuthology to make
them more deterministic.
Use `ceph osd set norecover` and
`ceph osd set nobackfill` when marking OSDs in
or out, as this will delay the recovery and make
sure the test cases get the chance to check
that events actually pop up in
the progress module.
Take out test_osd_cannot_recover from
tasks/mgr/test_progress.py since it is no longer
a relevant test case; recovery will get
triggered regardless of whether the PG is unmoved.
Ignore `OSDMAP_FLAGS` in teuthology
because we are using norecover and nobackfill
to delay the recovery process, which
creates a health warning and fails the
teuthology test.
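Hedged outline of the flag handling around marking an OSD in/out (the
'ceph osd set/unset' spelling comes from the text above; the helper is
illustrative):

    import subprocess

    def mark_osd_out_paused(osd_id):
        # Pause recovery/backfill so the progress-module event stays observable.
        for flag in ('norecover', 'nobackfill'):
            subprocess.run(['ceph', 'osd', 'set', flag], check=True)
        try:
            subprocess.run(['ceph', 'osd', 'out', str(osd_id)], check=True)
            # ... assert that a progress event shows up here ...
        finally:
            # Re-enable recovery so the cluster can settle afterwards.
            for flag in ('norecover', 'nobackfill'):
                subprocess.run(['ceph', 'osd', 'unset', flag], check=True)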
Signed-off-by: Kamoltat <ksirivad@redhat.com>
* refs/pull/42029/head:
vstart_runner: use FileNotFoundError when os.stat() fails
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
* refs/pull/42030/head:
vstart_runner: maintain log level when --debug is passed
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
otherwise the following error is expected in some cases:
INFO:__main__:Traceback (most recent call last):
INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/test_dashboard.py", line 18, in setUp
INFO:__main__: self._assign_ports("dashboard", "ssl_server_port")
INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/mgr_test_case.py", line 197, in _assign_ports
INFO:__main__: cls.mgr_cluster.mgr_stop(mgr_id)
INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/mgr_test_case.py", line 30, in mgr_stop
INFO:__main__: self.mgr_daemons[mgr_id].stop()
INFO:__main__: File "../qa/tasks/vstart_runner.py", line 558, in stop
INFO:__main__: os.kill(pid, signal.SIGTERM)
INFO:__main__:TypeError: an integer is required (got type NoneType)
Signed-off-by: Kefu Chai <kchai@redhat.com>
A file system will need to be recreated when monitor databases are lost
and rebuilt. Some applications (e.g., CSI) expect that the recovered
file system has the same ID as before. Allow creating a file system
with a specific ID to help in such scenarios. This can now be done with
the `fs new` command using the 'fscid' argument and the 'force' flag.
As a corollary, newly created file systems will no longer necessarily have
increasing IDs.
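Illustrative command shape only (the exact flag spellings are assumed from
the description above):

    import subprocess

    def recreate_fs_with_id(name, metadata_pool, data_pool, fscid):
        # Recreate the file system with the ID recorded before the monitor
        # stores were lost, so CSI-style consumers see the same fscid.
        subprocess.run(['ceph', 'fs', 'new', name, metadata_pool, data_pool,
                        '--fscid', str(fscid), '--force'], check=True)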
Fixes: https://tracker.ceph.com/issues/51340
Signed-off-by: Ramana Raja <rraja@redhat.com>