This mirrors teuthology and makes it possible to check the exit status of a
daemon.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit e2e2144a56)
Commands issued by negtest_ceph_cmd() aren't printed because the log
level (due to code for teuthology) changes from DEBUG to INFO for some
files.
This patch ensures that users can see commands being executed regardless
of whether log level is changed or not.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 03df86b7c5)
A journal flush sometimes takes more than 120 seconds, so the 'scrub
status' command, after blocking for more than 120 seconds, is declared
failed, which causes the job to be declared failed.
Bumping up the timeout gives the 'scrub status' command more time to
wait and lets the journal flush eventually complete.
Fixes: https://tracker.ceph.com/issues/63411
Signed-off-by: Milind Changire <mchangir@redhat.com>
(cherry picked from commit 33899fdaac)
Conflicts:
qa/tasks/ceph_manager.py
- fixed diff between main and reef
qa/tasks/cephfs/filesystem.py
- fixed diff between main and reef
qa/tasks/vstart_runner.py
- fixed diff between main and reef
With mClock scheduler enabled, a small subset of config options related
to recovery limits are not allowed to be modified unless
osd_mclock_override_recovery_settings option is enabled. This override
option is disabled by default. The following options cannot be modified
without enabling the override option:
- osd_max_backfills
- osd_recovery_max_active[_(hdd|ssd)]
The above options are removed from the mon kv store which effectively
restores them to the default values.
This caused tests such as
test_cluster_configuration.ClusterConfigurationTest to fail, since it
modifies the recovery options and expects to verify the modified value.
Therefore, for tests, osd_mclock_override_recovery_settings option is
enabled in vstart_runner.py so that current and future tests
are not affected.
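A minimal sketch of the ordering this implies, assuming the options are set
through "ceph config set" (how vstart_runner.py actually enables the override
is not shown in this message, and the helper name below is hypothetical):

```python
def recovery_override_cmds(max_backfills):
    """Return the commands in the order they must run.

    Hypothetical helper: the override option has to be enabled first,
    otherwise the change to osd_max_backfills is not allowed under mClock.
    """
    return [
        ["ceph", "config", "set", "osd",
         "osd_mclock_override_recovery_settings", "true"],
        ["ceph", "config", "set", "osd",
         "osd_max_backfills", str(max_backfills)],
    ]


cmds = recovery_override_cmds(3)
```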
Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit aed71b56be)
Before unmounting, check if the client has been evicted and, if so, run
"umount -f -l" for the mount point of the client and clean up the mount
right after it.
Attempting to unmount, clean up or operate in any way on the mount point
of an evicted client will hang the operation (and thereby our Python
code too). Lazy-force unmount prevents such hangs for our Python code
and also frees the mount point.
This commit also adds code to gather session info for kernel mounts
after mounting is successful. This is necessary since the network
address of the session is needed to check if it is blocked by the Ceph
cluster.
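The decision described above can be sketched roughly as follows. The function
names and the representation of the blocklist as a plain collection of address
strings are assumptions made for illustration, not the code in this commit:

```python
def is_evicted(client_addr, blocked_addrs):
    """Return True if the session's address is among the blocked addresses.

    Both parameters are illustrative: client_addr stands for the address
    gathered from the session info after mounting, blocked_addrs for the
    addresses the cluster currently blocks.
    """
    return client_addr in set(blocked_addrs)


def umount_args(mountpoint, evicted):
    """Build the umount command; evicted clients get a lazy, forced unmount."""
    args = ["umount"]
    if evicted:
        # A plain umount would hang on an evicted client's mount point.
        args += ["-f", "-l"]
    return args + [mountpoint]
```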
Fixes: https://tracker.ceph.com/issues/56476
Signed-off-by: Rishabh Dave <ridave@redhat.com>
do_rados() prefixes extra arguments to every command because they are
helpful during execution of tests with teuthology. This patch
eliminates these extra arguments entirely (through overriding) for test
executions with vstart_runner.py.
Note: "timeout 120" is now prefixed to rados commands too. AFAIS, it
shouldn't have any side-effects on anything.
This commit is similar to commit 93677576c1.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
qa/cephfs: set omit_sudo False when sudo is set to True
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Nikhilkumar Shelke <nshelke@redhat.com>
1. Method cluster() in ceph.py creates a dictionary "ctx.ceph", attaches
a namespace to ctx.ceph[cluster_name], creates an attribute "fsid" and
stores the Ceph cluster's FSID in it.
2. The method kernel_mount.KernelMount._get_debug_dir() uses that "fsid"
attribute to get the Ceph cluster's FSID. (The exact line that does that
is "fsid = self.ctx.ceph[cluster_name].fsid").
3. Test test_readahead.TestReadahead.test_flush() crashes with
vstart_runner.py because that test eventually calls _get_debug_dir()
and "ctx" in case of vstart_runner.py doesn't hold "ceph" dictionary
or anything similar.
Adding a dictionary, similar to the one added in ceph.py, to
vstart_runner.LocalContext's instances will fix this issue.
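A minimal sketch of the shape such a dictionary could take. The class below is
a reduced stand-in for vstart_runner.LocalContext, and the FSID value is a
made-up placeholder; how the real FSID is obtained is not covered here:

```python
from types import SimpleNamespace


class LocalContextSketch:
    """Reduced stand-in for LocalContext, showing only the "ceph" dict."""

    def __init__(self, cluster_name="ceph", fsid=None):
        # Mirror what cluster() in ceph.py builds: a dict keyed by cluster
        # name whose values expose an "fsid" attribute.
        self.ceph = {cluster_name: SimpleNamespace(fsid=fsid)}


ctx = LocalContextSketch(fsid="00000000-0000-0000-0000-000000000000")
# Same access pattern as the line quoted from _get_debug_dir().
fsid = ctx.ceph["ceph"].fsid
```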
Fixes: https://tracker.ceph.com/issues/55694
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Result of os.path.join() before: "./bin/./ceph-mds"; and after:
"./bin/ceph-mds".
Before -
2022-05-05 19:36:11,100.100 DEBUG:__main__:> ./bin/./ceph-mds -i a
After -
2022-05-05 19:38:48,179.179 DEBUG:__main__:> ./bin/ceph-mds -i a
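On POSIX systems, os.path.join() keeps a "./" that is already part of a later
component, which is how a path like "./bin/./ceph-mds" can arise. A small
illustration (which component carried the "./" in vstart_runner.py is an
assumption here):

```python
import os.path

bin_dir = "./bin"

# A leading "./" in the second component is preserved verbatim.
with_dot = os.path.join(bin_dir, "./ceph-mds")

# Without it, the joined path has no redundant segment.
without_dot = os.path.join(bin_dir, "ceph-mds")
```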
Signed-off-by: Rishabh Dave <ridave@redhat.com>
The message regarding deletion of helper tools is printed for every
command. This message should be printed only when applicable.
Besides -
* Move XXX comments to _do_run() since it increases visibility of
these messages.
* Move the argument-omission code to a new method to clear up the clutter.
* And remove shell as a parameter from _perform_checks_and_adjustments
since it's redundant.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
This method fails to collect the return value from
FuseMount._run_mount_cmd() and return it. This leads to a bug for tests
that expect the mount command to fail when executed with
vstart_runner.py.
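The bug pattern can be sketched as follows. The class, the method names other
than _run_mount_cmd(), and the value returned are hypothetical placeholders:

```python
class FuseMountSketch:
    """Illustrative stand-in; not the real FuseMount class."""

    def _run_mount_cmd(self):
        # Placeholder for the real mount command; returns an
        # illustrative non-zero status.
        return 1

    def mount_buggy(self):
        # The return value is dropped, so the caller always sees None.
        self._run_mount_cmd()

    def mount_fixed(self):
        # Collect the result and hand it back to the caller.
        retval = self._run_mount_cmd()
        return retval


mount = FuseMountSketch()
```

A test that expects the mount command to fail can only observe the failure
through the second variant.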
Fixes: https://tracker.ceph.com/issues/55553
Signed-off-by: Rishabh Dave <ridave@redhat.com>
In these methods, the parameter "sudo" indicates whether or not sudo is
set to True, but the same is not indicated to the methods underneath.
This value needs to be passed on for the parameter to fulfill its purpose.
Fixes: https://tracker.ceph.com/issues/55557
Signed-off-by: Rishabh Dave <ridave@redhat.com>
And therefore get rid of methods duplicated in LocalRemote and add a
call to empty constructor of RemoteShell in LocalRemote.__init__().
Signed-off-by: Rishabh Dave <ridave@redhat.com>
vstart_runner.py is written assuming that it can run commands with
superuser privileges whenever possible and vstart_runner.py is meant to
be executed without sudo.
So, it's better to kill a process using "sudo kill -9 <PID>" instead of
using os.kill(), because os.kill() can't kill a process launched with
superuser privileges.
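A minimal sketch of the approach, assuming the command is run through
subprocess (the helper names are hypothetical and the real code may go through
vstart_runner.py's own command-running machinery instead):

```python
import subprocess


def sudo_kill_args(pid):
    """Build the argv for "sudo kill -9 <PID>"."""
    return ["sudo", "kill", "-9", str(pid)]


def kill_process(pid):
    """Kill pid via sudo; unlike os.kill(), this also works for processes
    that were launched with superuser privileges."""
    subprocess.run(sudo_kill_args(pid), check=True)
```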
Signed-off-by: Rishabh Dave <ridave@redhat.com>
About the commit date: this commit got dropped from the patch series
during some PR branch update but is added back now.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Passing "exec sudo" to "ceph -w" caused the "Ceph API test" CI job to
fail. The error was not related to this tracker issue, but the code added
for it is now reverted in this commit. The tracker issue:
https://tracker.ceph.com/issues/49644.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
We convert all cmd args to str and pass bash functions along to override
certain arguments in those command arguments. Let's save cmd args
without those bash functions since they can be useful later (for
example, printing cmd args in logs, which is the case in this patch.)
Signed-off-by: Rishabh Dave <ridave@redhat.com>
The intention behind copying these note points is to document the
behaviour of vstart_runner.py inside vstart_runner.py as well, so that
developers don't miss them while working on it.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Overriding commands is much better than deleting these commands from the
command argument string using Python since, unlike deleting, overriding
doesn't require parsing. A note has been added for this to
vstart_runner.py's module docstring and to the Ceph Developer's Guide
document.
Since functions don't work with the sh shell, vstart_runner.py will use
the bash shell from here onwards to make overriding work.
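The idea can be sketched as below: instead of parsing the command string to
strip a helper tool, a bash function with the same name is defined in front of
it so that the helper becomes a pass-through. The name "adjust-ulimits" and the
function body are illustrative assumptions, not necessarily the overrides this
commit adds:

```python
# Hypothetical override: the function shadows the helper tool and simply
# runs the remaining arguments as the command.
OVERRIDES = (
    'adjust-ulimits() { "$@"; }',
)


def with_overrides(cmd):
    """Prefix cmd with bash function definitions; no parsing of cmd needed."""
    return "; ".join(OVERRIDES + (cmd,))


def bash_argv(cmd):
    """argv for running the overridden command with bash rather than sh."""
    return ["bash", "-c", with_overrides(cmd)]


script = with_overrides("adjust-ulimits echo hi")
```

The resulting argv could then be handed to something like subprocess.run().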
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Convert all command arguments to str from list, update checks and
adjustments performed on command arguments accordingly and update
documentation to include warnings about some critical parts of
vstart_runner.py and update tasks.cephfs.mount.MountCephFS.run_shell().
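A minimal sketch of such a list-to-str conversion. The helper name is
hypothetical, and quoting each argument with shlex.quote() is a choice made for
this illustration; the actual conversion in vstart_runner.py may differ:

```python
import shlex


def args_to_str(args):
    """Return the command as a single string, whether given as list or str."""
    if isinstance(args, str):
        return args
    # Quote each element so arguments containing spaces survive the shell.
    return " ".join(shlex.quote(str(arg)) for arg in args)
```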
Fixes: https://tracker.ceph.com/issues/47849
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Use LocalContext instance to set LocalCephManager.cluster.
Fixes: https://tracker.ceph.com/issues/53601
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Restore ability to run radosgw_admin.py unit standalone--improved
to use vstart_runner hooks.
Local rgwadmin(...) wrapper suggested as a cleanup in review by Casey.
Fixes: https://tracker.ceph.com/issues/52837
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Without this, plenty of tests become incompatible with vstart_runner.py.
Ideally, vstart_runner.py should've been updated in commit 7812cfb674.
Fixes: https://tracker.ceph.com/issues/53043
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Stop importing CommandFailedError from teuthology.orchestra.run; it is
actually defined in teuthology.exceptions.
Fixes: https://tracker.ceph.com/issues/51226
Signed-off-by: Rishabh Dave <ridave@redhat.com>
* refs/pull/38481/head:
qa/vstart_runner: inherit methods instead of duplicating them
qa/ceph_manager: make it possible to reuse few methods
qa/vstart_runner: don't use "shell=False" in run_ceph_w()
qa/ceph_manager: minor refactor
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Inherit methods run_ceph_w(), run_cluster_cmd(), raw_cluster_cmd() and
raw_cluster_cmd_result() from ceph_manager.CephManager in
vstart_runner.LocalCephManager instead of duplicating them.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Instead prepend "exec sudo" to the command arguments of
LocalCephManager.run_ceph_w(). This makes the default parameter
"shell=False" redundant in case of
ceph_manager.CephManager.run_ceph_w(), so get rid of it too and update
calls to run_ceph_w() accordingly.
The reason behind using any of these workarounds is that running "ceph
-w" with "shell" set to True leads to a crash in the Ceph API CI job. See
this ticket for more details: https://tracker.ceph.com/issues/49644.
The reason behind switching the workaround is that in the following
commits to reduce duplication LocalCephManager.run_ceph_w() will be
deleted and CephManager.run_ceph_w() will be used by LocalCephManager
via inheritance. However, due to the issue described above, Ceph API
test will fail since "shell" is set to "True" for the command issued by
CephManager.run_ceph_w(). Prepending "exec sudo" to the command when it
is used in LocalCephManager makes this duplication unnecessary and also
prevents Ceph API test from failing.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
to silence mypy warnings like:
tasks/vstart_runner.py:691: error: Definition of "_run_python" in base class "LocalCephFSMount" is incompatible with definition in base class "CephFSMount"
tasks/vstart_runner.py:705: error: Definition of "_run_python" in base class "LocalCephFSMount" is incompatible with definition in base class "CephFSMount"
Signed-off-by: Kefu Chai <kchai@redhat.com>
* refs/pull/42029/head:
vstart_runner: use FileNotFoundError when os.stat() fails
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
* refs/pull/42030/head:
vstart_runner: maintain log level when --debug is passed
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
otherwise the following error is expected in some cases:
INFO:__main__:Traceback (most recent call last):
INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/test_dashboard.py", line 18, in setUp
INFO:__main__: self._assign_ports("dashboard", "ssl_server_port")
INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/mgr_test_case.py", line 197, in _assign_ports
INFO:__main__: cls.mgr_cluster.mgr_stop(mgr_id)
INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/mgr_test_case.py", line 30, in mgr_stop
INFO:__main__: self.mgr_daemons[mgr_id].stop()
INFO:__main__: File "../qa/tasks/vstart_runner.py", line 558, in stop
INFO:__main__: os.kill(pid, signal.SIGTERM)
INFO:__main__:TypeError: an integer is required (got type NoneType)
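The traceback comes from passing None as the PID to os.kill(). One possible
guard is sketched below; the function name is hypothetical and this is not
necessarily the fix applied by this commit:

```python
import os
import signal


def stop_daemon(pid):
    """Send SIGTERM to pid; return False when there is no PID to signal."""
    if pid is None:
        # os.kill(None, ...) raises TypeError, as in the traceback above.
        return False
    os.kill(pid, signal.SIGTERM)
    return True
```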
Signed-off-by: Kefu Chai <kchai@redhat.com>
Add log messages to indicate start and end of execution of stop.sh and
vstart.sh. Running both of these scripts is a major step in execution of
vstart_runner.py and not passing --debug makes it impossible to know
whether these scripts have started or finished running.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
When the --debug and --clear-old-log options are passed to
vstart_runner.py, it ends up resetting the log level to the default
level (which is logging.INFO) despite --debug. Add "log_level" as a
parameter with a default value to init_log() so that the code for
clearing the old log can pass the current logging level to init_log()
and maintain the log level the user asked for.
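A minimal sketch of the pattern, assuming a simplified init_log() (the logger
name and the handler setup are illustrative and not the real function body):

```python
import logging


def init_log(log_level=logging.INFO):
    """(Re)initialise the sketch logger at log_level."""
    logger = logging.getLogger("vstart_runner_sketch")
    # Drop handlers from a previous initialisation before adding a new one.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
    logger.addHandler(logging.StreamHandler())
    logger.setLevel(log_level)
    return logger


# --debug was passed, so logging starts at DEBUG.
log = init_log(logging.DEBUG)

# Clearing the old log re-initialises logging; passing the current level
# along keeps DEBUG instead of falling back to the INFO default.
log = init_log(log.getEffectiveLevel())
```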
Fixes: https://tracker.ceph.com/issues/51344
Signed-off-by: Rishabh Dave <ridave@redhat.com>
* refs/pull/40412/head:
vstart_runner: reuse code in LocalRemoteProcess
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>