* refs/pull/56997/head:
pybind/mgr: disable sqlite3/python autocommit
qa/tasks/mgr: add tests for sqlite autocommit
qa/tasks/vstart_runner: run daemons in foreground
qa/tasks/vstart_runner: add missing poll method
qa/suites/rados/mgr: add cli/devicehealth tasks
qa: reorganize mgr unit tests
qa: use position-independent link
qa: add missing terminating newline
pybind/mgr: add killpoint for sqlite3 database setup
mgr: allow specifying module option level
mon/MgrMonitor: promote standby when unsetting down flag
mon/MgrMonitor: only drop active if exists
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
The new mode of the vstart_runner allows for passing
paths to YAML configs that will be merged and then
run just as teuthology would run them.
Building on the standard run method, we can even
pass "-" as the config name and provide the config on stdin, like so:
python3 ../qa/tasks/vstart_runner.py --config-mode "-" << END
tasks:
  - quiescer:
      quiesce_factor: 0.5
      min_quiesce: 10
      max_quiesce: 10
      initial_delay: 5
      cancelations_cap: 2
      paths:
        - a
        - b
        - c
  - waiter:
      on_exit: 100
END
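To make the merge step concrete, here is a minimal Python sketch. It assumes the YAML files have already been parsed into dicts (parsing needs PyYAML, omitted to stay stdlib-only). deep_merge() and open_config_source() are hypothetical helpers rather than vstart_runner or teuthology functions, and teuthology's own merge may treat lists differently (here a list in a later config simply replaces the earlier one).

```python
import io
import sys
from typing import IO, Any, Dict


def open_config_source(name: str, stdin: IO[str] = sys.stdin) -> IO[str]:
    """Return a readable stream for a config name; "-" means standard input."""
    if name == "-":
        return stdin
    return open(name)


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge override into a copy of base (override wins on conflicts)."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists from the later config replace earlier ones.
            merged[key] = value
    return merged


if __name__ == "__main__":
    fake_stdin = io.StringIO("tasks: []\n")
    print(open_config_source("-", stdin=fake_stdin).read(), end="")
    print(deep_merge({"overrides": {"ceph": {"a": 1}}},
                     {"overrides": {"ceph": {"b": 2}}}))
```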
This commit does the minimum to allow testing of the quiescer,
but it also lays the groundwork for running arbitrary configs.
The cornerstone of the approach is to inject our local implementations
of the main fs suite classes. To be able to do that, some minor
refactoring was required in the corresponding modules:
the standard classes were renamed to have a *Base suffix, and the
former class name without the suffix became a module-level variable
initialized with the *Base implementation. This refactoring
is meant to be backward compatible.
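The renaming pattern can be sketched as follows. The module, FilesystemBase, LocalFilesystem and make_filesystem() are illustrative stand-ins, not necessarily the exact classes touched by this commit; the point is that rebinding the module-level name swaps in a local implementation without changing callers.

```python
import types

# Stand-in for a qa module of the fs suite (hypothetical contents).
filesystem = types.ModuleType("filesystem")


class FilesystemBase:
    """The standard implementation, renamed with a *Base suffix."""

    def describe(self) -> str:
        return "remote filesystem"


filesystem.FilesystemBase = FilesystemBase
# The old name survives as a module-level variable bound to the *Base class,
# so existing filesystem.Filesystem(...) callers keep working unchanged.
filesystem.Filesystem = FilesystemBase


class LocalFilesystem(FilesystemBase):
    """A vstart_runner-style local implementation (hypothetical)."""

    def describe(self) -> str:
        return "local filesystem"


def make_filesystem():
    # Suite code looks the class up through the module at call time,
    # so rebinding the variable redirects every later instantiation.
    return filesystem.Filesystem()


before = make_filesystem().describe()
filesystem.Filesystem = LocalFilesystem  # the injection done by the runner
after = make_filesystem().describe()
```

This only affects code that looks the name up through the module; a "from ... import Filesystem" executed before the injection would keep the original class.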
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Since the timeout bug was fixed (https://tracker.ceph.com/issues/65533),
the "Ceph API tests" CI job sometimes fails because the vstart.sh command
has to be aborted due to a timeout.
Currently, "timeout" is set to 180 seconds, which is sometimes not enough
for vstart.sh to run successfully in the "Ceph API tests" CI job. 180
seconds usually suffices for vstart.sh to run successfully when it is used
for CephFS.
Increase the value of "timeout" to avoid such failures in the "Ceph API tests" CI job.
Fixes: https://tracker.ceph.com/issues/65565
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit f779b42868)
This issue was exposed by a Ceph API test failure: https://jenkins.ceph.com/job/ceph-api/72373/
The traceback below is copied from
https://jenkins.ceph.com/job/ceph-api/72373/consoleFull#-2010121386c212b007-e891-4176-9ee7-2f60eca393b7
2024-04-15 12:32:34,808.808 INFO:__main__:> ../src/vstart.sh -n --nolockdep
Using guessed paths /home/jenkins-build/build/workspace/ceph-api/build/lib/ ['/home/jenkins-build/build/workspace/ceph-api/qa', '/home/jenkins-build/build/workspace/ceph-api/build/lib/cython_modules/lib.3', '/home/jenkins-build/build/workspace/ceph-api/src/pybind']
Traceback (most recent call last):
  File "/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py", line 1552, in <module>
    exec_test()
  File "/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py", line 1402, in exec_test
    remote.run(args=args, env=vstart_env, timeout=(3 * 60))
  File "/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py", line 452, in run
    return self._do_run(**kwargs)
  File "/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py", line 491, in _do_run
    proc.wait(timeout)
  File "/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py", line 252, in wait
    out, err = self.subproc.communicate(timeout=timeout)
  File "/usr/lib/python3.10/subprocess.py", line 1152, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.10/subprocess.py", line 2004, in _communicate
    self._check_timeout(endtime, orig_timeout, stdout, stderr)
  File "/usr/lib/python3.10/subprocess.py", line 1196, in _check_timeout
    raise TimeoutExpired(
subprocess.TimeoutExpired: Command '../src/vstart.sh -n --nolockdep' timed out after 180 seconds
Signed-off-by: Rishabh Dave <ridave@redhat.com>
The parameter "timeout" is accepted by LocalRemote.run(), but the method
does nothing with it besides accepting it. Thus, this parameter
has no effect.
In LocalRemote.run(), pass this parameter to LocalRemoteProcess.wait(),
and from there pass it to subprocess.Popen.communicate(). Thus, the
subprocess module will raise TimeoutExpired once the number of seconds
specified by "timeout" elapses; in other words, the "timeout" parameter
will actually have an effect.
Fixes: https://tracker.ceph.com/issues/65533
Signed-off-by: Rishabh Dave <ridave@redhat.com>
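A minimal sketch of the timeout forwarding described in this commit: run() hands timeout to wait(), which hands it to Popen.communicate(). LocalProcessSketch and run() are simplified, hypothetical stand-ins for LocalRemoteProcess and LocalRemote.run(). Because communicate() raises TimeoutExpired without killing the child, the sketch kills and reaps it before re-raising; whether vstart_runner.py does the same is not stated here.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


class LocalProcessSketch:
    """Minimal stand-in for LocalRemoteProcess (hypothetical)."""

    def __init__(self, args: List[str]):
        self.subproc = subprocess.Popen(args, stdout=subprocess.PIPE,
                                        stderr=subprocess.PIPE)

    def wait(self, timeout: Optional[float] = None) -> Tuple[bytes, bytes]:
        try:
            # communicate() raises TimeoutExpired once `timeout` seconds pass.
            return self.subproc.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            # The child is not killed automatically; clean it up ourselves.
            self.subproc.kill()
            self.subproc.communicate()
            raise


def run(args: List[str], timeout: Optional[float] = None) -> Tuple[bytes, bytes]:
    """Stand-in for LocalRemote.run(): forward `timeout` instead of ignoring it."""
    proc = LocalProcessSketch(args)
    return proc.wait(timeout)


if __name__ == "__main__":
    out, _ = run([sys.executable, "-c", "print('ok')"], timeout=30)
    print(out.decode().strip())
```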
A journal flush sometimes takes more than 120 seconds, so the 'scrub
status' command, after blocking for more than 120 seconds, is declared
failed, which causes the job to be declared failed as well.
Bumping up the timeout gives the 'scrub status' command more time to
wait and lets the journal flush complete.
Fixes: https://tracker.ceph.com/issues/63411
Signed-off-by: Milind Changire <mchangir@redhat.com>
Commands issued by negtest_ceph_cmd() aren't printed because, for some
files, the log level is changed from DEBUG to INFO (due to code meant
for teuthology).
This patch ensures that users can see the commands being executed
regardless of whether the log level is changed or not.
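The underlying mechanism is ordinary Python logging: once a logger's level is raised to INFO, its debug() calls are silently dropped. The sketch below uses a hypothetical logger name and placeholder messages; logging the command at INFO is one way to keep it visible and is not necessarily how the patch does it.

```python
import io
import logging

stream = io.StringIO()
log = logging.getLogger("negtest_sketch")  # hypothetical logger name
log.addHandler(logging.StreamHandler(stream))
log.propagate = False  # keep the demo independent of the root logger

log.setLevel(logging.INFO)  # the effective switch from DEBUG to INFO
log.debug("> command logged at DEBUG")  # dropped: DEBUG is below INFO
log.info("> command logged at INFO")  # emitted regardless of the switch

captured = stream.getvalue()
```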
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Since we pass the option "--client_die_on_failed_dentry_invalidate=false"
to ceph-fuse commands issued by vstart_runner.py, passing "sudo" in the
ceph-fuse command arguments is unnecessary.
This removes the error message (see below) from vstart_runner.py output
that informs users that "sudo" was removed from the command arguments.
This message is redundant and even misleading, since the option above
is passed to the ceph-fuse command.
This causes no functional change to the ceph-fuse mount command issued
by vstart_runner.py, since FuseMount._run_mount_cmd() did not pass
"omit_sudo=False" (and since the default value of omit_sudo in
LocalRemote.run() of vstart_runner.py is True, vstart_runner.py removes
"sudo" from the ceph-fuse command arguments before execution).
The error message -
DEBUG:__main__:"sudo" was omitted from the following cmd args before execution and logging using function overriding; check vstart_runner.py for more details.
DEBUG:__main__:> sudo ./bin/ceph-fuse /tmp/tmp8o4s_6md/mnt.0 --id 0 --client_mountpoint=/ --client_die_on_failed_dentry_invalidate=false
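The omit_sudo default can be pictured with a simplified filter. prepare_args() is hypothetical and works on a list, whereas vstart_runner.py handles command arguments as a string and performs more checks; it only shows why "sudo" disappears unless omit_sudo=False is passed.

```python
import shlex
from typing import List


def prepare_args(args: List[str], omit_sudo: bool = True) -> List[str]:
    """Drop every literal "sudo" argument unless the caller opts out (hypothetical)."""
    if not omit_sudo:
        return list(args)
    return [arg for arg in args if arg != "sudo"]


cmd = ["sudo", "./bin/ceph-fuse", "/tmp/mnt.0", "--id", "0",
       "--client_mountpoint=/",
       "--client_die_on_failed_dentry_invalidate=false"]

if __name__ == "__main__":
    print(shlex.join(prepare_args(cmd)))
```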
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Now that the teuthology tools can be run in vstart_runner, there's no
reason to override this method.
Importantly, this enables the use of the new stdin-killer tool [1].
[1] https://github.com/ceph/teuthology/pull/1846
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
With [1], these tools are now installed in the teuthology virtualenv.
Update the path in the command arguments so these tools can be run via
sudo.
[1] https://github.com/ceph/teuthology/pull/1846
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
There's no technical reason to disallow this. The original intent was to
avoid deadlocks, but this possibility is already present when interacting
with a teuthology RemoteProcess. Avoiding it only for local processes
does not make sense.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
With mClock scheduler enabled, a small subset of config options related
to recovery limits are not allowed to be modified unless
osd_mclock_override_recovery_settings option is enabled. This override
option is disabled by default. The following options cannot be modified
without enabling the override option:
- osd_max_backfills
- osd_recovery_max_active[_(hdd|ssd)]
The above options are removed from the mon KV store, which effectively
restores them to their default values.
This caused tests such as
test_cluster_configuration.ClusterConfigurationTest to fail, since that
test modifies the recovery options and expects to verify the modified values.
Therefore, for tests, the osd_mclock_override_recovery_settings option is
enabled in vstart_runner.py so that current and future tests
are not affected.
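A toy model of the behaviour described above, assuming mClock is the active scheduler. apply_option() is hypothetical and is not how the OSD or monitor implement this; it only captures the rule that guarded options are dropped from the KV store unless the override option is "true".

```python
from typing import Dict

# Options guarded by the mClock scheduler, as listed above.
GUARDED = {
    "osd_max_backfills",
    "osd_recovery_max_active",
    "osd_recovery_max_active_hdd",
    "osd_recovery_max_active_ssd",
}


def apply_option(kv: Dict[str, str], name: str, value: str) -> None:
    """Toy model of setting an option in the mon KV store under mClock."""
    kv[name] = value
    override = kv.get("osd_mclock_override_recovery_settings") == "true"
    if name in GUARDED and not override:
        # The change is discarded, leaving the built-in default in effect.
        del kv[name]
```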
Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Before unmounting, check if the client has been evicted and, if so, run
"umount -f -l" on the client's mount point and clean up the mount
right after it.
Attempting to unmount, clean up or otherwise operate on the mount point
of an evicted client will hang the operation (and thereby our Python
code too). A lazy forced unmount prevents such hangs in our Python code
and also frees the mount point.
This commit also adds code to gather session info for kernel mounts
after mounting succeeds. This is necessary since the network address
of the session is needed to check whether it is blocked by the Ceph cluster.
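The unmount decision can be sketched as below. It assumes the blocklist is already available as a collection of addresses (parsing the cluster's blocklist output is omitted) and that cleanup means removing the mount directory; umount_sketch(), is_evicted() and the rmdir step are hypothetical. A runner callable is injected so the logic can be exercised without root.

```python
import subprocess
from typing import Callable, Iterable, List

Runner = Callable[[List[str]], None]


def default_runner(args: List[str]) -> None:
    subprocess.run(args, check=True)


def is_evicted(session_addr: str, blocklist: Iterable[str]) -> bool:
    """Treat a client as evicted if its session address is blocklisted (assumption)."""
    return session_addr in set(blocklist)


def umount_sketch(mountpoint: str, session_addr: str, blocklist: Iterable[str],
                  run: Runner = default_runner) -> str:
    if is_evicted(session_addr, blocklist):
        # A normal umount on an evicted client can hang; force + lazy detaches now.
        run(["sudo", "umount", "-f", "-l", mountpoint])
        run(["rmdir", mountpoint])  # hypothetical cleanup step, right after
        return "forced"
    run(["sudo", "umount", mountpoint])
    return "normal"
```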
Fixes: https://tracker.ceph.com/issues/56476
Signed-off-by: Rishabh Dave <ridave@redhat.com>
do_rados() prefixes extra arguments to every command because they are
helpful when tests are executed with teuthology. This patch
eliminates these extra arguments entirely (through overriding) for test
executions with vstart_runner.py.
Note: "timeout 120" is now prefixed to rados commands too. As far as I
can see, it shouldn't have any side effects.
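Overriding a method in a subclass is what lets vstart_runner.py drop the teuthology-only prefix. The class names, prefix contents and the ./bin/rados path below are illustrative assumptions; the methods return the argument list instead of running it so the difference is easy to see.

```python
from typing import List


class CephManagerSketch:
    """Stand-in for the teuthology-side manager (hypothetical names)."""

    testdir = "/home/ubuntu/cephtest"

    def rados_prefix(self) -> List[str]:
        # Helpers that only make sense on teuthology test nodes (assumed list).
        return ["adjust-ulimits", "ceph-coverage",
                f"{self.testdir}/archive/coverage"]

    def do_rados(self, cmd: List[str]) -> List[str]:
        return self.rados_prefix() + ["rados"] + cmd


class LocalCephManagerSketch(CephManagerSketch):
    """vstart_runner-style override that drops the teuthology-only prefix."""

    def do_rados(self, cmd: List[str]) -> List[str]:
        return ["./bin/rados"] + cmd
```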
This commit is similar to commit 93677576c1.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
qa/cephfs: set omit_sudo False when sudo is set to True
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Nikhilkumar Shelke <nshelke@redhat.com>
1. Method cluster() in ceph.py creates a dictionary "ctx.ceph", attaches
a namespace to ctx.ceph[cluster_name], creates an attribute "fsid" on it
and stores the Ceph cluster's FSID in it.
2. The method kernel_mount.KernelMount._get_debug_dir() uses that "fsid"
attribute to get the Ceph cluster's FSID. (The exact line that does that
is "fsid = self.ctx.ceph[cluster_name].fsid".)
3. Test test_readahead.TestReadahead.test_flush() crashes with
vstart_runner.py because that test eventually calls _get_debug_dir(),
and "ctx" in the case of vstart_runner.py doesn't hold a "ceph"
dictionary or anything similar.
Adding a dictionary, similar to the one added in ceph.py, to
vstart_runner.LocalContext's instances will fix this issue.
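A sketch of the fix, mirroring the ctx.ceph[cluster_name].fsid layout that ceph.py sets up. LocalContextSketch and get_debug_dir_sketch() are hypothetical; the FSID is simply passed in (how the real LocalContext obtains it is not shown), and the debugfs path format is an assumption.

```python
from argparse import Namespace


class LocalContextSketch:
    """Minimal stand-in for vstart_runner.LocalContext (hypothetical)."""

    def __init__(self, fsid: str, cluster_name: str = "ceph"):
        # Mirror what ceph.py's cluster() sets up: ctx.ceph[cluster_name].fsid
        self.ceph = {cluster_name: Namespace(fsid=fsid)}


def get_debug_dir_sketch(ctx, cluster_name: str = "ceph") -> str:
    # Same lookup as in KernelMount._get_debug_dir(); the path layout is assumed.
    fsid = ctx.ceph[cluster_name].fsid
    return f"/sys/kernel/debug/ceph/{fsid}.client"
```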
Fixes: https://tracker.ceph.com/issues/55694
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Result of os.path.join(): before, "./bin/./ceph-mds"; after,
"./bin/ceph-mds".
Before -
2022-05-05 19:36:11,100.100 DEBUG:__main__:> ./bin/./ceph-mds -i a
After -
2022-05-05 19:38:48,179.179 DEBUG:__main__:> ./bin/ceph-mds -i a
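The doubled "./" comes straight from how os.path.join() concatenates components. The sketch uses posixpath so the result does not depend on the host OS; joining the basename is one possible fix and may not be the one the patch uses.

```python
import posixpath

BIN_DIR = "./bin"

# Joining with a "./"-prefixed name keeps the redundant "./" component.
before = posixpath.join(BIN_DIR, "./ceph-mds")

# One way to avoid it: join only the bare file name.
after = posixpath.join(BIN_DIR, posixpath.basename("./ceph-mds"))

# normpath() also removes it, but drops the leading "./" as well.
normalized = posixpath.normpath(before)
```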
Signed-off-by: Rishabh Dave <ridave@redhat.com>
The message regarding deletion of helper tools is printed for every
command. This message should be printed only when applicable.
Besides that -
* Move XXX comments to _do_run() since that increases the visibility of
these messages.
* Move the argument-omission logic to a new method to clear up the clutter.
* Remove shell as a parameter from _perform_checks_and_adjustments
since it's redundant.
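A sketch of printing the message only when something was actually omitted. omit_helper_tools() and do_run_sketch() are hypothetical, and the helper list (adjust-ulimits, and ceph-coverage together with its directory argument) is an assumption about which tools vstart_runner.py removes.

```python
import logging
from typing import List, Tuple

log = logging.getLogger(__name__)


def omit_helper_tools(args: List[str]) -> Tuple[List[str], bool]:
    """Drop teuthology helper tools; report whether anything was removed."""
    kept: List[str] = []
    omitted = False
    skip_next = False
    for arg in args:
        if skip_next:
            skip_next = False  # this was ceph-coverage's directory argument
            continue
        if arg == "adjust-ulimits":
            omitted = True
        elif arg == "ceph-coverage":
            omitted = True
            skip_next = True
        else:
            kept.append(arg)
    return kept, omitted


def do_run_sketch(args: List[str]) -> List[str]:
    args, omitted = omit_helper_tools(args)
    if omitted:
        # Only mention the deletion when it actually happened.
        log.debug("helper tools were omitted from the cmd args before execution")
    return args
```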
Signed-off-by: Rishabh Dave <ridave@redhat.com>
This method fails to collect the return value of
FuseMount._run_mount_cmd() and return it. This causes a bug in tests
that expect the mount command to fail when executed with vstart_runner.py.
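The bug boils down to an overriding method that calls the parent helper but drops its result. The classes below are hypothetical stand-ins for FuseMount and its vstart_runner.py counterpart, with a non-zero return value standing for a failed mount.

```python
class FuseMountSketch:
    """Stand-in for the teuthology FuseMount (hypothetical)."""

    def _run_mount_cmd(self, fail: bool) -> int:
        return 1 if fail else 0  # non-zero means the mount failed


class LocalFuseMountBuggy(FuseMountSketch):
    def mount(self, fail: bool = False):
        self._run_mount_cmd(fail)  # bug: the result is dropped, returns None


class LocalFuseMountFixed(FuseMountSketch):
    def mount(self, fail: bool = False):
        retval = self._run_mount_cmd(fail)
        return retval  # tests expecting a failure can now see it
```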
Fixes: https://tracker.ceph.com/issues/55553
Signed-off-by: Rishabh Dave <ridave@redhat.com>
In these methods, the parameter "sudo" indicates whether or not sudo is
set to True, but this is not conveyed to the methods underneath. This value
needs to be passed down for the parameter to fulfill its commitment.
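A sketch of the problem and fix, assuming (as with LocalRemote.run() in vstart_runner.py) that the lower-level runner strips "sudo" by default. All three functions are hypothetical; the fixed version passes omit_sudo=False down when sudo is requested.

```python
from typing import List


def run_low_level(args: List[str], omit_sudo: bool = True) -> List[str]:
    """Stand-in for the lower-level runner: strips "sudo" unless told otherwise."""
    if omit_sudo:
        args = [a for a in args if a != "sudo"]
    return args


def run_shell_buggy(args: List[str], sudo: bool = False) -> List[str]:
    cmd = (["sudo"] if sudo else []) + args
    return run_low_level(cmd)  # the sudo request is silently lost


def run_shell_fixed(args: List[str], sudo: bool = False) -> List[str]:
    cmd = (["sudo"] if sudo else []) + args
    # Pass the intent down: if sudo was requested, don't let it be omitted.
    return run_low_level(cmd, omit_sudo=not sudo)
```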
Fixes: https://tracker.ceph.com/issues/55557
Signed-off-by: Rishabh Dave <ridave@redhat.com>
And therefore get rid of methods duplicated in LocalRemote and add a
call to the empty constructor of RemoteShell in LocalRemote.__init__().
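The resulting structure can be sketched like this: shared helpers live in the base class, and LocalRemote only supplies run() and calls the empty base constructor. RemoteShell here is a stand-in with a single hypothetical sh() helper, not teuthology's actual class.

```python
import subprocess
from typing import List


class RemoteShell:
    """Stand-in for teuthology's RemoteShell: shared helpers live here."""

    def __init__(self):
        pass  # nothing to initialise, but subclasses should still call it

    def run(self, args: List[str]) -> str:
        raise NotImplementedError

    def sh(self, script: str) -> str:
        # Hypothetical shared helper built on top of run().
        return self.run(["bash", "-c", script])


class LocalRemote(RemoteShell):
    def __init__(self):
        super().__init__()  # the call to the empty constructor
        self.hostname = "localhost"

    def run(self, args: List[str]) -> str:
        # Only the transport differs; sh() is inherited, not duplicated.
        return subprocess.run(args, capture_output=True, text=True,
                              check=True).stdout
```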
Signed-off-by: Rishabh Dave <ridave@redhat.com>
vstart_runner.py is written assuming that it can run commands with
superuser privileges whenever possible, and vstart_runner.py is meant to
be executed without sudo.
So, it's better to kill a process using "sudo kill -9 <PID>" instead of
using os.kill(), because os.kill() can't kill a process launched with
superuser privileges.
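A sketch of killing through sudo. kill_process() is hypothetical; the runner is injectable so it can be tested without root. For comparison, os.kill(pid, signal.SIGKILL) from an unprivileged user raises PermissionError when the target process belongs to root.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], object]


def kill_process(pid: int, run: Runner = subprocess.run) -> None:
    """Kill pid via "sudo kill -9" so that root-owned daemons are covered too."""
    run(["sudo", "kill", "-9", str(pid)])
```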
Signed-off-by: Rishabh Dave <ridave@redhat.com>
About the commit date: this commit got dropped from the patch series
during some PR branch update but is added back now.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Passing "exec sudo" to "ceph -w" caused the "Ceph API test" CI job to fail.
The error was not related to this tracker issue, but the code added for
it is reverted in this commit. The tracker issue:
https://tracker.ceph.com/issues/49644.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
We convert all cmd args to str and pass bash functions along to override
certain arguments in those command arguments. Let's save the cmd args
without those bash functions, since they can be useful later (for
example, for printing cmd args in logs, which is the case in this patch).
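The idea of keeping a clean copy of the command for logging can be sketched as follows. build_command() and the exact OVERRIDES text are hypothetical; only the separation between the string that runs and the string that gets logged matters.

```python
import shlex
from typing import List, Tuple

# Bash function overrides prepended to every command (hypothetical contents).
OVERRIDES = 'adjust-ulimits() { "$@"; }; ceph-coverage() { shift; "$@"; }; '


def build_command(args: List[str]) -> Tuple[str, str]:
    """Return (command to execute, command to log)."""
    cmd_str = shlex.join(args)  # saved before the bash functions are added
    return OVERRIDES + cmd_str, cmd_str
```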
Signed-off-by: Rishabh Dave <ridave@redhat.com>
The intention behind copying these note points is to document the
behaviour of vstart_runner.py inside vstart_runner.py as well, so that
developers don't miss it while working on it.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Overriding commands is much better than deleting these commands from
the command argument string using Python since, unlike deleting,
overriding doesn't require parsing. A note has been added for this to
vstart_runner.py's module docstring and to the Ceph Developer's Guide
document.
Since these function overrides don't work with the sh shell,
vstart_runner.py will use the bash shell from now on.
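The following sketch shows why overriding needs no parsing: each helper becomes a bash function that just runs the rest of its arguments, so the original command string can be executed as-is. The override bodies are assumptions. One likely reason sh does not work is that hyphenated function names such as adjust-ulimits are accepted by bash but rejected by stricter POSIX shells such as dash.

```python
import subprocess

# Hypothetical overrides: each helper simply runs the rest of its arguments;
# ceph-coverage also shifts away its coverage-directory argument.
OVERRIDES = 'adjust-ulimits() { "$@"; }; ceph-coverage() { shift; "$@"; }; '

# The command is left untouched; no Python-side parsing or deletion happens.
cmd = "adjust-ulimits ceph-coverage /tmp/coverage echo hello"
result = subprocess.run(["bash", "-c", OVERRIDES + cmd],
                        capture_output=True, text=True, check=True)
```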
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Convert all command arguments from list to str, update the checks and
adjustments performed on command arguments accordingly, update the
documentation to include warnings about some critical parts of
vstart_runner.py, and update tasks.cephfs.mount.MountCephFS.run_shell().
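When arguments become a single string, quoting matters both for the conversion and for later checks. A minimal sketch using the standard shlex module; to_cmd_str() and uses_sudo() are hypothetical helpers, not the functions in vstart_runner.py.

```python
import shlex
from typing import Sequence, Union


def to_cmd_str(args: Union[str, Sequence[object]]) -> str:
    """Normalise command arguments to a single shell-quoted string."""
    if isinstance(args, str):
        return args  # already converted
    return shlex.join([str(arg) for arg in args])


def uses_sudo(cmd: str) -> bool:
    """Example check on the string form: split it back into words first."""
    return "sudo" in shlex.split(cmd)
```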
Fixes: https://tracker.ceph.com/issues/47849
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Use LocalContext instance to set LocalCephManager.cluster.
Fixes: https://tracker.ceph.com/issues/53601
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Restore the ability to run the radosgw_admin.py unit standalone,
improved to use vstart_runner hooks.
Local rgwadmin(...) wrapper suggested as a cleanup in review by Casey.
Fixes: https://tracker.ceph.com/issues/52837
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Without this, plenty of tests become incompatible with vstart_runner.py.
Ideally, vstart_runner.py should've been updated in commit 7812cfb674.
Fixes: https://tracker.ceph.com/issues/53043
Signed-off-by: Rishabh Dave <ridave@redhat.com>