RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2025-01-20 10:01:45 +00:00

Author	SHA1	Message	Date
Rishabh Dave	93677576c1	qa/ceph_manager: make it possible to reuse few methods Make minor adjustments to ceph_manager.CephManager so that methods run_ceph_w(), run_cluster_cmd(), raw_cluster_cmd() and raw_cluster_cmd_result() can be reused, instead of duplicated, in subclasses. The adjustments are: * Having variables hold the arguments that will be prepended to every command received by the methods above. * Grouping the variables that need to be overridden together so that users can easily spot and override them. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2021-08-02 11:37:49 +05:30
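The idea in 93677576c1 is that overridable class variables carry the prefix arguments and are grouped in one place. A minimal sketch of that pattern follows. It is not the code in ceph_manager.py: the class names, the attribute names and build_cluster_cmd() are hypothetical. The base class builds every command from its prefix attributes. A subclass overrides only those attributes and inherits the method unchanged.

```python
class ClusterCmdRunner:
    """Hypothetical base class: every command gets the same overridable prefix."""

    # Attributes a subclass is expected to override, grouped together.
    CEPH_CMD_PREFIX = ['sudo', 'ceph']
    CLUSTER_ARGS = ['--cluster', 'ceph']

    def build_cluster_cmd(self, args):
        # List concatenation builds a new list, so the class attributes are never mutated.
        return self.CEPH_CMD_PREFIX + self.CLUSTER_ARGS + list(args)


class LocalClusterCmdRunner(ClusterCmdRunner):
    # A hypothetical local (vstart-style) runner: only the prefix changes,
    # build_cluster_cmd() is inherited instead of duplicated.
    CEPH_CMD_PREFIX = ['./bin/ceph']
    CLUSTER_ARGS = []
```

With this layout, a new subclass only needs to redefine the two attributes.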
Rishabh Dave	047c90f881	qa/vstart_runner: don't use "shell=False" in run_ceph_w() Instead prepend "exec sudo" to the command arguments of LocalCephManager.run_ceph_w(). This makes the default parameter "shell=False" redundant in the case of ceph_manager.CephManager.run_ceph_w(), so get rid of it too and update calls to run_ceph_w() accordingly. The reason behind using any of these workarounds is that running "ceph -w" with "shell" set to True leads to a crash in the Ceph API CI job. See this ticket for more details: https://tracker.ceph.com/issues/49644. The reason behind switching the workaround is that in the following commits, to reduce duplication, LocalCephManager.run_ceph_w() will be deleted and CephManager.run_ceph_w() will be used by LocalCephManager via inheritance. However, due to the issue described above, the Ceph API test will fail since "shell" is set to "True" for the command issued by CephManager.run_ceph_w(). Prepending "exec sudo" to the command when it is used in LocalCephManager makes this duplication unnecessary and also prevents the Ceph API test from failing. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2021-08-02 11:37:44 +05:30
Rishabh Dave	4101f76ed6	qa/ceph_manager: minor refactor Save the return value of method "teuthology.get_testdir()" instead of calling it repeatedly in the same class. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2021-08-02 10:07:23 +05:30
Kefu Chai	7afd38f846	tasks/ceph_manager: ignore EACCES when waiting for quorum mon_tick_interval is 5 seconds by default. monitors update their rotating keys every mon_tick_interval. before the monitors form a quorum, the auth requests from clients are put into the wait list. these requests are re-enqueued once the monitors form a quorum. but even after the monitors claim to be able to serve requests, there is a small window of mon_tick_interval before they are able to serve the auth requests. if these re-enqueued requests happen to be served in this window, and if authx is enabled, they will be greeted with errors like handle_auth_bad_method server allowed_methods [2] but i only support [2] in the case of ceph cli, the error would look like: [errno 13] RADOS permission denied (error connecting to the cluster) so, to address this issue, the EACCES error is ignored when waiting for a quorum. Signed-off-by: Kefu Chai <kchai@redhat.com>	2021-06-10 20:29:50 +08:00
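Here is a minimal sketch of a quorum wait that treats permission denied as "not ready yet". It assumes a hypothetical get_quorum() callable that returns the list of monitor ranks in quorum, or raises OSError with errno EACCES (13) during the window described above. The actual task handles the ceph CLI failure inside teuthology's safe_while() loop (see 3908c1f4cd), and this sketch does not reproduce that.

```python
import errno
import time


def wait_for_quorum(get_quorum, expected_size, tries=10, sleep=1.0):
    """Poll the hypothetical get_quorum() until expected_size monitors are in quorum."""
    for _ in range(tries):
        try:
            if len(get_quorum()) == expected_size:
                return
        except OSError as e:
            # Auth may be refused for up to one mon_tick_interval after the
            # quorum forms, so EACCES is retried. Any other error propagates.
            if e.errno != errno.EACCES:
                raise
        time.sleep(sleep)
    raise RuntimeError('timed out waiting for quorum')
```

Only EACCES is swallowed. Unrelated failures still surface immediately instead of being retried until the timeout.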
Kefu Chai	3908c1f4cd	tasks/ceph_manager: use safe_while() to refactor the wait for quorum for better readability Signed-off-by: Kefu Chai <kchai@redhat.com>	2021-06-10 20:29:50 +08:00
Sage Weil	4574ed70f4	qa/tasks/rook: deploy ceph via rook on top of kubernetes This assumes that k8s is installed and kubectl works. The ceph container to use is selected the same way the cephadm task does it. All scratch devices are consumed as OSDs. A ceph.conf and client.admin keyring are deployed on all test nodes, so normal tasks should work (if/when packages are installed). Fixes: https://tracker.ceph.com/issues/47507 Signed-off-by: Sage Weil <sage@newdream.net>	2021-05-18 15:19:04 -05:00
Kefu Chai	73925c488a	Merge pull request #39969 from batrick/i49684 qa: wait for daemons to come up via cephadm Reviewed-by: Sage Weil <sage@redhat.com>	2021-03-28 20:01:32 +08:00
Patrick Donnelly	42270a5338	Merge PR #38443 into master * refs/pull/38443/head: qa: set "shell" to False for run_ceph_w() vstart_runner: make "shell" a default argument Reviewed-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Xiubo Li <xiubli@redhat.com>	2021-03-22 20:00:46 -07:00
Patrick Donnelly	24bb1aa31b	qa: improve usability of do_rados helper Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2021-03-21 10:35:07 -07:00
Kefu Chai	181dc1a43f	Merge pull request #39757 from aclamk/wip-qa-test-bluestore-reshard qa: Add bluestore resharding test Reviewed-by: Josh Durgin <jdurgin@redhat.com>	2021-03-17 22:41:34 +08:00
Rishabh Dave	df88ec3822	qa: set "shell" to False for run_ceph_w() Setting shell to True in call to run() in LocalCephManager.run_ceph_w() leads to a crash when self.subproc.communicate() is executed for the process created by running "ceph -w". Signed-off-by: Rishabh Dave <ridave@redhat.com>	2021-03-12 09:03:13 +05:30
Sebastian Wagner	340281fe76	qa/tasks: some type annotations Mostly for making my IDE aware of things Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>	2021-03-10 15:02:41 +01:00
Adam Kupczyk	a84820b743	qa: Add bluestore resharing test Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>	2021-03-10 10:21:09 +01:00
Kefu Chai	434c1ce400	Merge pull request #39775 from rishabh-d-dave/fs-qa-accept-cmds-as-str qa/ceph_manager: accepts commands as str too Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>	2021-03-07 23:52:54 +08:00
Kefu Chai	3cdb88b0ac	Merge pull request #39690 from rishabh-d-dave/qa-raw_cluster_cmd qa/ceph_manger: fixes bugs in CephManager Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>	2021-03-07 23:51:05 +08:00
Rishabh Dave	a1dc6b6c19	qa/ceph_manager: accepts commands as str too Modify CephManager.run_cluster_cmd() to accept command arguments as a string as well, since typing commands as strings takes much less effort than typing them as a list. This brings the interface a step closer to teuthology.orchestra.remote.run()'s interface, since it too can accept command arguments as a string. The change in cephfs_test_case.py is just to allow testing this PR locally and on teuthology. Signed-off-by: Rishabh Dave <ridave@redhat.com>	2021-03-04 09:42:44 +05:30
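Accepting either a string or a list usually comes down to normalizing the input at the top of the method. The sketch below uses a hypothetical helper name. It splits with shlex.split() so that quoted values stay together, and the actual change may well use a plain str.split() instead.

```python
import shlex


def normalize_cmd_args(args):
    """Return command arguments as a list, given either a str or a sequence (sketch)."""
    if isinstance(args, str):
        # Shell-like splitting: "create 'my pool'" keeps 'my pool' as one argument.
        return shlex.split(args)
    return list(args)
```

For example, `normalize_cmd_args("osd pool create 'my pool' 8")` and `normalize_cmd_args(['osd', 'pool', 'create', 'my pool', '8'])` both produce the same list.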
Patrick Donnelly	3e5e03d4d2	qa: skip chdir for fuse_mount The use of chdir will muck up the use of nsenter with valgrind: 2021-03-03T02:13:49.897 DEBUG:teuthology.orchestra.run.smithi144:> sudo nsenter --net=/var/run/netns/ceph-ns--home-ubuntu-cephtest-mnt.0 cd /home/ubuntu/cephtest && sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper term env 'OPENSSL_ia32cap=~0x1000000000000000' valgrind --trace-children=no --child-silent-after-fork=yes '--soname-synonyms=somalloc=tcmalloc' --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/client.0.log --time-stamp=yes --vgdb=yes --exit-on-first-error=yes --error-exitcode=42 --tool=memcheck --leak-check=full --show-reachable=yes ceph-fuse -f --admin-socket '/var/run/ceph/$cluster-$name.$pid.asok' --id 0 /home/ubuntu/cephtest/mnt.0 2021-03-03T02:13:49.899 DEBUG:teuthology.orchestra.run.smithi144:> sudo modprobe fuse 2021-03-03T02:13:49.914 INFO:teuthology.orchestra.run:Running command with timeout 30 2021-03-03T02:13:49.914 DEBUG:teuthology.orchestra.run.smithi144:> sudo mount -t fusectl /sys/fs/fuse/connections /sys/fs/fuse/connections 2021-03-03T02:13:49.919 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi144.stderr:nsenter: failed to execute cd: No such file or directory It's not necessary to chdir at all to do the mount, so don't. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2021-03-03 09:30:21 -08:00
Patrick Donnelly	3681e3a1a8	qa: move get_valgrind_args to qa This method is unused in the teuthology repo. The helper method better belongs here where it is more easily modified. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2021-03-03 09:30:08 -08:00
Rishabh Dave	793980ec0e	qa: don't override with args when it's empty In the methods raw_cluster_cmd_result() of CephManager and LocalCephManager, and raw_cluster_cmd() of LocalCephManager, when keyword arguments are passed instead of positional arguments, the methods run the ceph command with no arguments. This is because the methods do "kwargs['args'] = args" unconditionally. Fixes: https://tracker.ceph.com/issues/49486 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2021-03-01 18:38:55 +05:30
Rishabh Dave	d1a0608b50	qa/ceph_manager: make raw_cluster_cmd() keywords args compatible In CephManager.raw_cluster_cmd(), pass only kwargs to run_cluster_cmd() instead of both args and kwargs since passing both will lead to "TypeError: got multiple values". Fixes: https://tracker.ceph.com/issues/49495 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2021-03-01 18:38:55 +05:30
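Commits 793980ec0e and d1a0608b50 fix two related argument-forwarding mistakes. Both can be reproduced with plain Python functions. In this sketch, run_cluster_cmd() is a hypothetical stand-in that just returns what it was asked to run, and the wrapper functions are illustrative rather than the CephManager methods.

```python
def run_cluster_cmd(args=None, timeout=None):
    """Hypothetical callee: returns what it was asked to run."""
    return args, timeout


def raw_cluster_cmd_buggy(*args, **kwargs):
    # Bug 1 (793980ec0e): a keyword-only call loses its 'args', because the
    # empty positional tuple unconditionally overwrites kwargs['args'].
    kwargs['args'] = list(args)
    # Bug 2 (d1a0608b50): 'args' is passed positionally *and* by keyword,
    # which raises "TypeError: ... got multiple values for argument 'args'".
    return run_cluster_cmd(*args, **kwargs)


def raw_cluster_cmd_fixed(*args, **kwargs):
    # Override only when positional arguments were actually given.
    if args:
        kwargs['args'] = list(args)
    # Forward everything by keyword so each parameter is bound exactly once.
    return run_cluster_cmd(**kwargs)
```

With the buggy version, `raw_cluster_cmd_buggy(args=['status'])` silently runs with an empty argument list, and `raw_cluster_cmd_buggy('status')` raises TypeError. The fixed version handles both call styles.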
Josh Durgin	29b5f90c66	qa/tasks/ceph_manager: let c-o-t log errors This will capture e.g. bluestore fsck issues in teuthology.log Signed-off-by: Josh Durgin <jdurgin@redhat.com>	2021-01-15 19:10:13 -05:00
Patrick Donnelly	166bb4d551	qa: add ceph cmd helper A more programmer-friendly command to use. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2021-01-12 07:24:28 -08:00
Patrick Donnelly	c0907f99e8	qa: allow kwargs for raw_cluster_cmd And refactor. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>	2021-01-12 07:24:28 -08:00
Neha Ojha	df7adbf387	qa/tasks/ceph_manager.py: remove redundant quorum status logging 2020-10-21T03:42:45.985 INFO:teuthology.orchestra.run.smithi114:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph quorum_status 2020-10-21T03:42:58.574 INFO:teuthology.orchestra.run.smithi114.stdout:{"election_epoch":1650,"quorum":[0,2],"quorum_names":["a","c"],"quorum_leader_name":"a","quorum_age":0,"features":{"quorum_con":"4540138297136906239","quorum_mon":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus","pacific","elector-pinging"]},"monmap":{"epoch":1,"fsid":"807c36f1-9e85-4fa3-81fc-95915ab50584","modified":"2020-10-21T00:34:48.421341Z","created":"2020-10-21T00:34:48.421341Z","min_mon_release":16,"min_mon_release_name":"pacific","election_strategy":3,"disallowed_leaders":"","features":{"persistent":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus","pacific","elector-pinging"],"optional":[]},"mons":[{"rank":0,"name":"a","public_addrs":{"addrvec":[{"type":"v1","addr":"172.21.15.114:6789","nonce":0}]},"addr":"172.21.15.114:6789/0","public_addr":"172.21.15.114:6789/0","priority":0,"weight":0},{"rank":1,"name":"b","public_addrs":{"addrvec":[{"type":"v1","addr":"172.21.15.133:6789","nonce":0}]},"addr":"172.21.15.133:6789/0","public_addr":"172.21.15.133:6789/0","priority":0,"weight":0},{"rank":2,"name":"c","public_addrs":{"addrvec":[{"type":"v1","addr":"172.21.15.114:6790","nonce":0}]},"addr":"172.21.15.114:6790/0","public_addr":"172.21.15.114:6790/0","priority":0,"weight":0}]}} 2020-10-21T03:42:58.589 INFO:tasks.mon_thrash.ceph_manager:quorum_status is 
{"election_epoch":1650,"quorum":[0,2],"quorum_names":["a","c"],"quorum_leader_name":"a","quorum_age":0,"features":{"quorum_con":"4540138297136906239","quorum_mon":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus","pacific","elector-pinging"]},"monmap":{"epoch":1,"fsid":"807c36f1-9e85-4fa3-81fc-95915ab50584","modified":"2020-10-21T00:34:48.421341Z","created":"2020-10-21T00:34:48.421341Z","min_mon_release":16,"min_mon_release_name":"pacific","election_strategy":3,"disallowed_leaders":"","features":{"persistent":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus","pacific","elector-pinging"],"optional":[]},"mons":[{"rank":0,"name":"a","public_addrs":{"addrvec":[{"type":"v1","addr":"172.21.15.114:6789","nonce":0}]},"addr":"172.21.15.114:6789/0","public_addr":"172.21.15.114:6789/0","priority":0,"weight":0},{"rank":1,"name":"b","public_addrs":{"addrvec":[{"type":"v1","addr":"172.21.15.133:6789","nonce":0}]},"addr":"172.21.15.133:6789/0","public_addr":"172.21.15.133:6789/0","priority":0,"weight":0},{"rank":2,"name":"c","public_addrs":{"addrvec":[{"type":"v1","addr":"172.21.15.114:6790","nonce":0}]},"addr":"172.21.15.114:6790/0","public_addr":"172.21.15.114:6790/0","priority":0,"weight":0}]}} Signed-off-by: Neha Ojha <nojha@redhat.com>	2020-10-27 21:14:54 +00:00
Changcheng Liu	dbdcb2535d	common: remove log_early configuration option Now that tracking the log in the early phase is always enabled, there's no need to keep the "log_early" option, so remove it directly. Suggested-by: Kefu Chai <kefu@redhat.com> Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>	2020-10-19 14:30:28 +08:00
Neha Ojha	e7eddec5a0	qa/tasks/ceph_manager.py: remove redundant check in raw_cluster_cmd_result Fixes `530982129e`. The check for cephadm is no longer needed since it was moved to run_cluster_cmd. Fixes: https://tracker.ceph.com/issues/47239 Signed-off-by: Neha Ojha <nojha@redhat.com>	2020-09-15 17:56:33 +00:00
Kefu Chai	eda90040ad	qa: always use subprocess.{DEVNULL,check_output} there is no need to check for their existence and prepare a replacement, because we've migrated to python3 and we only support python3.6 and up. Signed-off-by: Kefu Chai <kchai@redhat.com>	2020-09-03 13:09:16 +08:00
Sage Weil	dfd01d7653	blacklist -> blocklist Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Neha Ojha <nojha@redhat.com>	2020-08-24 19:53:08 +00:00
Rishabh Dave	530982129e	qa: add method run ceph cluster command with better interface This new method should allow better control over the process launched by the passed command. This is achieved by accepting the arguments supported by teuthology.orchestra.run.run(). Signed-off-by: Rishabh Dave <ridave@redhat.com>	2020-08-21 22:16:21 +05:30
Kefu Chai	a7f18e46b9	qa/tasks/{ceph,ceph_manager}: drop py2 support Signed-off-by: Kefu Chai <kchai@redhat.com>	2020-07-05 10:58:28 +08:00
Patrick Donnelly	af4d4ee6f1	Merge PR #35522 into master * refs/pull/35522/head: vstart_runner: set default values of stdout and stderr to None Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>	2020-06-24 11:34:04 -07:00
Kefu Chai	2fa726b88c	qa/tasks: flake8 fixes Signed-off-by: Kefu Chai <kchai@redhat.com>	2020-06-23 23:00:56 +08:00
Rishabh Dave	cc8f15818a	vstart_runner: set default values of stdout and stderr to None Not doing so leads to tests that run successfully with vstart_runner.py but crash when triggered with teuthology, since the default value of these variables there is None. Fixes: https://tracker.ceph.com/issues/45815 Signed-off-by: Rishabh Dave <ridave@redhat.com>	2020-06-17 14:42:53 +05:30
Neha Ojha	4deba4e8bd	qa/tasks/ceph_manager.py: dump more useful info before failing Add helpers that dump information only about PGs that haven't reached the desired state when we fail. Previously we dumped the output of "ceph pg dump" before failing, which prints a lot of unnecessary information about PGs that are not responsible for the failure, making debugging harder. Also, try to make the failure messages distinct. Signed-off-by: Neha Ojha <nojha@redhat.com>	2020-06-11 15:22:04 +00:00
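Reporting only the PGs that have not reached the desired state amounts to filtering the per-PG stats before dumping them. This sketch makes two assumptions. The input is a list of dicts with 'pgid' and 'state' keys, loosely modeled on the per-PG entries in `ceph pg dump` JSON output. The helper name is hypothetical and is not one of the helpers added in 4deba4e8bd.

```python
def pgs_not_in_state(pg_stats, wanted=('active', 'clean')):
    """Return (pgid, state) pairs for PGs missing any of the wanted state flags.

    pg_stats is assumed to be a list of dicts with 'pgid' and 'state' keys.
    """
    wanted = set(wanted)
    # A PG state is a '+'-joined set of flags, e.g. 'active+clean+scrubbing',
    # so a scrubbing PG still counts as active and clean here.
    return [(pg['pgid'], pg['state'])
            for pg in pg_stats
            if not wanted.issubset(pg['state'].split('+'))]
```

Checking flags as a subset, rather than comparing the whole state string, avoids flagging healthy PGs that merely carry extra flags such as scrubbing.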
Kefu Chai	6bc09c5041	qa/tasks/ceph_manager.py: do not return a filter as the caller might want to `len(manager.get_osd_status()['raw'])`, and `len()` does not accept a `filter` object. also, the filtered osd statuses are printed out using `self.log()`, so we should materialize the `filter` object before sending it to logging facility. otherwise we will have something like: ``` 2020-04-08T02:58:37.001 INFO:tasks.ceph.ceph_manager.ceph:<filter object at 0x7f5a080e1518> ``` in the logging message. Signed-off-by: Kefu Chai <kchai@redhat.com>	2020-04-08 11:02:29 +08:00
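In Python 3, filter() returns a lazy iterator. That means len() fails on it, and logging it prints `<filter object at 0x...>` instead of the items. Below is a minimal sketch of the fix described above. The OSD dict shape and the helper name are hypothetical, not the real get_osd_status() output.

```python
import logging

log = logging.getLogger(__name__)


def get_up_osds(raw_osds):
    """Return the hypothetical OSD dicts that are marked up, as a real list."""
    # A list comprehension materializes the result, so callers can use len()
    # and indexing, and the log line shows the items themselves.
    up = [osd for osd in raw_osds if osd.get('up')]
    log.info('up osds: %s', up)
    return up
```

Wrapping an existing call as `list(filter(...))` achieves the same thing. The comprehension is simply the more common idiom.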
Kefu Chai	8bfe977854	qa/tasks: use StringIO for capturing string output see d8d44ed1566b19eec055e07da2a0fed88fed4152 Signed-off-by: Kefu Chai <kchai@redhat.com>	2020-04-07 21:51:22 +08:00
Kefu Chai	9ca45bd942	qa/tasks: do not random.choice(a_view) use `random.sample()` instead of `random.choice(list(a_view))` for better performance. Signed-off-by: Kefu Chai <kchai@redhat.com>	2020-04-07 20:33:47 +08:00
Kefu Chai	d7258ea7fd	qa/tasks: use next(iter(..)) for accessing first element in a view in python2, dict.values() and dict.keys() return lists. but in python3, they return views, which cannot be indexed directly using an integer index. there are four use cases when we access these views in python3: 1. get the first element 2. get all the elements and then might want to access them by index 3. get the first element assuming there is only a single element in the view 4. iterate through the view. in the 1st case, we cannot assume the number of elements, so to be python3 compatible, we should use `next(iter(a_dict))` instead. in the 2nd case, in this change, the view is materialized using `list(a_dict)`. in the 3rd case, we can just continue using the shorthand of ```py (first_element,) = a_dict.keys() ``` to unpack the view. this works in both python2 and python3. in the 4th case, the existing code works in both python2 and python3, as both list and view can be iterated using `iter`, and `len` works as well. Signed-off-by: Kefu Chai <kchai@redhat.com>	2020-04-07 20:33:47 +08:00
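The four cases listed above map onto a few standard Python 3 idioms, shown here on a plain dict. The function names are illustrative only. Note that `d.keys()[0]` raises `TypeError: 'dict_keys' object is not subscriptable` in Python 3, which is what motivates these idioms.

```python
def view_idioms(d):
    """Demonstrate cases 1, 2 and 4 on a non-empty dict."""
    # 1. First element without assuming the size (StopIteration if empty).
    first = next(iter(d))
    # 2. Materialize the view when indexed access is needed.
    keys = list(d)
    # 4. Iterating over a view works exactly as iterating over a list did.
    total = sum(v for v in d.values())
    return first, keys[-1], total


def single_key(d):
    """Case 3: unpacking raises ValueError unless there is exactly one key."""
    (only,) = d.keys()
    return only
```

Since dicts preserve insertion order from Python 3.7 on, `view_idioms({'a': 1, 'b': 2})` returns `('a', 'b', 3)`.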
Kefu Chai	9039db5962	Merge pull request #33805 from tchaikov/wip-44500 qa/tasks/ceph_manager: capture stderr for COT Reviewed-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>	2020-03-10 21:26:29 +08:00
Kefu Chai	d8d44ed156	qa/tasks/ceph_manager: use StringIO for capturing COT output there are a couple of factors we should consider when choosing between BytesIO and StringIO: - if the producer is producing binary - if we are expecting binary - if the layers in between them are doing the decoding/encoding automatically. in our case, the producer is either the ChannelFile instances returned by paramiko.SSHClient or subprocess.CompletedProcess instances returned by subprocess.run(). the former are file-like objects opened in "r" mode, but their contents are decoded with utf-8 when reading if ChannelFile.FLAG_BINARY is not specified. that's why we always try to add this flag in orchestra/run.py when collecting the stdout and stderr from paramiko.SSHClient after executing a command. back in python2, this works just fine, as we didn't differentiate bytes from str back then. but in python3, we have to make a decision. in the case of ceph-objectstore-tool (COT for short), it does not produce binary and we don't check its output with binary, so, if neither Remote.run() nor LocalRemote.run() decodes/encodes for us, it's fine. so it boils down to `copy_to_log()`: i think we should respect the consumer's expectation, and only decode the output if a StringIO is passed in as stdout or stderr. as we always log the output with logging, we could either set `ChannelFile.FLAG_BINARY` depending on the type of `capture` or not. if it's not set, paramiko will return str (bytes) on python2, and str on python3. if it's set, paramiko will return str (bytes) on python2, and bytes on python3. if there is non-ASCII in the output, logging will fail with a `UnicodeDecodeError` exception. and paramiko throws the same exception when trying to decode for us if `ChannelFile.FLAG_BINARY` is not specified. 
so to ensure that we always have logging messages no matter if the producer follows the rule of "use StringIO if you only emit text" or not, we have to use `ChannelFile.FLAG_BINARY`, and force paramiko to send us the bytes. but we still have the luxury to use StringIO and do the decode when the caller asks for str explicitly. that'd save the pain of using `str.decode()` or `six.ensure_str()` everywhere even if we can assure that the program does not write binary. Signed-off-by: Kefu Chai <kchai@redhat.com>	2020-03-09 10:47:48 +08:00
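The policy described in d8d44ed156 is this: always read bytes, always log, and decode only when the caller explicitly passed a text buffer. A minimal sketch of that policy follows. copy_to_log_sketch() is a hypothetical stand-in, not teuthology's copy_to_log(), and `src` is any binary file-like object such as a channel read with FLAG_BINARY. Lenient decoding for the log line is this sketch's own choice, made so that logging never fails on non-UTF-8 output.

```python
import io
import logging

log = logging.getLogger(__name__)


def copy_to_log_sketch(src, capture=None):
    """Log every line read from the binary stream src and optionally capture it."""
    for line in src:
        # Log leniently so undecodable bytes never break logging.
        log.info(line.decode('utf-8', errors='replace').rstrip('\n'))
        if capture is None:
            continue
        if isinstance(capture, io.TextIOBase):
            # The caller asked for str (e.g. io.StringIO): decode for them.
            capture.write(line.decode('utf-8'))
        else:
            # A binary buffer such as io.BytesIO gets the bytes untouched.
            capture.write(line)
```

Callers that only ever handle text can pass an io.StringIO and avoid sprinkling `.decode()` calls. Callers that may receive binary pass an io.BytesIO.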
Kefu Chai	78308f7207	qa/tasks/ceph_manager: capture stderr for COT as we are expecting the error message written to stderr, and we need to check for the error messages in it. this change addresses the regression introduced by 204ceee156cbb8a20bdf56efb0cd0610ee4c107e Fixes: https://tracker.ceph.com/issues/44500 Signed-off-by: Kefu Chai <kchai@redhat.com>	2020-03-08 14:43:13 +08:00
Sage Weil	96220c0c05	qa/tasks/cephadm: put bootstrap config etc directly in /etc/ceph This puts the conf and keyring in /etc/ceph earlier rather than later, making them useful for debugging a live system during bootstrap. It's also less code. Signed-off-by: Sage Weil <sage@redhat.com>	2020-03-07 15:18:45 -06:00
Thomas Bechtold	46e22c422b	qa: Enable basic mypy support for qa/ directory A first step to do more automatic code checks on the qa/ directory. This is useful while transitioning to python3. Also use log_exc to top-level to not run into: error: Argument 1 to "log_exc" has incompatible type "Callable[[OSDThrasher], Any]"; expected "OSDThrasher" Signed-off-by: Thomas Bechtold <tbechtold@suse.com>	2020-03-05 06:54:56 +01:00
Kyr Shatskyy	4c992baf25	qa/tasks/ceph_manager: ensure str for py3 compat Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>	2020-03-04 13:09:17 +08:00
Kyr Shatskyy	e46eb8348e	qa/tasks: fix imports for py3 compatibility Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>	2020-03-04 13:09:16 +08:00
Kyr Shatskyy	204ceee156	qa/tasks/ceph_manager: get rid of CStringIO for py3 Use io.BytesIO instead cStringIO.StringIO for py3 compatibility Signed-off-by: Kefu Chai <kchai@redhat.com> Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>	2020-03-04 13:09:16 +08:00
Sage Weil	b66f5df514	Merge PR #32986 into master * refs/pull/32986/head: qa/tasks/ceph_manager: fix movement of cot exports with cephadm Reviewed-by: Neha Ojha <nojha@redhat.com>	2020-02-01 10:47:56 -06:00
Sage Weil	d8a7c73a48	Merge PR #32987 into master * refs/pull/32987/head: qa/tasks/ceph_manager: make fix_pgp_num behave when no pool is found Reviewed-by: Neha Ojha <nojha@redhat.com>	2020-01-31 17:40:23 -06:00
Sage Weil	42768600d4	qa/tasks/ceph_manager: fix movement of cot exports with cephadm I think this will finally work... Signed-off-by: Sage Weil <sage@redhat.com>	2020-01-31 17:26:10 -06:00
Sage Weil	8c87110b54	qa/tasks/ceph_manager: add --log-early to raw_cluster_cmd This is harmless if logging is low, but adds useful info when it is turned up. Hunting bug https://tracker.ceph.com/issues/43914 Signed-off-by: Sage Weil <sage@redhat.com>	2020-01-30 10:36:28 -06:00


193 Commits