Commit Graph

85 Commits

Author SHA1 Message Date
Sridhar Seshasayee
f539bedc96 qa/standalone: Add standalone test to validate osd-mclock-skip-benchmark option
Add a standalone test - test_activate_osd_skip_benchmark() in ceph-helpers.sh
that exercises the osd-mclock-skip-benchmark option.

Fixes: https://tracker.ceph.com/issues/52025
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-09-01 14:19:03 +05:30
Sridhar Seshasayee
5a85a6a035 qa/standalone: Modify ceph-helpers.sh tests for mclock scheduler.
List of changes:

1. Remove the enforcement to use osd_op_queue=wpq when an osd is brought
   up in the following functions:
   - run_osd()
   - run_osd_filestore() and
   - activate_osd()

2. New functions:
   - get_op_scheduler() - Get the current osd_op_queue for an osd.

3. Modified test cases:
   - test_run_osd() - Add check for osd_max_backfill count.
     The mclock scheduler overrides the count to 1000.

4. New test cases:
   - test_activate_osd_after_mark_down()
   - test_get_op_scheduler()

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Patrick Donnelly
d6c66f3fa6
qa,pybind/mgr: allow disabling .mgr pool
This is mostly for testing: a lot of tests assume that there are no
existing pools. These tests relied on a config to turn off creating the
"device_health_metrics" pool which generally exists for any new Ceph
cluster. It would be better to make these tests tolerant of the new .mgr
pool, but clearly there are a lot of them. So just convert the config to
make it work.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-06-11 19:35:17 -07:00
Sridhar Seshasayee
94826eaadc qa/standalone: Use osd op queue = wpq in activate_osd()
This change is a follow-up to commit
b6e9c0903d5ad9a699b675f9fa7739e9cce9a5f3 that set the scheduler to wpq in
run_osd() and run_osd_filestore(). In addition, activate_osd() too has to
set the scheduler type to 'wpq' in order to be consistent and avoid test
failures.

The above is a temporary measure until all the standalone tests are
modified to run well with the mclock_scheduler.

Fixes: https://tracker.ceph.com/issues/51074
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-06-09 15:02:58 +05:30
Neha Ojha
b6e9c0903d qa/standalone: use osd op queue = wpq
mclock_scheduler is now the default and some of these tests need to be modified
to run well with it. Continue using wpq until
https://tracker.ceph.com/issues/50574 is addressed.

Signed-off-by: Neha Ojha <nojha@redhat.com>
2021-05-06 17:54:38 +00:00
Sage Weil
72c4fc75ad qa/standalone: default to disable insecure global id reclaim
Signed-off-by: Sage Weil <sage@newdream.net>
2021-04-06 17:29:23 -04:00
Ronen Friedman
bb848cfd90 qa/standalone/ceph-helpers.sh: log meaningful PIDs for run_in_background()
While the relevant comment says:
'# Execute the command and prepend the output with its pid'
the actual PID logged is the same for all background processes,
which isn't very helpful.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2020-12-28 10:47:02 +02:00
David Zafman
ef47a3e708 test: set mon_allow_pool_size_one for consistency with original test intention
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-03 21:49:00 +00:00
David Zafman
41322eaa62 test: flush_pg_stats() ignore OSDs that don't respond to getting sequence
This eliminates bogus errors in the logs and returned from flush_pg_stats()

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-06-16 17:45:26 -07:00
Deepika Upadhyay
21508bd9dd mon/OSDMonitor: add flag --yes-i-really-mean-it for setting pool size 1
Adds option `mon_allow_pool_size_one` which will be disabled by default
to ensure pools are not configured without replicas.
If the user still wants to use pool size 1, they will have to change the
value of `mon_allow_pool_size_one` to true and then have to pass flag
`--yes-i-really-mean-it` to cli command:

Example:
`ceph osd pool set test size 1 --yes-i-really-mean-it`

Fixes: https://tracker.ceph.com/issues/44025
Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
2020-03-09 23:27:36 +05:30
Sage Weil
455cdcf89a qa/standalone/ceph-helpers: disable device monitoring
Signed-off-by: Sage Weil <sage@redhat.com>
2020-02-19 15:31:26 -06:00
Sage Weil
78ec6aec90 qa/standalone/ceph-helpers: add wait_for_peered
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-20 13:23:56 -06:00
Sage Weil
3a62d166a7 qa/standalone/ceph-helpers.sh: remove osd down check
A kill doesn't induce a mark-down of the OSD with osd_fast_shutdown=true.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-24 12:19:33 -06:00
Sage Weil
ede1d36773 qa/standalone/ceph-helpers.sh: destroy_osd: mark osd down
Stopping the OSD doesn't guarantee that it will be marked down.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-11-19 20:05:16 -06:00
Sage Weil
70367de903 qa: change mon_status calls to quorum_status or tell commands
The tests were doing lots of 'ceph mon_status' calls; change those to
quorum_status or tell.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-12 12:05:36 -05:00
Sage Weil
f71672c6ad qa/standalone/ceph-helpers: default pg autoscale mode off for standalone
Signed-off-by: Sage Weil <sage@redhat.com>
2019-09-22 16:59:07 -05:00
Sage Weil
0d0759531a qa/standalone/ceph-helpers: more osd debug
debug_ms=1
debug_monc=20

Hunting down http://tracker.ceph.com/issues/40666

Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-03 16:53:00 -05:00
Kefu Chai
cdba0f1420 qa/standalone/ceph-helpers: resurrect all OSD before waiting for health
Address the regression introduced by e62cfceb.
In e62cfceb, we wanted to test the newly introduced TOO_FEW_OSDS
warning, so we increased the number of OSDs to the size of the pool; if
the number of OSDs is less than the pool size, the monitor sends a
warning message.

But we need to bring all OSDs back if we are expecting a healthy
cluster. In this change, all OSDs are resurrected before
`wait_for_health_ok`.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-05-30 23:52:36 +08:00
zjh
e62cfceb95 qa/standalone: remove osd_pool_default_size in test_wait_for_health_ok
Signed-off-by: zjh <jhzeng93@foxmail.com>
2019-05-06 14:35:54 +08:00
David Zafman
3a234164d0
Merge pull request #27279 from dzafman/wip-divergent
Improvements to standalone tests

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-24 10:58:11 -07:00
David Zafman
7e77898001 test: Divergent testing of _merge_object_divergent_entries() cases
Case 1: A more recent update exists
Case 2: The first entry in the divergent sequence is a create
Case 3: NOT TESTED - Object currently missing
Case 4: We can rollback all of the entries
Case 5: We cannot rollback at least 1 of the entries

Support starting OSDs even when "noup" is set (don't wait for up).
Move create_ec_pool() to ceph-helpers.sh

Fixes: https://tracker.ceph.com/issues/39162

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-22 18:50:24 -07:00
David Zafman
69fa515c95 test: Make most tests use default objectstore bluestore
Change run_osd() to default objectstore bluestore
Use run_osd_filestore() to use the non-default objectstore
Fix inject_eio to handle any objectstore if config prefixed with type

Remaining tests using filestore:
	osd-pool-create.sh TEST_pool_create_rep_expected_num_objects
		Test filestore directory creation
	qa/standalone/osd/osd-dup.sh TEST_filestore_to_bluestore
		Obvious
	qa/standalone/osd/osd-rep-recov-eio.sh TEST_rep_read_unfound
		Requires data digest in object info
	qa/standalone/scrub/osd-scrub-repair.sh multiple tests
		Erasure code pools append mode for filestore is tested
	qa/standalone/special/ceph_objectstore_tool.py
		Test code verifies COT by directly examining filestore contents

Fixes: https://tracker.ceph.com/issues/39162

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-10 08:55:04 -07:00
Sage Weil
aa33a26e32 mon/MDSMonitor: add 'mds ok-to-stop' command
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-01 14:58:50 -05:00
Sage Weil
30fc7f5e97 qa/standalone/ceph-helpers: fix test_wait_for_clean
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-08 18:07:10 -06:00
Sage Weil
1e2b0c7252 qa/standalone/ceph-helpers.sh: fix test_run_mon
- Only create each osd once
- forget the first osdmap dump test; it's pointless

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-08 17:43:00 -06:00
Sage Weil
cba0483b09 qa/standalone: make sure an osd is running before create_rbd_pool
'rbd pool init' now does IO.  Drop the pool, or change the pool size to 1.

Fixes: http://tracker.ceph.com/issues/38585
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-06 16:27:56 -06:00
David Zafman
690ff9a21f
Merge pull request #26213 from dzafman/wip-38041
osd: Fix recovery and backfill priority handling

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-02-07 17:26:34 -08:00
Sage Weil
dcdca44aa4 qa/standalone/ceph-helpers: fix health_ok test
Stopping the osd daemon won't reliably get you HEALTH_WARN or ERR; you have
to make sure it is also marked down.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-02-07 12:10:34 -06:00
David Zafman
bca4fe98b1 test: Fix kill_daemon() to check after last large sleep
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-02-05 11:30:04 -08:00
David Zafman
70b5136208 test: Add option to wait_for_clean() to execute at every sleep
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-01-30 09:35:51 -08:00
Kefu Chai
94a84b6f5a test: listen on random port in tests which start ceph-mon
See-also: http://tracker.ceph.com/issues/36737
Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-01-27 21:16:54 +08:00
David Zafman
3b8f86c8b0 test: Add testing for backfill out of space detection
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-12-18 09:30:44 -08:00
Igor Fedotov
79fd227639 qa: replace raw_bytes_used field access in QA test cases
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-12-06 18:54:21 +03:00
John Spray
67d147c00d
Merge pull request #23622 from renhwztetecs/renhw-wip-25103
mgr: fixup pgs show in unknown state

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: John Spray <john.spray@redhat.com>
2018-10-10 13:28:33 +01:00
huanwen ren
ed442447c0 qa: modify the format for add pgmap_ready.
Signed-off-by: huanwen ren <ren.huanwen@zte.com.cn>
2018-09-27 23:22:50 +08:00
Kefu Chai
f46523e464
Merge pull request #23955 from wjwithagen/wjw-fix-ceph-helpers.sh
test: Start using GNU awk and fix archiving directory

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-09-17 15:44:06 +08:00
David Zafman
6e3f04365f test: Trap termination so we can capture logs on teuthology timeout
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-09-10 12:23:07 -07:00
Willem Jan Withagen
bfe7a2afaa test: Start using GNU awk and fix archiving directory
The tests use some awk expressions that the native FreeBSD awk does not
support,
    like: BEGIN{print 0 < 90}

Also, TESTDIR is not set when calling ceph-helpers from smoke.sh,
    so fix this by keeping the archive in /tmp

Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
2018-09-06 15:50:20 +02:00
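
The incompatibility above comes from how `print` handles an unparenthesized `<`. As a hedged aside, and not what this commit did (it switched to GNU awk), parenthesizing the comparison is the form that POSIX-style awk implementations are generally expected to accept. The helper name `awk_less_than` is hypothetical and does not exist in ceph-helpers.sh.

```shell
# Hypothetical helper, not part of ceph-helpers.sh.
# Prints 1 if $1 is numerically less than $2, and 0 otherwise.
awk_less_than() {
    # Adding 0 forces a numeric comparison; the parentheses keep `<`
    # from being parsed as part of the print statement itself.
    awk -v a="$1" -v b="$2" 'BEGIN { print (a + 0 < b + 0) }'
}

awk_less_than 0 90
```

Whether this would have been sufficient for every awk expression in the helpers is not something the commit message states.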
David Zafman
d0b260c272 test: Fix test to use -gt instead of creating an empty file "0"
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-08-17 19:33:44 -07:00
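
The class of bug named in this commit subject can be sketched as follows. This is a minimal illustration, not the code that was changed; the function names `buggy_check` and `fixed_check` are hypothetical. Inside `[ ]`, an unquoted `>` is a shell redirection, so the "comparison" degenerates into a non-empty-string test and leaves an empty file named `0` in the current directory.

```shell
# Hypothetical functions for illustration only.

# Looks like "is $1 greater than 0?", but the shell parses `> 0` as a
# redirection. What actually runs is `[ "$1" ]`, which is true for any
# non-empty string, and an empty file named "0" is created.
buggy_check() {
    [ "$1" > 0 ]
}

# -gt is the numeric "greater than" operator of test/[ and creates no file.
fixed_check() {
    [ "$1" -gt 0 ]
}
```

With these definitions, `buggy_check -5` succeeds and creates the file `0`, while `fixed_check -5` fails and `fixed_check 3` succeeds without touching the filesystem.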
Noah Watkins
7d3fa9bda3 qa/standalone/ceph-helpers.sh: fix mgr module path
Callers of get_python_path were not passing in a $1 parameter, so
ceph_lib was an empty string, resulting in an invalid path to the built
cython modules. Assume this is called from the `lib` parent directory.

Pass the path to the manager modules when starting ceph-mgr.

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
2018-08-17 15:21:57 -07:00
David Zafman
fbc8bcfe05 test: test_get_timeout_delays() fix
Caused by: 7b0d1c8b8acff2a7010bfb0400df09786033ac63

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-07-03 14:01:36 -07:00
David Zafman
663d96e934
Merge pull request #22727 from dzafman/wip-21664
qa/standalone/scrub: When possible show side-by-side diff in addition to regular diff

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-06-28 19:59:21 -04:00
David Zafman
3ff56a82a4
Merge pull request #22763 from dzafman/wip-remove-sudo
qa: Don't use sudo when moving logs

Reviewed-by: Neha Ojha <nojha@redhat.com>
2018-06-28 18:37:24 -04:00
David Zafman
23ed63e15f
Merge pull request #22441 from ErwanAliasr1/evelu-makecheck
Improving make check reliability

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: David Zafman <dzafman@redhat.com>
2018-06-28 14:55:12 -04:00
David Zafman
808c628304 qa: Don't use sudo when moving logs
Caused by: f0964beac5

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-06-28 09:17:06 -07:00
David Zafman
ebb05b2542 test: When possible show side-by-side diff in addition to regular diff
Fixes: https://tracker.ceph.com/issues/21664

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-06-26 18:23:07 -07:00
David Zafman
f0964beac5 qa: For teuthology copy logs to teuthology expected location
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-06-25 18:06:01 -07:00
Erwan Velu
57df91380b qa/standalone/ceph-helpers.sh: Setup ulimit in setup()
If the open-files ulimit is set to 1024, ceph-osd will segfault with the
following error:
    filestore(td/smoke/0)  error (24) Too many open files not handled on operation 0x55565d1fd004 (2182.1.0, or op 0, counting from 0)

This patch ensures that a valid ulimit value is set before the ceph
daemons are set up in tests.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-06-25 22:09:14 +02:00
Erwan Velu
7b0d1c8b8a qa/standalone/ceph-helpers.sh: Thinner resolution in get_timeout_delays()
get_timeout_delays() is a generic function that computes delays for
waiting over a long period of time without saturating the CPU in busy
loops.

It works well when the timeout is short. For example, requesting a
20-second timeout yields the series "0.1 0.2 0.4 0.8 1.6 3.2 6.4 7.3 ".
Here the maximum delay between two loops is 7.3 seconds, which is
perfectly fine.

When the timeout reaches 300 seconds, the same code produces the following
series: "0.1 0.2 0.4 0.8 1.6 3.2 6.4 12.8 25.6 51.2 102.4 95.3 "
In this example, some delays are nearly 2 minutes!

That is inefficient: the expected event could arrive just after such a
long sleep begins, so the test sleeps for a minute or more for nothing.
On a local system that could be acceptable, but on a CI, if all jobs
behave this way, the overall run is quite inefficient because of the
useless waits.

This patch adds a maximum acceptable delay between two loops while
keeping the same ramp-up behavior.

For the same 300-second timeout example, with MAX_TIMEOUT set to 10, we
now have the following series: "0.1 0.2 0.4 0.8 1.6 3.2 6.4 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 7.3"
We can see that the long 12.8/25.6/51.2/102.4/95.3 values have vanished
and been replaced by a series of 10-second delays. It is up to each test
to judge how soon its awaited event is likely to complete.

The default MAX_TIMEOUT is set to 15 seconds.
Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-06-25 22:09:14 +02:00
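
The capped ramp-up described in this commit message can be sketched as below. This is an illustrative reimplementation under stated assumptions, not the code from ceph-helpers.sh: the names `capped_delays` and `fmt_tenths` are hypothetical, and all values are passed in tenths of a second so that plain integer shell arithmetic is enough.

```shell
# Illustrative sketch only; names and units are assumptions, not the
# actual get_timeout_delays() implementation.

# Print a value given in tenths of a second as seconds (73 -> 7.3, 100 -> 10).
fmt_tenths() {
    if [ $(( $1 % 10 )) -eq 0 ]; then
        printf '%d' $(( $1 / 10 ))
    else
        printf '%d.%d' $(( $1 / 10 )) $(( $1 % 10 ))
    fi
}

# capped_delays TIMEOUT [FIRST_STEP] [MAX_DELAY], all in tenths of a second.
# Delays double each round; MAX_DELAY caps them (0 disables the cap).
# The body runs in a subshell so its variables stay local.
capped_delays() (
    timeout=$1
    step=${2:-1}
    max_delay=${3:-0}
    total=0
    out=""
    while [ $(( total + step )) -le "$timeout" ]; do
        out="$out$(fmt_tenths "$step") "
        total=$(( total + step ))
        step=$(( step * 2 ))
        # Cap the next delay once the doubling exceeds the maximum.
        if [ "$max_delay" -gt 0 ] && [ "$step" -gt "$max_delay" ]; then
            step=$max_delay
        fi
    done
    # Emit whatever is left so the delays sum exactly to the timeout.
    if [ "$total" -lt "$timeout" ]; then
        out="$out$(fmt_tenths $(( timeout - total ))) "
    fi
    printf '%s\n' "${out% }"
)
```

Under these assumptions, `capped_delays 200` prints `0.1 0.2 0.4 0.8 1.6 3.2 6.4 7.3`, and `capped_delays 3000 1 100` prints the same ramp-up followed by a run of `10` values and a final `7.3`, matching the series quoted in the commit message.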
Sage Weil
3cd7d5eb22 Merge PR #22343 into master
* refs/pull/22343/head:
	qa/standalone remove ceph-disk from activate_osd helper
	cmake: remove subman.sh tests
	test remove ceph-disk directory
	debian: remove ceph_detect_init python files from base
	qa/standalone remove virtualenv paths for ceph-disk and ceph-detect-init
	debian: remove ceph-disk ceph-detect-init python files
	rpm: remove ceph-disk ceph-detect-init python files
	alpine: remove ceph-disk ceph-detect-init python files
	alpine: remove ceph-osd and parttypeuuid udev rules
	debian: remove ceph-osd and parttypeuuid udev rules
	rpm: remove ceph-osd and parttypeuuid udev rules
	ceph-helpers.sh: remove ceph-disk, set up osds directly
	CMakeLists.txt: add back CEPH_BUILD_VIRTUALENV
	alpine: remove ceph-disk, add ceph-volume in APKBUILD.in
	upstart: remove ceph-disk activation call
	doc/install add anchor for manual osd deployment in freebsd guide
	doc/dev remove ceph-disk from freebsd guide, link to manual reference
	doc/dev/config-key remove ceph-disk references
	doc/dev remove ceph-disk.rst
	doc/dev: change ceph-disk suite examples for ceph-deploy
	doc/man_index: remove ceph-disk, ceph-detect-init refs
	doc/install: remove ceph-disk from freebsd examples
	doc/rados remove ceph-disk from man references
	doc/man remove ceph-disk ref from ceph-volume-systemd
	doc/man: update reference from ceph-disk to ceph-volume
	doc/man: remove ceph-disk, ceph-detect-init from cmake
	doc/man/ceph-volume remove doc reference to ceph-disk
	doc/man: remove ceph-disk, ceph-detect-init
	qa/suites: remove ceph-disk
	qa/run-standalone.sh: remove requirement for ceph-detect-init virtualenv
	qa/workunits: remove ceph-detect-init from rbdmapfile test
	qa/workunits: remove ceph-detect-init from ceph-helpers-root.sh
	qa/workunits: remove ceph-disk
	build: remove ceph-disk from freebsd script
	cmake: remove ceph-disk, ceph-detect-init tox tests
	init-ceph: remove ceph-disk
	cmake: remove top-level entries for ceph-disk, ceph-detect-init
	debian: remove ceph-detect-init references
	debian: remove ceph-disk references
	src: remove ceph-detect-init tool
	rpm: remove ceph-disk, ceph-detect-init from spec file
	test: remove subman script
	script: remove subman script
	udev: remove parttypeuuid rules for ceph-disk
	tool remove ceph-disk from ps-ceph.pl
	upstart: remove ceph-disk conf file
	systemd: remove ceph-disk from CMakeLists
	systemd: remove ceph-disk service
	udev: remove ceph-disk rules
	src: remove ceph-disk tool
2018-06-19 07:07:55 -05:00