Commit Graph

85 Commits

Author SHA1 Message Date
Ilya Dryomov
237aa221eb qa/suites/krbd: stress test for recovering from watch errors
Fixes: https://tracker.ceph.com/issues/63010
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-10-02 12:21:12 +02:00
Ilya Dryomov
2094a0450d qa/suites/krbd: rename singleton to singleton-msgr-failures
A "singleton without msgr-failures" is wanted in the next commit.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-09-28 12:06:11 +02:00
Ilya Dryomov
0b68a8b4c0 qa/suites/krbd: disable POOL_APP_NOT_ENABLED health check
... same as for rbd suite.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-09-21 23:36:07 +02:00
Ilya Dryomov
acb270a3dd qa/workunits/rbd: make continuous export-diff test actually work
The current version is pretty useless:

- "rbd bench" writes the same byte (0xff) over and over again, so
  almost all checksumming is in vain
- snapshots are taken in a steady state (i.e. not under I/O), so no
  race conditions can get exposed
- even with these caveats, it's not wired up into the suite

Redo this workunit to be a reliable reproducer for the issue fixed
in the previous commit and wire it up for both krbd and rbd-nbd.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-06-20 22:14:39 +02:00
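Note on acb270a3dd: the first bullet (a constant 0xff payload makes checksumming nearly pointless) can be illustrated with a toy model. This is only a sketch: the extent size, the dict-based "diff" and the helper names are hypothetical and are not taken from the workunit. It uses no rbd commands.

```python
import hashlib
from typing import Callable, Dict

EXTENT = 4096  # hypothetical extent size for this toy model


def apply_diff(base: bytes, diff: Dict[int, bytes]) -> bytes:
    """Overlay changed extents (offset -> data) onto a copy of base."""
    out = bytearray(base)
    for offset, data in diff.items():
        out[offset:offset + len(data)] = data
    return bytes(out)


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def lossy_diff_detected(payload: Callable[[int], bytes]) -> bool:
    """Return True if a diff that silently drops one extent changes the checksum."""
    # Base image: four extents, each filled by payload(i).
    base = b"".join(payload(i) for i in range(4))
    # The next "snapshot" rewrites extents 1 and 2 with new payloads.
    full_diff = {i * EXTENT: payload(i + 100) for i in (1, 2)}
    # A buggy export that misses extent 2.
    lossy_diff = {EXTENT: full_diff[EXTENT]}
    return checksum(apply_diff(base, full_diff)) != checksum(apply_diff(base, lossy_diff))


# Same byte everywhere: the lost extent goes unnoticed.
constant = lossy_diff_detected(lambda i: b"\xff" * EXTENT)
# Payload that varies per write: the lost extent changes the checksum.
varying = lossy_diff_detected(lambda i: bytes([i % 256]) * EXTENT)
```

With the constant payload the comparison cannot tell a complete diff from a lossy one, which is why a reproducer needs data that differs from write to write.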
Nitzan Mordechai
4c4967dbb9 qa/suites: change all osd objectstore filestore
Removing and changing all suites to no longer use filestore

Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>

ceph_volume: remove all filestore tests suites
Since filestore has been removed, there is no need to test it.

Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
2023-02-12 06:11:29 +00:00
Christopher Hoffman
19d46b9181 qa/suites/krbd: add rbd_default_map_options override coverage
Add coverage to test precedence, override, and option merge on rbd map.

Signed-off-by: Christopher Hoffman <choffman@redhat.com>
2022-02-18 17:19:45 +01:00
Ilya Dryomov
7f391c5688 qa/suites/krbd: rename rxbounce subsuite
A new job that doesn't want ms_mode to be set underneath it is about to
be added.  Rename rxbounce to ms_modeless to make this purpose obvious.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-02-18 17:19:45 +01:00
Ilya Dryomov
bad21fa497 Merge pull request #44842 from idryomov/wip-krbd-rxbounce-option
rbd: recognize rxbounce map option

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
2022-02-06 20:37:31 +01:00
Ilya Dryomov
fbf8c1d68b qa/suites/krbd: add legacy+rxbounce and crc+rxbounce coverage
For basic, rbd and rbd-nomount subsuites, replace legacy and crc
facets with "legacy or legacy+rxbounce" and "crc or crc+rxbounce"
facets (chosen at random).

For fsx, singleton and thrash subsuites, add legacy+rxbounce and
crc+rxbounce facets and drop prefer-crc facet.  The expected behaviour
of the latter depends on cluster configuration and should be tested
separately.

The total number of jobs remains the same.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-02-04 19:04:38 +01:00
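Note on fbf8c1d68b: the claim that the job count stays the same when a facet becomes a random choice between two variants can be sketched as follows. This is a simplified model, not teuthology code. The task names are hypothetical, the presence of a "secure" facet is assumed from the earlier msgr2 commit, and whether the random pick happens per job or per scheduled run is not stated in the source.

```python
import random
from typing import List, Sequence, Tuple


def expand(tasks: Sequence[str],
           ms_facets: Sequence[Sequence[str]],
           rng: random.Random) -> List[Tuple[str, str]]:
    """One job per (task, facet); each facet contributes exactly one alternative."""
    jobs = []
    for task in tasks:
        for alternatives in ms_facets:
            jobs.append((task, rng.choice(list(alternatives))))
    return jobs


tasks = ["task-a", "task-b"]  # hypothetical task names
before = [["legacy"], ["crc"], ["secure"]]
after = [["legacy", "legacy+rxbounce"], ["crc", "crc+rxbounce"], ["secure"]]

rng = random.Random(0)
jobs_before = expand(tasks, before, rng)
jobs_after = expand(tasks, after, rng)
```

Because each facet still yields a single job per task, `len(jobs_after)` equals `len(jobs_before)`; only the options carried by some jobs change.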
Ilya Dryomov
95d30b534e qa: krbd rxbounce test
Lives in its own directory since ms_mode doesn't need to be permuted
here.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-02-04 19:04:37 +01:00
Patrick Donnelly
1f714da814 qa: fix or add missing .qa links
Using this command:

    find qa/suites/ -type d -execdir ln -sfT ../.qa/ {}/.qa \;

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2022-02-03 10:08:30 -05:00
Ilya Dryomov
4027eb864e qa/suites/krbd: don't require CEPHX_V2 for unmap subsuite
Starting with pacific, CEPHX_V2 is required by default but
pre-single-major.yaml kernel doesn't support it.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-03 11:16:58 +02:00
Ilya Dryomov
37d56e1354 qa/suites/krbd: bump scratch image size to 15G
Allow generic/038 and generic/048 to run.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-02-27 15:25:39 +01:00
Ilya Dryomov
d2bdf0ac43 qa/suites/krbd: exclude ext4/002
ext4/002 exercises obsolete EXT4_EOFBLOCKS_FL feature that was removed
in kernel 5.7 and therefore always fails.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-02-27 15:25:39 +01:00
Ilya Dryomov
65948736a4 qa/suites/krbd: add msgr2 modes to most subsuites
basic, rbd and rbd-nomount subsuites are expanded to run with each
of ms_mode=legacy, ms_mode=crc and ms_mode=secure.  This increases
the total number of jobs in the suite from 100 to 220.

fsx, singleton and thrash subsuites choose ms_mode at random (from
the above plus ms_mode=prefer-crc).

unmap and wac subsuites remain msgr1-only.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-01-25 21:17:44 +01:00
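Note on 65948736a4: the jump from 100 to 220 jobs can be checked with a little arithmetic. The sketch below assumes that every job in the basic, rbd and rbd-nomount subsuites previously ran exactly once and that no other subsuite changed its job count. Under those assumptions the three subsuites would have accounted for 60 of the original 100 jobs; that figure is inferred here, not stated in the commit.

```python
def jobs_after_expansion(total_before: int, expanded_before: int, modes: int) -> int:
    """Total jobs once `expanded_before` jobs are each run under `modes` variants."""
    return total_before - expanded_before + expanded_before * modes


def solve_expanded_before(total_before: int, total_after: int, modes: int) -> int:
    """Invert the formula above: how many jobs must have been expanded?"""
    extra, remainder = divmod(total_after - total_before, modes - 1)
    if remainder:
        raise ValueError("totals are not consistent with a uniform expansion")
    return extra


expanded = solve_expanded_before(100, 220, 3)
check = jobs_after_expansion(100, expanded, 3)
```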
Ilya Dryomov
5adfc15b87 qa: krbd_stable_pages_required.sh: move to stable_writes attribute
bdi/stable_pages_required attribute was deprecated in 5.10 and now
always returns 0.  The replacement is queue/stable_writes.  (It is
also writeable, so we can simplify these test cases somewhat in the
future.)

Fixes: https://tracker.ceph.com/issues/48232
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-17 13:41:34 +01:00
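Note on 5adfc15b87: a minimal sketch of reading the replacement attribute, falling back to the deprecated one on older kernels. The function name and the assumption that both attributes sit under /sys/block/<dev>/ are illustrative; the test script itself is a shell script and is not reproduced here.

```python
from pathlib import Path


def stable_writes_required(dev: str, sysfs_root: str = "/sys") -> bool:
    """Report whether the block device asks for stable page writes.

    Prefers queue/stable_writes and falls back to the deprecated
    bdi/stable_pages_required when the newer attribute is absent.
    """
    block = Path(sysfs_root) / "block" / dev
    new_attr = block / "queue" / "stable_writes"
    old_attr = block / "bdi" / "stable_pages_required"
    attr = new_attr if new_attr.exists() else old_attr
    return attr.read_text().strip() == "1"
```

Passing `sysfs_root` explicitly makes the helper easy to exercise against a temporary directory instead of a live system.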
Ilya Dryomov
6827bbbcfb Merge pull request #36927 from idryomov/wip-krbd-noudev
krbd: optionally skip waiting for udev events

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Sébastien Han <seb@redhat.com>
2020-09-21 19:39:30 +02:00
Ilya Dryomov
d2884adb15 qa: add test for mapping and unmapping from a network namespace
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-09-21 13:54:08 +02:00
Ilya Dryomov
7ccd2c0dce qa: add test for krbd symlinks created by udev
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-09-09 09:21:54 +02:00
Neha Ojha
01fb7e7f7b qa/suites/krbd/thrash: log-whitelist -> log-ignorelist
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-08-24 19:53:08 +00:00
Sage Weil
2ee9365d0b qa: log-whitelist -> log-ignorelist
Signed-off-by: Sage Weil <sage@newdream.net>
2020-08-24 19:53:08 +00:00
Ilya Dryomov
d15e0cad1a qa/suites/krbd: turn on balanced reads for the fsx subsuite
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-07-10 20:12:53 +02:00
Kefu Chai
f1072fcadc qa/suites/krbd: whitelist MON_DOWN health warning
see also

- 93de19adcf
- 608e002195

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-06-11 08:36:45 +08:00
Ilya Dryomov
2aefc097f9 qa: rbd_workunit_suites_fsx: install build dependencies
xfstests.git repo at git.ceph.com got updated and we are checking
out a newer version since commit 92c19067de ("qa: update xfstests
version").  It requires libtool and additional build dependencies.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-02-19 13:01:59 +01:00
Ilya Dryomov
cff2e49ff0 qa/suites/krbd: fsx with object-map and fast-diff
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-01-06 14:52:52 +01:00
Ilya Dryomov
80528fcb6c qa: add krbd_get_features.t test
Run it together with krbd_blkroset.t.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-11-21 14:40:41 +01:00
Ilya Dryomov
5011cc926c qa/suites/krbd: run unmap subsuite with msgr1 only
pre-single-major.yaml kernel doesn't have any of the monitor client
fixes that came in 4.6.  If the connection is closed, it closes the
session and retries only after 10 seconds.  On top of that, there is
nothing to prevent it from picking the same monitor when reconnecting.
This means that when given both v1 and v2 ports (which look like two
different monitors), it is susceptible to mount_timeout (60 seconds):

  $ sudo rbd map img
  rbd: sysfs write failed
  In some cases useful info is found in syslog - try "dmesg | tail".
  rbd: map failed: (5) Input/output error

  [  822.242313] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
  [  832.265494] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
  [  842.296175] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
  [  852.326924] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
  [  862.357611] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
  [  872.388373] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)
  [  882.676136] libceph: mon0 172.21.15.132:3300 socket closed (con state CONNECTING)

Unlike newer kernels that return ETIMEDOUT, it returns EIO.

Newer kernels are much more aggressive about retries and will pick
a different monitor when reconnecting, hence they are always able to
establish the session in time.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-10-30 19:51:55 +01:00
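Note on 5011cc926c: the failure mode described above (a 10-second retry interval, no guarantee of switching monitors, and a 60-second mount_timeout) can be sketched as a toy simulation. The 10 and 60 second figures come from the commit message; the address labels, the 1-second retry delay and the round-robin policy used for the "newer" case are illustrative assumptions, not actual kernel behaviour.

```python
from typing import Callable, Optional, Sequence, Set


def time_to_session(addrs: Sequence[str],
                    reachable: Set[str],
                    retry_delay: float,
                    timeout: float,
                    pick_next: Callable[[int, int], int]) -> Optional[float]:
    """Return the time at which a session is established, or None on timeout."""
    t = 0.0
    idx = 0
    while t < timeout:
        if addrs[idx] in reachable:
            return t
        t += retry_delay
        idx = pick_next(idx, len(addrs))
    return None


# A v1-only client sees the v2 and v1 ports as two separate monitors.
addrs = ["mon0-v2-port", "mon0-v1-port"]
reachable = {"mon0-v1-port"}

# Worst case for the old client: it keeps picking the same address.
old = time_to_session(addrs, reachable, 10.0, 60.0, lambda i, n: i)
# Hypothetical newer behaviour: quick retries that rotate to another address.
new = time_to_session(addrs, reachable, 1.0, 60.0, lambda i, n: (i + 1) % n)
```

In this model the old policy exhausts the timeout without ever trying the reachable address, while the rotating policy connects on its second attempt.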
Ilya Dryomov
b7a0e2adcb qa: add script to stress udev_enumerate_scan_devices()
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-10-25 22:05:38 +02:00
Ilya Dryomov
a80185d02c Merge pull request #30965 from idryomov/wip-krbd-udev-socket-overrun
krbd: avoid udev netlink socket overrun

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-10-21 11:00:46 +02:00
Ilya Dryomov
898c113f93 qa: add script to test udev event reaping
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-10-18 21:56:30 +02:00
Ilya Dryomov
286bdbfe24 krbd: modprobe before calling build_map_buf()
Otherwise add_key() in set_kernel_secret() fails as if running against
an ancient kernel and we fall back to secret= in options for the first
image being mapped on the machine.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-10-17 16:52:43 +02:00
Sage Weil
71d74aa8c6 qa: more tries for mon tell when injecting msgr failures
With failure injection the default 2 tries isn't quite enough

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-11 14:16:42 -05:00
David Zafman
ded58ef91d test: Ignore OSD_SLOW_PING_TIME* if injecting socket failures
Fixes: https://tracker.ceph.com/issues/41743

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-10-03 09:09:10 -07:00
Ilya Dryomov
81becbdc68 qa: add script to test how libceph handles huge osdmaps
That code will also handle moderately-sized osdmaps when the memory is
fragmented.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-09-13 19:21:54 +02:00
Ilya Dryomov
9c736f57ee qa: krbd_wac.sh: add lvm test case
The script isn't generic anymore, move it to the rbd directory.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-29 11:20:30 +01:00
Ilya Dryomov
a337cc58cd qa: add krbd_discard_granularity.t test
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-19 11:49:19 +01:00
Ilya Dryomov
481b6c2146 qa: update and rename krbd_discard_1b.t
Passing 1 for alloc_size is no longer allowed.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-18 19:17:44 +01:00
Ilya Dryomov
7615012224 Merge PR #26858 into master
* refs/pull/26858/head:
	qa: krbd deep-flatten test
	qa/suites/krbd: enable deep-flatten feature

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-03-11 14:38:01 +01:00
Ilya Dryomov
6892da1c0b qa: krbd deep-flatten test
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-08 18:14:37 +01:00
Ilya Dryomov
7ab3153902 qa/suites/krbd/wac: bluestore snippet is placed incorrectly
Instead of generating three tests, each with bluestore-bitmap.yaml, it
generates four tests: one consisting of just bluestore-bitmap.yaml and
the other three without any trace of bluestore.  This was introduced in
commit 711df71790 ("qa: objectstore snippets for krbd").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05 23:07:27 +01:00
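Note on 7ab3153902: the difference between three tests that each include bluestore-bitmap.yaml and four tests where it stands alone can be sketched with a simplified model of how suite fragments are combined. This is not teuthology code, and the three task file names are hypothetical. The model only contrasts combining facets as a product with listing them as separate alternatives.

```python
from itertools import product
from typing import List, Sequence, Tuple


def jobs_product(*facets: Sequence[str]) -> List[Tuple[str, ...]]:
    """Every job takes one fragment from each facet."""
    return [tuple(combo) for combo in product(*facets)]


def jobs_union(*facets: Sequence[str]) -> List[Tuple[str, ...]]:
    """Every fragment becomes a job of its own."""
    return [(item,) for facet in facets for item in facet]


objectstore = ["bluestore-bitmap.yaml"]
tasks = ["wac-1.yaml", "wac-2.yaml", "wac-3.yaml"]  # hypothetical names

intended = jobs_product(objectstore, tasks)
misplaced = jobs_union(objectstore, tasks)
```

`intended` holds three jobs that each carry the bluestore fragment, whereas `misplaced` holds four jobs, one consisting of the bluestore fragment alone and three without it, matching the symptom described in the commit message.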
Ilya Dryomov
b550968d8a qa/suites/krbd: enable deep-flatten feature
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05 10:10:34 +01:00
Ilya Dryomov
7fdb879004 qa: krbd namespaces test
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-02-08 15:29:20 +01:00
Ilya Dryomov
711df71790 qa: objectstore snippets for krbd
krbd was being tested with filestore, up until recently when the
default for osd_objectstore was changed to bluestore.  This broke
rbd_simple_big.yaml because bluestore_block_size defaults to 10G.
Pick up the sepia setting of 90G from bluestore-bitmap.yaml.

Run fsx subsuite with both filestore and bluestore.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-02-05 11:23:42 +01:00
Ilya Dryomov
04f5b343f9 qa: update krbd tests for zeroout
Discard no longer guarantees zeroing, use BLKZEROOUT and "fallocate -z"
instead (blkdiscard(8) in xenial doesn't support -z).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-02-03 10:57:07 +01:00
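Note on 04f5b343f9: a rough sketch of issuing BLKZEROOUT directly, for cases where a tool flag such as -z is unavailable. The ioctl number is derived from the `_IO(0x12, 127)` definition in <linux/fs.h>; the helper names are hypothetical and this is not how the updated tests are written. Running `zeroout` needs a real block device and sufficient privileges, so only the argument packing is exercised here.

```python
import fcntl
import os
import struct

BLKZEROOUT = 0x127F  # _IO(0x12, 127), as defined in <linux/fs.h>
SECTOR = 512


def zeroout_arg(offset: int, length: int) -> bytes:
    """Pack the uint64_t range[2] = {start, length} argument for BLKZEROOUT."""
    if offset % SECTOR or length % SECTOR:
        raise ValueError("offset and length must be multiples of 512 bytes")
    return struct.pack("=QQ", offset, length)


def zeroout(dev_path: str, offset: int, length: int) -> None:
    """Ask the kernel to zero [offset, offset + length) on a block device."""
    fd = os.open(dev_path, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, BLKZEROOUT, zeroout_arg(offset, length))
    finally:
        os.close(fd)
```

Unlike a discard, this request is expected to leave the range reading back as zeroes, which is the guarantee the updated tests rely on.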
Ilya Dryomov
031bbea739 qa: krbd discard with alloc_size vs zeroout tests
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-02-02 18:33:32 +01:00
Ilya Dryomov
870e42ac6a qa/suites/krbd: more fsx tests
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-10-01 16:48:47 +02:00
Sage Weil
09ee3f3538 Merge PR #20469 into master
* refs/pull/20469/head:
	osd/PG: remove warn on delete+merge race
	osd: base project_pg_history on is_new_interval
	osd: make project_pg_history handle concurrent osdmap publish
	osd: handle pg delete vs merge race
	osd/PG: do not purge strays in premerge state
	doc/rados/operations/placement-groups: a few minor corrections
	doc/man/8/ceph: drop enumeration of pg states
	doc/dev/placement-groups: drop old 'splitting' reference
	osd: wait for laggy pgs without osd_lock in handle_osd_map
	osd: drain peering wq in start_boot, not _committed_maps
	osd: kick split children
	osd: no osd_lock for finish_splits
	osd/osd_types: remove is_split assert
	ceph-objectstore-tool: prevent import of pg that has since merged
	qa/suites: test pg merging
	qa/tasks/thrashosds: support merging pgs too
	mon/OSDMonitor: mon_inject_pg_merge_bounce_probability
	doc/rados/operations/placement-groups: update to describe pg_num reductions too
	doc/rados/operations: remove reference to lpgs
	osd: implement pg merge
	osd/PG: implement merge_from
	osdc/Objecter: resend ops on pg merge
	osd: collect and record pg_num changes by pool
	osd: make load_pgs remove message more accurate
	osd/osd_types: pg_t: add is_merge_target()
	osd/osd_types: pg_t::is_merge -> is_merge_source
	osd/osd_types: adding or substracting invalid stats -> invalid stats
	osd/PG: clear_ready_to_merge on_shutdown (or final merge source prep)
	osd: debug pending_creates_from_osd cleanup, don't use cbegin
	ceph-objectstore-tool: debug intervals update
	mgr/ClusterState: discard pg updates for pgs >= pg_num
	mon/OSDMonitor: fix long line
	mon/OSDMonitor: move pool created check into caller
	mon/OSDMonitor: adjust pgp_num_target down along with pg_num_target as needed
	mon/OSDMonitor: add mon_osd_max_initial_pgs to cap initial pool pgs
	osd/OSDMap: set pg[p]_num_target in build_simple*() methods
	mon/PGMap: adjust SMALLER_PGP_NUM warning to use *_target values
	mon/OSDMonitor: set CREATING flag for force-create-pg
	mon/OSDMonitor: start sending new-style pg_create2 messages
	mon/OSDMonitor: set last_force_resend_prenautilus for pg_num_pending changes
	osd: ignore pg creates when pool FLAG_CREATING is not set
	mgr: do not adjust pg_num until FLAG_CREATING removed from pool
	mon/OSDMonitor: add FLAG_CREATING on upgrade if pools still creating
	mon/OSDMonitor: prevent FLAG_CREATING from getting set pre-nautilus
	mon/OSDMonitor: disallow pg_num changes while CREATING flag is set
	mon/OSDMonitor: set POOL_CREATING flag until initial pool pgs are created
	osd/osd_types: add pg_pool_t FLAG_POOL_CREATING
	osd/osd_types: introduce last_force_resend_prenautilus
	osd/PGLog: merge_from helper
	osd: no cache agent or snap trimming during premerge
	osd: notify mon when pending PGs are ready to merge
	mgr: add simple controller to adjust pg[p]_num_actual
	mon/OSDMonitor: MOSDPGReadyToMerge to complete a pg_num change
	mon/OSDMonitor: allow pg_num to adjusted up or down via pg[p]_num_target
	osd/osd_types: make pg merge an interval boundary
	osd/osd_types: add pg_t::is_merge() method
	osd/osd_types: add pg_num_pending to pg_pool_t
	osd: allow multiple threads to block on wait_min_pg_epoch
	osd: restructure advance_pg() call mechanism
	mon/PGMap: prune merged pgs
	mon/PGMap: track pgs by state for each pool
	osd/SnapMapper: allow split_bits to decrease (merge)
	os/bluestore: fix osr_drain before merge
	os/bluestore: allow reuse of osr from existing collection
	os/filestore: (re)implement merge
	os/filestore: add _merge_collections post-check
	os: implement merge_collection
	os/ObjectStore: add merge_collection operation to Transaction
2018-09-07 15:55:21 -05:00
Sage Weil
44de03d5e6 qa/suites: test pg merging
Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-07 12:09:05 -05:00
Ilya Dryomov
592f566b4e qa/tasks/cram: tasks now must live in the repository
Commit 0d8887652d ("qa/tasks/cram: use suite_repo repository for all
cram jobs") removed hardcoded git.ceph.com links, but as it turned out
it is still used for nightlies.  There is no good way to accommodate
the different URL schemes, so let's get rid of URLs altogether.

Fixes: https://tracker.ceph.com/issues/27211
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-09-06 22:32:39 +02:00
Ilya Dryomov
0d8887652d qa/tasks/cram: use suite_repo repository for all cram jobs
Currently git.ceph.com is hardcoded for all cram jobs.  Testing
modifications is a pain: one needs to push to either ceph/ceph.git or
ceph/ceph-ci.git (depending on where the ceph branch is at, triggering
unnecessary builds in the latter case) and wait for the mirror to sync.
Runs scheduled against branches in developer's forks fail.

Move away from git.ceph.com to allow mixing branches and repositories,
similar to workunits.

Fixes: https://tracker.ceph.com/issues/27211
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-09-03 22:07:20 +02:00