Commit Graph

135 Commits

Author SHA1 Message Date
NitzanMordhai
c916f568aa standalone/osd: Test adjust with new trimming function
Change the number of dups trimmied according to the new loop.

Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
2022-08-24 08:19:18 +00:00
NitzanMordhai
2aecf0d7b6 standalone/osd: Test log_dups_size output from pg query
Add to the current test of log_size the log_dups_size output test

Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
2022-06-30 08:10:29 +00:00
Gabriel BenHanokh
a39b1f3cf7 tools/ceph-bluestore-tool: Fix bluefs-bdev-expand command
Update allocation file when we expand-device
Add the expended space to the allocator and then force an update to the allocation file

There is also a new standalone test case for expand

Fixes: https://tracker.ceph.com/issues/53699
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
2022-01-12 18:07:59 +02:00
Igor Fedotov
efb67445c2 qa/osd-bluefs-volume-ops: retry data writing if spillover hasn't
happened.

Fixes: https://tracker.ceph.com/issues/52676
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2021-11-02 17:26:39 +03:00
Igor Fedotov
0b0f8ef12f qa/osd-bluefs-volume-ops: reproduce bluefs migrate bug
Reproduces: https://tracker.ceph.com/issues/40434
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2021-08-31 16:23:22 +03:00
Sridhar Seshasayee
2c577040cb qa/standalone/osd: Modify osd tests for mclock scheduler
Modified test cases:
1. osd-recovery-prio.sh:
   Set osd_op_queue = wpq for all tests since mclock
   doesn't consider recovery priority as part of its
   scheduling algorithm.

2. osd-recovery-stats.sh:
   a. TEST_recovery_undersized():
     - Set osd_mclock_profile to high_recovery_ops profile.
     - Increase wait for recovery timeout to 300 secs.

3. osd-rep-recov-eio.sh:
   a. TEST_rep_backfill_unfound():
     - Set osd_mclock_profile to high_recovery_ops profile.
     - Increase wait for backfill_unfound to 360 secs.

4. repeer-on-acting-back.sh:
   a. TEST_repeer_on_down_act():
     - Set osd_mclock_profile to high_recovery_ops profile.
       (To improve the test duration)

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
a96c34f0ee qa/standalone: Add missing teardowns to a subset of osd tests
The following files and tests in them did not teardown the
cluster after a test completed.

1. osd/osd-force-create.sh
2. osd/osd-reuse-id.sh
3. osd/pg-split-merge.sh

This wouldn't cause issues if the tests are run individually. But when
running all the tests in the files mentioned above, it could introduce
unexpected test failures down the line. For e.g., multiple tests may
create pools with same name and if they are not cleaned up properly, this
could result in unexpected failures in a subsequent test.

Fixes: https://tracker.ceph.com/issues/51580
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-08 13:28:31 +05:30
Sage Weil
0f65e5cffa qa/standalone: split osd/ into 2 directories
The whole osd/ directory takes 3 hours to run.  Of that, about half is
osd-backfill*:

2021-04-05T20:38:55.932 INFO:tasks.workunit:Running workunit osd/osd-backfill-prio.sh...
2021-04-05T20:47:27.184 INFO:tasks.workunit:Running workunit osd/osd-backfill-recovery-log.sh...
2021-04-05T20:55:59.497 INFO:tasks.workunit:Running workunit osd/osd-backfill-space.sh...
2021-04-05T21:48:47.549 INFO:tasks.workunit:Running workunit osd/osd-backfill-stats.sh...
2021-04-05T22:17:09.197 INFO:tasks.workunit:Running workunit osd/osd-bench.sh...

Signed-off-by: Sage Weil <sage@newdream.net>
2021-04-12 09:59:17 -05:00
David Zafman
4814648155 test: osd-recovery-prio.sh replace sleep with wait for both PGs
recovering

fixes: https://tracker.ceph.com/issues/48842

Signed-off-by: David Zafman <dzafman@redhat.com>
2021-01-11 17:30:00 -08:00
David Zafman
ef47a3e708 test: set mon_allow_pool_size_one for consistency with original test intention
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-03 21:49:00 +00:00
David Zafman
3ba7ebd3e2 test: Avoid races by waiting for PGs go clean before query
Fixes: https://tracker.ceph.com/issues/46405

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-10-01 19:43:57 +00:00
David Zafman
b20a277f05 test: Inconsequential change to get object names as desired
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-09-29 18:01:24 +00:00
Kefu Chai
4f6443737e
Merge pull request #30838 from ifed01/wip-ifed-single-alloc
os/bluestore: use single allocator for shared bluestore/bluefs device

Reviewed-by: Sage Weil <sage@redhat.com>
2020-08-03 18:00:16 +08:00
Igor Fedotov
9a8f1ae492 os/bluestore: fix bluefs migrate/expand to match single allocator.
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2020-07-31 15:36:47 +03:00
Dan van der Ster
b550112dba qa/standalone/osd: add bad-inc-map.sh
Test that the osd doesn't crash when it gets a bad incremental osdmap.

Related-to: https://tracker.ceph.com/issues/46443
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
2020-07-28 23:15:42 +02:00
David Zafman
661996d434 mgr: Warn when too many reads are repaired on an OSD
Include test case
Configurable by setting mon_osd_warn_num_repaired (default 10)
Ignore new health warning with random eio injection test

Fixes: https://tracker.ceph.com/issues/41564

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-06-16 17:45:27 -07:00
David Zafman
92f970cbed test: osd-backfill-stats.sh use nobackfill to avoid races in remaining test
Fixes: https://tracker.ceph.com/issues/44314

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-06-05 17:48:10 -07:00
Sage Weil
04e0b9c2f8 Merge PR #34126 into master
* refs/pull/34126/head:
	qa/*/osd-backfill-recovery-log.sh: flush_pg_stats before checking log length

Reviewed-by: Sage Weil <sage@redhat.com>
2020-03-23 13:55:16 -05:00
Neha
cfebec1b12 qa/*/osd-backfill-recovery-log.sh: flush_pg_stats before checking log length
It is possible for the pg dump to not be the latest when we check for newprimary
in _common_test(). This is because mgr_stats_period is 5 seconds, and we may not
have fetched the latest stats just yet. This causes the test to look at the same
stats before and after wait_for_clean.

Fixes: https://tracker.ceph.com/issues/43807 (2)
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-03-23 15:37:12 +00:00
Kefu Chai
b0dca75a59
Merge pull request #34056 from xiexingguo/wip-44662
qa/*/osd-markdown.sh: propagate map to osd before testing its reaction

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-03-21 14:27:51 +08:00
xie xingguo
afdff0cd3f qa/*/osd-markdown.sh: propagate map to osd before testing its reaction
Mon might fail to share the newest map with any of up osds, e.g.,
due to an injected broken pipe. Since we don't have any client
activities during the osd-markdown tests, osds might be unaware of
the map changes made through CLI. Make sure osds have pulled the
newest map down before we can test its reaction correctly.

Fixes: https://tracker.ceph.com/issues/44662
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2020-03-19 18:17:28 +08:00
Neha
6edd1cb686 qa/standalone/osd/osd-backfill-stats.sh: get_latest_osdmap to propagate map change
Fixes: https://tracker.ceph.com/issues/44518
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-03-18 22:57:41 +00:00
Deepika Upadhyay
21508bd9dd mon/OSDMonitor: add flag --yes-i-really-mean-it for setting pool size 1
Adds option `mon_allow_pool_size_one` which will be disabled by default
to ensure pools are not configured without replicas.
If the user still wants to use pool size 1, they will have to change the
value of `mon_allow_pool_size_one` to true and then have to pass flag
`--yes-i-really-mean-it` to cli command:

Example:
`ceph osd pool test set size 1 --yes-i-really-mean-it`

Fixes: https://tracker.ceph.com/issues/44025
Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
2020-03-09 23:27:36 +05:30
xie xingguo
023524a26d osd/PeeringState: restart peering on any previous down acting member coming back
One of our customers wants to verify the data safety of Ceph during scaling
the cluster up, and the test case looks like:
- keep checking the status of a speficied pg, who's up is [1, 2, 3]
- add more osds: up [1, 2, 3] -> up [1, 4, 5], acting = [1, 2, 3], backfill_targets = [4, 5],
  pg is remapped
- stop osd.2: up [1, 4, 5], acting = [1, 3], backfill_targets = [4, 5], pg is undersized
- restart osd.2, acting will stay unchanged as 2 belongs to neither current up nor acting set,
  hence leaving the corresponding pg pinning undersized for a long time until all backfill
  targets completes

It does not pose any critical problem -- we'll end up getting that pg back into active + clean,
except that the long live DEGRADED warnings keep bothering our customer who cares about data
safety more than any thing else.

The right way to achieve the above goal is for:

	boost::statechart::result PeeringState::Active::react(const MNotifyRec& notevt)

to check whether the newly booted node could be validly chosen for the acting set and
request a new temp mapping. The new temp mapping would then trigger a real interval change
that will get rid of the DEGRADED warning.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Signed-off-by: Yan Jun <yan.jun8@zte.com.cn>
2020-02-21 17:52:52 +08:00
Sage Weil
f10cc22c60 Merge PR #32961 into master
* refs/pull/32961/head:
	qa/standalone/osd/osd-bench: debug bluestore

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-01-30 10:42:17 -06:00
Sage Weil
b99e506a3f qa/standalone/osd/osd-bench: debug bluestore
Looking for https://tracker.ceph.com/issues/43888

Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-29 07:43:41 -06:00
David Zafman
e18519ad09 test: Update pg log test for new trimming behavior
Fixes: https://tracker.ceph.com/issues/43864

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-28 15:23:45 -08:00
Neha
b20817795a qa/standalone/osd/osd-backfill-recovery-log.sh: fix TEST_backfill_log_2
Fixes: https://tracker.ceph.com/issues/43807
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-01-24 22:42:04 +00:00
Neha
994698277b qa/standalone/osd/osd-backfill-recovery-log.sh: fix TEST_backfill_log_1
Fixes: https://tracker.ceph.com/issues/43807
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-01-24 22:20:21 +00:00
David Zafman
9f7aabbe9f test: Fix wait_for_state() to wait for a PG to get into a state
To avoid confusion fix function names in osd-backfill-space.sh for how
they actually work.

Fixes: https://tracker.ceph.com/issues/43592

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-01-13 18:39:38 -08:00
David Zafman
676d882649 test: Improve races by using kill_daemons which waits for OSDs terminate
osd-backfill-space.sh: More sleep time to make sure the backfill gets started

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-12-06 19:44:06 -08:00
David Zafman
43f6218993 test: Use activate_osd() when restarting OSDs
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-12-05 15:13:31 -08:00
Sage Weil
8994a65242 qa/standalone/osd/divergent-priors: add reproducer for bug 41816
Reproducer for https://tracker.ceph.com/issues/41816

Signed-off-by: Sage Weil <sage@redhat.com>
2019-09-21 10:09:15 -05:00
David Zafman
b98950e707 osd: Rename dump_reservations to dump_recovery_reservations
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-09-10 13:32:29 -07:00
David Zafman
fa698e18e1 mon: Improve health status for backfill_toofull and recovery_toofull
Treat backfull_toofull as a warning condition because it can resolve itself.
Includes test case for PG_BACKFILL_FULL
Includes test case for recovery_toofull / PG_RECOVERY_FULL

Fixes: https://tracker.ceph.com/issues/39555

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-06-20 02:22:01 +00:00
David Zafman
7959159e83 test: Adding standalone test of log copy handling
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-05-10 15:31:51 -07:00
sjust@redhat.com
252d5c20cf osd/: move stat updates and publishing to PeeringState
Signed-off-by: Samuel Just <sjust@redhat.com>
2019-05-01 11:22:24 -07:00
David Zafman
66b041fa4a
Merge pull request #27769 from dzafman/wip-39333
osd-backfill-space.sh test failed in TEST_backfill_multi_partial()

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-26 11:55:04 -07:00
David Zafman
9931023457 test: osd-backfill-spsace.sh doesn't matter which PG wins the race
Fixes: http://tracker.ceph.com/issues/39333

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-26 10:11:00 -07:00
David Zafman
39cc14bdc1
Merge pull request #27503 from dzafman/wip-39099
osd: Give recovery for inactive PGs a higher priority

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-25 15:06:56 -07:00
David Zafman
444aa9f9fe osd, mon: New pool recovery priority range -10 to 10
Use OSD_POOL_PRIORITY_MAX and OSD_POOL_PRIORITY_MIN constants
Scale legacy priorities if exceeds maximum

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-25 13:53:27 -07:00
David Zafman
3a234164d0
Merge pull request #27279 from dzafman/wip-divergent
Improvements to standalone tests

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-24 10:58:11 -07:00
David Zafman
7e77898001 test: Divergent testing of _merge_object_divergent_entries() cases
Case 1: A more recent update exists
Case 2: The first entry in the divergent sequence is a create
Case 3  NOT TESTED - Ohject currently missing
Case 4: We can rollback all of the entries
Case 5: We cannot rollback at least 1 of the entries

Support starting OSDs even when "noup" is set (don't wait for up).
Move create_ec_pool() to ceph-helpers.sh

Fixes: https://tracker.ceph.com/issues/39162

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-22 18:50:24 -07:00
Sage Weil
755e8c4ef2 Merge PR #27595 into master
* refs/pull/27595/head:
	osd: add 'ceph osd stop <osd.nnn>' command

Reviewed-by: Sage Weil <sage@redhat.com>
2019-04-20 08:52:01 -05:00
xie xingguo
5dbae13ce0 osd: add 'ceph osd stop <osd.nnn>' command
stop command can be used to force stopping a specified osd daemon, e.g.,
you don't have to pre-figure out where it located.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-18 13:55:02 +08:00
Sage Weil
dc97651cbd Merge PR #27499 into master
* refs/pull/27499/head:
	qa/standalone/osd/osd-markdown: fix dup command disabling

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-12 06:54:58 -05:00
Sage Weil
f7216d0b2c qa/standalone/osd/osd-markdown: fix dup command disabling
The ceph cli tool checks for the presence of the variable, not its value.

Fixes: http://tracker.ceph.com/issues/38359
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-10 16:44:38 -05:00
David Zafman
69fa515c95 test: Make most tests use default objectstore bluestore
Change run_osd() to default objectstore bluestore
Use run_osd_filestore() to use the non-default objectstore
Fix inject_eio to handle any objectstore if config prefixed with type

Remaining tests using filestore:
	osd-pool-create.sh TEST_pool_create_rep_expected_num_objects
		Test filestore directory creation
	qa/standalone/osd/osd-dup.sh TEST_filestore_to_bluestore
		Obvious
	qa/standalone/osd/osd-rep-recov-eio.sh TEST_rep_read_unfound
		Requires data digest in object info
	qa/standalone/scrub/osd-scrub-repair.sh multiple tests
		Erasure code pools append mode for filestore is tested
	qa/standalone/special/ceph_objectstore_tool.py
		Test code verifies COT by directly examining filestore contents

Fixes: https://tracker.ceph.com/issues/39162

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-10 08:55:04 -07:00
xie xingguo
6a8aedc107 qa: add new test case for pulling error
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-04 11:04:43 +08:00
David Zafman
11f072fee1 Add checking of num_shards_repaired in osd stats
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-04 11:04:42 +08:00