a0b453ad33 added the wait state, which can
make PGs stay in active+clean+wait for a while instead of going into
active+clean directly. As far as TEST_auto_repair_bluestore_failed is
concerned, we only care about the repair state being cleared.
Fixes: https://tracker.ceph.com/issues/45075
Signed-off-by: Neha Ojha <nojha@redhat.com>
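A minimal sketch of the kind of check that suffices here, assuming the ceph CLI and jq are available; the pgid, timeout, and query shape are illustrative, not the actual test diff:
```
# Poll until "repair" is gone from the PG state; either active+clean or
# active+clean+wait is acceptable as the final state. pgid is a placeholder.
pgid=2.0
for i in $(seq 1 60); do
    state=$(ceph pg $pgid query | jq -r '.state')
    case "$state" in
        *repair*) sleep 1 ;;
        *)        break ;;
    esac
done
```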
To address test failures like:
```
2020-04-07T15:44:58.693 INFO:tasks.workunit.client.0.smithi049.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:498: TEST_auto_repair_bluestore_failed: ceph pg dump
pgs
2020-04-07T15:44:58.694 INFO:tasks.workunit.client.0.smithi049.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:498: TEST_auto_repair_bluestore_failed: pgid
2020-04-07T15:44:58.694 INFO:tasks.workunit.client.0.smithi049.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh: line 498: pgid: command not found
```
Signed-off-by: Kefu Chai <kchai@redhat.com>
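"pgid: command not found" is the classic shell failure mode when an assignment is written with spaces around the equals sign; a hypothetical illustration of the pitfall, not the actual diff:
```
# wrong: the shell runs "pgid" as a command with "=" as its first argument
pgid = $(ceph pg dump pgs | awk 'NR==2 {print $1}')
# right: no spaces around "=" in a shell assignment
pgid=$(ceph pg dump pgs | awk 'NR==2 {print $1}')
```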
* refs/pull/33809/head:
qa/standalone/scrub/osd-scrub-repair: force osdmap prop to osds
qa/standalone/scrub/osd-scrub-test: wait longer for update
Reviewed-by: David Zafman <dzafman@redhat.com>
This is cleaner. All users are currently standalone tests; they have been updated.
It also means that *all* commands that have a name=pgid arg are pg tell
commands.
Signed-off-by: Sage Weil <sage@redhat.com>
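For example, under this convention a pgid-typed command is dispatched as a pg tell (pgid is a placeholder; the exact command set is assumed):
```
# routed to the primary OSD for the pg as a tell command
ceph tell 2.0 query
```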
This is similar to how recovery reservations are split between
local and remote.
Previously, scrubs_pending was used for reservations at the
replicas as well as at the primary while requesting reservations
from the replicas. There was no need for scrubs_pending to turn
into scrubs_active at the primary, as nothing treated that value
as special. scrubber.active = true only when scrubbing is
actually in progress.
Now scrubber.local_reserved indicates that scrubs_local has been incremented,
and scrubber.remote_reserved indicates that scrubs_remote has been incremented.
Fixes: https://tracker.ceph.com/issues/41669
Signed-off-by: David Zafman <dzafman@redhat.com>
osd: scrub error on big objects; make bluestore refuse to start on big objects
Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
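Presumably the "big object" threshold here is the osd_max_object_size limit; a hypothetical way to exercise it (the value shown is the longstanding 128 MiB default):
```
# objects larger than this limit are flagged as scrub errors
ceph config set osd osd_max_object_size 134217728   # 128 MiB
```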
1. Always take osd_scrub_sleep for manually initiated
scrubs.
2. When scrub_time_permit() returns true for scheduled
scrubs, the existing osd_scrub_sleep is used.
3. When scrub_time_permit() returns false for scheduled
scrubs, there are 2 scenarios:
3.1 If osd_scrub_extended_sleep <= osd_scrub_sleep,
take osd_scrub_sleep.
3.2 Otherwise, take osd_scrub_extended_sleep.
(See the sketch below.)
Fixes: http://tracker.ceph.com/issues/40955
Signed-off-by: Jeegn Chen <jeegnchen@tencent.com>
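A minimal shell sketch of that decision, treating the config values as plain variables; the real logic lives in the OSD's C++ scrub scheduling:
```
pick_scrub_sleep() {
    local manually_initiated=$1 time_permitted=$2
    if [ "$manually_initiated" = true ] || [ "$time_permitted" = true ]; then
        echo "$osd_scrub_sleep"                                   # rules 1 and 2
    elif [ "$(echo "$osd_scrub_extended_sleep <= $osd_scrub_sleep" | bc)" = 1 ]; then
        echo "$osd_scrub_sleep"                                   # rule 3.1
    else
        echo "$osd_scrub_extended_sleep"                          # rule 3.2
    fi
}
```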
osd_repair_during_recovery=true allows an explicitly requested repair
to be scheduled on OSDs with active recovery.
Fixes: http://tracker.ceph.com/issues/40620
Signed-off-by: Jeegn Chen <jeegnchen@tencent.com>
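Hypothetical usage of the option named above (pgid is a placeholder):
```
ceph config set osd osd_repair_during_recovery true
ceph pg repair 2.0    # repair may now be scheduled despite ongoing recovery
```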
We no longer have a snaps field with real values, so dumping this as a
"snap_context" is silly. Instead, just dump the seq.
Adjust qa/standalone/scrub/osd-scrub-repair.sh accordingly.
Signed-off-by: Sage Weil <sage@redhat.com>
Case 1: A more recent update exists
Case 2: The first entry in the divergent sequence is a create
Case 3: NOT TESTED - Object currently missing
Case 4: We can rollback all of the entries
Case 5: We cannot rollback at least 1 of the entries
Support starting OSDs even when "noup" is set (don't wait for up).
Move create_ec_pool() to ceph-helpers.sh
Fixes: https://tracker.ceph.com/issues/39162
Signed-off-by: David Zafman <dzafman@redhat.com>
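A sketch of the new behavior, assuming the ceph-helpers.sh helpers run_osd and wait_for_osd:
```
ceph osd set noup
run_osd $dir 0        # now returns without waiting for osd.0 to be marked up
ceph osd unset noup
wait_for_osd up 0     # explicitly wait once noup is cleared
```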
Change run_osd() to use bluestore as the default objectstore
Use run_osd_filestore() to get the non-default objectstore
Fix inject_eio to handle any objectstore if the config is prefixed with the type
Remaining tests using filestore:
  osd-pool-create.sh TEST_pool_create_rep_expected_num_objects
      Tests filestore directory creation
  qa/standalone/osd/osd-dup.sh TEST_filestore_to_bluestore
      Obviously requires filestore
  qa/standalone/osd/osd-rep-recov-eio.sh TEST_rep_read_unfound
      Requires data digest in object info
  qa/standalone/scrub/osd-scrub-repair.sh (multiple tests)
      Tests erasure-coded pool append mode on filestore
  qa/standalone/special/ceph_objectstore_tool.py
      Test code verifies COT by directly examining filestore contents
Fixes: https://tracker.ceph.com/issues/39162
Signed-off-by: David Zafman <dzafman@redhat.com>
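A sketch of the resulting convention, assuming the helper names above from ceph-helpers.sh:
```
run_osd $dir 0             # bluestore, the new default
run_osd_filestore $dir 1   # filestore, only for the legacy tests listed above
```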
Leave repair pg state on until recovery finishes or a new scrub starts
Fixes: http://tracker.ceph.com/issues/38616
Signed-off-by: David Zafman <dzafman@redhat.com>
Allow auto_repair for replicated bluestore pools
A regular scrub within the auto repair parameters will trigger a deep scrub
New state failed_repair if a PG repair attempt could not fix everything
Set failed_repair if it was not possible to repair anything
Fixes: http://tracker.ceph.com/issues/38616
Signed-off-by: David Zafman <dzafman@redhat.com>
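A hypothetical setup to exercise this path; the option names are existing Ceph options, the values are illustrative:
```
ceph config set osd osd_scrub_auto_repair true
ceph config set osd osd_scrub_auto_repair_num_errors 5   # cap on errors eligible for auto repair
```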
Fix for argument handling of create_ec_pool()
Always pass a value for allow_overwrites for consistency
Caused by: 3ca750d41d
Signed-off-by: David Zafman <dzafman@redhat.com>
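A sketch of the convention, assuming the ceph-helpers.sh signature create_ec_pool <pool> <allow_overwrites> [profile opts...]:
```
create_ec_pool ecpool false k=2 m=1   # allow_overwrites passed explicitly
create_ec_pool ecover true  k=2 m=1   # bluestore pools may enable overwrites
```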
It's now "1/2 unfound":
1/2 objects unfound (50.000%)
...presumably due to the rbd pool init creating the rbd_directory.
Signed-off-by: Sage Weil <sage@redhat.com>
Add trigger_deep_scrub osd command for testing
Publish stats when trigger_scrub/trigger_deep_scrub is used for testing
Add an optional argument to trigger_scrub/trigger_deep_scrub giving the
amount of extra time by which to shift the last scrub stamps
Signed-off-by: David Zafman <dzafman@redhat.com>
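Assuming these test-only commands are exposed as OSD tells (the invocation form below is a guess; pgid and offset are placeholders):
```
ceph tell osd.0 trigger_deep_scrub 2.0 600   # push the pg's last-scrub stamps back 600s
```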
Scrub testing requires orderly control of scrubbing. Most, but not
all, of the time a duplicate scrub request is ignored because the first
request hasn't finished. Teuthology enables this environment variable
in its workunit handling.
Fixes: https://tracker.ceph.com/issues/36525
Signed-off-by: David Zafman <dzafman@redhat.com>
Use --rmtype snapmap with the new obj16 to remove only the snapmap, and check for the repair message
Use --rmtype nosnapmap to remove obj5 while leaving its snapmap behind
Signed-off-by: David Zafman <dzafman@redhat.com>
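A sketch of the two removal modes described above; --rmtype is the flag this change adds, while the data path and pgid are placeholders following the usual ceph-objectstore-tool conventions:
```
# drop only the SnapMapper entry, leaving the object in place
ceph-objectstore-tool --data-path dev/osd0 --pgid 2.0 obj16 remove --rmtype snapmap
# remove the object but leave its snapmap entry behind
ceph-objectstore-tool --data-path dev/osd0 --pgid 2.0 obj5 remove --rmtype nosnapmap
```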