Commit Graph

145 Commits

Author SHA1 Message Date
David Zafman
444aa9f9fe osd, mon: New pool recovery priority range -10 to 10
Use OSD_POOL_PRIORITY_MAX and OSD_POOL_PRIORITY_MIN constants
Scale legacy priorities if they exceed the maximum
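The sketch below shows one way a legacy value could be scaled into the new range. It is hypothetical. Only the -10 to 10 bounds come from the message above. The helper name, and the idea of rescaling proportionally against the largest legacy priority in use, are illustrative assumptions and are not taken from the patch.

```shell
# Bounds of the new pool priority range (mirroring OSD_POOL_PRIORITY_MIN/MAX).
POOL_PRIORITY_MIN=-10
POOL_PRIORITY_MAX=10

# scale_legacy_priority PRIORITY LARGEST_IN_USE
# Hypothetical helper: values are left alone when the largest legacy value
# already fits. Otherwise they are rescaled so that the largest maps to
# POOL_PRIORITY_MAX, and the result is clamped into the range.
scale_legacy_priority() {
    local prio=$1 largest=$2
    if [ "$largest" -gt "$POOL_PRIORITY_MAX" ]; then
        # Shell integer arithmetic truncates toward zero.
        prio=$(( prio * POOL_PRIORITY_MAX / largest ))
    fi
    [ "$prio" -gt "$POOL_PRIORITY_MAX" ] && prio=$POOL_PRIORITY_MAX
    [ "$prio" -lt "$POOL_PRIORITY_MIN" ] && prio=$POOL_PRIORITY_MIN
    echo "$prio"
}
```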

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-25 13:53:27 -07:00
David Zafman
3a234164d0
Merge pull request #27279 from dzafman/wip-divergent
Improvements to standalone tests

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-24 10:58:11 -07:00
David Zafman
7e77898001 test: Divergent testing of _merge_object_divergent_entries() cases
Case 1: A more recent update exists
Case 2: The first entry in the divergent sequence is a create
Case 3: NOT TESTED - Object currently missing
Case 4: We can rollback all of the entries
Case 5: We cannot rollback at least 1 of the entries

Support starting OSDs even when "noup" is set (don't wait for up).
Move create_ec_pool() to ceph-helpers.sh

Fixes: https://tracker.ceph.com/issues/39162

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-22 18:50:24 -07:00
Sage Weil
755e8c4ef2 Merge PR #27595 into master
* refs/pull/27595/head:
	osd: add 'ceph osd stop <osd.nnn>' command

Reviewed-by: Sage Weil <sage@redhat.com>
2019-04-20 08:52:01 -05:00
xie xingguo
5dbae13ce0 osd: add 'ceph osd stop <osd.nnn>' command
The stop command can be used to force a specified OSD daemon to stop
without first having to figure out where it is located.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-18 13:55:02 +08:00
Sage Weil
dc97651cbd Merge PR #27499 into master
* refs/pull/27499/head:
	qa/standalone/osd/osd-markdown: fix dup command disabling

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-12 06:54:58 -05:00
Sage Weil
f7216d0b2c qa/standalone/osd/osd-markdown: fix dup command disabling
The ceph cli tool checks for the presence of the variable, not its value.
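Here is a minimal illustration of the difference. It assumes the variable in question is CEPH_CLI_TEST_DUP_COMMAND, but that name is an assumption because the message does not spell it out. A presence test treats a value of 0 as "enabled", so a test that wants no duplicate commands has to unset the variable rather than set it to 0.

```shell
# dup_commands_enabled: succeeds whenever the variable is *set*, whatever
# its value. This is the kind of presence check a Python tool would do
# with `name in os.environ`.
dup_commands_enabled() {
    [ -n "${CEPH_CLI_TEST_DUP_COMMAND+set}" ]
}

# Setting it to 0 does NOT turn dups off under a presence check...
CEPH_CLI_TEST_DUP_COMMAND=0
export CEPH_CLI_TEST_DUP_COMMAND
dup_commands_enabled && echo "still enabled with value 0"

# ...only removing it from the environment does.
unset CEPH_CLI_TEST_DUP_COMMAND
dup_commands_enabled || echo "disabled after unset"
```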

Fixes: http://tracker.ceph.com/issues/38359
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-10 16:44:38 -05:00
David Zafman
69fa515c95 test: Make most tests use default objectstore bluestore
Change run_osd() to default objectstore bluestore
Use run_osd_filestore() to use the non-default objectstore
Fix inject_eio to handle any objectstore if config prefixed with type
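One way to read "config prefixed with type" is that the error-injection option name is built from the objectstore type, so a single code path serves both backends. The helper below is hypothetical. The `<type>_debug_inject_read_err` naming pattern is an assumption used for illustration.

```shell
# inject_read_err_option OBJECTSTORE_TYPE
# Hypothetical helper: derive the per-objectstore option name by prefixing
# the objectstore type (e.g. filestore or bluestore).
inject_read_err_option() {
    echo "${1}_debug_inject_read_err"
}
```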

Remaining tests using filestore:
	osd-pool-create.sh TEST_pool_create_rep_expected_num_objects
		Test filestore directory creation
	qa/standalone/osd/osd-dup.sh TEST_filestore_to_bluestore
		Obvious
	qa/standalone/osd/osd-rep-recov-eio.sh TEST_rep_read_unfound
		Requires data digest in object info
	qa/standalone/scrub/osd-scrub-repair.sh multiple tests
		Erasure code pools append mode for filestore is tested
	qa/standalone/special/ceph_objectstore_tool.py
		Test code verifies COT by directly examining filestore contents

Fixes: https://tracker.ceph.com/issues/39162

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-10 08:55:04 -07:00
xie xingguo
6a8aedc107 qa: add new test case for pulling error
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-04 11:04:43 +08:00
David Zafman
11f072fee1 Add checking of num_shards_repaired in osd stats
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-04 11:04:42 +08:00
Sage Weil
420edba243 Merge PR #27169 into master
* refs/pull/27169/head:
	common/config: parse --default-$option as a default value

Reviewed-by: Sébastien Han <seb@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-03-27 09:48:33 -05:00
Sage Weil
fdd2000631 common/config: parse --default-$option as a default value
Sometimes it is useful to specify an alternative default value for an
option via the command line such that it has a lower priority than the
mon config database, config file, the rest of the command line, or the
environment.
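The layering can be pictured as a first-match search that starts at the highest-priority source and works down. This sketch rests on assumptions: the helper name is hypothetical, and the relative order of the command-line, environment, config-file and mon levels is assumed. The only point it demonstrates is that a --default-$option value loses to every one of those sources and wins only over the compiled-in default.

```shell
# resolve_option CMDLINE ENV CONF_FILE MON DEFAULT_FLAG BUILTIN
# Hypothetical: print the first non-empty value, highest priority first.
# DEFAULT_FLAG stands for a value given as --default-<option>=<value>.
resolve_option() {
    for candidate in "$@"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Only --default-... given: it replaces the compiled-in default (1).
resolve_option "" "" "" "" 3 1    # prints 3
# A config file value (2) outranks the --default-... value.
resolve_option "" "" 2 "" 3 1     # prints 2
```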

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-26 11:00:27 -05:00
David Zafman
d2ca3d2feb osd: Track num_objects_repaired in pg stats 2(3)
Leave repair pg state on until recovery finishes or a new scrub starts

Fixes: http://tracker.ceph.com/issues/38616

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-25 16:03:36 -07:00
Sage Weil
be1187575b Merge PR #27021 into master
* refs/pull/27021/head:
	msg: remove XioMessenger
	qa/suites/rados/thrash-old-clients: add nautilus
	qa/suites/rados/thrash-old-clients: add mimic v1 variant
	qa/suites/rados/thrash-old-clients: add mimic
	qa/suites/rados/thrash-old-clients: collapse msgr and client choice
	qa: remove simplemessenger tests
	ceph_test_msgr: remove simple
	msg: remove SimpleMessenger

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Matt Benjamin <mbenjami@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-03-22 04:42:30 -05:00
Sage Weil
28b4392a71 qa: remove simplemessenger tests
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-20 06:10:25 -05:00
Sage Weil
fb915c4805 osd/PG: invalidate PG if merging with unexpected version
If the source or target PG version is 0'0, we may silently take the max
of the source and target and still leave the PG complete.  This
specifically can happen with an empty PG, as seen with bug 38655.  In
theory we could encounter one of the PGs with some other last_update
that doesn't match what we expect.  If that ever happens, make sure the
result is incomplete so that backfill can clean up.

Additionally check that the pool metadata for the last merge matches the
PGs at all.  This could mismatch if we have an osdmap gap and are forced
to do some merge without merge info at all... in which case we should
definitely invalidate: there should be newer copies of the PG(s), and we
have no idea whether the PGs we are merging are what we want.  If this is
some disaster recovery situation, an operator is always free to use
ceph-objectstore-tool to re-mark a PG complete (at their own peril!).
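The decision described above can be sketched as a small predicate. It relies on several assumptions. The function name is hypothetical. Versions are compared as plain epoch'version strings. The expected source and target versions are taken to come from the pool's record of the last merge, and an empty expectation stands for a merge done without merge info (the osdmap-gap case).

```shell
# merge_result SRC_VERSION TGT_VERSION EXPECTED_SRC EXPECTED_TGT
# Hypothetical sketch: the merged PG stays complete only when both halves
# are exactly at the versions recorded for the last merge.
merge_result() {
    local src=$1 tgt=$2 exp_src=$3 exp_tgt=$4
    if [ -z "$exp_src" ] || [ -z "$exp_tgt" ]; then
        # No merge metadata (e.g. after an osdmap gap): trust neither half.
        echo incomplete
    elif [ "$src" = "$exp_src" ] && [ "$tgt" = "$exp_tgt" ]; then
        echo complete
    else
        # Any mismatch, including an unexpected 0'0, leaves it to backfill.
        echo incomplete
    fi
}
```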

Fixes: http://tracker.ceph.com/issues/38655
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-12 10:08:46 -05:00
Sage Weil
f978b27d2b qa/standalone/osd/pg-split-merge.sh: reproduce pg merge problem with empty pgs
This reproduces http://tracker.ceph.com/issues/38655

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-11 17:10:28 -05:00
Sage Weil
bf74c1adc4 qa/standalone/osd/osd-rep-recov-eio: fix better
- no need for the default pool size
- no initial osds or it will collide with setup_osds later
- no need for rbd pool at all

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-08 17:41:11 -06:00
Sage Weil
b59ff3860f qa/standalone/osd/osd-force-create-pg: create more pgs
Avoid warnings about too few pgs.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-06 16:27:56 -06:00
Sage Weil
cba0483b09 qa/standalone: make sure an osd is running before create_rbd_pool
'rbd pool init' now does IO.  Drop the pool, or change the pool size to 1.

Fixes: http://tracker.ceph.com/issues/38585
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-06 16:27:56 -06:00
Sage Weil
01316aa7bd qa/standalone/osd/pg-split-merge: fix import_after_merge_and_gap
This test introduces a map gap.  What *should* happen is that when there is
such a gap, we cannot import.  Previously, the test didn't reliably produce
a map gap at all, and instead of checking that the import failed, it
verified that it passed.

Fix the test so that it reliably produces a gap *and* reports
min_last_epoch_clean to the mon so we can trim.  Then verify we fail to
import, but can with --force.  But remove the pg again, because if we
force an import with a map gap the osd will refuse to start.

Fixes: http://tracker.ceph.com/issues/38525
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-03 10:23:27 -06:00
Sage Weil
c6a7b2cbd1 qa/standalone/osd/osd-markdown: disable CLI command dups
The markdown test is based on marking down a specific number of times, but
the duplicate commands from the CLI may not get absorbed/batched by the
mon, breaking the test.  Override the default qa/tasks/workunit.py
behavior of sending dups.

Fixes: http://tracker.ceph.com/issues/38359
Signed-off-by: Sage Weil <sage@redhat.com>
2019-02-18 15:02:25 -06:00
David Zafman
64beabc4c6 test: Limit loops waiting for force-backfill/force-recovery to happen
Fixes: http://tracker.ceph.com/issues/38309

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-02-13 17:44:53 -08:00
David Zafman
910a95b9c8 test: osd-backfill-stats.sh Fix check of multi backfill OSDs, skip remapped test
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-02-07 20:05:58 -08:00
David Zafman
690ff9a21f
Merge pull request #26213 from dzafman/wip-38041
osd: Fix recovery and backfill priority handling

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-02-07 17:26:34 -08:00
David Zafman
ca5cf14fa8 test: Add scripts to test backfill/recovery priority handling
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-02-07 15:46:23 -08:00
David Zafman
36e305c4b6 test: Ignore kill_daemons() error
Workaround for: http://tracker.ceph.com/issues/38195

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-02-05 11:31:32 -08:00
David Zafman
cc6339c0cd test: Increase timeouts in osd-backfill-space.sh because of failure seen
Fixes: http://tracker.ceph.com/issues/38027

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-02-05 11:29:32 -08:00
David Zafman
99ddd3666b
Merge pull request #22797 from dzafman/wip-19753
osd: Deny reservation if expected backfill size would put us over bac…

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-01-18 07:42:00 -08:00
Vikhyat Umrao
8a694fc2f9 qa: specify filestore for misc tests
Signed-off-by: Vikhyat Umrao <vumrao@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-16 13:09:19 -06:00
Sage Weil
b92be2ca9b qa/standalone/osd/osd-fast-mark-down: use v1 addr w/ simplemessenger
Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-03 11:17:31 -06:00
David Zafman
094d39aa09 test: Add testing for erasure code backfill out of space detection
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-12-18 09:30:44 -08:00
David Zafman
3b8f86c8b0 test: Add testing for backfill out of space detection
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-12-18 09:30:44 -08:00
Igor Fedotov
d07c10dfc0 os/bluestore: add main device expand capability.
One can do that via ceph-bluestore-tool's bluefs-bdev-expand command

Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-11-29 12:48:20 +03:00
Sage Weil
c8a8dc21fd Merge PR #24828 into master
* refs/pull/24828/head:
	qa/osd-bluefs-volume-ops: use ceph-bluestore-tool for fsck
	qa/osd-bluefs-volume-ops: reduce space usage for the test case

Reviewed-by: David Zafman <dzafman@redhat.com>
2018-11-08 16:26:52 -06:00
Sage Weil
9ab9dcfc0d Merge PR #24809 into master
* refs/pull/24809/head:
	os/bluestore: omit redundant '/' in OSD path for ceph-bluestore-tool if
	os/bluestore: improve error handling for migrate ops in
	qa/standalone/osd-bluefs-volume-ops: remove redundant code.

Reviewed-by: Sage Weil <sage@redhat.com>
2018-10-30 15:09:45 -05:00
Igor Fedotov
f5520ea304 qa/osd-bluefs-volume-ops: use ceph-bluestore-tool for fsck
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-10-30 15:38:16 +03:00
Igor Fedotov
80e67abdfd qa/osd-bluefs-volume-ops: reduce space usage for the test case
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-10-30 15:38:15 +03:00
Sage Weil
c40685ebdd Merge PR #24787 into master
* refs/pull/24787/head:
	Merge PR #24796 into nautilus
	osd: fix heartbeat_reset unlock
	Merge PR #24780 into nautilus
	Merge PR #24761 into nautilus
	Merge PR #24651 into nautilus
	osd: fix race between op_wq and context_queue
	test: Make sure kill_daemons failure will be easy to find
	test: Add flush_pg_stats to make test more deterministic
2018-10-29 08:36:34 -05:00
Igor Fedotov
5d38f8b49b qa/standalone/osd-bluefs-volume-ops: remove redundant code.
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-10-29 16:30:36 +03:00
Kefu Chai
4af71e7c00
Merge pull request #23103 from ifed01/wip-ifed-bluefs-migrate
os/bluestore: allow ceph-bluestore-tool to coalesce, add and migrate BlueFS backing volumes

Reviewed-by:  Sage Weil <sage@redhat.com>
2018-10-22 22:33:08 +08:00
David Zafman
da3c556aa2 test: Make sure kill_daemons failure will be easy to find
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-10-17 16:54:45 -07:00
David Zafman
b33edbc4f6 test: Add flush_pg_stats to make test more deterministic
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-10-17 16:54:45 -07:00
Igor Fedotov
02b5768a4f tests: add qa test case for bluefs volume coalescence
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-10-17 22:39:27 +03:00
huanwen ren
f1219d716d qa/osd: fix osd-rep-recov-eio.sh failing to parse pg dump
Fixes: http://tracker.ceph.com/issues/36418
Signed-off-by: huanwen ren <ren.huanwen@zte.com.cn>
2018-10-16 02:18:22 +08:00
Sage Weil
9bf7c810a7 Merge PR #23985 into master
* refs/pull/23985/head:
	ceph-objectstore-tool: add back pool dne check
	qa/suites/rados/singleton/reg11184: remove old test
	ceph-objectstore-tool: import pg at original epoch
	osd: handle null pg slot on startup
	ceph-objectstore-tool: drop support for ancient export files
	osd: avoid dropping osd_lock when pg osdmaps are not laggy
	qa/standalone/osd/pg-merge.sh: add merge vs pg import test

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-09-21 08:21:53 -05:00
Kefu Chai
4b0e2c8ed4 qa: fix typos
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-09-21 12:41:42 +08:00
Sage Weil
26cb966cab ceph-objectstore-tool: import pg at original epoch
- In the jewel era, we fast-forwarded the PG to the OSD's latest epoch
and cleared past_intervals.

- In mimic, as of 2347ecb961, we brought the
PG up to date while updating past_intervals.  (At the same time we removed
the OSD's parallel past_intervals regeneration.)

The problem is that the tool then has to reimplement the past_intervals
update logic, and *also* has to cope with splits and merges.  Splits are
somewhat easier (at least until we enable partial import of a PG into a split
child), but merges are not so easy.

This patch changes it so we import the PG and leave the pg_epoch matching
the import file.  The OSD is then responsible for bringing it up to date
with the latest map, and dealing with any intervening splits or merges.

We also adjust the safety check to ensure that we don't collide with
any existing PG, either a child we eventually split into, or a parent
we eventually merge into.

Fixes: http://tracker.ceph.com/issues/35955
Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-20 12:58:00 -05:00
Sage Weil
da887c82ce qa/standalone/osd/pg-merge.sh: add merge vs pg import test
- You can't import the source half of a PG that has since merged.  Sorry!  We
could implement this later.
- You can import the target half, but the result will then be incomplete,
and you rely on backfill to clean it up.
- Map gaps don't affect this behavior.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-17 12:52:46 -05:00
David Zafman
ef6940fbb6 test: osd-backfill-stats.sh: Fix subtests to get primary which can change
Fixes: http://tracker.ceph.com/issues/35982

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-09-13 13:19:23 -07:00
Kefu Chai
510d9e1345
Merge pull request #23723 from xiexingguo/wip-list-missing
osd/PrimaryLogPG: rename list_missing -> list_unfound command

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-09-11 20:25:21 +08:00
Sage Weil
f47921f293 qa/standalone/osd/osd-backfill-stats: fixes
Grep from the primary's log, not every osd's log.

For the backfill_remapped task in particular, after the pg_temp change it
just so happens that the primary changes across the pool size change and
thus two different primaries do (some) backfill.  Fix that test to pass
the correct primary.

Other tests are unaffected as they do not (happen to) trigger a primary
change and already satisfied the (removed) check that only one OSD does
backfill.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-07 17:11:18 -05:00
xie xingguo
85ba2f0a82 osd/PrimaryLogPG: s/list_missing/list_unfound/
Also:
- Do not print **offset** until specified
- Count missing objects correctly (used to be primary's local missing)

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2018-09-06 09:52:20 +08:00
Xie Xingguo
0857124d23
Merge pull request #23663 from xiexingguo/wip-incompat-async-fixes
osd: some recovery improvements and cleanups


Reviewed-by: Sage Weil <sage@redhat.com>
2018-09-01 14:27:27 +08:00
xie xingguo
22786cffa8 osd/PG: force auth_log_shard to be primary when appropriate
So if there are a lot of missing objects on the primary, we can
make use of auth_log_shard to restore client I/O quickly.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2018-08-31 16:29:25 +08:00
Sage Weil
85083f39b5 Merge PR #23572 into master
* refs/pull/23572/head:
	qa/standalone/osd/osd-force-create-pg: add force-create-pg test
	mon/MonCommands: fix 'osd force-create-pg'

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-08-30 08:52:44 -05:00
Josh Durgin
cc41b51c6a
Merge pull request #23518 from dzafman/wip-25084
osd: When possible check CRC in build_push_op() so repair can eventually stop

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-08-23 11:39:05 -07:00
David Zafman
bc33170310 test: Use pids instead of jobspecs which were wrong
Fixes: http://tracker.ceph.com/issues/27056

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-08-22 10:57:04 -07:00
David Zafman
c1b2bd7f16 test: Fix test to detect a test setup failure
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-08-15 15:45:44 -07:00
David Zafman
72c34949fc test: Add test for filestore bad CRC in primary pull request
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-08-15 15:45:44 -07:00
Sage Weil
ed9ec42c42 qa/standalone/osd/osd-force-create-pg: add force-create-pg test
Signed-off-by: Sage Weil <sage@redhat.com>
2018-08-15 06:47:47 -05:00
Sage Weil
4108ebc0ab qa/standalone/osd/ec-error-rollforward: reproduce bug 24597
This reproduces http://tracker.ceph.com/issues/24597

Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-11 16:15:49 -05:00
Sage Weil
4f9fdd98e2 qa/standalone/osd/repro_long_log.sh: fix test
The log trimming case wasn't quite right.  Before HEAD^ we were
rolling forward too aggressively and miscalculating the can_rollforward_to,
which affected the trim_to calculation.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-11 16:15:49 -05:00
David Zafman
33538aca35 test: Fix standalone main usage
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-06-18 14:09:14 -07:00
David Zafman
39fc43556f test: Put files in private test directory
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-06-18 14:08:23 -07:00
Erwan Velu
e6e10246c6 tests: Protecting rados bench against endless loop
If the cluster dies during the rados bench, the maximum running time is
no longer respected and all emitted aios stay pending.

rados bench never quits, and the global testing timeout (3600 sec: 1
hour) has to be reached to get a failure.

This situation is dramatic for a background test or a CI run, as it locks
the whole job for too long waiting for an event that will never occur.

The ideal solution would be for 'rados bench' to report a failure
once the timeout is reached while aios are still pending.

A possible workaround here is to use the system command 'timeout'
before calling rados bench and fail if rados didn't complete on time.

To avoid side effects, this patch doubles the rados timeout. If rados
hasn't completed after twice the expected time, it has to fail to avoid
locking the whole testing job.

Please find below the way it worked on a real test case.
We can see no IO after t>2, but despite timeout=4 the bench continues.
Thanks to this patch, the bench is stopped at t=8 and returns 1.

5: /home/erwan/ceph/src/test/smoke.sh:55: TEST_multimon:  timeout 8 rados -p foo bench 4 write -b 4096 --no-cleanup
5: hints = 1
5: Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for up to 4 seconds or 0 objects
5: Object prefix: benchmark_data_mr-meeseeks_184960
5:   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
5:     0       0         0         0         0         0           -           0
5:     1      16      1144      1128   4.40538   4.40625  0.00412965   0.0141116
5:     2      16      2147      2131   4.16134   3.91797  0.00985654   0.0109079
5:     3      16      2147      2131   2.77424         0           -   0.0109079
5:     4      16      2147      2131    2.0807         0           -   0.0109079
5:     5      16      2147      2131   1.66456         0           -   0.0109079
5:     6      16      2147      2131   1.38714         0           -   0.0109079
5:     7      16      2147      2131   1.18897         0           -   0.0109079
5: /home/erwan/ceph/src/test/smoke.sh:55: TEST_multimon:  return 1
5: /home/erwan/ceph/src/test/smoke.sh:18: run:  return 1
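The doubled deadline can be expressed as a small wrapper around the coreutils timeout command. The helper name is hypothetical. The doubling rule and the command line match the trace above, where a 4 second bench runs under `timeout 8`.

```shell
# run_with_deadline DURATION CMD [ARGS...]
# Hypothetical helper: give CMD twice its nominal DURATION (in seconds)
# before timeout(1) kills it. timeout exits with status 124 when the limit
# is hit, so a hung benchmark becomes a test failure instead of blocking
# the whole job.
run_with_deadline() {
    local duration=$1
    shift
    timeout "$(( duration * 2 ))" "$@"
}

# Usage inside a standalone test function (pool name taken from the trace):
#   run_with_deadline 4 rados -p foo bench 4 write -b 4096 --no-cleanup || return 1
```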

Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-06-14 11:06:52 +02:00
Neha Ojha
7f6f4f90fe qa: modify TEST_recovery_sizeup() to handle async recovery
Signed-off-by: Neha Ojha <nojha@redhat.com>
2018-03-15 11:13:34 -07:00
David Zafman
8a7e6c2349
Merge pull request #20220 from dzafman/wip-calc-stats3
osd: Improve recovery stat handling by using peer_missing and missing_loc info

Reviewed-by: Sage Weil <sage@redhat.com>
2018-03-14 11:07:44 -07:00
David Zafman
af85f3cc48 test: osd-backfill-stats.sh parallel osd-recovery-stats.sh check() changes
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:11 -07:00
David Zafman
acc1f80684 test: Use "(est)" in log message when an osd doesn't have peer_missing
Consolidate check() code and common script code
TEST_recovery_multi() wasn't reliable due to delayed peer_missing

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:11 -07:00
David Zafman
12e331b742 test: osd-recovery-stats.sh: New test with different missing objs on multiple OSDs
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:11 -07:00
David Zafman
09b5697ba2 test: Correction for better degraded/misplaced handling
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:11 -07:00
David Zafman
d7fd9174b9 osd: Fix for handling more than 1 missing target
Fix test case to test more than 1 target

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:03 -07:00
Josh Durgin
1c15458a00 PrimaryLogPG: only trim up to osd_pg_log_trim_max entries at once
This prevents the fix for http://tracker.ceph.com/issues/22050 or
potential future bugs from causing too much latency by trimming too
many log entries at once.
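Capping each trim pass bounds the work done in one go, and any backlog is worked off over several passes. The helper name is hypothetical, and the cap is a plain number standing in for osd_pg_log_trim_max. The sketch only prints the batch sizes a backlog would be split into.

```shell
# trim_batches BACKLOG MAX_PER_PASS
# Hypothetical: print the size of each trim pass when no single pass may
# trim more than MAX_PER_PASS entries.
trim_batches() {
    local remaining=$1 max=$2 out="" batch
    [ "$max" -gt 0 ] || return 1   # a zero cap would never make progress
    while [ "$remaining" -gt 0 ]; do
        batch=$(( remaining < max ? remaining : max ))
        out="${out:+$out }$batch"
        remaining=$(( remaining - batch ))
    done
    echo "$out"
}

trim_batches 25 10   # prints: 10 10 5
```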

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2018-03-09 19:14:28 -05:00
Josh Durgin
b50186bfe6 PG, PrimaryLogPG: trim log and rollback info for error log entries
Regular updates piggyback some osd state for this purpose with
MOSDRepOp[Reply]. Do the same thing for pure log entry updates (write
errors and lost/revert additions) via MOSDPGUpdateLogMissing[Reply].

Fixes: http://tracker.ceph.com/issues/22050
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2018-03-09 17:54:08 -05:00
Josh Durgin
2067f7c679
Merge pull request #20786 from dzafman/wip-zafman-log-trim
tools/ceph-objectstore-tool: command to trim the pg log

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-03-08 16:42:31 -08:00
Josh Durgin
b01e4ea5e2 tools: Add pg log trim command to ceph-objectstore-tool
Add test script that verifies the command in qa/standalone/osd

Fixes: http://tracker.ceph.com/issues/23242

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-08 15:58:55 -08:00
Sage Weil
c9e974800f qa: --no-mon-config for ceph-objectstore-tool --op mkfs ..
Signed-off-by: Sage Weil <sage@redhat.com>
2018-03-06 14:44:50 -06:00
Kefu Chai
ac56a202fd qa/standalone: extract delete_pool()
Some tests, like osd-backfill-stats.sh, use delete_pool() but don't
define it, while other standalone tests each define their own copy, so
it is simpler to consolidate them in ceph-helpers.sh.
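A shared helper along these lines could live in ceph-helpers.sh. The body is an assumption about what such a helper needs to do. It uses the standard `ceph osd pool delete` confirmation form, which repeats the pool name and adds `--yes-i-really-really-mean-it`. The monitors accept this only when pool deletion is allowed.

```shell
# delete_pool POOL
# Delete a test pool. The name is repeated as the CLI's safety confirmation.
delete_pool() {
    local poolname=$1
    ceph osd pool delete "$poolname" "$poolname" --yes-i-really-really-mean-it
}
```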

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-02-28 15:40:28 +08:00
David Zafman
7ccb7b7023
Merge pull request #19850 from dzafman/wip-calc-stats
osd/PG: re-write of _update_calc_stats and improve pg degraded state

Fixes: http://tracker.ceph.com/issues/20059

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-01-16 11:58:49 -08:00
Kefu Chai
7aba57b9b4
Merge pull request #18191 from hjwsm1989/osd-mark-down
qa/standalone/osd/osd-mark-down: create pool to get updated osdmap faster

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-01-15 11:09:02 +08:00
David Zafman
88ce0c1a91 test: Verify stat calculations during backfill
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-01-14 18:17:23 -08:00
David Zafman
f5af1af6d3 test: Verify stat calculations during recovery
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-01-14 18:17:23 -08:00
Kefu Chai
e7097593a7 qa/standalone: remove osd-map-max-advance related tests
This setting was removed in 8967b73.

Fixes: http://tracker.ceph.com/issues/22596
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-01-06 19:40:15 +08:00
Kefu Chai
2ceff9eb4e qa/standalone: pass options using --<option-name>=<value>
not `--<option-name> <value>`; otherwise `ceph-authtool` would error
out:

$ CEPH_ARGS='--osd-map-max-advance 1000' bin/ceph-authtool --gen-print-key
bin/ceph-authtool: unexpected '1000'
usage: ceph-authtool keyringfile [OPTIONS]...
....

but using the syntax `--<option-name>=<value>`, it works:

$ CEPH_ARGS='--osd-map-max-advance=1000' bin/ceph-authtool --gen-print-key
AQBAhTNamf5+ABAASkAp/6IGq7LkUTEOMp/fgw==

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-12-15 16:19:15 +08:00
David Zafman
c2572bee3c test: Add replicated recovery/backfill test
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-10-18 11:12:14 -07:00
huangjun
ee618a38a9 qa/standalone/osd/osd-mark-down: create pool to get updated osdmap faster
The mon sends the osdmap to random OSDs after we mark an OSD down, so the
down OSD may take more than $sleep to get the updated osdmap if there are
no pings between OSDs. So create a pool after setting up the cluster.

Signed-off-by: huangjun <huangjun@xsky.com>
2017-10-09 22:19:29 +08:00
Kefu Chai
30b5b4627c Merge pull request #16494 from asomers/bin_bash
misc: Fix bash path in shebangs

Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-08-27 10:14:14 +08:00
David Zafman
4db5124e1a qa: For FreeBSD skip osd-dup.sh because there is no bluestore
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 08:30:47 -07:00
David Zafman
99ad4bbd91 qa: Add create_pool() which sleeps 1 second like python variant
wait_for_clean() can miss the new pool if it races with pool create.
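A wrapper with that behavior might look like this. The one-second pause comes from the subject line. The exact body, which forwards all arguments to `ceph osd pool create`, is an assumption.

```shell
# create_pool POOL [PG_NUM ...]
# Create a pool, then pause briefly so that a following wait_for_clean()
# does not race with the pool creation and miss the new pool.
create_pool() {
    ceph osd pool create "$@"
    sleep 1
}
```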

Fixes: http://tracker.ceph.com/issues/20465

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-04 06:38:09 -07:00
xie xingguo
734b5f2c60 test/osd-fast-mark-down: enable 'osd-class-update-on-start' by default
116cf759c8 will now hide all shadow trees (roots), so this is no longer
applicable (and is actually misleading).

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-03 17:26:26 -04:00
Alan Somers
3aae5ca6fd scripts: fix bash path in shebangs
/bin/bash is a Linuxism.  Other operating systems install bash to
different paths.  Use /usr/bin/env in shebangs to find bash.
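The portable form can be demonstrated with a throwaway script. env(1) looks bash up on $PATH, so the script runs wherever bash is installed, for example /bin/bash on most Linux systems or /usr/local/bin/bash on FreeBSD. The temporary-file handling is only scaffolding for the demonstration.

```shell
# Write a throwaway script that uses the portable shebang, then run it.
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
printf '%s\n' '#!/usr/bin/env bash' 'echo "running under bash $BASH_VERSION"' > "$demo"
chmod +x "$demo"
out=$("$demo")
echo "$out"
```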

Signed-off-by: Alan Somers <asomers@gmail.com>
2017-07-27 13:24:26 -06:00
Sage Weil
766229b034 qa/standalone/scrub: separate scrub/repair tests from rest of osd/
They are slow.  Run them separately.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:50 -04:00
Sage Weil
cabad62242 qa/standalone/ceph-helpers: factor rbd pool create out of run_mon
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:50 -04:00
Sage Weil
71ea171604 qa: move ceph-helpers and misc src/test/*.sh tests to qa/standalone
- stop running via make check
- add teuthology yamls to run them
- disable ceph_objectstore_tool.py for now (too slow for make check, and
we can't use vstart in teuthology via a package install)
- drop cephtool tests since those are already covered by other teuthology
tests
- leave a handful of (fast!) ceph-helpers tests for make check for minimal
integration tests.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:49 -04:00