67 Commits

Author SHA1 Message Date
David Zafman
99ddd3666b
Merge pull request #22797 from dzafman/wip-19753
osd: Deny reservation if expected backfill size would put us over bac…

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-01-18 07:42:00 -08:00
Vikhyat Umrao
8a694fc2f9 qa: specify filestore for misc tests
Signed-off-by: Vikhyat Umrao <vumrao@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-16 13:09:19 -06:00
Sage Weil
b92be2ca9b qa/standalone/osd/osd-fast-mark-down: use v1 addr w/ simplemessenger
Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-03 11:17:31 -06:00
David Zafman
094d39aa09 test: Add testing for erasure code backfill out of space detection
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-12-18 09:30:44 -08:00
David Zafman
3b8f86c8b0 test: Add testing for backfill out of space detection
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-12-18 09:30:44 -08:00
Igor Fedotov
d07c10dfc0 os/bluestore: add main device expand capability.
One can do that via ceph-bluestore-tool's bluefs-bdev-expand command

Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-11-29 12:48:20 +03:00
Sage Weil
c8a8dc21fd Merge PR #24828 into master
* refs/pull/24828/head:
	qa/osd-bluefs-volume-ops: use ceph-bluestore-tool for fsck
	qa/osd-bluefs-volume-ops: reduce space usage for the test case

Reviewed-by: David Zafman <dzafman@redhat.com>
2018-11-08 16:26:52 -06:00
Sage Weil
9ab9dcfc0d Merge PR #24809 into master
* refs/pull/24809/head:
	os/bluestore: omit redundant '/' in OSD path for ceph-bluestore-tool if
	os/bluestore: improve error handling for migrate ops in
	qa/standalone/osd-bluefs-volume-ops: remove redundant code.

Reviewed-by: Sage Weil <sage@redhat.com>
2018-10-30 15:09:45 -05:00
Igor Fedotov
f5520ea304 qa/osd-bluefs-volume-ops: use ceph-bluestore-tool for fsck
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-10-30 15:38:16 +03:00
Igor Fedotov
80e67abdfd qa/osd-bluefs-volume-ops: reduce space usage for the test case
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-10-30 15:38:15 +03:00
Sage Weil
c40685ebdd Merge PR #24787 into master
* refs/pull/24787/head:
	Merge PR #24796 into nautilus
	osd: fix heartbeat_reset unlock
	Merge PR #24780 into nautilus
	Merge PR #24761 into nautilus
	Merge PR #24651 into nautilus
	osd: fix race between op_wq and context_queue
	test: Make sure kill_daemons failure will be easy to find
	test: Add flush_pg_stats to make test more deterministic
2018-10-29 08:36:34 -05:00
Igor Fedotov
5d38f8b49b qa/standalone/osd-bluefs-volume-ops: remove redundant code.
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-10-29 16:30:36 +03:00
Kefu Chai
4af71e7c00
Merge pull request #23103 from ifed01/wip-ifed-bluefs-migrate
os/bluestore: allow ceph-bluestore-tool to coalesce, add and migrate BlueFS backing volumes

Reviewed-by: Sage Weil <sage@redhat.com>
2018-10-22 22:33:08 +08:00
David Zafman
da3c556aa2 test: Make sure kill_daemons failure will be easy to find
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-10-17 16:54:45 -07:00
David Zafman
b33edbc4f6 test: Add flush_pg_stats to make test more deterministic
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-10-17 16:54:45 -07:00
Igor Fedotov
02b5768a4f tests: add qa test case for bluefs volume coalescence
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-10-17 22:39:27 +03:00
huanwen ren
f1219d716d qa/osd: fixup osd-rep-recov-eio.sh fails to parse pg dump
Fixes: http://tracker.ceph.com/issues/36418
Signed-off-by: huanwen ren <ren.huanwen@zte.com.cn>
2018-10-16 02:18:22 +08:00
Sage Weil
9bf7c810a7 Merge PR #23985 into master
* refs/pull/23985/head:
	ceph-objectstore-tool: add back pool dne check
	qa/suites/rados/singleton/reg11184: remove old test
	ceph-objectstore-tool: import pg at original epoch
	osd: handle null pg slot on startup
	ceph-objectstore-tool: drop support for ancient export files
	osd: avoid dropping osd_lock when pg osdmaps are not laggy
	qa/standalone/osd/pg-merge.sh: add merge vs pg import test

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-09-21 08:21:53 -05:00
Kefu Chai
4b0e2c8ed4 qa: fix typos
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-09-21 12:41:42 +08:00
Sage Weil
26cb966cab ceph-objectstore-tool: import pg at original epoch
- In the jewel era, we fast-forwarded the PG to the OSD's latest epoch
and cleared past_intervals.

- In mimic, as of 2347ecb961, we brought the
PG up to date while updating past_intervals.  (At the same time we removed
the OSD's parallel past_intervals regeneration.)

The problem is that the tool then has to reimplement the past_intervals
update logic, and *also* has to cope with splits and merges.  Splits are
somewhat easier (until now we enable partial import of a PG into a split
child), but merges are not so easy.

This patch changes it so we import the PG and leave the pg_epoch matching
the import file.  The OSD is then responsible for bringing it up to date
with the latest map, and dealing with any intervening splits or merges.

We also adjust the safety check to ensure that we don't collide with
any existing PG, either a child we eventually split into, or a parent
we eventually merge into.

Fixes: http://tracker.ceph.com/issues/35955
Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-20 12:58:00 -05:00
Sage Weil
da887c82ce qa/standalone/osd/pg-merge.sh: add merge vs pg import test
- You can't import the source half of a PG that has since merged.  Sorry!  We
could implement this later.
- You can import the target half, but the result will then be incomplete,
and you rely on backfill to clean it up.
- Map gaps don't affect this behavior.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-17 12:52:46 -05:00
David Zafman
ef6940fbb6 test: osd-backfill-stats.sh: Fix subtests to get primary which can change
Fixes: http://tracker.ceph.com/issues/35982

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-09-13 13:19:23 -07:00
Kefu Chai
510d9e1345
Merge pull request #23723 from xiexingguo/wip-list-missing
osd/PrimaryLogPG: rename list_missing -> list_unfound command

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-09-11 20:25:21 +08:00
Sage Weil
f47921f293 qa/standalone/osd/osd-backfill-stats: fixes
Grep from the primary's log, not every osd's log.

For the backfill_remapped task in particular, after the pg_temp change it
just so happens that the primary changes across the pool size change and
thus two different primaries do (some) backfill.  Fix that test to pass
the correct primary.

Other tests are unaffected as they do not (happen to) trigger a primary
change and already satisfied the (removed) check that only one OSD does
backfill.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-07 17:11:18 -05:00
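The first fix in f47921f293 (grep the primary's log rather than every OSD's log) could be sketched roughly as follows. The helper name `grep_primary_log` and the `$dir/osd.<id>.log` file layout are assumptions made for illustration; they are not taken from the commit itself.

```shell
# Hypothetical helper: search only the primary OSD's log for a pattern.
# Assumption: each OSD logs to "$dir/osd.<id>.log"; adjust to the real layout.
grep_primary_log() {
    local dir=$1 primary=$2 pattern=$3
    # "--" keeps a pattern that starts with "-" from being read as an option.
    grep -- "$pattern" "$dir/osd.${primary}.log"
}
```

When the primary changes during a test, as described above for backfill_remapped, the caller would need to pass the primary that was active for the phase being checked.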
xie xingguo
85ba2f0a82 osd/PrimaryLogPG: s/list_missing/list_unfound/
Also:
- Do not print **offset** until specified
- Count missing objects correctly (used to be primary's local missing)

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2018-09-06 09:52:20 +08:00
Xie Xingguo
0857124d23
Merge pull request #23663 from xiexingguo/wip-incompat-async-fixes
osd: some recovery improvements and cleanups


Reviewed-by: Sage Weil <sage@redhat.com>
2018-09-01 14:27:27 +08:00
xie xingguo
22786cffa8 osd/PG: force auth_log_shard to be primary when appropriate
So if there are a lot of missing objects on the primary, we can
make use of auth_log_shard to restore client I/O quickly.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2018-08-31 16:29:25 +08:00
Sage Weil
85083f39b5 Merge PR #23572 into master
* refs/pull/23572/head:
	qa/standalone/osd/osd-force-create-pg: add force-create-pg test
	mon/MonCommands: fix 'osd force-create-pg'

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-08-30 08:52:44 -05:00
Josh Durgin
cc41b51c6a
Merge pull request #23518 from dzafman/wip-25084
osd: When possible check CRC in build_push_op() so repair can eventually stop

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-08-23 11:39:05 -07:00
David Zafman
bc33170310 test: Use pids instead of jobspecs which were wrong
Fixes: http://tracker.ceph.com/issues/27056

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-08-22 10:57:04 -07:00
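The commit above does not show the change itself, so the following is only a general illustration of waiting on PIDs rather than jobspecs (`%1`, `%2`) in a shell test. The two subshell workers are placeholders, not code from the Ceph test.

```shell
# Start background workers and record each PID via $! instead of relying
# on jobspecs, which depend on the shell's job table at the time of use.
pids=""
( sleep 1; exit 0 ) &
pids="$pids $!"
( sleep 1; exit 3 ) &
pids="$pids $!"

# Wait on each PID individually so a failing worker is counted.
failed=0
for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
done
echo "failed workers: $failed"
```

A PID captured from `$!` identifies one specific child, so `wait "$pid"` returns that child's exit status regardless of how many other jobs were started in between.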
David Zafman
c1b2bd7f16 test: Fix test to detect a test setup failure
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-08-15 15:45:44 -07:00
David Zafman
72c34949fc test: Add test for filestore bad CRC in primary pull request
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-08-15 15:45:44 -07:00
Sage Weil
ed9ec42c42 qa/standalone/osd/osd-force-create-pg: add force-create-pg test
Signed-off-by: Sage Weil <sage@redhat.com>
2018-08-15 06:47:47 -05:00
Sage Weil
4108ebc0ab qa/standalone/osd/ec-error-rollforward: reproduce bug 24597
This reproduces http://tracker.ceph.com/issues/24597

Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-11 16:15:49 -05:00
Sage Weil
4f9fdd98e2 qa/standalone/osd/repro_long_log.sh: fix test
The log trimming case wasn't quite right.  Before HEAD^ we were
rolling forward too aggressively and miscalculating the can_rollforward_to,
which affected the trim_to calculation.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-11 16:15:49 -05:00
David Zafman
33538aca35 test: Fix standalone main usage
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-06-18 14:09:14 -07:00
David Zafman
39fc43556f test: Put files in private test directory
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-06-18 14:08:23 -07:00
Erwan Velu
e6e10246c6 tests: Protecting rados bench against endless loop
If the cluster dies during rados bench, the maximum running time is
no longer honored and all emitted aios stay pending.

rados bench never quits, and the global testing timeout (3600 sec, i.e. 1
hour) has to be reached to get a failure.

This situation is costly for a background test or a CI run, as it locks
the whole job for too long waiting for an event that will never occur.

The ideal solution would be for 'rados bench' to report a failure
once the timeout is reached while aios are still pending.

A possible workaround is to use the system command 'timeout'
when calling rados bench and fail if rados didn't complete in time.

To avoid side effects, this patch doubles the rados timeout. If rados
hasn't completed after twice the expected time, it has to fail to avoid
locking the whole testing job.

Below is how it worked on a real test case.
We can see no IO after t>2, but despite timeout=4 the bench continues.
Thanks to this patch, the bench is stopped at t=8 and returns 1.

5: /home/erwan/ceph/src/test/smoke.sh:55: TEST_multimon:  timeout 8 rados -p foo bench 4 write -b 4096 --no-cleanup
5: hints = 1
5: Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for up to 4 seconds or 0 objects
5: Object prefix: benchmark_data_mr-meeseeks_184960
5:   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
5:     0       0         0         0         0         0           -           0
5:     1      16      1144      1128   4.40538   4.40625  0.00412965   0.0141116
5:     2      16      2147      2131   4.16134   3.91797  0.00985654   0.0109079
5:     3      16      2147      2131   2.77424         0           -   0.0109079
5:     4      16      2147      2131    2.0807         0           -   0.0109079
5:     5      16      2147      2131   1.66456         0           -   0.0109079
5:     6      16      2147      2131   1.38714         0           -   0.0109079
5:     7      16      2147      2131   1.18897         0           -   0.0109079
5: /home/erwan/ceph/src/test/smoke.sh:55: TEST_multimon:  return 1
5: /home/erwan/ceph/src/test/smoke.sh:18: run:  return 1

Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-06-14 11:06:52 +02:00
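The workaround described in e6e10246c6 (wrapping the benchmark in the system `timeout` command with twice the expected duration) can be sketched as a small helper. The function name `run_with_deadline` is hypothetical and is not part of the patch; only the `timeout 8 rados ... bench 4 ...` pattern comes from the log above.

```shell
# Hypothetical helper: run a command under a hard limit of twice its
# expected duration (in whole seconds), using coreutils 'timeout'.
run_with_deadline() {
    local expected=$1
    shift
    # 'timeout' kills the command and exits with status 124 when the limit is hit.
    timeout $((expected * 2)) "$@"
}
```

In a test script this might be used as `run_with_deadline 4 rados -p foo bench 4 write -b 4096 --no-cleanup || return 1`, which corresponds to the `timeout 8 rados ... bench 4 ...` invocation and the `return 1` seen in the log above.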
Neha Ojha
7f6f4f90fe qa: modify TEST_recovery_sizeup() to handle async recovery
Signed-off-by: Neha Ojha <nojha@redhat.com>
2018-03-15 11:13:34 -07:00
David Zafman
8a7e6c2349
Merge pull request #20220 from dzafman/wip-calc-stats3
osd: Improve recovery stat handling by using peer_missing and missing_loc info

Reviewed-by: Sage Weil <sage@redhat.com>
2018-03-14 11:07:44 -07:00
David Zafman
af85f3cc48 test: osd-backfill-stats.sh parallel osd-recovery-stats.sh check() changes
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:11 -07:00
David Zafman
acc1f80684 test: Use "(est)" in log message when an osd doesn't have peer_missing
Consolidate check() code and common script code
TEST_recovery_multi() wasn't reliable due to delayed peer_missing

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:11 -07:00
David Zafman
12e331b742 test: osd-recovery-stats.sh: New test with different missing objs on multiple OSDs
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:11 -07:00
David Zafman
09b5697ba2 test: Correction for better degraded/misplaced handling
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:11 -07:00
David Zafman
d7fd9174b9 osd: Fix for handling more than 1 missing target
Fix test case to test more than 1 target

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:03 -07:00
Josh Durgin
1c15458a00 PrimaryLogPG: only trim up to osd_pg_log_trim_max entries at once
This prevents the fix for http://tracker.ceph.com/issues/22050 or
potential future bugs from causing too much latency by trimming too
many log entries at once.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2018-03-09 19:14:28 -05:00
Josh Durgin
b50186bfe6 PG, PrimaryLogPG: trim log and rollback info for error log entries
Regular updates piggyback some osd state for this purpose with
MOSDRepOp[Reply]. Do the same thing for pure log entry updates (write
errors and lost/revert additions) via MOSDPGUpdateLogMissing[Reply].

Fixes: http://tracker.ceph.com/issues/22050
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2018-03-09 17:54:08 -05:00
Josh Durgin
2067f7c679
Merge pull request #20786 from dzafman/wip-zafman-log-trim
tools/ceph-objectstore-tool: command to trim the pg log

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-03-08 16:42:31 -08:00
Josh Durgin
b01e4ea5e2 tools: Add pg log trim command to ceph-objectstore-tool
Add test script that verifies the command in qa/standalone/osd

Fixes: http://tracker.ceph.com/issues/23242

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-08 15:58:55 -08:00
Sage Weil
c9e974800f qa: --no-mon-config for ceph-objectstore-tool --op mkfs ..
Signed-off-by: Sage Weil <sage@redhat.com>
2018-03-06 14:44:50 -06:00