Commit Graph

628 Commits

Author SHA1 Message Date
David Zafman
8a7e6c2349
Merge pull request #20220 from dzafman/wip-calc-stats3
osd: Improve recovery stat handling by using peer_missing and missing_loc info

Reviewed-by: Sage Weil <sage@redhat.com>
2018-03-14 11:07:44 -07:00
David Zafman
af85f3cc48 test: osd-backfill-stats.sh parallel osd-recovery-stats.sh check() changes
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:11 -07:00
David Zafman
acc1f80684 test: Use "(est)" in log message when an osd doesn't have peer_missing
Consolidate check() code and common script code
TEST_recovery_multi() wasn't reliable due to delayed peer_missing

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:11 -07:00
David Zafman
12e331b742 test: osd-recovery-stats.sh: New test with different missing objs on multiple OSDs
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:11 -07:00
David Zafman
09b5697ba2 test: Correction for better degraded/misplaced handling
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:11 -07:00
David Zafman
d7fd9174b9 osd: Fix for handling more than 1 missing target
Fix test case to test more than 1 target

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:03 -07:00
David Zafman
51b740ad41 test: Fail upon flush_pg_stats timeout
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-11 16:26:11 -07:00
Josh Durgin
1c15458a00 PrimaryLogPG: only trim up to osd_pg_log_trim_max entries at once
This prevents the fix for http://tracker.ceph.com/issues/22050 or
potential future bugs from causing too much latency by trimming too
many log entries at once.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2018-03-09 19:14:28 -05:00
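The bounded trimming described in 1c15458a00 can be sketched roughly as follows. This is a minimal Python illustration, not the PrimaryLogPG code: `trim_log`, the list-based log, and the `trim_max` parameter are hypothetical stand-ins for the real pg log and the `osd_pg_log_trim_max` option.

```python
def trim_log(log, target_len, trim_max):
    """Remove the oldest entries so len(log) approaches target_len,
    but never remove more than trim_max entries in one call."""
    excess = len(log) - target_len
    if excess <= 0:
        return []                      # nothing to trim
    count = min(excess, trim_max)      # cap the work done per call
    trimmed = log[:count]
    del log[:count]
    return trimmed


# Hypothetical usage: a backlog of 100 entries is trimmed in bounded steps.
log = list(range(100))
first = trim_log(log, target_len=10, trim_max=30)
print(len(first), len(log))
```

A large backlog is then worked off over several calls, so no single trim pass has to process the whole excess at once.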
Josh Durgin
b50186bfe6 PG, PrimaryLogPG: trim log and rollback info for error log entries
Regular updates piggyback some osd state for this purpose with
MOSDRepOp[Reply]. Do the same thing for pure log entry updates (write
errors and lost/revert additions) via MOSDPGUpdateLogMissing[Reply].

Fixes: http://tracker.ceph.com/issues/22050
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2018-03-09 17:54:08 -05:00
Josh Durgin
2067f7c679
Merge pull request #20786 from dzafman/wip-zafman-log-trim
tools/ceph-objectstore-tool: command to trim the pg log

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-03-08 16:42:31 -08:00
Josh Durgin
b01e4ea5e2 tools: Add pg log trim command to ceph-objectstore-tool
Add test script that verifies the command in qa/standalone/osd

Fixes: http://tracker.ceph.com/issues/23242

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-08 15:58:55 -08:00
David Zafman
317b3d3b36
Merge pull request #20759 from dzafman/wip-cleanup
test: Make clearer by moving code out of loop

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2018-03-08 10:45:38 -08:00
Sage Weil
c9e974800f qa: --no-mon-config for ceph-objectstore-tool --op mkfs ..
Signed-off-by: Sage Weil <sage@redhat.com>
2018-03-06 14:44:50 -06:00
Sage Weil
5ee5bbace1 qa/standalone: drop CEPH_LIB hacks
Signed-off-by: Sage Weil <sage@redhat.com>
2018-03-06 14:44:49 -06:00
David Zafman
fa5e75d046 test: Make code clearer by moving code out of loop
Caused by 33e747724a

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-06 11:30:08 -08:00
Kefu Chai
fc43ae1724 qa/standalone: s/delete_erasure_pool/delete_erasure_coded_pool/
it's a regression introduced by ac56a202

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-03-01 19:09:31 +08:00
Kefu Chai
ac56a202fd qa/standalone: extract delete_pool()
Some tests, such as osd-backfill-stats.sh, call delete_pool() without
defining it, while other standalone tests each define their own copy.
It is simpler to consolidate these definitions in ceph-helpers.sh.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-02-28 15:40:28 +08:00
Josh Durgin
d1ca620698 mon/OSDMonitor: fix min_size default for replicated pools
This was accidentally changed to 0 by using the config value
directly in 582e567c93

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2018-02-23 00:39:13 -05:00
David Zafman
33e747724a osd: Add new snapset_inconsistency error check
Includes new test case

Caused by: 5f58301a13
This changed attr consistency checking to exclude system keys,
which required snapset to be handled just like object info.

Fixes: http://tracker.ceph.com/issues/22996

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-02-15 09:03:49 -08:00
Patrick Donnelly
46c25abd1c
test/encoding: refactor to avoid escaping shell magic
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-02-07 18:03:05 -08:00
Kefu Chai
4233cc02d4
Merge pull request #19651 from yanghonggang/master
mon/OSDMonitor.cc: fix expected_num_objects interpret error

Reviewed-by: Joao Eduardo Luis <joao@suse.de>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-01-26 14:34:11 +08:00
Yang Honggang
c24f2baec9 mon/OSDMonitor.cc: fix expected_num_objects interpret error
Fixes: http://tracker.ceph.com/issues/22530
Signed-off-by: Yang Honggang <joseph.yang@xtaotech.com>
2018-01-21 21:00:17 -05:00
David Zafman
7ccb7b7023
Merge pull request #19850 from dzafman/wip-calc-stats
osd/PG: re-write of _update_calc_stats and improve pg degraded state

Fixes: http://tracker.ceph.com/issues/20059

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-01-16 11:58:49 -08:00
Kefu Chai
7aba57b9b4
Merge pull request #18191 from hjwsm1989/osd-mark-down
qa/standalone/osd/osd-mark-down: create pool to get updated osdmap faster

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-01-15 11:09:02 +08:00
David Zafman
88ce0c1a91 test: Verify stat calculations during backfill
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-01-14 18:17:23 -08:00
David Zafman
f5af1af6d3 test: Verify stat calculations during recovery
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-01-14 18:17:23 -08:00
David Zafman
aeba36a660 ceph-helpers.sh: Add flush_pg_stats() to wait_for_clean() to make it reliable
osd-scrub-repair.sh: Fixes for omap keys landing on different OSDs due to flush

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-01-14 18:17:23 -08:00
Igor Fedotov
1653bcca3e qa/standalone/scrub/osd-scrub-repair.sh: remove extents flag from object_info_t
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-01-08 20:10:16 +03:00
Kefu Chai
e7097593a7 qa/standalone: remove osd-map-max-advance related tests
this setting was removed in 8967b73

Fixes: http://tracker.ceph.com/issues/22596
Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-01-06 19:40:15 +08:00
Sage Weil
f33ab7e03a Merge remote-tracking branch 'gh/mimic-dev1'
2017-12-20 15:08:30 -06:00
Sage Weil
06b7707cee
Merge pull request #19456 from liewegas/wip-22373
qa/standalone/ceph-helpers: pass --verbose to ceph-disk
2017-12-19 11:55:07 -06:00
Kefu Chai
2ceff9eb4e qa/standalone: pass options using --<option-name>=<value>
not `--<option-name> <value>`, otherwise `ceph-authtool` would error
out:

$ CEPH_ARGS='--osd-map-max-advance 1000' bin/ceph-authtool --gen-print-key
bin/ceph-authtool: unexpected '1000'
usage: ceph-authtool keyringfile [OPTIONS]...
....

but using the `--<option-name>=<value>` syntax, it works:

$ CEPH_ARGS='--osd-map-max-advance=1000' bin/ceph-authtool --gen-print-key
AQBAhTNamf5+ABAASkAp/6IGq7LkUTEOMp/fgw==

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-12-15 16:19:15 +08:00
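As a small illustration of the option syntax that 2ceff9eb4e settles on, the sketch below builds a CEPH_ARGS-style string in the `--<option-name>=<value>` form. `build_ceph_args` is a hypothetical helper written for this example; it is not part of ceph-helpers.sh.

```python
def build_ceph_args(options):
    """Join options as --name=value so that each option and its value
    stay together in a single argument."""
    return " ".join(f"--{name}={value}" for name, value in options.items())


# Hypothetical usage with the option from the commit message.
print(build_ceph_args({"osd-map-max-advance": 1000}))
```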
Kefu Chai
4e621762ed qa/standalone/ceph-helpers.sh: silence ceph-disk DEPRECATION_WARNING
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-12-13 19:42:50 +08:00
Sage Weil
86dc162686 qa/standalone/ceph-helpers: pass --verbose to ceph-disk
Signed-off-by: Sage Weil <sage@redhat.com>
2017-12-12 12:56:45 -06:00
Sage Weil
4389b55435 Merge remote-tracking branch 'gh/mimic-dev1'
2017-12-11 22:27:35 -06:00
David Zafman
c4602c9ac8 test: ceph_objectstore_tool.py: Perform dump-import
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-12-08 18:50:04 -08:00
David Zafman
a8b8d541dd ceph-objectstore-tool: Add option "dump-import" to examine an export
Fixes: http://tracker.ceph.com/issues/22086

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-12-06 17:30:47 -08:00
Sage Weil
c6529ad93e qa/standalone/ceph-helpers.sh: fix full ratio ordering
Signed-off-by: Sage Weil <sage@redhat.com>
2017-11-29 16:07:12 -06:00
David Zafman
f94322066f
Merge pull request #18449 from dzafman/wip-zafman-misc
mark_unfound_lost fix and some other minor changes

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-10-27 10:21:25 -07:00
xie xingguo
f82228c4af osd/osd_type.cc: dump extents map object_info_t
which is good for bug hunting and diagnosing.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-10-24 11:46:23 +08:00
David Zafman
f918b1fac1 test: Remove bogus check in ceph_objectstore_tool.py
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-10-18 18:07:23 -07:00
David Zafman
69b5fc54fe test: Cleanup test-erasure-eio.sh code
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-10-18 11:12:14 -07:00
David Zafman
c2572bee3c test: Add replicated recovery/backfill test
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-10-18 11:12:14 -07:00
David Zafman
bb2bcb95f5 osd: Add new UnfoundBackfill and UnfoundRecovery pg transitions
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-10-18 11:01:39 -07:00
David Zafman
b9de5eec26 test: Test case that reproduces tracker 18162
recover_replicas: object added to missing set for backfill, but is not in recovering, error!

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-10-18 10:58:23 -07:00
huangjun
ee618a38a9 qa/standalone/osd/osd-mark-down: create pool to get updated osdmap faster
The mon sends the osdmap to random OSDs after we mark an OSD down, so the
down OSD may take longer than $sleep to get the updated osdmap if there is
no OSD ping between OSDs. So create a pool after setting up the cluster.

Signed-off-by: huangjun <huangjun@xsky.com>
2017-10-09 22:19:29 +08:00
Sage Weil
96ddf5c3a0 Merge pull request #17708 from liewegas/wip-pg
osd: initial minimal efforts to clean up PG interface
2017-10-08 21:47:49 -05:00
Sage Weil
b6a5c09dba ceph-objectstore-tool: remove rm-past-intervals op
The OSD doesn't rebuild this on demand anymore.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-10-06 13:08:18 -05:00
Sage Weil
886606bfd7 qa/standalone/scrub/osd-scrub-repair.sh: drop omap_digest flag
This is no longer set if we are backed by bluestore, which we are by
default.  See be078c8b7b

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-10-06 19:25:40 +08:00
Sage Weil
eaa350be95 Merge pull request #18094 from xiexingguo/wip-tracker-21618
qa/standalone/scrub/osd-scrub-repair.sh: add extents flag into object_info_t

Reviewed-by: Sage Weil <sage@redhat.com>
2017-10-05 11:14:01 -05:00
Sage Weil
15b63d6795 qa/standalone/scrub/osd-scrub-repair: no -y to diff
With -y you can't see the entire line when it is long, which is
needed to identify the diff failure in
http://tracker.ceph.com/issues/21618

Instead, let the interactive user specify the option if they want it.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-10-03 14:35:35 -05:00
xie xingguo
2470ab4aba qa/standalone/scrub/osd-scrub-repair.sh: add extents flag into object_info_t
Introduced-by: https://github.com/ceph/ceph/pull/15199
Fixes: http://tracker.ceph.com/issues/21618
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-10-03 21:14:53 +08:00
Kefu Chai
3dfe209499 Merge pull request #17955 from asomers/bin_bash2
test: fix bash path in shebangs (part 2)

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-09-30 12:13:35 +08:00
David Zafman
2f466f8b26 Merge pull request #17920 from dzafman/wip-21382
Erasure code recovery should send additional reads if necessary

Fixes: http://tracker.ceph.com/issues/21382

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-09-29 09:04:43 -07:00
David Zafman
1235810c2a osd: Allow recovery to send additional reads
For now it doesn't include non-acting OSDs
Added test for this case

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-09-28 23:31:18 -07:00
David Zafman
f92aa6c824 test: Allow modified options to existing setup functions
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-09-28 23:31:18 -07:00
David Zafman
43e3206de2 test: Use feature to get last array element
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-09-28 23:31:18 -07:00
Alan Somers
d1cbb90daa scripts: fix bash path in shebangs (part 2)
/bin/bash is a Linuxism.  Other operating systems install bash to
different paths.  Use /usr/bin/env in shebangs to find bash.

Signed-off-by: Alan Somers <asomers@gmail.com>
2017-09-25 17:20:40 -06:00
Sage Weil
ec2bdbc44c qa/standalone/scrub/osd-scrub-snaps: adjust test for lack of snapdir objects
The head_exists stuff is totally gone; those test failures go away.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-09-22 17:49:19 -04:00
Kefu Chai
73d4afbf8c Merge pull request #17747 from tchaikov/wip-qa
qa/standalone: respect $TEMPDIR

Reviewed-by: David Zafman <dzafman@redhat.com>
2017-09-20 23:08:47 +08:00
Kefu Chai
f27251432a Merge pull request #17785 from dzafman/wip-add-repair
test: Fix ceph-objectstore-tool usage check

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-09-20 12:35:16 +08:00
Sage Weil
6767f841e5 Merge pull request #17427 from liewegas/wip-pg-num-limits
mon/OSDMonitor: implement cluster pg limit

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-09-19 12:57:10 -05:00
David Zafman
0364ae104a test: Fix ceph-objectstore-tool usage check
Caused by: c7b7a1f04f

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-09-18 15:29:22 -07:00
Kefu Chai
085778b80a Merge pull request #17703 from dzafman/wip-misc
Erasure code read test and code cleanup

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-09-15 19:54:58 +08:00
Kefu Chai
279d2980fa qa/standalone/ceph-helpers.sh: pass btrfs subvolume options the right way
with the latest btrfs-progs, it complains with

$ sudo btrfs subvolume list . -t
btrfs subvolume list: too many arguments

so, we need to pass `-t` right after `list` subcommand.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-09-15 12:19:50 +08:00
Kefu Chai
0c47aa8217 qa: respect $TEMPDIR
ceph-disk and ceph-detect-init are built in $TEMPDIR if it's defined.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-09-15 12:19:50 +08:00
Sage Weil
c9ffeeebeb qa/standalone/mon/osd-pool-create: fewer pgs in test
This runs afoul of the new max pg per osd limit.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-09-14 12:10:13 -04:00
David Zafman
50e08b0a5d test: Add a removal test for erasure code read
Test feature: http://tracker.ceph.com/issues/14513

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-09-13 13:15:52 -07:00
Xie Xingguo
0e604b112e Merge pull request #17515 from xiexingguo/wip-data-digest
osd/PrimaryLogPG: do not set data digest for bluestore

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2017-09-13 18:31:10 +08:00
xie xingguo
afcb617dc9 osd/PrimaryLogPG: do not generate data digest for BlueStore by default
BlueStore enables CRC by default, so the data digest is redundant and
provides no additional benefit.

Turn this off by default, which is good for performance.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-09-13 12:17:16 +08:00
David Zafman
44f51024cc Merge pull request #17538 from dzafman/wip-21272
Add export and remove ceph-objectstore-tool command option

Reviewed-by: Sage Weil <sage@redhat.com>
2017-09-11 20:12:27 -07:00
David Zafman
3bb20f6d75 ceph-objectstore-tool: Make pg removal require --force
Add new export-remove to combine the 2 operations

Fixes: http://tracker.ceph.com/issues/21272

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-09-08 17:56:05 -07:00
David Zafman
49ca1fff7f ceph-objectstore-tool: Better messages for bad --journal-path
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-09-08 17:50:46 -07:00
David Zafman
3ac219df2d test: Fix ceph-objectstore-tool test for standalone and latest code
vstart.sh now defaults to bluestore, so specify filestore
Set environment for run-standalone.sh and cmake build
Create td/cot_dir as test directory
Crush output format change
Change dir into test directory
Give a little time after pool creation
Check for core files as ceph-helpers.sh does

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-09-08 16:53:53 -07:00
David Zafman
495c32fd31 test: Move ceph-objectstore-tool test to standalone
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-09-08 16:53:30 -07:00
Yuri Weinstein
0c2a139ee6 Merge pull request #17513 from Liuchang0812/wip-max-avail-in-df
mon: incorrect MAX AVAIL in "ceph df"

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2017-09-08 13:41:07 -07:00
Sage Weil
e2bc8883ba qa/standalone/mon/misc.sh: fix mon feature test
Signed-off-by: Sage Weil <sage@redhat.com>
2017-09-06 10:18:07 -04:00
liuchang0812
365558571c mon: incorrect MAX AVAIL in "ceph df"
Fixes: http://tracker.ceph.com/issues/21243

Signed-off-by: liuchang0812 <liuchang0812@gmail.com>
2017-09-06 21:09:29 +08:00
xie xingguo
2ee80aead8 mon/OSDMonitor: make 'osd crush class rename' idempotent
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-29 10:43:35 +08:00
Kefu Chai
30b5b4627c Merge pull request #16494 from asomers/bin_bash
misc: Fix bash path in shebangs

Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-08-27 10:14:14 +08:00
Sage Weil
5db94f4786 Merge pull request #17126 from xiexingguo/wip-nicenum
common/types: make numbers a bit nicer when displaying space usage

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-08-25 10:11:06 -05:00
Sage Weil
84465bf5a5 qa/standalone/scrub/osd-scrub-repair: fix grep pattern
PGMap shows

    ss << pg_sum.stats.sum.num_objects_unfound
       << "/" << pg_sum.stats.sum.num_objects << " objects unfound (" << b << "%)";

but we were grepping for "1/1 unfound" instead of "1/1 objects
unfound".

Introduced by fe81b7e3a5.

Fixes: http://tracker.ceph.com/issues/21127
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-25 11:03:44 -04:00
Kefu Chai
85b63670d9 Merge pull request #17039 from dzafman/wip-18206
osd: Fixes for osd_scrub_during_recovery handling

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-08-22 22:50:24 +08:00
xie xingguo
1ea448ac75 common/types: make numbers a bit nicer when displaying space usage
Was:
----------------------------------------------------------------------------
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    30911M     27050M        3861M         12.49
POOLS:
    NAME                  ID     USED        %USED     MAX AVAIL     OBJECTS
    rbd                   0      101216k      1.10         8913M        1178
    cephfs_data_a         1            0         0         8913M           0
    cephfs_metadata_a     2          892         0         8913M          21
----------------------------------------------------------------------------

Now:
----------------------------------------------------------------------------
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    30.2G     26.4G        3.77G         12.50
POOLS:
    NAME                  ID     USED      %USED     MAX AVAIL     OBJECTS
    rbd                   0      99.2M      1.10         8.70G        1180
    cephfs_data_a         1          0         0         8.70G           0
    cephfs_metadata_a     2        892         0         8.70G          21
----------------------------------------------------------------------------

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-22 12:33:10 +08:00
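The formatting change in 1ea448ac75 can be approximated with the sketch below. It is a Python illustration only, assuming binary (1024-based) units and roughly three significant digits as suggested by the "Now" table; `nice_size` is a hypothetical name, and the actual C++ code in common/types may round differently.

```python
def nice_size(n_bytes):
    """Format a byte count with a unit suffix and about three significant digits."""
    units = ["", "k", "M", "G", "T", "P", "E"]
    value = float(n_bytes)
    idx = 0
    # Scale down by 1024 until the value fits the current unit.
    while value >= 1024 and idx < len(units) - 1:
        value /= 1024
        idx += 1
    if idx == 0:
        return str(int(n_bytes))       # plain byte counts stay as integers
    if value >= 100:
        text = "%.0f" % value
    elif value >= 10:
        text = "%.1f" % value
    else:
        text = "%.2f" % value
    return text + units[idx]


# The megabyte values from the "Was" table, converted to bytes.
for mib in (30911, 27050, 3861, 8913):
    print(nice_size(mib * 1024 ** 2))
```

With these inputs the sketch yields the SIZE, AVAIL, RAW USED and MAX AVAIL strings shown in the "Now" table (30.2G, 26.4G, 3.77G, 8.70G).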
David Zafman
367c32c69a osd: Fixes for osd_scrub_during_recovery handling
Fixes: http://tracker.ceph.com/issues/18206

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-21 17:08:14 -07:00
David Zafman
9f3d970a0d tests: osd-scrub-snaps.sh minor cleanup
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-21 17:08:14 -07:00
David Zafman
4c949b6258 osd, rados: Adding ss_attr_missing and ss_attr_corrupt errors to list-inconsistent-obj
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-11 11:37:32 -07:00
David Zafman
5f58301a13 osd, rados: Improve size scrub error handling
Fixes: http://tracker.ceph.com/issues/20243

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-11 11:37:32 -07:00
David Zafman
8ad4b29113 osd: Add whether shard is primary in list-inconsistent-obj
Add new field in the client interface
Update test case

Fixes: http://tracker.ceph.com/issues/18836

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-11 11:37:03 -07:00
Yuri Weinstein
11c57701c6 Merge pull request #16961 from xiexingguo/wip-class-rename
crush: "osd crush class rename" support

Reviewed-by: Sage Weil <sage@redhat.com>
2017-08-11 06:18:57 -07:00
Sage Weil
d2d9b41275 Merge pull request #16709 from dzafman/wip-standalone
qa/standalone: misc fixes
2017-08-10 21:33:43 -05:00
xie xingguo
d792e8d528 crush: "osd crush class rename" support
In 076a6abd80 I removed the 'class rename' command, thinking it was
useless, but I was wrong.

Consider the following use case:
(1) randomly choose some OSDs (e.g., from different hosts) and reserve them for private use,
    say, by grouping them into 'pool1'
(2) ceph osd crush set-device-class pool1 'OSDs from (1)'
(3) ceph osd crush rule create-replicated rule_for_pool1 default host pool1
(4) ceph osd pool rename pool1 pool2
(5) ceph osd crush class rename pool1 pool2

In this use case, we need to change a pool name safely, without any risk
of data migration. That is why the 'osd crush class rename' command
is still needed.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-11 08:32:39 +08:00
David Zafman
e24ac51a82 qa: Fix broken test_activate_osd() due to missing space
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 12:37:05 -07:00
David Zafman
ae2c5331fb qa: Fix races with waiting for scrubs
The trigger_scrub sets the last_scrub_stamp backwards to
force a scheduled scrub.  In a small window this stamp could get propagated
to the mgr.  A test failure occurred because wait_for_scrub() was confused
by seeing a backward moving date.

The most critical change is having wait_for_scrub() make sure that the
date advances past the previous in value.

A test failed because the random backoff kept delaying the triggered scrub, so
set osd_scrub_backoff throughout.

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 12:37:05 -07:00
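The wait_for_scrub() change described in ae2c5331fb can be sketched as below. This is a Python illustration of the idea only; the real helper is a shell function in ceph-helpers.sh, and `get_stamp`, `previous_stamp`, `timeout`, `interval` and `sleep` are hypothetical parameters introduced for this example.

```python
import time


def wait_for_scrub(get_stamp, previous_stamp, timeout=300, interval=1,
                   sleep=time.sleep):
    """Poll until the scrub stamp is strictly later than previous_stamp.

    A stamp that differs from the previous one but is earlier (for example
    one set backwards to force a scrub) does not count as a completed scrub.
    """
    waited = 0
    while waited < timeout:
        if get_stamp() > previous_stamp:
            return True
        sleep(interval)
        waited += interval
    return False


# Hypothetical usage: the stamp first moves backwards (3), then advances (11).
stamps = iter([10, 3, 3, 11])
done = wait_for_scrub(lambda: next(stamps), previous_stamp=10,
                      timeout=10, sleep=lambda seconds: None)
print(done)
```

Comparing with "greater than" rather than "not equal" is the point of the sketch: a merely different stamp is not enough to conclude that the scrub ran.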
David Zafman
dddda523d1 qa: Testing of ceph-helpers.sh, teardown on fail to dump logs, save cores
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 12:37:05 -07:00
David Zafman
1fe6cb0f02 osd: Avoid confusion over legacy snaps when head_exists corrupt
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 12:37:05 -07:00
David Zafman
229de6b71d qa: Add support for core dumps
Save core dumps when running tests locally
Dump logs to output whenever cores seen

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 12:37:04 -07:00
David Zafman
4db5124e1a qa: For FreeBSD skip osd-dup.sh because there is no bluestore
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 08:30:47 -07:00
David Zafman
61bfd236ad qa: Raise mon-data-avail-warn to pass tests with less space
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 08:30:47 -07:00
David Zafman
574b3cd3d4 qa: Add common generalized inject_eio() to ceph-helpers.sh
Retry for a while to allow pool to appear

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 08:30:47 -07:00
David Zafman
3988ebab43 qa: osd-scrub-repair.sh handle older versions of jq
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 08:30:47 -07:00
David Zafman
2a679a36de qa: Add support for specifying sub-tests with run-standalone.sh
Fix test-ceph-helpers.sh to pass additional arguments on

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 08:30:47 -07:00
David Zafman
69413618a0 qa: ceph-helpers.sh fixes
Add missing teardown to cleanup test directory
Fix pgid due to elimination of initial default pool
Testing could never fail because the run_tests return value was ignored

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 08:30:47 -07:00
xie xingguo
87952fc68d crush: automatically kill dead classes
If a class is no longer referenced by any devices or crush rules,
it shall be considered dead.

This patch makes Ceph automatically recycle those dead classes,
so the user does not need to explicitly call 'class rm', which is unsafe
and annoying.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-05 18:53:39 +08:00
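The notion of a dead class in 87952fc68d can be illustrated with a short sketch. It is written in Python under simplifying assumptions: `dead_classes` and the dictionary-based inputs are hypothetical and do not mirror the CrushWrapper data structures.

```python
def dead_classes(all_classes, device_classes, rule_classes):
    """Return the classes referenced by neither a device nor a crush rule.

    device_classes maps a device name to its class; rule_classes maps a
    rule name to the class the rule takes from.
    """
    in_use = set(device_classes.values()) | set(rule_classes.values())
    return set(all_classes) - in_use


# Hypothetical usage: 'nvme' has no devices and no rules, so it is dead.
classes = ["hdd", "ssd", "nvme"]
devices = {"osd.0": "ssd", "osd.1": "ssd"}
rules = {"myrule": "hdd"}
print(dead_classes(classes, devices, rules))
```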
xie xingguo
b863883ca7 crush: remove 'class rm' command
The current version is broken. E.g., it should only remove a class
that is not referenced by any device.

Since we now create new classes automatically, we should also recycle
dead classes automatically, which makes this command unnecessary.
(It is also odd to keep 'class rm' without keeping the
 corresponding 'class create' command.)

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-05 18:52:30 +08:00
xie xingguo
f1d80ff750 crush: do not automatically recycle class for 'rm-device-class'
This will prevent the current crush rule from referencing a non-existent
shadow tree and hence avoid a coredump such as below:

 0> 2017-08-05 09:54:19.943349 7f73887d6700 -1 /clove/vm/xxg/rpm/ceph/rpmbuild/BUILD/ceph-12.1.2.1/src/crush/CrushWrapper.cc: In function 'int CrushWrapper::get_rule_weight_osd_map(unsigned
 int, std::map<int, float>*)' thread 7f73887d6700 time 2017-08-05 09:54:19.941291
/clove/vm/xxg/rpm/ceph/rpmbuild/BUILD/ceph-12.1.2.1/src/crush/CrushWrapper.cc: 1631: FAILED assert(b)

 ceph version 12.1.2.1-11-gd0f812a (d0f812a3a757b319c26794f558b57770663ab324) luminous (rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f7398b66ea0]
 2: (CrushWrapper::get_rule_weight_osd_map(unsigned int, std::map<int, float, std::less<int>, std::allocator<std::pair<int const, float> > >*)+0x54e) [0x7f7398daac4e]
 3: (PGMap::get_rule_avail(OSDMap const&, int) const+0x68) [0x7f73989a6428]
 4: (PGMap::get_rules_avail(OSDMap const&, std::map<int, long, std::less<int>, std::allocator<std::pair<int const, long> > >*) const+0x35c) [0x7f73989b748c]
 5: (PGMap::encode_digest(OSDMap const&, ceph::buffer::list&, unsigned long) const+0x16) [0x7f73989b7506]
 6: (DaemonServer::send_report()+0x2a4) [0x7f73989f5474]
 7: (DaemonServer::maybe_ready(int)+0x2f9) [0x7f73989f6129]
 8: (DaemonServer::ms_dispatch(Message*)+0xce) [0x7f73989ff68e]
 9: (DispatchQueue::entry()+0x792) [0x7f7398dd2a22]
 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f7398c1429d]
 11: (()+0x7df3) [0x7f739640cdf3]
 12: (clone()+0x6d) [0x7f73954f23ed]

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-05 18:44:59 +08:00
David Zafman
99ad4bbd91 qa: Add create_pool() which sleeps 1 second like python variant
wait_for_clean() can miss the new pool if it races with pool create.

Fixes: http://tracker.ceph.com/issues/20465

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-04 06:38:09 -07:00
David Zafman
b20dfc2864 qa: Add special test_failure.sh script (not run by default)
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-04 06:38:09 -07:00
David Zafman
8c768050a5 qa: run-standalone.sh improvements
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-04 06:38:09 -07:00
David Zafman
4314cdd666 qa: Dump logs after daemons are killed to make sure everything is flushed
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-04 06:38:09 -07:00
xie xingguo
734b5f2c60 test/osd-fast-mark-down: enable 'osd-class-update-on-start' by default
116cf759c8
will now hide all shadow trees (roots), so this is no longer applicable
(it is actually misleading).

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-03 17:26:26 -04:00
Sage Weil
41bcf2fee5 Merge pull request #16281 from badone/wip-PG-cluster-log-audit
osd: Log audit

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-07-27 16:25:30 -05:00
Alan Somers
3aae5ca6fd scripts: fix bash path in shebangs
/bin/bash is a Linuxism.  Other operating systems install bash to
different paths.  Use /usr/bin/env in shebangs to find bash.

Signed-off-by: Alan Somers <asomers@gmail.com>
2017-07-27 13:24:26 -06:00
Sage Weil
e469a8044c qa/standalone/crush/crush-classes: fix test
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-27 12:25:25 -04:00
Sage Weil
380de3395f qa/standalone/README
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-27 12:24:52 -04:00
xie xingguo
076a6abd80 crush: kill 'class rename'
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:40:50 +08:00
xie xingguo
a27fd9d25c crush: kill "class create" command
Device classes are now managed automatically.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:40:17 +08:00
xie xingguo
edd8930346 crush: allow "crush class rm" to automatically recycle shadow tree(s)
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:39:41 +08:00
xie xingguo
9d908c14f6 crush: rm-device-class support
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:39:08 +08:00
xie xingguo
32fb548797 crush: guard set-device-class
If a device is already bound to a class,
do not allow its class to be changed silently.
Require the user to call rm-device-class first.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:34:08 +08:00
xie xingguo
e4e83a0dd7 crush: fix class_is_in_use()
A class is considered in use only if it is referenced by
any of the existing crush rules.

The patch also makes the output more human readable. For example:

./bin/ceph osd crush rule create-replicated myrule default host ssd
./bin/ceph osd crush class rm ssd
Error EBUSY: class 'ssd' still referenced by crush_rule 'myrule'

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:31:39 +08:00
xie xingguo
f3a3180cca crush: rebuild shadow tree on "crush create-or-move/move"
This patch solves the problem below:

./bin/ceph osd crush move osd.0 root=foo rack=foo-rack host=foo-host
moved item id 0 name 'osd.0' to location {host=foo-host,rack=foo-rack,root=foo} in crush map

 ./bin/ceph osd crush rule create-replicated foo-rule foo host ssd
Error EINVAL: root foo has no devices with class ssd

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:30:59 +08:00
xie xingguo
10bf2a633f crush: fix "crush create-or-move/move" would drop osd's class
Was:
     ./bin/ceph osd tree
    ID CLASS WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRI-AFF
    -1       3.00000 root default
    -2       3.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
     0   ssd 1.00000         osd.0                                         up  1.00000 1.00000
     1   ssd 1.00000         osd.1                                         up  1.00000 1.00000
     2   ssd 1.00000         osd.2                                         up  1.00000 1.00000

    ./bin/ceph osd crush move osd.0 root=foo rack=foo-rack  host=foo-host
    moved item id 0 name 'osd.0' to location {host=foo-host,rack=foo-rack,root=foo} in crush map

     ./bin/ceph osd tree
    ID CLASS WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRI-AFF
    -7       1.00000 root foo
    -6       1.00000     rack foo-rack
    -5       1.00000         host foo-host
     0       1.00000             osd.0                                     up  1.00000 1.00000
    -1       2.00000 root default
    -2       2.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
     1   ssd 1.00000         osd.1                                         up  1.00000 1.00000
     2   ssd 1.00000         osd.2                                         up  1.00000 1.00000

    Now:
    ./bin/ceph osd tree
    ID CLASS WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRI-AFF
    -1       3.00000 root default
    -2       3.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
     0   ssd 1.00000         osd.0                                         up  1.00000 1.00000
     1   ssd 1.00000         osd.1                                         up  1.00000 1.00000
     2   ssd 1.00000         osd.2                                         up  1.00000 1.00000

    ./bin/ceph osd crush move osd.0 root=foo rack=foo-rack  host=foo-host
    moved item id 0 name 'osd.0' to location {host=foo-host,rack=foo-rack,root=foo} in crush map

    ./bin/ceph osd tree
    ID CLASS WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRI-AFF
    -7       1.00000 root foo
    -6       1.00000     rack foo-rack
    -5       1.00000         host foo-host
     0   ssd 1.00000             osd.0                                     up  1.00000 1.00000
    -1       2.00000 root default
    -2       2.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
     1   ssd 1.00000         osd.1                                         up  1.00000 1.00000
     2   ssd 1.00000         osd.2                                         up  1.00000 1.00000

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:30:26 +08:00
Brad Hubbard
f8acc53d82 osd: Log audit
Review current log messages for consistency, accuracy and necessity as
part of the usability initiative. First in a series.

Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
2017-07-26 17:34:28 +10:00
Sage Weil
766229b034 qa/standalone/scrub: separate scrub/repair tests from rest of osd/
They are slow.  Run them separately.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:50 -04:00
Sage Weil
cabad62242 qa/standalone/ceph-helpers: factor rbd pool create out of run_mon
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:50 -04:00
Sage Weil
b12bebe432 qa/standalone/mon/osd-pool-create: stop testing create pool output
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:49 -04:00
Sage Weil
71ea171604 qa: move ceph-helpers and misc src/test/*.sh tests to qa/standalone
- stop running via make check
- add teuthology yamls to run them
- disable ceph_objectstore_tool.py for now (too slow for make check, and
we can't use vstart in teuthology via a package install)
- drop cephtool tests since those are already covered by other teuthology
tests
- leave a handful of (fast!) ceph-helpers tests for make check for minimal
integration tests.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:49 -04:00