4680 Commits

Author SHA1 Message Date
Sage Weil
13d7c4f4ec Merge PR #26898 into nautilus
* refs/pull/26898/head:
	osd/PG: invalidate PG if merging with unexpected version
	osd,mon: include more pg merge metadata in pg_pool_t
	qa/standalone/osd/pg-split-merge.sh: reproduce pg merge problem with empty pgs
	osd: add osd_debug_no_{acting_change,purge_strays}

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-03-14 22:37:18 -05:00
Patrick Donnelly
7de8cb405c
Merge PR #26935 into nautilus
* refs/pull/26935/head:
	qa: extend MDS heartbeat grace for valgrind

Reviewed-by: Sage Weil <sage@redhat.com>
2019-03-13 20:37:03 -07:00
Patrick Donnelly
505a05f351
Merge PR #26916 into nautilus
* refs/pull/26916/head:
	qa: ignore MON_DOWN for volume-client testing

Reviewed-by: Sage Weil <sage@redhat.com>
2019-03-13 20:31:01 -07:00
Sage Weil
4bb4f7a891 Merge PR #26894 into nautilus
* refs/pull/26894/head:
	qa/standalone/erasure-code/test-erasure-code: adjust test to avoid m=0
	erasure-code: ensure m >= 1
	mon/OSDMonitor: set ec min_size to k + min(1, m - 1)

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-03-13 22:07:45 -05:00
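
The min_size rule named in the merge above can be checked with a few lines of arithmetic. This is a minimal sketch, not the OSDMonitor code: the function name `ec_min_size` is hypothetical, and the `m >= 1` guard simply mirrors the companion commit title "erasure-code: ensure m >= 1".

```python
def ec_min_size(k: int, m: int) -> int:
    """Return k + min(1, m - 1), the formula quoted in the commit title."""
    if m < 1:
        # Assumption: reject m=0 here, mirroring "erasure-code: ensure m >= 1".
        raise ValueError("erasure-code profile needs m >= 1")
    return k + min(1, m - 1)


print(ec_min_size(2, 1))  # 2 + min(1, 0)
print(ec_min_size(4, 2))  # 4 + min(1, 1)
```

With m=0 the same formula would evaluate to k - 1, which is one way to see why a profile without coding chunks is a problem for it.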
Sage Weil
52d5797c3d qa/standalone/erasure-code/test-erasure-code: adjust test to avoid m=0
_DD is k=2 m=0, which we don't allow.  Switch it to cDD.

I confess I don't fully understand why this was _DD to begin with, but
I'm pretty sure mapping is there to control the order of results so that
it can be mapped to the CRUSH rule output sanely, and the coding portion
is not relevant to the test.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-13 12:46:50 -05:00
Patrick Donnelly
7b520755ce
qa: extend MDS heartbeat grace for valgrind
Valgrind makes the MDS slowwwww. The newish mds_heartbeat_grace config allows
us to keep sending beacons to the mons even if the internal heartbeat is slow.
This avoids the laggy messages which are useful to grep for unrelated messaging
issues.

Fixes: http://tracker.ceph.com/issues/38723
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-03-13 09:18:32 -07:00
Sage Weil
96b837830c Merge PR #26920 into master
* refs/pull/26920/head:
	qa/tasks/mgr/test_module_selftest: fix localized value test
	mgr/BaseMgrStandbyModule: parse prefix properly

Reviewed-by: Volker Theile <vtheile@suse.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
2019-03-13 08:16:20 -05:00
Sage Weil
ebdd003bf4 qa/tasks/mgr/test_module_selftest: fix localized value test
When mgr/selftest/testkey = foo and mgr/selftest/x/testkey is not set,
then get_localized() should return foo.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-13 07:11:47 -05:00
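
The fallback described in the commit above can be sketched as a two-step lookup. This is an illustration only: the dictionary store and the four-argument `get_localized` signature are assumptions for the example, not the ceph-mgr API.

```python
from typing import Dict, Optional


def get_localized(store: Dict[str, str], module: str, mgr_id: str, key: str) -> Optional[str]:
    """Prefer mgr/<module>/<mgr_id>/<key>, else fall back to mgr/<module>/<key>."""
    localized_key = f"mgr/{module}/{mgr_id}/{key}"
    if localized_key in store:
        return store[localized_key]
    return store.get(f"mgr/{module}/{key}")


# Only the global key is set, so the lookup for mgr "x" falls back to it.
store = {"mgr/selftest/testkey": "foo"}
print(get_localized(store, "selftest", "x", "testkey"))
```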
Sage Weil
0eaad2d8d8 Merge PR #26886 into master
* refs/pull/26886/head:
	crush/CrushWrapper: ensure crush_choose_arg_map.size == max_buckets

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-03-13 06:56:16 -05:00
David Zafman
3ab9f38799
Merge pull request #26899 from dzafman/wip-38678
Minor cleanups in tests and log output

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-03-12 12:41:40 -07:00
Sage Weil
ab0a652826 erasure-code: ensure m >= 1
Fixes: http://tracker.ceph.com/issues/38682
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-12 13:12:58 -05:00
Patrick Donnelly
4f3df2cc82
Merge PR #26893 into master
* refs/pull/26893/head:
	qa: unmount clients prior to marking fs down

Reviewed-by: Zheng Yan <zyan@redhat.com>
2019-03-12 10:47:53 -07:00
Patrick Donnelly
1ceadf0f07
qa: ignore MON_DOWN for volume-client testing
The test restarts the monitors.

Fixes: http://tracker.ceph.com/issues/38704
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-03-12 10:38:55 -07:00
Patrick Donnelly
c859be5022
Merge PR #26892 into master
* refs/pull/26892/head:
	qa: stop testing simple messenger in CephFS suites

Reviewed-by: Sage Weil <sage@redhat.com>
2019-03-12 10:26:27 -07:00
Sage Weil
ccda488815 crush/CrushWrapper: ensure crush_choose_arg_map.size == max_buckets
The crush/builder.c crush_add_bucket method resizes the max_buckets array
by a power of 2 when it has to expand, but the code in CrushWrapper was
assuming that if the array grew the pos for the new bucket would be the
last position in the new array.  This led to a situation where the
crush_choose_arg_map args array size didn't match max_buckets, and
eventually caused a crash.

Fixes: http://tracker.ceph.com/issues/38664
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-12 11:26:43 -05:00
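
The size mismatch described in the commit above can be reproduced with a toy model. This sketch assumes a simplified bucket array that doubles when full; the helper names (`add_bucket`, `resize_args_buggy`, `resize_args_fixed`) are hypothetical and do not correspond to functions in crush/builder.c or CrushWrapper.

```python
from typing import List, Optional


def add_bucket(buckets: List[Optional[str]]) -> int:
    """Store a new bucket in the first free slot, doubling the array if it is full."""
    if None in buckets:
        pos = buckets.index(None)
    else:
        pos = len(buckets)
        buckets.extend([None] * max(len(buckets), 1))  # grow to twice the size
    buckets[pos] = f"bucket{pos}"
    return pos


def resize_args_buggy(args: List[dict], pos: int) -> List[dict]:
    """Wrongly assume the new bucket occupies the last slot of the grown array."""
    return args + [{} for _ in range(pos + 1 - len(args))]


def resize_args_fixed(args: List[dict], max_buckets: int) -> List[dict]:
    """Keep one args entry per bucket slot, whether the slot is used or not."""
    return args + [{} for _ in range(max_buckets - len(args))]


buckets: List[Optional[str]] = ["bucket0", "bucket1"]  # full: max_buckets == 2
args: List[dict] = [{}, {}]
pos = add_bucket(buckets)                              # array doubles to 4 slots
print(pos, len(buckets))                               # 2 4
print(len(resize_args_buggy(args, pos)))               # 3: no longer matches max_buckets
print(len(resize_args_fixed(args, len(buckets))))      # 4: matches max_buckets
```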
Sage Weil
fb915c4805 osd/PG: invalidate PG if merging with unexpected version
If the source or target PG version is 0'0, we may silently take the max
of the source and target and still leave the PG complete.  This
specifically can happen with an empty PG, as seen with bug 38655.  In
theory we could encounter one of the PGs with some other last_update
that doesn't match what we expect.  If that ever happens, make sure the
result is incomplete so that backfill can clean up.

Additionally check that the pool metadata for the last merge matches the
PGs at all.  This could mismatch if we have an osdmap gap and are forced
to do some merge without merge info at all... in which case we should
definitely invalidate: there should be newer copies of the PG(s), and we
have no idea whether the PGs we are merging are what we want.  If this is
some disaster recovery situation, an operator is always free to use
ceph-objectstore-tool to re-mark a PG complete (at their own peril!).

Fixes: http://tracker.ceph.com/issues/38655
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-12 10:08:46 -05:00
David Zafman
51a45e796e qa/test-erasure-code.sh: Don't grep entire bluestore directory
Bluestore caused grep crash with "grep: memory exhausted" due to
size of "block" storage.

Fixes: http://tracker.ceph.com/issues/38678

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-11 18:47:29 -07:00
David Zafman
d4915ee503 qa: Don't create rbd pool because it creates an object
This also reverts commit 10b9626ea7.

Fixes: http://tracker.ceph.com/issues/38631

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-11 16:57:51 -07:00
David Zafman
8114a2619b qa: Can't wait for clean when there aren't any pools/PGs.
Fixes: http://tracker.ceph.com/issues/38678

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-11 16:02:48 -07:00
Sage Weil
f978b27d2b qa/standalone/osd/pg-split-merge.sh: reproduce pg merge problem with empty pgs
This reproduces http://tracker.ceph.com/issues/38655

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-11 17:10:28 -05:00
Volker Theile
bc9643657a mgr: Fix broken get_localized_module_option function
Fixes: https://tracker.ceph.com/issues/38560

Signed-off-by: Volker Theile <vtheile@suse.com>
2019-03-11 17:25:18 +01:00
Patrick Donnelly
e7e4eea3a6
Merge PR #26818 into master
* refs/pull/26818/head:
	qa/cephfs: relax min_caps_per_client check

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-03-11 09:21:43 -07:00
Patrick Donnelly
9aaf6118a4
qa: unmount clients prior to marking fs down
Evicted RHEL7.5 clients may hang.

Fixes: http://tracker.ceph.com/issues/38677
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-03-11 09:15:37 -07:00
Patrick Donnelly
897a1f7385
qa: stop testing simple messenger in CephFS suites
Simple messenger is on its way out and it doesn't work with msgr2.

Fixes: http://tracker.ceph.com/issues/38676
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-03-11 09:06:32 -07:00
Ilya Dryomov
7615012224 Merge PR #26858 into master
* refs/pull/26858/head:
	qa: krbd deep-flatten test
	qa/suites/krbd: enable deep-flatten feature

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-03-11 14:38:01 +01:00
Patrick Donnelly
58039163e3
Merge PR #26859 into master
* refs/pull/26859/head:
	qa: ignore slow metadata io wrn during osd thrash

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-03-10 10:38:16 -07:00
Sage Weil
2ad02fbfe3 qa/standalone/erasure-code/test-erasure-eio.sh: still need to create rbd pool
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-09 09:34:49 -06:00
Sage Weil
10b9626ea7 qa/standalone/scrub/osd-scrub-repair: fix unfound grep
It's now "1/2 unfound":

             1/2 objects unfound (50.000%)

..presumably due to the rbd pool init creating the rbd_directory.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-08 18:23:48 -06:00
Sage Weil
30fc7f5e97 qa/standalone/ceph-helpers: fix test_wait_for_clean
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-08 18:07:10 -06:00
Sage Weil
1e2b0c7252 qa/standalone/ceph-helpers.sh: fix test_run_mon
- Only create each osd once
- forget the first osdmap dump test; it's pointless

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-08 17:43:00 -06:00
Sage Weil
bf74c1adc4 qa/standalone/osd/osd-rep-recov-eio: fix better
- no need for the default pool size
- no initial osds or it will collide with setup_osds later
- no need for rbd pool at all

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-08 17:41:11 -06:00
Patrick Donnelly
5abcc32ff6
qa: ignore slow metadata io wrn during osd thrash
Fixes: http://tracker.ceph.com/issues/38651
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-03-08 10:49:10 -08:00
Ilya Dryomov
6892da1c0b qa: krbd deep-flatten test
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-08 18:14:37 +01:00
Sage Weil
3e83a6e960 Merge PR #26823 into master
* refs/pull/26823/head:
	qa/suites: disable valgrind leak checks on ceph-mgr
	mgr: skip shutdown and exit

Reviewed-by: Tim Serong <tserong@suse.com>
Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-03-08 09:04:21 -06:00
Sage Weil
62136d381a Merge PR #26794 into master
* refs/pull/26794/head:
	mon/MgrMonitor: only try to update always_on_modules if >= NAUTILUS
	qa/standalone/mon/msgr-v2-transition: add some tests for enabling msgr v2
	mon/MonmapMonitor: add 'ceph mon set-addrs <name> <addrvec>' command
	Revert "mon/MonClient: disable ms_bind_msgr2 if NAUTILUS feature not set"
	mon/OSDMonitor: use legacy_equals to compare osd addrs
	msg/msg_types: make legacy_equals() symmetrical
	mon/MDSMonitor: stop using get_orig_source_inst()

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-03-07 22:12:52 -06:00
Sage Weil
4d33b6d56a Merge PR #26770 into master
* refs/pull/26770/head:
	qa/standalone/osd/osd-force-create-pg: create more pgs
	qa/standalone: make sure an osd is running before create_rbd_pool

Reviewed-by: Mykola Golub <mgolub@suse.com>
2019-03-07 22:10:12 -06:00
Sage Weil
c939eefa16 qa/standalone/mon/msgr-v2-transition: add some tests for enabling msgr v2
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-07 16:35:35 -06:00
Sage Weil
ec7c9976d7 Merge PR #26802 into master
* refs/pull/26802/head:
	qa/suites/upgrade/mimic-x/parallel: run master rados/test.sh

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-03-07 13:49:58 -06:00
Sage Weil
e79dc454db qa/suites: disable valgrind leak checks on ceph-mgr
We've disabled the "clean" shutdown in ceph-mgr due to
https://tracker.ceph.com/issues/38621

Until then, no valgrind leak checks!

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-07 13:03:28 -06:00
Sage Weil
4c5ed29925 Merge PR #26764 into master
* refs/pull/26764/head:
	mgr: 'osd df' by specified class or (crush) name
	mon/OSDMonitor: add 'osd crush get-device-class' command
2019-03-07 08:52:56 -06:00
Yan, Zheng
8e81bd74c5 qa/cephfs: relax min_caps_per_client check
New kernel clients proactively release caps, so the caps count can go below
mds_min_caps_per_client.

Fixes: http://tracker.ceph.com/issues/38270
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
2019-03-07 21:32:20 +08:00
Ilya Dryomov
e50aa559f6 Merge PR #26775 into master
* refs/pull/26775/head:
	qa/suites/krbd/wac: bluestore snippet is placed incorrectly

Reviewed-by: Mike Christie <mchristi@redhat.com>
2019-03-07 12:19:36 +01:00
Sage Weil
a376a151ea qa/suites/upgrade/mimic-x/parallel: run master rados/test.sh
We renamed ceph_test_rados_api_tier to add _pp, so the mimic version doesn't
work.  And in any case, at this stage the client host has master installed.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-06 16:50:12 -06:00
Sage Weil
b59ff3860f qa/standalone/osd/osd-force-create-pg: create more pgs
Avoid warnings about too few pgs.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-06 16:27:56 -06:00
Sage Weil
cba0483b09 qa/standalone: make sure an osd is running before create_rbd_pool
'rbd pool init' now does IO.  Drop the pool, or change the pool size to 1.

Fixes: http://tracker.ceph.com/issues/38585
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-06 16:27:56 -06:00
Sebastian Wagner
7ba6bece41
Merge pull request #26633 from jtlayton/wip-nfs-scale
mgr/orchestrator: Allow the orchestrator to scale the NFS server count

Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com>
2019-03-06 19:08:48 +01:00
Jeff Layton
a256735d4c mgr/orchestrator: allow scaling the NFS server count up and down
Add a new 'ceph orchestrator nfs update' command that will take the
NFS clustername and a new count as arguments. That will get translated
to a StatelessServiceSpec and passed to update_stateless_service.

Also, add the necessary stubs to the test_orchestrator and the CLI
QA test.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2019-03-06 07:15:14 -05:00
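
The command flow described in the commit above (cluster name and count in, a service spec out, passed to `update_stateless_service`) can be sketched as follows. Everything here is a local stand-in written for illustration: the `StatelessServiceSpec` fields, the `FakeOrchestrator` class, and the `nfs_update` helper are assumptions, not the real mgr/orchestrator definitions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StatelessServiceSpec:
    """Stand-in spec; the field names are assumed for this example."""
    name: str
    count: int


class FakeOrchestrator:
    """Records update requests instead of talking to a real backend."""

    def __init__(self) -> None:
        self.updates: List[Tuple[str, StatelessServiceSpec]] = []

    def update_stateless_service(self, service_type: str, spec: StatelessServiceSpec) -> str:
        self.updates.append((service_type, spec))
        return f"{service_type}.{spec.name}: requested count {spec.count}"


def nfs_update(orch: FakeOrchestrator, cluster_name: str, count: int) -> str:
    """Translate the two CLI arguments into a spec and hand it to the orchestrator."""
    spec = StatelessServiceSpec(name=cluster_name, count=count)
    return orch.update_stateless_service("nfs", spec)


orch = FakeOrchestrator()
print(nfs_update(orch, "mynfs", 3))
```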
xie xingguo
af02d1031d mgr: 'osd df' by specified class or (crush) name
For large clusters, we use device classes to isolate storage pools.
The existing 'osd df' output turns out to be too noisy, say, if
you care about only a single storage pool with osds possibly spanning
all hosts.

With this change you are now able to do 'osd df' by class (or by pool,
if you simply use classes to separate different pools), or by a specified
crush bucket name you are currently interested in, which is much more
convenient.

Some examples:
```
$ bin/ceph osd df tree
ID CLASS WEIGHT  REWEIGHT SIZE   RAW USE DATA    OMAP META  AVAIL   %USE  VAR  PGS STATUS TYPE NAME
-1       0.05878        - 60 GiB 6.4 GiB  23 MiB  0 B 6 GiB  54 GiB 10.60 1.00   -        root default
-3       0.02939        - 30 GiB 3.2 GiB  12 MiB  0 B 3 GiB  27 GiB 10.60 1.00   -            host ceph11
 3   aaa 0.00980  1.00000 10 GiB 1.1 GiB 3.9 MiB  0 B 1 GiB 9.0 GiB 10.60 1.00  56     up         osd.3
 4   bbb 0.00980  1.00000 10 GiB 1.1 GiB 3.9 MiB  0 B 1 GiB 9.0 GiB 10.60 1.00  58     up         osd.4
 5   ccc 0.00980  1.00000 10 GiB 1.1 GiB 3.9 MiB  0 B 1 GiB 9.0 GiB 10.60 1.00  60     up         osd.5
-5       0.02939        - 30 GiB 3.2 GiB  12 MiB  0 B 3 GiB  27 GiB 10.60 1.00   -            host ceph12
 0   aaa 0.00980  1.00000 10 GiB 1.1 GiB 3.9 MiB  0 B 1 GiB 9.0 GiB 10.60 1.00  50     up         osd.0
 1   bbb 0.00980  1.00000 10 GiB 1.1 GiB 3.9 MiB  0 B 1 GiB 9.0 GiB 10.60 1.00  61     up         osd.1
 2   ccc 0.00980  1.00000 10 GiB 1.1 GiB 3.9 MiB  0 B 1 GiB 9.0 GiB 10.60 1.00  51     up         osd.2
                    TOTAL 60 GiB 6.4 GiB  23 MiB  0 B 6 GiB  54 GiB 10.60
MIN/MAX VAR: 1.00/1.00  STDDEV: 0

$ bin/ceph osd df tree class aaa
ID CLASS WEIGHT  REWEIGHT SIZE   RAW USE DATA    OMAP META  AVAIL   %USE  VAR  PGS STATUS TYPE NAME
-1       0.05878        - 20 GiB 2.1 GiB 7.8 MiB  0 B 2 GiB  18 GiB 10.60 1.00   -        root default
-3       0.02939        - 10 GiB 1.1 GiB 3.9 MiB  0 B 1 GiB 9.0 GiB 10.60 1.00   -            host ceph11
 3   aaa 0.00980  1.00000 10 GiB 1.1 GiB 3.9 MiB  0 B 1 GiB 9.0 GiB 10.60 1.00  56     up         osd.3
-5       0.02939        - 10 GiB 1.1 GiB 3.9 MiB  0 B 1 GiB 9.0 GiB 10.60 1.00   -            host ceph12
 0   aaa 0.00980  1.00000 10 GiB 1.1 GiB 3.9 MiB  0 B 1 GiB 9.0 GiB 10.60 1.00  50     up         osd.0
                    TOTAL 20 GiB 2.1 GiB 7.8 MiB  0 B 2 GiB  18 GiB 10.60
MIN/MAX VAR: 1.00/1.00  STDDEV: 0

$ bin/ceph osd df tree name ceph11
ID CLASS WEIGHT  REWEIGHT SIZE   RAW USE DATA    OMAP META  AVAIL   %USE  VAR  PGS STATUS TYPE NAME
-3       0.02939        - 30 GiB 3.2 GiB  12 MiB  0 B 3 GiB  27 GiB 10.60 1.00   -            host ceph11
 3   aaa 0.00980  1.00000 10 GiB 1.1 GiB 3.9 MiB  0 B 1 GiB 9.0 GiB 10.60 1.00  56     up         osd.3
 4   bbb 0.00980  1.00000 10 GiB 1.1 GiB 3.9 MiB  0 B 1 GiB 9.0 GiB 10.60 1.00  58     up         osd.4
 5   ccc 0.00980  1.00000 10 GiB 1.1 GiB 3.9 MiB  0 B 1 GiB 9.0 GiB 10.60 1.00  60     up         osd.5
                    TOTAL 30 GiB 3.2 GiB  12 MiB  0 B 3 GiB  27 GiB 10.60
MIN/MAX VAR: 1.00/1.00  STDDEV: 0

```

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-03-06 11:10:56 +08:00
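
The filtering shown in the 'osd df' examples above can be mimicked on a small table. This is a rough sketch under stated assumptions: the `Osd` record and `filter_osds` helper are hypothetical, the rows are copied from the sample output, and the name filter here matches host names only, whereas the commit describes it as accepting any crush bucket name.

```python
from typing import List, NamedTuple, Optional


class Osd(NamedTuple):
    osd_id: int
    device_class: str
    host: str
    size_gib: int


# Rows taken from the unfiltered 'osd df tree' sample output.
OSDS = [
    Osd(3, "aaa", "ceph11", 10), Osd(4, "bbb", "ceph11", 10), Osd(5, "ccc", "ceph11", 10),
    Osd(0, "aaa", "ceph12", 10), Osd(1, "bbb", "ceph12", 10), Osd(2, "ccc", "ceph12", 10),
]


def filter_osds(osds: List[Osd],
                device_class: Optional[str] = None,
                name: Optional[str] = None) -> List[Osd]:
    """Keep OSDs matching the given device class and/or host name."""
    return [
        o for o in osds
        if (device_class is None or o.device_class == device_class)
        and (name is None or o.host == name)
    ]


by_class = filter_osds(OSDS, device_class="aaa")
print([o.osd_id for o in by_class], sum(o.size_gib for o in by_class))  # [3, 0] 20

by_host = filter_osds(OSDS, name="ceph11")
print([o.osd_id for o in by_host], sum(o.size_gib for o in by_host))    # [3, 4, 5] 30
```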
Ilya Dryomov
7ab3153902 qa/suites/krbd/wac: bluestore snippet is placed incorrectly
Instead of generating three tests, each with bluestore-bitmap.yaml, it
generates four tests: one consisting of just bluestore-bitmap.yaml and
the other three without any trace of bluestore.  This was introduced in
commit 711df71790 ("qa: objectstore snippets for krbd").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05 23:07:27 +01:00
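
The three-versus-four test counts in the commit above follow from whether the objectstore snippet is combined with each test or listed next to them. The sketch below models only that counting; the test file names are hypothetical, and the real suite-building logic in teuthology is more involved than a plain product.

```python
from itertools import product

objectstore = ["bluestore-bitmap.yaml"]
tests = ["test-a.yaml", "test-b.yaml", "test-c.yaml"]  # hypothetical names

# Intended: the snippet is combined with every test, giving three jobs.
intended = [list(combo) for combo in product(objectstore, tests)]

# Misplaced: the snippet is just one more alternative, giving four jobs,
# only one of which mentions bluestore at all.
misplaced = [[fragment] for fragment in objectstore + tests]

print(len(intended), all("bluestore-bitmap.yaml" in job for job in intended))    # 3 True
print(len(misplaced), sum("bluestore-bitmap.yaml" in job for job in misplaced))  # 4 1
```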
Xie Xingguo
ad8e7d33b1
Merge pull request #26729 from xiexingguo/wip-recovery-priority-restrictions
mon/OSDMonitor: add boundary check for pool recovery_priority

Reviewed-by: David Zafman <dzafman@redhat.com>
2019-03-05 20:16:18 +08:00