* refs/pull/26898/head:
osd/PG: invalidate PG if merging with unexpected version
osd,mon: include more pg merge metadata in pg_pool_t
qa/standalone/osd/pg-split-merge.sh: reproduce pg merge problem with empty pgs
osd: add osd_debug_no_{acting_change,purge_strays}
Reviewed-by: Neha Ojha <nojha@redhat.com>
* refs/pull/26894/head:
qa/standalone/erasure-code/test-erasure-code: adjust test to avoid m=0
erasure-code: ensure m >= 1
mon/OSDMonitor: set ec min_size to k + min(1, m - 1)
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
_DD is k=2 m=0, which we don't allow. Switch it to cDD.
I confess I don't fully understand why this was _DD to begin with, but
I'm pretty sure the mapping is there to control the order of results so
that they can be mapped to the CRUSH rule output sanely, and the coding
portion is not relevant to the test.
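For reference, a minimal sketch of the resulting constraint (profile
names here are hypothetical):
```
# Hypothetical profile names; illustrates the new m >= 1 requirement.
ceph osd erasure-code-profile set bad-ec k=2 m=0   # now rejected
ceph osd erasure-code-profile set ok-ec k=2 m=2    # accepted
# A pool created with ok-ec gets min_size = k + min(1, m - 1) = 2 + 1 = 3.
```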
Signed-off-by: Sage Weil <sage@redhat.com>
Valgrind makes the MDS slowwwww. The newish mds_heartbeat_grace config allows
us to keep sending beacons to the mons even if the internal heartbeat is slow.
This avoids spurious "laggy" messages, keeping them useful to grep for when
chasing unrelated messaging issues.
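A sketch of the kind of override this enables (the value is
illustrative, not necessarily what the suite sets):
```
# Let a valgrind-slowed MDS keep beaconing to the mons even when its
# internal heartbeat lags (value illustrative).
ceph config set mds mds_heartbeat_grace 60
```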
Fixes: http://tracker.ceph.com/issues/38723
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
When mgr/selftest/testkey = foo and mgr/selftest/x/testkey is not set,
then get_localized() should return foo.
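A minimal sketch of the scenario, assuming the config-key interface
(module and key names are from the commit; "x" is the mgr instance name):
```
# Set only the global key; leave the localized one unset.
ceph config-key set mgr/selftest/testkey foo
# On mgr instance "x", get_localized("testkey") looks for
# mgr/selftest/x/testkey first, finds nothing, and falls back to "foo".
```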
Signed-off-by: Sage Weil <sage@redhat.com>
The crush/builder.c crush_add_bucket method resizes the max_buckets array
by a power of 2 when it has to expand, but the code in CrushWrapper was
assuming that if the array grew, the pos for the new bucket would be the
last position in the new array. This led to a situation where the
crush_choose_arg_map args array size didn't match max_buckets, and
eventually caused a crash.
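A hypothetical way to reach that state (the commands are real, but this
exact sequence is an assumption, not the tracked reproducer): create a
choose_args map via the balancer's crush-compat mode, then add enough
buckets to force max_buckets to grow.
```
# Hypothetical trigger sketch; not necessarily the tracked reproducer.
ceph balancer mode crush-compat    # creates a choose_args map
ceph balancer on
for h in host-a host-b host-c host-d host-e; do
    ceph osd crush add-bucket "$h" host   # may force max_buckets to expand
done
```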
Fixes: http://tracker.ceph.com/issues/38664
Signed-off-by: Sage Weil <sage@redhat.com>
If the source or target PG version is 0'0, we may silently take the max
of the source and target and still leave the PG complete. This
specifically can happen with an empty PG, as seen with bug 38655. In
theory we could encounter one of the PGs with some other last_update
that doesn't match what we expect. If that ever happens, make sure the
result is incomplete so that backfill can clean up.
Additionally check that the pool metadata for the last merge matches the
PGs at all. This could mismatch if we have an osdmap gap and are forced
to do some merge without merge info at all... in which case we should
definitely invalidate: there should be newer copies of the PG(s), and we
have no idea whether the PGs we are merging are what we want. If this is
some disaster recovery situation, an operator is always free to use
ceph-objectstore-tool to re-mark a PG complete (at their own peril!).
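For reference, that escape hatch looks roughly like this (data path and
pgid are placeholders; the OSD must be stopped first):
```
# Placeholders throughout; run only with the OSD stopped, and only if
# you accept the risk of marking a possibly-stale PG complete.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --pgid 1.0 --op mark-complete
```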
Fixes: http://tracker.ceph.com/issues/38655
Signed-off-by: Sage Weil <sage@redhat.com>
Bluestore caused grep to fail with "grep: memory exhausted" due to the
size of the "block" storage file.
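The general avoidance looks like this (a sketch with placeholder paths,
not necessarily the exact fix):
```
# Skip binary content and the huge bluestore "block" file outright.
grep -r -I --exclude=block "some pattern" "$OSD_DATA_DIR"
```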
Fixes: http://tracker.ceph.com/issues/38678
Signed-off-by: David Zafman <dzafman@redhat.com>
The simple messenger is on its way out, and it doesn't work with msgr2.
Fixes: http://tracker.ceph.com/issues/38676
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
It's now "1/2 unfound":
1/2 objects unfound (50.000%)
...presumably due to the rbd pool init creating the rbd_directory.
Signed-off-by: Sage Weil <sage@redhat.com>
- no need for the default pool size
- no initial osds or it will collide with setup_osds later
- no need for rbd pool at all
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/26794/head:
mon/MgrMonitor: only try to update always_on_modules if >= NAUTILUS
qa/standalone/mon/msgr-v2-transition: add some tests for enabling msgr v2
mon/MonmapMonitor: add 'ceph mon set-addrs <name> <addrvec>' command
Revert "mon/MonClient: disable ms_bind_msgr2 if NAUTILUS feature not set"
mon/OSDMonitor: use legacy_equals to compare osd addrs
msg/msg_types: make legacy_equals() symmetrical
mon/MDSMonitor: stop using get_orig_source_inst()
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
* refs/pull/26770/head:
qa/standalone/osd/osd-force-create-pg: create more pgs
qa/standalone: make sure an osd is running before create_rbd_pool
Reviewed-by: Mykola Golub <mgolub@suse.com>
We've disabled the "clean" shutdown in ceph-mgr due to
https://tracker.ceph.com/issues/38621
Until that is fixed, no valgrind leak checks!
Signed-off-by: Sage Weil <sage@redhat.com>
We renamed ceph_test_rados_api_tier to add a _pp suffix, so the mimic
version doesn't work. In any case, at this stage the client host has
master installed.
Signed-off-by: Sage Weil <sage@redhat.com>
'rbd pool init' now does IO. Drop the pool, or change the pool size to 1.
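The two alternatives, concretely (using the default 'rbd' pool name):
```
# Either drop the pool entirely...
ceph osd pool delete rbd rbd --yes-i-really-really-mean-it
# ...or shrink it so the init IO can succeed with a single OSD.
ceph osd pool set rbd size 1
```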
Fixes: http://tracker.ceph.com/issues/38585
Signed-off-by: Sage Weil <sage@redhat.com>
Add a new 'ceph orchestrator nfs update' command that takes the
NFS cluster name and a new count as arguments. These get translated
to a StatelessServiceSpec and passed to update_stateless_service.
Also, add the necessary stubs to the test_orchestrator and the CLI
QA test.
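Usage sketch (the cluster name and count are illustrative):
```
# Scale the hypothetical "mynfs" cluster to 3 NFS daemons.
ceph orchestrator nfs update mynfs 3
```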
Signed-off-by: Jeff Layton <jlayton@redhat.com>
For large clusters, we use device classes to isolate storage pools.
The existing 'osd df' output turns out to be too noisy, say, if
you care about only a single storage pool with osds possibly spanning
all hosts.
With this change you are now able to do 'osd df' by class (or by pool,
if you simply use classes to separate different pools), or by a specific
crush bucket name you are interested in, which is much more
convenient.
Some examples:
```
$ bin/ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-1 0.05878 - 60 GiB 6.4 GiB 23 MiB 0 B 6 GiB 54 GiB 10.60 1.00 - root default
-3 0.02939 - 30 GiB 3.2 GiB 12 MiB 0 B 3 GiB 27 GiB 10.60 1.00 - host ceph11
3 aaa 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 56 up osd.3
4 bbb 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 58 up osd.4
5 ccc 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 60 up osd.5
-5 0.02939 - 30 GiB 3.2 GiB 12 MiB 0 B 3 GiB 27 GiB 10.60 1.00 - host ceph12
0 aaa 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 50 up osd.0
1 bbb 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 61 up osd.1
2 ccc 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 51 up osd.2
TOTAL 60 GiB 6.4 GiB 23 MiB 0 B 6 GiB 54 GiB 10.60
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
$ bin/ceph osd df tree class aaa
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-1 0.05878 - 20 GiB 2.1 GiB 7.8 MiB 0 B 2 GiB 18 GiB 10.60 1.00 - root default
-3 0.02939 - 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 - host ceph11
3 aaa 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 56 up osd.3
-5 0.02939 - 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 - host ceph12
0 aaa 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 50 up osd.0
TOTAL 20 GiB 2.1 GiB 7.8 MiB 0 B 2 GiB 18 GiB 10.60
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
$ bin/ceph osd df tree name ceph11
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-3 0.02939 - 30 GiB 3.2 GiB 12 MiB 0 B 3 GiB 27 GiB 10.60 1.00 - host ceph11
3 aaa 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 56 up osd.3
4 bbb 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 58 up osd.4
5 ccc 0.00980 1.00000 10 GiB 1.1 GiB 3.9 MiB 0 B 1 GiB 9.0 GiB 10.60 1.00 60 up osd.5
TOTAL 30 GiB 3.2 GiB 12 MiB 0 B 3 GiB 27 GiB 10.60
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
```
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Instead of generating three tests, each with bluestore-bitmap.yaml, the
suite generates four tests: one consisting of just bluestore-bitmap.yaml
and the other three without any trace of bluestore. This was introduced in
commit 711df71790 ("qa: objectstore snippets for krbd").
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>