Commit Graph

38 Commits

Author SHA1 Message Date
Yuri Weinstein
11c57701c6 Merge pull request #16961 from xiexingguo/wip-class-rename
crush: "osd crush class rename" support

Reviewed-by: Sage Weil <sage@redhat.com>
2017-08-11 06:18:57 -07:00
Sage Weil
d2d9b41275 Merge pull request #16709 from dzafman/wip-standalone
qa/standalone: misc fixes
2017-08-10 21:33:43 -05:00
xie xingguo
d792e8d528 crush: "osd crush class rename" support
In 076a6abd80 I killed the 'class rename' command
and thought it was totally useless but I was wrong.

Consider the following user case:
(1) randomly choose some OSDs(e.g., from different hosts) and try to make them for private use only,
    say, by grouping them into 'pool1'
(2) ceph osd crush set-device-class pool1 'OSDs from (1)'
(3) ceph osd crush rule create-replicated rule_for_pool1 default host pool1
(4) ceph osd pool rename pool1 pool2
(5) ceph osd crush class rename pool1 pool2

From the above user case, we need to safely change a pool name without worrying
any risk of data migration. That is why the 'osd crush class rename' command
is still needed here.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-11 08:32:39 +08:00
David Zafman
e24ac51a82 qa: Fix broken test_activate_osd() due to missing space
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 12:37:05 -07:00
David Zafman
ae2c5331fb qa: Fix races with waiting for scrubs
The trigger_scrub sets the last_scrub_stamp backwards to
force a scheduled scrub.  In a small window this stamp could get propagated
to the mgr.  A test failure occurred because wait_for_scrub() was confused
by seeing a backward moving date.

The most critical change is having wait_for_scrub() make sure that the
date advances past the previous in value.

A test failed because the random backoff kept delayed triggered scrub, so
set osd_scrub_backoff throughout.

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 12:37:05 -07:00
David Zafman
dddda523d1 qa: Testing of ceph-helpers.sh, teardown on fail to dump logs, save cores
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 12:37:05 -07:00
David Zafman
1fe6cb0f02 osd: Avoid confusion over legacy snaps when head_exists corrupt
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 12:37:05 -07:00
David Zafman
229de6b71d qa: Add support for core dumps
Save core dumps when running tests locally
Dump logs to output whenever cores seen

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 12:37:04 -07:00
David Zafman
4db5124e1a qa: For FreeBSD skip osd-dup.sh because there is no bluestore
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 08:30:47 -07:00
David Zafman
61bfd236ad qa: Raise mon-data-avail-warn to pass tests with less space
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 08:30:47 -07:00
David Zafman
574b3cd3d4 qa: Add common generalized inject_eio() to ceph-helpers.sh
Retry for a while to allow pool to appear

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 08:30:47 -07:00
David Zafman
3988ebab43 qa: osd-scrub-repair.sh handle older versions of jq
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 08:30:47 -07:00
David Zafman
2a679a36de qa: Add support for specifying sub-tests with run-standalone.sh
Fix test-ceph-helpers.sh to pass additional arguments on

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 08:30:47 -07:00
David Zafman
69413618a0 qa: ceph-helpers.sh fixes
Add missing teardown to cleanup test directory
Fix pgid due to elimination of initial default pool
Testing could never fail because run_tests return ignored

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 08:30:47 -07:00
xie xingguo
87952fc68d crush: automatically kill dead classes
If a class is no more referenced by any devices or crush rules,
it shall be considered as dead.

This patch makes Ceph automatically recycles those dead classes,
so user does not to explicitly call 'class rm', which is unsafe
and annoying.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-05 18:53:39 +08:00
xie xingguo
b863883ca7 crush: remove 'class rm' command
The current version is broken. E.g., it should only remove a class
which is never referenced by any device.

Since we now create new classes automatically, we shall automatically
recycle dead classes too. So this command is definitely unuseful.
(Actually it is weird that we keep 'class rm' without keeping the
 corresponding 'class create' command).

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-05 18:52:30 +08:00
xie xingguo
f1d80ff750 crush: do not automatically recycle class for 'rm-device-class'
This will prevent the current crush rule from referencing a non-existent
shadow tree and hence avoid a coredump such as below:

 0> 2017-08-05 09:54:19.943349 7f73887d6700 -1 /clove/vm/xxg/rpm/ceph/rpmbuild/BUILD/ceph-12.1.2.1/src/crush/CrushWrapper.cc: In function 'int CrushWrapper::get_rule_weight_osd_map(unsigned
 int, std::map<int, float>*)' thread 7f73887d6700 time 2017-08-05 09:54:19.941291
/clove/vm/xxg/rpm/ceph/rpmbuild/BUILD/ceph-12.1.2.1/src/crush/CrushWrapper.cc: 1631: FAILED assert(b)

 ceph version 12.1.2.1-11-gd0f812a (d0f812a3a757b319c26794f558b57770663ab324) luminous (rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f7398b66ea0]
 2: (CrushWrapper::get_rule_weight_osd_map(unsigned int, std::map<int, float, std::less<int>, std::allocator<std::pair<int const, float> > >*)+0x54e) [0x7f7398daac4e]
 3: (PGMap::get_rule_avail(OSDMap const&, int) const+0x68) [0x7f73989a6428]
 4: (PGMap::get_rules_avail(OSDMap const&, std::map<int, long, std::less<int>, std::allocator<std::pair<int const, long> > >*) const+0x35c) [0x7f73989b748c]
 5: (PGMap::encode_digest(OSDMap const&, ceph::buffer::list&, unsigned long) const+0x16) [0x7f73989b7506]
 6: (DaemonServer::send_report()+0x2a4) [0x7f73989f5474]
 7: (DaemonServer::maybe_ready(int)+0x2f9) [0x7f73989f6129]
 8: (DaemonServer::ms_dispatch(Message*)+0xce) [0x7f73989ff68e]
 9: (DispatchQueue::entry()+0x792) [0x7f7398dd2a22]
 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f7398c1429d]
 11: (()+0x7df3) [0x7f739640cdf3]
 12: (clone()+0x6d) [0x7f73954f23ed]

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-05 18:44:59 +08:00
David Zafman
99ad4bbd91 qa: Add create_pool() which sleeps 1 second like python variant
wait_for_clean() can miss the new pool if it races with pool create.

Fixes: http://tracker.ceph.com/issues/20465

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-04 06:38:09 -07:00
David Zafman
b20dfc2864 qa: Add special test_failure.sh script (not run by default)
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-04 06:38:09 -07:00
David Zafman
8c768050a5 qa: run-standalone.sh improvements
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-04 06:38:09 -07:00
David Zafman
4314cdd666 qa: Dump logs after daemons are killed to make sure everything is flushed
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-04 06:38:09 -07:00
xie xingguo
734b5f2c60 test/osd-fast-mark-down: enable 'osd-class-update-on-start' by default
116cf759c8
will now hide all shadow trees(roots), so this is not applicable anymore
(actually it is misleading).

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-03 17:26:26 -04:00
Sage Weil
41bcf2fee5 Merge pull request #16281 from badone/wip-PG-cluster-log-audit
osd: Log audit

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-07-27 16:25:30 -05:00
Sage Weil
e469a8044c qa/standalone/crush/crush-classes: fix test
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-27 12:25:25 -04:00
Sage Weil
380de3395f qa/standalone/README
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-27 12:24:52 -04:00
xie xingguo
076a6abd80 crush: kill 'class rename'
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:40:50 +08:00
xie xingguo
a27fd9d25c crush: kill "class create" command
The device class is now self and automatically managed.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:40:17 +08:00
xie xingguo
edd8930346 crush: allow "crush class rm" to automatically recycle shadow tree(s)
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:39:41 +08:00
xie xingguo
9d908c14f6 crush: rm-device-class support
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:39:08 +08:00
xie xingguo
32fb548797 crush: guard set-device-class
If a device has already been bounded to a class,
do not allow to change its class silently.
Require user call rm-device-class first.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:34:08 +08:00
xie xingguo
e4e83a0dd7 crush: fix class_is_in_use()
A class can be considered as in-use only if it is referenced by
any of the existing crush rules.

The patch also makes the output more human readable. For example:

./bin/ceph osd crush rule create-replicated myrule default host ssd
./bin/ceph osd crush class rm ssd
Error EBUSY: class 'ssd' still referenced by crush_rule 'myrule'

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:31:39 +08:00
xie xingguo
f3a3180cca crush: rebuild shadow tree on "crush create-or-move/move"
This patch solves the problem below:

./bin/ceph osd crush move osd.0 root=foo rack=foo-rack host=foo-host
moved item id 0 name 'osd.0' to location {host=foo-host,rack=foo-rack,root=foo} in crush map

 ./bin/ceph osd crush rule create-replicated foo-rule foo host ssd
Error EINVAL: root foo has no devices with class ssd

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:30:59 +08:00
xie xingguo
10bf2a633f crush: fix "crush create-or-move/move" would drop osd's class
Was:
     ./bin/ceph osd tree
    ID CLASS WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRI-AFF
    -1       3.00000 root default
    -2       3.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
     0   ssd 1.00000         osd.0                                         up  1.00000 1.00000
     1   ssd 1.00000         osd.1                                         up  1.00000 1.00000
     2   ssd 1.00000         osd.2                                         up  1.00000 1.00000

    ./bin/ceph osd crush move osd.0 root=foo rack=foo-rack  host=foo-host
    moved item id 0 name 'osd.0' to location {host=foo-host,rack=foo-rack,root=foo} in crush map

     ./bin/ceph osd tree
    ID CLASS WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRI-AFF
    -7       1.00000 root foo
    -6       1.00000     rack foo-rack
    -5       1.00000         host foo-host
     0       1.00000             osd.0                                     up  1.00000 1.00000
    -1       2.00000 root default
    -2       2.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
     1   ssd 1.00000         osd.1                                         up  1.00000 1.00000
     2   ssd 1.00000         osd.2                                         up  1.00000 1.00000

    Now:
    ./bin/ceph osd tree
    ID CLASS WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRI-AFF
    -1       3.00000 root default
    -2       3.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
     0   ssd 1.00000         osd.0                                         up  1.00000 1.00000
     1   ssd 1.00000         osd.1                                         up  1.00000 1.00000
     2   ssd 1.00000         osd.2                                         up  1.00000 1.00000

    ./bin/ceph osd crush move osd.0 root=foo rack=foo-rack  host=foo-host
    moved item id 0 name 'osd.0' to location {host=foo-host,rack=foo-rack,root=foo} in crush map

    ./bin/ceph osd tree
    ID CLASS WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRI-AFF
    -7       1.00000 root foo
    -6       1.00000     rack foo-rack
    -5       1.00000         host foo-host
     0   ssd 1.00000             osd.0                                     up  1.00000 1.00000
    -1       2.00000 root default
    -2       2.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
     1   ssd 1.00000         osd.1                                         up  1.00000 1.00000
     2   ssd 1.00000         osd.2                                         up  1.00000 1.00000

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:30:26 +08:00
Brad Hubbard
f8acc53d82 osd: Log audit
Review current log messages for consistency, accuracy and necessesity as
part of usability initiative. First in a series.

Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
2017-07-26 17:34:28 +10:00
Sage Weil
766229b034 qa/standalone/scrub: separate scrub/repair tests from rest of osd/
They are slow.  Run them separately.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:50 -04:00
Sage Weil
cabad62242 qa/standalone/ceph-helpers: factor rbd pool create out of run_mon
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:50 -04:00
Sage Weil
b12bebe432 qa/standalone/mon/osd-pool-create: stop testing create pool output
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:49 -04:00
Sage Weil
71ea171604 qa: move ceph-helpers and misc src/test/*.sh tests to qa/standalone
- stop running via make check
- add teuthology yamls to run them
- disable ceph_objecstore_tool.py for now (too slow for make check, and
we can't use vstart in teuthology via a package install)
- drop cephtool tests since those are already covered by other teuthology
tests
- leave a handful of (fast!) ceph-helpers tests for make check for minimal
integration tests.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:49 -04:00