Commit Graph

2579 Commits

Author SHA1 Message Date
Sage Weil
435777dbff qa/suites/upgrade/jewel-x/parallel: thrash layout
We can't kill and restart osds because that will interfere with
the upgrade process.  We can, however, thrash the layout by
tweaking osd weights and so on.  This will exercise osd recovery
paths during the upgrade that aren't normally exercised (outside
of stress-split, which doesn't upgrade individual osds while they
are non-clean).

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-09 22:07:48 -04:00
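A minimal sketch of the idea in 435777dbff. It assumes the layout is thrashed only through `ceph osd reweight <id> <weight>` override weights; the real qa task may also adjust other settings. The helper `plan_reweights` and its parameters are hypothetical. It only builds command lines and never stops an OSD, which is the property the commit depends on during an upgrade.

```python
import random


def plan_reweights(osd_ids, rounds, seed=0, min_weight=0.5):
    """Build hypothetical `ceph osd reweight` commands that perturb the layout.

    Only override weights change; no OSD process is stopped, so an upgrade
    running in parallel is not disturbed. Each round picks one OSD and a
    random weight in [min_weight, 1.0]; afterwards every OSD is restored to 1.0.
    """
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    commands = []
    for _ in range(rounds):
        osd = rng.choice(osd_ids)
        weight = round(rng.uniform(min_weight, 1.0), 2)
        commands.append(["ceph", "osd", "reweight", str(osd), str(weight)])
    # Restore full weight so the cluster can settle back to a clean layout.
    for osd in osd_ids:
        commands.append(["ceph", "osd", "reweight", str(osd), "1.0"])
    return commands
```

Each weight change moves PGs and so drives recovery, while the daemons themselves keep running and remain available to the upgrade steps.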
Yuri Weinstein
c1b87c71e3 Merge pull request #16892 from xiexingguo/wip-clean-pg-temp
mon/OSDMonitor: fix 'osd pg temp' unable to cleanup pg-temp

Reviewed-by: Sage Weil <sage@redhat.com>
2017-08-09 16:34:38 -07:00
Matt Benjamin
0956b3aafd Merge pull request #16834 from mdw-at-linuxbox/policy
radosgw: usage: fix bytes_sent bug.
2017-08-09 14:24:01 -04:00
Sage Weil
1043fca076 Merge pull request #16923 from liewegas/wip-20738
qa/suites/rados/objectstore: logs
2017-08-09 12:45:29 -05:00
Sage Weil
34db3f8a08 Merge pull request #16947 from liewegas/wip-jewel-x
qa/suites/upgrade/jewel-x/point-to-point-x: disable app warnings
2017-08-09 09:56:15 -05:00
Sage Weil
bbd5fe354c qa/suites/upgrade/jewel-x/point-to-point-x: disable app warnings
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-09 09:18:54 -04:00
Mykola Golub
6a575136a7 qa/workunits/rbd: use command line option to specify watcher asok
The previous method to get the watcher admin socket was fragile
and had started to fail after the recent changes to vstart ceph.conf.

Fixes: http://tracker.ceph.com/issues/20954
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
2017-08-09 09:03:00 +02:00
Marcus Watts
a45ab45f74 Test bytes_sent bugs.
Rearrange logic to make it easier to measure accumulation.
Instrument the boto request/response loop to count bytes in and out.
Accumulate byte counts in a usage-like structure.
Compare actual usage reported by ceph against local usage measured.
Report and assert if there are any shortcomings.
Remove zone placement rule that was newly added at end: tests should be rerunnable.

Nit: the logic to wait for "delete_obj" is not quite right.

Fixes: http://tracker.ceph.com/issues/19870
Signed-off-by: Marcus Watts <mwatts@redhat.com>
2017-08-08 21:56:01 -04:00
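A minimal sketch of the accounting described in a45ab45f74. It assumes per-(user, category) counters loosely modeled on the RGW usage fields `bytes_sent`, `bytes_received` and `ops`. `UsageCounter` and `find_shortfalls` are hypothetical names. The sketch records plain integers instead of instrumenting boto. Counts are taken from the gateway's point of view, so a GET response body counts toward `bytes_sent`.

```python
from collections import defaultdict


class UsageCounter:
    """Accumulate locally measured bytes per (user, category) key."""

    def __init__(self):
        self.usage = defaultdict(
            lambda: {"bytes_sent": 0, "bytes_received": 0, "ops": 0})

    def record(self, user, category, sent, received):
        # sent/received are from the gateway's perspective (hypothetical layout).
        entry = self.usage[(user, category)]
        entry["bytes_sent"] += sent
        entry["bytes_received"] += received
        entry["ops"] += 1


def find_shortfalls(measured, reported):
    """Return (key, field, measured, reported) wherever the reported value
    is lower than what was measured locally."""
    problems = []
    for key, counts in measured.items():
        got = reported.get(key, {})
        for field, value in counts.items():
            if got.get(field, 0) < value:
                problems.append((key, field, value, got.get(field, 0)))
    return problems
```

Collecting every shortfall before asserting lets a test report all under-counted fields at once instead of stopping at the first one.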
Sage Weil
c8d60396c7 qa/suites/rados/objectstore: logs
Hunting http://tracker.ceph.com/issues/20738

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-08 18:07:18 -04:00
Sage Weil
6127a4c294 Merge pull request #16546 from asomers/openstack_shebang2
qa: Fix shebangs on openstack scripts

Reviewed-by: Amik Kumar <amitkuma@redhat.com>
2017-08-08 15:55:38 -05:00
Patrick Donnelly
eabe662614 Merge PR #16378 into master
* refs/remotes/upstream/pull/16378/head:
	doc: remove accidental additions to release notes
	qa/cephfs: Fix race in test_volume_client
	qa/cephfs: Test filtered df
	PendingReleaseNotes: add note about df filtering
	client: Support new, filtered MStatfs
	objecter: Support new, filtered MStatfs
	mon/PGMap stats: Support new, filtered MStatfs
	messages: Add optional data pool to MStatfs

Reviewed-by: John Spray <john.spray@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2017-08-08 09:33:52 -07:00
xie xingguo
71cef3cb74 mon/OSDMonitor: fix 'osd pg temp' unable to cleanup pg-temp
This is not a very elegant way, but it should work.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-08 17:33:35 +08:00
Mykola Golub
b4dbfcc879 rbd-ggate: tool to map images on FreeBSD via GEOM Gate
rbd-ggate spawns a process responsible for creating the ggate
device and forwarding I/O requests between the GEOM Gate kernel
subsystem and RADOS.

On FreeBSD it provides functionality similar to rbd-nbd on Linux.

Signed-off-by: Mykola Golub <mgolub@mirantis.com>
2017-08-08 11:00:30 +02:00
Sage Weil
bf29142b08 qa/suites/upgrade/kraken-x/stress-split*: whitelist
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-07 21:36:58 -04:00
Sage Weil
2234a0ed11 qa/suites/upgrade/kraken-x/parallel: whitelist
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-07 21:36:58 -04:00
Sage Weil
973772c11d Merge pull request #16871 from liewegas/wip-20920
mon: fix commands advertised during mon cluster upgrade

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2017-08-07 14:48:58 -05:00
Sage Weil
3e7d157871 qa/suites/upgrade/jewel-x/parallel: fix POOL_APP_NOT_ENABLED disable
This code runs on the mgr.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-07 15:12:10 -04:00
Jason Dillaman
f0e351b50b Merge pull request #16642 from dillaman/wip-rbd-mirror-image-ids
rbd-mirror: simplify notifications for image assignment

Reviewed-by: Mykola Golub <mgolub@mirantis.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
2017-08-07 14:33:50 -04:00
Sage Weil
387ad56a69 qa/clusters/fixed-[23]: 4 osds per node, not 3
The smithi nodes have 4 NVMe partitions available for use.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-07 13:36:05 -04:00
Sage Weil
b5fae9a9ca Merge pull request #16873 from liewegas/wip-4-nodes
qa/suites: change fixed-2.yaml users to get 4 openstack disks

Reviewed-by: Zack Cerza <zcerza@redhat.com>
2017-08-07 11:27:40 -05:00
Sage Weil
3ffca50824 Merge pull request #16864 from smithfarm/wip-big-openstack
qa: big: add openstack.yaml
2017-08-07 11:02:59 -05:00
Sage Weil
f683d2d374 qa/suites: change fixed-2.yaml users to get 4 openstack disks
Follow-up for 4203c4f887

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-07 11:56:33 -04:00
Sage Weil
a872c44be7 Merge pull request #16842 from liewegas/wip-more-ec-map-discon
qa/suites/rados/thrash: fix thrashing with ec vs map discon
2017-08-07 10:48:56 -05:00
Jason Dillaman
4e1b834d2d rbd-mirror: simplify resync handling within image replayer
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2017-08-07 10:13:33 -04:00
Nathan Cutler
8bb3d8444f qa: big: add openstack.yaml
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2017-08-07 12:07:36 +02:00
Sage Weil
96f3ef6fd1 Merge pull request #16837 from xiexingguo/wip-still-more-class-fixes
crush: more class fixes

Reviewed-by: Sage Weil <sage@redhat.com>
2017-08-06 14:07:33 -05:00
Sage Weil
db47061327 Merge pull request #16838 from xiexingguo/wip-fix-purge
mon/OSDMonitor: sanity check osd before performing 'osd purge'

Reviewed-by: Sage Weil <sage@redhat.com>
2017-08-06 14:07:15 -05:00
Sage Weil
ed2d984ad1 qa/suites/upgrade/jewel-x/parallel: more whitelisting
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-06 10:04:14 -04:00
Sage Weil
58f15d2b98 qa/suites/upgrade/jewel-x/parallel: more whitelisting
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-06 09:56:55 -04:00
Sage Weil
622e950e43 qa/suites/upgrade/*-x/parallel: whitelist more stuff
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-06 09:56:55 -04:00
Sage Weil
2d260443f0 qa/suites/upgrade/*/parallel: disable POOL_APP_NOT_ENABLED
There is some other random workload running (that creates pools)
while we upgrade and wait for healthy.  Just disable the warning
for these tests.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-06 09:56:55 -04:00
Sage Weil
f4c2863999 qa/suites/upgrade/jewel-x/parallel: whitelist OSD_DOWN
We restart OSDs during the upgrade.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-06 09:56:55 -04:00
Sage Weil
d9a0145f8f Merge pull request #16824 from liewegas/wip-more-scrub-time
qa/tasks/ceph: wait longer for scrub
2017-08-05 13:35:55 -05:00
Sage Weil
4203c4f887 qa/clusters/fixed-2: 4 osds per node, not 3
We need this for the thrashing with ec k=2 m=2 pools.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-05 14:34:27 -04:00
Sage Weil
6307e03c6d qa/suites/rados/thrash/workloads/cache-agent-big: m=2
...because we do the test_map_discontinuity thing.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-05 14:33:13 -04:00
Sage Weil
62482ce82c qa/tasks/ceph: debug osd setup
I've seen a couple of rbd runs that seem to skip the next block :/

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-05 13:53:26 -04:00
xie xingguo
bfd1d4ef9a mon/OSDMonitor: sanity check osd before performing 'osd purge'
This prevents OSDMonitor from crashing when purging a very large,
non-existent osd id, as below:

osd e11 prepare_command_osd_purge purging osd.8
    -1> 2017-08-05 18:59:44.994319 7f6076968700 10 mon.a@0(leader).osd e11 prepare_command_osd_destroy osd.8 does not exist.
     0> 2017-08-05 18:59:45.002309 7f6076968700 -1 /home/xxg/build/ceph-dev/src/osd/OSDMap.h: In function 'int OSDMap::get_state(int) const'
 thread 7f6076968700 time 2017-08-05 18:59:44.994336
/home/xxg/build/ceph-dev/src/osd/OSDMap.h: 690: FAILED assert(o < max_osd)

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-05 19:43:39 +08:00
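A hedged Python model of the check added in bfd1d4ef9a; the real fix is C++ in OSDMonitor. `ToyOSDMap`, `prepare_purge` and the `-ENOENT` return value are assumptions for illustration only. The point is that the id is bounds-checked before anything calls `get_state()`, whose `assert(o < max_osd)` is what fired in the log above.

```python
import errno


class ToyOSDMap:
    """Hypothetical stand-in for OSDMap: state is tracked for ids 0..max_osd-1."""

    def __init__(self, max_osd, existing):
        self.max_osd = max_osd
        self.states = {i: ("exists" if i in existing else None)
                       for i in range(max_osd)}

    def get_state(self, osd):
        # Mirrors the failing assertion: an id past max_osd is a caller bug.
        assert 0 <= osd < self.max_osd, "FAILED assert(o < max_osd)"
        return self.states[osd]

    def exists(self, osd):
        # Bounds check first, so get_state() never sees an out-of-range id.
        return 0 <= osd < self.max_osd and self.get_state(osd) is not None


def prepare_purge(osdmap, osd):
    """Sanity-check the id before touching any per-OSD state."""
    if not osdmap.exists(osd):
        return -errno.ENOENT, f"osd.{osd} does not exist"
    return 0, f"purged osd.{osd}"
```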
xie xingguo
87952fc68d crush: automatically kill dead classes
If a class is no longer referenced by any devices or crush rules,
it is considered dead.

This patch makes Ceph automatically recycle those dead classes,
so the user does not need to explicitly call 'class rm', which is unsafe
and annoying.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-05 18:53:39 +08:00
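A minimal sketch of the dead-class rule from 87952fc68d. It assumes classes, device assignments and rules are plain Python collections; `dead_classes` is a hypothetical helper, not a CrushWrapper method. A class survives if either a device or a rule still names it.

```python
def dead_classes(all_classes, device_class, rule_class):
    """Return the classes referenced by neither a device nor a rule.

    device_class: osd id -> class name
    rule_class:   rule name -> class the rule takes (None if unrestricted)
    """
    in_use = set(device_class.values())
    in_use |= {c for c in rule_class.values() if c is not None}
    return set(all_classes) - in_use


# Example: "ssd" has no devices but is still used by a rule, so it is kept.
classes = {"hdd", "ssd", "nvme"}
devices = {0: "hdd", 1: "hdd"}
rules = {"fast": "ssd", "default": None}
print(sorted(dead_classes(classes, devices, rules)))
```

Only `nvme` is reported dead here. Keeping `ssd` matters: f1d80ff750 guards against exactly this situation, because recycling a class that a rule still uses would leave the rule pointing at a non-existent shadow tree.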
xie xingguo
b863883ca7 crush: remove 'class rm' command
The current version is broken. E.g., it should only remove a class
that is not referenced by any device.

Since we now create new classes automatically, we should automatically
recycle dead classes too, so this command is no longer useful.
(It is also odd to keep 'class rm' without the
 corresponding 'class create' command.)

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-05 18:52:30 +08:00
xie xingguo
f1d80ff750 crush: do not automatically recycle class for 'rm-device-class'
This will prevent the current crush rule from referencing a non-existent
shadow tree and hence avoid a coredump such as below:

 0> 2017-08-05 09:54:19.943349 7f73887d6700 -1 /clove/vm/xxg/rpm/ceph/rpmbuild/BUILD/ceph-12.1.2.1/src/crush/CrushWrapper.cc: In function 'int CrushWrapper::get_rule_weight_osd_map(unsigned int, std::map<int, float>*)' thread 7f73887d6700 time 2017-08-05 09:54:19.941291
/clove/vm/xxg/rpm/ceph/rpmbuild/BUILD/ceph-12.1.2.1/src/crush/CrushWrapper.cc: 1631: FAILED assert(b)

 ceph version 12.1.2.1-11-gd0f812a (d0f812a3a757b319c26794f558b57770663ab324) luminous (rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f7398b66ea0]
 2: (CrushWrapper::get_rule_weight_osd_map(unsigned int, std::map<int, float, std::less<int>, std::allocator<std::pair<int const, float> > >*)+0x54e) [0x7f7398daac4e]
 3: (PGMap::get_rule_avail(OSDMap const&, int) const+0x68) [0x7f73989a6428]
 4: (PGMap::get_rules_avail(OSDMap const&, std::map<int, long, std::less<int>, std::allocator<std::pair<int const, long> > >*) const+0x35c) [0x7f73989b748c]
 5: (PGMap::encode_digest(OSDMap const&, ceph::buffer::list&, unsigned long) const+0x16) [0x7f73989b7506]
 6: (DaemonServer::send_report()+0x2a4) [0x7f73989f5474]
 7: (DaemonServer::maybe_ready(int)+0x2f9) [0x7f73989f6129]
 8: (DaemonServer::ms_dispatch(Message*)+0xce) [0x7f73989ff68e]
 9: (DispatchQueue::entry()+0x792) [0x7f7398dd2a22]
 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f7398c1429d]
 11: (()+0x7df3) [0x7f739640cdf3]
 12: (clone()+0x6d) [0x7f73954f23ed]

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-05 18:44:59 +08:00
Patrick Donnelly
04d8ba4b04 Merge PR #16833 into master
* refs/remotes/upstream/pull/16833/head:
	qa: whitelist expected MDS_CLIENT_OLDEST_TID warn
	qa: ignore insufficient standby during failover
	qa: fix read-only whitelist
	mds: MDS_DAMAGED to MDS_DAMAGE
	doc: remove duplicate CephFS health check doc
2017-08-04 20:26:09 -07:00
Patrick Donnelly
29e5f0a450 qa: whitelist expected MDS_CLIENT_OLDEST_TID warn
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-08-04 20:21:43 -07:00
Patrick Donnelly
06f53e4a82 qa: ignore insufficient standby during failover
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-08-04 20:14:59 -07:00
Patrick Donnelly
42cd1c7122 qa: fix read-only whitelist
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-08-04 20:14:48 -07:00
Patrick Donnelly
8d4f3e3045 mds: MDS_DAMAGED to MDS_DAMAGE
We had both, and MDS_DAMAGE appears to be the intended one.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-08-04 13:01:33 -07:00
Sage Weil
62e51661e6 Merge branch 'wip-qa-rbd-health' of git://github.com/dillaman/ceph
# Conflicts:
#	qa/tasks/ceph.py
2017-08-04 15:07:22 -04:00
Sage Weil
ffd171fd46 Merge pull request #16820 from liewegas/wip-more-whitelist
qa/suites/rados: a bit more whitelisting

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-08-04 13:44:08 -05:00
Douglas Fuller
552225f329 qa/cephfs: Fix race in test_volume_client
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
2017-08-04 14:38:50 -04:00
Sage Weil
82cf3046de qa/suites/rados/basic/tasks/rados_python: POOL_APP_NOT_ENABLED
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-04 13:39:13 -04:00
Sage Weil
d09606619f qa/tasks/ceph: wait longer for scrub
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-04 12:06:27 -04:00