Commit Graph

2624 Commits

Author SHA1 Message Date
Sage Weil
1ae9ff173b qa/suites/rados/upgrade: ignore FS_DEGRADED from mds restart
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-04 09:34:31 -04:00
Sage Weil
27a685f626 qa/suites/rados/monthrash: ignore MGR_DOWN
Heavily thrashing mons + mgr reconnect backoff may make us fail
to process the beacon.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-04 09:34:15 -04:00
Nathan Cutler
d919987caa tests: rbd: reproducer for rbd-on-EC issue
This introduces a new "rbd/singleton-bluestore" suite because creating an rbd
on an EC-backed datapool will fail on filestore.

References: http://tracker.ceph.com/issues/20295
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2017-08-03 22:54:17 -04:00
Sage Weil
47480d8a06 qa/suites/rados/thrash-erasure-code-big: add k=4 m=2
Get better coverage for larger codes.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-03 22:50:16 -04:00
Patrick Donnelly
d89af4a3e8
Merge PR #16802 into master
* refs/remotes/upstream/pull/16802/head:
	qa: update wait_for_health for new health json syntax

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2017-08-03 16:20:20 -07:00
John Spray
0613d411aa qa: update wait_for_health for new health json syntax
Fixes: http://tracker.ceph.com/issues/20890
Signed-off-by: John Spray <john.spray@redhat.com>
2017-08-03 23:46:41 +01:00
Kefu Chai
007095b7ae qa/workunits/mon/crush_ops.sh: remove existing dev class before setting it
We cannot overwrite an existing dev class, and "osd_class_update_on_start"
is true by default (see 0c885d6), so we should remove all device classes before
setting them.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-08-03 17:26:26 -04:00
xie xingguo
734b5f2c60 test/osd-fast-mark-down: enable 'osd-class-update-on-start' by default
116cf759c8 will now hide all shadow trees (roots), so this is not applicable
anymore (actually it is misleading).

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-08-03 17:26:26 -04:00
Patrick Donnelly
9d348ad8c9
qa: add health whitelist for all fs sub-suites
Fixes: http://tracker.ceph.com/issues/20892

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-08-03 14:01:28 -07:00
Patrick Donnelly
60fa9714d4
Merge PR #16768 into master
* refs/remotes/upstream/pull/16768/head:
	qa: fix log whitelist string

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-08-03 13:55:42 -07:00
Patrick Donnelly
66756c4f65
Merge PR #16292 into master
* refs/remotes/upstream/pull/16292/head:
	qa: use new hex rep of inode
	qa: fix whitelist error message
	mds: refine "Scrub error" cluster log message
	mds: polish clog messages
	doc: developer logging guidance

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2017-08-03 13:55:21 -07:00
Sage Weil
342607f4d5 Merge pull request #16749 from tchaikov/wip-restful-delete-key
mgr: handle "module.set_config(.., None)" correctly 

Reviewed-by: John Spray <john.spray@redhat.com>
2017-08-03 15:53:27 -05:00
Yuri Weinstein
09fd18d031 Merge pull request #16760 from cbodley/wip-rgw-disable-lifecycle-s3tests
qa/rgw: disable lifecycle tests because of expiration failures

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
2017-08-03 13:25:39 -07:00
Josh Durgin
b172642124 Merge pull request #16789 from liewegas/wip-ec-m-2
qa: avoid map-gap tests for k=2 m=1

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-08-03 11:20:13 -07:00
Sage Weil
b8627f897a Merge pull request #16795 from liewegas/wip-mgr-whitelist
qa/suites/rados/mgr/tasks/failover: whitelist

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-08-03 13:14:20 -05:00
Sage Weil
ef21c9d7df qa/suites/rados/thrash-erasure-code: do not test map gap with m=1
We test EC profiles with m=1 here, and mapgap can lead to incomplete pgs
because it takes an osd down and waits for healthy.

Fixes: http://tracker.ceph.com/issues/20844
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-03 14:13:02 -04:00
Douglas Fuller
b9d11af92b qa/cephfs: Test filtered df
Add a test for filtered df for file systems with single data pools.

Signed-off-by: Douglas Fuller <dfuller@redhat.com>
2017-08-03 14:11:47 -04:00
Sage Weil
f74d71f708 qa/suites/rados/thrash-erasure-coe-big/clsuter: 12 osds on 3 nodes not 4
smithi have 4 nvme partitions available, not 3.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-03 14:11:43 -04:00
Josh Durgin
ae48c75065 Merge pull request #16797 from jdurgin/wip-upgrade-jewel-x
qa: timeout when waiting for mgr to be available in healthy()

Reviewed-by: Sage Weil <sage@redhat.com>
2017-08-03 11:11:41 -07:00
Sage Weil
63221e21f5 qa/suites/rados/thrash-erasure-code-big: add k=4 m=2
Get better coverage for larger codes.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-03 14:10:36 -04:00
Sage Weil
e994b03335 qa/suites/rados/monthrash/worklaods/rados_api_tests: whitelist SMALLER_PGP_NUM
The rados/test.sh fiddles with pg_num.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-03 13:31:39 -04:00
Sage Weil
7c350180b1 qa/suites/rados/mgr/tasks/failover: whitelist
remote/smithi025/log/ceph.log.gz:2017-08-03 07:02:15.049074 mon.b mon.0 172.21.15.25:6789/0 197 : cluster [INF] Manager daemon x is unresponsive, replacing it with standby daemon y
remote/smithi025/log/ceph.log.gz:2017-08-03 07:03:10.078032 mon.b mon.0 172.21.15.25:6789/0 226 : cluster [WRN] Manager daemon x is unresponsive.  No standby daemons available.

x and y may be swapped, so whitelist the rest of the string.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-03 12:40:01 -04:00
Jason Dillaman
c2b451e8cb qa: fix RBD-related POOL_APP_NOT_ENABLED health warnings
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2017-08-03 09:50:41 -04:00
Patrick Donnelly
8d33cbbf5c
qa: use new hex rep of inode
Resolves a failure from QA:

    2017-08-02T19:23:27.489 INFO:tasks.cephfs_test_runner:======================================================================
    2017-08-02T19:23:27.489 INFO:tasks.cephfs_test_runner:FAIL: test_oversize (tasks.cephfs.test_fragment.TestFragmentation)
    2017-08-02T19:23:27.489 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
    2017-08-02T19:23:27.490 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
    2017-08-02T19:23:27.490 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20170802/qa/tasks/cephfs/test_fragment.py", line 71, in test_oversize
    2017-08-02T19:23:27.490 INFO:tasks.cephfs_test_runner:    self.assertEqual(frags[0]['dirfrag'], "10000000000.0*")
    2017-08-02T19:23:27.490 INFO:tasks.cephfs_test_runner:AssertionError: u'0x10000000000.0*' != '10000000000.0*'
    2017-08-02T19:23:27.490 INFO:tasks.cephfs_test_runner:

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-08-02 21:39:48 -07:00
Patrick Donnelly
d4ed085238
Merge PR #16713 into master
* refs/remotes/upstream/pull/16713/head:
	qa: ignore failed MDS message during upgrade
2017-08-02 19:41:42 -07:00
Patrick Donnelly
6cad5be68c
Merge PR #16714 into master
* refs/remotes/upstream/pull/16714/head:
	qa: test export_pin is correct in dumped subtree
	mds: print export_pin for dumped subtree

Reviewed-by: Douglas Fuller <dfuller@redhat.com>
Reviewed-by: huanwen ren <ren.huanwen@zte.com.cn>
2017-08-02 18:41:12 -07:00
Patrick Donnelly
7f04d88af8
qa: fix whitelist error message
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-08-02 16:52:30 -07:00
Patrick Donnelly
8e975a6347
qa: fix log whitelist string
Fixes: http://tracker.ceph.com/issues/20889

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-08-02 16:32:19 -07:00
Sage Weil
5085dc1164 qa/suites/powercycle: whitelist health for thrashing
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-02 11:06:43 -04:00
Casey Bodley
0debf4dc6e qa/rgw: disable lifecycle tests because of expiration failures
lifecycle expiration tests are too reliant on timing, and have been
failing consistently for a long time

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-08-02 11:06:35 -04:00
Kefu Chai
da1a60ced1 qa: refactor suites/rados/rest/mgr-restful
- use "ceph restful restart" to restart the restful API server instead
of restarting the ceph-mgr
- test "ceph restful delete-key"
- test "ceph restful list-keys"

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-08-02 18:20:56 +08:00
Josh Durgin
63693779fc qa: timeout when waiting for mgr to be available
Otherwise during upgrades we wait forever.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2017-08-02 02:18:28 -04:00
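The timeout idea in commit 63693779fc can be illustrated with a small polling helper. This is only a sketch, not the code from qa/tasks: `wait_until`, `mgr_available`, and the timeout and interval values are hypothetical names and numbers chosen for illustration.

```python
import time


def wait_until(predicate, timeout=60.0, interval=1.0):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Raises TimeoutError instead of waiting forever.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Hypothetical stand-in for "is the mgr available?": true on the third poll.
polls = {"count": 0}


def mgr_available():
    polls["count"] += 1
    return polls["count"] >= 3


result = wait_until(mgr_available, timeout=5.0, interval=0.01)
```

With a bounded wait, a test that runs against a cluster with no mgr (as during an upgrade) fails with a clear error rather than hanging.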
Kefu Chai
1ff1f836da Merge pull request #16722 from tchaikov/wip-qa-fixes
qa/suites: escape the parenthesis of the whitelist text

Reviewed-by: Sage Weil <sage@redhat.com>
2017-08-02 13:00:01 +08:00
Kefu Chai
a70be4e00c qa/suites: more whitelisting
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-08-02 10:00:57 +08:00
Sage Weil
c955bf528f qa/suites/rados/singleton-nomsgr/all/multi-backfill-reject: sleep longer
I saw a failure where the 30% backfill probability was enough that we
just didn't manage to backfill all of the pgs during the 5 minute recovery
timeout during ceph.py shutdown.  Build in some additional time for the
test to recover.

http://pulpito.ceph.com/sage-2017-08-01_15:32:10-rados-wip-sage-testing-distro-basic-smithi/1469184

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-01 15:50:47 -04:00
Kefu Chai
69c6402bbd Merge pull request #16727 from jcsp/wip-doc-config-hel
doc/qa: cover `config help` command

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-08-01 23:38:28 +08:00
Jason Dillaman
2589f57ecd Merge pull request #16656 from idryomov/wip-qa-newer-fio
qa/tasks/rbd_fio: bump default fio version to 2.21

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-08-01 10:14:46 -04:00
Kefu Chai
d12c51ca91 qa/suites: escape the parenthesis of the whitelist text
so we can avoid the warnings like

grep: Unmatched ( or \(

because we pass the whitelisted string to `egrep -v "$1"` directly.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-08-01 21:54:44 +08:00
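The escaping problem fixed in commit d12c51ca91 can be reproduced in miniature. The sketch below uses Python's `re` module as a stand-in for `egrep`, so the regex dialect differs slightly; the log lines and the `unwhitelisted` helper are made up for illustration.

```python
import re


def unwhitelisted(lines, whitelist):
    """Return the lines that match none of the whitelist patterns."""
    patterns = [re.compile(p) for p in whitelist]
    return [line for line in lines if not any(p.search(line) for p in patterns)]


# Made-up log lines for illustration.
log = [
    "cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)",
    "cluster [ERR] something unexpected",
]

# An unbalanced, unescaped parenthesis is not a valid pattern.
try:
    re.compile("(OSD_DOWN")
    unescaped_ok = True
except re.error:
    unescaped_ok = False

# Escaped parentheses match the literal characters in the log line.
remaining = unwhitelisted(log, [r"\(OSD_DOWN\)"])
```

Escaping makes the parentheses literal, so the whitelisted line is filtered out and only the unexpected line remains.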
Kefu Chai
d67d6c57ae qa/workunits/ceph-disk: fix the path to ceph-helpers-root.sh
partially reverts 841f3bd

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-08-01 21:54:44 +08:00
John Spray
ac2b9d63ca qa: include config help in admin socket test
Signed-off-by: John Spray <john.spray@redhat.com>
2017-08-01 13:38:40 +01:00
Patrick Donnelly
8db2c43e79
qa: test export_pin is correct in dumped subtree
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-07-31 15:33:49 -07:00
Patrick Donnelly
5e5ff5c086
qa: ignore failed MDS message during upgrade
The cluster is expected to become degraded during reboot.

Fixes: http://tracker.ceph.com/issues/20731
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-07-31 14:45:07 -07:00
Patrick Donnelly
019f20ff98
Merge PR #16640 into master
* refs/remotes/upstream/pull/16640/head:
	qa: fix wait for wrong health message

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2017-07-28 09:55:49 -07:00
Patrick Donnelly
6fc2ee383f
Merge PR #16413 into master
* refs/remotes/upstream/pull/16413/head:
	qa/cephfs: lsof if umount fails

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2017-07-28 09:55:23 -07:00
Sage Weil
c3c2b31c87 Merge pull request #16568 from liewegas/wip-application-warn
qa,doc: document and fix tests for pool application warnings
2017-07-28 09:00:46 -05:00
Kefu Chai
75e361433d qa/run-standalone.sh: fix the find option to be compatible with GNU find
Also re-indent to be consistent with the rest of this script.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-07-28 14:22:02 +08:00
Kefu Chai
2a128f4829 Merge pull request #16599 from liewegas/wip-standalone-fixes
qa/workunits: adjust path to ceph-helpers.sh

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-07-28 13:18:47 +08:00
Patrick Donnelly
fb039383e9
Merge PR #16435 into master
* refs/remotes/upstream/pull/16435/head:
	qa: whitelist trim error during powercycle tests

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-07-27 17:54:59 -07:00
Patrick Donnelly
ced01a2335
qa: fix wait for wrong health message
Fixes: http://tracker.ceph.com/issues/20805

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-07-27 14:40:05 -07:00
Sage Weil
41bcf2fee5 Merge pull request #16281 from badone/wip-PG-cluster-log-audit
osd: Log audit

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-07-27 16:25:30 -05:00
Sage Weil
862392fbf9 Merge pull request #16514 from liewegas/wip-20744
qa/tasks/ceph: wait for mgr to activate and pg stats to flush in health()

Reviewed-by: John Spray <john.spray@redhat.com>
2017-07-27 16:24:59 -05:00
Patrick Donnelly
d7f5af40a2
qa: whitelist trim error during powercycle tests
Fixes: http://tracker.ceph.com/issues/20566

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-07-27 13:24:21 -07:00
Sage Weil
541de391e1 Merge pull request #16572 from liewegas/wip-pidfile
test: add separate ceph-helpers-based smoke test

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-07-27 12:32:36 -05:00
Ilya Dryomov
bd6e3e5f1f qa/tasks/rbd_fio: bump default fio version to 2.21
I'm seeing sporadic single thread deadlocks on fio stat_mutex during krbd
thrash runs:

  (gdb) info threads
    Id   Target Id         Frame
  * 1    Thread 0x7f89ee730740 (LWP 15604) 0x00007f89ed9f41bd in __lll_lock_wait () from /lib64/libpthread.so.0
  (gdb) bt
  #0  0x00007f89ed9f41bd in __lll_lock_wait () from /lib64/libpthread.so.0
  #1  0x00007f89ed9f17b2 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  #2  0x00000000004429b9 in fio_mutex_down (mutex=0x7f89ee72d000) at mutex.c:170
  #3  0x0000000000459704 in thread_main (data=<optimized out>) at backend.c:1639
  #4  0x000000000045b013 in fork_main (offset=0, shmid=<optimized out>, sk_out=0x0) at backend.c:1778
  #5  run_threads (sk_out=sk_out@entry=0x0) at backend.c:2195
  #6  0x000000000045b47f in fio_backend (sk_out=sk_out@entry=0x0) at backend.c:2400
  #7  0x000000000040cb0c in main (argc=2, argv=0x7fffad3e3888, envp=<optimized out>) at fio.c:63
  (gdb) up 2
  170                     pthread_cond_wait(&mutex->cond, &mutex->lock);
  (gdb) p mutex.lock.__data.__owner
  $1 = 15604

Upgrading to 2.21 seems to make these go away.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-27 18:57:43 +02:00
Sage Weil
c1aef68f02 Merge pull request #16569 from liewegas/wip-set-not-put
mon: 'config-key put' -> 'config-key set'

Reviewed-by: Joao Eduardo Luis <joao@suse.de>
2017-07-27 11:34:37 -05:00
Sage Weil
e469a8044c qa/standalone/crush/crush-classes: fix test
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-27 12:25:25 -04:00
Sage Weil
380de3395f qa/standalone/README
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-27 12:24:52 -04:00
Sage Weil
0b5036f072 qa/suites/rados/upgrade: fix upgrade wait for healthy
There is no mgr, so we can't call ceph.healthy.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-27 12:10:34 -04:00
Sage Weil
a40d94b163 qa/tasks/ceph: wait for pg stats to flush in healthy check
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-27 12:10:27 -04:00
Sage Weil
80978dea8a qa/tasks/ceph_manager: wait_for_all_up -> wait_for_all_osds_up
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-27 12:10:26 -04:00
Sage Weil
7648894e55 qa/tasks/ceph_manager: expose flush_all_pg_stats
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-27 12:10:26 -04:00
Sage Weil
c7430c56cd Merge pull request #16388 from xiexingguo/wip-class-misc-fixes
crush, mon: simplify device class manipulation commands

Reviewed-by: Sage Weil <sage@redhat.com>
2017-07-27 11:04:33 -05:00
Sage Weil
203c68ad55 Merge pull request #16575 from liewegas/wip-20693
qa/suites/rados: at-end: ignore PG_{AVAILABILITY,DEGRADED}

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-07-27 08:31:53 -05:00
Sage Weil
e398fd4ee4 qa/suites: more whitelisting
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-27 09:31:24 -04:00
Jason Dillaman
42fabc2e80 Merge pull request #16398 from dillaman/wip-20655
rbd-mirror: guard the deletion of non-primary images

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2017-07-27 08:27:39 -04:00
David Zafman
e92c953d7b Merge pull request #16610 from dzafman/wip-fix-reg11184
test: reg11184 might not always find pg 2.0 prior to import

Reviewed-by: Sage Weil <sage@redhat.com>
2017-07-26 11:42:15 -07:00
Sage Weil
5534912daa qa/workunits/cephtool/test.sh: add some config-key tests
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-26 14:13:22 -04:00
Sage Weil
4eb1a518e3 mon: 'config-key put' -> 'config-key set'
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-26 14:10:08 -04:00
Sage Weil
ee06dc6996 Merge pull request #16530 from xiexingguo/wip-fix-pgtemp
mon: prime pg_temp and a few health warning fixes

Reviewed-by: Sage Weil <sage@redhat.com>
2017-07-26 13:09:33 -05:00
Sage Weil
59a3a4a40e Merge pull request #16559 from hjwsm1989/dump-stuck
qa/tasks/dump_stuck: fix dump_stuck test bug

Reviewed-by: Sage Weil <sage@redhat.com>
2017-07-26 11:59:21 -05:00
David Zafman
7c43840399 test: reg11184 might not always find pg 2.0 prior to import
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-07-26 09:46:15 -07:00
Sage Weil
56ffd7a727 Merge pull request #16571 from ceph/wip-cd-bluestore-2
qa/tasks/ceph-deploy: Fix bluestore options for ceph-deploy

Reviewed-by: Tamil Muthamizhan <tmuthami@redhat.com>
2017-07-26 11:43:50 -05:00
xie xingguo
076a6abd80 crush: kill 'class rename'
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:40:50 +08:00
xie xingguo
a27fd9d25c crush: kill "class create" command
Device classes are now managed automatically.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:40:17 +08:00
xie xingguo
edd8930346 crush: allow "crush class rm" to automatically recycle shadow tree(s)
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:39:41 +08:00
xie xingguo
9d908c14f6 crush: rm-device-class support
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:39:08 +08:00
xie xingguo
32fb548797 crush: guard set-device-class
If a device is already bound to a class,
do not allow its class to be changed silently.
Require the user to call rm-device-class first.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:34:08 +08:00
xie xingguo
e4e83a0dd7 crush: fix class_is_in_use()
A class can be considered as in-use only if it is referenced by
any of the existing crush rules.

The patch also makes the output more human readable. For example:

./bin/ceph osd crush rule create-replicated myrule default host ssd
./bin/ceph osd crush class rm ssd
Error EBUSY: class 'ssd' still referenced by crush_rule 'myrule'

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:31:39 +08:00
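The rule stated in commit e4e83a0dd7 (a class is in use only if some crush rule references it) can be modelled with plain dictionaries. This is a toy model, not the CrushWrapper implementation: the data layout and the function names `rules_referencing` and `remove_class` are assumptions made for illustration.

```python
def rules_referencing(device_class, rules):
    """Return the names of the rules that reference device_class, sorted."""
    return sorted(name for name, cls in rules.items() if cls == device_class)


def remove_class(device_class, classes, rules):
    """Remove a class unless an existing rule still references it."""
    users = rules_referencing(device_class, rules)
    if users:
        raise RuntimeError(
            f"class '{device_class}' still referenced by crush_rule '{users[0]}'"
        )
    classes.discard(device_class)


# Toy state: two classes, one rule bound to 'ssd', one rule with no class.
classes = {"ssd", "hdd"}
rules = {"myrule": "ssd", "replicated_rule": None}

try:
    remove_class("ssd", classes, rules)
    error = None
except RuntimeError as exc:
    error = str(exc)

remove_class("hdd", classes, rules)  # no rule references 'hdd'
```

In this model, removing 'ssd' is refused with a message naming the referencing rule, while 'hdd' is removed because no rule uses it.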
xie xingguo
f3a3180cca crush: rebuild shadow tree on "crush create-or-move/move"
This patch solves the problem below:

./bin/ceph osd crush move osd.0 root=foo rack=foo-rack host=foo-host
moved item id 0 name 'osd.0' to location {host=foo-host,rack=foo-rack,root=foo} in crush map

 ./bin/ceph osd crush rule create-replicated foo-rule foo host ssd
Error EINVAL: root foo has no devices with class ssd

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:30:59 +08:00
xie xingguo
10bf2a633f crush: fix "crush create-or-move/move" would drop osd's class
Was:
     ./bin/ceph osd tree
    ID CLASS WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRI-AFF
    -1       3.00000 root default
    -2       3.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
     0   ssd 1.00000         osd.0                                         up  1.00000 1.00000
     1   ssd 1.00000         osd.1                                         up  1.00000 1.00000
     2   ssd 1.00000         osd.2                                         up  1.00000 1.00000

    ./bin/ceph osd crush move osd.0 root=foo rack=foo-rack  host=foo-host
    moved item id 0 name 'osd.0' to location {host=foo-host,rack=foo-rack,root=foo} in crush map

     ./bin/ceph osd tree
    ID CLASS WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRI-AFF
    -7       1.00000 root foo
    -6       1.00000     rack foo-rack
    -5       1.00000         host foo-host
     0       1.00000             osd.0                                     up  1.00000 1.00000
    -1       2.00000 root default
    -2       2.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
     1   ssd 1.00000         osd.1                                         up  1.00000 1.00000
     2   ssd 1.00000         osd.2                                         up  1.00000 1.00000

Now:
    ./bin/ceph osd tree
    ID CLASS WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRI-AFF
    -1       3.00000 root default
    -2       3.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
     0   ssd 1.00000         osd.0                                         up  1.00000 1.00000
     1   ssd 1.00000         osd.1                                         up  1.00000 1.00000
     2   ssd 1.00000         osd.2                                         up  1.00000 1.00000

    ./bin/ceph osd crush move osd.0 root=foo rack=foo-rack  host=foo-host
    moved item id 0 name 'osd.0' to location {host=foo-host,rack=foo-rack,root=foo} in crush map

    ./bin/ceph osd tree
    ID CLASS WEIGHT  TYPE NAME                                        UP/DOWN REWEIGHT PRI-AFF
    -7       1.00000 root foo
    -6       1.00000     rack foo-rack
    -5       1.00000         host foo-host
     0   ssd 1.00000             osd.0                                     up  1.00000 1.00000
    -1       2.00000 root default
    -2       2.00000     host gitbuilder-ceph-rpm-centos7-amd64-basic
     1   ssd 1.00000         osd.1                                         up  1.00000 1.00000
     2   ssd 1.00000         osd.2                                         up  1.00000 1.00000

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 22:30:26 +08:00
Sage Weil
742005bd75 Merge pull request #16579 from liewegas/wip-fix-nonregression
qa/suites/rados/singleton/all/erasure-code-nonregression: fix typo

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Amik Kumar <amitkuma@redhat.com>
2017-07-26 08:46:43 -05:00
Sage Weil
c1bdd36d8f qa/workunits/erasure-code/encode-decode-nonregression: do not require git checkout
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-26 09:35:46 -04:00
Sage Weil
841f3bdf92 qa/workunits: adjust path to ceph-helpers.sh
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-26 08:08:01 -04:00
Willem Jan Withagen
ae88edd25d qa: make run-standalone work on FreeBSD
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
2017-07-26 12:01:37 +02:00
Kefu Chai
d85a7889fd Merge pull request #16446 from xiexingguo/wip-destroyed
mon: show destroyed status in tree view; do not auto-out destroyed osds

Reviewed-by: Sage Weil <sage@redhat.com>
2017-07-26 17:15:53 +08:00
Brad Hubbard
f8acc53d82 osd: Log audit
Review current log messages for consistency, accuracy and necessity as
part of the usability initiative. First in a series.

Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
2017-07-26 17:34:28 +10:00
xie xingguo
96eb0a9887 mon/OSDMonitor: apply new 'destroyed' status to 'osd tree' filter
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-26 15:13:32 +08:00
Sage Weil
326019a466 qa/suites/rados: whitelist various tests
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-25 22:29:07 -04:00
Sage Weil
2ef8614f67 qa/suites/rados/singleton/all/erasure-code-nonregression: fix typo
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-25 22:26:43 -04:00
Sage Weil
d2c31a8114 Merge pull request #16469 from xiexingguo/wip-fix-test
test: s/osd_objectstore_type/osd_objectstore
2017-07-25 21:04:22 -05:00
Sage Weil
3683cdf496 qa/suites/rados: at-end: ignore PG_{AVAILABILITY,DEGRADED}
With the peering deletes change, setting luminous sets the osdmap flag
which triggers a new peering interval.  That can lead to health warnings
about PG_AVAILABILITY or PG_DEGRADED.  Ignore those!

Fixes: http://tracker.ceph.com/issues/20693
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-25 18:29:07 -04:00
Vasu Kulkarni
45c6a9acc4 Add both filestore and bluestore options for tests
Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
2017-07-25 15:16:37 -07:00
Vasu Kulkarni
bdf6851fb0 Add ceph-deploy overrides options
Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
2017-07-25 15:10:38 -07:00
Vasu Kulkarni
25c89804e4 bluestore config options for tests
Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
2017-07-25 12:26:11 -07:00
Vasu Kulkarni
05cafd5011 Add bluestore overrides for ceph-deploy
ceph-deploy doesn't use the ceph overrides, so add the same overrides for ceph-deploy.

Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
2017-07-25 12:26:11 -07:00
Vasu Kulkarni
12a1ceba6e Move ceph-deploy config options into its own folder
The old structure of links in the top-level folder is outdated; the test
config options need to be specific to the cluster yaml.

Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
2017-07-25 12:26:11 -07:00
Vasu Kulkarni
2fa0fae72f Add option to specify bluestore/filestore options
Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
2017-07-25 12:26:03 -07:00
Sage Weil
a264725b62 Merge pull request #16541 from liewegas/wip-20761
qa/workunits/cephtool/test.sh: disable 'fs status' until bug is fixed
2017-07-25 14:03:38 -05:00
Jason Dillaman
76fd882464 qa/workunits/rbd: rbd-mirror now treats no primary image as unknown state
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2017-07-25 07:17:15 -04:00
huangjun
daf8efee32 qa/tasks/dump_stuck: fix dump_stuck test bug
Test cluster with 2 osds: stop osd.0; if osd.1 reports the pg stats during
pg peering, the mon will record the pg state as 'peering'. Then stop osd.1,
and the pg state will end up stuck in 'stale+peering', which is unexpected.

Let's wait_for_active() after stopping osd.0.

Signed-off-by: huangjun <huangjun@xsky.com>
2017-07-25 11:14:07 +00:00
xie xingguo
450633b9e6 mon/OSDMonitor: ENOENT on removing non-existent app key
So we don't bother to trigger a pool update, which is potentially
expensive.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-25 13:19:35 +08:00
xie xingguo
b4dcdecb6a mon/OSDMonitor: ENOENT on disabling non-existent app
So we don't bother to trigger a pool update, which is potentially
expensive.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-25 13:19:29 +08:00
Sage Weil
7c157863a8 qa/run-standalone.sh: helper to run all standalone tests
Nothing fancy, but documents how these are run.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:50 -04:00
Sage Weil
766229b034 qa/standalone/scrub: separate scrub/repair tests from rest of osd/
They are slow.  Run them separately.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:50 -04:00
Sage Weil
cabad62242 qa/standalone/ceph-helpers: factor rbd pool create out of run_mon
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:50 -04:00
Sage Weil
b12bebe432 qa/standalone/mon/osd-pool-create: stop testing create pool output
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:49 -04:00
Sage Weil
71ea171604 qa: move ceph-helpers and misc src/test/*.sh tests to qa/standalone
- stop running via make check
- add teuthology yamls to run them
- disable ceph_objecstore_tool.py for now (too slow for make check, and
we can't use vstart in teuthology via a package install)
- drop cephtool tests since those are already covered by other teuthology
tests
- leave a handful of (fast!) ceph-helpers tests for make check for minimal
integration tests.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:49 -04:00
Alan Somers
821511bd32 openstack: Fix shebangs on openstack scripts
Many of the files in qa/qa_scripts/openstack had incorrect shebang
lines: the bang was missing.  This means that those scripts would
execute using the calling user's login shell, which is doubtless not
what the author intended.  Now they'll always use bash.

Two scripts do not need shebangs, because they contain only library
functions and don't execute anything.  I removed their shebangs.

Signed-off-by: Alan Somers <asomers@gmail.com>
2017-07-24 17:33:02 -06:00
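The shebang problem described in commit 821511bd32 (a first line such as `#/bin/bash` with the bang missing) is easy to detect mechanically. The checker below is a hypothetical sketch, not the tool used for that commit; the function names and category strings are invented for illustration.

```python
def shebang_problem(first_line):
    """Classify a script's first line.

    Returns None for a valid shebang, "missing bang" for lines such as
    "#/bin/bash", and "no shebang" for anything else.
    """
    if first_line.startswith("#!"):
        return None
    if first_line.startswith("#") and first_line[1:].lstrip().startswith("/"):
        return "missing bang"
    return "no shebang"


def fix_first_line(first_line):
    """Insert the missing '!' when the line looks like a broken shebang."""
    if shebang_problem(first_line) == "missing bang":
        return "#!" + first_line[1:].lstrip()
    return first_line


fixed = fix_first_line("#/bin/bash")
```

A line classified as "no shebang" is left alone, which matches library-only scripts that are sourced rather than executed.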
Sage Weil
f347ef54c2 qa/workunits/cephtool/test.sh: disable 'fs status' until bug is fixed
See http://tracker.ceph.com/issues/20761
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 16:54:13 -04:00
Sage Weil
2e5955212d qa/tasks/workunit: allow alt basedir
Instead of 'qa/workunits' allow something like 'qa/standalone'.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 15:44:51 -04:00
Sage Weil
02c2e853d3 Merge pull request #16509 from liewegas/wip-rgw-wait
qa/suits/rados/basic/tasks/rgw_snaps: wait for pools to be created

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2017-07-24 11:55:54 -05:00
Sage Weil
29549e6834 Merge pull request #13723 from ovh/bp-forced-recovery
osd/PG: make prioritized recovery possible

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-07-24 09:01:03 -05:00
John Spray
343e1a4281 qa: update whitelist for "wrongly marked me down"
Signed-off-by: John Spray <john.spray@redhat.com>
2017-07-24 14:54:46 +01:00
Sage Weil
fc8374b472 Merge pull request #16326 from liewegas/wip-weight-set
crush,mon: add weight-set introspection and manipulation commands

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2017-07-24 08:27:06 -05:00
Sage Weil
ecd1193ab9 qa/suites/rados/basic/tasks/rgw_snaps: wait for pools to be created
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-22 18:54:46 -04:00
Sage Weil
9b4002b6b8 qa/suites/rados/basic/tasks/rgw_snaps: fix pool list
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-22 18:54:45 -04:00
Sage Weil
08bdc2c867 Merge pull request #16500 from liewegas/wip-compact-sudo
qa/workunits/cephtool/test.sh: add sudo for daemon compact

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-07-22 13:01:20 -05:00
xie xingguo
fa0e314cde test: s/osd_objectstore_type/osd_objectstore/
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-22 15:22:31 +08:00
Sage Weil
4e6487cad4 Merge pull request #15991 from dillaman/wip-rbd-auth-profile
mon,osd: new rbd-based cephx cap profiles

Reviewed-by: Sage Weil <sage@redhat.com>
2017-07-21 22:38:42 -05:00
Sage Weil
0429acda45 Merge pull request #16460 from liewegas/wip-mgr-metadata
mon: add mgr metadata commands, and overall 'versions' command for all daemon versions

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-07-21 22:36:09 -05:00
Sage Weil
2f272ab451 qa/workunits/cephtool/test.sh: add sudo for daemon compact
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-21 23:18:03 -04:00
Patrick Donnelly
9506789ce1
Merge PR 16379 into master
* refs/remotes/upstream/pull/16379/head:
	qa: fix MDS_CLIENT_RECALL copy error

Reviewed-by: Zheng Yan <zyan@redhat.com>
2017-07-21 13:23:07 -07:00
Patrick Donnelly
23e3d40751
Merge PR 16226 into master
* refs/remotes/upstream/pull/16226/head:
	qa: wait for OSDMap to propagate for snap purge

Reviewed-by: Zheng Yan <zyan@redhat.com>
2017-07-21 13:22:47 -07:00
Jason Dillaman
44fa7ee788 qa/workunits/rbd: rbd-mirror tests should use 'mirror' user
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2017-07-21 14:30:18 -04:00
Jason Dillaman
56614d0ee9 qa/suites/rbd: mirroring tests should use rbd cap profiles
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2017-07-21 14:30:18 -04:00
Jason Dillaman
d32485ff37 qa/workunits/rbd: devstack test should use auth profiles
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2017-07-21 14:30:18 -04:00
Sage Weil
09b89ace82 qa/workunits/mon/crush_ops.sh: fix in-use rule removal test
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-21 13:50:57 -04:00
Sage Weil
fac1de8259 qa/workunits/mon/crush_ops: require luminous clients for test
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-21 13:50:57 -04:00
Sage Weil
70263dae67 mon: 'osd crush weight-set {ls,dump,create[-compat],rm[-compat],reweight[-compat]}' commands
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-21 13:50:52 -04:00
Kefu Chai
10b88b5d82 test: create asok files in a temp directory under $TMPDIR
to shorten the pathname of unix domain socket created for admin socket,
so it does not exceed the limit of 107 on GNU/Linux:

* ceph-helper.sh: the temp directory is named ${TMPDIR:-/tmp}/ceph-asok.$$
* vstart.sh: the temp directory is named `mktemp -u -d "${TMPDIR:-/tmp}/ceph-asok.XXXXXX"`

Fixes: http://tracker.ceph.com/issues/16895
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-07-22 01:05:29 +08:00
Sage Weil
cb084a55f6 Merge pull request #16453 from liewegas/wip-workloadgen
crush: enforce buckets-before-rules rule

Reviewed-by: Joao Eduardo Luis <joao@suse.de>
2017-07-21 11:01:22 -05:00
Sage Weil
75ac7d85da qa/workunits/cephtool/test.sh: add a few tests
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-21 11:25:05 -04:00
Joao Eduardo Luis
6f6fbe7870 qa: flush out monc's dropped msgs on msgr failure injection
We have a few open tickets regarding the mgr being down during suites
involving messenger failure injection. There are a few suspicions that
this may be related to the monclient, but we'll need more logs to
validate those suspicions and, moreover, to validate that we're actually
fixing the issue.

Signed-off-by: Joao Eduardo Luis <joao@suse.de>
2017-07-21 15:29:21 +01:00
Jos Collin
fae6dc4786 Merge pull request #16430 from yuriw/wip_add_luminous
qa: Added luminous to the mix in schedule_subset.sh

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Jos Collin <jcollin@redhat.com>
2017-07-21 12:11:29 +00:00
Kefu Chai
4599eb7963 Merge pull request #16454 from liewegas/wip-fix-ceph-scrub
qa/tasks/ceph_manager: wait for osd to start after objectstore-tool sequence

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-07-21 19:31:19 +08:00
Kefu Chai
0193e38b3f Merge pull request #16028 from jcsp/wip-mgr-commands
mon: load mgr commands at runtime

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-07-21 18:16:13 +08:00
Sage Weil
6c4992aeca qa/workunits/cephtool/test.sh: fix test to watch audit channel
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-21 11:40:48 +08:00
Sage Weil
2e8413dede qa: remove workloadgen test
The CRUSH rule creation is busted (rules and buckets out of order), but
after I fix that it doesn't seem to run right anyway.  Remove it.
We get the mon thrasher coverage from rados/monthrash already; I don't
think this is adding meaningful coverage for the amount of effort it takes
to maintain.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-20 18:06:50 -04:00
Sage Weil
59e3827be7 qa/tasks/reg11184: import run
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-20 17:42:59 -04:00
Sage Weil
27e8d75f61 Merge pull request #16429 from liewegas/wip-jewel-x
qa/suites/upgrade/jewel-x: misc fixes for new health checks
2017-07-20 10:47:05 -05:00
Sage Weil
3de9f22ce0 Merge pull request #16423 from liewegas/wip-ls
mon: '* list' -> '* ls'

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-07-20 10:43:34 -05:00
Kefu Chai
acc24bf0dc Merge pull request #16444 from tchaikov/wip-test-osd-stat
qa/workunits/cephtool/test.sh: "ceph osd stat" output changed, update accordingly

Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
2017-07-20 23:41:53 +08:00
Sage Weil
583a38bca2 qa/tasks/ceph_manager: wait for osd to start after objectstore-tool sequence
Fixes: http://tracker.ceph.com/issues/20705
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-20 11:41:36 -04:00
Kefu Chai
3dfa9daeca Merge pull request #16443 from wjwithagen/bug-wjw-qa-test-reorder
cephtool/test.sh: Only delete a test pool when no longer needed.

Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-07-20 22:13:37 +08:00
Kefu Chai
a1d16185a2 qa/tasks/reg11184: use literal 'foo' instead of pool_name
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-07-20 21:35:41 +08:00
Kefu Chai
ba525a829c qa/workunits/cephtool/test.sh: "ceph osd stat" output changed, update test accordingly
Signed-off-by: Kefu Chai <kchai@redhat.com>
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
2017-07-20 19:34:53 +08:00
Willem Jan Withagen
e3760fa936 cephtool/test.sh: Only delete a test pool when no longer needed.
the pool_getset pool is deleted before all tests on it are complete

4: /home/jenkins/workspace/ceph-master/qa/workunits/cephtool/test.sh:1990: test_mon_osd_pool_set:  ceph osd pool delete pool_getset pool_getset --yes-i-really-really-mean-it
4: pool 'pool_getset' removed
4: /home/jenkins/workspace/ceph-master/qa/workunits/cephtool/test.sh:1992: test_mon_osd_pool_set:  ceph osd pool get rbd crush_rule
4: /home/jenkins/workspace/ceph-master/qa/workunits/cephtool/test.sh:1992: test_mon_osd_pool_set:  grep 'crush_rule: '
4: crush_rule: replicated_rule
4: /home/jenkins/workspace/ceph-master/qa/workunits/cephtool/test.sh:1994: test_mon_osd_pool_set:  ceph -f json osd pool get pool_getset compression_mode
4: Error ENOENT: unrecognized pool 'pool_getset'

Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
2017-07-20 12:24:14 +02:00
Kefu Chai
aea471d73a Merge pull request #16403 from wjwithagen/bug-wjw-ceph-osd-stat
test: ceph osd stat output has changed, fix tests for that

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-07-20 18:06:47 +08:00
Ilya Dryomov
67db89f6c2 Merge pull request #16428 from idryomov/wip-krbd-luminous-thrash
qa: thrash tests for backoff and upmap

Reviewed-by: Vasu Kulkarni <vasu@redhat.com>
2017-07-20 11:28:22 +02:00
Piotr Dałek
b0134cc7a8 qa: add force/cancel recovery/backfill to QA testing
This randomly issues pg force-recovery/force-backfill and
pg cancel-force-recovery/cancel-force-backfill during QA
testing. Disabled for upgrades from hammer, jewel and kraken.

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
2017-07-20 09:35:55 +02:00