Commit Graph

186 Commits

Author SHA1 Message Date
Sage Weil
9c1d621083 qa/suites/rados/monthrash: tolerate PG_AVAILABILITY during mon thrashing
Signed-off-by: Sage Weil <sage@redhat.com>
2017-10-04 21:26:56 -05:00
Yuri Weinstein
7b1c77a643 Merge pull request #18078 from liewegas/wip-21614
qa/suites/rados/singleton/all/recover-preemption: handle slow starting osd

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-10-03 12:33:29 -07:00
Josh Durgin
4570075984 Merge pull request #17576 from ceph/wip-rm-1-minsize
qa/tests/rados: Remove unsupported 2-size-1-min-size config

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-10-03 09:45:45 -07:00
Sage Weil
76d84ac194 qa/suites/rados/singleton/all/recover-preemption: handle slow starting osd
The OSD may not be marked up yet; set the config via the admin
socket.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-10-02 07:20:57 -05:00
John Spray
47bfe6cf17 Merge pull request #17735 from jcsp/wip-mgr-perf-interface
mgr: common interface for TSDB modules

Reviewed-by: My Do <mhdo@umich.edu>
Reviewed-by: Jan Fajerski <jfajerski@suse.com>
Reviewed-by: John Spray <john.spray@redhat.com>
2017-10-02 11:12:35 +01:00
Sage Weil
c59efe0a2b Merge pull request #17839 from liewegas/wip-recovery-preemption
osd: allow PG recovery scheduling preemption

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2017-09-28 14:14:01 -05:00
Sage Weil
d7b29acb19 qa/suites/rados/singleton/all/recovery-preemption: add test
This mirrors what I was testing locally.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-09-28 13:48:14 -04:00
John Spray
99352ceced qa: add mgr module selftest task
The module self test commands give us a chance to
catch any other ceph changes that change something
that a module was relying on reading.

Signed-off-by: John Spray <john.spray@redhat.com>
2017-09-27 14:20:22 -04:00
Sage Weil
07fe4fd5b5 Merge pull request #17528 from liewegas/wip-mgr-localpool
pybind/mgr/localpool: module to automagically create localized pools

Reviewed-by: John Spray <john.spray@redhat.com>
2017-09-25 12:30:05 -05:00
Sage Weil
6383fa5b30 qa/workunits/mgr/test_localpool: simple test for localpool mode
Signed-off-by: Sage Weil <sage@redhat.com>
2017-09-25 12:34:53 -04:00
Neha Ojha
11d8dfe591 qa/suites/rados/perf: create pool with lower pg_num
Signed-off-by: Neha Ojha <nojha@redhat.com>
2017-09-19 16:40:45 -07:00
Neha Ojha
2635e7a591 qa/suites/rados/perf: add optimized settings
Signed-off-by: Neha Ojha <nojha@redhat.com>
2017-09-18 15:53:28 -07:00
Josh Durgin
520a5a218c Merge pull request #17583 from neha-ojha/wip-cbt-teuthology-integration
qa: add cbt task for performance testing

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-09-14 16:36:26 -07:00
Xie Xingguo
b4bb7ce2da Merge pull request #17371 from xiexingguo/wip-per-pool-full-control
mon, osd: per pool space-full flag support

Reviewed-by: Sage Weil <sage@redhat.com>
2017-09-14 18:26:12 +08:00
xie xingguo
afcb617dc9 osd/PrimaryLogPG: do not generate data digest for BlueStore by default
BlueStore enables CRC by default, so the data digest is redundant
and adds no further benefit.

Turn this off by default, which is good for performance.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-09-13 12:17:16 +08:00
Neha Ojha
59531d81c5 qa: avoid using make install for fio
Signed-off-by: Neha Ojha <nojha@redhat.com>
2017-09-12 08:26:27 -07:00
Neha Ojha
1dfd12e852 qa/suites/rados: add perf suite
Signed-off-by: Neha Ojha <nojha@redhat.com>
2017-09-08 11:15:11 -07:00
xie xingguo
b4ca5ae462 mon, osd: per pool space-full flag support
The newly introduced 'device-class' can be used to separate
different types of devices into different pools, e.g., hdd-pool
for backup data and all-flash-pool for DB applications.

However, if any osd of the cluster is currently running out
of space (exceeding the predefined 'full' threshold), Ceph
will mark the whole cluster as full and prevent writes to all pools,
which turns out to be very wrong.

This patch instead applies the space 'full' control at pool granularity,
which reuses the existing pool quota logic and solves
the above problem.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-09-08 10:03:17 +08:00
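
The difference between cluster-wide and per-pool 'full' handling described in the commit above can be illustrated with a toy model. This is only a sketch and not Ceph's implementation: the Pool class, the helper names, the OSD-to-pool mapping and the 0.95 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    name: str
    osds: frozenset  # ids of the OSDs backing this pool (hypothetical mapping)


def blocked_cluster_wide(pools, osd_usage, full_ratio=0.95):
    """Old behaviour in this toy model: one full OSD blocks writes to every pool."""
    any_full = any(used >= full_ratio for used in osd_usage.values())
    return {p.name for p in pools} if any_full else set()


def blocked_per_pool(pools, osd_usage, full_ratio=0.95):
    """New behaviour in this toy model: only pools backed by a full OSD are blocked."""
    full_osds = {osd for osd, used in osd_usage.items() if used >= full_ratio}
    return {p.name for p in pools if p.osds & full_osds}


pools = [
    Pool("hdd-pool", frozenset({0, 1})),
    Pool("all-flash-pool", frozenset({2, 3})),
]
# osd.0 has crossed the assumed 'full' threshold; the flash OSDs have not.
usage = {0: 0.97, 1: 0.50, 2: 0.30, 3: 0.40}
```

With these inputs the cluster-wide check blocks both pools, while the per-pool check blocks only hdd-pool, which is the separation the commit message argues for.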
Vasu Kulkarni
30dbbfe4ae Remove unsupported 2-size-1-min-size config
Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
2017-09-07 09:47:15 -07:00
Sage Weil
ec5e9c8d76 Merge pull request #17379 from liewegas/wip-div-p
qa/suites/rados/singleton/divergent_priors*: broaden whitelist
2017-08-31 13:56:59 -05:00
Sage Weil
39e5efbad2 qa/suites/rados/singleton/divergent_priors*: broaden whitelist
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-30 15:13:08 -04:00
Sage Weil
7b51cedac6 qa/suites/rados/upgrade: jewel-x -> luminous-x
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-28 23:11:27 -04:00
Sage Weil
d8dead1aaf qa/suites/rados: remove luminous tests
- snapdir conversion (at-end) stuff
- merge luminous-specific collections that avoided the above back
into their normal locations

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-28 23:10:32 -04:00
Kefu Chai
b2d7f4f4c7 qa/suites/rados/upgrade/jewel-x-singleton: tolerate sloppy past_intervals
See-also: d5d5d7d1
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-08-28 15:19:41 +08:00
Sage Weil
d69f0e120b qa/suites/rados/objectstore/objectstore: less debug
Saw an ENOSPC.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-13 14:41:43 -04:00
Sage Weil
41e5a85308 qa/suites/rados/verify/validater/valgrind: whitelist PG_
Peering might be slow due to valgrind.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-12 14:18:59 -04:00
Sage Weil
12007044b1 qa/suites/rados/multimon/tasks/mon_lock_with_skew: whitelist PG_
Default pool pgs not up because mons too broken for OSDs to peer.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-12 14:15:15 -04:00
Sage Weil
ad23d7dc1f qa/suites/rados/multimon: whitelist mgr down vs clock skew test
Clock skew might make us fail the mgr.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-11 13:42:02 -04:00
Sage Weil
c8d60396c7 qa/suites/rados/objectstore: logs
Hunting http://tracker.ceph.com/issues/20738

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-08 18:07:18 -04:00
Sage Weil
b5fae9a9ca Merge pull request #16873 from liewegas/wip-4-nodes
qa/suites: change fixed-2.yaml users to get 4 openstack disks

Reviewed-by: Zack Cerza <zcerza@redhat.com>
2017-08-07 11:27:40 -05:00
Sage Weil
f683d2d374 qa/suites: change fixed-2.yaml users to get 4 openstack disks
Follow-up for 4203c4f887

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-07 11:56:33 -04:00
Sage Weil
6307e03c6d qa/suites/rados/thrash/workloads/cache-agent-big: m=2
...because we do the test_map_discontinuity thing.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-05 14:33:13 -04:00
Sage Weil
ffd171fd46 Merge pull request #16820 from liewegas/wip-more-whitelist
qa/suites/rados: a bit more whitelisting

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-08-04 13:44:08 -05:00
Sage Weil
82cf3046de qa/suites/rados/basic/tasks/rados_python: POOL_APP_NOT_ENABLED
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-04 13:39:13 -04:00
Sage Weil
c8af364699 Merge pull request #16739 from liewegas/wip-multi-backfill-reject
qa/suites/rados/singleton-nomsgr/all/multi-backfill-reject: sleep longer
2017-08-04 08:41:06 -05:00
Sage Weil
1ae9ff173b qa/suites/rados/upgrade: ignore FS_DEGRADED from mds restart
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-04 09:34:31 -04:00
Sage Weil
27a685f626 qa/suites/rados/monthrash: ignore MGR_DOWN
Heavily thrashing mons + mgr reconnect backoff may make us fail
to process the beacon.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-04 09:34:15 -04:00
Sage Weil
342607f4d5 Merge pull request #16749 from tchaikov/wip-restful-delete-key
mgr: handle "module.set_config(.., None)" correctly

Reviewed-by: John Spray <john.spray@redhat.com>
2017-08-03 15:53:27 -05:00
Josh Durgin
b172642124 Merge pull request #16789 from liewegas/wip-ec-m-2
qa: avoid map-gap tests for k=2 m=1

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-08-03 11:20:13 -07:00
Sage Weil
ef21c9d7df qa/suites/rados/thrash-erasure-code: do not test map gap with m=1
We test EC profiles with m=1 here, and mapgap can lead to incomplete pgs
because it takes an osd down and waits for healthy.

Fixes: http://tracker.ceph.com/issues/20844
Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-03 14:13:02 -04:00
Sage Weil
f74d71f708 qa/suites/rados/thrash-erasure-code-big/cluster: 12 osds on 3 nodes not 4
smithi nodes have 4 nvme partitions available, not 3.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-03 14:11:43 -04:00
Sage Weil
63221e21f5 qa/suites/rados/thrash-erasure-code-big: add k=4 m=2
Get better coverage for larger codes.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-03 14:10:36 -04:00
Sage Weil
e994b03335 qa/suites/rados/monthrash/workloads/rados_api_tests: whitelist SMALLER_PGP_NUM
The rados/test.sh fiddles with pg_num.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-03 13:31:39 -04:00
Sage Weil
7c350180b1 qa/suites/rados/mgr/tasks/failover: whitelist
remote/smithi025/log/ceph.log.gz:2017-08-03 07:02:15.049074 mon.b mon.0 172.21.15.25:6789/0 197 : cluster [INF] Manager daemon x is unresponsive, replacing it with standby daemon y
remote/smithi025/log/ceph.log.gz:2017-08-03 07:03:10.078032 mon.b mon.0 172.21.15.25:6789/0 226 : cluster [WRN] Manager daemon x is unresponsive.  No standby daemons available.

x and y may be swapped, so whitelist the rest of the string.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-03 12:40:01 -04:00
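
The idea in the commit above, whitelisting only the part of the message that does not depend on the daemon names, can be sketched as follows. The two patterns and the is_whitelisted helper are hypothetical; the actual whitelist entries added by the commit are not shown in this log.

```python
import re

# Hypothetical whitelist entries: they match the fixed tail of each message
# and deliberately leave out the daemon names, which may be x or y.
WHITELIST = [
    r"is unresponsive, replacing it with standby",
    r"No standby daemons available",
]


def is_whitelisted(line, whitelist=WHITELIST):
    """Return True if any whitelist pattern occurs anywhere in the log line."""
    return any(re.search(pattern, line) for pattern in whitelist)


line_xy = ("cluster [INF] Manager daemon x is unresponsive, "
           "replacing it with standby daemon y")
line_yx = ("cluster [INF] Manager daemon y is unresponsive, "
           "replacing it with standby daemon x")
```

Both line_xy and line_yx match the first pattern, so the outcome does not depend on which manager daemon happened to be active when it failed.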
Kefu Chai
da1a60ced1 qa: refactor suites/rados/rest/mgr-restful
- use "ceph restful restart" to restart the restful API server instead
of restarting the ceph-mgr
- test "ceph restful delete-key"
- test "ceph restful list-keys"

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-08-02 18:20:56 +08:00
Kefu Chai
1ff1f836da Merge pull request #16722 from tchaikov/wip-qa-fixes
qa/suites: escape the parenthesis of the whitelist text

Reviewed-by: Sage Weil <sage@redhat.com>
2017-08-02 13:00:01 +08:00
Kefu Chai
a70be4e00c qa/suites: more whitelisting
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-08-02 10:00:57 +08:00
Sage Weil
c955bf528f qa/suites/rados/singleton-nomsgr/all/multi-backfill-reject: sleep longer
I saw a failure where the 30% backfill probability was enough that we
just didn't manage to backfill all of the pgs during the 5 minute recovery
timeout during ceph.py shutdown.  Build in some additional time for the
test to recover.

http://pulpito.ceph.com/sage-2017-08-01_15:32:10-rados-wip-sage-testing-distro-basic-smithi/1469184

Signed-off-by: Sage Weil <sage@redhat.com>
2017-08-01 15:50:47 -04:00
Kefu Chai
d12c51ca91 qa/suites: escape the parenthesis of the whitelist text
so we can avoid the warnings like

grep: Unmatched ( or \(

because we pass the whitelisted string to `egrep -v "$1"` directly.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-08-01 21:54:44 +08:00
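
The failure mode described in the commit above can be reproduced in a small sketch. It uses Python's re module as a stand-in for egrep, which is an assumption: the two regex dialects differ in places, but both reject an unmatched opening parenthesis. The helper names and the sample log lines are hypothetical.

```python
import re


def escape_parens(entry):
    """Backslash-escape literal parentheses so the entry is a valid pattern."""
    return entry.replace("(", r"\(").replace(")", r"\)")


def filter_log(lines, whitelist):
    """Drop every line that matches a whitelist pattern, like `egrep -v`."""
    patterns = [re.compile(entry) for entry in whitelist]
    return [ln for ln in lines if not any(p.search(ln) for p in patterns)]


log = [
    "cluster [WRN] Health check failed: no active mgr (MGR_DOWN)",
    "cluster [ERR] unexpected error",
]

# An unescaped, unbalanced "(" is rejected by the regex engine, much as
# egrep prints "Unmatched ( or \(" for the same input.
try:
    re.compile("(MGR_DOWN")
    unescaped_ok = True
except re.error:
    unescaped_ok = False

remaining = filter_log(log, [escape_parens("(MGR_DOWN)")])
```

After escaping, the entry matches the literal text "(MGR_DOWN)", so only the unrelated error line is left in remaining.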
John Spray
ac2b9d63ca qa: include config help in admin socket test
Signed-off-by: John Spray <john.spray@redhat.com>
2017-08-01 13:38:40 +01:00