Merge pull request #13517 from liewegas/wip-kraken-x

qa/suites/upgrade/kraken-x

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
Sage Weil 2017-02-20 10:20:26 -06:00 committed by GitHub
commit d6950a413f
60 changed files with 591 additions and 7 deletions

View File

@ -2,8 +2,9 @@ tasks:
- exec:
osd.0:
- ceph osd set require_luminous_osds
- ceph.healthy:
overrides:
ceph:
conf:
mon:
mon warn on osd down out interval zero: false
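
For orientation, the exec step above issues `ceph osd set require_luminous_osds` on the osd.0 node, and the ceph.healthy step then waits for the cluster to settle. That wait amounts to polling `ceph health` until it reports HEALTH_OK; a minimal sketch of such a poll, with an illustrative 300 s timeout (not the teuthology implementation):

    import subprocess
    import time

    def wait_for_health_ok(timeout=300, interval=5):
        """Poll `ceph health` until the cluster reports HEALTH_OK.

        A simplified stand-in for what the ceph.healthy task waits for;
        the timeout and interval values are illustrative only.
        """
        deadline = time.time() + timeout
        while time.time() < deadline:
            out = subprocess.check_output(['ceph', 'health']).decode().strip()
            if out.startswith('HEALTH_OK'):
                return
            time.sleep(interval)
        raise RuntimeError('cluster did not reach HEALTH_OK within %ds' % timeout)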

View File

@ -0,0 +1,4 @@
openstack:
- volumes: # attached to each instance
count: 3
size: 30 # GB

View File

@ -0,0 +1,27 @@
meta:
- desc: |
Run ceph on two nodes,
with a separate third node for clients 0-3.
Use xfs beneath the osds.
CephFS tests run on clients 2 and 3.
roles:
- - mon.a
- mds.a
- osd.0
- osd.1
- - mon.b
- mon.c
- osd.2
- osd.3
- - client.0
- client.1
- client.2
- client.3
overrides:
ceph:
log-whitelist:
- scrub mismatch
- ScrubResult
- wrongly marked
conf:
fs: xfs
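
As an aside on how these fragments are consumed: teuthology assembles each job by taking one YAML file from every numbered facet directory in the suite and deep-merging them, concatenating lists and merging mappings. A rough sketch of that combination step, assuming a simple recursive merge (the real teuthology-suite scheduler has considerably more machinery):

    import itertools
    import yaml

    def deep_merge(a, b):
        # Lists concatenate, dicts merge recursively, scalars from b win.
        if isinstance(a, list) and isinstance(b, list):
            return a + b
        if isinstance(a, dict) and isinstance(b, dict):
            out = dict(a)
            for k, v in b.items():
                out[k] = deep_merge(out[k], v) if k in out else v
            return out
        return b

    def combine(facet_files):
        """facet_files: list of lists of YAML paths, one list per facet dir."""
        for combo in itertools.product(*facet_files):
            job = {}
            for path in combo:
                with open(path) as f:
                    job = deep_merge(job, yaml.safe_load(f) or {})
            yield combo, job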

View File

@ -0,0 +1,22 @@
meta:
- desc: |
install ceph/kraken latest
run workload and upgrade-sequence in parallel
upgrade the client node
tasks:
- install:
branch: kraken
- print: "**** done installing kraken"
- ceph:
- print: "**** done ceph"
- install.upgrade:
mon.a:
mon.b:
- print: "**** done install.upgrade both hosts"
- parallel:
- workload
- upgrade-sequence
- print: "**** done parallel"
- install.upgrade:
client.0:
- print: "**** done install.upgrade on client.0"

View File

@ -0,0 +1,14 @@
meta:
- desc: |
run a cephfs stress test
mount ceph-fuse on client.2 before running workunit
workload:
full_sequential:
- sequential:
- ceph-fuse:
- print: "**** done ceph-fuse 2-workload"
- workunit:
clients:
client.2:
- suites/blogbench.sh
- print: "**** done suites/blogbench.sh 2-workload"

View File

@ -0,0 +1,24 @@
meta:
- desc: |
run randomized correctness test for rados operations
on an erasure-coded pool
workload:
full_sequential:
- rados:
clients: [client.0]
ops: 4000
objects: 50
ec_pool: true
write_append_excl: false
op_weights:
read: 100
write: 0
append: 100
delete: 50
snap_create: 50
snap_remove: 50
rollback: 50
copy_from: 50
setattr: 25
rmattr: 25
- print: "**** done rados ec task"

View File

@ -0,0 +1,11 @@
meta:
- desc: |
object class functional tests
workload:
full_sequential:
- workunit:
branch: kraken
clients:
client.0:
- cls
- print: "**** done cls 2-workload"

View File

@ -0,0 +1,11 @@
meta:
- desc: |
generate read/write load with rados objects ranging from 1MB to 25MB
workload:
full_sequential:
- workunit:
branch: kraken
clients:
client.0:
- rados/load-gen-big.sh
- print: "**** done rados/load-gen-big.sh 2-workload"

View File

@ -0,0 +1,11 @@
meta:
- desc: |
librbd C and C++ api tests
workload:
full_sequential:
- workunit:
branch: kraken
clients:
client.0:
- rbd/test_librbd.sh
- print: "**** done rbd/test_librbd.sh 2-workload"

View File

@ -0,0 +1,11 @@
meta:
- desc: |
librbd python api tests
workload:
full_sequential:
- workunit:
branch: kraken
clients:
client.0:
- rbd/test_librbd_python.sh
- print: "**** done rbd/test_librbd_python.sh 2-workload"

View File

@ -0,0 +1,16 @@
meta:
- desc: |
upgrade the ceph cluster
upgrade-sequence:
sequential:
- ceph.restart:
daemons: [mon.a, mon.b, mon.c]
- ceph.restart:
daemons: [osd.0, osd.1, osd.2, osd.3]
wait-for-healthy: false
wait-for-osds-up: true
- ceph.restart:
daemons: [mds.a]
wait-for-healthy: false
wait-for-osds-up: true
- print: "**** done ceph.restart all"

View File

@ -0,0 +1,35 @@
meta:
- desc: |
upgrade the ceph cluster,
upgrade in two steps
step one ordering: mon.a, osd.0, osd.1, mds.a
step two ordering: mon.b, mon.c, osd.2, osd.3
ceph is expected to be in a healthy state after each step
upgrade-sequence:
sequential:
- ceph.restart:
daemons: [mon.a]
wait-for-healthy: true
- sleep:
duration: 60
- ceph.restart:
daemons: [mon.b, mon.c]
wait-for-healthy: true
- sleep:
duration: 60
- ceph.restart:
daemons: [osd.0, osd.1]
wait-for-healthy: true
- sleep:
duration: 60
- ceph.restart: [mds.a]
- sleep:
duration: 60
- sleep:
duration: 60
- ceph.restart:
daemons: [osd.2, osd.3]
wait-for-healthy: false
wait-for-osds-up: true
- sleep:
duration: 60

View File

@ -0,0 +1 @@
../../../../releases/luminous.yaml

View File

@ -0,0 +1,13 @@
meta:
- desc: |
run a cephfs stress test
mount ceph-fuse on client.3 before running workunit
tasks:
- sequential:
- ceph-fuse:
- print: "**** done ceph-fuse 5-final-workload"
- workunit:
clients:
client.3:
- suites/blogbench.sh
- print: "**** done suites/blogbench.sh 5-final-workload"

View File

@ -0,0 +1,17 @@
meta:
- desc: |
randomized correctness test for rados operations on a replicated pool with snapshots
tasks:
- rados:
clients: [client.1]
ops: 4000
objects: 50
write_append_excl: false
op_weights:
read: 100
write: 100
delete: 50
snap_create: 50
snap_remove: 50
rollback: 50
- print: "**** done rados 4-final-workload"

View File

@ -0,0 +1,9 @@
meta:
- desc: |
generate read/write load with rados objects ranging from 1 byte to 1MB
tasks:
- workunit:
clients:
client.1:
- rados/load-gen-mix.sh
- print: "**** done rados/load-gen-mix.sh 4-final-workload"

View File

@ -0,0 +1,18 @@
meta:
- desc: |
librados C and C++ api tests
overrides:
ceph:
log-whitelist:
- reached quota
tasks:
- mon_thrash:
revive_delay: 20
thrash_delay: 1
- print: "**** done mon_thrash 4-final-workload"
- workunit:
branch: kraken
clients:
client.1:
- rados/test.sh
- print: "**** done rados/test.sh 4-final-workload"

View File

@ -0,0 +1,9 @@
meta:
- desc: |
rbd object class functional tests
tasks:
- workunit:
clients:
client.1:
- cls/test_cls_rbd.sh
- print: "**** done cls/test_cls_rbd.sh 4-final-workload"

View File

@ -0,0 +1,11 @@
meta:
- desc: |
run basic import/export cli tests for rbd
tasks:
- workunit:
clients:
client.1:
- rbd/import_export.sh
env:
RBD_CREATE_ARGS: --new-format
- print: "**** done rbd/import_export.sh 4-final-workload"

View File

@ -0,0 +1,13 @@
meta:
- desc: |
swift api tests for rgw
overrides:
rgw:
frontend: civetweb
tasks:
- rgw: [client.1]
- print: "**** done rgw 4-final-workload"
- swift:
client.1:
rgw_server: client.1
- print: "**** done swift 4-final-workload"

View File

@ -0,0 +1 @@
../../../../distros/supported/

View File

@ -0,0 +1 @@
../../../../objectstore

View File

@ -0,0 +1 @@
../stress-split/0-cluster/

View File

@ -0,0 +1 @@
../stress-split/1-kraken-install/

View File

@ -0,0 +1 @@
../stress-split/2-partial-upgrade/

View File

@ -0,0 +1,20 @@
meta:
- desc: |
randomly kill and revive osd
small chance to increase the number of pgs
overrides:
ceph:
log-whitelist:
- wrongly marked me down
- objects unfound and apparently lost
- log bound mismatch
tasks:
- parallel:
- stress-tasks
stress-tasks:
- thrashosds:
timeout: 1200
chance_pgnum_grow: 1
chance_pgpnum_fix: 1
min_in: 4
- print: "**** done thrashosds 3-thrash"

View File

@ -0,0 +1,22 @@
meta:
- desc: |
randomized correctness test for rados operations on an erasure coded pool
stress-tasks:
- rados:
clients: [client.0]
ops: 4000
objects: 50
ec_pool: true
write_append_excl: false
op_weights:
read: 100
write: 0
append: 100
delete: 50
snap_create: 50
snap_remove: 50
rollback: 50
copy_from: 50
setattr: 25
rmattr: 25
- print: "**** done rados ec task"

View File

@ -0,0 +1 @@
../stress-split/5-finish-upgrade.yaml

View File

@ -0,0 +1 @@
../stress-split/6-luminous.yaml

View File

@ -0,0 +1,35 @@
#
# k=3 implies a stripe_width of 1376*3 = 4128, which is different from
# the default value of 4096. It is also not a multiple of 1024*1024 and
# creates situations where rounding rules during recovery become
# necessary.
#
meta:
- desc: |
randomized correctness test for rados operations on an erasure coded pool
using the jerasure plugin with k=3 and m=1
tasks:
- rados:
clients: [client.0]
ops: 4000
objects: 50
ec_pool: true
write_append_excl: false
erasure_code_profile:
name: jerasure31profile
plugin: jerasure
k: 3
m: 1
technique: reed_sol_van
ruleset-failure-domain: osd
op_weights:
read: 100
write: 0
append: 100
delete: 50
snap_create: 50
snap_remove: 50
rollback: 50
copy_from: 50
setattr: 25
rmattr: 25
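
The arithmetic behind the header comment: with the default stripe width of 4096 bytes and k=3 data chunks, 4096/3 is not an integer, so the per-chunk size is rounded up (assumed here to be to a 32-byte alignment, which is what yields the 1376 in the comment), giving a stripe_width of 3*1376 = 4128. A worked check:

    def jerasure_stripe_width(default_stripe_width=4096, k=3, alignment=32):
        """Round the per-chunk size up to `alignment`, then multiply by k.
        The 32-byte alignment is an assumption consistent with the 1376
        figure in the comment above."""
        chunk = -(-default_stripe_width // k)          # ceil(4096 / 3) = 1366
        chunk = -(-chunk // alignment) * alignment     # round up to 32 -> 1376
        return k * chunk                               # 3 * 1376 = 4128

    assert jerasure_stripe_width() == 4128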

View File

@ -0,0 +1 @@
../../../../distros/supported/

View File

@ -0,0 +1 @@
../../../../objectstore

View File

@ -0,0 +1,6 @@
openstack:
- machine:
disk: 100 # GB
- volumes: # attached to each instance
count: 3
size: 30 # GB

View File

@ -0,0 +1,19 @@
meta:
- desc: |
Run ceph on two nodes,
with a separate client-only node.
Use xfs beneath the osds.
overrides:
ceph:
fs: xfs
roles:
- - mon.a
- mon.b
- mon.c
- osd.0
- osd.1
- osd.2
- - osd.3
- osd.4
- osd.5
- - client.0

View File

@ -0,0 +1,8 @@
meta:
- desc: install ceph/kraken latest
tasks:
- install:
branch: kraken
- print: "**** done install kraken"
- ceph:
- print: "**** done ceph"

View File

@ -0,0 +1,12 @@
meta:
- desc: |
install upgrade of ceph/-x on one node only
1st half: restart mon.a, mon.b, mon.c, osd.0, osd.1, osd.2
tasks:
- install.upgrade:
osd.0:
- print: "**** done install.upgrade osd.0"
- ceph.restart:
daemons: [mon.a, mon.b, mon.c, osd.0, osd.1, osd.2]
- print: "**** done ceph.restart 1st half"

View File

@ -0,0 +1,19 @@
meta:
- desc: |
randomly kill and revive osd
small chance to increase the number of pgs
overrides:
ceph:
log-whitelist:
- wrongly marked me down
- objects unfound and apparently lost
- log bound mismatch
tasks:
- parallel:
- stress-tasks
stress-tasks:
- thrashosds:
timeout: 1200
chance_pgnum_grow: 1
chance_pgpnum_fix: 1
- print: "**** done thrashosds 3-thrash"

View File

@ -0,0 +1,40 @@
meta:
- desc: |
generate sustained write load with rados bench,
run repeatedly in sequence (11 x 150s runs)
stress-tasks:
- full_sequential:
- radosbench:
clients: [client.0]
time: 150
- radosbench:
clients: [client.0]
time: 150
- radosbench:
clients: [client.0]
time: 150
- radosbench:
clients: [client.0]
time: 150
- radosbench:
clients: [client.0]
time: 150
- radosbench:
clients: [client.0]
time: 150
- radosbench:
clients: [client.0]
time: 150
- radosbench:
clients: [client.0]
time: 150
- radosbench:
clients: [client.0]
time: 150
- radosbench:
clients: [client.0]
time: 150
- radosbench:
clients: [client.0]
time: 150
- print: "**** done radosbench 7-workload"

View File

@ -0,0 +1,10 @@
meta:
- desc: |
run basic cls tests for rbd
stress-tasks:
- workunit:
branch: kraken
clients:
client.0:
- cls/test_cls_rbd.sh
- print: "**** done cls/test_cls_rbd.sh 5-workload"

View File

@ -0,0 +1,12 @@
meta:
- desc: |
run basic import/export cli tests for rbd
stress-tasks:
- workunit:
branch: kraken
clients:
client.0:
- rbd/import_export.sh
env:
RBD_CREATE_ARGS: --new-format
- print: "**** done rbd/import_export.sh 5-workload"

View File

@ -0,0 +1,10 @@
meta:
- desc: |
librbd C and C++ api tests
stress-tasks:
- workunit:
branch: kraken
clients:
client.0:
- rbd/test_librbd.sh
- print: "**** done rbd/test_librbd.sh 7-workload"

View File

@ -0,0 +1,16 @@
meta:
- desc: |
randomized correctness test for rados operations on a replicated pool,
using only reads, writes, and deletes
stress-tasks:
- full_sequential:
- rados:
clients: [client.0]
ops: 4000
objects: 500
write_append_excl: false
op_weights:
read: 45
write: 45
delete: 10
- print: "**** done rados/readwrite 5-workload"

View File

@ -0,0 +1,18 @@
meta:
- desc: |
randomized correctness test for rados operations on a replicated pool with snapshot operations
stress-tasks:
- full_sequential:
- rados:
clients: [client.0]
ops: 4000
objects: 50
write_append_excl: false
op_weights:
read: 100
write: 100
delete: 50
snap_create: 50
snap_remove: 50
rollback: 50
- print: "**** done rados/snaps-few-objects 5-workload"

View File

@ -0,0 +1,8 @@
tasks:
- install.upgrade:
osd.3:
- ceph.restart:
daemons: [osd.3, osd.4, osd.5]
wait-for-healthy: false
wait-for-osds-up: true

View File

@ -0,0 +1 @@
../../../../releases/luminous.yaml

View File

@ -0,0 +1,10 @@
meta:
- desc: |
librbd python api tests
tasks:
- workunit:
branch: kraken
clients:
client.0:
- rbd/test_librbd_python.sh
- print: "**** done rbd/test_librbd_python.sh 9-workload"

View File

@ -0,0 +1,12 @@
meta:
- desc: |
swift api tests for rgw
tasks:
- rgw:
client.0:
default_idle_timeout: 300
- print: "**** done rgw 9-workload"
- swift:
client.0:
rgw_server: client.0
- print: "**** done swift 9-workload"

View File

@ -0,0 +1,16 @@
meta:
- desc: |
randomized correctness test for rados operations on a replicated pool with snapshot operations
tasks:
- rados:
clients: [client.0]
ops: 4000
objects: 500
write_append_excl: false
op_weights:
read: 100
write: 100
delete: 50
snap_create: 50
snap_remove: 50
rollback: 50

View File

@ -0,0 +1 @@
../../../../distros/supported/

View File

@ -0,0 +1 @@
../../../../objectstore

View File

@ -1301,12 +1301,6 @@ def restart(ctx, config):
ctx.daemons.get_daemon(type_, id_, cluster).restart()
clusters.add(cluster)
if config.get('wait-for-healthy', True):
for cluster in clusters:
healthy(ctx=ctx, config=dict(cluster=cluster))
if config.get('wait-for-osds-up', False):
for cluster in clusters:
wait_for_osds_up(ctx=ctx, config=dict(cluster=cluster))
manager = ctx.managers['ceph']
for dmon in daemons:
if '.' in dmon:
@ -1314,6 +1308,13 @@ def restart(ctx, config):
if dm_parts[1].isdigit():
if dm_parts[0] == 'osd':
manager.mark_down_osd(int(dm_parts[1]))
if config.get('wait-for-healthy', True):
for cluster in clusters:
healthy(ctx=ctx, config=dict(cluster=cluster))
if config.get('wait-for-osds-up', False):
for cluster in clusters:
wait_for_osds_up(ctx=ctx, config=dict(cluster=cluster))
yield
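
The only non-YAML change in the pull request: in the ceph task's restart(), the wait-for-healthy / wait-for-osds-up checks move below the loop that marks restarted OSDs down, presumably so the waits cannot be satisfied by stale 'up' state from before the restart. A condensed paraphrase of the resulting flow; healthy and wait_for_osds_up are the module's existing helpers, passed in only to keep the sketch self-contained, and cluster handling is simplified:

    def restart_reordered(ctx, config, daemons, clusters, manager,
                          healthy, wait_for_osds_up):
        """Condensed paraphrase of the reordered restart() control flow."""
        for daemon in daemons:
            d_type, _, d_id = daemon.partition('.')
            ctx.daemons.get_daemon(d_type, d_id).restart()

        # Mark restarted OSDs down *before* any wait, so the subsequent
        # checks observe the restart rather than the pre-restart state.
        for daemon in daemons:
            d_type, _, d_id = daemon.partition('.')
            if d_type == 'osd' and d_id.isdigit():
                manager.mark_down_osd(int(d_id))

        if config.get('wait-for-healthy', True):
            for cluster in clusters:
                healthy(ctx=ctx, config=dict(cluster=cluster))
        if config.get('wait-for-osds-up', False):
            for cluster in clusters:
                wait_for_osds_up(ctx=ctx, config=dict(cluster=cluster))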