Orit Wasserman
c320fbd9f8
Merge pull request #15753 from pritha-srivastava/wip-rgw-s3tests-conf
...
rgw: Changes for s3test config file, to add user under a tenant.
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
2017-06-22 11:00:26 +03:00
Patrick Donnelly
d4870a093c
qa: wait for healthy cluster before testing pins
...
Fixes: http://tracker.ceph.com/issues/20318
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-06-21 13:21:32 -07:00
Vasu Kulkarni
14b6267cba
s3a task to test radosgw compatibility with hadoop s3a interface
...
Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
2017-06-21 11:52:10 -07:00
Sage Weil
6a00ba0e26
qa/tasks/ceph_manager: get osds all in after thrashing
...
Otherwise we might end up with some PGs remapped, which means they won't
get scrubbed.
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-20 12:07:25 -04:00
Yan, Zheng
57e82edc9c
qa/cephfs: use ceph.dir.pin to trigger migration
...
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
2017-06-20 17:39:46 +08:00
Pritha Srivastava
5e94a9852c
rgw: Changes for s3test config file, to add user under a tenant.
...
Signed-off-by: Pritha Srivastava <prsrivas@redhat.com>
2017-06-20 12:57:24 +05:30
Sage Weil
04969eff23
qa/tasks/resolve_stuck_peering: start osd at end
...
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-19 14:28:28 -04:00
Sage Weil
cc902a1f6b
qa/tasks/ceph: osd_scrub_pgs: reissue scrub requests in loop
...
The scrub commands are not reliable: if the OSD doesn't happen to
be connected at the time the command is issued it may not get
delivered. Re-request scrubs for each PG that has not yet been
scrubbed so that we don't wait forever when the original request
is dropped.
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-19 12:00:12 -04:00
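The retry behaviour described in cc902a1f6b can be sketched roughly as follows. This is a minimal illustration, not the code in qa/tasks/ceph.py: `request_scrub` and `is_scrubbed` are hypothetical callables standing in for whatever issues the scrub command and checks whether a PG has been scrubbed, and the `attempts` and `delay` defaults are assumptions.

```python
import time


def scrub_until_done(pgs, request_scrub, is_scrubbed, attempts=10, delay=0.0):
    """Keep re-requesting scrubs until every PG reports as scrubbed.

    request_scrub and is_scrubbed are hypothetical callables; they are
    not part of any real Ceph or teuthology API.
    """
    pending = set(pgs)
    for _ in range(attempts):
        # Drop PGs that have been scrubbed since the last pass.
        pending = {pg for pg in pending if not is_scrubbed(pg)}
        if not pending:
            return
        # Reissue the request: an earlier one may have been dropped
        # because the OSD was not connected at the time.
        for pg in sorted(pending):
            request_scrub(pg)
        time.sleep(delay)
    pending = {pg for pg in pending if not is_scrubbed(pg)}
    if pending:
        raise RuntimeError("scrubs timed out for: %s" % sorted(pending))
```

Raising at the end rather than returning quietly matches the intent of the related change "raise exception if scrubs time out".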
Sage Weil
32361a798f
qa/tasks/ceph: osd_scrub_pgs: tolerate down osd at initial scrub time
...
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-19 12:00:12 -04:00
Sage Weil
bdf40c546d
Merge pull request #15717 from liewegas/wip-20326
...
qa/tasks/ceph.py: tolerate active+clean+something
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-06-16 16:12:20 -05:00
Sage Weil
1565b86dc0
qa/tasks/ceph.py: tolerate active+clean+something
...
where something is, say, snaptrim. or maybe scrubbing.
or whatever.
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-15 22:29:28 -04:00
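One way to express the "active+clean+something" tolerance from 1565b86dc0 is to split the PG state string on "+" and require only the two flags of interest. The function name below is hypothetical and the sketch is not taken from qa/tasks/ceph.py.

```python
def is_active_clean(state):
    """Return True if a PG state string contains both 'active' and 'clean'.

    Extra flags such as 'snaptrim' or 'scrubbing' are tolerated.
    """
    flags = state.split("+")
    return "active" in flags and "clean" in flags
```

Splitting into flags avoids substring surprises that a plain `"clean" in state` check could cause with other state names.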
Sage Weil
f870cc5f28
qa/tasks/thrashosds: wait before wait_for_recovery
...
Make sure OSDs are up *and* they have flushed their PG stats before
waiting for recovery to ensure that we do not see a stale 'clean' state.
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-15 12:14:24 -04:00
Sage Weil
200abcee6d
qa/tasks/ceph: raise exception if scrubs time out
...
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-15 11:23:18 -04:00
Sage Weil
0d80c88667
qa/tasks/ceph: raise an exception if pgs are not clean
...
If this happens the preceding test should have cleaned
up (e.g., ceph.healthy:).
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-15 11:23:18 -04:00
Sage Weil
6fa9d32407
qa/tasks/ceph: osd_scrub_pgs: try a bit longer
...
I just saw a test fail that was still waiting for
scrubs to complete.
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-15 11:23:18 -04:00
John Spray
18fbf24c7a
Merge pull request #15308 from jcsp/wip-19706
...
mon: don't kill MDSs unless some beacons are getting through
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2017-06-15 10:50:44 -04:00
John Spray
4a1fe14bc6
Merge pull request #15411 from jcsp/wip-fs-suite
...
qa: misc cephfs test improvements
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2017-06-15 10:50:07 -04:00
Yan, Zheng
5e1d8879ee
qa/cephfs: update stray reintegration test case
...
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
2017-06-12 09:46:06 +08:00
Sage Weil
554cf8394a
Merge pull request #15073 from liewegas/wip-mgr-stats
...
mon,mgr: extricate PGmap from monitor
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2017-06-04 13:36:01 -05:00
Kefu Chai
e8b23d6852
qa/tasks: add a blacklist for flush_pg_stats()
...
so we don't wait for marked out osds.
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:06:50 -04:00
Sage Weil
ab1b78ae00
qa/tasks: use new reliable flush_pg_stats helper
...
The helper gets a sequence number from the osd (or osds), and then
polls the mon until that seq is reflected there.
This is overkill in some cases, since many tests only require that the
stats be reflected on the mgr (not the mon), but waiting for it to also
reach the mon is sufficient!
Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:45 -04:00
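The two-step idea in ab1b78ae00 (get a sequence number from each OSD, then poll the mon until that sequence is reflected) might look like the sketch below. It is an assumption-laden illustration rather than the real helper: `flush_on_osd` and `seq_on_mon` are hypothetical callables, and the `attempts` and `delay` defaults are invented.

```python
import time


def flush_pg_stats(osds, flush_on_osd, seq_on_mon, attempts=30, delay=0.0):
    """Wait until the mon has seen the stats each OSD just flushed.

    flush_on_osd(osd) -> sequence number reported by the OSD (hypothetical)
    seq_on_mon(osd)   -> last sequence number the mon has for that OSD
                         (hypothetical)
    """
    # Step 1: ask every OSD to flush and remember the sequence it returned.
    wanted = {osd: flush_on_osd(osd) for osd in osds}
    # Step 2: poll the mon until each sequence has been reflected there.
    for osd, seq in wanted.items():
        for _ in range(attempts):
            if seq_on_mon(osd) >= seq:
                break
            time.sleep(delay)
        else:
            raise RuntimeError("mon never saw seq %d from osd.%s" % (seq, osd))
```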
Yehuda Sadeh
ea911b7f48
Merge pull request #14351 from yehudasa/wip-rgw-mdsearch
...
rgw: metadata search part 2
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Abhishek Lekshmanan <abhishek@suse.com>
2017-06-02 09:16:07 -07:00
Yehuda Sadeh
6594d972f2
qa/tasks/rgw_multisite.py: adjust zone init
...
zone is now a ZoneConn object. Also, change import to make it relative
so that qa task can locate it.
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2017-06-01 13:32:00 -07:00
John Spray
7e1be30b9a
qa: clean up test_exports.py
...
Mainly just using the setfattr helper
instead of run_shell.
Signed-off-by: John Spray <john.spray@redhat.com>
2017-06-01 07:18:03 -04:00
John Spray
6ef30d1ed3
qa: explicitly set up standby replay in test_journal_migration
...
Previously this relied on being run in a special cluster configuration
that set up standby replay daemons. This change will allow it
to live alongside all the 'normal' functional tests.
Signed-off-by: John Spray <john.spray@redhat.com>
2017-06-01 07:18:03 -04:00
John Spray
01c46bf832
Merge pull request #15205 from batrick/i20039
...
mds: check export pin during replay
Reviewed-by: Yan, Zheng <zyan@redhat.com>
2017-06-01 11:23:02 +01:00
John Spray
3326321858
qa: fix daemon restart between tests
...
Previously, calling mds_stop without mds_fail meant
that if the filesystem creation was not quick, then
we would see those daemons go laggy. This starts
to trigger failures now that we have cluster log
messages that fire when a daemon gets failed out
due to being laggy.
Signed-off-by: John Spray <john.spray@redhat.com>
2017-05-31 18:00:43 -04:00
Yehuda Sadeh
760c5e4f86
Merge pull request #15184 from cbodley/wip-qa-rgw-cleanup
...
qa/rgw: remove apache/fastcgi and radosgw-agent tests
Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
2017-05-30 13:09:31 -07:00
Patrick Donnelly
76335b0e0f
qa: improve debug message for subtree wait
...
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-05-30 09:08:27 -07:00
Sage Weil
8554158574
Merge pull request #15325 from liewegas/wip-redirect
...
osd,librados: add manifest, redirect
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-05-29 14:48:33 -05:00
Sage Weil
a9a728fe4d
Merge pull request #15296 from liewegas/wip-fix-at-end
...
qa/tasks/repair_test: unset flags we set
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-05-27 22:11:31 -05:00
Kefu Chai
8abc6e1bea
qa/tasks/rebuild_mondb: update to address ceph-mgr changes
...
- revive ceph-mgr after updating the keyring cap
- grant "mgr:allow *" to client.admin
- minor refactors
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-05-28 09:59:50 +08:00
Sage Weil
a4247dd594
Merge branch 'wip-extensible_tier-redirect' of git://github.com/myoungwon/ceph into wip-redirect
2017-05-26 22:50:14 -04:00
Sage Weil
d292b5419f
qa/tasks/repair_test: unset flags we set
...
In particular, noscrub and nodeepscrub leave a health
warning, which prevents shutdown with at-end.yaml.
Signed-off-by: Sage Weil <sage@redhat.com>
2017-05-25 18:05:42 -04:00
John Spray
f80e0973f5
Merge pull request #15062 from ukernel/wip-19912
...
qa/tasks/cephfs: use getattr to guarantee inode is in client cache
Reviewed-by: John Spray <john.spray@redhat.com>
2017-05-25 18:44:54 +01:00
Sage Weil
5d80c74e63
Merge pull request #15252 from liewegas/wip-cleanup-tell
...
qa/tasks/ceph_manager: 'ceph $service tell ...' is obsolete
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-05-24 17:05:32 -05:00
John Spray
ef9d555916
Merge pull request #15105 from ukernel/wip-19892
...
qa/cephfs: disable mds_bal_frag for TestStrays.test_purge_queue_op_rate
Reviewed-by: John Spray <john.spray@redhat.com>
2017-05-24 16:41:45 +01:00
John Spray
ee75318807
Merge pull request #15122 from batrick/test-fragment-error
...
qa: fix float parse error in test_fragment
Reviewed-by: John Spray <john.spray@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-05-24 16:40:50 +01:00
Sage Weil
5ab996ab3c
qa/tasks/ceph_manager: 'ceph $service tell ...' is obsolete
...
This died forever ago; no need for the fallback here.
Signed-off-by: Sage Weil <sage@redhat.com>
2017-05-23 22:53:53 -04:00
John Spray
3913ed0ba6
qa: refine assert_session_count (don't count killing)
...
Signed-off-by: John Spray <john.spray@redhat.com>
2017-05-23 05:22:18 -04:00
John Spray
ee2683c804
qa: update TestVolumeClient for new blacklisting
...
Blacklisted clients will now proactively fail
outstanding operations, rather than blocking.
Signed-off-by: John Spray <john.spray@redhat.com>
2017-05-23 05:22:18 -04:00
John Spray
ab8e328c80
qa: clean up whitespace in test_misc.py
...
Signed-off-by: John Spray <john.spray@redhat.com>
2017-05-23 05:22:18 -04:00
John Spray
c91ccac6f6
qa: remove outdated TODO in TestVolumeClient
...
Signed-off-by: John Spray <john.spray@redhat.com>
2017-05-23 05:22:17 -04:00
John Spray
47a9c9ba67
qa: add test_filelock_eviction
...
To check that eviction is releasing flocks.
Signed-off-by: John Spray <john.spray@redhat.com>
2017-05-23 05:22:17 -04:00
Casey Bodley
8c74c8a639
qa/rgw: remove apache/fastcgi
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-05-19 16:05:36 -04:00
Casey Bodley
0fb3e76eae
qa/rgw: more cleanup in rgw.py
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-05-19 15:53:37 -04:00
Casey Bodley
c8d8b9cae1
qa/rgw: remove unused helpers in util/rgw.py
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-05-19 15:53:37 -04:00
Casey Bodley
a05b3bb409
qa/rgw: remove radosgw_agent task
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-05-19 15:53:37 -04:00
Casey Bodley
762e15fbb3
qa/rgw: remove radosgw-agent config from s3tests task
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-05-19 15:53:37 -04:00
Casey Bodley
9d82486d0e
qa/rgw: remove radosgw-agent tests from radosgw_admin task
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-05-19 15:53:37 -04:00
Casey Bodley
898ab4bb0f
qa/rgw: remove multisite configuration from rgw task
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-05-19 15:53:36 -04:00
Casey Bodley
cff53b246f
Merge pull request #14688 from cbodley/wip-rgw-multi-suite
...
qa/rgw: add multisite suite to configure and run multisite tests
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
2017-05-19 14:30:57 -04:00
Sage Weil
590fd5362a
Merge pull request #15071 from cbodley/wip-qa-dnsmasq
...
qa: add task for dnsmasq configuration
Reviewed-by: Vasu Kulkarni <vasu@redhat.com>
Reviewed-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>
2017-05-19 13:25:12 -05:00
Casey Bodley
de836ee684
qa/rgw: add test config to rgw_multisite_tests task
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-05-18 13:38:44 -04:00
Casey Bodley
efb3b181fd
qa/rgw: add log_level argument to rgwadmin()
...
changes default level from info to debug
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-05-18 13:37:35 -04:00
Casey Bodley
4722d1d920
qa/rgw: add rgw_multisite_tests task to run tests
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-05-17 14:48:55 -04:00
Casey Bodley
b6d86be2c5
qa/rgw: add rgw_multisite task based on rgw_multi
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-05-17 14:48:55 -04:00
Casey Bodley
a86ce77155
qa/rgw: add symlink to qa/tasks/rgw_multi
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-05-17 14:48:55 -04:00
Casey Bodley
746c630999
qa/rgw: move startup polling logic to util/rgw.py
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-05-17 14:48:55 -04:00
Casey Bodley
76e147614f
qa/rgw: fixes for cluster name on cleanup
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-05-17 14:48:55 -04:00
Casey Bodley
4c59d343c3
qa/rgw: move compression type out of ceph.conf
...
this makes the 'compression type' setting global to all gateways, and
makes the setting visible to other tasks in ctx.rgw.compression_type
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-05-17 14:48:55 -04:00
Patrick Donnelly
6c34a2c673
qa: silence upgrade test failure
...
The new fs setting standby_count_wanted is only available in luminous. Upgrade
tests were tripping on this.
Fixes: http://tracker.ceph.com/issues/19934
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-05-16 18:43:57 -04:00
Patrick Donnelly
4b72940d02
qa: fix float parse error in test_fragment
...
2017-05-16 17:45:30,663.663 INFO:__main__:run args=['./bin/ceph', 'daemon', 'mds.b', 'perf', 'dump', 'mds']
2017-05-16 17:45:30,664.664 INFO:__main__:Running ['./bin/ceph', 'daemon', 'mds.b', 'perf', 'dump', 'mds']
Can't get admin socket path: unable to get conf option admin_socket for mds.b: parse error setting 'mds_bal_fragment_size_max' to '152.0'
2017-05-16 17:45:30,781.781 INFO:__main__:test_rapid_creation (tasks.cephfs.test_fragment.TestFragmentation) ... ERROR
2017-05-16 17:45:30,782.782 ERROR:__main__:Traceback (most recent call last):
File "/home/pdonnell/ceph/qa/tasks/cephfs/test_fragment.py", line 114, in test_rapid_creation
self.assertEqual(self.get_splits(), 0)
File "/home/pdonnell/ceph/qa/tasks/cephfs/test_fragment.py", line 15, in get_splits
return self.fs.mds_asok(['perf', 'dump', 'mds'])['mds']['dir_split']
File "/home/pdonnell/ceph/qa/tasks/cephfs/filesystem.py", line 788, in mds_asok
return self.json_asok(command, 'mds', mds_id)
File "/home/pdonnell/ceph/qa/tasks/cephfs/filesystem.py", line 174, in json_asok
proc = self.mon_manager.admin_socket(service_type, service_id, command)
File "../qa/tasks/vstart_runner.py", line 561, in admin_socket
args=[os.path.join(BIN_PREFIX, "ceph"), "daemon", "{0}.{1}".format(daemon_type, daemon_id)] + command, check_status=check_status
File "../qa/tasks/vstart_runner.py", line 296, in run
proc.wait()
File "../qa/tasks/vstart_runner.py", line 174, in wait
raise CommandFailedError(self.args, self.exitstatus)
CommandFailedError: Command failed with status 22: ['./bin/ceph', 'daemon', 'mds.b', 'perf', 'dump', 'mds']
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-05-16 18:02:18 -04:00
myoungwon oh
a07ad9fe80
qa/suites/rados/thrash: add redirect test cases
...
Signed-off-by: Myoungwon Oh <omwmw@sk.com>
2017-05-17 05:47:12 +09:00
John Spray
60f904615f
Merge pull request #15096 from jcsp/wip-journalrepair-test
...
qa: simplify TestJournalRepair
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2017-05-16 16:11:57 +01:00
Yan, Zheng
6473b79337
qa/cephfs: disable mds_bal_frag for TestStrays.test_purge_queue_op_rate
...
directory fragmentation generates extra osd ops, which affects checks
in the test.
Fixes: http://tracker.ceph.com/issues/19892
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
2017-05-16 16:43:29 +08:00
John Spray
2350555fe5
qa: simplify TestJournalRepair
...
This was sending lots of metadata ops to MDSs to persuade
them to migrate some subtrees, but that was flaky. Use
the shiny new rank pinning functionality instead.
Signed-off-by: John Spray <john.spray@redhat.com>
2017-05-15 17:27:07 -04:00
Douglas Fuller
7f659e104d
qa/cephfs: Fix for test_data_scan
...
Don't assume that test_data_scan will be run on exactly 2 MDS nodes.
Fixes: http://tracker.ceph.com/issues/19893
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
2017-05-15 16:01:02 -04:00
John Spray
17f669a868
Merge pull request #15026 from ukernel/wip-19891
...
qa/suites/fs: reserve more space for mds in full tests
Reviewed-by: John Spray <john.spray@redhat.com>
2017-05-15 13:21:52 +01:00
John Spray
897b5f5bbe
Merge pull request #15035 from batrick/quiet-mds-grow-shrink
...
qa: silence spurious insufficient standby health warnings
Reviewed-by: Yan, Zheng <zyan@redhat.com>
2017-05-15 13:17:38 +01:00
Casey Bodley
062923515c
qa: add task for dnsmasq configuration
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-05-12 16:53:14 -04:00
Yan, Zheng
1a48359f34
qa/tasks/cephfs: use getattr to guarantee inode is in client cache
...
When selinux is enabled, the kernel client may release inodes (without
up-to-date xattrs) in a readdir reply immediately after processing the reply.
The reason is that linking the inode to a dentry causes a deadlock if the
xattr is not up to date.
We can use the stat(2) syscall to guarantee that the kernel client caches an
inode.
Fixes: http://tracker.ceph.com/issues/19912
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
2017-05-12 16:42:25 +08:00
Yan, Zheng
b67a599ebe
Merge pull request #14598 from batrick/mds-balancer-pin
...
mds: support export pinning on directories
2017-05-11 11:56:34 +08:00
Yan, Zheng
bbb3369b50
qa/suites/fs: fix write size calculation in full tests
...
'max_avail' has already taken full_ratio into account
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
2017-05-11 11:18:22 +08:00
Patrick Donnelly
02c41f683d
qa: add health warning test for insufficient standbys
...
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-05-10 11:05:09 -04:00
Patrick Donnelly
a4cb10900d
qa: turn off spurious standby health warning
...
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-05-10 10:21:28 -04:00
Patrick Donnelly
9552efde4a
qa: improve time handling for test_exports test
...
Also catches corner-case found by Zheng where an unjournaled directory will
cause export pinning to fail because it cannot be made a subtree until its
parent is stable.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-05-05 19:07:05 -04:00
Sage Weil
99928c9e0d
Merge pull request #14931 from tchaikov/wip-19771
...
qa/tasks/ceph_manager: always fix pgp_num when done with thrashosd task
Reviewed-by: Sage Weil <sage@redhat.com>
2017-05-05 08:53:38 -05:00
Tamilarasi Muthamizhan
a189b61095
Merge pull request #14400 from ceph/wip-cd-1node
...
qa/tasks: few fixes to get ceph-deploy 1node to working state
2017-05-04 10:42:50 -07:00
Vasu Kulkarni
e58dd3938a
install mgr on the node
...
Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
2017-05-03 16:47:14 -07:00
Kefu Chai
da1161cbd8
qa/tasks/ceph_manager: always fix pgp_num when done with thrashosd task
...
Fixes: http://tracker.ceph.com/issues/19771
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-05-03 18:28:27 +08:00
Patrick Donnelly
63cbe330b7
qa: remove errant mount requirement
...
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-05-02 18:29:08 -04:00
Patrick Donnelly
6bd58fefb7
mds: use aux subtrees for export pinned inodes
...
Idea here is that a pinned inode should not be exported when its parent is.
Setting the pinned inode's dirfrags to aux subtrees prevents them from being
merged with a parent subtree.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-05-02 00:30:35 -04:00
Casey Bodley
0e30e3ef01
Merge pull request #14845 from cbodley/wip-rgw-qa-s3tests
...
qa/rgw: add cluster name to path when s3tests scans rgw log
Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
2017-05-01 10:49:12 -04:00
Kefu Chai
7424345c77
qa/erasure-code: override min_size to 2
...
so isa(k=2,m=1) can survive with 1 down OSD.
Fixes: http://tracker.ceph.com/issues/19770
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-04-29 10:43:17 +08:00
Kefu Chai
5f50298025
qa/tasks/rados: add optional setting of "min_size"
...
this setting only affects the newly created pool
Fixes: http://tracker.ceph.com/issues/19770
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-04-29 10:39:02 +08:00
Casey Bodley
88b6a142bc
qa/rgw: fix assertions in radosgw_admin task
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-04-27 19:38:10 -04:00
Casey Bodley
a31aa6f65c
qa/rgw: add cluster name to path when s3tests scans rgw log
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-04-27 14:48:40 -04:00
John Spray
d0d3a4a02e
Merge pull request #12935 from stiopaa1/17855_evictClient
...
mds/Server.cc: Don't evict a slow client if...
Reviewed-by: John Spray <john.spray@redhat.com>
2017-04-24 22:10:01 +01:00
John Spray
837a71c0af
qa/tasks/cephfs: clean up mount point setup
...
Previously we were sometimes trying to maintain a mounted
client across a filesystem destroy/create.
Signed-off-by: John Spray <john.spray@redhat.com>
2017-04-24 11:19:55 +01:00
John Spray
16702ff13d
Merge pull request #14018 from jcsp/wip-17939
...
client: getattr before returning quota/layout xattrs
Reviewed-by: Yan, Zheng <zyan@redhat.com>
2017-04-24 11:12:26 +01:00
Michal Jarzabek
1a5cb534d9
mds/Server.cc: Don't evict a slow client if...
...
... it's the only client
Fixes: http://tracker.ceph.com/issues/17855
Signed-off-by: Michal Jarzabek <stiopa@gmail.com>
2017-04-23 13:31:47 +01:00
Sage Weil
27dd6530a2
Merge pull request #14559 from liewegas/wip-pg-map
...
mon: move 'pg map' to OSDMonitor
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-04-21 18:53:17 -05:00
Kefu Chai
c237e7ed29
Merge pull request #14232 from jcsp/wip-19412
...
mgr: fix python module teardown & add tests
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-04-21 22:57:44 +08:00
Sage Weil
069182f91f
qa/tasks/ceph_manager: use 'pg map' for get_pg_{primary,replica}
...
Pulling this out of the 'pg dump' heap is inefficient.
Also, pg dump data comes from the mgr and may be stale.
Signed-off-by: Sage Weil <sage@redhat.com>
2017-04-21 10:56:28 -04:00
Kefu Chai
6fa16c4477
Merge pull request #14584 from tchaikov/wip-19631
...
qa/suites: Revert "qa/suites: add mon-reweight-min-pgs-per-osd = 4"
Reviewed-by: Sage Weil <sage@redhat.com>
2017-04-21 22:56:21 +08:00
Casey Bodley
a4fc5c38e5
qa/rgw: don't scan radosgw logs for encryption keys on jewel upgrade test
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-04-20 14:49:04 -04:00
John Spray
f695a0e30f
qa: s/REQUIRE_MGRS/MGRS_REQUIRED/ for consistency
...
Signed-off-by: John Spray <john.spray@redhat.com>
2017-04-20 15:00:31 +01:00
John Spray
636fc40d90
qa: additions to mgr.test_failover
...
Reproducers for recent fixes:
http://tracker.ceph.com/issues/19407
http://tracker.ceph.com/issues/19258
Signed-off-by: John Spray <john.spray@redhat.com>
2017-04-20 15:00:31 +01:00
John Spray
8ea98b4cbf
qa: fix vstart_runner --create for mgr tests
...
Signed-off-by: John Spray <john.spray@redhat.com>
2017-04-20 15:00:31 +01:00
Kefu Chai
e6a436bb27
qa/tasks/ceph_manager: be able to store options with service type
...
so we are able to change options for services other than mon while
thrashing.
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-04-20 14:18:21 +08:00
Kefu Chai
ee653ba87c
Merge pull request #14608 from tchaikov/wip-19594
...
qa/tasks: assert on pg status with a timeout
Reviewed-by: Sage Weil <sage@redhat.com>
2017-04-20 10:49:12 +08:00
Kefu Chai
960032e513
qa/tasks: update tests with helper to wait for pg-stats
...
and remove unused helpers
Fixes: http://tracker.ceph.com/issues/19594
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-04-20 09:35:05 +08:00
Kefu Chai
1207caf3a2
qa/tasks/ceph_manager: add a "wait_for_pg_stats()" decorator
...
and accompany it with two helpers to access the pg stats in a more
natural way
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-04-20 09:35:04 +08:00
Josh Durgin
a219319137
qa/tasks/rados: test sparse reads with ec overwrites
...
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2017-04-19 17:45:43 -07:00
Josh Durgin
6fba80c1fa
osd, OSDMonitor, qa: mark ec overwrites non-experimental
...
Keep the pool flag around so we can distinguish between a pool that
should maintain hashes for each chunk, and a missing one is a bug, vs
an overwrites pool where we rely on bluestore checksums for detecting
corruption.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2017-04-19 17:45:43 -07:00
Patrick Donnelly
0b420be7e9
mds: add export_pin feature
...
This allows the client/admin to pin a directory tree to a particular rank,
preventing its export by the dynamic balancer.
Fixes: http://tracker.ceph.com/issues/17834
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-04-19 18:21:19 -04:00
Sage Weil
ee1bb01a54
Merge pull request #14556 from liewegas/wip-pgupmap
...
osd: pg-remap -> pg-upmap
Reviewed-by: David Zafman <dzafman@redhat.com>
2017-04-19 17:07:01 -05:00
Zack Cerza
28d746bff3
Merge pull request #14464 from ceph/wip-systemd
...
qa/tasks: use sudo to check ceph health for systemd test
2017-04-18 11:34:27 -06:00
Sage Weil
ce188e8fdf
osd: pg-remap -> pg-upmap
...
'remap' is too non-specific a name. In particular, it
sounds like it is related to the 'remapped' PG state
but in reality it is not related.
'upmap' or 'pg-upmap' is more specific: it maps a pgid
to the 'up' set value (or item)
Signed-off-by: Sage Weil <sage@redhat.com>
2017-04-18 12:59:40 -04:00
Casey Bodley
da7acc4211
Merge pull request #13597 from cbodley/wip-s3tests-crypto
...
qa/rgw: add configuration for server-side encryption tests
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
2017-04-18 12:28:37 -04:00
Kefu Chai
1b54b5f3f1
Merge pull request #14415 from smithfarm/wip-19556
...
tests: Thrasher: handle "OSD has the store locked" gracefully
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-04-18 23:18:35 +08:00
John Spray
033ee6bd1f
Merge pull request #14396 from jcsp/wip-19550
...
qa: re-enable ENOSPC tests for kclient
2017-04-18 12:59:14 +01:00
John Spray
d98e19fdbd
Merge pull request #14589 from jcsp/wip-19640
...
client: refine fsync/close writeback error handling
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2017-04-18 12:58:37 +01:00
John Spray
a2a100dc13
Merge pull request #14272 from jcsp/wip-vstart-fixup
...
qa: fix test_standby_for_invalid_fscid with vstart_runner
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2017-04-18 12:50:20 +01:00
John Spray
1a69bec52f
client: refine fsync/close writeback error handling
...
Previously, errors stuck indelibly to the inode, which
meant that a close call would see an error even if the
user already dutifully fsync()'d and handled it.
We should emit each error only once per file handle.
Signed-off-by: John Spray <john.spray@redhat.com>
2017-04-18 07:47:10 -04:00
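The "once per file handle" rule from 1a69bec52f can be pictured with a toy model. This is only an illustration of the semantics in Python; the actual change is in the Ceph client, and the class and method names here are hypothetical.

```python
import errno


class FileHandle:
    """Toy model: a writeback error is reported once per handle."""

    def __init__(self):
        self._pending_error = 0

    def record_writeback_error(self, err):
        # Keep the first unreported error for this handle.
        if self._pending_error == 0:
            self._pending_error = err

    def _take_error(self):
        # Return the pending error and clear it so it is not reported twice.
        err, self._pending_error = self._pending_error, 0
        return err

    def fsync(self):
        return self._take_error()

    def close(self):
        return self._take_error()


fh = FileHandle()
fh.record_writeback_error(-errno.ENOSPC)
first = fh.fsync()   # the error is reported here
second = fh.close()  # already reported, so close does not repeat it
```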
Orit Wasserman
cb94e5ad3f
Merge pull request #12535 from ceph/wip-rgw-multisite-teuthology
...
rgw: multisite enabled over multiple clusters
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
2017-04-18 11:47:48 +03:00
David Zafman
a5731076ad
osd: Handle backfillfull_ratio just like nearfull and full
...
Add BACKFILLFULL as a local OSD cur_state
Notify monitor of this new fullness state
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 08:00:24 -07:00
John Spray
dd43d3bc64
qa/cephfs: use getfattr/setfattr helpers
...
Signed-off-by: John Spray <john.spray@redhat.com>
2017-04-14 06:38:48 -04:00
John Spray
61617f8f10
qa: add test for reading quotas from different clients
...
Fixes: http://tracker.ceph.com/issues/17939
Signed-off-by: John Spray <john.spray@redhat.com>
2017-04-14 06:38:48 -04:00
Sage Weil
5ca72c1193
qa/tasks/exec_on_cleanup.py: add
...
Signed-off-by: Sage Weil <sage@redhat.com>
2017-04-13 17:11:19 -04:00
Ali Maredia
b31b84529e
rgw multisite: use get_config_master_client for radosgw_admin task
...
Signed-off-by: Ali Maredia <amaredia@redhat.com>
2017-04-13 12:15:50 -04:00
Ali Maredia
c5956790e6
rgw: multisite enabled over multiple clusters
...
Added '--cluster' to all necessary commands
(e.g. radosgw-admin, rados, ceph) and made sure
the necessary checks are in place so that clients
can be read with or without a cluster_name
preceding them.
Made master_client defined in the config for
the radosgw-admin task.
Signed-off-by: Ali Maredia <amaredia@redhat.com>
2017-04-13 12:15:50 -04:00
Vasu Kulkarni
7af157ad4c
use sudo to check health
...
Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
2017-04-11 13:52:26 -07:00
Nathan Cutler
a5b19d2d73
tests: Thrasher: handle "OSD has the store locked" gracefully
...
On slower machines (VPS, OVH) it takes time for the OSD to go down.
Fixes: http://tracker.ceph.com/issues/19556
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2017-04-11 16:09:45 +02:00
John Spray
d529121b60
Merge pull request #10636 from fullerdj/wip-djf-15069
...
cephfs: Permit recovering metadata into a new RADOS pool
Reviewed-by: John Spray <john.spray@redhat.com>
2017-04-10 13:52:04 +01:00
John Spray
fb046b9730
qa/tasks/cephfs: update kernel_mount for debugfs format
...
Signed-off-by: John Spray <john.spray@redhat.com>
2017-04-09 18:13:29 +01:00
Vasu Kulkarni
73cccd4115
push keys on node using admin command
...
will test admin command and is now needed due to create-keys change
Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
2017-04-07 12:39:19 -07:00
John Spray
e0833965b6
qa: re-enable ENOSPC tests for kclient
...
Fixes: http://tracker.ceph.com/issues/19550
Signed-off-by: John Spray <john.spray@redhat.com>
2017-04-07 14:45:30 +01:00
Kefu Chai
24e69d79e7
Merge pull request #14281 from tchaikov/wip-19429
...
qa/tasks/workunit.py: use "overrides" as the default settings of workunit
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-04-05 10:01:27 +08:00
Douglas Fuller
37bafff9f4
qa/cephfs: Add test for rebuilding into an alternate metadata pool
...
Add a test to validate the ability of cephfs_data_scan and friends to
recover metadata from a damaged CephFS installation into a fresh metadata
pool.
cf: http://tracker.ceph.com/issues/15068
cf: http://tracker.ceph.com/issues/15069
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
2017-04-04 12:29:01 -07:00
Casey Bodley
9730fec922
qa: s3test task scans radosgw logs for leaked encryption keys
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-04-03 10:44:58 -04:00
John Spray
13e8315d1a
Merge pull request #13862 from jcsp/wip-16523
...
qa, mds: add checks for fragmentation, and enable it by default
2017-04-03 11:56:37 +01:00
Kefu Chai
47080150a1
qa/tasks/workunit.py: use "overrides" as the default settings of workunit
...
otherwise the settings in "workunit" tasks are always overridden by the
settings in template config. so we'd better follow the way of how
"install" task updates itself with the "overrides" settings: it uses the
"overrides" as the *defaults*.
Fixes: http://tracker.ceph.com/issues/19429
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-04-02 12:26:30 +08:00
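Treating "overrides" as defaults, as 47080150a1 describes, amounts to merging the task's own settings on top of the overrides so that the task wins on conflicts. The sketch below is a generic recursive merge under that assumption; `merge_defaults` is a hypothetical name and not the helper that tasks/workunit.py uses.

```python
import copy


def merge_defaults(defaults, config):
    """Return a new dict: defaults updated with config; config values win."""
    merged = copy.deepcopy(defaults)
    for key, value in config.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them.
            merged[key] = merge_defaults(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

Called as `merge_defaults(overrides, task_config)`, a value set explicitly on the task survives, while anything the task leaves unset falls back to the overrides.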
vasukulkarni
574049a90b
Merge pull request #14229 from ceph/wip-systemd
...
qa: Add reboot case for systemd test
2017-03-31 09:15:53 -07:00
John Spray
992b8499d0
Merge pull request #14254 from idryomov/wip-vstart-runner-ps
...
qa/vstart_runner: amend ps invocation
Reviewed-by: John Spray <john.spray@redhat.com>
2017-03-31 17:15:30 +01:00
John Spray
bf39f561e9
qa: fix test_standby_for_invalid_fscid with vstart_runner
...
Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-31 12:13:57 -04:00
Kefu Chai
9ca7ccf5f1
tasks/workunit.py: specify the branch name when cloning a branch
...
c1309fb failed to specify a branch when cloning with --depth=1, which
by default clones HEAD. We cannot "git checkout" a specific
sha1 that is not HEAD after cloning with '--depth=1', so this
change dispatches "tag", "branch", and "HEAD" using three Refspec classes.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Signed-off-by: Dan Mick <dan.mick@redhat.com>
2017-03-30 20:30:09 -07:00
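The dispatch described in 9ca7ccf5f1 could be modelled along these lines. The class names, the `clone_args` method, and the `make_refspec` factory are all hypothetical and are not copied from tasks/workunit.py; the only real interface relied on is `git clone --depth=1 --branch <name>`, which accepts both branch and tag names.

```python
class Refspec:
    """Base class: a named ref that a shallow clone should check out."""

    def __init__(self, name):
        self.name = name

    def clone_args(self, url, dest):
        # --branch makes the shallow clone fetch this ref instead of HEAD.
        return ["git", "clone", "--depth=1", "--branch", self.name, url, dest]


class Branch(Refspec):
    pass


class Tag(Refspec):
    pass


class Head(Refspec):
    def __init__(self):
        super().__init__("HEAD")

    def clone_args(self, url, dest):
        # A shallow clone already lands on the remote HEAD by default.
        return ["git", "clone", "--depth=1", url, dest]


def make_refspec(config):
    """Pick a Refspec from a task config dict (hypothetical keys)."""
    if "tag" in config:
        return Tag(config["tag"])
    if "branch" in config:
        return Branch(config["branch"])
    return Head()
```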
Sage Weil
578b0f7cfc
Merge pull request #13617 from liewegas/wip-mgr-commands
...
mon,mgr: tag some commands for ceph-mgr
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-03-30 17:12:00 -05:00
Ilya Dryomov
8d8cd4e4d5
qa/vstart_runner: amend ps invocation
...
"ps -xwwu<id>" is parsed as BSD, because -x is not a UNIX option.
"u" is a BSD option for user-oriented format, so the <id> ends up being
parsed as an old-style "select by pid". The only reason this command
doesn't dump other user's processes is that the BSD "only yourself"
restriction is in effect.
I'm not sure what's wrong with a simple "ps xww", but if we want to
select by euid, let's do it right.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-03-30 19:36:43 +02:00
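One way to select by effective user ID without the ambiguity described in the commit above is to pass the ID as the argument of the UNIX-style `-u` option and keep the BSD-style `ww` separate. The sketch assumes a procps-style `ps`; `ps_argv` is a hypothetical helper, and this is not necessarily the exact invocation the commit adopted.

```python
import os


def ps_argv(euid=None):
    """Build a ps command line that selects processes by effective UID."""
    if euid is None:
        euid = os.geteuid()
    # "ww" asks for unlimited output width; "-u <id>" selects by EUID.
    return ["ps", "ww", "-u", str(euid)]
```

The resulting list can be passed to `subprocess.check_output()` on a host where `ps` is available.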
Vasu Kulkarni
7b587304a5
Add reboot case for systemd test
...
test systemd units restart after reboot
Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
2017-03-29 10:30:49 -07:00
Sage Weil
5dc9b8d026
qa/tasks/dump_stuck.py: stop making assertions about 'health' report
...
Health comes from the mon, while the pg stats come from the mgr, so they
may be out of sync.
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-29 11:39:27 -04:00
Sage Weil
fa0b2164ad
qa/tasks/ceph.py: add 'skip_mgr_daemons' option
...
For upgrades
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-29 11:39:26 -04:00
Sage Weil
7edca203d8
qa/tasks/ceph.py: give everyone mgr caps
...
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-29 11:39:26 -04:00
Dan Mick
c1309fbef3
tasks/workunit.py: when cloning, use --depth=1
...
Help avoid killing git.ceph.com. A depth 1 clone takes about
7 seconds, whereas a full one takes about 3:40 (much of it
waiting for the server to create a huge compressed pack)
Signed-off-by: Dan Mick <dan.mick@redhat.com>
2017-03-28 20:09:44 -07:00
John Spray
e90e37690a
qa/tasks: add check_counter.py
...
We need this for CephFS, to verify that workloads
we expect to do a particular thing (like directory fragmentation
or metadata exports) are really doing it.
This is for giving us confidence in our coverage of these
features rather than testing them per se.
Fixes: http://tracker.ceph.com/issues/16523
Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-28 23:26:34 +01:00
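The kind of check the check_counter commit above describes could look roughly like this: after a workload, confirm that the counters expected to move are non-zero. The function name, the `subsystem.counter` naming scheme, and the sample data are assumptions for illustration, not the interface of check_counter.py.

```python
def find_zero_counters(perf_dump, expected):
    """Return the expected counters that are missing or still zero.

    perf_dump: dict of {subsystem: {counter: value}}
    expected:  list of "subsystem.counter" names
    """
    untouched = []
    for name in expected:
        subsystem, counter = name.split(".", 1)
        value = perf_dump.get(subsystem, {}).get(counter, 0)
        if not value:
            untouched.append(name)
    return untouched


# Made-up sample data: one counter moved, one did not.
sample_dump = {"mds": {"dir_split": 4, "exported": 0}}
untouched = find_zero_counters(sample_dump, ["mds.dir_split", "mds.exported"])
```

A non-empty result means the workload did not exercise the feature it was meant to cover.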
Sage Weil
2a08cbbed5
qa/tasks/thrashosds,ceph_manager: thrash pg_remap[_items]
...
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-28 10:12:10 -04:00
Casey Bodley
e3e3a71d1f
qa: rgw task uses period instead of region-map
...
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2017-03-20 11:50:03 -04:00
Kefu Chai
bd36f13163
doc: fix the links to http://ceph.com/docs
...
they should point to http://docs.ceph.com/docs/master/ .. instead
Fixes: http://tracker.ceph.com/issues/19090
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-03-15 16:40:07 +08:00
Yehuda Sadeh
515db13970
qa/tasks/radosgw_admin: adjust test to new bucket structure
...
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2017-03-09 09:18:56 -08:00
John Spray
41f8ded3e7
qa: update TestDamage for PurgeQueue
...
Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:27:03 +00:00
John Spray
1a1951002d
qa: update TestFlush for changed stray perf counters
...
Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:27:03 +00:00
John Spray
6cf9c2956c
qa: add TestStrays.test_purge_queue_op_rate
...
For ensuring that the PurgeQueue code is not generating
too many extra IOs.
Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:27:02 +00:00
John Spray
3e66de2182
mds: create purge queue if it's not found
...
Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:26:59 +00:00
John Spray
f826c7e8aa
qa/cephfs: add TestStrays.test_purge_on_shutdown
...
...and change test_migration_on_shutdown to
specifically target non-purgeable strays (i.e.
hardlink-ish things).
Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:26:55 +00:00
John Spray
3970502c9b
qa: update test_strays for purgequeue
...
Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-08 10:20:59 +00:00
Sage Weil
7fbe8fb085
Merge pull request #13759 from liewegas/wip-19133
...
osdc/Objecter: resend RWORDERED ops on full
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2017-03-07 21:31:50 -06:00
Sage Weil
296708091c
qa/tasks/ceph_manager: use new luminous set-full-ratio etc
...
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-07 16:39:09 -05:00
Sage Weil
a202b68d18
qa/tasks/thrashosds: chance_thrash_cluster_full
...
Induce a momentarily full cluster.
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-07 13:33:44 -05:00
Radoslaw Zarzynski
6440750f53
qa/tasks/rgw.py: start Apache before RadosGW.
...
At the end of start_rgw() we wait until establishing HTTP connections
with RadosGW becomes possible. However, if RadosGW uses FastCGI,
the condition can't be fulfilled without spawning the HTTP server first.
Signed-off-by: Radoslaw Zarzynski <rzarzynski@mirantis.com>
2017-03-07 17:31:52 +01:00
John Spray
73100305e5
Merge pull request #13262 from batrick/multimds-thrasher
...
Add multimds:thrash sub-suite and fix bugs in thrasher for multimds
Reviewed-by: John Spray <john.spray@redhat.com>
2017-03-07 14:29:18 +00:00
John Spray
39204abeda
Merge pull request #13282 from jcsp/wip-fuse-mount-teardown
...
tasks/cephfs: tear down on mount() failure
Reviewed-by: Yan, Zheng <zyan@redhat.com>
2017-02-28 15:04:59 +00:00
Kefu Chai
edceabbd47
qa/tasks/workunit: use ceph.git as an alternative of ceph-ci.git for workunit repo
...
If we run an upgrade test where, for example, "jewel" is not in the
ceph-ci.git repo, we should check ceph.git to clone the workunits.
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-27 17:36:05 +08:00
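The fallback described in the commit above amounts to trying a list of repositories in order. A minimal sketch, assuming a caller-supplied `try_clone(url, refspec)` callable that raises `RuntimeError` on failure; both that callable and `clone_with_fallback` are hypothetical names introduced here.

```python
def clone_with_fallback(refspec, repo_urls, try_clone):
    """Try each repository in turn; return the URL that worked."""
    errors = []
    for url in repo_urls:
        try:
            try_clone(url, refspec)
            return url
        except RuntimeError as exc:
            # Remember the failure and move on to the next candidate.
            errors.append("{}: {}".format(url, exc))
    raise RuntimeError(
        "could not clone {!r} from any repo: {}".format(refspec, "; ".join(errors))
    )
```

For the case in the commit message, `repo_urls` would list ceph-ci.git first and ceph.git second.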
Sage Weil
af5dab0613
Merge pull request #13649 from liewegas/wip-ceph-scrub-debug
...
qa/tasks/ceph.py: debug which pgs aren't scrubbing
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
2017-02-25 13:15:06 -06:00
Sage Weil
f777d849e7
qa/tasks/ceph.py: debug which pgs aren't scrubbing
...
Signed-off-by: Sage Weil <sage@redhat.com>
2017-02-24 23:07:34 -05:00
Samuel Just
44b26f6ab4
Merge pull request #13594 from athanatos/wip-snap-trim-sleep
...
osd: add snap trim reservation and re-implement osd_snap_trim_sleep
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-02-24 14:09:17 -08:00
Kefu Chai
4cf28de4c9
qa/tasks/workunit: use the suite repo for cloning workunit
...
as "workunits" reside in ceph/qa/workunits, it's more intuitive to
respect suite-repo option when cloning workunits.
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-24 16:47:47 +08:00
John Spray
de5249436c
Merge pull request #13359 from jcsp/wip-logrotate-sshexception
...
qa: handle SSHException in logrotate
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-02-22 10:05:07 +00:00
Kefu Chai
b3e516fc38
Merge pull request #13518 from tchaikov/wip-fix-pgp-num
...
test: Thrasher: do not update pools_to_fix_pgp_num if nothing happens
Reviewed-by: Sage Weil <sage@redhat.com>
2017-02-21 00:46:26 +08:00
Kefu Chai
c0f0cde399
test: Thrasher: do not update pools_to_fix_pgp_num if nothing happens
...
We should not update pools_to_fix_pgp_num if the pool is not expanded or
the pg_num is not increased due to PGs being created. Otherwise we are
prevented from fixing the pgp_num once thrashing is done: if fixing the
pgp_num during thrashing actually did nothing, the pool was still removed
from pools_to_fix_pgp_num after set_pool_pgpnum() returned.
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-19 13:10:46 +08:00
Sage Weil
86c0d07e32
qa/tasks/ceph.py: fix timing of wait-for-* and osd markdown
...
Mark down osds, *then* wait for them to come up or for the cluster to be
healthy!
Signed-off-by: Sage Weil <sage@redhat.com>
2017-02-18 21:12:23 -05:00
Sage Weil
96bc86b537
Revert "qa/tasks/workunit: use the suite repo for cloning workunit"
2017-02-17 11:54:27 -06:00
Kefu Chai
1f82b9b944
qa/tasks/workunit: use the suite repo for cloning workunit
...
as "workunits" reside in ceph/qa/workunits, it's more intuitive to
respect suite-repo option when cloning workunits.
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-16 15:05:51 +08:00
Samuel Just
4aebf59d90
rados: check that pool is done trimming before removing it
...
Signed-off-by: Samuel Just <sjust@redhat.com>
2017-02-13 09:47:02 -08:00
Kefu Chai
de59b5102c
test: Thrasher: restore changed options after done with thrash
...
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-13 09:25:51 +08:00
Kefu Chai
761a1dc391
tests: Thrasher: extract _set_config() method
...
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-13 09:25:50 +08:00
Kefu Chai
995e144e3e
tests: CephManager: add get_config() method
...
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-13 09:25:50 +08:00
Kefu Chai
136483a8f9
test: Thrasher: update pgp_num of all expanded pools if not yet
...
Otherwise wait_until_healthy will fail after the timeout upon seeing a
warning like:
HEALTH_WARN pool cephfs_data pg_num 182 > pgp_num 172
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-13 09:25:50 +08:00
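The cleanup step implied by the commit above can be illustrated with plain dictionaries: after thrashing, any pool whose pgp_num lags behind its pg_num is brought level. `fix_pgp_num` and the dictionary layout are assumptions for this sketch; the real Thrasher talks to the cluster through CephManager.

```python
def fix_pgp_num(pools):
    """Raise pgp_num to pg_num where it lags; return the pools changed.

    pools: dict of {pool_name: {"pg_num": int, "pgp_num": int}}
    """
    fixed = []
    for name in sorted(pools):
        pool = pools[name]
        if pool["pgp_num"] < pool["pg_num"]:
            pool["pgp_num"] = pool["pg_num"]
            fixed.append(name)
    return fixed


# Numbers taken from the HEALTH_WARN example in the commit message.
pools = {
    "cephfs_data": {"pg_num": 182, "pgp_num": 172},
    "cephfs_metadata": {"pg_num": 64, "pgp_num": 64},
}
fixed = fix_pgp_num(pools)
```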
John Spray
880cbf09aa
Merge pull request #13137 from jcsp/wip-18661
...
qa: fix race in Mount.open_background
Reviewed-by: Yan, Zheng <zyan@redhat.com>
2017-02-10 17:48:05 +00:00
John Spray
a3fd3f225c
Merge pull request #13099 from jcsp/wip-18663
...
qa/tasks: force umount during kclient teardown
2017-02-10 17:42:37 +00:00
John Spray
6f9e11f03d
qa: handle SSHException in logrotate
...
Yet another different type of exception we may get when
orchestra.run can't talk to a remote host.
Signed-off-by: John Spray <john.spray@redhat.com>
2017-02-10 17:16:24 +00:00
Nathan Cutler
6b7443fb50
tests: drop buildpackages.py
...
The buildpackages suite has been moved to teuthology. This cleans up a file
that was left behind by https://github.com/ceph/ceph/pull/13297
Fixes: http://tracker.ceph.com/issues/18846
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2017-02-08 21:23:54 +01:00
Loic Dachary
5a43f8d579
buildpackages: remove because it does not belong
...
It should live in teuthology, not in Ceph. And it is currently broken:
there is no need to keep it around.
Fixes: http://tracker.ceph.com/issues/18846
Signed-off-by: Loic Dachary <loic@dachary.org>
2017-02-07 18:37:26 +01:00
John Spray
6203f33df4
tasks/cephfs: tear down on mount() failure
...
There were some cases where we would leave a mountpoint
that would cause the teuthology teardown to get hung up
when it tried to look inside cephtest/
Signed-off-by: John Spray <john.spray@redhat.com>
2017-02-06 22:53:21 +00:00
Patrick Donnelly
d748226f00
qa: add DaemonWatchdog to stop tests on failure
...
Thrashing MDS will often result in failures which often do not stop the
test. The failure may also cause the test to stall which will force the
machines to needlessly be locked until a timeout is reached. This
watchdog will unmount mounts and kill daemons when a failure is
detected.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:14 -05:00
Patrick Donnelly
f005e8af6b
qa: disable max_mds changes during thrashing
...
While the thrasher supports the behavior desired by issue 10792 [1], the
bugs uncovered due to deactivating MDS (and sometimes killing
deactivating MDS) are presently a distraction from addressing issues
during normal failures. So now thrashing max_mds is turned off by
default. I have added a TODO to deactivate ranks in order (configurably)
as random deactivation causes a lot of other problems.
This also fixes a bug: random.randrange(0.0, 1.0) always returns 0.
Oops.
[1] http://tracker.ceph.com/issues/10792
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:14 -05:00
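The randrange bug mentioned in the commit above is easy to reproduce. `random.randrange(start, stop)` works on integers and excludes `stop`, so with the float bounds truncated to 0 and 1 the only possible result is 0 (newer Python 3 releases reject float arguments outright, so the sketch uses the integer equivalent). `should_thrash` is a hypothetical helper showing the usual alternative, `random.random()`.

```python
import random

# randrange(0, 1) picks from range(0, 1), which contains only 0.
samples = {random.randrange(0, 1) for _ in range(1000)}


def should_thrash(probability, rng=random):
    """Return True with the given probability (0.0 to 1.0)."""
    # random() returns a float in [0.0, 1.0), so this is a fair coin
    # weighted by `probability`.
    return rng.random() < probability
```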
Patrick Donnelly
82662edd7f
qa: do not pretty the json to shorten stdout log
...
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:14 -05:00
Patrick Donnelly
a0052fc2d6
qa: use gevent.sleep so greenlet yields
...
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:14 -05:00
Patrick Donnelly
cf9e0da078
qa: use fs methods for setting configs
...
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:13 -05:00
Patrick Donnelly
0098873fb7
qa: remove old comment
...
Filesystem is now cluster aware.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:13 -05:00
Patrick Donnelly
fd4b61890d
qa: allow revived MDS to be up:active
...
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:13 -05:00
Patrick Donnelly
884215d933
qa: timeout waiting for thrashed MDS to revive
...
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:13 -05:00
Patrick Donnelly
8e9ea7b6ac
qa: configure thrashing while MDS are stopping
...
Currently multimds is prone to many failures when killing an active or
stopping MDS when there are MDS in the cluster which have been
deactivated (stopping). Have this turned off by default for now.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:13 -05:00
Patrick Donnelly
6304b6ed5d
qa: add deactivation log message
...
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:13 -05:00
Patrick Donnelly
1185326c45
qa: avoid infinite wait if no repl. can be made
...
The thrasher can enter an infinite loop waiting for an MDS to take a
certain rank when a replacement may not be possible. For example,
max_mds actives are already running.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:12 -05:00
Patrick Donnelly
638bccb2bb
qa: timeout thrasher if fs does not stabilize
...
After 5 minutes of waiting, it's reasonable to stop as the cluster is
probably stuck.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:12 -05:00
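Both timeout commits above follow the same pattern: poll a condition, but give up once a deadline passes. A minimal sketch, with the 5-minute default taken from the commit message; `wait_until` and its parameters are hypothetical, and the real thrasher would sleep cooperatively rather than with `time.sleep`.

```python
import time


def wait_until(predicate, timeout=300.0, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `predicate` until it is true or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while not predicate():
        if clock() >= deadline:
            raise RuntimeError(
                "timed out after {} seconds waiting for condition".format(timeout)
            )
        sleep(interval)
```

Raising instead of looping forever lets the test fail promptly and release its machines.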
Patrick Donnelly
8f3e745344
qa: check replacement MDS is active in thrasher
...
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:12 -05:00
Patrick Donnelly
19289725c8
qa: handle thrashing ranks with holes
...
During the course of thrashing max_mds, the ranks assigned to MDSs may
develop holes. This causes the thrasher to try to wrongly deactivate
ranks that are not assigned.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-02-06 14:07:12 -05:00
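The problem with holes described in the commit above shows up when a rank is picked from `range(max_mds)` instead of from the ranks actually in use. The sketch below chooses only among assigned ranks; `pick_rank_to_deactivate` is a hypothetical helper, not the thrasher's own method.

```python
import random


def pick_rank_to_deactivate(assigned_ranks, rng=random):
    """Choose a rank that is actually held by an MDS, or None if empty."""
    candidates = sorted(assigned_ranks)
    if not candidates:
        return None
    return rng.choice(candidates)


# Ranks 0 and 2 are held; rank 1 is a hole and must never be chosen.
picks = {pick_rank_to_deactivate({0, 2}) for _ in range(200)}
```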
Nathan Cutler
db2582e25e
tests: fix regression in qa/tasks/ceph_master.py
...
https://github.com/ceph/ceph/pull/13194 introduced a regression:
2017-02-06T16:14:23.162 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph_master/qa/tasks/ceph_manager.py", line 722, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph_master/qa/tasks/ceph_manager.py", line 839, in do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_ceph_ceph_master/qa/tasks/ceph_manager.py", line 305, in kill_osd
    output = proc.stderr.getvalue()
AttributeError: 'NoneType' object has no attribute 'getvalue'
This is because the original patch failed to pass "stderr=StringIO()" to run().
Fixes: http://tracker.ceph.com/issues/16263
Signed-off-by: Nathan Cutler <ncutler@suse.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-06 19:37:38 +01:00
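The failure mode in the commit above can be reproduced with a small stand-in. `FakeProc` and `run` below are hypothetical substitutes for the remote-execution helper, written only to show why `proc.stderr` is `None` unless the caller supplies a buffer.

```python
from io import StringIO


class FakeProc:
    """Stand-in for a finished remote process."""

    def __init__(self, stderr):
        self.stderr = stderr


def run(args, stderr=None):
    """Pretend to run `args`; capture stderr only if a buffer is given."""
    if stderr is not None:
        stderr.write("simulated error output\n")
    return FakeProc(stderr)


# Without a buffer there is nothing to call getvalue() on.
proc_without = run(["ceph", "osd", "down", "0"])

# Passing stderr=StringIO() makes the output retrievable afterwards.
proc_with = run(["ceph", "osd", "down", "0"], stderr=StringIO())
output = proc_with.stderr.getvalue()
```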
Sage Weil
5fc3dd36e2
Merge pull request #13237 from smithfarm/wip-18799
...
tests: Thrasher: eliminate a race between kill_osd and __init__
Reviewed-by: Sage Weil <sage@redhat.com>
2017-02-05 12:49:30 -06:00