Commit Graph

78 Commits

Author SHA1 Message Date
Kefu Chai
63840ffaba qa: timeout if flush_pg_stats() takes too long
a "timeout" which defaults to 300 seconds is added to flush_pg_stats()

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-12 19:32:11 +08:00
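As a rough illustration of the timeout described in this commit, the sketch below polls a command until it succeeds or a deadline passes. The name wait_with_timeout and its argument order are hypothetical and are not taken from ceph-helpers.sh; only the 300 second default comes from the commit message.

```shell
# Hypothetical helper, not the actual flush_pg_stats() code.
# Usage: wait_with_timeout <timeout_seconds> <command> [args...]
# An empty first argument falls back to the 300 second default.
wait_with_timeout() {
    local timeout=${1:-300}
    shift
    local elapsed=0
    # Re-run the command once per second until it succeeds.
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1             # the command never succeeded in time
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

A caller would pass the check to repeat, for example `wait_with_timeout 60 test -e /tmp/ready`, and treat a non-zero status as a timeout.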
Sage Weil
fca1721247 Merge pull request #15437 from dachary/wip-bluestore
ceph-disk: add --filestore argument, default to --bluestore

Reviewed-by: Sage Weil <sage@redhat.com>
2017-06-07 09:02:29 -05:00
Kefu Chai
a52445e3c8 qa/workunits/ceph-helpers.sh: use syntax understood by jq 1.3
trusty still ships jq 1.3 which does not offer "first". see
https://stedolan.github.io/jq/manual/v1.3/ .

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-07 09:32:21 +08:00
Sage Weil
5cfe4cfa13 ceph-disk: add --filestore argument, default to --bluestore
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
2017-06-06 19:45:24 +02:00
Kefu Chai
46bf019cbe test: switch from xmlstartlet to jq
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:52 -04:00
Kefu Chai
30f0ae0496 qa/workunites/ceph-helpers.sh: move flush_pg_stats() here
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:47 -04:00
Kefu Chai
62d1960cb9 test: pass mon_pg_warn_min_per_osd=3 to mgr also
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-06-02 13:02:43 -04:00
Sage Weil
07ddeb24c7 qa/workunites/ceph-helpers.sh: do not bail when num_pg==0
Right after the cluster is created when the first mgr report hasn't come
in yet we will report 0 pgs.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-06-02 13:02:11 -04:00
Willem Jan Withagen
f28f4cbc20 ./qa/workunits/ceph-helpers.sh: Do not trace kill_daemon
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
2017-05-26 12:07:28 +02:00
Willem Jan Withagen
e07f9ccb13 qa/workunits/ceph-helpers.sh: introduce (and use) wait_for_health
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-20 15:39:34 -07:00
Josh Durgin
3ca750d41d test/osd/osd-scrub-repair.sh: add ec overwrites test cases
Move pool and profile creation into a single function, and
add an 'allow_overwrites' parameter for it so each ec test
can be parameterized by it.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2017-04-19 17:45:43 -07:00
David Zafman
a5731076ad osd: Handle backfillfull_ratio just like nearfull and full
Add BACKFILLFULL as a local OSD cur_state
Notify monitor of this new fullness state

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 08:00:24 -07:00
Kefu Chai
6cb4503a40 qa/workunits/ceph-helpers: do not error out if is_clean
it would be a race otherwise, because we cannot be sure whether the
cluster pgs are all clean when run_osd() returns, but we can be sure
that they are expected to become active+clean after a while. that's what
wait_for_clean() does.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-04-12 17:32:44 +08:00
Kefu Chai
0196e154ed qa/workunits/ceph-helpers: display rejected string
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-04-12 16:34:51 +08:00
Sage Weil
83b19dd1f1 qa/workunits/ceph-helpers: start and stop mgr daemons
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-29 11:39:25 -04:00
Willem Jan Withagen
0a91b76f2f test: use gsed on FreeBSD for inplace editting
- FreeBSD sed(1) requires an extension on -i
   so replace the usage with GNU sed: gsed

Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
2017-03-19 18:02:26 +01:00
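The platform switch described in this commit could look roughly like the sketch below. It assumes GNU sed is installed under the name gsed on FreeBSD; the scratch file and the edited line are made up for illustration.

```shell
# Pick a sed that accepts "-i" without a backup suffix.
# Assumption: on FreeBSD, GNU sed is available as "gsed".
if [ "$(uname)" = "FreeBSD" ]; then
    SED=gsed
else
    SED=sed
fi

# Example in-place edit on a scratch file (contents are illustrative).
scratch=$(mktemp)
echo "osd pool default size = 3" > "$scratch"
$SED -i 's/size = 3/size = 1/' "$scratch"
cat "$scratch"
rm -f "$scratch"
```

Routing every in-place edit through `$SED` keeps the platform check in one place instead of repeating it at each call site.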
Joao Eduardo Luis
3cabcb7d51 qa/workunits/ceph-helpers: add wait_for_quorum()
Takes optional timeout and desired quorum size

Signed-off-by: Joao Eduardo Luis <joao@suse.de>
2017-03-02 17:32:34 +00:00
Kefu Chai
389bd00da3 tests: ceph-helpers.sh reduce get_timeout_delays() verbosity
`set +o` prints out the full command line which is echoed if "xtrace" is
enabled. this increases the verbosity of get_timeout_delays().
in this change, we follow the way of kill_daemons() to kill the extra
output. see aefcf6d.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-02-04 17:10:49 +08:00
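The idea of silencing xtrace inside a helper and restoring it afterwards can be sketched as follows. This is not the code from the commit: the function name quiet_delays is hypothetical, and the sketch reads the `$-` flags to detect xtrace rather than whatever mechanism the real helper uses.

```shell
# Hypothetical helper showing a save / disable / restore pattern for xtrace.
quiet_delays() {
    local trace=false
    case "$-" in
        *x*) trace=true ;;       # remember whether xtrace was enabled
    esac
    set +x                       # silence tracing for the noisy part

    local i
    for i in 1 2 4; do
        printf '%s ' "$i"
    done

    if $trace; then
        set -x                   # restore the caller's setting
    fi
    return 0
}
```

With `set -x` active, only the `set +x` line itself is traced; the loop body stays out of the log and tracing resumes once the function returns.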
David Zafman
1009a16291 wait_for_clean: Racing with pg creation might cause increasing num PGs
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-01-10 09:43:09 -08:00
David Zafman
64a7012e98 test: Add test for keeping deep-scrub information
Signed-off-by: David Zafman <dzafman@redhat.com>
2016-12-09 16:51:20 -08:00
Loic Dachary
d5d7e3665f Merge pull request #12180 from tchaikov/wip-silence-get_timeout_delays
tests: disable the echo when running get_timeout_delays()

Reviewed-by: Loic Dachary <ldachary@redhat.com>
2016-11-29 09:06:43 +01:00
Loic Dachary
f491ea062d tests: facilitate background process debug in ceph-helpers.sh
When displaying the output of a background process, do it on stderr so
that it is not buffered. Otherwise the output of the background
process may be displayed after it completed.

Prefix the output of a background process with the PID of the process
known to the parent instead of the PID of the awk process processing the
output. When wait_background loops, it will print the process on which
it is waiting and it is confusing that they do not match with the PID
prefixing the process output.

Refs: http://tracker.ceph.com/issues/17830

Signed-off-by: Loic Dachary <loic@dachary.org>
2016-11-24 19:52:14 +01:00
Kefu Chai
1b9bc0501c tests: disable the echo when running get_timeout_delays()
this function is very distracting when one is looking at the log

Signed-off-by: Kefu Chai <kchai@redhat.com>
2016-11-25 00:17:23 +08:00
Loic Dachary
cca0f59156 Merge pull request #12085 from wjwithagen/wip-freebsd-ceph-helpers-2
workunits/ceph-helpers.sh: Fixes for FreeBSD

Reviewed-by: Loic Dachary <ldachary@redhat.com>
2016-11-24 08:01:51 +01:00
Willem Jan Withagen
e4629b3397 workunits/ceph-helpers.sh: Fixes for FreeBSD
- stat(1) does not have '%T'

Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
2016-11-24 01:57:05 +01:00
David Zafman
dcb5fb9b5a test: CLEANUP: Make wait_for_clean() clearer changing variable name
Signed-off-by: David Zafman <dzafman@redhat.com>
2016-11-22 21:38:42 -08:00
David Zafman
c1eb8746bc test: Return wait_for_clean() to start sleeping at .1
Signed-off-by: David Zafman <dzafman@redhat.com>
2016-11-22 21:36:13 -08:00
David Zafman
453942946a test: Enhance get_timeout_delays()
Do all math using bc so we can have fractions
Allow caller to specify the first step (default 1)
Add testing of fractional first step

Signed-off-by: David Zafman <dzafman@redhat.com>
2016-11-22 21:32:34 -08:00
Kefu Chai
23c21238b8 Merge pull request #12005 from wjwithagen/wip-wjw-freebsd-ceph-helpers
workunits/ceph-helpers.sh: FreeBSD returns a different errorstring.

Reviewed-by: Kefu Chai <kchai@redhat.com>
2016-11-21 22:53:23 +08:00
Loic Dachary
5e625674a8 tests: fix ceph-helpers.sh wait_for_clean delays
The TENTH_TIMEOUT was not declared as an int and failed to be set with
the correct number. The test of the function did not catch this.

Implement computing of the increasingly large sleep delays in a separate
function so that it can be tested more easily. Give up on sub-second
sleep because the function will not sleep at all if the cluster is
already clean. And if it is not already clean, it is very unlikely to
become clean within less than a second. The downside of having very
short sleep times is that it needlessly stresses the machine and also
possibly spams the logs.

Refs: http://tracker.ceph.com/issues/17830

Signed-off-by: Loic Dachary <loic@dachary.org>
2016-11-21 11:42:42 +01:00
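A separate, testable function that computes increasingly large whole-second delays might look like the sketch below. The name backoff_delays is hypothetical, and the doubling schedule is an assumption made for illustration; the commit message only says the delays grow.

```shell
# Hypothetical re-implementation, integer seconds only.
# Prints delays that double at each step (an assumed schedule) and
# whose sum equals the requested timeout.
backoff_delays() {
    local timeout=$1
    local step=1
    local total=0
    local out=""
    while [ $((total + step)) -le "$timeout" ]; do
        out="$out$step "
        total=$((total + step))
        step=$((step * 2))
    done
    if [ "$total" -lt "$timeout" ]; then
        out="$out$((timeout - total))"   # remainder so the sum matches
    fi
    echo "${out% }"                      # drop a trailing space, if any
}
```

For a timeout of 20 this prints `1 2 4 8 5`, so a caller can loop with `for delay in $(backoff_delays 20); do sleep "$delay"; done` and never wait longer than the timeout in total.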
Loic Dachary
cd72ff9f74 tests: save 9 characters for asok paths
For vstart.sh powered tests, save 9 characters in the path name
by replacing testdir/test- with td/t-

60 characters imposed by jenkins
9 characters for src/test
5 characters for td/t-

33 left (instead of 24) for the test to create asok such as out/client.admin.25327.asok

Moving these files outside of the build directory is a bad idea because
tests should only create/use files within the builddir and not write
outside of this directory. Doing so would make things more complicated
for cleanup in case the test fails and create other problems as a
consequence (filling out disk space, conflicting directories between
runs etc.).

For ceph-helpers.sh tests replace testdir with td, saving 5 characters.
This is not strictly necessary but keeps the directory names consistent:
if the developer wants to get rid of all the test leftovers, it is
enough to remove a single directory: td.

Fixes: http://tracker.ceph.com/issues/16014

Signed-off-by: Loic Dachary <loic@dachary.org>
2016-11-18 09:19:18 +01:00
Kefu Chai
2c7f08b849 Merge pull request #9613 from dzafman/wip-16064
common osd: Improve scrub analysis, list-inconsistent-obj output and osd-scrub-repair test

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2016-11-17 15:48:32 +08:00
David Zafman
1a75696065 test: activate_osd() doesn't need to set crush
Tests use objectstore_tool() which stops and starts OSDs,
but may assume consistency of object locations.

Signed-off-by: David Zafman <dzafman@redhat.com>
2016-11-16 11:01:43 -08:00
David Zafman
f3def4a0e5 test: wait_for_clean() add sleep backoff
Reduce size of log on timeout by doing a backoff so that
we don't log 3000 loops at 1/10 second sleeps.

Signed-off-by: David Zafman <dzafman@redhat.com>
2016-11-16 11:01:43 -08:00
Willem Jan Withagen
e34e18609f workunits/ceph-helpers.sh: FreeBSD returns a different errorstring.
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
2016-11-15 12:10:18 +01:00
Sage Weil
573e5b060e qa/workunits/ceph-helpers.sh: allow pool deletes
Signed-off-by: Sage Weil <sage@redhat.com>
2016-11-10 11:43:41 -05:00
David Zafman
907e79e2b7 test: Add test support for deep-scrub
Signed-off-by: David Zafman <dzafman@redhat.com>
2016-11-08 15:16:52 -08:00
Kefu Chai
b975e85afa test: re-enable test_pg_scrub() test in ceph-helper.sh
this reverts d053705. i disabled this test in the hope of bisecting the
offending tests behind the mysterious jenkins failure, which was
fixed by 6f3ce3a.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2016-09-23 14:50:03 +08:00
Loic Dachary
a5e5119bd1 test: timeout verification that mon is unreachable
Without a timeout on the command, it may hang for a very long time,
hunting for new mons. If it hangs for more than 60 seconds, it is
safe to assume the mon is indeed down.

Fixes: http://tracker.ceph.com/issues/16477

Signed-off-by: Loic Dachary <loic@dachary.org>
2016-09-16 14:20:38 +02:00
Kefu Chai
d053705c03 qa/workunits/ceph-helpers.sh: disable test_pg_scrub()
Signed-off-by: Kefu Chai <kchai@redhat.com>
2016-07-07 23:17:11 +08:00
Kefu Chai
f7331fdc3f Merge pull request #8786 from tchaikov/wip-btrfs-sudo
test: sudo to rm btrfs subvol

Reviewed-by: Erwan Velu <erwan@redhat.com>
2016-05-09 22:38:07 +08:00
Sage Weil
475cc08c33 qa/workunits/ceph-helpers.sh: make ceph-osd behave on ext4
Signed-off-by: Sage Weil <sage@redhat.com>
2016-04-30 17:12:05 -04:00
Kefu Chai
a5b2658f9f test: sudo to rm btrfs subvol
"btrfs subvolume {list,delete}" needs root privilege even if the current
user owns this subvol. one can only list/delete if he/she is root, or
if the btrfs volume was mounted with "-o user_subvol_rm_allowed".

Signed-off-by: Kefu Chai <kchai@redhat.com>
2016-04-28 02:05:26 +08:00
Sage Weil
bbdec192f8 Merge pull request #8691 from flyd1005/master
cleanup: Fix typos, change prefered to preferred

Reviewed-by: Sage Weil <sage@redhat.com>
2016-04-22 10:37:48 -04:00
Sage Weil
5698cd4889 Merge pull request #8530 from wjwithagen/patch-6
ceph-helpers.sh: only use mon*pid files when killing MONs

Reviewed-by: Kefu Chai <kchai@redhat.com>
2016-04-22 09:35:41 -04:00
Li Peng
88ae8c38d0 Fix typos, change prefered to preferred
2016-04-22 15:18:44 +08:00
Ali Maredia
e0f400fdef cmake: test-ceph-helpers working
Moved all the libraries in CMAKE_BINARY_DIR/lib
and all the binaries in CMAKE_BINARY_DIR/bin. Set
various environment variables for test-ceph-helpers.
Put those variables throughout
qa/workunits/ceph-helpers.sh.

NOTE: This is a very rough draft of these fixes.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2016-04-14 20:48:19 -04:00
Willem Jan Withagen
d5376c545b ceph-helpers.sh: only use mon*pid files when killing MONs
FreeBSD once in a while forgets to remove *pid files (this is probably a bug).
But taking care of it this way is probably much in line with what actually needs to be done

Signed-off-by: Willem Jan Withagen wjw@digiware.nl
2016-04-11 11:45:29 +02:00
Erwan Velu
0eea2436d9 tests: Optimizing kill_daemons() sleep time
It may sound like nothing, but the current sleep ramp-up is
counterproductive.

The code does: kill <proc>; sleep 0; kill <proc>; sleep 0; kill <proc>;
sleep 1; and then it grows up to 120 seconds by a smooth ramp-up.

But actually there is almost no chance the process dies so fast, meaning
that by default we switch to the sleep 1.

Moving from sleep 0 to sleep 1 doesn't seem like a big win, but as
kill_daemons() is called very often we can save a lot of time in the
end.

This patch proposes to sleep first for 1/10th of a second instead of 0
and then 1/20th of a second instead of 0.

The sleep call is also moved after the kill call, as there is no need to
wait before executing the command.

This patch makes the running time of a test like osd-scrub-repair.sh
drop from 7m30 to 7m7.

Saving another ~30 seconds is an interesting win at make check level.
Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:25 +02:00
Erwan Velu
84197f1641 tests: Optimizing wait_for_clean()
wait_for_clean() is a very common call when running make check.
It waits for the cluster to be stable before continuing.

This script was doing the same calls twice and could be optimized by
making the useful calls only once.

The is_clean() function was checking num_pgs & get_num_active_clean().
The main loop itself was also calling get_num_active_clean().

This patch inlines is_clean() inside this loop to benefit from a
single get_num_active_clean() call. This avoids a useless call of (ceph +
xmlstarlet).

This patch moves all the 'timer reset' conditions into an else,
avoiding spawning another ceph+xmlstarlet call while we already know we
should reset the timer.

The last modification is to reduce the sleeping time as the state of the
cluster is changing very fast.

This whole patch may not look like a big win, but for a test
like test/osd/osd-scrub-repair.sh, we drop from 9m56 to 9m30 while
reducing the number of system calls.

At the scale of make check, that's a lot of saving.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:25 +02:00