in cf24535, we use $CEPH_ROOT to specify the $top_srcdir to unify
cmake and autotools, but this breaks ceph-qa-suite/tasks/workunit.py,
as it only clones the necessary qa/workunits directory, and does not
pass $CEPH_ROOT to the test scripts. so we need to set a default
$CEPH_ROOT if it is not set.
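for illustration, the usual shell default-if-unset idiom does the job (the
fallback value here is an assumption, not necessarily what the patch uses):
CEPH_ROOT=${CEPH_ROOT:-..}   # fall back to the parent dir when unset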
Signed-off-by: Kefu Chai <kchai@redhat.com>
Replaced relative paths in test/cephtool-test-mon.sh and
qa/workunits/cephtool/test.sh
to work with CEPH_FOO environment variables set in cmake.
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Replaced relative paths in encode-decode-non-regression.sh
to work with CEPH_FOO environment variables set in
cmake.
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Moved all the libraries into CMAKE_BINARY_DIR/lib
and all the binaries into CMAKE_BINARY_DIR/bin. Set
various environment variables for test-ceph-helpers,
and used those variables throughout
qa/workunits/ceph-helpers.sh.
NOTE: This is a very rough draft of these fixes.
Signed-off-by: Ali Maredia <amaredia@redhat.com>
FreeBSD once in a while forgets to remove *pid files (this is probably a bug).
But taking care of it this way is probably much in line with what actually needs to be done.
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
Protect a number of unstable/experimental features behind durable flags
https://github.com/ceph/ceph/pull/8383
Reviewed-by: John Spray <john.spray@redhat.com>
As a precaution against cleanup being used for mass deletion of other
objects, only allow a --prefix that begins with "benchmark_data".
Signed-off-by: David Zafman <dzafman@redhat.com>
Method preprocess_remove_snaps() is designed to quickly check whether
we can safely handle a remove-snaps request without changing the osdmap.
The original design was to be able to handle snaps from multiple pools,
including snaps from a non-existent pool, by simply skipping
over them. However, this method quits on successfully detecting
any valid snap which truly needs to be removed, and forwards the
request to prepare_remove_snaps() for further processing.
From the above analysis, the prepare_remove_snaps() method will
theoretically also encounter some snaps which possibly belong to
non-existent pools.
This PR solves the above problem by adding a sanity check on the
existence of the pool associated with the specified snap to be removed,
which should be considered a defensive move that makes prepare_remove_snaps()
stronger.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
The current code was waiting a flat 10s for the file to be put.
If the file was put in a shorter time than 10s, the test just waited for
nothing, reducing the execution speed of that test.
This patch simply checks every second, for up to 10 seconds, whether the
file is actually available, so the wait can exit prematurely.
This patch saves exactly 10 sec on a local system, surely a little bit
less on an infra but still saves time.
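For illustration, the polling idiom looks roughly like this (variable name
hypothetical):
for i in $(seq 1 10); do
    test -f "$file" && break   # exit the wait as soon as the file shows up
    sleep 1
done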
Signed-off-by: Erwan Velu <erwan@redhat.com>
The current code doubles the wait time between two calls, leading to a
possible 511s of waiting time, which sounds a little bit excessive.
This patch reduces the global wait time to 300s and tests the
rados status more often to exit the loop earlier. In a local test, that
saves 6 secs per run.
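A rough sketch of the bounded wait, inside the helper function (the rados
call here is a stand-in, not the exact check the patch uses):
timeout=300
until rados -p "$pool" ls > /dev/null 2>&1; do
    sleep 1
    timeout=$((timeout - 1))
    test $timeout -gt 0 || return 1   # give up after 300s
done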
Signed-off-by: Erwan Velu <erwan@redhat.com>
ceph_watch_wait() does a sleep _before_ doing the test which could
stop this loop.
It's better to do the check first, as it could exit immediately and
avoid a useless sleep.
That's a minor optimization, but everything counts when trying to get
something smooth.
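In other words, run the check before sleeping (sketch, names hypothetical):
while ! grep -q "$regexp" "$watch_log"; do
    sleep 1   # only sleep when the message has not arrived yet
done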
Signed-off-by: Erwan Velu <erwan@redhat.com>
OSDs take some time to come up, but waiting 10 secs between two loops
seems excessive here. In the worst case, we can be in a situation of
waiting 10 secs for nothing, as we are just a few microsecs after the osd
came up.
This patch simply reduces the sleep from 10 seconds to 1.
Signed-off-by: Erwan Velu <erwan@redhat.com>
It may sound like nothing, but the current sleep ramp-up is
counterproductive.
The code does: kill <proc>; sleep 0; kill <proc>; sleep 0; kill <proc>;
sleep 1; and then ramps up smoothly to 120 seconds.
But actually there is almost no chance the process dies that fast, meaning
that by default we end up at the sleep 1.
Moving from sleep 0 to sleep 1 doesn't seem like a big win, but as
kill_daemons() is called very often we can save a lot of time by the
end.
This patch sleeps first for 1/10th of a second instead of 0, and then
for 1/20th of a second instead of 0.
The sleep call is also moved after the kill call, as there is no need to
wait before executing the command.
This patch drops the running time of a test like osd-scrub-repair.sh
from 7m30 to 7m7.
Saving another ~30 seconds is an interesting win at the make check level.
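A sketch of the resulting loop (the first two delays are the ones from this
patch, the tail of the ramp-up is an assumption):
for delay in 0.1 0.05 1 2 4 8 16 32 64 120; do
    kill $pid 2> /dev/null || break   # kill fails once the process is gone
    sleep $delay
done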
Signed-off-by: Erwan Velu <erwan@redhat.com>
wait_for_clean() is a very common call when running make check.
It waits for the cluster to be stable before continuing.
This script was doing the same calls twice and could be optimized by
making the useful calls only once.
The is_clean() function was checking num_pgs & get_num_active_clean(),
and the main loop itself was also calling get_num_active_clean().
This patch inlines is_clean() inside this loop to benefit from a
single get_num_active_clean() call. This avoids a useless spawn of (ceph +
xmlstarlet).
This patch also moves all the 'timer reset' conditions into an else,
avoiding spawning another ceph+xmlstarlet call when we already know we
should reset the timer.
The last modification is to reduce the sleeping time, as the state of the
cluster changes very fast.
This whole patch may not look like a big win, but for a test
like test/osd/osd-scrub-repair.sh, we drop from 9m56 to 9m30 while
reducing the number of system calls.
At the scale of make check, that's a lot of saving.
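Roughly, the inlined loop becomes (simplified sketch; get_num_active_clean()
is the real helper, the timer logic here is condensed):
while true; do
    num_active_clean=$(get_num_active_clean)  # one ceph+xmlstarlet spawn per loop
    if test "$num_active_clean" = "$num_pgs"; then
        break
    elif test "$num_active_clean" != "$last_num_active_clean"; then
        last_num_active_clean=$num_active_clean  # progress was made
        timer=0                                  # so reset the timer
    fi
    sleep 1
done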
Signed-off-by: Erwan Velu <erwan@redhat.com>
get_num_active_clean() is called very often but spawns one useless process.
The current "grep -v | wc -l" can easily be replaced by "grep -cv", which
does the same while spawning one process less.
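i.e. (pattern illustrative):
grep -v 'active+clean' | wc -l   # spawns grep and wc
grep -cv 'active+clean'          # same count, one process less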
Signed-off-by: Erwan Velu <erwan@redhat.com>
The current code of kill_daemons() was killing daemons one after the
other and waiting for each to actually die before switching to the next one.
This patch makes the kill_daemons() loop run in parallel to avoid
this bottleneck.
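Sketch of the parallel loop (helper name from ceph-helpers.sh, arguments
simplified):
for pidfile in $dir/*.pid; do
    kill_daemon $pidfile TERM &   # each kill polls until its daemon is gone
done
wait   # join all the background kills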
Signed-off-by: Erwan Velu <erwan@redhat.com>
This commit introduces two new functions in ceph-helpers.sh to ease
parallelism in tests.
It's based on two functions: run_in_background() & wait_background().
The first one allows you to spawn processes or functions in the background and saves
the associated pid in a variable passed as the first argument.
The second one waits for those pids to complete and reports their exit status.
If one or more failed, then wait_background() reports a failure.
A typical usage looks like:
pids1=""
run_in_background pids1 bash -c 'sleep 5; exit 0'
run_in_background pids1 bash -c 'sleep 1; exit 1'
run_in_background pids1 my_bash_function
wait_background pids1
The variable that contains the pids is local, making it possible to nest calls
to those two new functions.
Signed-off-by: Erwan Velu <erwan@redhat.com>
Since the merge of PR #7693, 'ceph command' to get the help is invalid.
As a result, the 'test/cephtool-test-mon.sh' test was broken.
This patch simply changes the 'ceph command' into 'ceph --help command'.
With this change the test passes again.
Signed-off-by: Erwan Velu <erwan@redhat.com>
"mds stat" now gives fsmap output rather than
mdsmap. Update the rest api test's expectations.
Fixes: http://tracker.ceph.com/issues/15309
Signed-off-by: John Spray <john.spray@redhat.com>
Since the ceph task was already creating filesystems
during setup, I presume that these calls only ever
worked when they were using the same names as
the existing filesystem.
Fixes: http://tracker.ceph.com/issues/15309
Signed-off-by: John Spray <john.spray@redhat.com>
The new default must be taken into account by make check scripts
otherwise they fail.
Followup of 5b3da26.
Signed-off-by: Loic Dachary <loic@dachary.org>
The python scripts are not yet compatible with python3, yet it is the
default on jessie. Force the creation of the virtualenv to use python2.7
instead. The wheelhouse is already explicitly populated for both python3
and python2.7 by install-deps.sh, regardless of the default interpreter.
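i.e. something like (path hypothetical):
virtualenv --python python2.7 $dir/virtualenv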
Signed-off-by: Loic Dachary <loic@dachary.org>
Before a7a3658 the rbdmap script was logging bogus messages and not working
on systemd platforms because the unit file was not defining the RBDMAPFILE
environment variable.
This workunit asserts that the bug has been fixed.
http://tracker.ceph.com/issues/14984 References: #14984
Signed-off-by: Nathan Cutler <ncutler@suse.com>
Borrowed this from rados/test_python. It's for
invoking the test that lives in src/test/pybind/test_cephfs.py
Signed-off-by: John Spray <john.spray@redhat.com>
Simulate the cases where the activation (via udev running trigger)
sequences are:
* journal then lockbox
* data then lockbox
* lockbox
All of them must end with the OSD verified to be up.
Signed-off-by: Loic Dachary <loic@dachary.org>
Instead of storing the dmcrypt keys in the /etc/ceph/dmcrypt-keys
directory, they are stored in the monitor. If a machine with
OSDs created with ceph-disk prepare --dmcrypt is lost, it does
not contain the key that would allow their content to be decrypted.
The dmcrypt key is retrieved from the monitor using a different keyring
for each OSD. It is stored in a small partition called the lockbox. At
boot time the lockbox is mounted at
/var/lib/ceph/osd-lockbox/$uuid
and used when the $uuid partition is detected by udev to map it with
cryptsetup.
The OSDs that were prepared prior to the lockbox implementation are
supported by looking up the key found in /etc/ceph/dmcrypt-keys before
looking in /var/lib/ceph/osd-lockbox/$uuid.
http://tracker.ceph.com/issues/14669 Fixes: #14669
Signed-off-by: Loic Dachary <loic@dachary.org>
"ceph --watch-debug" and "ceph tell mon.foo version" could connect
to different monitors, and there is a chance that "ceph --watch-debug"
is not connected yet when "ceph tell" completes, and hence the former
fails to collect the cluster log including the "ceph tell" related
message. this renders test_mon_tell() unreliable. so, in
ceph_watch_start(), we should wait until the "ceph" cli connects to the
monitor and receives messages from it.
Fixes: #14910
Signed-off-by: Kefu Chai <kchai@redhat.com>
While running a make check on a btrfs system, many subvolumes are left at the end
of the build. It's pretty common to have several hundred of those.
btrfs is pretty sensitive to the path when requesting a subvolume removal.
The current code was getting the path wrong and didn't delete the remaining
volumes.
This patch lists the current subvolumes, filters those created by the
test process, and adjusts the path, because btrfs reports
erwan/chroot/ceph/src/testdir/test-7202/dev/osd1/snap_439
while, relative to the current working directory, we want to delete:
testdir/test-7202/dev/osd1/snap_439
Signed-off-by: Erwan Velu <erwan@redhat.com>
client: add option to control how directory size is calculated
This lets you disable rstats if your workload is unhappy about directories
changing size (e.g., tar of recently-moved/created/untarred files).
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
http://tracker.ceph.com/issues/13422 made it so ceph-osd won't start
unless the pidfile can be created successfully. The default location
being the current directory, ceph-osd must explicitly be told to write
in a directory where it has write permissions.
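e.g. (arguments illustrative):
ceph-osd -i 0 --osd-data $dir/0 --pid-file $dir/osd.0.pid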
Signed-off-by: Loic Dachary <loic@dachary.org>
Only support the block file for now. It is handled the same as the
journal, only with a different name (block) and its own set of ptypes
depending on multipath or dmcrypt.
Signed-off-by: Loic Dachary <loic@dachary.org>
Refactor the test / virtualenv setup in the same way it was done for
ceph-detect-init.
All shell tests use ceph-helpers.sh which is modified to add ceph-disk /
ceph-detect-init virtualenv/bin to the PATH to ensure the source version
is used even if ceph is installed.
See "ceph-detect-init: make all must setup.py install"
Signed-off-by: Loic Dachary <loic@dachary.org>
This patch does cleanup for option "filestore_xattr_use_omap",
as this option was removed in #7408.
Fixes: #14356
Signed-off-by: Vikhyat Umrao <vumrao@redhat.com>
- cleanup on a test failure;
- minimize interference with other processes (tests) that are
run concurrently;
- use xmlstarlet when parsing rbd output;
- add exit status test.
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
If the journal is not unmapped, ceph-disk destroy will fail to zap the
corresponding devices because it is still held by devicemapper.
A consequence of this modification is that
ceph-disk activate --dmcrypt --reactivate
no longer works from the command line, because it does not map the
dmcrypted journal. The --reactivate option is added to activate-journal
which will map both the journal and the data devices, if necessary.
http://tracker.ceph.com/issues/14233 Fixes: #14233
Signed-off-by: Loic Dachary <loic@dachary.org>
* not all config items are tracked, so a change does not take any effect after
we successfully update it using 'ceph tell <daemon> injectargs --foo-bar 15',
as shown by the command output:
$daemon: foo_bar = '15'
if foo-bar happens to be one not tracked by any component in <daemon>.
in this fix, the message
$daemon: foo_bar = '15' (unchangeable)
is returned instead. nevertheless, the config is still updated, as
"ceph daemon <daemon> config show | grep foo_bar" shows:
"foo_bar": "15"
this helps the user understand that the setting is not dynamically
changeable.
* update the test accordingly
Fixes: #11692
Signed-off-by: Kefu Chai <kchai@redhat.com>
This is just like 'blacklist rm' except it removes
everything. Useful if you've got a whole bunch of
things in your blacklist and you don't want to wait
for N "blacklist rm" commands to run.
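Assuming the usual 'ceph osd blacklist' prefix, that is:
ceph osd blacklist clear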
Signed-off-by: John Spray <john.spray@redhat.com>
Add the pool name for a given rbd image when executing rbd admin socket
commands, in case there is more than one image with the same name in
different pools.
Signed-off-by: Xiangwei Wu <wuxiangwei@h3c.com>
Extend the librados remove interface with a flags parameter, and use
this extended interface to implement force-remove when the cluster is full.
Signed-off-by: Xiaowei Chen <chen.xiaowei@h3c.com>
When an OSD id is removed via ceph osd rm, it will be reused by the next
ceph osd create command. Verify that an OSD reusing such an id
successfully comes up.
http://tracker.ceph.com/issues/13988 Refs: #13988
Signed-off-by: Loic Dachary <loic@dachary.org>
When called to tear down a test, kill_daemon should use KILL to ensure
all leftovers are removed as quickly as possible, leaving a clean state
for the next test. However, when kill_daemons is called to shut down a
given daemon from within a test, it should use TERM by default so the
daemon has time to notify the MON that it is going down. For instance, if
KILLing an OSD, the mon will still report it as being up although the
calling function probably expects that it will be marked out.
Signed-off-by: Loic Dachary <loic@dachary.org>
In a test environment, consistency is more important than
performance. Effectively disable the check that would postpone a scrub
depending on the load average. It is assumed that a machine with a load
average higher than 2000 won't be usable anyway.
http://tracker.ceph.com/issues/14027 Refs: #14027
Signed-off-by: Loic Dachary <loic@dachary.org>
If /tmp/obj1 happened to exist already, and was not writable by the
testing user, then this test failed!
Signed-off-by: Robin H. Johnson <robin.johnson@dreamhost.com>
When testing auto scrub, waiting 20 seconds for the scrub to complete is
sometimes not enough and creates false negatives.
Split wait_for_scrub out of the repair helper so that it can be used to
wait for the scrub to happen instead of using a timer.
The scrub timestamp is obtained after removing the object, therefore
there is a chance for the scrub to be finished already. But since auto
scrub is scheduled every 5 seconds, it will only make the test wait an
extra 5 seconds and not hang forever.
http://tracker.ceph.com/issues/13592
Signed-off-by: Xinze Chi <xinze@xsky.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
ceph osd pool set $POOL scrub_min_interval N
ceph osd pool set $POOL scrub_max_interval N
ceph osd pool set $POOL deep_scrub_interval N
If N > 0, this value is used for the pool instead of
the corresponding global parameter from the config
(osd_scrub_min_interval, osd_scrub_max_interval or
osd_deep_scrub_interval).
Fixes: #13077
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Update the PLUGINS variable, which was no longer used. Add the TECHNIQUES
variable to control which techniques are compared.
Signed-off-by: Loic Dachary <loic@dachary.org>
It is used instead of the obsoleted --parameter directory= to specify
the location of the erasure code plugin directory.
Signed-off-by: Loic Dachary <loic@dachary.org>
- use the deactivate/destroy feature to destroy an osd
- test the reactivate option when the osd is deactivated
- add check_osd_status to check the osd status when the osd comes up
Signed-off-by: Vicente Cheng <freeze.bilsted@gmail.com>