Verify that an object promoted to a cache tier because of a proxy read
is evicted as expected.
http://tracker.ceph.com/issues/12673 Refs: #12673
Signed-off-by: Loic Dachary <ldachary@redhat.com>
the proble breaks `test_mon_deprecated_commands` on ubuntu precise,
on the python shipped with ubuntu precise, errno.errorcode[95]
evalutes to `EOPNOTSUPP` but not `ENOTSUP`. but these two errnos
are equal in glibc.
Signed-off-by: Kefu Chai <kchai@redhat.com>
'ceph mon_metadata' was added still during this dev cycle, so there is
no need to deprecate it first.
Fixes: #11545
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
We need to be able to allow the version of ceph_test_* from earlier
versions of ceph to continue to work. This patch also adjusts the
work unit to use a single rados snap to test the condition without
--force-nonempty to ensure that we don't need to be careful about
the config value when running that script.
Signed-off-by: Samuel Just <sjust@redhat.com>
Modify the test traces to include the file name in addition to the
function and line name. It makes it easier to locate the faulty line
without going back to the test name.
Format the trace lines to be emacs friendly (filename:lineno) so that
C-x ` or C-c C-c jumps to the right file and the right line when running
the test with M-x compile.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
When a test fails, the script returns immediately and kill_daemon
function is called to cleanup. It is quite verbose and requires
scrolling hundreds of lines back to find the actual error
message. Turn off the shell trace to reduce the verbosity and improve
error output readability.
The kill_daemon cannot just turn off set -x because it may be called by
a test, not just at the end of the run. Instead the kill_daemon function
checks if tracing is activated and temporarily disables it.
Also get rid of the find standard error that commonly happens when
kill_daemon is called to verify there are no leftovers and the test
directory does not exist.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
osd-class-dir was not set when activating osds in the test environment leading to failures with 'operation not supported' message when trying to lock objects
Signed-off-by: Sebastien Ponce <sebastien.ponce@cern.ch>
Wip writeback throttling for cache tiering
This patch is to do write back throttling for cache tiering, which is similar to what the Linux kernel does for page cache write back. A paramter 'cache_target_dirty_high_ratio' (default 0.6) is introduced as the high speed flushing threshold, while leave the 'cache_target_dirty_ratio' (default 0.4) to represent the low speed threshold. The flush speed is controlled by limiting the parallelism of flushing. The maximum parallelism under low speed is half of the parallelism under high speed. If there is at least one PG such that the dirty ratio beyond the high threshold, full speed mode is entered; If there is no PG such that dirty ratio beyond the low threshold, idle mode is entered; In other cases, slow speed mode is entered.
Signed-off-by: Mingxin Liu <mingxinliu@ubuntukylin.com>
Reviewed-by: Li Wang <liwang@ubuntukylin.com>
Suggested-by: Nick Fisk <nick@fisk.me.uk>
Tested-by: Kefu Chai <kchai@redhat.com>
This should make newer gcc releases happier in their default configuration.
kernel.org is now distributing tarballs as .xz files so we change to that
as well when decompressing (it is supported by Ubuntu Precise so we should
be all good).
Fixes: #11758
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
VirtualBox has some files with weird
permissions in its /usr/lib, which was
tripping up this usually-safe operation
when run as an unprivileged user.
Fixes: #11959
Signed-off-by: John Spray <john.spray@redhat.com>
While we're at it, take only /usr/lib instead of all of /usr
to keep the overall file count more modest.
Fixes: #11807
Signed-off-by: John Spray <john.spray@redhat.com>
fix "pg ls" with states of "recovering" and/or "repair"
Reviewed-by: Joao Eduardo Luis <joao@suse.de>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
1. Creating a filesystem using a
readonly tier on an EC pool (should be forbidden)
2. Removing a tier from a replicated base pool (should
be permitted)
Signed-off-by: John Spray <john.spray@redhat.com>
A get/set command may fail with
Error EBUSY: currently creating pgs, wait
if issued before the PGs are clean. Call wait_for_clean after the pool
is created or a pool setting is changed and remaps the PGs it
contains (size, pg_num...) to ensure the PGs are clean and the set/get
command that follow will succeed.
http://tracker.ceph.com/issues/11624Fixes: #11624
Signed-off-by: Loic Dachary <ldachary@redhat.com>
The semantic and interface of get_pg are the same, that avoids
duplication and the ceph-helpers.sh version is tested and documented.
Make the ceph-test package dependent on xmlstarlet because it is
needed by ceph-helpers.sh.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
...not very elegantly because this is bash, but
at least check the expected value is somewhere
present in the JSON output.
Signed-off-by: John Spray <john.spray@redhat.com>
When CEPH_CLI_TEST_DUP_COMMAND=1 is set, ceph osd create will consume
two osd id and return the later. Fix the test to account for that and
not assume the osd id being allocated by osd create is always the
next available osd id.
The other osd create tests do not suffer from the same variation because
they provide a UUID argument that guarantees the same osd id is going to
be returned every time.
http://tracker.ceph.com/issues/11618Fixes: #11618
Signed-off-by: Loic Dachary <ldachary@redhdat.com>
Move check-local scripts
src/test/run-cli-tests
encode-decode-non-regression.sh
test/encoding/readable.sh
to check_SCRIPTS. Their output is captured in .log file when running
with a recent automake. This reduces the output of make check by an
order of magnitude.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
If we add a pool with snap state as a tier the snap state gets clobbered
by OSDMap::Incremental::propogate_snaps_to_tiers(), and may prevent OSDs
from starting. Disallow this.
Include a test.
Fixes: #11493
Backport: hammer, giant, firefly
Signed-off-by: Sage Weil <sage@redhat.com>
Instead of
* setting limit
* populate the cache
* check the health warnings
do the following
* populate the cache
* set limits below the content of the cache
* check the health warnings
The problem with the former approach is that the limits stored by the
OSD internally do not exactly match the one set by the user: they are
converted in ratios and there may be rounding errors.
Also replace the busy loop waiting for pg stats to flush with
ceph tell osd.* flush_pg_stats || true
for simplicity.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
On a machine slow enough, the tiering agent can be activated while
testing border cases where the cache is almost full. Prevent that
by deactivating the tiering agent.
http://tracker.ceph.com/issues/11359Fixes: #11359
Signed-off-by: Loic Dachary <ldachary@redhat.com>
ISA and Jerasure can be compared for the default stripe width (4KB) and
the two most commonly used Reed Solomon matrices. Comparing the
bandwidth for large chunks (1MB) is not relevant because it is not
commonly used.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
* also translate "repair" if specified as "states"
* update test_mon_pg in cephtool-test-mon.sh
Fixes: #11569
Signed-off-by: Kefu Chai <kchai@redhat.com>
Obtain the local IP of the client and save the nonce provided when the messenger was created. This is required for RBD lock/unlock
Fix script error in RBD concurrent test
Reset did_bind during messenger shutdown
Signed-off-by: Raju Kurunkad <raju.kurunkad@sandisk.com>
* if ceph is not reading from a tty, expect EOF instead of "quit"
as the end of input.
* do not panic at seeing the EOF
* update the test case test_mon_injectargs_SI(). since we disables
"ceph injectargs <args,...>" in a458bd83, in which the arguments
of "injectargs" are supposed to be consumed by "tell" instead.
so "ceph injectargs ..." is taken as an incomplete command, and
this command will bring ceph cli into the interactive mode,
redirecting its stdin to /dev/null helps ceph cli quit the loop,
but in a way of throwing EOFError exception. this change handles
the EOF, so the "ceph injectargs ..." does not throws anymore.
but the side effect is that the test fails since it expects a
non-zero return code. so replace it with an equivalent "tell"
command which also fails but due to the non-SI postfix.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Don't forget set cachemode.
By the way, For test max_target_bytes, don't reply on /etc/passwd
because different host have different size of /etc/passwd.
Avoid met this bug, i create a tmp file.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
This will only output all the values applicable to a given type of pool.
So for example for a pool that is not a tier pool values like HIT_SET_TYPE,
HIT_SET_PERIOD, HIT_SET_COUNT etc. will be ignored.
Fixes: #10891
Signed-off-by: Michal Jarzabek <stiopa@gmail.com>
lttng requires daemons (things that call fork, clone, or daemon
without exec, like fuse) to use a LD_PRELOAD library. Without this,
the lttng atexit() handler crashes.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
! doesn't do what we want in bash -e. Use a more explicit helper
instead, and specify expected error codes.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
The crushtool is aborted if it takes more than mon lease seconds. Since
the monitor blocks while running it, this is mandatory otherwise the
monitor will be considered down and new elections triggered.
http://tracker.ceph.com/issues/10947Fixes: #10947
Signed-off-by: Loic Dachary <ldachary@redhat.com>
If the exclusive lock feature is enabled, all locks need
to be removed prior removing the image.
Fixes: #10990
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Handle the case that kernel does not support fcntl.F_OFD_SETLK.
Also fix the code that checks if fnctl fails with errno == EINTR.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
The rbd CLI now utilizes the rbd_default_format configuration
setting, therefore the copy test now needs to tell rbd which image
format it is expecting to create.
Fixes: #10961
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Instead of the http://github.com/dachary namespace. It is not an issue
in the common case because it should be cloned because it is in the
.gitmodules file with the proper namespace.
http://tracker.ceph.com/issues/10836Fixes: #10836
Signed-off-by: Loic Dachary <ldachary@redhat.com>
We're seeing the lsof invocation fail (as not found) in testing and nobody can
identify why. Since attempting to reproduce the issue has not worked, this
patch will gather data from a genuinely in-vitro location.
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
The test was removed in 1189138 (mon: make ceph tell mon.* version
work) as it began to fail due to #10439. After it fixed in c4548f6
(pybind: ceph_argparse: validate incorrectly formed targets), the test
can be restored.
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
* Use set -e to detect errors
* no need to retry cleanup in a loop now that rbd-fuse closes images
* use --no-progress for long-running operations
* add output at the start of each test
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Until the kernel supports RBD exclusive locking, this test
has been updated to create shared images (exclusive locking
disabled).
Fixes: #10613
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
This is because running fs/misc on a kclient
should not include quota (quota not supported
on kernel client).
Fixes: #10579
Signed-off-by: John Spray <john.spray@redhat.com>
Automatic locking hides the ESHUTDOWN from the caller, which is how
this test detects that blacklisting works.
Fixes: #10592
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
* nodelete - pool can't be deleted
* nopgchange - pool's pg and pgp num can't be changed
* nosizechange - pool's size and min size can't be changed
This is intended to help some poor admin to avoid a very bad day.
Fixes: #9792 (but in a different way than it was proposed there)
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
undersized not valid: undersized not in inactive|unclean|stale
undersized not valid: undersized doesn't represent an int
Invalid command: unused arguments: ['undersized']
pg dump_stuck {inactive|unclean|stale [inactive|unclean|stale...]} {<int>} : show information about stuck pgs
Signed-off-by: xinxin shu <xinxin.shu@intel.com>
These can happen with split or with state changes due to reordering
results within the hash range requested. It's easy enough to filter
them out at this stage.
Backport: giant, firefly
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Because fs reset opens a brief window for the previously
failed MDSs to spring back into life.
Fixes: #10539
Signed-off-by: John Spray <john.spray@redhat.com>
Previously was set in hashbang, which meant
that "./quota.sh" was OK, but "sh ./quota.sh" would
just run through ignoring errors.
Signed-off-by: John Spray <john.spray@redhat.com>
The tiobench software has been abandoned upstream for years. Fedora and
Debian are no longer shipping the tiobench package, so we've had to
carry the package ourselves in the Ceph project, and we're trying to
slim down our dependencies where it makes sense to do so.
Nuke the tiobench suite.
http://tracker.ceph.com/issues/10152 Refs: #10152
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
This is like a temporary measure as the mon will try to set them again,
but we have run into cases where the mon was misbehaving (failing to clear
the flag) and we wanted to do it. Note that the mon will likely set it
again on the next tick() anyway.
If we're going to clear it, we may as well be able to set it, too (again,
even if the mon is going to clear it soon). If nothing else this is useful
for writing tests.
Fixes: #9323
Signed-off-by: Sage Weil <sage@redhat.com>
This was missing from 17b5fc9a but we didn't notice
because the test wasn't being run by the gitbuilders.
Signed-off-by: John Spray <john.spray@redhat.com>
Using the array notation to list test is error prone and more
complicated to write.
It also fixes a bug : only the first test of each series (MON, OSD, MDS)
was run and the others were ignored.
Signed-off-by: Loic Dachary <loic@dachary.org>
This was an overly-strict check. In fact it is perfectly
fine to set an overlay on a pool that is already in use
as a filesystem data or metadata pool.
Fixes: #10135
Signed-off-by: John Spray <john.spray@redhat.com>
If CEPH_CLI_TEST_DUP_COMMAND is set when ceph osd create is called, it
will create two osd. They must be cleaned up afterwards instead of
assuming only one is going to be created.
http://tracker.ceph.com/issues/10083Fixes: #10083
Signed-off-by: Loic Dachary <ldachary@redhat.com>
The local filesytem may behave slightly differently. This isn't
foolproof, but seems to be reliable enough on rhel7 rootfs, where
exact comparison was failing.
Fixes: #10002
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Clone the archive of encoded objects and decode all archived objects, up
to and including the current ceph version.
http://tracker.ceph.com/issues/9420 Refs: #9420
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
Update bench.sh/plot tool to cover ISA backend.
ISA will output a fake echinique 'cauchy_good' so the plot tool
don't need to be changed.
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
For testing injectargs a configuration option was changed that has side
effects on the cluster. It could introduce random failures later. It is
replaced with a configuration option that cannot have adverse side
effects on the cluster.
http://tracker.ceph.com/issues/9919Fixes: #9919
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
It is incorrect to append the content of CEPH_ARGS to the argument list
when running injectargs. For instance if
CEPH_ARGS='--log-file the.log' \
./ceph tell osd.0 injectargs --no-osd_debug_op_order
translates into
./ceph tell osd.0 injectargs --no-osd_debug_op_order \
--log-file the.log
it ends up changing the log file of osd.0 which is probably unintended.
Instead CEPH_ARGS is inserted before injectargs and it translates into:
./ceph tell osd.0 --log-file the.log \
injectargs --no-osd_debug_op_order
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
The arguments of injectargs being valid ceph arguments, they are.
consumed when the ceph cli calls rados.conf_parse_argv(). It can be
worked around by obsuring them as in:
ceph tell osd.0 injectargs '--osd_debug_drop_ping_probability 444'
where '--osd_debug_drop_ping_probability 444' is a single argument that
does not match any known argument. The trick is that it will be
evaluated again once it reaches the OSD or the MON and translated into
the expected list of arguments. Although it is clear once explained, it
is obscure and leads to strange combinations such as:
ceph tell osd.0 injectargs '--osd_debug_op_order '
(note the extra space at the end) to set boolean parameters. A better
workaround is to add a -- marking the end of the options as in:
ceph tell osd.0 -- injectargs --osd_debug_op_order
this one is unfortunately much less documented and the user does not
usually know the exact semantic of --, let alone where it should be
placed.
The simpler solution is to split the argument list in two if
"injectargs" is found. The arguments that show after the "injectargs"
argument is removed from the list of arguments until parsing is
complete. It implements the more intuitive syntax:
ceph tell osd.0 injectargs --osd_debug_op_order
and the other forms are still valid for backward compatibility.
http://tracker.ceph.com/issues/9372Fixes: #9372
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
Suites run with CEPH_TEST_CLI_DUP_COMMAND=1, which will send a duplicate
command for every command issued with the 'ceph' tool. Behavior is to
get a reply from the command and then send a duplicate, looking for the
same outcome (guaranteeing idempotency of the operations). However, it
so happens that if you remove the entity's own key from the keyring and
you happen to be unlucky enough so that the client's connection gets
failed (we also run tests with connection failure injections), the
'ceph' tool won't be able to reconnect to the cluster to send the
duplicate command (as it's entity no longer exists in the cluster's
keyring).
We rewrite the test instead of resorting to ugly hacks to work around
this behavior, simply having a new 'role-definer' added by the existing
'role-definer' (which we weren't testing anyway, so bonus points for
that) and then have one removing the other (to test the procedure) and
finally using 'client.admin' to remove the last 'role-definer'.
Fixes: #9820
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
Wip client flock
Add support for file locking to the userspace client, and improve blocked-lock cancellation so that it doesn't remove locks that succeeded when racing.
Reviewed-by: Greg Farnum <greg@inktank.com>
Assuming they are more likely than others to leave OSD/MON in an
unstable state that could have undefined side effects on the tests
following it. A cleaner solution would be to run them in a separate
script that is run on an independent cluster.
http://tracker.ceph.com/issues/9700Fixes: #9700
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
We added MDS resetting code here a while back,
but the order of operations was such that a
"cluster up" was being run between a fail_all_mds
and the point at which we needed the map not to
be interfered with (testing setmap).
Also the new fs create/destroy cycles for testing
EC pool handling were missing calls to stop the
daemons before fs rm.
Signed-off-by: John Spray <john.spray@redhat.com>
It is expected for ceph tell to fail with ENXIO if the daemon it is
trying to join is not ready for some reason. This should be handled as a
transient error instead of a fatal error.
Add two shell functions to help with retry. They may prove useful if
other cases requiring a few retries show up.
http://tracker.ceph.com/issues/9655Fixes: #9655
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
test creating and entity with blank caps with and without '--force'
being specified. without '--force' they must fail with EINVAL as the
monitor will not be able to parse them.
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
We have variables with the same name that are being shared! We don't
hit any issues with it currently because the code just kind of works
even though that happens. Add a bit of new logic that relies on an
immutable return code (for instance) and we're in the woods.
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
expect_false does not extend past the pipe and fails because the command
succeeds
introduced in f05c977bbc
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
Keep the osd trash test to ensure it is a valid command but make it a
noop by giving it a zero argument (meaning thrash 0 OSD maps).
Remove the loops that were added after the command in an attempt to wait
for the cluster to recover and not pollute the rest of the tests. Actual
testing of osd thrash would require a dedicated cluster because it the
side effects are random and it is unnecessarily difficult to ensure they
are finished.
http://tracker.ceph.com/issues/9620Fixes: #9620
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
ceph --format plain osd find 1 (and metadata) are not implemented and
must fallback to the default (json-pretty).
http://tracker.ceph.com/issues/9538Fixes: #9538
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
Add a trivial osd health test at the beginning of each group of
tests. When facing an intermittent failure, it is difficult to diagnose
if the cluster appears to be missing an OSD but there is no indication
as to when the OSDs were last up.
The tests are now only run after all OSDs are up.
These checks can be disabled with --no-sanity-check to allow running
some tests that have less requirements than running all the tests.
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
From the bash man page:
set -e exit immediately ... The shell does not exit ... if the
command's return value is being inverted with !
Add an explicit exit 1 where appropriate.
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
* Removing tiers from a base pool in use by CephFS is forbidden.
* Using CephFS pools as tiers is forbidden.
Signed-off-by: John Spray <john.spray@redhat.com>
Fixes two things:
* EC pools are now permissible if they have a cache overlay
* Pools are not permissible if they are a cache tier.
Fixes: #9435
Signed-off-by: John Spray <john.spray@redhat.com>
This workunit will be used by tests as a placeholder that always return
true. This is helpful in tests when a script from the qa/workunits
directory is mandatory but we do not care about testing anything. For
an example of how it can be used, check
https://github.com/ceph/ceph-qa-suite/pull/120
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
Add tests to fail as soon as an unexpected condition is met in
test_mon_osd. Otherwise the actual error will be more difficult find in
the logs.
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
Aside from being a bit odd to begin with, using stderr
was causing tests to fail because the output was polluted
by log output which is also on stderr.
Fixes: 9281
Signed-off-by: John Spray <john.spray@redhat.com>