* test_osd_bench: the injectargs call actually fails due to the spaces
at the beginning. so remove the spaces in args before sending it to
injectargs
* update/add some injectargs tests accordingly
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
in cf24535, we use $CEPH_ROOT to specify the $top_srcdir to unify
cmake and autotools, but this breaks ceph-qa-suite/tasks/workunit.py,
as it only clones the necessary qa/workunits directory, and does not
pass $CEPH_ROOT to the test scripts. so we need to set a default
$CEPH_ROOT if it is not set.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Replaced relative paths in test/cephtool-test-mon.sh,
qa/workunits/cephtool/test.sh, and test/cephtool-test-mon.sh
to work with CEPH_FOO environment variables set in cmake.
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Protect a number of unstable/experimental features behind durable flags
https://github.com/ceph/ceph/pull/8383
Reviewed-by: John Spray <john.spray@redhat.com>
Method preprocess_remove_snaps() is designed to fast check whether
we can safely handle a remove-snaps-request without changing the osdmap.
The original design is to be able to handle snaps from multiple pools,
including those snaps even from a non-existent pool by simply skipping
over them. However, this method will quit on successfully detecting
any vaild snap which is truly needed to be removed and forward this
request to prepare_remove_snaps() for further processing.
From the above analysis, the prepare_remove_snaps() method will
theoretically also encounter some snaps which possibly belong to
non-existent pools.
This pr solves the above problem by adding a sanity check against
pool existense associated with the specified snap to be removed, which
shall be considered as a defensive move and makes prepare_remove_snaps()
stronger.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
The current code was waiting 10s to expect the file being put.
If the file was put in a shorter time than 10s, the test just waits for
nothing reducing the execution speed of that test.
This patch simply check if the file is actually available every second
during 10sec to exit prematurely.
This patch saves exactly 10 sec on a local system, surely a little bit
less on an infra but still saves time.
Signed-off-by: Erwan Velu <erwan@redhat.com>
The actual code double the wait time between two calls leading to a
possible 511s of waiting time which sounds a little bit excessive.
This patch offer to reduce the global wait time to 300s and test more
often the rados status to exit the loop earlier. In a local test, that
saves 6 secs per run.
Signed-off-by: Erwan Velu <erwan@redhat.com>
ceph_watch_wait() is doing a sleep _before_ doing the test which could
stop this loop.
It's better doing the action first as it could exit immediately and
avoid a useless sleep.
That's a minor optimization but everything count when trying to get
something smooth.
Signed-off-by: Erwan Velu <erwan@redhat.com>
OSDs are taking some time to be up but waiting 10 secs seems execessive
here between two loops. In the worst case, we can be in a situation of
waiting 10secs for nothing as we are just a few microsecs after the osd
is up.
This patch simply reduce the sleep from 10 to 1 seconds.
Signed-off-by: Erwan Velu <erwan@redhat.com>
Since the merge of pr #7693, 'ceph command' to get the help is invalid.
As a result, 'test/cephtool-test-mon.sh' test was broken
This patch simply change the 'ceph command' by a 'ceph --help command'
Since this change the test is passing again.
Signed-off-by: Erwan Velu <erwan@redhat.com>
"ceph --watch-debug" and "ceph tell mon.foo version" could connect
to different monitors, and there is chance that "ceph --watch-debug"
is not connected yet when "ceph tell" completes, and hence the former
fails to collect the cluster log including the "ceph tell" related
message. this renders test_mon_tell() unreliable. so, in
ceph_watch_start(), we should wait until the "ceph" cli connects to the
monitor and receives messages from it.
Fixes: #14910
Signed-off-by: Kefu Chai <kchai@redhat.com>
* not all config items are tracked, so it does not take any effect after
we sucessfully changed them using "ceph tell <daemon> injectargs --foo-bar 15',
as shown by the command output:
$daemon: foo_bar = '15'
if foo-bar happens to be the one not tracked by any components in <daemon>.
in this fix, the message of
$daemon: foo_bar = '15' (unchangeable)
is returned instead. nevertheless, the config is still updated. as
"ceph daemon <daemon> config show | grep foo_bar" shows:
"foo_bar": "15"
this helps user to understand that the setting is not dynamically
changeable.
* update the test accordingly
Fixes: #11692
Signed-off-by: Kefu Chai <kchai@redhat.com>
This is just like 'blacklist rm' except it removes
everything. Useful if you've got a whole bunch of
things in your blacklist and you don't want to wait
for N "blacklist rm" commands to run.
Signed-off-by: John Spray <john.spray@redhat.com>
If /tmp/obj1 happened to exist already, and was not writable by the
testing user, then this test failed!
Signed-off-by: Robin H. Johnson <robin.johnson@dreamhost.com>
ceph osd pool set $POOL scrub_min_interval N
ceph osd pool set $POOL scrub_max_interval N
ceph osd pool set $POOL deep_scrub_interval N
If N > 0, this value is used for the pool instead of
the corresponding global parameter from the config
(osd_scrub_min_interval, osd_scrub_max_interval or
osd_deep_scrub_interval).
Fixes: #13077
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
test:
see test.sh:test_mon_caps
before modify:
when we first exec ../qa/workunits/cephtool/test.sh -t mon_caps --asok-does-not-need-root , it stuck.
after modify:
exec again, return Permission denied.
Signed-off-by: Xiaowei Chen <chen.xiaowei@h3c.com>
This reverts commit 30810da4b5.
After some discussion we have decided it is better to build a generic
dictionary in pg_pool_t to store infrequently used per-pool properties.
Signed-off-by: Sage Weil <sage@redhat.com>
ceph osd pool set $POOL scrub_min_interval N
ceph osd pool set $POOL scrub_max_interval N
ceph osd pool set $POOL deep_scrub_interval N
If N > 0, this value is used for the pool instead of
the corresponding global parameter from the config
(osd_scrub_min_interval, osd_scrub_max_interval or
osd_deep_scrub_interval).
Fixes: #13077
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
This can race with an actual mdsmap epoch update for some other
reason. We just need to make sure the epoch *increased*, not that
it is exactly old + 1.
Fixes: #12991
Signed-off-by: Sage Weil <sage@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Conflicts:
src/include/ceph_features.h
src/osd/ReplicatedPG.cc
src/osd/ReplicatedPG.h
When an object is first created, it's proxied to base tier, need to
change the behavior of the test_tiering test case accordingly.
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
Verify that an object promoted to a cache tier because of a proxy read
is evicted as expected.
http://tracker.ceph.com/issues/12673 Refs: #12673
Signed-off-by: Loic Dachary <ldachary@redhat.com>
the proble breaks `test_mon_deprecated_commands` on ubuntu precise,
on the python shipped with ubuntu precise, errno.errorcode[95]
evalutes to `EOPNOTSUPP` but not `ENOTSUP`. but these two errnos
are equal in glibc.
Signed-off-by: Kefu Chai <kchai@redhat.com>
'ceph mon_metadata' was added still during this dev cycle, so there is
no need to deprecate it first.
Fixes: #11545
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
We need to be able to allow the version of ceph_test_* from earlier
versions of ceph to continue to work. This patch also adjusts the
work unit to use a single rados snap to test the condition without
--force-nonempty to ensure that we don't need to be careful about
the config value when running that script.
Signed-off-by: Samuel Just <sjust@redhat.com>
Modify the test traces to include the file name in addition to the
function and line name. It makes it easier to locate the faulty line
without going back to the test name.
Format the trace lines to be emacs friendly (filename:lineno) so that
C-x ` or C-c C-c jumps to the right file and the right line when running
the test with M-x compile.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
Wip writeback throttling for cache tiering
This patch is to do write back throttling for cache tiering, which is similar to what the Linux kernel does for page cache write back. A paramter 'cache_target_dirty_high_ratio' (default 0.6) is introduced as the high speed flushing threshold, while leave the 'cache_target_dirty_ratio' (default 0.4) to represent the low speed threshold. The flush speed is controlled by limiting the parallelism of flushing. The maximum parallelism under low speed is half of the parallelism under high speed. If there is at least one PG such that the dirty ratio beyond the high threshold, full speed mode is entered; If there is no PG such that dirty ratio beyond the low threshold, idle mode is entered; In other cases, slow speed mode is entered.
Signed-off-by: Mingxin Liu <mingxinliu@ubuntukylin.com>
Reviewed-by: Li Wang <liwang@ubuntukylin.com>
Suggested-by: Nick Fisk <nick@fisk.me.uk>
Tested-by: Kefu Chai <kchai@redhat.com>
fix "pg ls" with states of "recovering" and/or "repair"
Reviewed-by: Joao Eduardo Luis <joao@suse.de>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
1. Creating a filesystem using a
readonly tier on an EC pool (should be forbidden)
2. Removing a tier from a replicated base pool (should
be permitted)
Signed-off-by: John Spray <john.spray@redhat.com>
A get/set command may fail with
Error EBUSY: currently creating pgs, wait
if issued before the PGs are clean. Call wait_for_clean after the pool
is created or a pool setting is changed and remaps the PGs it
contains (size, pg_num...) to ensure the PGs are clean and the set/get
command that follow will succeed.
http://tracker.ceph.com/issues/11624Fixes: #11624
Signed-off-by: Loic Dachary <ldachary@redhat.com>
The semantic and interface of get_pg are the same, that avoids
duplication and the ceph-helpers.sh version is tested and documented.
Make the ceph-test package dependent on xmlstarlet because it is
needed by ceph-helpers.sh.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
...not very elegantly because this is bash, but
at least check the expected value is somewhere
present in the JSON output.
Signed-off-by: John Spray <john.spray@redhat.com>
When CEPH_CLI_TEST_DUP_COMMAND=1 is set, ceph osd create will consume
two osd id and return the later. Fix the test to account for that and
not assume the osd id being allocated by osd create is always the
next available osd id.
The other osd create tests do not suffer from the same variation because
they provide a UUID argument that guarantees the same osd id is going to
be returned every time.
http://tracker.ceph.com/issues/11618Fixes: #11618
Signed-off-by: Loic Dachary <ldachary@redhdat.com>
If we add a pool with snap state as a tier the snap state gets clobbered
by OSDMap::Incremental::propogate_snaps_to_tiers(), and may prevent OSDs
from starting. Disallow this.
Include a test.
Fixes: #11493
Backport: hammer, giant, firefly
Signed-off-by: Sage Weil <sage@redhat.com>
Instead of
* setting limit
* populate the cache
* check the health warnings
do the following
* populate the cache
* set limits below the content of the cache
* check the health warnings
The problem with the former approach is that the limits stored by the
OSD internally do not exactly match the one set by the user: they are
converted in ratios and there may be rounding errors.
Also replace the busy loop waiting for pg stats to flush with
ceph tell osd.* flush_pg_stats || true
for simplicity.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
On a machine slow enough, the tiering agent can be activated while
testing border cases where the cache is almost full. Prevent that
by deactivating the tiering agent.
http://tracker.ceph.com/issues/11359Fixes: #11359
Signed-off-by: Loic Dachary <ldachary@redhat.com>
* also translate "repair" if specified as "states"
* update test_mon_pg in cephtool-test-mon.sh
Fixes: #11569
Signed-off-by: Kefu Chai <kchai@redhat.com>
* if ceph is not reading from a tty, expect EOF instead of "quit"
as the end of input.
* do not panic at seeing the EOF
* update the test case test_mon_injectargs_SI(). since we disables
"ceph injectargs <args,...>" in a458bd83, in which the arguments
of "injectargs" are supposed to be consumed by "tell" instead.
so "ceph injectargs ..." is taken as an incomplete command, and
this command will bring ceph cli into the interactive mode,
redirecting its stdin to /dev/null helps ceph cli quit the loop,
but in a way of throwing EOFError exception. this change handles
the EOF, so the "ceph injectargs ..." does not throws anymore.
but the side effect is that the test fails since it expects a
non-zero return code. so replace it with an equivalent "tell"
command which also fails but due to the non-SI postfix.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Don't forget set cachemode.
By the way, For test max_target_bytes, don't reply on /etc/passwd
because different host have different size of /etc/passwd.
Avoid met this bug, i create a tmp file.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
This will only output all the values applicable to a given type of pool.
So for example for a pool that is not a tier pool values like HIT_SET_TYPE,
HIT_SET_PERIOD, HIT_SET_COUNT etc. will be ignored.
Fixes: #10891
Signed-off-by: Michal Jarzabek <stiopa@gmail.com>
The crushtool is aborted if it takes more than mon lease seconds. Since
the monitor blocks while running it, this is mandatory otherwise the
monitor will be considered down and new elections triggered.
http://tracker.ceph.com/issues/10947Fixes: #10947
Signed-off-by: Loic Dachary <ldachary@redhat.com>
The test was removed in 1189138 (mon: make ceph tell mon.* version
work) as it began to fail due to #10439. After it fixed in c4548f6
(pybind: ceph_argparse: validate incorrectly formed targets), the test
can be restored.
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
* nodelete - pool can't be deleted
* nopgchange - pool's pg and pgp num can't be changed
* nosizechange - pool's size and min size can't be changed
This is intended to help some poor admin to avoid a very bad day.
Fixes: #9792 (but in a different way than it was proposed there)
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
undersized not valid: undersized not in inactive|unclean|stale
undersized not valid: undersized doesn't represent an int
Invalid command: unused arguments: ['undersized']
pg dump_stuck {inactive|unclean|stale [inactive|unclean|stale...]} {<int>} : show information about stuck pgs
Signed-off-by: xinxin shu <xinxin.shu@intel.com>
Because fs reset opens a brief window for the previously
failed MDSs to spring back into life.
Fixes: #10539
Signed-off-by: John Spray <john.spray@redhat.com>
This is like a temporary measure as the mon will try to set them again,
but we have run into cases where the mon was misbehaving (failing to clear
the flag) and we wanted to do it. Note that the mon will likely set it
again on the next tick() anyway.
If we're going to clear it, we may as well be able to set it, too (again,
even if the mon is going to clear it soon). If nothing else this is useful
for writing tests.
Fixes: #9323
Signed-off-by: Sage Weil <sage@redhat.com>
This was missing from 17b5fc9a but we didn't notice
because the test wasn't being run by the gitbuilders.
Signed-off-by: John Spray <john.spray@redhat.com>
Using the array notation to list test is error prone and more
complicated to write.
It also fixes a bug : only the first test of each series (MON, OSD, MDS)
was run and the others were ignored.
Signed-off-by: Loic Dachary <loic@dachary.org>
This was an overly-strict check. In fact it is perfectly
fine to set an overlay on a pool that is already in use
as a filesystem data or metadata pool.
Fixes: #10135
Signed-off-by: John Spray <john.spray@redhat.com>