Creates an installable version of "src/include/rados/objclass.h" that allows
object classes to be built outside of the Ceph tree. cls_sdk is an example
of such an object class.
Signed-off-by: Neha Ojha <nojha@redhat.com>
Teuthology would periodically fail due to a delay >10 seconds
between moving the item to the trash and checking its status.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
This isn't perfect, but it's better than nothing. Prevent enabling the
allow_ec_overwrites flag if any of a sample of pgs in the pool map to
osds using filestore. This mainly protects filestore-only clusters
from enabling it.
If a filestore osd is started later, warn in the cluster log when it
gets a pg with ec overwrites enabled.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Move pool and profile creation into a single function, and
add an 'allow_overwrites' parameter for it so each ec test
can be parameterized by it.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Keep the pool flag around so we can distinguish between a pool that
should maintain hashes for each chunk, where a missing hash is a bug,
and an overwrites pool, where we rely on bluestore checksums for
detecting corruption.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Sometimes I get output like:
HEALTH_ERR 2 pgs stuck unclean; Full ratio(s) out of order
This goes away over time, so it is a transient issue.
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
It would be a race otherwise: when run_osd() returns, we cannot be sure
whether the cluster's pgs are all clean, but we can expect them to
become active+clean after a while. That's what wait_for_clean() does.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Testing for disk usage and diff extents for a sparsely written
imported RBD image cannot generically be handled across different
OSD object stores and RBD image features.
The only alternatives would include grepping the rbd CLI
debug log for specific invocations of aio_write or mocking
the rbd CLI for unit testing.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
copy.sh tests not only rbd copy but also other commands such as
rbd ls and rbd remove, so rename it to generic.sh.
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
in the given keyring file, we should alert the user and not allow the import.
'ceph auth list' keeps all keyrings with their caps, and importing a
'client.admin' keyring without caps locks the admin out of the cluster
with error [1], because the admin keyring caps are then missing from
'ceph auth'.
[1] Error connecting to cluster: PermissionDeniedError
Fixes: http://tracker.ceph.com/issues/18932
Signed-off-by: Vikhyat Umrao <vumrao@redhat.com>
This script currently has a syntax error, but still exits with
success, which is hiding that failure. Expose it by allowing
the 'sudo' exit code to be the script's exit code.
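A minimal sketch of the failure mode, assuming bash: a script exits with the status of its last command, so an earlier failure (here `false` stands in for the 'sudo' call) is hidden by a successful trailing command. The function names are hypothetical; saving `$?` and exiting with it is one way to make the sudo exit code the script's exit code.

```shell
#!/usr/bin/env bash
# Hypothetical illustration: 'false' stands in for the failing 'sudo ...' call.

masks_failure() {
    # The trailing echo succeeds, so the inner script exits 0.
    bash -c 'false; echo "cleanup"' > /dev/null
}

propagates_failure() {
    # Save the privileged command's status and use it as the exit code,
    # even though other commands run after it.
    bash -c 'false; rc=$?; echo "cleanup"; exit $rc' > /dev/null
}
```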
Signed-off-by: Dan Mick <dan.mick@redhat.com>
This is based on a script that I've been using for a while for basic
smoke testing. The matrix has exploded with the addition of data-pool
and now it's primarily a data-pool test fixture that takes minutes to
run, so turn it into a workunit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
We get output like the following in teuthology.log if a unittest in
rados/test.sh fails:
2017-02-04T16:15:46.090 INFO:tasks.workunit.client.0.mira032.stdout:error in 22088
2017-02-04T16:15:46.092 INFO:tasks.workunit.client.0.mira032.stderr:bash: line 1: 22092 Alarm clock ceph_test_rados_api_aio 2>&1
2017-02-04T16:15:46.096 INFO:tasks.workunit.client.0.mira032.stderr: 22093 Done | tee ceph_test_rados_api_aio.log
2017-02-04T16:15:46.099 INFO:tasks.workunit.client.0.mira032.stderr: 22094 Done | sed "s/^/ api_aio: /"
2017-02-04T16:15:46.102 INFO:tasks.workunit.client.0.mira032.stderr:+
It would be desirable to have the name of the failed test on the
"error in 22088" line.
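One way to get the test name into the failure line, sketched in bash: keep a map from each background PID to its test name and report both when `wait` returns non-zero. The helper names are hypothetical and this is not the actual rados/test.sh code.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: remember which test each background PID runs, so a
# failure report can name the test and not only the PID.
declare -A test_names=()

run_in_background() {
    local name=$1
    shift
    "$@" > /dev/null 2>&1 &
    test_names[$!]=$name
}

wait_background() {
    local ret=0 pid
    for pid in "${!test_names[@]}"; do
        if ! wait "$pid"; then
            echo "error in $pid (${test_names[$pid]})" >&2
            ret=1
        fi
    done
    test_names=()
    return $ret
}
```

Since the PIDs are children of the current shell, wait_background has to run in that shell rather than inside a `$( ... )` subshell.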
Signed-off-by: Kefu Chai <kchai@redhat.com>
`set +o` prints out the full set of commands needed to recreate the
current shell options, and these are echoed if "xtrace" is enabled.
This increases the verbosity of get_timeout_delays(). In this change,
we follow the approach of kill_daemons() to suppress the extra output;
see aefcf6d.
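A sketch of the kill_daemons()-style pattern as we understand it, assuming bash: record whether xtrace is on, switch it off for the noisy section, and switch it back on afterwards. The function name and the loop inside it are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical function showing the save/disable/restore xtrace pattern.
quiet_section() {
    local trace
    trace=$(shopt -q -o xtrace && echo true || echo false)
    $trace && shopt -u -o xtrace

    # Placeholder for work whose commands should not be traced.
    local i sum=0
    for i in 1 2 3; do
        sum=$((sum + i))
    done
    echo "$sum"

    # Turn xtrace back on only if it was on when we entered.
    $trace && shopt -s -o xtrace
    return 0
}
```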
Signed-off-by: Kefu Chai <kchai@redhat.com>
This var is mostly used when running rbd_mirror test scripts on
teuthology. It can be used locally, though, to speed up re-running the
tests:
Set a test temp directory:
export RBD_MIRROR_TEMDIR=/tmp/tmp.rbd_mirror
Run the tests the first time with NOCLEANUP flag (the cluster and
daemons are not stopped on finish):
RBD_MIRROR_NOCLEANUP=1 ../qa/workunits/rbd/rbd_mirror.sh
Now, to re-run the test without restarting the cluster, run cleanup
with USE_EXISTING_CLUSTER flag:
RBD_MIRROR_USE_EXISTING_CLUSTER=1 \
../qa/workunits/rbd/rbd_mirror_ha.sh cleanup
and then run the tests:
RBD_MIRROR_USE_EXISTING_CLUSTER=1 \
../qa/workunits/rbd/rbd_mirror_ha.sh
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
by optionally specifying the daemon instance after the cluster name and
a colon, like:
start_mirror ${cluster}:${instance}
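A hypothetical helper showing how a `cluster[:instance]` argument can be split with bash parameter expansion. The helper name and the default instance of 0 are assumptions for illustration, not taken from the rbd-mirror scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split "cluster[:instance]" into its two parts.
# The default instance of 0 is an assumption made for this sketch.
parse_daemon() {
    local daemon=$1
    local cluster=${daemon%%:*}   # everything before the first colon
    local instance=0
    if [[ $daemon == *:* ]]; then
        instance=${daemon#*:}     # everything after the first colon
    fi
    echo "$cluster $instance"
}
```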
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Currently, if the user performs an image rename operation and gives the
pool name as an optional parameter (--pool=<pool_name>), that pool name
is used for the source pool, while the destination pool falls back to
the default pool.
With this fix, if the user provides the optional pool name parameter,
it is used as both the source and the destination pool.
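The fixed pool resolution can be sketched as follows. This is an illustrative bash function, not the rbd CLI code: it assumes the default pool is named `rbd` and that an explicit source or destination pool, when given, overrides `--pool`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of resolving source and destination pools for rename.
# Arguments: the --pool value, then explicit source and destination pools
# (any of them may be empty).
resolve_rename_pools() {
    local opt_pool=$1 src_pool=$2 dst_pool=$3
    local default_pool=rbd
    src_pool=${src_pool:-${opt_pool:-$default_pool}}
    # Before the fix the destination behaved like ${dst_pool:-$default_pool},
    # ignoring --pool.
    dst_pool=${dst_pool:-${opt_pool:-$default_pool}}
    echo "$src_pool $dst_pool"
}
```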
Fixes: http://tracker.ceph.com/issues/18326
Reported-by: МАРК КОРЕНБЕРГ <socketpair@gmail.com>
Signed-off-by: Gaurav Kumar Garg <garg.gaurav52@gmail.com>
Using cephfs_[meta]data collides with the pools that teuthology
already creates if an mds is defined.
This became a (noticeable) problem with 052c3d3f68
Signed-off-by: Sage Weil <sage@redhat.com>
This mimics the OpenStack tempest tests that OpenStack Zuul executes
as a gate test.
Fixes: http://tracker.ceph.com/issues/18594
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Currently it only allows you to move buckets, which is annoying and much
less useful. To move an OSD you need to use create-or-move, which is
harder to use.
Fixes: http://tracker.ceph.com/issues/18587
Signed-off-by: Sage Weil <sage@redhat.com>
When a variable is not being observed we currently mark it
"unchangable". This can be misleading so try something hopefully a
little more informative.
Fixes: http://tracker.ceph.com/issues/18424
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
The rbd_cli_tests Perl script is not maintained and currently serves no
purpose. The RbdLib.pm module was only used by rbd_functional_tests.pl (which
was dropped by 276ffb4631) and rbd_cli_tests.pl,
so drop it as well.
Fixes: http://tracker.ceph.com/issues/14825
Signed-off-by: Nathan Cutler <ncutler@suse.com>
* replace hard-coded pool name with $POOL
* replace hard-coded object name with $OBJ
* introduce a new variable called $POOL_EC
* clean up pool
* simplify test case
Signed-off-by: liuchang0812 <liuchang0812@gmail.com>
This means users don't have to manually translate a rule
they just created to a ruleset ID in order to map a pool
to it.
Signed-off-by: Sage Weil <sage@redhat.com>
This is a dev hack to generate a bunch of bogus osdmaps. The maps are
all screwed up anyway (e.g., invalid addrs) and this is minimally useful.
Signed-off-by: Sage Weil <sage@redhat.com>
The test case is not stable due to racing console output. This
results in spurious failures.
Fixes: http://tracker.ceph.com/issues/10773
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Otherwise, it does not work as intended in statements like the one below:
set -e
test_status_in_pool_dir ... && ...
(e.g. in wait_for_status_in_pool_dir)
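A small bash demonstration of the pitfall, with hypothetical stand-ins for test_status_in_pool_dir: when a function runs on the left of `&&`, `set -e` is ignored inside its body, so a failed check in the middle does not stop it, and the function returns the status of its last command. Returning the check's status explicitly avoids this.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical stand-in: in "&&" context the failed check is ignored,
# and the trailing echo makes the function return 0.
status_matches_naive() {
    [[ $1 == *"$2"* ]]
    echo "checked $1"
}

# Fixed variant: the failed check becomes the function's return value.
status_matches() {
    [[ $1 == *"$2"* ]] || return 1
    echo "checked $1"
}
```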
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
This fixes a race in resync tests leading to false negative results.
Fixes: http://tracker.ceph.com/issues/18048
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
When displaying the output of a background process, do it on stderr so
that it is not buffered. Otherwise the output of the background
process may be displayed after the process has completed.
Prefix the output of a background process with the PID of the process
known to the parent instead of the PID of the awk process processing the
output. When wait_background loops, it prints the PID of the process it
is waiting for, and it is confusing when that PID does not match the one
prefixing the process output.
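A bash sketch of the PID-prefixing idea: inside a background subshell, `$BASHPID` is the PID the parent sees in `$!`, so capturing it before starting the pipeline makes the prefix match what a waiting loop reports. The helper name is hypothetical, and sed stands in for the awk filter here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command in the background and prefix each of
# its output lines (sent to stderr) with the PID the parent sees in $!.
run_prefixed_in_background() {
    (
        pid=$BASHPID    # PID of this background subshell, i.e. the parent's $!
        "$@" 2>&1 | sed "s/^/$pid: /" >&2
    ) &
}
```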
Refs: http://tracker.ceph.com/issues/17830
Signed-off-by: Loic Dachary <loic@dachary.org>
Do all math using bc so we can have fractions
Allow caller to specify the first step (default 1)
Add testing of fractional first step
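A sketch of the delay computation under stated assumptions: delays start at the first step, double each time, and the final delay is whatever remains up to the timeout. The commit does the math with bc; this sketch swaps in awk for the same fractional arithmetic, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print doubling sleep delays, starting at FIRST_STEP
# (default 1), that add up to TIMEOUT. awk replaces bc for fractions here.
timeout_delays_sketch() {
    local timeout=$1
    local first_step=${2:-1}
    awk -v timeout="$timeout" -v step="$first_step" 'BEGIN {
        step += 0          # normalize input such as ".1" to a number
        total = 0
        out = ""
        while (total + step <= timeout) {
            out = out step " "
            total += step
            step *= 2
        }
        if (total < timeout)
            out = out (timeout - total)   # remainder as the last delay
        sub(/ $/, "", out)
        print out
    }'
}
```

With a timeout of 10 and the default first step, this prints `1 2 4 3`.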
Signed-off-by: David Zafman <dzafman@redhat.com>
The TENTH_TIMEOUT was not declared as an int and failed to be set to
the correct number. The test of the function did not catch this.
Implement the computation of the increasingly large sleep delays in a
separate function so that it can be tested more easily. Give up on
sub-second sleeps: the function will not sleep at all if the cluster is
already clean, and if it is not already clean, it is very unlikely to
become clean within less than a second. The downside of a very short
sleep time is that it needlessly stresses the machine and possibly
spams the logs.
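The bug class is easy to reproduce in bash: without the integer attribute, an assignment such as `$TIMEOUT/10` stores the text rather than the quotient. The names and the value 300 are illustrative; the original helper's code may differ.

```shell
#!/usr/bin/env bash
# Illustrative values; not taken from the original helper.
TIMEOUT=300

tenth_as_string() {
    local tenth=$TIMEOUT/10      # no -i: the value is the text "300/10"
    echo "$tenth"
}

tenth_as_int() {
    local -i tenth=$TIMEOUT/10   # -i: the right-hand side is evaluated
    echo "$tenth"
}
```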
Refs: http://tracker.ceph.com/issues/17830
Signed-off-by: Loic Dachary <loic@dachary.org>
For vstart.sh powered tests, save 9 characters in the path name
by replacing testdir/test- with td/t-
60 characters imposed by jenkins
9 characters for src/test
5 characters for td/t-
33 left (instead of 24) for the test to create asok such as out/client.admin.25327.asok
Moving these files outside of the build directory is a bad idea because
tests should only create/use files within the builddir and not write
outside of this directory. Doing so would make things more complicated
for cleanup in case the test fails, and create other problems as a
consequence (filling up disk space, conflicting directories between
runs, etc.).
For ceph-helpers.sh tests replace testdir with td, saving 5 characters.
This is not strictly necessary but keeps the directory names consistent:
if the developer wants to get rid of all the test leftovers, it is
enough to remove a single directory: td.
Fixes: http://tracker.ceph.com/issues/16014
Signed-off-by: Loic Dachary <loic@dachary.org>
common osd: Improve scrub analysis, list-inconsistent-obj output and osd-scrub-repair test
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Tests use objectstore_tool() which stops and starts OSDs,
but may assume consistency of object locations.
Signed-off-by: David Zafman <dzafman@redhat.com>
Reduce size of log on timeout by doing a backoff so that
we don't log 3000 loops at 1/10 second sleeps.
Signed-off-by: David Zafman <dzafman@redhat.com>
On trusty we see
WARNING: The following packages cannot be authenticated!
librados-dev
E: There are problems and -y was used without --force-yes
Signed-off-by: Sage Weil <sage@redhat.com>
If we have an OSD with a weight that's not 1.0 and mark it out,
we should restore the same weight when we mark it back in. We
already do this when an OSD is automatically marked out, just
not when it is explicitly marked out.
Signed-off-by: Sage Weil <sage@redhat.com>
Previously, running the script as an unprivileged user was not very useful
because it was difficult to change the path sudo searched for the command
to execute.
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
So that a user has a natural way of undoing a setxattr
which set a pool_namespace.
Fixes: http://tracker.ceph.com/issues/17797
Signed-off-by: John Spray <john.spray@redhat.com>
Ensure that the rados client binary doesn't segfault when specifying a
number of parameters without a corresponding --pool parameter.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Because of a missing return, ceph-disk prepare would fail if given a
regular file as a journal. If the journal file does not exist, ceph-disk
will create it but fail to ensure that the ceph user owns it. The
symlink to the journal file is not set when the journal file is
specified on the command line and the journal file does not exist at
all. The ceph-osd daemon will silently create it as a file but it will
not be the file given in argument.
Add a test case to verify using a regular file as a journal works as
expected.
Fixes: http://tracker.ceph.com/issues/17662
Signed-off-by: Jayashree Candadai <jayaajay@indiana.edu>
Signed-off-by: Loic Dachary <ldachary@redhat.com>
The sh function collects both stderr and stdout, so debug output
breaks the JSON parsing.
Fixes: http://tracker.ceph.com/issues/17607
Signed-off-by: Loic Dachary <ldachary@redhat.com>
Since image metadata replication was recently added, it is no longer
possible to update metadata on a non-primary image.
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
wait_for_image_replay_stopped returns not when the state is stopped,
but when the state is not replaying. So a race was possible when an
asok command was run while the previous stop command was still in
progress, leading to unexpected results.
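A hypothetical sketch of waiting until the state actually is `stopped`, rather than merely not `replaying`. The state names, the delay list and the `get_state` callback are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a caller-supplied command until it prints the
# wanted state, so a transitional state does not end the wait early.
wait_for_state() {
    local want=$1 get_state=$2
    local s
    for s in 0 1 2 4 8; do
        sleep "$s"
        if [ "$("$get_state")" = "$want" ]; then
            return 0
        fi
    done
    return 1
}
```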
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
By switching to a new gf-complete with SIMD runtime detection, we can now remove all the different flavors of jerasure and shec. This simplifies deployment and configuration of erasure coding, enables heterogeneous OSDs, and enables us to take advantage of new performance improvements in jerasure without config/build changes.
This commit removes flavors from cmake, removes ErasureCodePluginSelect___, and fixes unit tests. There is now a single plugin for jerasure and a single plugin for shec.
SIMDExt.cmake was changed so that it's a little more generic and is not polluted with gf-complete-specific CFLAG defines. The #defines for SIMD instructions were based on gf-complete.
I also added a small init helper for jerasure that has code that was common between jerasure and shec.
Signed-off-by: Bassam Tabbara <bassam.tabbara@quantum.com>
This reverts d053705. I disabled this test in the hope of bisecting
the offending tests behind the mysterious jenkins failure, which was
fixed by 6f3ce3a.
Signed-off-by: Kefu Chai <kchai@redhat.com>
The variable 'pgs_per_osd' is set to 'new_pgs' divided by 'expected_osds',
and its type is integer, so the fractional part is dropped and the value
ends up smaller than it should be.
This causes a problem in some situations, for example:
The limit on pg creation per OSD is '32'.
There are 3 OSDs and I want to increase the pgs for a pool.
The limit for creating new pgs at once should then be '96' (32 * 3).
Now, I create '98' pgs for the pool.
In the original code, '98' is divided by 'expected_osds', giving the floating-point value '32.666...'.
Because the type is integer, 'pgs_per_osd' is set to 32.
So the value does not exceed the limit, and the check gives the wrong result.
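A worked version of the example, using bash integer arithmetic, which truncates the same way as integer division in the C++ code. The function names are hypothetical, and rounding up is one possible fix rather than necessarily the committed one; for positive integers it is equivalent to checking `new_pgs > 32 * expected_osds`.

```shell
#!/usr/bin/env bash
# Per-OSD limit from the example above.
per_osd_limit=32

# Hypothetical check with truncating division: 98 / 3 -> 32, not > 32.
exceeds_limit_truncating() {
    local new_pgs=$1 expected_osds=$2
    local pgs_per_osd=$(( new_pgs / expected_osds ))
    (( pgs_per_osd > per_osd_limit ))
}

# Hypothetical fix: round up, so ceil(98 / 3) = 33 > 32.
exceeds_limit_fixed() {
    local new_pgs=$1 expected_osds=$2
    local pgs_per_osd=$(( (new_pgs + expected_osds - 1) / expected_osds ))
    (( pgs_per_osd > per_osd_limit ))
}
```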
Signed-off-by: DesmondS <desmond.s@inwinstack.com>
Fixes: http://tracker.ceph.com/issues/17169
Without a timeout on the command, it may hang for a very long time,
hunting for new mons. If it hangs for more than 60 seconds, it is
safe to assume the mon is indeed down.
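A sketch of bounding such a call with coreutils `timeout(1)`, which exits with status 124 when the limit is hit. The wrapper name is hypothetical; in the helper script, the wrapped command would be the ceph call with a 60 second limit.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with a time limit and report when the
# limit was hit (timeout(1) exits with 124 in that case).
run_bounded() {
    local seconds=$1
    shift
    timeout "$seconds" "$@"
    local status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${seconds}s: $*" >&2
    fi
    return "$status"
}
```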
Fixes: http://tracker.ceph.com/issues/16477
Signed-off-by: Loic Dachary <loic@dachary.org>
The scsi_debug SCSI devices do not have a symlink in /dev/disk/by-partuuid
because they are filtered out by 60-persistent-storage.rules. That was
worked around by 60-ceph-partuuid-workaround-rules which has been
removed by 9f76b9ff31.
Add udev rules targeting this specific case, only for tests, since the
problem does not show up in real use cases.
Fixes: http://tracker.ceph.com/issues/17100
Signed-off-by: Loic Dachary <loic@dachary.org>
For a newly created cluster, CEPH_OSDMAP_REQUIRE_KRAKEN will be
automatically set, while for existing clusters it will not.
This change adds "require_jewel_osds" to the whitelist, so users
can access it via the "ceph osd set *" command family.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
When a primary image is being deleted, the mirrored image might
temporarily be reported in an error state before the deletion is propagated.
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
- log to stderr;
- log status if a `wait_for` function failed;
- don't needlessly sleep in `wait_for` functions after the last
unsuccessful iteration;
- make `wait_for_pool_images` work for image removal case;
- fix `wait_for_pool_images` reset timeout (last_count set).
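A hypothetical `wait_for`-style helper reflecting some of these points, assuming bash: it logs to stderr, sleeps only between attempts, so there is no pointless sleep after the last failed one, and reports when it gives up. The delay-list argument is an assumption, not the interface of the actual rbd_mirror helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command, sleeping the given delays between
# attempts, with no sleep after the final failed attempt.
wait_for() {
    local delays=$1
    shift
    local s
    # $delays is intentionally unquoted so it splits into one delay per word.
    for s in $delays; do
        "$@" && return 0
        echo "wait_for: '$*' not ready, sleeping ${s}s" >&2
        sleep "$s"
    done
    # Last attempt: if it fails, give up immediately.
    "$@" && return 0
    echo "wait_for: giving up on '$*'" >&2
    return 1
}
```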
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
We are seeing an issue due to the lockdep symbols
in libcephfs and librados clashing, which shows itself
after a fork in the flock tests. We can avoid this
by splitting the libcephfs tests that require librados
(access.cc) into their own compilation unit so that
the flock tests can run in a libcephfs-only process.
Fixes: http://tracker.ceph.com/issues/16556
Signed-off-by: John Spray <john.spray@redhat.com>
Snapshot rename operations utilize the (cluster) unique snapshot
sequence to prevent attempts at replays. When mirroring to a
different cluster, these sequences will not align.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
- This script is always called from a controlled environment
- use CEPH_BIN for exec's, otherwise QA sets PATH correctly
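One reading of the second point, sketched in bash: run a tool from `$CEPH_BIN` when that variable is set, and otherwise rely on `PATH`. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefer $CEPH_BIN/<tool> when CEPH_BIN is set,
# otherwise let PATH resolve the tool.
ceph_exec() {
    local cmd=$1
    shift
    if [ -n "${CEPH_BIN:-}" ]; then
        "$CEPH_BIN/$cmd" "$@"
    else
        "$cmd" "$@"
    fi
}
```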
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
Create the temp directory and files in $TMPDIR. Before this change the
directory was hard-wired to /tmp; we'd better respect the $TMPDIR
environment variable, which is more consistent and makes any cleanup easier.
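A sketch of respecting `$TMPDIR`, with a `/tmp` fallback, when creating the scratch directory. The `ceph-test` name template is made up for the example.

```shell
#!/usr/bin/env bash
# Create a scratch directory under $TMPDIR if set, else under /tmp.
# The "ceph-test" template name is hypothetical.
make_test_tmpdir() {
    mktemp -d "${TMPDIR:-/tmp}/ceph-test.XXXXXX"
}
```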
Signed-off-by: Kefu Chai <kchai@redhat.com>
If an image is being bootstrapped, it implies that the rbd-mirror
daemon currently has the image open. The removal API will prevent the
removal of any image that is opened by another client.
Works-around: http://tracker.ceph.com/issues/16555
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
This fixes failures like:
/home/jenkins-build/build/workspace/ceph-pull-requests/qa/workunits/cephtool/test.sh:
line 32: ceph osd blacklist ls | grep 192.168.0.1: command not found
where the failure is not the "failure" we are expecting.
In our tests, the following command
expect_false "ceph osd blacklist ls | grep 192.168.0.1"
is designed to verify that "ceph osd blacklist ls | grep 192.168.0.1"
fails with a non-zero return code. But expect_false() evaluates the command
line using plain "$@", which passes the quoted string to the shell as a
single word, so $0 is the whole string (e.g. "ceph auth get client.xx |
grep caps | grep mon"), which does not exist and is not a built-in
command. So we need to check the grep command instead.
For a command line with multiple pipes, use
expect_false sh <<< "echo foo | grep bar | grep baz"
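A minimal expect_false() in bash (a sketch; the real helper in the test scripts may differ) showing why the quoted pipeline passes for the wrong reason and how `sh <<<` fixes it. `echo foo | grep bar` stands in for the ceph pipeline.

```shell
#!/usr/bin/env bash
# Succeed only if the given command fails.
expect_false() {
    if "$@"; then return 1; else return 0; fi
}

# Wrong: "$@" is a single word here, so bash looks for a command literally
# named "echo foo | grep bar", fails with status 127, and the check passes
# for the wrong reason.
expect_false "echo foo | grep bar" 2>/dev/null && echo "passed, but only because the command was not found"

# Right: let sh parse the pipeline so that grep's exit status is what counts.
expect_false sh <<< "echo foo | grep bar" && echo "passed because grep found no match"
```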
Signed-off-by: Kefu Chai <kchai@redhat.com>