It turns out that our suites don't exercise fsync, at least not very much
(I couldn't find it in all the places I looked for it). This tester
was written by Ted Ts'o and updated by Chris Mason; I just made it
work on a smaller dataset (256MB) because 8GB against a small cluster takes
more time than we want to wait.
Signed-off-by: Greg Farnum <greg@inktank.com>
This script was heuristically using short sleep commands in order to
give udev activity time to complete.
There's a command "udevadm settle" which actually looks at the udev
queue and waits until its processing is done. Much, much better.
This rearranges the get_id function a bit too, breaking it into one
function that gets the id and another that loops back and tries
again after a short delay in the event the get_id fails.
Signed-off-by: Alex Elder <elder@inktank.com>
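A minimal sketch of the two ideas above: settling udev instead of sleeping, and splitting the id lookup into a single attempt plus a retry wrapper. The function names and the file-based lookup are placeholders for illustration, not the script's actual code.

```shell
#!/bin/bash
# Sketch only: udev_settle, get_id_once and get_id_retry are hypothetical names.

# Wait for the udev event queue to drain rather than sleeping for a guessed
# interval. Guarded so the sketch is harmless where udevadm is not installed.
udev_settle() {
    if command -v udevadm >/dev/null 2>&1; then
        udevadm settle
    fi
}

# One attempt: print the id and succeed, or fail. Here it is a placeholder
# that reads the first line of the file named by its argument.
get_id_once() {
    [ -r "$1" ] && head -n 1 "$1"
}

# Retry wrapper: try a few times, pausing briefly between attempts.
get_id_retry() {
    local tries="$1" delay="$2" file="$3" i=1
    while [ "$i" -le "$tries" ]; do
        get_id_once "$file" && return 0
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

Keeping the single attempt separate from the loop means the retry policy can change without touching the lookup itself.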
This adds a bash script that creates an rbd image, then repeatedly
maps and unmaps it for a specified duration (5 minutes by default).
Signed-off-by: Alex Elder <elder@inktank.com>
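The loop described above might look roughly like this. The map and unmap steps are passed in as commands so the sketch stays self-contained; in the real script they would wrap the rbd map and unmap operations. The function name and the 300-second default are assumptions based on the "5 minutes by default" wording.

```shell
#!/bin/bash
# Sketch only: map_unmap_for is a hypothetical name; the second and third
# arguments are caller-supplied commands standing in for map and unmap.
map_unmap_for() {
    local duration="${1:-300}" map_cmd="$2" unmap_cmd="$3"
    local end count=0
    end=$(( $(date +%s) + duration ))
    # Keep cycling until the wall-clock deadline passes.
    while [ "$(date +%s)" -lt "$end" ]; do
        "$map_cmd" || return 1
        "$unmap_cmd" || return 1
        count=$((count + 1))
    done
    echo "completed $count map/unmap cycles"
}
```

For example, `map_unmap_for 60 do_map do_unmap` would cycle for one minute, where `do_map` and `do_unmap` are functions the caller defines.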
Make import work; do I/O in image native block size.
Note: creating sparse images is not currently attempted; could
scan for runs of zeros and write discontiguous chunks to image.
Fixes: #3503
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit c99d9c3ae7)
Validate change to not assume dest pool == src pool
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 39180430b9)
These tests are showing intermittent failures so we'll drop them
from the default list for the time being.
Signed-off-by: Alex Elder <elder@inktank.com>
I've gone through the set of xfstests that were previously found to
not work. Some of those now do work, and with the addition of an
option to pass to "mkfs.xfs" a large number of other tests now
produce expected output as well.
This patch updates the default list of tests to run to reflect
the result of this exercise. The following 50 additional tests
are now run by default:
029 074 078 084-087 100 105 117 121 124 126 129-134
164 165 167 174 181 184 186 187 192 214-216 227 236
237 241 243 245-249 257-259 261 277 278 280 285 286
Test 127 completed without error, but it took 1-3 hours, so I
kept it out of the list.
Signed-off-by: Alex Elder <elder@inktank.com>
The default of 8 is virtually never the right answer. Require the initial
pg count to be explicitly provided.
Signed-off-by: Sage Weil <sage@inktank.com>
rbd ls of format-2 images was looping on the first 64 (when more than 64
were present). The key name passed to the omap layer needs to always
contain the prefix, and the "inside-the-loop next-chunk" statement
was missing the "add the prefix" call.
Also, add a test for listing 100 images, format 1 and 2.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Users have been seeing failures where rbd rm is half-done; could be
because of outstanding watches on the rbd_header object. The state
is that rbd_children no longer contains the child, but other pieces
remain; remove considers this a failure.
Fix: test for ENOENT from remove_child, and treat that as an ignorable
error and drive on. Simulate this in copy.sh by removing the
rbd_children object altogether, which also results in ENOENT return
from remove_child.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This adds a "-c <count>" option to the run_xfstests.sh script so
the full set of tests can be repeated more than once without having
to go through the setup process each time.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
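A rough sketch of how a "-c <count>" option could be parsed and used to repeat a run after one-time setup. The function names are hypothetical and the real script's option handling may differ.

```shell
#!/bin/bash
# Sketch only: parse_count and repeat_tests are hypothetical names.

# Print the repeat count given by "-c <count>", defaulting to 1.
parse_count() {
    local count=1 opt
    OPTIND=1
    while getopts "c:" opt; do
        case "$opt" in
            c) count="$OPTARG" ;;
            *) return 2 ;;
        esac
    done
    # Reject anything that is not a positive integer.
    case "$count" in
        ''|*[!0-9]*) echo "count must be a positive integer" >&2; return 2 ;;
    esac
    [ "$count" -ge 1 ] || return 2
    echo "$count"
}

# Run the given command <count> times, stopping at the first failure.
repeat_tests() {
    local count="$1" i=1
    shift
    while [ "$i" -le "$count" ]; do
        echo "iteration $i of $count"
        "$@" || return 1
        i=$((i + 1))
    done
}
```

Setup and teardown stay outside `repeat_tests`, which is the point of the option: only the test pass itself is repeated.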
We can't check the initial permissions of the
file because the umask may be set to something
other than 0022. The check isn't needed to verify
chmod correctness anyway.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
This fixes up the chmod test to use a unique
filename to test with, and avoid clobbering of
other tests and commonly named files.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
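The two chmod test changes above can be sketched together: work on a uniquely named file, and only check the mode after an explicit chmod, never the umask-dependent initial mode. The function names and the ls-based mode check are illustrative choices, not the test's actual code.

```shell
#!/bin/bash
# Sketch only: mode_of and chmod_check are hypothetical names.

# Print the type and permission string, e.g. "-rw-r--r--".
mode_of() {
    ls -ld "$1" | cut -c1-10
}

chmod_check() {
    local dir file rc=0
    # A private temp directory gives a name no other test will touch.
    dir=$(mktemp -d) || return 1
    file="$dir/chmod_test.$$"
    : > "$file"    # initial mode depends on umask, so it is not checked
    chmod 600 "$file" && [ "$(mode_of "$file")" = "-rw-------" ] || rc=1
    chmod 644 "$file" && [ "$(mode_of "$file")" = "-rw-r--r--" ] || rc=1
    rm -rf "$dir"
    return "$rc"
}
```

Running `(umask 077; chmod_check)` should pass just as it does under the default umask, since nothing depends on the mode the file was created with.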
This test still verifies that the race is handled correctly if it
occurs, but will no longer clutter test results with spurious failures
when the race is not reproduced.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This is to handle TextTable output, which doesn't use tabs.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
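One way to make a check indifferent to tabs versus padded columns is to split on any run of whitespace, as awk does by default. This is only a sketch; the two-column layout and the function name are assumptions, not the actual test code.

```shell
#!/bin/bash
# Sketch only: has_row is a hypothetical name and the name/size columns
# are an assumed layout.

# Succeed if stdin has a row whose first two whitespace-separated fields
# equal the given name and size, whether separated by tabs or spaces.
has_row() {
    awk -v name="$1" -v size="$2" '
        $1 == name && $2 == size { found = 1 }
        END { exit !found }
    '
}
```

For example, `printf 'foo   1024M\n' | has_row foo 1024M` succeeds, and so does the same row with a tab between the fields.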
Use an assert version op in combination with our watch, and re-read
the header until it's not stale. Header updates are infrequent, so
this should not cause any delay with normal use.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Or maybe it was a spello, or a thinko, or something. In any case
I'm pretty sure Josh intended to call the function he added in
commit 78d6a60ca, and not the non-existent "test_import_args".
Signed-off-by: Alex Elder <elder@inktank.com>
The locker (entity_name_t) will be different each time the rbd
command line tool is run, so 'lock remove' is always breaking a lock.
Fixes: #2556
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
* no longer need to wait for watch timeout since #2948 was fixed
* use --format 2 instead of --new-format
* add test_cls_rbd to run-rbd-tests script
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
These check that removing an image still works if an rbd rm
command was interrupted partway through.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Now it's not the caller's responsibility to specify the format,
and we can eliminate a job from the qa suite.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Stop the qa noise until we fix #2410. Looks like a freeze/thaw thing.
Maybe Jan's new freeze/thaw code will address this? That's probably
wishful thinking.
Signed-off-by: Sage Weil <sage@inktank.com>
These aren't comprehensive because a lot of the startup logic is about
picking a local address, and it's difficult to test that on a single
host. They cover the other variables surrounding mon bring-up, though:
- part of initial monmap, or not
- new nodes given all prior nodes, or not
- new nodes have self included in monmap seed, or not
- initial quorum members
Signed-off-by: Sage Weil <sage@inktank.com>
Test 232 in the xfstests suite produces an XFS error in the log
when run over an RBD device. This is most likely an XFS problem
that will be tracked separately (in tracker 2302).
My original plan with getting this checked in was to have it run a
baseline set of the tests--all known to pass on rbd devices--with
the intention of doing ongoing work to add back missing tests (at
least from the "auto" group) as we understand and fix whatever
makes them produce failures.
So just comment out test 232 so the xfstests script is able to
run to completion without error.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Because we exit on any error (due to 'set -e'), the cleanup call was
never getting made in the event of an error. The net effect of that
was that a filesystem could be left mounted, and rbd cleanup then
couldn't complete because the module was in use.
Fix the trap call so it calls cleanup on exit as well as error.
Switch to using the capitalized signal names in the call.
Signed-off-by: Alex Elder <elder@dreamhost.com>
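A small sketch of the trap behavior described above. It uses an explicit exit in a subshell rather than 'set -e', and traps only EXIT, to keep the example self-contained; the function name and cleanup body are placeholders.

```shell
#!/bin/bash
# Sketch only: run_with_cleanup and the cleanup body are placeholders.
run_with_cleanup() {
    (
        cleanup() {
            echo "cleanup: unmount and unmap would happen here"
        }
        # EXIT fires however the subshell ends, including early error exits.
        trap cleanup EXIT
        "$@" || exit $?
        echo "work finished"
    )
}
```

Both `run_with_cleanup true` and `run_with_cleanup false` print the cleanup line; only the first prints "work finished", and the second keeps the failing exit status.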
It turns out that xfstests *does* exit with non-zero status
when a test fails. Its exit status is the number of tests
that failed (which, now that we have over 255 tests, could be
an issue...)
Save the exit status and make it be the result of the run.
Signed-off-by: Alex Elder <elder@dreamhost.com>
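Saving the status so later steps don't overwrite it can be sketched as below. The runner is a stand-in, and the clamp is just one possible way to deal with the 8-bit limit on exit statuses mentioned above, not something this change does.

```shell
#!/bin/bash
# Sketch only: fake_check, run_and_save_status and clamp_status are
# hypothetical names.

# Stand-in for the test runner: returns the number of failed tests.
fake_check() {
    return "$1"
}

# Capture the runner's status before anything else can replace it,
# then hand it back as the result of the whole run.
run_and_save_status() {
    local status=0
    fake_check "$1" || status=$?
    echo "post-run steps would go here" >&2
    return "$status"
}

# Exit statuses are 8 bits wide, so a failure count above 255 would wrap.
# Clamping keeps a large count from turning into 0.
clamp_status() {
    if [ "$1" -gt 255 ]; then
        echo 255
    else
        echo "$1"
    fi
}
```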
Add a script that runs xfstests over a pair of devices that are
specified using command line arguments. The tests are run using
a specified filesystem type (xfs, ext4, or btrfs).
A default set of tests is run if none is specified on the command
line. Normally there's an "auto" group used for this purpose, but
for now I've laid out a (large) subset of them that I know pass on
rbd devices. These can be updated as we find they work reliably.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Attempt to reproduce btrfs bug when rmdirs race with an async snap.
Unsuccessful. Best guess is that we need multiple threads to trigger.
Signed-off-by: Sage Weil <sage@newdream.net>
This was mixed up with min/max_op_len. And max_ops wasn't being used in
the initial object creation stage, flooding the OSDs. Or during run().
Signed-off-by: Sage Weil <sage@newdream.net>
Capture Alexandre's script for reproducing #1774 here for posterity, until
we write a properly harnessed test for this. Currently, workunits can't
mount/unmount, and we don't have a way to make ceph-fuse drop its cache.
Signed-off-by: Sage Weil <sage@newdream.net>
We don't have a great way to guarantee mdsmap updates, but they
should happen on their own and we can loop. Closes #1518.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>