This was getting stressed in new ways by
TestSessionMap.test_session_reject, which
has a mount that fails initially.
Two changes here:
* Raise CommandFailedError instead of RuntimeError when
a mount fails (i.e. detect process termination instead
of timing out while waiting for /sys/ to be populated)
* Generalise error handling on umount, so that we only
raise an exception on umount failure if the filesystem
still appears to be mounted. There is an EINVAL corner
case that the test was triggering.
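A minimal Python sketch of the umount policy in the second bullet, assuming a /proc/mounts-style mount table. CommandFailedError here is a local stand-in, and umount, read_mountpoints and the run_umount/get_mounts_text hooks are hypothetical names, not the actual teuthology or FuseMount API.

```python
class CommandFailedError(Exception):
    """Local stand-in for teuthology's CommandFailedError (assumption)."""

    def __init__(self, command, exitstatus):
        super().__init__(
            "Command failed: {0} (exit status {1})".format(" ".join(command), exitstatus)
        )
        self.command = command
        self.exitstatus = exitstatus


def read_mountpoints(mounts_text):
    """Return the set of mount points in /proc/mounts-style text."""
    points = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # The second field of each line is the mount point.
            points.add(fields[1])
    return points


def umount(mountpoint, run_umount, get_mounts_text):
    """Unmount, raising only if the mount really is still present.

    run_umount(mountpoint) returns an exit status; get_mounts_text()
    returns the current mount table. Both are hypothetical hooks.
    """
    status = run_umount(mountpoint)
    if status == 0:
        return
    # The umount command failed (e.g. an EINVAL corner case), but that is
    # only an error if the mount point is still in the mount table.
    if mountpoint in read_mountpoints(get_mounts_text()):
        raise CommandFailedError(["umount", mountpoint], status)
```

A failed umount whose target has already vanished from the mount table is treated as success; only a mount that is genuinely still present produces an error.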
Signed-off-by: John Spray <john.spray@redhat.com>
Analogous to raw_cluster_command, but instead
of calling a blocking CLI command we invoke
the -w (watch) mode.
Signed-off-by: John Spray <john.spray@redhat.com>
Used when configuring clients with dynamically
generated auth keys, and pointing them at mount paths.
Signed-off-by: John Spray <john.spray@redhat.com>
When stray directory inodes are corrupted, the MDS may go into the damaged
state after becoming active (MDCache::open_root/populate_mydir is called by
MDSRank::starting_done).
Fixes: #14196
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Use the Mount.* wrappers for filesystem operations,
so that changes like making run_shell use sudo just work.
Signed-off-by: John Spray <john.spray@redhat.com>
This was causing permissions issues when
running inside teuthology, as run_python
was using sudo and run_shell wasn't.
It would be nice to get rid of all the rootishness,
but for the moment just make it more uniform.
This tests the forward scrub's ability to traverse
some metadata and tag it, and the corresponding
functionality in cephfs-data-scan to filter based
on tag and inject orphaned items.
Signed-off-by: John Spray <john.spray@redhat.com>
Use named error codes instead of numbers, and
use the helper fn for getting inode number
instead of doing it by hand.
Signed-off-by: John Spray <john.spray@redhat.com>
This was previously using a bunch of files and a small
MDCache limit to force things out of cache. It is much
simpler to just drop the journal.
Signed-off-by: John Spray <john.spray@redhat.com>
...specifically that we don't have lingering
MDS sessions after running it. This is testing
that Client::shutdown is doing the right thing
and closing sessions.
Signed-off-by: John Spray <john.spray@redhat.com>
A quick check that clients refuse to mount
when daemons are laggy, and while we're at it,
that the basics of failover work. It's a trivial
test, but it's nice to have this kind of thing
so that we don't have to wait for weird thrasher
failures if something breaks.
Signed-off-by: John Spray <john.spray@redhat.com>
To get the health warning, first we need to make sure requests are
added to the session's completed request list. Then we need to send an
extra request to MDS to trigger the code that generates the warning.
Fixes: #13437
Signed-off-by: Yan, Zheng <zyan@redhat.com>
FuseMount only uses the prefix for finding the 'ceph'
executable, which is in ./ for either cmake or
autotools, not ./src for cmake like other binaries.
Signed-off-by: John Spray <john.spray@redhat.com>
It was trying to get the output file from
a different remote than the one used to
run the journal tool.
Signed-off-by: John Spray <john.spray@redhat.com>
This is to allow running CephFSTestCase tests
against a vstart cluster, for much faster turnaround
during development than running teuthology against
built ceph packages.
Not everything will be runnable this way, but for
certain things like filesystem repair scenarios we
have everything we need within a vstart environment.
Signed-off-by: John Spray <john.spray@redhat.com>
For tests to advertise that they need the client
to be able to trim its cache (i.e. currently that
means they must run as root).
Signed-off-by: John Spray <john.spray@redhat.com>
A means for test cases to mark particular methods
as long running, so that the vstart runner can skip
them when running for developers.
This is not a precise criterion: it covers anything that
takes more than about 2 minutes due to lots of iteration
or sleeps.
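One way such a marker could work, as a minimal sketch: a decorator sets an attribute on the test function, and the runner checks it before running. The decorator name long_running, the attribute is_long_running and the should_skip helper are hypothetical illustrations, not the actual CephFSTestCase or vstart runner code.

```python
def long_running(f):
    """Mark a test method as long running (hypothetical decorator)."""
    f.is_long_running = True
    return f


def should_skip(test_method, developer_run):
    """Skip marked methods only when running for developers (e.g. vstart)."""
    return bool(developer_run and getattr(test_method, "is_long_running", False))


class TestExample:
    @long_running
    def test_many_iterations(self):
        pass

    def test_quick(self):
        pass
```

Because the marker is just a function attribute, bound methods expose it too, so a runner can inspect either the class attribute or the method on an instance.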
Signed-off-by: John Spray <john.spray@redhat.com>
In teuthology this isn't needed because we join the
mds child processes after killing them. In vstart
we're killing them asynchronously, so be a bit more
careful to ensure they can't re-insert themselves
into the mdsmap between our calling fail and our calling
fs rm.
Signed-off-by: John Spray <john.spray@redhat.com>
...into the part that requires a network-isolated
client and the part that doesn't.
This split also happens to separate the part that won't work
with vstart from the part that will. The teuthology yaml will
still pick up and run both parts.
Signed-off-by: John Spray <john.spray@redhat.com>
* Instead of creating files in background, create
them in foreground (simpler).
* Instead of creating max_requests*2 files, just create
max_requests plus a little bit.
* Set max_requests to 1000 instead of 5000 to run a bit
faster.
Signed-off-by: John Spray <john.spray@redhat.com>
We weren't waiting for export dir to complete (the asok
command only starts the process). This wasn't noticeable when running
remotely due to latency between the test runner and the MDS,
but it shows up when running against a local vstart cluster.
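Since the asok command only starts the export, the test has to poll for completion. A minimal sketch of such a polling helper follows; wait_until_true and its parameters are a hypothetical illustration, and the sleep/clock hooks exist only so the loop can be exercised without real waiting.

```python
import time


def wait_until_true(condition, timeout, period=1.0, sleep=time.sleep, clock=time.monotonic):
    """Poll condition() until it returns True, or raise after timeout seconds."""
    start = clock()
    while not condition():
        elapsed = clock() - start
        if elapsed > timeout:
            raise RuntimeError("Timed out after {0:.1f}s".format(elapsed))
        sleep(period)
```

A caller would pass a check such as `lambda: export_finished(path)`, where export_finished is a hypothetical function that inspects the MDS state (for example, whether the subtree now has the expected authority).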
Signed-off-by: John Spray <john.spray@redhat.com>
I am seeing a strange thing where it seems like sometimes
an ls of /sys/fs/fuse/connections is returning empty when
connections do exist. It is pretty easy to make this
a non-issue by waiting for "more conns than we started with"
instead of "list of conns is different", so do that.
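A minimal sketch of the count-based check, assuming the connection IDs are read with a plain directory listing. The function names are hypothetical. With a "list is different" test, a spuriously empty listing would look like a change; comparing counts treats it as "no new mount yet".

```python
import os


def list_fuse_conns(path="/sys/fs/fuse/connections"):
    """Return the FUSE connection IDs under sysfs (empty if the dir is absent)."""
    try:
        return os.listdir(path)
    except FileNotFoundError:
        return []


def new_mount_appeared(conns_before, conns_now):
    # Compare counts, not contents: a spuriously empty listing then reads
    # as "not yet" rather than as a change that looks like a new mount.
    return len(conns_now) > len(conns_before)
```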
Signed-off-by: John Spray <john.spray@redhat.com>
Previously, failure to stat the mount dir was interpreted
as meaning it was unmounted. For the "Transport endpoint is not connected"
error, we do want to recognise that it is mounted, albeit
with no ceph-fuse process.
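A minimal sketch of that distinction. The get_fs_type hook (which might wrap something like `stat -f`) and the set of FUSE type names are assumptions for illustration, not the actual FuseMount implementation.

```python
import errno


def is_mounted(path, get_fs_type, fuse_types=("fuse", "fuseblk")):
    """Decide whether path is a FUSE mount.

    get_fs_type(path) returns a filesystem type name or raises OSError;
    it is a hypothetical hook. fuse_types is an assumed set of type names.
    """
    try:
        fs_type = get_fs_type(path)
    except OSError as e:
        if e.errno == errno.ENOTCONN:
            # "Transport endpoint is not connected": the mount still exists
            # in the kernel, but no ceph-fuse process is serving it.
            return True
        return False
    return fs_type in fuse_types
```

Only ENOTCONN is treated as "still mounted"; any other stat failure (such as ENOENT) keeps the old "unmounted" interpretation.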
Signed-off-by: John Spray <john.spray@redhat.com>
Use this during test setup to check whether
a filesystem is configured at all, before
trying to tear it down.
Signed-off-by: John Spray <john.spray@redhat.com>
So that my vstart subclass can put ./ before
all the commands.
One could set $PATH, but I like to unambiguously point
it at the local built binaries in case someone also
has some systemwide packages.
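A minimal sketch of the prefix hook, assuming commands are built as argument lists. The class names and the _prefix/build_command methods here are hypothetical, chosen only to show how a vstart subclass can override a single method.

```python
class Remote:
    """Hypothetical base class that builds command lines for a test run."""

    def _prefix(self):
        # Default: rely on $PATH to find installed binaries.
        return ""

    def build_command(self, args):
        return [self._prefix() + args[0]] + list(args[1:])


class VstartRemote(Remote):
    """Hypothetical vstart subclass that points at locally built binaries."""

    def _prefix(self):
        return "./"
```

Overriding one method keeps the choice explicit per run, rather than depending on whatever $PATH happens to resolve first.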
Signed-off-by: John Spray <john.spray@redhat.com>
Our ffsb and fsync tests contain so many small writes at random offsets
that it can take >10 minutes to commit all of them to disk if we get
a slower OSD cluster. 15 minutes is still a plenty-fast timeout for
this stage compared to just hanging and losing the logs!
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Previously the timeout was defaulting to a string, which
always compared greater than the elapsed time, so the wait
never timed out.
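Background on the bug: under Python 2, mixed-type comparisons order numbers before strings, so `elapsed > "300"` is always False; Python 3 raises TypeError instead. A minimal sketch of normalising the value to a number; DEFAULT_TIMEOUT, get_timeout and has_timed_out are hypothetical names, and the default value is illustrative only.

```python
DEFAULT_TIMEOUT = 300  # hypothetical default, in seconds


def get_timeout(config, key="timeout", default=DEFAULT_TIMEOUT):
    """Read a timeout from a task config dict as a number of seconds."""
    # float() accepts both numbers and numeric strings from YAML, so the
    # later comparison is always number-to-number.
    return float(config.get(key, default))


def has_timed_out(elapsed, timeout):
    return elapsed > timeout
```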
Fixes: #12820
Signed-off-by: John Spray <john.spray@redhat.com>
1) Add a wait time before the mount attempt to let the cluster get set up.
By default this should be skipped, but for VMs and known-slow systems we
can give them 60 seconds.
2) Make the timeout configurable, with a 30-second default, but override it
for VM tests.
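A minimal sketch of the two settings described above. The config keys mount_wait and mount_timeout, and the do_mount callable, are hypothetical; only the defaults (wait skipped, 30-second timeout) and the 60-second VM wait come from the description.

```python
import time


def mount_with_settle(do_mount, config, sleep=time.sleep):
    """Optionally wait before mounting, then mount with a configurable timeout.

    do_mount(timeout=...) and the config keys are hypothetical.
    """
    # Pre-mount wait: 0 by default (skipped); VM or known-slow configs
    # could set e.g. 60.
    wait = config.get("mount_wait", 0)
    timeout = config.get("mount_timeout", 30)
    if wait > 0:
        sleep(wait)
    return do_mount(timeout=timeout)
```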
http://tracker.ceph.com/issues/12320
Fixes: #12320
Signed-off-by: Loic Dachary <loic@dachary.org>
When client capabilities get released, the MDS may update the corresponding
inodes' client writable range and mark those inodes dirty. The auto
repair test expects the MDS to trim inodes from its cache, but the MDS can't
trim dirty inodes. So we should flush the journal after umount.
Fixes: #12172
Signed-off-by: Yan, Zheng <zyan@redhat.com>
This is for verifying the new layout-writing behaviour. While
we're at it, test that the pre-existing backtrace behaviours
are really happening (updating old_pools)
Signed-off-by: John Spray <john.spray@redhat.com>
Sometimes mount A would get a cap revoke when mount
B did its last IO, resulting in mount A's OSD epoch
getting updated too.
Fix by making sure mount B is the last one to have
done IO before we do the barrier, so that when
it does IO again after the barrier, mount A can't
be holding any caps that B would need.
Fixes: #11913
Signed-off-by: John Spray <john.spray@redhat.com>
For cases where we have e.g. poked the fuse abort
file for a process, but it's still not dying. Because
this is a special class of error (unlike e.g. when
we force umount something because the network is gone)
raise the error instead of trying again to kill
the client.
Fixes: #11835
Signed-off-by: John Spray <john.spray@redhat.com>