A quick check that clients refuse to mount
when daemons are laggy, and while we're at it,
that the basics of failover work. It's a trivial
test, but it's nice to have this kind of thing
so that we don't have to wait for weird thrasher
failures if something breaks.
Signed-off-by: John Spray <john.spray@redhat.com>
To get the health warning, first we need to make sure requests are
added to the session's completed request list. Then we need to send an
extra request to MDS to trigger the code that generates the warning.
Fixes: #13437
Signed-off-by: Yan, Zheng <zyan@redhat.com>
When running on virtual machines, it may take more than one minute for a
daemon to create the admin socket.
http://tracker.ceph.com/issues/13449 Fixes: #13449
Signed-off-by: Loic Dachary <loic@dachary.org>
Prior to v0.80.9, autogen.sh did not get submodules. Copy/paste the
submodule initialization from newer autogen.sh in common.sh so that
v0.80.8 and below can be rebuilt from sources. It does not hurt to
update the submodules twice.
Signed-off-by: Loic Dachary <loic@dachary.org>
os_version is from the remote and will be 7.1.23 for CentOS 7
instead of the expected 7.0 for all 7.* CentOS.
Signed-off-by: Loic Dachary <loic@dachary.org>
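A minimal sketch of the normalization this implies, assuming the only rule needed is that any CentOS 7.x release reported by the remote maps to the "7.0" label. The helper name normalize_centos_version is hypothetical and not taken from the actual code.

```python
def normalize_centos_version(os_type, os_version):
    """Map a full CentOS 7 release string (e.g. '7.1.23') to '7.0'.

    Hypothetical helper: assumes every 7.x CentOS release shares the
    '7.0' label; other OS types and versions are returned unchanged.
    """
    if os_type == "centos" and os_version.split(".")[0] == "7":
        return "7.0"
    return os_version


print(normalize_centos_version("centos", "7.1.23"))  # 7.0
print(normalize_centos_version("ubuntu", "14.04"))   # 14.04
```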
It is not enough to look for the first install task. In upgrade tests,
the install.upgrade task requires more packages to be built. In more
complicated tests using sequential and parallel tasks, the actual
install or install.upgrade task may be deeper in the config tree.
Signed-off-by: Loic Dachary <loic@dachary.org>
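One way to do the deeper search, sketched under the assumption that each task is a single-key dict ({name: config}) and that sequential and parallel take a list of such dicts as their config. find_install_tasks is a hypothetical name.

```python
from typing import Any, Iterator, Tuple

INSTALL_TASKS = ("install", "install.upgrade")


def find_install_tasks(tasks: Any) -> Iterator[Tuple[str, Any]]:
    """Yield (name, config) for every install or install.upgrade task,
    descending into nested sequential/parallel task lists.

    Entries that are neither lists nor dicts (e.g. a task referenced by
    name as a plain string) are ignored in this sketch.
    """
    if isinstance(tasks, list):
        for item in tasks:
            yield from find_install_tasks(item)
    elif isinstance(tasks, dict):
        for name, config in tasks.items():
            if name in INSTALL_TASKS:
                yield name, config
            elif name in ("sequential", "parallel"):
                yield from find_install_tasks(config)


# Hypothetical upgrade-style task list with the upgrade nested two levels deep.
example_tasks = [
    {"install": {"branch": "firefly"}},
    {"sequential": [
        {"parallel": [{"install.upgrade": {"mon.a": {"branch": "hammer"}}}]},
    ]},
]
print([name for name, _ in find_install_tasks(example_tasks)])
# ['install', 'install.upgrade']
```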
The install config may have contradictory tag/branch and sha1 values. When
suite.py prepares the jobs, it always overrides the sha1 with whatever
default is provided on the command line with --distro and what is found
in the gitbuilder. If it turns out that the tag or the branch in the
install config task refers to another sha1, it is overridden anyway.
Instead of obtaining the tag, branch and sha1 directly from the
packaging.GitbuilderProject object, compute them from the returned
uri_reference data member. The uri_reference is used by the install task
to fetch packages in the gitbuilders and this is what buildpackages
needs to build.
Signed-off-by: Loic Dachary <loic@dachary.org>
The config['os_type'] and config['os_version'] are not always set for a given
job (for instance, in the rbd suite). When a suite runs, it relies on
default values, depending on the target operating system and on internal,
hard-coded values associating ubuntu with 14.04, etc.
Instead of using config['os_{type,version}'] use the GitbuilderProject
equivalent which is set with the appropriate defaults.
Signed-off-by: Loic Dachary <loic@dachary.org>
Test rbd or krbd using fio. It can also run I/O on rbd clones if that option is specified in the yaml.
Various options such as image size, rbd format/features, fio I/O size and readwrite options can be provided in the yaml.
Check the docstring for exact usage.
Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
FuseMount only uses the prefix for finding the 'ceph'
executable, which is in ./ for either cmake or
autotools, not ./src for cmake like other binaries.
Signed-off-by: John Spray <john.spray@redhat.com>
It was trying to get the output file from
a different remote than the one used to
run the journal tool.
Signed-off-by: John Spray <john.spray@redhat.com>
This is to allow running CephFSTestCase tests
against a vstart cluster, for much faster turnaround
during development than running teuthology against
built ceph packages.
Not everything will be runnable this way, but for
certain things like filesystem repair scenarios we
have everything we need within a vstart environment.
Signed-off-by: John Spray <john.spray@redhat.com>
For tests to advertise that they need the client
to be able to trim its cache (currently that
means requiring running as root).
Signed-off-by: John Spray <john.spray@redhat.com>
A means for test cases to mark particular methods
as long running, so that the vstart runner can skip
them when running for developers.
This is not a scientific measure: it applies to anything
that takes more than about 2 minutes due to lots of
iteration or sleeps.
Signed-off-by: John Spray <john.spray@redhat.com>
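A minimal sketch of such a marker using the standard unittest module. The decorator name long_running, the attribute is_long_running and the filter function are hypothetical, not the actual vstart runner code.

```python
import unittest


def long_running(test_method):
    """Mark a test method as long running (hypothetical marker)."""
    test_method.is_long_running = True
    return test_method


def iter_tests(suite):
    """Flatten nested TestSuites into individual test cases."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def filter_long_running(suite, include_long_running):
    """Return a new suite, dropping marked tests unless asked to keep them."""
    kept = unittest.TestSuite()
    for test in iter_tests(suite):
        # test.id() is "module.Class.method"; look the method up by name.
        method = getattr(test, test.id().split(".")[-1])
        if include_long_running or not getattr(method, "is_long_running", False):
            kept.addTest(test)
    return kept


class ExampleTest(unittest.TestCase):
    def test_quick(self):
        pass

    @long_running
    def test_slow(self):
        pass


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
fast = filter_long_running(suite, include_long_running=False)
print([t.id().split(".")[-1] for t in iter_tests(fast)])  # ['test_quick']
```

A developer run would pass include_long_running=False, while a full run keeps everything.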
In teuthology this isn't needed because we join the
mds child processes after killing them. In vstart
we're killing them asynchronously, so be a bit more
careful to ensure they can't re-insert themselves
into the mdsmap between our call to fail and our call to
fs rm.
Signed-off-by: John Spray <john.spray@redhat.com>
...into the part that requires a network-isolated
client and the part that doesn't.
This happens to also be the part that won't work with
vstart vs. the part that will. teuthology yaml will
still pick up and run both parts.
Signed-off-by: John Spray <john.spray@redhat.com>
* Instead of creating files in background, create
them in foreground (simpler).
* Instead of creating max_requests*2 files, just create
max_requests plus a little bit.
* Set max_requests to 1000 instead of 5000 to run a bit
faster.
Signed-off-by: John Spray <john.spray@redhat.com>
We weren't waiting for export dir to complete (the asok
just starts the process). This wasn't noticeable when running
remotely due to latency between the test runner and the MDS,
but it shows up when running against a local vstart cluster.
Signed-off-by: John Spray <john.spray@redhat.com>
I am seeing a strange thing where it seems like sometimes
an ls of /sys/fs/fuse/connections returns empty when
connections do exist. It is pretty easy to make this
a non-issue by waiting for "more conns than we started with"
instead of "list of conns is different", so do that.
Signed-off-by: John Spray <john.spray@redhat.com>
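The "more conns than we started with" rule can be sketched as a polling loop. The connection listing is passed in as a callable, an assumption made so the logic can be exercised without FUSE; in practice it would list /sys/fs/fuse/connections. wait_for_new_connection is a hypothetical name.

```python
import time


def wait_for_new_connection(list_conns, initial, timeout=30.0, interval=0.5):
    """Poll until there are more FUSE connections than before mounting.

    list_conns: callable returning the current connection ids.
    initial: the connection ids seen before the mount was started.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = list_conns()
        # A spurious empty listing is simply "not yet"; with a
        # "list is different" test it would wrongly count as a change.
        if len(current) > len(initial):
            return current
        if time.monotonic() >= deadline:
            raise RuntimeError("timed out waiting for a new FUSE connection")
        time.sleep(interval)
```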
Previously, failure to stat the mount dir was interpreted
as being unmounted. For the "transport endpoint is not connected"
error we do want to recognise that it is mounted, albeit
with no ceph-fuse process.
Signed-off-by: John Spray <john.spray@redhat.com>
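A sketch of the distinction using os.lstat and os.path.ismount. Note that os.path.ismount on its own returns False whenever lstat fails, which matches the old interpretation described above. The function name and the injectable lstat/ismount parameters are hypothetical; the parameters exist only so the ENOTCONN path can be tested without a dead FUSE mount.

```python
import errno
import os


def is_mounted(mountpoint, lstat=os.lstat, ismount=os.path.ismount):
    """Return True if mountpoint is mounted, counting a dead FUSE mount."""
    try:
        lstat(mountpoint)
    except OSError as e:
        # ENOTCONN ("Transport endpoint is not connected"): the mount is
        # still in place even though its ceph-fuse process has gone away.
        return e.errno == errno.ENOTCONN
    return ismount(mountpoint)
```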
Use this during test setup to check whether
a filesystem is configured at all, before
trying to tear it down.
Signed-off-by: John Spray <john.spray@redhat.com>
So that my vstart subclass can put ./ before
all the commands.
One could set $PATH, but I like to unambiguously point
it at the locally built binaries in case someone also
has some systemwide packages.
Signed-off-by: John Spray <john.spray@redhat.com>
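A minimal sketch of such a prefix hook, in which a subclass only overrides the prefix. The class and method names here are hypothetical, not the actual teuthology or vstart classes.

```python
class CommandBuilder:
    """Base class: commands are resolved through $PATH as usual."""

    prefix = ""

    def build_command(self, args):
        # Prepend the prefix to the executable only, not to its arguments.
        return [self.prefix + args[0]] + list(args[1:])


class VstartCommandBuilder(CommandBuilder):
    """Point every command at binaries in the current build directory."""

    prefix = "./"


print(VstartCommandBuilder().build_command(["ceph", "status"]))  # ['./ceph', 'status']
```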
A run failed due to thrashing: it missed by about 30s (the osd
eventually sent the last reply, but we had already timed out).
Signed-off-by: Sage Weil <sage@redhat.com>
The existing logic is to ceph-deploy osd create --zap-disk which will
zap the data device before preparing it. However it will not zap the
journal device (see http://tracker.ceph.com/issues/13291).
If ceph-deploy osd create fails, a fallback will zap both the data
device and the journal and try to prepare again. This could work if
the device preparation and activation were synchronous and caught all
errors that could be caused by an unclean journal device. However,
the activation is asynchronous and it is entirely possible for a device
to be prepared successfully and fail to activate in the background.
The data and journal devices are now always zapped before calling ceph-deploy
osd create. The logic is simpler and the overhead is low.
http://tracker.ceph.com/issues/13000 Fixes: #13000
Signed-off-by: Loic Dachary <loic@dachary.org>
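A sketch of the resulting command sequence, assuming the HOST:DISK and HOST:DATA:JOURNAL argument syntax of ceph-deploy releases of that era. osd_deploy_commands is a hypothetical helper that only builds argument lists and does not run anything.

```python
def osd_deploy_commands(host, data, journal):
    """Zap both devices first, then create the OSD (no fallback path)."""
    return [
        ["ceph-deploy", "disk", "zap", f"{host}:{data}"],
        ["ceph-deploy", "disk", "zap", f"{host}:{journal}"],
        ["ceph-deploy", "osd", "create", f"{host}:{data}:{journal}"],
    ]


for cmd in osd_deploy_commands("mira080", "/dev/sdb", "/dev/sdc"):
    print(" ".join(cmd))
```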
CentOS 6.5 needs to install a package and reboot to grow the root file
system. Instead of assuming a common user-data.txt file can fit all
Operating Systems, make one user data per os-type/os-version combination.
Signed-off-by: Loic Dachary <loic@dachary.org>
The process run by flock must not inherit the file descriptor because
this will cause the lock to be held forever should the command survive
the call to flock. This is for instance the case for the ssh-agent.
Signed-off-by: Loic Dachary <loic@dachary.org>
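The flock(1) utility provides an -o/--close option that closes the lock descriptor before executing the command, for exactly this situation. The same idea in Python, as a sketch: hold the lock with fcntl.flock and start the child with close_fds=True so it never receives the descriptor. run_locked is a hypothetical name.

```python
import fcntl
import subprocess
import sys
import tempfile


def run_locked(lockfile, args):
    """Run args while holding an exclusive lock on lockfile.

    close_fds=True (already the default on Python 3) keeps the child from
    inheriting the lock descriptor, so a long-lived child such as
    ssh-agent cannot keep the lock held after this function returns.
    """
    with open(lockfile, "w") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        try:
            return subprocess.run(args, close_fds=True).returncode
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        print(run_locked(tmp.name, [sys.executable, "-c", "pass"]))  # 0
```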
Instead of relying on git_base_url, use the get_ceph_git_url() to obtain
the URL of the Ceph repository to use with git clone. This allows the
user to override it via the git_ceph_url configuration option and the
--git-ceph-url command line option to teuthology-openstack.
http://tracker.ceph.com/issues/11883 Refs: #11883
Signed-off-by: Loic Dachary <loic@dachary.org>
The config parameter of download_ceph_deploy does not have a ceph-deploy
item, therefore the ceph-deploy-branch parameter is always assumed to be
master.
Signed-off-by: Loic Dachary <loic@dachary.org>
Otherwise we can get
2015-09-24T19:22:15.191 INFO:teuthology.orchestra.run.mira080.stderr:Error ENXIO: problem getting command descriptions from osd.1
Signed-off-by: Sage Weil <sage@redhat.com>
This is the correct implementation of 685d76a77c,
merged while broken in ff1655cb57 and
reverted in 4cccde634f.
Signed-off-by: John Spray <john.spray@redhat.com>
This reverts commit ff1655cb57, reversing
changes made to 2b25080d4f.
Since we haven't actually started the MDS daemons yet, this code is broken.
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Our ffsb and fsync tests contain so many small writes at random offsets
that it can take >10 minutes to commit all of them to disk if we get
a slower OSD cluster. 15 minutes is still a plenty-fast timeout for
this stage compared to just hanging and losing the logs!
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Build Ceph packages from source for the required revision, os_type,
os_version and architecture and upload them to the gitbuilder
repository.
http://tracker.ceph.com/issues/13031 Fixes: #13031
Signed-off-by: Loic Dachary <loic@dachary.org>
/var/run/ceph is 770. This is mainly necessary for any
interaction with the daemon sockets, but it is what users do
and it may avoid log noise.
Signed-off-by: Sage Weil <sage@redhat.com>
We need to be able to merge things into s3-tests master that
break rgw. Create ceph-foo branches (ceph-master,
ceph-infernalis, etc.) and use those instead.
Signed-off-by: Sage Weil <sage@redhat.com>
Previously we were defaulting to a string, which
always compared greater than elapsed, so we never
timed out.
Fixes: #12820
Signed-off-by: John Spray <john.spray@redhat.com>
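A sketch of a polling helper with a numeric default; wait_until_true is a hypothetical name. Under Python 2, numbers compare less than strings, so a comparison like elapsed >= "60" is never true; Python 3 raises TypeError for that comparison instead.

```python
import time


def wait_until_true(condition, timeout=60, interval=1):
    """Poll condition() until it returns True or timeout seconds elapse.

    timeout must be a number: a string default would make the elapsed
    comparison below meaningless and the loop could never time out.
    """
    elapsed = 0
    while not condition():
        if elapsed >= timeout:
            raise RuntimeError(f"timed out after {elapsed}s")
        time.sleep(interval)
        elapsed += interval
```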
When keep_running is true, do not shut down the cluster; leave it as it
is for other workunits or tasks to use. This effectively allows the
ceph-deploy task to be used as a helper to deploy clusters.
The call to build_ceph_cluster is simplified by giving it the whole
configuration dictionary instead of re-building one with selected arguments.
Signed-off-by: Loic Dachary <loic@dachary.org>
When ceph-deploy fails, run ceph report to get more information about
the state of the cluster at the time of the failure.
Signed-off-by: Loic Dachary <loic@dachary.org>
In our RHCS 1.3 ceph-deploy docs, we tell users to run "ceph-deploy
install --cli" on their calamari admin node, but our smoke test wasn't
actually doing this.
See https://bugzilla.redhat.com/1252929 , "[Ubuntu 1.3.0] - ceph-deploy
install --no-adjust-repos --cli `hostname` is failing with a Traceback
error"
In RHCS 1.2 we don't have a /mnt/MON directory. The intention of
35c6363a1e was to handle this condition,
but in 1.2, the non-zero return code makes Teuthology fail the whole
test.
We don't want *Teuthology* itself to act on the return code here; we
simply want to know what it was and structure the rest of the test
accordingly.
lttng is not yet part of any private repo; since 1.3.0/CentOS is not
a supported product, just grab it from EPEL for this test.
Signed-off-by: Dan Mick <dan.mick@redhat.com>
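A minimal local sketch of reading a return code without letting the runner act on it, for the /mnt/MON check described above, using subprocess with check=False. teuthology's run helpers take a check_status argument for the same purpose, but the function below is hypothetical and runs locally rather than on a remote.

```python
import subprocess


def dir_exists(path):
    """Return True if `ls path` succeeds; a failure is data, not an error."""
    result = subprocess.run(
        ["ls", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # do not raise on a non-zero exit status
    )
    return result.returncode == 0


# Structure the rest of the test on the result instead of aborting.
if dir_exists("/mnt/MON"):
    print("using /mnt/MON")
else:
    print("no /mnt/MON, skipping that step")
```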
Otherwise, ceph-deploy will install an apt source that points to
ceph.com, which will override the local ISO repos.
No --mon/--osd yet until 12147 is fixed
Signed-off-by: Dan Mick <dan.mick@redhat.com>
Also waits to remove it from dead_osds. This fixes an issue where
do_sighup tries to send a signal to an osd that has not been revived
yet.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This will call Thrasher.do_sighup which picks a random osd and sends a
signal.SIGHUP to it, delaying for the value of sighup_delay between each
time it picks a new osd to signal.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This method runs in a separate greenlet from do_thrash and will pick a
random live osd to send a signal.SIGHUP to. There is a config option,
sighup_delay, which controls how long to delay between sending the
signals.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
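A sketch of the loop described above, using threading.Event in place of a gevent greenlet (a swap made to keep the example dependency-free). The live_osds and send_signal callables are assumptions standing in for the Thrasher's own state and signalling; passing only revived OSDs avoids signalling an OSD that is still dead.

```python
import random
import signal
import threading


def sighup_loop(live_osds, send_signal, sighup_delay, stop, rng=random):
    """Repeatedly pick a random live OSD and send it SIGHUP.

    live_osds: callable returning ids of OSDs currently up.
    send_signal: callable(osd_id, signum) that delivers the signal.
    stop: threading.Event; the loop exits once it is set.
    """
    while not stop.is_set():
        osds = live_osds()
        if osds:
            send_signal(rng.choice(osds), signal.SIGHUP)
        # Event.wait doubles as an interruptible sleep of sighup_delay seconds.
        stop.wait(sighup_delay)
```

In a thrasher this would run in its own thread, e.g. threading.Thread(target=sighup_loop, args=(...)), with stop.set() called at shutdown.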