flock only works properly on FUSE versions >=2.9, which is newer
than eg Ubuntu Precise. So check the version on our client mounts and
only test flock if it's at least that new.
Fixes: #9995
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Run the same procedure as TestClusterFull, but
instead of limiting OSD memstore size, use pool
quota on the data pool.
Signed-off-by: John Spray <john.spray@redhat.com>
Create divergent priors and a split and then move a pg using
ceph-objectstore-tool export/import
Add yaml file to run the reg11184 task
Fixes: #11343
Signed-off-by: David Zafman <dzafman@redhat.com>
Based on tasks/divergent_priors.py but also do simple export/remove/import on
same osd.
Add yaml file to run the divergent_priors2 task
Signed-off-by: David Zafman <dzafman@redhat.com>
Flake8 fixes
Use new set_recovery_delay admin socket command
Fix bad value set for filestore_blackhole
Make sure log trims and only require 100 objects
Use kick_recovery_wq to properly set osd_recovery_delay_start to 0
Write and remove divergent and verify removal was undone
Fix to make compatible with wip-10809-11135-10290
Make sure to set_recovery_delay in a non-racey way (while osd running but down)
Leave divergent "in" so its PGs aren't treated as strays
Add yaml file to run the divergent_priors task
Signed-off-by: David Zafman <dzafman@redhat.com>
This patch also adds some convenience facilities for making
some of the ceph_manager methods into tasks usable from a
yaml file.
Signed-off-by: Samuel Just <sjust@redhat.com>
Now that service IDs are modified during run, we have
to avoid repeatedly evaluating first_mon for where
to run ceph_deploy, as the answer will change.
Fixes: #11495
Signed-off-by: John Spray <john.spray@redhat.com>
The early non-defaults caused failures due to xfstests_url: None not
being overridden by run_xfstests(). Move the defaults to xfstests() and
don pass xfstests_branch past that point.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
This test apparently had not been touched since
"fs new" was added. In addition to calling
Filesystem.create:
* modify the get_nodes_using_role
function to modify ctx.cluster.remotes so that the
service IDs match what ceph-deploy will set
* log exceptions during ceph_deploy setup, as otherwise
they can get lost if another exception occurs during
teardown (so that it's all easier to debug).
* default to passing --dev=master during install, so
that we don't error out horribly when run without
an explicit branch set (e.g. when run outside
scheduled suite)
Fixes: #11316
Signed-off-by: John Spray <john.spray@redhat.com>
... s/mon_remote/admin_remote/ and allow caller to pass
in which remote they want to use for that. Enables use
with ceph_deploy task which does not give admin keys
to mons.
Signed-off-by: John Spray <john.spray@redhat.com>
This broke with recent Client changes that
do better caching of readdir results, such
that doing an ls twice is no longer sufficient
to see a fresh result after repair - we need
to remount instead.
Signed-off-by: John Spray <john.spray@redhat.com>
To track recent change in master where instead of
crashing on missing MDSTable object we'll go
into damaged state.
Instead of catching a crash, handle the rank's
transition to the damanged state. Leave the crash
handling code (unused for the moment) in the
Filesystem class in case it's needed elsewhere
soon.
Signed-off-by: John Spray <john.spray@redhat.com>
This tests the new purge file/ops throttling
in the MDS, via the new perf counters for
strays/purging.
Fixes: #10390
Signed-off-by: John Spray <john.spray@redhat.com>
...to avoid having boilerplate in each test module,
and gain the ability to run them all in one go
with a nice test-by-test pass/fail report.
Signed-off-by: John Spray <john.spray@redhat.com>
Were previously taking the baseline from just after the
client did a delete, which was racy: should have taken
it from before, to get a steady state.
Also update the perf dump calls to take advantage of
the new filtering syntax.
Signed-off-by: John Spray <john.spray@redhat.com>
Wherever we are subsequently waiting for daemons
to be healthy, we should be doing a fail during the restart.
Also catch some places that were doing this longhand and use
the handy fail_restart version instead.
Signed-off-by: John Spray <john.spray@redhat.com>
In python, isinstance(foo, str) will fail if
a unicode string is passed in. The correct check
is basestring.
Signed-off-by: John Spray <john.spray@redhat.com>
...as long as only one is active, all the ops
that default to talking to a single MDS should
be happy to talk to the active MDS, even if there
happens to be a standby lying around too.
Signed-off-by: John Spray <john.spray@redhat.com>
man samba(8) contains sentences:
To shut down a user's smbd process it is recommended that SIGKILL (-9)
NOT be used, except as a last resort, as this may leave the shared
memory area in an inconsistent state. The safe way to terminate an smbd
is to send it a SIGTERM (-15) signal and wait for it to die on its own.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
These variables are needed because ceph-qa-suite bootstraps ceph-qa-chef via
http download of solo-from/scratch/run. This adds a variable to override the
default script. It also adds variables to the rbd task to override the versions
of run_xfstests_krbd.sh and run_xfstests.sh downloaded by the default task.
variables added
======
tasks:
-chef
script_url: # override default location for solo-from-scratch for Chef
chef_repo: # override default Chef repo used by solo-from-scratch
chef_branch: # to choose a different git upstream branch for ceph-qa-chef
-rbd.xfstests:
client.0:
xfstests_branch: # to choose a different git upstream branch for xfstests
xfstests_url: # override git base URL for run_xfstests{_krbd}.sh
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
Pass -f by default to btrfs instead of first trying without and *then*
trying with.
Among other things, this avoids a confusing failure where we try mkfs.ext4
device (no -f), fail for some reason, and then try again with -f and get
a usage error (-f does not mean force for mke2fs).
Signed-off-by: Sage Weil <sage@redhat.com>
Make a DEFAULTS dict that is updated by any user parms, so that
defaults are documented centrally and so config.get(key, defval) is
no longer necessary everywhere.
Signed-off-by: Dan Mick <dan.mick@redhat.com>
Stop trying to build test images inside this test; presume the test
image is available built externally (in a file path or an http URL).
Config vars ice_tool_dir, ice_version, iceball_location, and
ice_git_location go away in favor of 'test_image', the path to the
testable image (which can still be a tar.gz or an .iso).
Signed-off-by: Dan Mick <dan.mick@redhat.com>
Ubuntu's mount/kernel support "mount <file> <mntpnt>" directly;
apparently Centos 6 (and presumably RHEL6) require specifying at
least '-o loop' (a /dev/loopN will be dynamically allocated and removed
on unmount).
Signed-off-by: Dan Mick <dan.mick@redhat.com>
Move the get_user_summary(out, user) logic to util.rgw so that it can be
shared between radosgw_admin_rest.py and radosgw_admin.py and modify
them accordingly.
http://tracker.ceph.com/issues/11180Fixes: #11180
Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 97e6d808f0)
Move the get_user_summary(out, user) logic to util.rgw so that it can be
shared between radosgw_admin_rest.py and radosgw_admin.py and modify
them accordingly.
http://tracker.ceph.com/issues/11180Fixes: #11180
Signed-off-by: Loic Dachary <loic@dachary.org>
To avoid internal.coredump task synthesizing a failure
during teardown from the core we left behind.
Fixes: #10949
Signed-off-by: John Spray <john.spray@redhat.com>
This was an overly strict success condition: the
flush operation doesn't promise to leave you an empty
journal, it promises that anything in the journal
before the flush will be flushed.
Fixes: #10712
Signed-off-by: John Spray <john.spray@redhat.com>
Where multiple MDSs were on the same node, trying
to concurrently update their firewall state was
causing an exception because the iptables command
errors out if another instance is already running.
Fixes: #10948
Signed-off-by: John Spray <john.spray@redhat.com>
teuthology helpfully escapes things for us so
the \; didn't need the backslash. The logic
was still falling over in some cases too.
Additionally, make the FUSE /sys/ abort operation
more surgical by working out the connection name
of our own mount during mount().
Signed-off-by: John Spray <john.spray@redhat.com>
* add a wrapper to log uncaught exception to self.logger, greenlet also
prints the backtrace and exception to stderr, but teuthology.log does
not capture stderr. so we need to catch them by ourselves to reveal
more info to root-cause this issue.
* log uncaught exception thrown by Thrasher.do_thrash() to self.log.
See: #10630
Signed-off-by: Kefu Chai <kchai@redhat.com>
Specifically, I want to know *who* is running the ceph-osd that is
holding the files open.
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit a68281e147)
This ensures that we still gather the logs even if the other nested tasks
throw an exception in the finally block.
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit ca09683f5f)
This ensures that we still gather the logs even if the other nested tasks
throw an exception in the finally block.
Signed-off-by: Sage Weil <sage@redhat.com>
Change the config option from mds_id to mds_rank to reflect the
fact that it's the rank we want to make use of (and will continue
to want when we're doing stuff like force exporting from one rank
to another).
Fixes: #10361
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
restart() will stop if the daemon is running. This will get rid of the
spurious error
2015-01-23 15:19:36,828.828 ERROR:tasks.ceph.osd.0:tried to stop a non-running daemon
when the daemon isn't already running.
Signed-off-by: Sage Weil <sage@redhat.com>
Require ceph-objectstore-tool to be available on all OSD nodes
Log a message when tool is not available
Signed-off-by: David Zafman <dzafman@redhat.com>
Where previously we only tracked RADOS-level delete
ops during deletion, now also verify that they
correspond to the right number of MDS-level purge
operations.
Signed-off-by: John Spray <john.spray@redhat.com>
This tests the new #9883 repair functionality
where we selectively scrape dentries out of
the journal while the MDS is offline.
Signed-off-by: John Spray <john.spray@redhat.com>
Add a function dedicated to erasure coded pools tests, similar to
repair_test_1. Add a corrupter that removes the hinfo_key from the object.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
Add the CephManager.objectstore_tool method to encapsulate a call to
ceph-objectstore-tool. The wrapper can convert an object name into the
PG id and figure out the primary OSD. The designated OSD is stopped
before running the command and restarted afterwards.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
The commit is large but does not introduce any semantic change and
consists primarily in code moving around, re-indented and removed.
Replace functions generating functions by functions and sequentially
iterating over a list of functions with a sequential call to the
functions.
Replace the setup/teardown with an equivalent using a with
statement and the ceph_manager.pool method.
Replace inline code with a call to ceph_manager.wait_for_all_up
It makes it easier to modify the tests, for instance to create erasure
coded pools and tests specific to them.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
To create a pool before running a code bloc and remove it after.
with manager.pool("mypool"):
mytest..
Signed-off-by: Loic Dachary <ldachary@redhat.com>
Add a function dedicated to erasure coded pools tests, similar to
repair_test_1. Add a corrupter that removes the hinfo_key from the object.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
Add the CephManager.objectstore_tool method to encapsulate a call to
ceph-objectstore-tool. The wrapper can convert an object name into the
PG id and figure out the primary OSD. The designated OSD is stopped
before running the command and restarted afterwards.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
The commit is large but does not introduce any semantic change and
consists primarily in code moving around, re-indented and removed.
Replace functions generating functions by functions and sequentially
iterating over a list of functions with a sequential call to the
functions.
Replace the setup/teardown with an equivalent using a with
statement and the ceph_manager.pool method.
Replace inline code with a call to ceph_manager.wait_for_all_up
It makes it easier to modify the tests, for instance to create erasure
coded pools and tests specific to them.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
To create a pool before running a code bloc and remove it after.
with manager.pool("mypool"):
mytest..
Signed-off-by: Loic Dachary <ldachary@redhat.com>
Require ceph-objectstore-tool to be available on all OSD nodes
Log a message when tool is not available
Signed-off-by: David Zafman <dzafman@redhat.com>
ice-tools needs a virtualenv populated to properly run to build
an iceball; add the commands to do that. Also remove the built
iceball when the task exits.
Fixes: #10523
Signed-off-by: Dan Mick <dan.mick@redhat.com>
Previously, the task would search for the lexicographically-greatest
filename matching ICE-*.tar.gz; now it builds a specific name
ICE-{ice_version}-{ice_distro}.tar.gz
Fixes: #10521
Signed-off-by: Dan Mick <dan.mick@redhat.com>
The small segments and small segment limit
were used when doing a hacky flush by doing
IO and waiting: now that we have the explicit
'flush journal' asok in use, we can just use
a normal journal configuration.
Signed-off-by: John Spray <john.spray@redhat.com>
This was only used in get_first_mon, which doesn't actually
need the parameter itself. Makes it easier to casually
use Filesystem from any place with a ctx to hand.
Signed-off-by: John Spray <john.spray@redhat.com>
When unused clients were mounted during an fs new,
they would end up in a state where they stalled
on subsequent attempts to umount them (ceph-fuse
stalls on exit if it can't terminate its mds_session)
Signed-off-by: John Spray <john.spray@redhat.com>
Instead of blocking the whole port range (which
might make OSDs running on that node collateral
damage), read the MDS's port out of the MDS map
and just block that.
Signed-off-by: John Spray <john.spray@redhat.com>
...because this is the one that will store up
changes to roll back during teardown.
Doing this makes it easy to run lots of test cases
togeher in a single teuthology run, raher than
setting up/tearing down the ceph cluster for each
on.
Signed-off-by: John Spray <john.spray@redhat.com>
Now that we have more of these cases, there was lots
of duplication in setup and teardown. For some tests
the "reset everything" setup/teardown is overkill,
but it's harmless.
Signed-off-by: John Spray <john.spray@redhat.com>
Since the new 'tell' for the MDS was introduced,
caps have to have the '*' to permit running remote
administrative commands.
Signed-off-by: John Spray <john.spray@redhat.com>
Now that #10387 is fixed in master, we can tighten
up this test to ensure that the expected deletions
are happening.
Signed-off-by: John Spray <john.spray@redhat.com>
This reverts commit 26a33c3a5a.
This is tryign to create the archive dir on the remote host:
2014-12-29T12:15:30.213 INFO:teuthology.orchestra.run.plana31:Running: 'mkdir -p /var/lib/teuthworker/archive/sage-2014-12-29_11:40:52-rgw-next---basic-multi/683052'
2014-12-29T12:15:30.231 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 28, in nested
vars.append(enter())
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/var/lib/teuthworker/src/ceph-qa-suite_next/tasks/s3readwrite.py", line 241, in run_tests
ctx.cluster.only(client).run(args=['mkdir', '-p', archive_dir])
File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/cluster.py", line 64, in run
return [remote.run(**kwargs) for remote in remotes]
File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 128, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 368, in run
r.wait()
File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 106, in wait
exitstatus=status, node=self.hostname)
CommandFailedError: Command failed on plana31 with status 1: 'mkdir -p /var/lib/teuthworker/archive/sage-2014-12-29_11:40:52-rgw-next---basic-multi/683052'
...but it should only be on the local host.
This tests:
* The new 'flush journal' asok command
* That the resulting on disk structures are as expected
* That cephfs-journal-tool is happy with the result
Fixes: #9881
Signed-off-by: John Spray <john.spray@redhat.com>
The format of the output of --op list was changed to include the PG to
which the object belong. It simplifies the loop in
ceph_objectstore_tool.py.
http://tracker.ceph.com/issues/10376Fixes: #10376
Signed-off-by: Loic Dachary <ldachary@redhat.com>
Previously was always using the default values of things
so querying mon instead of the appropriate service
worked fine. However, for things we might want to
update on a per-test basis we need to go ask the
correct service what the setting really is.
Needed for osd_mon_report_interval_max in the ENOSPC
testing.
Signed-off-by: John Spray <john.spray@redhat.com>
Fixes: #9892
Need to wait through the usage interval before trimming usage, otherwise we might not
remove all pending usage info.
Backport: dumpling, firefly, giant
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit dd09ecbfab8a659f3faaf879a52849caab5e8e8e)
It now checks for 'notify1' and 'notify2' strings, allowing it to work
on both old and new versions of rados watch command.
Signed-off-by: Sage Weil <sage@redhat.com>
Leave the legacy handling out in cephfs_setup, move
the filesystem creation stuff into Filesystem. I
anticipate this being the right place for it if/when
we have tests that want to do 'fs rm' 'fs new' type
cycles within themselves.
Signed-off-by: John Spray <john.spray@redhat.com>
This was tripping over the recent commit 42c85e80
in Ceph master, which tightens the limits on
acceptable PG counts per OSD, and was making
teuthology runs fail due to never going clean.
Rather than put in a new hardcoded count, infer
it from config. Move some code around so that
the ceph task can get at a Filesystem object
to use in FS setup (this already has conf-getting
methods).
Signed-off-by: John Spray <john.spray@redhat.com>
New CephFS tests for MDS's auto repair functions. (So far the only
test case is verify/repair backtrace on fetch dirfrag)
Signed-off-by: Yan, Zheng <zyan@redhat.com>
The s3readwrite.py task formerly wrote too much output while excuting.
It now saves the data on the local machine in either the archive
directory or in /tmp if no archive directory is specified.
The new file contains a client name and timestamp in its name.
Once all processing has completed, that file is saved locally.
Fixes: 9117
Signed-off-by: Warren Usui <warren.usui@inktank.com>
Create an erasure coded pool and run tests on it. The list of PGs is
adapted to contain the shard id.
Signed-off-by: Loic Dachary <ldachary@redhat.com>