To avoid internal.coredump task synthesizing a failure
during teardown from the core we left behind.
Fixes: #10949
Signed-off-by: John Spray <john.spray@redhat.com>
This was an overly strict success condition: the
flush operation doesn't promise to leave you an empty
journal, it promises that anything in the journal
before the flush will be flushed.
Fixes: #10712
Signed-off-by: John Spray <john.spray@redhat.com>
Where multiple MDSs were on the same node, trying
to concurrently update their firewall state was
causing an exception because the iptables command
errors out if another instance is already running.
Fixes: #10948
Signed-off-by: John Spray <john.spray@redhat.com>
teuthology helpfully escapes things for us so
the \; didn't need the backslash. The logic
was still falling over in some cases too.
Additionally, make the FUSE /sys/ abort operation
more surgical by working out the connection name
of our own mount during mount().
Signed-off-by: John Spray <john.spray@redhat.com>
* add a wrapper to log uncaught exception to self.logger, greenlet also
prints the backtrace and exception to stderr, but teuthology.log does
not capture stderr. so we need to catch them by ourselves to reveal
more info to root-cause this issue.
* log uncaught exception thrown by Thrasher.do_thrash() to self.log.
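A minimal sketch of such a wrapper, assuming the object exposes a `logger` attribute. The decorator name `log_exc` and the stand-in `Thrasher` class below are illustrative only, not the actual implementation:

```python
import functools


def log_exc(func):
    """Log any uncaught exception to self.logger, then re-raise it.

    Hypothetical helper: assumes the instance has a `logger` attribute.
    """
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            return func(self, *args, **kwargs)
        except Exception:
            # logger.exception() records the message and the traceback,
            # so the failure lands in the log file and not only on stderr.
            self.logger.exception("uncaught exception in %s", func.__name__)
            raise
    return wrapper


class Thrasher(object):
    """Stand-in class for illustration only."""

    def __init__(self, logger):
        self.logger = logger

    @log_exc
    def do_thrash(self):
        raise RuntimeError("boom")
```

The exception is re-raised after logging so that the caller still sees the failure.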
See: #10630
Signed-off-by: Kefu Chai <kchai@redhat.com>
Specifically, I want to know *who* is running the ceph-osd that is
holding the files open.
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit a68281e1476e6af38237e1d1031dd7bd0980ef9f)
This ensures that we still gather the logs even if the other nested tasks
throw an exception in the finally block.
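The idea can be sketched as a nested try/finally. The names `run_nested`, `teardown_nested` and `gather_logs` are hypothetical placeholders, not the task's real functions:

```python
from contextlib import contextmanager


@contextmanager
def run_nested(teardown_nested, gather_logs):
    """Run the body, then tear down, and always gather logs last."""
    try:
        yield
    finally:
        try:
            # Teardown of the nested tasks may itself raise.
            teardown_nested()
        finally:
            # The inner finally guarantees log collection still happens.
            gather_logs()
```

The exception from the teardown still propagates once the logs have been gathered.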
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit ca09683f5fc1a6067c524c4034c27ab4a26e11f3)
Change the config option from mds_id to mds_rank to reflect the
fact that it's the rank we want to make use of (and will continue
to want when we're doing stuff like force exporting from one rank
to another).
Fixes: #10361
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
restart() will stop if the daemon is running. This will get rid of the
spurious error
2015-01-23 15:19:36,828.828 ERROR:tasks.ceph.osd.0:tried to stop a non-running daemon
when the daemon isn't already running.
Signed-off-by: Sage Weil <sage@redhat.com>
Require ceph-objectstore-tool to be available on all OSD nodes
Log a message when tool is not available
Signed-off-by: David Zafman <dzafman@redhat.com>
Where previously we only tracked RADOS-level delete
ops during deletion, now also verify that they
correspond to the right number of MDS-level purge
operations.
Signed-off-by: John Spray <john.spray@redhat.com>
This tests the new #9883 repair functionality
where we selectively scrape dentries out of
the journal while the MDS is offline.
Signed-off-by: John Spray <john.spray@redhat.com>
Add a function dedicated to erasure coded pools tests, similar to
repair_test_1. Add a corrupter that removes the hinfo_key from the object.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
Add the CephManager.objectstore_tool method to encapsulate a call to
ceph-objectstore-tool. The wrapper can convert an object name into the
PG id and figure out the primary OSD. The designated OSD is stopped
before running the command and restarted afterwards.
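The stop/run/restart sequence might look roughly like the following sketch. The function name and its callable parameters are assumptions made for illustration and are not the CephManager API:

```python
def run_with_osd_stopped(osd_id, stop_osd, start_osd, run_tool):
    """Stop the OSD, run the tool against it, and always restart the OSD.

    All three callables are hypothetical and take the OSD id.
    """
    stop_osd(osd_id)
    try:
        return run_tool(osd_id)
    finally:
        # Restart the OSD even if the tool invocation failed.
        start_osd(osd_id)
```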
Signed-off-by: Loic Dachary <ldachary@redhat.com>
The commit is large but does not introduce any semantic change and
consists primarily in code moving around, re-indented and removed.
Replace functions generating functions by functions and sequentially
iterating over a list of functions with a sequential call to the
functions.
Replace the setup/teardown with an equivalent using a with
statement and the ceph_manager.pool method.
Replace inline code with a call to ceph_manager.wait_for_all_up
It makes it easier to modify the tests, for instance to create erasure
coded pools and tests specific to them.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
To create a pool before running a code block and remove it after.
    with manager.pool("mypool"):
        mytest..
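One way such a method could be built is with contextlib. This is a sketch only: `FakeManager` and its `create_pool`/`remove_pool` methods are stand-ins that track pool names in memory rather than talking to a cluster:

```python
from contextlib import contextmanager


class FakeManager(object):
    """In-memory stand-in for a cluster manager (illustration only)."""

    def __init__(self):
        self.pools = set()

    def create_pool(self, name):
        self.pools.add(name)

    def remove_pool(self, name):
        self.pools.discard(name)

    @contextmanager
    def pool(self, name):
        self.create_pool(name)
        try:
            yield
        finally:
            # Remove the pool even if the body raised.
            self.remove_pool(name)
```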
Signed-off-by: Loic Dachary <ldachary@redhat.com>
ice-tools needs a virtualenv populated to properly run to build
an iceball; add the commands to do that. Also remove the built
iceball when the task exits.
Fixes: #10523
Signed-off-by: Dan Mick <dan.mick@redhat.com>
Previously, the task would search for the lexicographically-greatest
filename matching ICE-*.tar.gz; now it builds a specific name
ICE-{ice_version}-{ice_distro}.tar.gz
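As a sketch, the name can be built with a format string. The helper name `iceball_name` and the example values in the usage line are hypothetical:

```python
def iceball_name(ice_version, ice_distro):
    """Build the expected iceball filename from version and distro."""
    return "ICE-{ice_version}-{ice_distro}.tar.gz".format(
        ice_version=ice_version, ice_distro=ice_distro)


# Example values are made up for illustration.
name = iceball_name("1.2.2", "rhel7")
```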
Fixes: #10521
Signed-off-by: Dan Mick <dan.mick@redhat.com>
The small segments and small segment limit
were used when doing a hacky flush by doing
IO and waiting: now that we have the explicit
'flush journal' asok in use, we can just use
a normal journal configuration.
Signed-off-by: John Spray <john.spray@redhat.com>
This was only used in get_first_mon, which doesn't actually
need the parameter itself. Makes it easier to casually
use Filesystem from any place with a ctx to hand.
Signed-off-by: John Spray <john.spray@redhat.com>
When unused clients were mounted during an fs new,
they would end up in a state where they stalled
on subsequent attempts to umount them (ceph-fuse
stalls on exit if it can't terminate its mds_session)
Signed-off-by: John Spray <john.spray@redhat.com>
Instead of blocking the whole port range (which
might make OSDs running on that node collateral
damage), read the MDS's port out of the MDS map
and just block that.
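A sketch of extracting the port, assuming the MDS map reports the address as a string of the form "ip:port/nonce". Both that format and the helper name are assumptions:

```python
def mds_port_from_addr(addr):
    """Return the TCP port from an address like '10.0.0.1:6800/12345'."""
    # Drop the trailing '/nonce' part, if present.
    host_port = addr.split("/", 1)[0]
    # The port is whatever follows the last ':'.
    return int(host_port.rsplit(":", 1)[1])
```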
Signed-off-by: John Spray <john.spray@redhat.com>
...because this is the one that will store up
changes to roll back during teardown.
Doing this makes it easy to run lots of test cases
together in a single teuthology run, rather than
setting up/tearing down the ceph cluster for each
one.
Signed-off-by: John Spray <john.spray@redhat.com>
Now that we have more of these cases, there was lots
of duplication in setup and teardown. For some tests
the "reset everything" setup/teardown is overkill,
but it's harmless.
Signed-off-by: John Spray <john.spray@redhat.com>
Since the new 'tell' for the MDS was introduced,
caps have to have the '*' to permit running remote
administrative commands.
Signed-off-by: John Spray <john.spray@redhat.com>
Now that #10387 is fixed in master, we can tighten
up this test to ensure that the expected deletions
are happening.
Signed-off-by: John Spray <john.spray@redhat.com>
This reverts commit 26a33c3a5a.
This is trying to create the archive dir on the remote host:
2014-12-29T12:15:30.213 INFO:teuthology.orchestra.run.plana31:Running: 'mkdir -p /var/lib/teuthworker/archive/sage-2014-12-29_11:40:52-rgw-next---basic-multi/683052'
2014-12-29T12:15:30.231 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 28, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_next/tasks/s3readwrite.py", line 241, in run_tests
    ctx.cluster.only(client).run(args=['mkdir', '-p', archive_dir])
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/cluster.py", line 64, in run
    return [remote.run(**kwargs) for remote in remotes]
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 128, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 368, in run
    r.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 106, in wait
    exitstatus=status, node=self.hostname)
CommandFailedError: Command failed on plana31 with status 1: 'mkdir -p /var/lib/teuthworker/archive/sage-2014-12-29_11:40:52-rgw-next---basic-multi/683052'
...but it should only be on the local host.
This tests:
* The new 'flush journal' asok command
* That the resulting on disk structures are as expected
* That cephfs-journal-tool is happy with the result
Fixes: #9881
Signed-off-by: John Spray <john.spray@redhat.com>
The format of the output of --op list was changed to include the PG to
which the object belongs. It simplifies the loop in
ceph_objectstore_tool.py.
http://tracker.ceph.com/issues/10376
Fixes: #10376
Signed-off-by: Loic Dachary <ldachary@redhat.com>
Previously we were always using the default values of things,
so querying the mon instead of the appropriate service
worked fine. However, for things we might want to
update on a per-test basis we need to go ask the
correct service what the setting really is.
Needed for osd_mon_report_interval_max in the ENOSPC
testing.
Signed-off-by: John Spray <john.spray@redhat.com>
Fixes: #9892
Need to wait through the usage interval before trimming usage, otherwise we might not
remove all pending usage info.
Backport: dumpling, firefly, giant
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit dd09ecbfab8a659f3faaf879a52849caab5e8e8e)
It now checks for 'notify1' and 'notify2' strings, allowing it to work
on both old and new versions of rados watch command.
Signed-off-by: Sage Weil <sage@redhat.com>
Leave the legacy handling out in cephfs_setup, move
the filesystem creation stuff into Filesystem. I
anticipate this being the right place for it if/when
we have tests that want to do 'fs rm' 'fs new' type
cycles within themselves.
Signed-off-by: John Spray <john.spray@redhat.com>
This was tripping over the recent commit 42c85e80
in Ceph master, which tightens the limits on
acceptable PG counts per OSD, and was making
teuthology runs fail due to never going clean.
Rather than put in a new hardcoded count, infer
it from config. Move some code around so that
the ceph task can get at a Filesystem object
to use in FS setup (this already has conf-getting
methods).
Signed-off-by: John Spray <john.spray@redhat.com>
New CephFS tests for MDS's auto repair functions. (So far the only
test case is verify/repair backtrace on fetch dirfrag)
Signed-off-by: Yan, Zheng <zyan@redhat.com>
The s3readwrite.py task formerly wrote too much output while executing.
It now saves the data on the local machine in either the archive
directory or in /tmp if no archive directory is specified.
The new file contains a client name and timestamp in its name.
Once all processing has completed, that file is saved locally.
Fixes: 9117
Signed-off-by: Warren Usui <warren.usui@inktank.com>
Create an erasure coded pool and run tests on it. The list of PGs is
adapted to contain the shard id.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
Instead of hardcoding 12 use a configuration option that defaults to
12. It is handy during development to lower the number to 4 and speed up
the test cycle.
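A sketch of reading such an option with a default of 12. The key name "objects" and the helper name are hypothetical, since the option name is not given here:

```python
def num_objects(config):
    """Return the configured object count, defaulting to 12.

    The 'objects' key is an assumed name used for illustration.
    """
    if config is None:
        config = {}
    return int(config.get("objects", 12))
```

During development, a config such as `{"objects": 4}` would shorten the test cycle.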
Signed-off-by: Loic Dachary <ldachary@redhat.com>
Move code out of the task into function. Also remove the "REP" specifics
from helper functions that could also be used for erasure coded pools.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
When an hinfo_key attribute is found, assume an erasure coded object and
verify set-attr/get-attr works as expected by removing its content and
restoring it.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
For erasure coded pools to be tested, the json object representation
must be preserved for all PG because they are all different. The
internal representation is changed from
db[name]["pgid"] = pg
db[name]["json"] = objjson
to a per pg map:
db[name].setdefault("pg2json", {})[pg] = objjson
and the rest of the code is modified to adapt accordingly.
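A small sketch of the per-PG map in use. The helper name `record_object` and the example PG ids are illustrative, and unlike the line above it also creates the `db[name]` entry on demand:

```python
def record_object(db, name, pg, objjson):
    """Store the JSON representation of an object for one PG."""
    # Create db[name] on first use, then file the JSON under its PG id.
    db.setdefault(name, {}).setdefault("pg2json", {})[pg] = objjson


db = {}
# For an erasure coded pool each shard lives in a different PG id.
record_object(db, "obj1", "2.0s0", {"shard": 0})
record_object(db, "obj1", "2.0s1", {"shard": 1})
```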
Signed-off-by: Loic Dachary <ldachary@redhat.com>
The osd dump command displays pool types using numerics instead of
symbolic names. Create constants in the CephManager class to use instead
of numbers.
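A sketch of how such constants might be used. The constant names, the values 1 and 3, and the shape of the dump (a "pools" list whose entries carry "pool_name" and "type") are assumptions and should be checked against the Ceph version in use:

```python
# Assumed numeric pool types as shown by 'osd dump'.
REPLICATED_POOL = 1
ERASURE_CODED_POOL = 3


def erasure_coded_pools(osd_dump):
    """Return the names of erasure coded pools in a parsed osd dump."""
    return [pool["pool_name"]
            for pool in osd_dump["pools"]
            if pool["type"] == ERASURE_CODED_POOL]
```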
Signed-off-by: Loic Dachary <ldachary@redhat.com>
Made changes suggested in code reviews.
Added no_epel option.
Merged Dan Mick's changes that add the ability to get
iceballs from http URL.
Removed duplicate assignment and added some log.debugs
Signed-off-by: Warren Usui <warren.usui@inktank.com>
Calamari_setup can be used to set up a calamari gui for manual testing,
or be run in a suite to test the calamari setup and calamari ceph
installation code.
Fixes: 9759
Signed-off-by: Warren Usui <warren.usui@inktank.com>
We mostly do a variety of successful ones, but we also corrupt the store
using the rados tool and make sure we get the expected error codes. Includes
a yaml fragment so the task gets run as part of the fs/basic suite.
Signed-off-by: Greg Farnum <greg@inktank.com>
Old versions of libfuse treat both flock and posix lock requests as posix
lock requests. This is a workaround for the bug.
Fixes: #9995
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Don't need to explicitly turn off the test during some upgrades
Leaving disabled until merge of import/export fixes
Fixes: #9805
Signed-off-by: David Zafman <dzafman@redhat.com>
Otherwise rados put will fail as follows
$ touch /tmp/bar
$ ./rados -p rbd put existing_3 /tmp/bar
$ ./rados -p rbd put existing_3 /tmp/bar
WARNING: could not create object: existing_3
error putting rbd/existing_3: (17) File exists
It should be considered a bug in the rados command line but needs to be
addressed separately.
http://tracker.ceph.com/issues/9387
Fixes: #9387
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
Adding this so that we can modify the clients' conf file as needed with slow backend.
This can be achieved by:
    overrides:
      s3tests:
        slow_backend: true
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit 61409179df)
To make the logs clearer when trying to work out
if/when something went wrong, rather than always
having client logs start with some failures.
Signed-off-by: John Spray <john.spray@redhat.com>
'client_id' was ambiguous because in other places it
meant the '0' in client.0, whereas here it means
the runtime-generated global ID of the client.
Signed-off-by: John Spray <john.spray@redhat.com>
Some of this stuff could be even more general for embedding
unittest-style suites, but for the moment let's keep the cephfs
stuff in a walled garden.
Signed-off-by: John Spray <john.spray@redhat.com>
May have been causing spurious failures on
trying to read session state after MDS restart (
session list isn't populated until recovery is
complete)
Signed-off-by: John Spray <john.spray@redhat.com>
...so that there will at least be multiple segments
in the log during the rewrite.
Also make the test stricter by checking that
cephfs-journal-tool can happily read the resulting
journal.
Signed-off-by: John Spray <john.spray@redhat.com>
Previously would fail because the cap waiter
completed too soon, without noticing that the
reason it completed quickly was because it failed.
Signed-off-by: John Spray <john.spray@redhat.com>
Check for more than 1 osd down and randomize on chance_move_pg (100%)
For now only export from older down osd to newly down osd to avoid missing map
Signed-off-by: David Zafman <david.zafman@inktank.com>
Based on ceph/src/test/ceph_objectstore_tool.py but only does
replicated pool testing and doesn't test argument validation.
Signed-off-by: David Zafman <david.zafman@inktank.com>
ceph.created_pool allows the user (via yaml lines) to add pools
that the ceph_manager knows about.
Fixes: 9091
Signed-off-by: Warren Usui <warren.usui@inktank.com>
This will enable using .yaml changes to switch this
guy over to use kcephfs client once the teuthology
code around it supports all the same hooks as I've added
for fuse.
Signed-off-by: John Spray <john.spray@redhat.com>
This is for any test config that needs to run
some workunit with clients unmounted. It allows
you to toggle the mountedness of a client as
you go up and down the stack, like this:
    - ceph-fuse:
        client.0:
          mounted: true
    - workunit:
        clients:
          client.0:
            - fs/misc/trivial_sync.sh
    - ceph-fuse:
        client.0:
          mounted: false
The initial use case for this is running the
cephfs_journal_tool_smoke.sh workunit, which
tests administrative operations that are meant
to be run on an unmounted filesystem.
Signed-off-by: John Spray <john.spray@redhat.com>
So that we can explicitly stop daemons on demand. Useful
for MDS tool tests that want the MDS daemons not to be running,
as this is more solid and explicit than doing e.g. "ceph mds
stop" from within workunits.
Signed-off-by: John Spray <john.spray@redhat.com>