Previously, the task would search for the lexicographically-greatest
filename matching ICE-*.tar.gz; now it builds a specific name
ICE-{ice_version}-{ice_distro}.tar.gz
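A minimal sketch of the construction, assuming a hypothetical helper name (only the filename pattern comes from this change):
    # Hypothetical sketch: build the exact tarball name instead of globbing
    # for ICE-*.tar.gz and taking the lexicographically-greatest match.
    def iceball_name(ice_version, ice_distro):
        return 'ICE-{0}-{1}.tar.gz'.format(ice_version, ice_distro)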
Fixes: #10521
Signed-off-by: Dan Mick <dan.mick@redhat.com>
The small segments and small segment limit
were used when doing a hacky flush by doing
IO and waiting: now that we have the explicit
'flush journal' asok in use, we can just use
a normal journal configuration.
Signed-off-by: John Spray <john.spray@redhat.com>
This was only used in get_first_mon, which doesn't actually
need the parameter itself. Makes it easier to casually
use Filesystem from any place with a ctx to hand.
Signed-off-by: John Spray <john.spray@redhat.com>
When unused clients were mounted during an fs new,
they would end up in a state where they stalled
on subsequent attempts to umount them (ceph-fuse
stalls on exit if it can't terminate its mds_session)
Signed-off-by: John Spray <john.spray@redhat.com>
Instead of blocking the whole port range (which
might make OSDs running on that node collateral
damage), read the MDS's port out of the MDS map
and just block that.
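Roughly, the port can be pulled out of a JSON dump of the MDS map; this is only a sketch, assuming the usual "addr": "ip:port/nonce" layout and a hypothetical helper name:
    import json

    def mds_port_from_map(mds_map_json, mds_name):
        # Hypothetical sketch: find the named MDS in an
        # "mds dump --format=json" blob and return just its port,
        # so only that port needs to be blocked.
        mds_map = json.loads(mds_map_json)
        for info in mds_map.get('info', {}).values():
            if info.get('name') == mds_name:
                return int(info['addr'].split(':')[1].split('/')[0])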
Signed-off-by: John Spray <john.spray@redhat.com>
...because this is the one that will store up
changes to roll back during teardown.
Doing this makes it easy to run lots of test cases
together in a single teuthology run, rather than
setting up/tearing down the ceph cluster for each
one.
Signed-off-by: John Spray <john.spray@redhat.com>
Now that we have more of these cases, there was lots
of duplication in setup and teardown. For some tests
the "reset everything" setup/teardown is overkill,
but it's harmless.
Signed-off-by: John Spray <john.spray@redhat.com>
Since the new 'tell' for the MDS was introduced,
caps have to have the '*' to permit running remote
administrative commands.
Signed-off-by: John Spray <john.spray@redhat.com>
Now that #10387 is fixed in master, we can tighten
up this test to ensure that the expected deletions
are happening.
Signed-off-by: John Spray <john.spray@redhat.com>
This reverts commit 26a33c3a5aa2aedb52eb5ce140c76503f099b253.
This is trying to create the archive dir on the remote host:
2014-12-29T12:15:30.213 INFO:teuthology.orchestra.run.plana31:Running: 'mkdir -p /var/lib/teuthworker/archive/sage-2014-12-29_11:40:52-rgw-next---basic-multi/683052'
2014-12-29T12:15:30.231 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 28, in nested
vars.append(enter())
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/var/lib/teuthworker/src/ceph-qa-suite_next/tasks/s3readwrite.py", line 241, in run_tests
ctx.cluster.only(client).run(args=['mkdir', '-p', archive_dir])
File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/cluster.py", line 64, in run
return [remote.run(**kwargs) for remote in remotes]
File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 128, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 368, in run
r.wait()
File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 106, in wait
exitstatus=status, node=self.hostname)
CommandFailedError: Command failed on plana31 with status 1: 'mkdir -p /var/lib/teuthworker/archive/sage-2014-12-29_11:40:52-rgw-next---basic-multi/683052'
...but it should only be on the local host.
This tests:
* The new 'flush journal' asok command
* That the resulting on disk structures are as expected
* That cephfs-journal-tool is happy with the result
Fixes: #9881
Signed-off-by: John Spray <john.spray@redhat.com>
The format of the output of --op list was changed to include the PG to
which the object belongs. It simplifies the loop in
ceph_objectstore_tool.py.
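In the test loop this might look something like the following; the exact line format is an assumption here, the commit only states that each listed object now carries its PG:
    import json

    def parse_op_list(list_output):
        # Hypothetical sketch: each line of "--op list" output now pairs the
        # PG with the object's json, so no separate PG lookup is needed.
        for line in list_output.splitlines():
            pg, objjson = json.loads(line)
            yield pg, objjson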
http://tracker.ceph.com/issues/10376
Fixes: #10376
Signed-off-by: Loic Dachary <ldachary@redhat.com>
Previously was always using the default values of things
so querying mon instead of the appropriate service
worked fine. However, for things we might want to
update on a per-test basis we need to go ask the
correct service what the setting really is.
Needed for osd_mon_report_interval_max in the ENOSPC
testing.
Signed-off-by: John Spray <john.spray@redhat.com>
Fixes: #9892
Need to wait through the usage interval before trimming usage, otherwise we might not
remove all pending usage info.
Backport: dumpling, firefly, giant
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit dd09ecbfab8a659f3faaf879a52849caab5e8e8e)
It now checks for 'notify1' and 'notify2' strings, allowing it to work
on both old and new versions of rados watch command.
Signed-off-by: Sage Weil <sage@redhat.com>
Leave the legacy handling out in cephfs_setup, move
the filesystem creation stuff into Filesystem. I
anticipate this being the right place for it if/when
we have tests that want to do 'fs rm' 'fs new' type
cycles within themselves.
Signed-off-by: John Spray <john.spray@redhat.com>
This was tripping over the recent commit 42c85e80
in Ceph master, which tightens the limits on
acceptable PG counts per OSD, and was making
teuthology runs fail due to never going clean.
Rather than put in a new hardcoded count, infer
it from config. Move some code around so that
the ceph task can get at a Filesystem object
to use in FS setup (this already has conf-getting
methods).
Signed-off-by: John Spray <john.spray@redhat.com>
New CephFS tests for MDS's auto repair functions. (So far the only
test case is verify/repair backtrace on fetch dirfrag)
Signed-off-by: Yan, Zheng <zyan@redhat.com>
The s3readwrite.py task formerly wrote too much output while executing.
It now saves the data on the local machine in either the archive
directory or in /tmp if no archive directory is specified.
The new file contains a client name and timestamp in its name.
Once all processing has completed, that file is saved locally.
Fixes: 9117
Signed-off-by: Warren Usui <warren.usui@inktank.com>
Create an erasure coded pool and run tests on it. The list of PGs is
adapted to contain the shard id.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
Instead of hardcoding 12 use a configuration option that defaults to
12. It is handy during development to lower the number to 4 and speed up
the test cycle.
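For illustration, the lookup could be as simple as this (the option name is a guess, not taken from the commit):
    def object_count(config):
        # Hypothetical sketch: keep 12 as the default so existing runs are
        # unchanged, but let a yaml override lower it (e.g. to 4) during
        # development.
        return config.get('objects', 12)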
Signed-off-by: Loic Dachary <ldachary@redhat.com>
Move code out of the task into a function. Also remove the "REP" specifics
from helper functions that could also be used for erasure coded pools.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
When an hinfo_key attribute is found, assume an erasure coded object and
verify set-attr/get-attr works as expected by removing its content and
restoring it.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
For erasure coded pools to be tested, the json object representation
must be preserved for all PGs because they are all different. The
internal representation is changed from
db[name]["pgid"] = pg
db[name]["json"] = objjson
to a per pg map:
db[name].setdefault("pg2json", {})[pg] = objjson
and the rest of the code is modified to adapt accordingly.
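For example, a lookup that used to read the single json blob now picks the entry for the PG being inspected (sketch using the names above):
    def obj_json(db, name, pg):
        # Sketch: with the per-pg map, callers fetch the blob for the PG
        # they are inspecting instead of reading a single "json" entry.
        return db[name]["pg2json"][pg]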
Signed-off-by: Loic Dachary <ldachary@redhat.com>
The osd dump command displays pool types using numerics instead of
symbolic names. Create constants in the CephManager class to use instead
of numbers.
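A minimal sketch of what such constants might look like (the attribute names are assumptions; the numeric values are the ones osd dump reports for replicated and erasure coded pools):
    class CephManager(object):
        # Pool types as printed by "ceph osd dump"; compare against these
        # instead of bare numbers.
        POOL_TYPE_REPLICATED = 1
        POOL_TYPE_ERASURE_CODED = 3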
Signed-off-by: Loic Dachary <ldachary@redhat.com>
Applied suggestions from code reviews.
Added no_epel option.
Merged Dan Mick's changes that add the ability to get
iceballs from an http URL.
Removed a duplicate assignment and added some log.debug calls.
Signed-off-by: Warren Usui <warren.usui@inktank.com>
Calamari_setup can be used to set up a calamari gui for manual testing,
or be run in a suite to test the calamari setup and calamari ceph
installation code.
Fixes: 9759
Signed-off-by: Warren Usui <warren.usui@inktank.com>
We mostly do a variety of successful ones, but we also corrupt the store
using the rados tool and make sure we get the expected error codes. Includes
a yaml fragment so the task gets run as part of the fs/basic suite.
Signed-off-by: Greg Farnum <greg@inktank.com>
Old versions of libfuse treat both flock and posix lock requests as posix
lock requests. This is a workaround for the bug.
Fixes: #9995
Signed-off-by: Yan, Zheng <zyan@redhat.com>
We don't need to explicitly turn off the test during some upgrades.
Leaving it disabled until the import/export fixes are merged.
Fixes: #9805
Signed-off-by: David Zafman <dzafman@redhat.com>
Otherwise rados put will fail as follows:
$ touch /tmp/bar
$ ./rados -p rbd put existing_3 /tmp/bar
$ ./rados -p rbd put existing_3 /tmp/bar
WARNING: could not create object: existing_3
error putting rbd/existing_3: (17) File exists
It should be considered a bug in the rados command line, but that needs to be
addressed separately.
http://tracker.ceph.com/issues/9387
Fixes: #9387
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
Adding this so that we can modify the clients' conf file as needed with slow backend.
This can be achieved by:
overrides:
  s3tests:
    slow_backend: true
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit 61409179df)
Adding this so that we can modify the clients' conf file as needed with slow backend.
This can be achieved by:
overrides:
  s3tests:
    slow_backend: true
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
To make the logs clearer when trying to work out
if/when something went wrong, rather than always
having client logs start with some failures.
Signed-off-by: John Spray <john.spray@redhat.com>
'client_id' was ambiguous because in other places it
meant the '0' in client.0, whereas here it means
the runtime-generated global ID of the client.
Signed-off-by: John Spray <john.spray@redhat.com>
Some of this stuff could be even more general for embedding
unittest-style suites, but for the moment let's keep the cephfs
stuff in a walled garden.
Signed-off-by: John Spray <john.spray@redhat.com>
May have been causing spurious failures when
trying to read session state after MDS restart
(the session list isn't populated until recovery is
complete).
Signed-off-by: John Spray <john.spray@redhat.com>
...so that there will at least be multiple segments
in the log during the rewrite.
Also make the test stricter by checking that
cephfs-journal-tool can happily read the resulting
journal.
Signed-off-by: John Spray <john.spray@redhat.com>
Previously would fail because the cap waiter
completed too soon, without noticing that the
reason it completed quickly was because it failed.
Signed-off-by: John Spray <john.spray@redhat.com>
Check for more than 1 OSD down and randomize on chance_move_pg (100%).
For now only export from the older down OSD to the newly down OSD, to avoid a missing map.
Signed-off-by: David Zafman <david.zafman@inktank.com>
Based on ceph/src/test/ceph_objectstore_tool.py but only does
replicated pool testing and doesn't test argument validation.
Signed-off-by: David Zafman <david.zafman@inktank.com>
ceph.created_pool allows the user (via yaml lines) to add pools
that the ceph_manager knows about.
Fixes: 9091
Signed-off-by: Warren Usui <warren.usui@inktank.com>
This will enable using .yaml changes to switch this
guy over to use kcephfs client once the teuthology
code around it supports all the same hooks as I've added
for fuse.
Signed-off-by: John Spray <john.spray@redhat.com>
This is for any test config that needs to run
some workunit with clients unmounted. It allows
you to toggle the mountedness of a client as
you go up and down the stack, like this:
- ceph-fuse:
    client.0:
      mounted: true
- workunit:
    clients:
      client.0:
        - fs/misc/trivial_sync.sh
- ceph-fuse:
    client.0:
      mounted: false
The initial use case for this is running the
cephfs_journal_tool_smoke.sh workunit, which
tests administrative operations that are meant
to be run on an unmounted filesystem.
Signed-off-by: John Spray <john.spray@redhat.com>
So that we can explicitly stop daemons on demand. Useful
for MDS tool tests that want the MDS daemons not to be running,
as this is more solid and explicit than doing e.g. "ceph mds
stop" from within workunits.
Signed-off-by: John Spray <john.spray@redhat.com>
But don't error if it fails, as this would mean that the monitors
are just taking longer to form quorum. Go and try the next block which will
wait up to 15 minutes for a successful gatherkeys to happen (that only works
if monitors have formed quorum).
Signed-off-by: Alfredo Deza <alfredo.deza@inktank.com>
If erasure_code_profile is present at the same level as ec-data-pool, it
is used to override the default hard-coded profile.
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
Inside a conditional to affect only 2.4, set User, Group, and the
module config to load mpm_event. This is normally done with the
default configuration files, but since this abbreviated conf bypasses
those, we must set them here.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
instead of rados.py, because ceph.py is only run once whereas rados.py
could be run multiple times, leading to race conditions
http://tracker.ceph.com/issues/9027
Fixes: #9027
Signed-off-by: Loic Dachary <loic@dachary.org>
mount_osd_data and make_admin_daemon_dir are only used by
ceph_manager.py although they are defined in ceph.py
Signed-off-by: Loic Dachary <loic@dachary.org>
Globally overriding the rgw idle_timeout is not possible because it
needs to be done on a per-client (client.0, client.1, etc.) basis. Add the
default_idle_timeout key to the rgw config: it defaults to the
previously hardcoded default (30) and can be changed via the override.
The existing tasks that were previously overriding the idle_timeout on a
per client basis are changed to use the default_idle_timeout instead for
consistency and to allow a global override.
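So the rgw task ends up reading the timeout along these lines (sketch; only the key name and the default of 30 come from this change):
    def get_idle_timeout(rgw_config):
        # default_idle_timeout falls back to the previously hardcoded 30,
        # and can be raised globally via an override.
        return rgw_config.get('default_idle_timeout', 30)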
Signed-off-by: Loic Dachary <loic@dachary.org>
gevent may hold the rados.py thread when it has an opportunity. The
if not hasattr(ctx, 'manager'):
must therefore be immediately before the manager creation it is supposed
to protect. If any of the functions called as a side effect of
first_mon = teuthology.get_first_mon(ctx, config)
(mon,) = ctx.cluster.only(first_mon).remotes.iterkeys()
give gevent an opportunity to hold the thread, it creates a race
condition.
The other possibility would be to use a ctx lock to protect the code, but
this solution seems simpler.
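A sketch of the resulting ordering; the CephManager construction call is illustrative, and the module-level names (teuthology, ceph_manager, log) are assumed to be the usual imports in rados.py:
    def create_manager(ctx, config):
        # The calls below can yield to gevent, so they run first; the
        # hasattr check then sits immediately before the assignment it
        # protects, with no yield point in between, so another greenlet
        # cannot race us here.
        first_mon = teuthology.get_first_mon(ctx, config)
        (mon,) = ctx.cluster.only(first_mon).remotes.iterkeys()
        if not hasattr(ctx, 'manager'):
            ctx.manager = ceph_manager.CephManager(mon, ctx=ctx, logger=log)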
http://tracker.ceph.com/issues/9027
Fixes: #9027
Signed-off-by: Loic Dachary <loic@dachary.org>