So if there are a lot of missing objects on the primary, we can
make use of auth_log_shard to restore client I/O quickly.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
* refs/pull/23439/head:
qa: whitelist cap revoke warning
doc: document cap revoke non-responders client eviction
test: validate client eviction for cap revoke non-responders
mds: add counter for tracking cap non-responding clients
mds: evict clients that do not respond to cap revoke by MDS
mds: pass timeout argument for fetching late clients
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Zheng Yan <zyan@redhat.com>
* Check if geom_gate can be loaded before doing the actual tests;
otherwise continuing does not make sense.
The major reason for this problem is a mismatch between
kernel and module versions.
* After FreeBSD kernel version 1200078 ggate resizing is possible,
so set the flag that resizing can be tested.
* Only use sudo for commands that really need it;
rbd-ggate list is available in regular user mode.
* Be a bit more verbose during testing and list the test purpose.
* list-mapped is an option in rbd-nbd, not (yet) in rbd-ggate.
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
Callers of get_python_path were not passing in a $1 parameter, so
ceph_lib was an empty string, resulting in an invalid path to the built
Cython modules. Assume this is called from the `lib` parent directory.
Also pass the path to the manager modules when starting ceph-mgr.
Signed-off-by: Noah Watkins <nwatkins@redhat.com>
* refs/pull/23240/head:
qa/suites/rados, qa/workunits/rados: Add suite/workunit for ceph-crash
add ceph-crash service
common/options: enable mgr 'crash' module by default
global/signal_handler: add 'done' file to signal crashdump is ready
Reviewed-by: Sage Weil <sage@redhat.com>
mgr/dashboard: Add backend support for changing dashboard configuration settings via the REST API
Reviewed-by: Ricardo Marques <rimarques@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
Generally the slow warnings we get are just over the threshold. These warnings
are related to deploying multiple Ceph daemons side-by-side. Let's see how we
do with two minutes.
Ignoring the warnings entirely is unsatisfactory, as they serve as a useful
canary in the coal mine when ops exceed some unreasonably large
amount of time.
Fixes: http://tracker.ceph.com/issues/26900
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Enables changing (setting/unsetting) values of dashboard settings using
the REST API.
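As a rough illustration (the endpoint path, payload shape, authentication header,
and setting name below are assumptions, not the definitive interface), setting and
unsetting a value could look like this from a REST client:

```python
# Illustrative sketch only: '/api/settings/<name>', the JSON payload, the auth
# header and the setting name are all assumed for the example.
import requests

BASE = "https://mgr.example.com:8443/api"
HEADERS = {"Authorization": "Bearer <token>"}  # hypothetical session token

# Set (override) a dashboard setting.
requests.put(f"{BASE}/settings/some-setting", json={"value": "new-value"},
             headers=HEADERS, verify=False)

# Unset it again so it falls back to its default.
requests.delete(f"{BASE}/settings/some-setting", headers=HEADERS, verify=False)
```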
Fixes: https://tracker.ceph.com/issues/24273
Signed-off-by: Patrick Nawracay <pnawracay@suse.com>
* refs/pull/23471/head:
mon/PGMap: fix spacing around pretty-printed SI units
include/types: render SI units adjacent to number
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: João Eduardo Luís <joao@suse.de>
The default set of packages to install is in
$suite/qa/packages/packages.yaml. See get_package_list() in
teuthology/teuthology/task/install/__init__.py for how we prepare a
package list for the install task.
For running python3 tests in
fs/basic_functional/tasks/volume-client, we need to install
python3-cephfs. Please note that
_package_override() in teuthology/teuthology/task/install/rpm.py will
take care of the different naming on centos/rhel, where the python3
packages are named python34-*.
Signed-off-by: Kefu Chai <kchai@redhat.com>
This reverts commit c1efd59f61
task.install.rpm installs packages listed in
$suites/qa/packages/packages.yaml, and the package list applies to the
upgrade tests as well. But we don't have python3 bindings packages in jewel
-- they were introduced in kraken.
Signed-off-by: Kefu Chai <kchai@redhat.com>
An ugly workaround for a Python dependency conflict that has broken the
rgw/tempest suite. This allows us to preserve the pinned versions of
keystone/tempest without having to maintain a fork of the keystone
repository.
Fixes: http://tracker.ceph.com/issues/23659
Signed-off-by: Casey Bodley <cbodley@redhat.com>
radosgw now uses 512 frontend threads by default, and valgrind won't
start with its default --max-threads=500
Fixes: http://tracker.ceph.com/issues/25214
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Drop unused suites, which at the moment means all of them except upgrade/luminous-x,
which recently got a cleanup in https://github.com/ceph/ceph/pull/23162
Signed-off-by: Nathan Cutler <ncutler@suse.com>
'policy show' returns a JSON-encoded representation of
RGWAccessControlPolicy, while key.get_xml_acl() returns
RGWAccessControlPolicy_S3 encoded as XML. So even with '&format=xml',
the strings won't match.
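One way to make the comparison meaningful (a sketch, not the workunit's actual fix;
the JSON keys and XML layout below are assumptions) is to parse both representations
and compare what they describe, e.g. the owner:

```python
# Sketch only: the JSON field names and XML structure are assumed, not the
# exact RGWAccessControlPolicy / S3 AccessControlPolicy encodings.
import json
import xml.etree.ElementTree as ET

S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"

def owners_match(policy_json, acl_xml):
    # 'policy show' side: JSON-encoded RGWAccessControlPolicy (assumed layout).
    json_owner = json.loads(policy_json)["owner"]["id"]
    # key.get_xml_acl() side: S3 AccessControlPolicy XML (assumed layout).
    xml_owner = ET.fromstring(acl_xml).find(f"{S3_NS}Owner/{S3_NS}ID").text
    # Comparing parsed values is meaningful; comparing the raw strings never is.
    return json_owner == xml_owner
```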
Signed-off-by: Casey Bodley <cbodley@redhat.com>
result.json() throws a 'JSONDecodeError: Expecting value: line 1 column 1'
for requests that return no body, such as 'user rm', 'key rm', 'subuser
rm', 'bucket unlink', etc.
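A small guard along these lines (a sketch, not necessarily the exact change) avoids
decoding an empty body:

```python
# Sketch: only call .json() when the response actually carries a body.
def parse_body(result):
    if not result.content:      # e.g. 'user rm', 'key rm' return nothing
        return None
    return result.json()
```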
Signed-off-by: Casey Bodley <cbodley@redhat.com>
* Assert `pg_placement_num` has the same value as `pg_num`.
* Only set `application_metadata` if it is not None.
* `osd pool set` only accepts strings.
* Sync `pgp_num` with `pg_num` (see the sketch below).
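A minimal sketch of what that means in practice (the helper below and its use of the
`ceph` CLI via subprocess are illustrative assumptions):

```python
# Illustrative sketch: 'osd pool set' only takes string values, and pgp_num
# has to follow pg_num so that placement matches the new PG count.
import subprocess

def set_pg_num(pool, pg_num):
    for key in ("pg_num", "pgp_num"):                    # keep pgp_num in sync
        subprocess.check_call(
            ["ceph", "osd", "pool", "set", pool, key, str(pg_num)])  # strings only
```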
Signed-off-by: Stephan Müller <smueller@suse.com>
Avoid the need for each module to expose a self-test
command: they can just implement the method,
and then get it called via the selftest module.
As well as fewer LOC, this means that the self-test
commands are not cluttering the interface
for end users, as they're invisible until
the selftest module is loaded.
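In rough terms (the method name and the data fetched here are illustrative, not
necessarily the exact ones introduced), a module opts in simply by implementing
the method:

```python
# Sketch: a manager module exposes a self-test hook as a plain method; the
# selftest module discovers and calls it, so nothing extra appears in the
# user-facing command interface.
from mgr_module import MgrModule  # assumed import path

class Module(MgrModule):
    def self_test(self):
        # Exercise some internal state; raise on failure.
        assert self.get("mon_map") is not None
```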
Signed-off-by: John Spray <john.spray@redhat.com>
This is being done by passing native CPython objects
back and forth. It's safe because sub-interpreters in CPython
share memory allocation infrastructure and share the GIL.
With a view to PEP 554, we limit inter-interpreter calls
to pickleable objects, so that this may be implemented
using byte arrays in the future.
This infrastructure should enable:
- the dashboard to display the status of other modules, for
example the set of progress indicators from `progress`
- dashboard and restful to share an underlying long running
job mechanism.
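For example (a hypothetical sketch; the remote-call helper's name and the target
method are assumptions), the dashboard could pull another module's state directly
and only ever receive pickleable values back:

```python
# Hypothetical sketch: ask another module for its state and get back plain,
# pickleable Python objects (here, a list of dicts describing progress events).
from mgr_module import MgrModule  # assumed import path

class Module(MgrModule):
    def serve(self):
        events = self.remote("progress", "get_progress_events")  # assumed method
        for ev in events:
            self.log.info("progress event: %s", ev)
```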
Signed-off-by: John Spray <john.spray@redhat.com>
This fixes errors caused by the remount done by some tests (test_recovery_pool.py)
where the fs name is not given.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The MDS may not be on the same machine where the cluster command is run.
Fixes: http://tracker.ceph.com/issues/24858
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/21885/head:
qa: update cluster log health warning message
qa: add tests for client features
mds: evict clients that lack required features
mds: cleanup MDSRank::evict_client
mds: infer client version by client metadata and connection's features
mds: introduce "ceph fs set <fs_name> min_compat_client <release_name>"
mds: tell client why it's rejected
mds: introduce cephfs' own feature bits
mds: make Server::prepare_force_open_sessions() update client metadata
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Having lots of deletes will mean deletes on objects that don't exist,
which will in turn mean error log entries and more coverage of the
append_log_entries_update_missing code. Hopefully this will trigger
http://tracker.ceph.com/issues/24597
Signed-off-by: Sage Weil <sage@redhat.com>
The log trimming case wasn't quite right. Before HEAD^ we were
rolling forward too aggressively and miscalculating the can_rollforward_to,
which affected the trim_to calculation.
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/22455/head:
qa/ceph-volume: add a test for put_object_versioned()
ceph-volume-client: allow atomic updates for RADOS objects
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Select force peering on a single PG. In reality this probably induces
*2* interval changes.
Note that in the case of a single-OSD cluster we can't actually force a
repeer on a single PG, because the pg_temp code is pretty robust about
filtering out redundant or meaningless changes: we can't pg_temp our
way into a new interval if there are no other OSDs to switch to, and the
code also prevents an empty pg_temp.
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/22740/head:
qa: create common conf for all cephfs suites
qa: remove wrongly created random distro conf
Reviewed-by: Zheng Yan <zyan@redhat.com>
This will be followed by removing the common CephFS configurations from the
ceph.conf.template in teuthology.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/22165/head:
qa: add one-off clusters to qa/cephfs/clusters
qa: allocate more space for VM disk
qa/cephfs/clusters/*: bigger cinder volumes
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Performance evaluations of medium to large size Ceph clusters have
demonstrated negligible performance impact from unnecessarily deep
directory hierarchies but significant performance impact from filestore
split and merge activity. Disable merges by default.
Fixes: http://tracker.ceph.com/issues/24686
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
* refs/pull/22725/head:
qa/workunits/suites/blogbench.sh: use correct dir name
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
This utilizes the recent feature in teuthology [1] to skip hidden files in
suites when building the job matrix.
The idea of this change is to enable referring to the top-level qa directory in a
position-independent way, such that copies of a suite to another location do not
break any symlinks.
[1] https://github.com/ceph/teuthology/pull/1185
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mgr/dashboard: Add support for URI encode
Reviewed-by: Ricardo Dias <rdias@suse.com>
Reviewed-by: Ricardo Marques <rimarques@suse.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
Created a decorator and pipe to help encode special URI components in the
frontend.
Modified the backend request handler to decode all the string args.
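The backend half of that could be sketched as follows (the wrapper is illustrative;
it is not the dashboard's actual request machinery):

```python
# Illustrative sketch: decode every string argument before the controller
# method sees it, mirroring the encoding done by the frontend decorator/pipe.
from urllib.parse import unquote

def decode_string_args(handler):
    def wrapper(*args, **kwargs):
        args = [unquote(a) if isinstance(a, str) else a for a in args]
        kwargs = {k: (unquote(v) if isinstance(v, str) else v)
                  for k, v in kwargs.items()}
        return handler(*args, **kwargs)
    return wrapper
```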
Fixes: http://tracker.ceph.com/issues/24621
Signed-off-by: Tiago Melo <tmelo@suse.com>
If ulimit is set to a value of 1024, ceph-osd will segfault with the
following error:
filestore(td/smoke/0) error (24) Too many open files not handled on operation 0x55565d1fd004 (2182.1.0, or op 0, counting from 0)
This patch ensures that a valid ulimit value is in place before ceph daemons are set up in tests.
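The helper itself lives in the shell test scaffolding; the idea, sketched here in
Python with the standard `resource` module (the threshold value is illustrative),
is simply to verify and, if possible, raise the open-file limit before any daemon
starts:

```python
# Sketch: make sure RLIMIT_NOFILE is comfortably above 1024 before spawning
# ceph daemons, raising the soft limit up to the hard limit if needed.
import resource

MIN_OPEN_FILES = 4096  # illustrative threshold, not the value used by the tests

def ensure_ulimit():
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < MIN_OPEN_FILES:
        new_soft = (MIN_OPEN_FILES if hard == resource.RLIM_INFINITY
                    else min(MIN_OPEN_FILES, hard))
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
```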
Signed-off-by: Erwan Velu <erwan@redhat.com>
get_timeout_delays() is a generic function to compute delays for a long
period of time without saturating the CPU in busy loops.
It works pretty well when the delay is short, producing the following
series when requesting a 20-second timeout: "0.1 0.2 0.4 0.8 1.6 3.2 6.4 7.3".
Here the maximum delay between two loops is 7.3 seconds, which is perfectly fine.
When the timeout reaches 300 seconds, the same code produces the following
series: "0.1 0.2 0.4 0.8 1.6 3.2 6.4 12.8 25.6 51.2 102.4 95.3"
In this example some delays are nearly 2 minutes!
That is not efficient: the expected event could arrive just after such a
long sleep starts, turning it into a minute-plus sleep for nothing.
On a local system that could be acceptable, but on a CI where all jobs
run this way, the overall effect is pretty inefficient, generating useless
waits.
This patch adds a maximum acceptable delay time between two
loops while keeping the same ramp-up behavior.
On the same 300 seconds delay example, with MAX_TIMEOUT set to 10, we
now have the following series: "0.1 0.2 0.4 0.8 1.6 3.2 6.4 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 7.3"
We can see that the long 12/25/51/102/95 values vanish and are
replaced by a series of 10-second delays. It's up to each test to choose a
cap that matches how likely the expected event is to complete soon.
MAX_TIMEOUT is set to 15 seconds.
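The ramp-up-with-cap logic can be sketched as follows (a Python rendering of the
idea; the real helper is a shell function in the test scaffolding):

```python
# Sketch: exponentially growing sleep intervals, capped at max_timeout, whose
# sum equals the requested timeout (the final entry is the remainder).
def timeout_delays(timeout, base=0.1, max_timeout=None):
    delays, total, step = [], 0.0, base
    while total + step < timeout:
        delays.append(step)
        total += step
        step *= 2
        if max_timeout is not None:
            step = min(step, max_timeout)
    delays.append(round(timeout - total, 10))  # remainder so the sum hits timeout
    return delays

# timeout_delays(20)                  -> [0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 7.3]
# timeout_delays(300, max_timeout=10) -> 0.1 ... 6.4, then a run of 10s, then 7.3
```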
Signed-off-by: Erwan Velu <erwan@redhat.com>
Fixes: http://tracker.ceph.com/issues/24436
To fully support the role-based authentication/authorization system it is necessary to replace the RGW proxy controller with separate controllers for RGW user and bucket.
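In rough terms (purely illustrative class and route names; this is not the
dashboard's actual controller framework), the split replaces one generic
pass-through with explicit per-resource controllers that role-based scopes can be
attached to:

```python
# Purely illustrative: separate, explicit controllers instead of a single
# proxy forwarding arbitrary RGW admin calls, so permissions can be scoped
# per resource (user vs. bucket).
class RgwUser:
    def list(self):      ...   # e.g. GET /api/rgw/user
    def get(self, uid):  ...   # e.g. GET /api/rgw/user/<uid>

class RgwBucket:
    def list(self):       ...  # e.g. GET /api/rgw/bucket
    def get(self, name):  ...  # e.g. GET /api/rgw/bucket/<name>
```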
Signed-off-by: Volker Theile <vtheile@suse.com>
Avoid sporadic failures in combination with msgr-failures/many.yaml,
where assert_locked() might take over 10 seconds.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>