Separate diskprediction local cloud from the diskprediction plugin.
Devicehealth invoke device prediction function related on the global
configuration "device_failure_prediction_mode".
Signed-off-by: Rick Chen <rick.chen@prophetstor.com>
before this change, we assume that the variable set if rados::radospp is
found will be radospp_FOUND, but this is a feature cmake 3, see
https://cmake.org/cmake/help/v3.3/module/FindPackageHandleStandardArgs.html
while the cmake shipped by centos is cmake 2.8.12, where the variable
name will be <UPPERCASED_NAME>_FOUND, see
https://cmake.org/cmake/help/v2.8.12/cmake.html#module:FindPackageHandleStandardArgs
in the test of test_envlibrados_for_rocksdb.sh, we are using cmake not
the cmake3 offered by EPEL7, so RADOSPP_FOUND will be set instead. that's why
executable env_librados_test will fail to link against rados::radospp.
as rados::radospp won't be defined if radospp_FOUND is not defined/set.
after this change, the 2nd mode of FIND_PACKAGE_HANDLE_STANDARD_ARGS()
is used instead to ensure that radospp_FOUND is defined even if cmake
2.8.12 is used.
also, the message() commands for debugging purpose are removed.
Signed-off-by: Kefu Chai <kchai@redhat.com>
we use the playbook of "testnodes.yml" defined by ceph-cm-ansible for
initializing test nodes, and the role of "testnode" is used by
testnodes.yml. "testnode" requires "qemu-system-x86" or "qemu-kvm"
package to be installed. the qemu in turn depends on librbd1 and
librados2.
before librados3 was introduced, this worked perfectly. because in ceph
repo, qa/packages/packages.yaml defines the default set of packages the
"install" tasks should install. and in that yaml file, librados2 was
listed. so the package management system will overwrite the librados2
installed by ansible playbook with the version specified by the
"install" task, as apt/yum thinks this is what user requires explicitly,
so it's fine to install a different version of librados2.
after librados3 was introduced, librados2 was removed from
qa/packages/packages.yaml. because, by default, we need to install
librados3 instead of librados2 for ready a nautilus cluster. but the
problem is, the packge list also applies to "install" tasks installing
releases before nautilus, where we still need to replace the librados2
installed by ansible.
so, to address this issue, "librados2" is added to "extra_packages" of
the "install" tasks of tests installing old releases to install
librados2 explicitly instead of as a dependency of other ceph packages
like librbd1.
Signed-off-by: Kefu Chai <kchai@redhat.com>
The new info endpoint will provide the frontend with the necessary
information it needs to create new profiles.
Fixes: https://tracker.ceph.com/issues/25156
Signed-off-by: Stephan Müller <smueller@suse.com>
Use --rmtype snapmap with new obj16 to remove snapmap only, check for repair message
Use --rmtype nosnapmap to remove obj5 while leaving snapmap behind
Signed-off-by: David Zafman <dzafman@redhat.com>
* refs/pull/24828/head:
qa/osd-bluefs-volume-ops: use ceph-bluestore-tool for fsck
qa/osd-bluefs-volume-ops: reduce space usage for the test case
Reviewed-by: David Zafman <dzafman@redhat.com>
mgr/dashboard: tasks.mgr.dashboard.test_osd.OsdTest failures
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Patrick Nawracay <pnawracay@suse.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
we have switched from tmap to omap long ago.
but keep the server side implementation around, in case ancient
client is still using these tmap APIs.
also, tmap_update() is kept, because librbd is using it for v1 image
backward compatibility.
Signed-off-by: Kefu Chai <kchai@redhat.com>
- Fix bug in Dashboard QA unit test framework. Don't set the application type header manually, this is done by the requests library if required.
- Enhance QA unit test helper: Print the response of the API request if it fails. This should help to identify the problem more easily.
- Fix bug in the OSD controller. A parameter needs to be converted to integer.
- Take care that the params of the request object are not modified.
The issue was introduced by PR https://github.com/ceph/ceph/pull/24475. The CherryPy json_in plugin disclosed the errorneous unit test helper implementation.
Fixes: https://tracker.ceph.com/issues/36708
Signed-off-by: Volker Theile <vtheile@suse.com>
It is particularly useful when running multiple rbd-mirror instances
in Active-Passive or Active-Active mode.
Signed-off-by: Mykola Golub <mgolub@suse.com>
rgw: Return tenant field in bucket_stats function
Reviewed-by: Lenz Grimmer <lgrimmer@suse.com>
Reviewed-by: Matt Benjamin <mbenjami@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
include a patch so rocksdb can use libradospp instead of librados. will
upstream the patch and make it work for both pre-nautilus librados and
nautilus libradospp
Signed-off-by: Kefu Chai <kchai@redhat.com>
the goal is to decouple C++ API from C API, and to version them
differently, as they are targeting different consumers.
this allows us to change the C++ API and bumping up its soversion
without requiring consumer to recompile the librados client for
using the new librados. in this way, C++ API can move faster than
C API. for example, if bufferlist interface is changed for better
performance, and this breaks existing API/ABI, we can bump up
the C++ library's soversion, and and the C library's version unchanged
but ship the new librados's C binding. so the librados client linked
against librados's C library will be able to take advantage of
the improvement in C++ library. while the librados client
linked against C++ library won't break at runtime due to unresolved
symbol or changed structure layout.
this is massive change, the genereal idea is to
* split librados.cc into two source files: librados_c.cc and
librados_cxx.cc, the former for implementing C APIs, the later
for C++ APIs.
* extract the C++ API in librados into librados-cxx, the library
name will be libradospp. but we can change it before nautilus
is released.
* link these librados libraries with static libraries which it
depends on, so "-Wl,--exclude-libs,ALL" link flags can help
hide the non-public symbols.
* extract the tests exercising librados' C++ API into a different
source file named *_cxx.cc. for instance, to move the C++ tests
in aio.cc into aio_cxx.cc
* extract the shared helper functions which do not use any librados
or librados-cxx APIs into test_shared{.cc,h}. the "shared" here
means, *shared* by C++ and C tests.
* extract the test fixtures, i.e., the subclasses of testing::Test,
for testing C++ APIs into testcase_cxx.cc.
* update qa/workunits/rados/test.sh accordingly to add the splitted
tests
* update the consumers of librados to link against librados-cxx
instead, if they are using the C++ API.
Signed-off-by: Kefu Chai <kchai@redhat.com>
* refs/pull/24809/head:
os/bluestore: omit redundant '/' in OSD path for ceph-bluestore-tool if
os/bluestore: improve error handling for migrate ops in
qa/standtalone/osd-bluefs-volume-ops: remove redundant code.
Reviewed-by: Sage Weil <sage@redhat.com>
* refs/pull/24787/head:
Merge PR #24796 into nautilus
osd: fix heartbeat_reset unlock
Merge PR #24780 into nautilus
Merge PR #24761 into nautilus
Merge PR #24651 into nautilus
osd: fix race between op_wq and context_queue
test: Make sure kill_daemons failure will be easy to find
test: Add flush_pg_stats to make test more deterministic
Python 3.7 now shows a warning as below.
/usr/bin/ceph:128: DeprecationWarning: Using or importing the ABCs from
'collections' instead of from 'collections.abc' is deprecated, and in
3.8 it will stop working
import rados
This patch addresses the that particular issue.
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
* refs/pull/24651/head:
test: Make sure kill_daemons failure will be easy to find
test: Add flush_pg_stats to make test more deterministic
Reviewed-by: Neha Ojha <nojha@redhat.com>
This is related to http://tracker.ceph.com/issues/36453. It is far from
a complete solution, but seems like a positive move.
I tested this change by first disabling my browser cache, and then used
the /docs endpoint to query /api/dashboard/health. Before compression:
Content-Length: 60748
Time: 615ms
After:
Content-Length: 7505
Time: 92ms
Then, I logged into the dashboard as normal and reloaded the page once I
was in. Some values for the reload operation before compression:
Total page load time: 58.48s
vendor.js Content-Length: 6486025
vendor.js time: 48.09s
After:
Total page load time: 14.55s
vendor.js Content-Length: 1143178
vendor.js time: 4.50s
Signed-off-by: Zack Cerza <zack@redhat.com>
This fixes "TypeError: admin_socket() got an unexpected keyword argument
'timeout'". The value is never used.
Signed-off-by: Zack Cerza <zack@redhat.com>
If there is a workunit task associated with the same client, the two
tasks will attempt to clone the suite repo to the same directory.
Worse, if it's parallel tasks, the two clones will clobber each
other.
Fixes: http://tracker.ceph.com/issues/36542
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 5d56014c61)
If there is a workunit task associated with the same client, the two
tasks will attempt to clone the suite repo to the same directory.
Worse, if it's parallel tasks, the two clones will clobber each
other.
Fixes: http://tracker.ceph.com/issues/36542
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
For EC pools we have a lot of shards, and 30% probability on each one
means we are very like to repeatedly fail backfill reservations.. long
enough that teuthology gives up waiting.
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/24359/head:
qa/tests: update ansible version to 2.6 for master branch testing.
qa/tests: use lvm as default for ceph-ansible testing, this should also work with raw devices
Reviewed-by: Alfredo Deza <adeza@redhat.com>
* refs/pull/24292/head:
qa: add test for rctime on root inode
mds: set rctime on new system inode
mds: small refactor
Reviewed-by: Zheng Yan <zyan@redhat.com>
This reverts a27fd9d25c and
b863883ca7.
Quote form Sébastien Han:
> IIRC at some point, we were able to create a device class from the CLI.
Now it seems that the device class gets created when at least one OSD
of a particular class starts.
In ceph-ansible, we create pools after the initial monitors are up and
we want to assign a device crush class on some of them.
That's not possible at the moment since there no device class available yet.
Also, someone might want to create its own device class.
Something as crazy as running Filestore with a tmpfs osd store and
might want to isolate them.
I know it's a very limited use case, but still, it could be desired.
See also https://www.spinics.net/lists/ceph-devel/msg41152.html
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
This is shown to corrupt otherwise healthy rocksdb databases. Rename to
make it clear that it is generally not safe to run and shoud only be used
as a last resort.
Signed-off-by: Sage Weil <sage@redhat.com>
This makes it easier to re-run tests against a suite branch without
requiring a full ceph-ci build and repo.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
* refs/pull/23069/head:
tests/libcephfs: add simple reclaim test
mds: check auth name before reclaiming session
mds: reclaim session before allowing mds to become active
mds: allow client to specify its session timeout
mds: initial code for client states reclaim
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Apparently 15m is not long enough for some workunits like fsstress.
Fixes: http://tracker.ceph.com/issues/36365
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
It is now commented out like it was before,
but I've added a comment what happened during this test with the QA
system. The problem was that even with only a increase of 1 PG the QA
cluster went into a cluster warning state and did not recover in time.
The QA coverage timeout is 2 minutes.
I could not reproduce this behavior with a local cluster, but I've
added a loop to wait until pgp and pg number are equal and the cluster
is in a healthy state again. This can take locally about 5 seconds.
The internal loop has a timeout of 3 minutes.
Fixes: https://tracker.ceph.com/issues/36362
Signed-off-by: Stephan Müller <smueller@suse.com>
The dashboard backend can now unset all set compression arguments if the
compression mode is switched to 'unset'. In the case of 'unset' Ceph
itself will only delete the 'compression_mode' argument, not all other
set arguments. The other arguments that should be removed, too, are
added to the update arguments in order to delete all set arguments.
Fixes: https://tracker.ceph.com/issues/36355
Signed-off-by: Stephan Müller <smueller@suse.com>
Refactor '_get_mon_allow_pool_delete_config' method to be a little bit
more general. The method can now be used to get the value of every
config option known to the cluster.
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
expect_fail incorrectly unset '-e' option and if a consequent test
failed it did not abort the execution. And two typos in the namespace
tests were not detected due to this.
Signed-off-by: Mykola Golub <mgolub@suse.com>
Otherwise a bug preventing an asok operation from completing will cause the
entire job to fail.
Fixes: http://tracker.ceph.com/issues/36335
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
when enabling a module attempt to determine if it is an always on
module, and if it is, then return without waiting on the active manager
daemon to restart---which it won't if it is an always on module.
Signed-off-by: Noah Watkins <nwatkins@redhat.com>
* refs/pull/21566/head:
test: add test for mds drop cache command
mds: command to trim mds cache and client caps
mds: implement journal flush as asynchronous context execution
mds: cleanup some asok commands
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
If there is a bug preventing rm from completing, the workunit will get stuck.
Fixes: http://tracker.ceph.com/issues/36184
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Otherwise QA sits forever waiting for the kclient to umount when there is a
problem.
Fixes: http://tracker.ceph.com/issues/36184
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Otherwise the command will hang if the mount is broken.
Fixes: http://tracker.ceph.com/issues/36184
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
in b38b8e980c, we changed the upper
limit of size of `config key` 's value to 64k, so we need to update
the test accordingly.
Fixes: http://tracker.ceph.com/issues/36260
Signed-off-by: Kefu Chai <kchai@redhat.com>