In b38b8e980c, we changed the upper
limit on the size of a `config key` value to 64k, so we need to update
the test accordingly.
Fixes: http://tracker.ceph.com/issues/36260
Signed-off-by: Kefu Chai <kchai@redhat.com>
Sporadically the rbd-mirror fsx stress test would fail due to very
slow sync times due to overloaded clusters. Attempt to wait for all
images to be replicated before proceeding with the comparison.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
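The wait-before-compare idea in the rbd-mirror fsx change above can be sketched as a simple polling loop. This is only an illustration under assumptions: `wait_until` and `all_images_replicated` are hypothetical names, and the replication check here just compares image name lists rather than querying rbd-mirror status.

```python
import time


def wait_until(predicate, timeout=300.0, interval=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def all_images_replicated(primary_images, secondary_images):
    # Hypothetical check: every image on the primary also exists
    # on the secondary cluster.
    return set(primary_images) <= set(secondary_images)
```

A test would call `wait_until(lambda: all_images_replicated(...))` and only start comparing image contents once it returns True.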
* refs/pull/23187/head:
test: make rank argument mandatory when running journal_tool
cephfs-journal-tool: make "--rank" argument mandatory
cephfs-journal-tool: pass local arg vector for Journal actions
cephfs-journal-tool: dump to per rank output file wherever necessary
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/23530/head:
qa/vstart_runner: fix daemons list
PendingReleaseNotes: note multifs support in libcephfs
test/cephfs: add pybind test for mount_root
pybind/cephfs: enable passing filesystem name to mount
libcephfs: add ceph_select_filesystem
common: add doc strings to client_mds_namespace
client: allow passing fs name to mount()
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Conflicts:
PendingReleaseNotes
We can't (easily) build updated hammer packages, but all this sh script does
is run this one test binary with --gtest_filter arguments, so just do
it directly and skip the test explicitly here. (Newer versions of the .sh
understand the environment variable but the hammer version does not.)
Fixes: http://tracker.ceph.com/issues/36104
Signed-off-by: Sage Weil <sage@redhat.com>
Two instances of fsstress clobber each other. Just build it in the local sandbox.
Fixes: http://tracker.ceph.com/issues/24177
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Specifically fixes the recurring `test_osd.py` error in the
`test_scrub` method. But this change should also prevent other issues of
the same kind. Issues of the "same kind" are issues which occur when
tests do not immediately result in a clean cluster status and
aren't explicitly programmed to wait for it.
Fixes: http://tracker.ceph.com/issues/36107
Signed-off-by: Patrick Nawracay <pnawracay@suse.com>
* refs/pull/23985/head:
ceph-objectstore-tool: add back pool dne check
qa/suites/rados/singleton/reg11184: remove old test
ceph-objectstore-tool: import pg at original epoch
osd: handle null pg slot on startup
ceph-objectstore-tool: drop support for ancient export files
osd: avoid dropping osd_lock when pg osdmaps are not laggy
qa/standalone/osd/pg-merge.sh: add merge vs pg import test
Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Also, fix a bunch of quirky journal_tool invocations that pass
"--rank" argument as the command argument rather than passing it
as function argument.
Fixes: https://tracker.ceph.com/issues/24780
Signed-off-by: Venky Shankar <vshankar@redhat.com>
This bug was about filtering missing and divergent when doing a partial
PG import. We don't support partial PG imports any more, so this can
go away!
Signed-off-by: Sage Weil <sage@redhat.com>
- In the jewel era, we fast-forwarded the PG to the OSD's latest epoch
and cleared past_intervals.
- In mimic, as of 2347ecb961, we brought the
PG up to date while updating past_intervals. (At the same time we removed
the OSD's parallel past_intervals regeneration.)
The problem is that the tool then has to reimplement the past_intervals
update logic, and *also* has to cope with splits and merges. Splits are
somewhat easier (until now we have allowed partial import of a PG into a split
child), but merges are not so easy.
This patch changes it so we import the PG and leave the pg_epoch matching
the import file. The OSD is then responsible for bringing it up to date
with the latest map, and dealing with any intervening splits or merges.
We also adjust the safety check to ensure that we don't collide with
any existing PG, either a child we eventually split into, or a parent
we eventually merge into.
Fixes: http://tracker.ceph.com/issues/35955
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/24143/head:
qa/workunits/cephtool/test_kvstore_tool.sh: run test in ., not /tmp
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Updated integration tests to check data from new python code
Fixes: https://tracker.ceph.com/issues/24573
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
5s -> 5m to give us more leeway for when the mons are thrashing.
Also, *only* set this timeout when we expect a timeout. If we don't,
wait forever.
Signed-off-by: Sage Weil <sage@redhat.com>
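The timeout policy described above (a five-minute limit only when a timeout is expected, otherwise wait forever) might look like the following sketch. The helper names are hypothetical and this is not the actual test code; it relies only on the fact that `subprocess.run` treats `timeout=None` as "no limit".

```python
import subprocess

FIVE_MINUTES = 5 * 60  # seconds


def pick_timeout(expect_timeout):
    """Return a timeout only when the caller expects the command to hang.

    None means "wait forever", which is how subprocess.run treats it.
    """
    return FIVE_MINUTES if expect_timeout else None


def run(argv, expect_timeout=False):
    # Hypothetical wrapper: report whether the command finished or timed out.
    try:
        subprocess.run(argv, check=True, timeout=pick_timeout(expect_timeout))
    except subprocess.TimeoutExpired:
        return "timed out"
    return "completed"
```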
This was missing a cluster name prefix that
was added at some point, and consequently
calls to iter_daemons_of_role were returning
no daemons.
This was causing e.g. TestVolumeClient.test_data_isolated
to fail when run in vstart_runner.
Signed-off-by: John Spray <john.spray@redhat.com>
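A toy model of the failure mode described above: if lookups are keyed by a cluster-prefixed role but the registry is populated without the prefix, every lookup comes back empty. The `daemons_of_role` function and the `"<cluster>.<role>"` key layout are assumptions made for illustration, not the real teuthology or vstart_runner data structures.

```python
def daemons_of_role(daemons, role, cluster="ceph"):
    """Return the sorted daemon ids registered under '<cluster>.<role>'."""
    return sorted(daemons.get(f"{cluster}.{role}", {}))


# Registry populated without the cluster prefix: lookups find nothing.
registry_without_prefix = {"mds": {"a": object(), "b": object()}}

# Registry populated with the prefix: lookups find the daemons.
registry_with_prefix = {"ceph.mds": {"a": object(), "b": object()}}
```

With these inputs, `daemons_of_role(registry_without_prefix, "mds")` is empty while `daemons_of_role(registry_with_prefix, "mds")` lists both daemons.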
- You can't import the source half of a PG that's since merged. Sorry! We
could implement this later.
- You can import the target half, but the result will then be incomplete,
and you rely on backfill to clean it up.
- Map gaps don't affect this behavior.
Signed-off-by: Sage Weil <sage@redhat.com>
This module is written by Rick Chen <rick.chen@prophetstor.com> and
provides both a built-in local predictor and a cloud mode that queries
a cloud service (provided by ProphetStor) to predict device failures.
Signed-off-by: Rick Chen <rick.chen@prophetstor.com>
Signed-off-by: Sage Weil <sage@redhat.com>
On a quick look at the source code, I noticed this binary file, which
looks like it was committed by mistake.
Signed-off-by: Cleber Rosa <crosa@redhat.com>
* refs/pull/23845/head:
osd/OSDMap: include age in up and in counts for ceph status
mon/OSDMonitor: set new_last_{up,in}_change
osd/OSDMap: store last_up_change and last_in_change
mgr/MgrMap: include mgr age in map printer
mon/MgrMap: track active_changed timestamp
mon: include mon quorum age in status
include/utime: add utimespan_str helper
Reviewed-by: John Spray <john.spray@redhat.com>
Grep from the primary's log, not every osd's log.
For the backfill_remapped task in particular, after the pg_temp change it
just so happens that the primary changes across the pool size change and
thus two different primaries do (some) backfill. Fix that test to pass
the correct primary.
Other tests are unaffected as they do not (happen to) trigger a primary
change and already satisfied the (removed) check that only one OSD does
backfill.
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/20469/head:
osd/PG: remove warn on delete+merge race
osd: base project_pg_history on is_new_interval
osd: make project_pg_history handle concurrent osdmap publish
osd: handle pg delete vs merge race
osd/PG: do not purge strays in premerge state
doc/rados/operations/placement-groups: a few minor corrections
doc/man/8/ceph: drop enumeration of pg states
doc/dev/placement-groups: drop old 'splitting' reference
osd: wait for laggy pgs without osd_lock in handle_osd_map
osd: drain peering wq in start_boot, not _committed_maps
osd: kick split children
osd: no osd_lock for finish_splits
osd/osd_types: remove is_split assert
ceph-objectstore-tool: prevent import of pg that has since merged
qa/suites: test pg merging
qa/tasks/thrashosds: support merging pgs too
mon/OSDMonitor: mon_inject_pg_merge_bounce_probability
doc/rados/operations/placement-groups: update to describe pg_num reductions too
doc/rados/operations: remove reference to lpgs
osd: implement pg merge
osd/PG: implement merge_from
osdc/Objecter: resend ops on pg merge
osd: collect and record pg_num changes by pool
osd: make load_pgs remove message more accurate
osd/osd_types: pg_t: add is_merge_target()
osd/osd_types: pg_t::is_merge -> is_merge_source
osd/osd_types: adding or substracting invalid stats -> invalid stats
osd/PG: clear_ready_to_merge on_shutdown (or final merge source prep)
osd: debug pending_creates_from_osd cleanup, don't use cbegin
ceph-objectstore-tool: debug intervals update
mgr/ClusterState: discard pg updates for pgs >= pg_num
mon/OSDMonitor: fix long line
mon/OSDMonitor: move pool created check into caller
mon/OSDMonitor: adjust pgp_num_target down along with pg_num_target as needed
mon/OSDMonitor: add mon_osd_max_initial_pgs to cap initial pool pgs
osd/OSDMap: set pg[p]_num_target in build_simple*() methods
mon/PGMap: adjust SMALLER_PGP_NUM warning to use *_target values
mon/OSDMonitor: set CREATING flag for force-create-pg
mon/OSDMonitor: start sending new-style pg_create2 messages
mon/OSDMonitor: set last_force_resend_prenautilus for pg_num_pending changes
osd: ignore pg creates when pool FLAG_CREATING is not set
mgr: do not adjust pg_num until FLAG_CREATING removed from pool
mon/OSDMonitor: add FLAG_CREATING on upgrade if pools still creating
mon/OSDMonitor: prevent FLAG_CREATING from getting set pre-nautilus
mon/OSDMonitor: disallow pg_num changes while CREATING flag is set
mon/OSDMonitor: set POOL_CREATING flag until initial pool pgs are created
osd/osd_types: add pg_pool_t FLAG_POOL_CREATING
osd/osd_types: introduce last_force_resend_prenautilus
osd/PGLog: merge_from helper
osd: no cache agent or snap trimming during premerge
osd: notify mon when pending PGs are ready to merge
mgr: add simple controller to adjust pg[p]_num_actual
mon/OSDMonitor: MOSDPGReadyToMerge to complete a pg_num change
mon/OSDMonitor: allow pg_num to adjusted up or down via pg[p]_num_target
osd/osd_types: make pg merge an interval boundary
osd/osd_types: add pg_t::is_merge() method
osd/osd_types: add pg_num_pending to pg_pool_t
osd: allow multiple threads to block on wait_min_pg_epoch
osd: restructure advance_pg() call mechanism
mon/PGMap: prune merged pgs
mon/PGMap: track pgs by state for each pool
osd/SnapMapper: allow split_bits to decrease (merge)
os/bluestore: fix osr_drain before merge
os/bluestore: allow reuse of osr from existing collection
os/filestore: (re)implement merge
os/filestore: add _merge_collections post-check
os: implement merge_collection
os/ObjectStore: add merge_collection operation to Transaction
We currently import a portion of the PG if it has split. Merge is more
complicated, though, mainly because COT is operating in a mode where it
fast-forwards the PG to the latest OSDMap epoch, which means it has to
implement any transformations to the PG (split/merge) independently.
Avoid doing this for merge.
Signed-off-by: Sage Weil <sage@redhat.com>
Commit 0d8887652d53 ("qa/tasks/cram: use suite_repo repository for all
cram jobs") removed hardcoded git.ceph.com links, but as it turned out
it is still used for nightlies. There is no good way to accommodate
the different URL schemes, so let's get rid of URLs altogether.
Fixes: https://tracker.ceph.com/issues/27211
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
It's no longer necessary to pass `-k testing` to teuthology-suite. We're also
now regularly testing RHEL 7.5 kernel in upstream testing.
This work is prep for eventually integrating kclient into fs.
Fixes: http://tracker.ceph.com/issues/26995
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The awk invocation uses some tests that the native FreeBSD awk does not support,
like: BEGIN{print 0 < 90}
Also, TESTDIR is not set when calling ceph-helpers from smoke.sh,
so fix this by keeping the archive in /tmp.
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
Also:
- Do not print **offset** until specified
- Count missing objects correctly (used to be primary's local missing)
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
The task uses netem to emulate wide area network delay.
Provides three different configurable options.
1. standard delay: Constant delay with +/- 5ms jitter, using a normal distribution by default.
2. variable delay: Provides a delay within a given min-max range in milliseconds.
3. packet drop: Toggles packet drop and recovery at regular intervals.
Useful in simulating network delays between two clusters while testing
rgw multisite and rbd mirroring configurations.
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
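The three netem modes listed above could map onto `tc` command lines roughly as follows. This is a sketch under assumptions: the function names, the interface name, and the choice of `tc qdisc change` with a 100% loss rate for the packet-drop mode are illustrative, not taken from the task itself. Only standard `tc ... netem delay/loss` syntax is used.

```python
import random


def standard_delay_cmd(dev, delay_ms, jitter_ms=5):
    """Constant delay with +/- jitter drawn from a normal distribution."""
    return ["tc", "qdisc", "add", "dev", dev, "root", "netem",
            "delay", f"{delay_ms}ms", f"{jitter_ms}ms",
            "distribution", "normal"]


def variable_delay_cmd(dev, min_ms, max_ms, rng=random):
    """Pick a delay somewhere in [min_ms, max_ms] and apply it."""
    delay = rng.randint(min_ms, max_ms)
    return ["tc", "qdisc", "change", "dev", dev, "root", "netem",
            "delay", f"{delay}ms"]


def packet_drop_cmd(dev, loss_percent=100):
    """Drop packets; assumed to be toggled on and off by the caller."""
    return ["tc", "qdisc", "change", "dev", dev, "root", "netem",
            "loss", f"{loss_percent}%"]
```

Each function only builds the argument list; running it (for example via `subprocess.run` with root privileges) is left to the caller.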
mgr/dashboard: Add support for managing individual OSD settings in the backend
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Stephan Müller <smueller@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
Currently git.ceph.com is hardcoded for all cram jobs. Testing
modifications is a pain: one needs to push to either ceph/ceph.git or
ceph/ceph-ci.git (depending on where the ceph branch is at, triggering
unnecessary builds in the latter case) and wait for the mirror to sync.
Runs scheduled against branches in developers' forks fail.
Move away from git.ceph.com to allow mixing branches and repositories,
similar to workunits.
Fixes: https://tracker.ceph.com/issues/27211
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
mgr/dashboard: Add REST API for role management
Reviewed-by: Ricardo Dias <rdias@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
Add options to mark OSDs in/out/down/reweight/lost/remove/destroy/create
Fixes: http://tracker.ceph.com/issues/24270
Signed-off-by: Patrick Nawracay <pnawracay@suse.com>
* refs/pull/23540/head:
include/ceph_fs: rename old auid field
PendingReleaseNotes: note about auid support removal
radosgw-admin: remove -a --auth-uid arg
rgw: remove auid member from RGWUserInfo
auth: remove auid member from EntityAuth
osd: remove auid session member
mon: remove auid session member
doc/dev/cephx_protocol: drop auid reference
auth: remove auid args from handle_request and verify_authorizer
mon/OSDMonitor: remove 'osd pool {get,set} <name> auid ...'
mon/OSDMonitor: remove auid arg for 'osd lspools' and deprecate
osd/OSDCap: remove auid from grammar
osd/OSDCap: remove auid from is_capable() etc args
auth: clean up cap parse error messages
mon/AuthMonitor: raise health warning on invalid caps
mon/AuthMonitor: drop ancient auth inc encoding compat
messages/MPoolOp: drop auid member
osdc/Objecter: drop change_pool_auid
pybind/rados: drop auid arg to pool_create
pybind/rados: drop change_auid
rados: drop mkpool, rmpool commands
rados: remove 'chown' command
librados: deprecate calls that take auid
librados: mark all auid calls deprecated
mon/OSDMonitor: drop variable pool auid for prepare_new_pool
mon/OSDMonitor: remove pool auid change support
osdc/Objecter: do not pass auid to create_pool
ceph-authtool: remove auid options
qa/workunits/cephtool: remove auid tests
Reviewed-by: Gregory Farnum <gfarnum@redhat.com>
So if there are a lot of missing objects on primary, we can
make use of auth_log_shard to restore client I/O quickly.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
* refs/pull/23439/head:
qa: whitelist cap revoke warning
doc: document cap revoke non-responders client eviction
test: validate client eviction for cap revoke non-responders
mds: add counter for tracking cap non-responding clients
mds: evict clients that do not respond to cap revoke by MDS
mds: pass timeout argument for fetching late clients
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Zheng Yan <zyan@redhat.com>
* check if geom_gate can be loaded before doing the actual tests
Otherwise continuing does not make sense.
The major reason for this problem is a mismatch between
kernel and module versions.
* After FreeBSD kernel version 1200078 ggate resizing is possible
So set the flag that resizing can be tested
* Only sudo commands that really need sudo
rbd-ggate list is available in regular user mode
* be a bit more verbose during testing and list the test purpose
* list-mapped is an option in rbd-nbd, not (yet) in rbd-ggate
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
callers of get_python_path were not passing in a $1 parameter, so
ceph_lib was an empty string resulting in an invalid path to the built
cython modules. assume this is called from the `lib` parent directory.
pass path to the manager modules when starting ceph-mgr.
Signed-off-by: Noah Watkins <nwatkins@redhat.com>
* refs/pull/23240/head:
qa/suites/rados, qa/workunits/rados: Add suite/workunit for ceph-crash
add ceph-crash service
common/options: enable mgr 'crash' module by default
global/signal_handler: add 'done' file to signal crashdump is ready
Reviewed-by: Sage Weil <sage@redhat.com>
mgr/dashboard: Add backend support for changing dashboard configuration settings via the REST API
Reviewed-by: Ricardo Marques <rimarques@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
Generally the slow warnings we get are just over the threshold. These warnings
are related to deploying multiple Ceph daemons side-by-side. Let's see how we
do with two minutes.
Ignoring the warnings entirely is unsatisfactory as they serve as a useful
canary in the coal mine when you see warnings for ops > some unreasonably large
amount of time.
Fixes: http://tracker.ceph.com/issues/26900
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Enables changing (setting/unsetting) the values of dashboard settings using
the REST API.
Fixes: https://tracker.ceph.com/issues/24273
Signed-off-by: Patrick Nawracay <pnawracay@suse.com>
* refs/pull/23471/head:
mon/PGMap: fix spacing around pretty-printed SI units
include/types: render SI units adjacent to number
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: João Eduardo Luís <joao@suse.de>
the default set of packages to install is in
$suite/qa/packages/packages.yaml . see get_package_list() in
teuthology/teuthology/task/install/__init__.py for how we prepare a
package list for install task.
for running python3 tests in
fs/basic_functional/tasks/volume-client, we need to install
python3-cephfs. please note that,
_package_override() in teuthology/teuthology/task/install/rpm.py will
take care of the different naming on centos/rhel, where the python3
packages are named python34-*.
Signed-off-by: Kefu Chai <kchai@redhat.com>
This reverts commit c1efd59f61
task.install.rpm installs packages listed in
$suites/qa/packages/packages.yaml, the package list applies to the
upgrade tests also. but we don't have python3 bindings packages in jewel
-- they were introduced in kraken.
Signed-off-by: Kefu Chai <kchai@redhat.com>
an ugly workaround for a python dependency conflict that's broken the
rgw/tempest suite. allows us to preserve the pinned versions of
keystone/tempest without having to maintain a fork of the keystone
repository
Fixes: http://tracker.ceph.com/issues/23659
Signed-off-by: Casey Bodley <cbodley@redhat.com>
radosgw now uses 512 frontend threads by default, and valgrind won't
start with its default --max-threads=500
Fixes: http://tracker.ceph.com/issues/25214
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Drop unused suites, which ATM means all of them except upgrade/luminous-x
which recently got a cleanup in https://github.com/ceph/ceph/pull/23162
Signed-off-by: Nathan Cutler <ncutler@suse.com>
'policy show' returns a json-encoded representation of
RGWAccessControlPolicy, while key.get_xml_acl() returns
RGWAccessControlPolicy_S3 encoded as xml. so even with '&format=xml',
the strings won't match
Signed-off-by: Casey Bodley <cbodley@redhat.com>
result.json() throws a 'JSONDecodeError: Expecting value: line 1 column 1'
for requests that return no body, such as 'user rm' 'key rm' 'subuser
rm', 'bucket unlink', etc
Signed-off-by: Casey Bodley <cbodley@redhat.com>
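One way to avoid the decode error described above is to parse the body only when there is one. A minimal sketch, assuming a hypothetical `parse_body` helper that is handed the raw response text (for a `requests` response this would be `response.text`); it is not the actual fix applied in the test.

```python
import json


def parse_body(text):
    """Return the decoded JSON body, or None when the response is empty."""
    if not text or not text.strip():
        # Commands such as 'user rm' or 'key rm' return no body at all.
        return None
    return json.loads(text)
```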