For backwards compatibility and upgrade reasons, the librados2
API must be preserved and must remain compatible with dependent
libraries such as librbd1.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The test extracts the mon addresses from the monmap, but with the
recent v2 format change it extracted an invalid address.
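
To illustrate the failure mode, here is a minimal sketch of address
extraction that copes with both layouts. It assumes the monmap dump prints
either a bare `ip:port/nonce` or a bracketed address vector such as
`[v2:10.0.0.1:3300/0,v1:10.0.0.1:6789/0]`; the sample lines and the helper
name `legacy_mon_addr` are hypothetical and are not the code used by the test.

```python
import re

# One address on a monmap line: optional "v1:"/"v2:" tag, IPv4 address,
# port, optional "/nonce". IPv6 is left out to keep the sketch short.
ADDR_RE = re.compile(r"(?:(v1|v2):)?(\d+\.\d+\.\d+\.\d+):(\d+)(?:/\d+)?")


def legacy_mon_addr(line):
    """Return 'ip:port' of the v1 (legacy) address on a monmap line, or None."""
    found = ADDR_RE.findall(line)
    # Prefer an explicitly tagged v1 entry from an address vector.
    for proto, ip, port in found:
        if proto == "v1":
            return f"{ip}:{port}"
    # Fall back to an untagged address (assumed pre-v2 output format).
    for proto, ip, port in found:
        if proto == "":
            return f"{ip}:{port}"
    return None
```

A parser that only expects the bare form would pick up the wrong token from
the bracketed line, which is the kind of invalid address described above.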
Fixes: http://tracker.ceph.com/issues/38385
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
* refs/pull/26485/head:
qa/suites/upgrade/luminous-x: force clone v1 format for final rbd python test
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
mgr/dashboard: Add support for managing RBD QoS
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Ricardo Dias <rdias@suse.com>
Reviewed-by: Stephan Müller <smueller@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
If we use the defaults, the MDS/client will recall/release everything quickly.
We want recall/release to take long enough that things like the timeout get hit.
Fixes: http://tracker.ceph.com/issues/38348
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The markdown test is based on marking down a specific number of times, but
the duplicate commands from the CLI may not get absorbed/batched by the
mon, breaking the test. Override the default qa/tasks/workunit.py
behavior of sending dups.
Fixes: http://tracker.ceph.com/issues/38359
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/24805/head:
qa/suite: add dedup test
src/tools: fix compile error (master version issue)
src/tools: add stats (fixed objects,total objects)
src/tools: make room for cdc
src/tools: make enhacned stats and interface class
src/tools: set timelimit and add signal handler to check progress
src/tools: use the slice thing and make parallel (chunk_scrub)
src/test: add max-thread test in test_dedup_tool.sh
src/tools: use the slice thing and make parallel
src/test: add chunk-scrub test in test_dedup_tool.sh
src/tools: add chunk-scrub op in dedup tool
src/cls/cas: add has_chunk op
src/test: add test_dedup_tool.sh
src/tools: initial works for dedup tool
Reviewed-by: Sage Weil <sage@redhat.com>
* refs/pull/26455/head:
qa/suites/upgrade/mimic-x/stress-split: drop pglog_hardlimit test
qa/suites/upgrade/mimic-x/stress-split: update for msgr2
qa/suites/upgrade/mimic-x/parallel: update for msgr v2
Reviewed-by: Neha Ojha <nojha@redhat.com>
on all distributions
openSUSE does not automatically add the | back when setting
the core pattern. I tested this on openSUSE Leap 15.0.
Fixes: http://tracker.ceph.com/issues/38325
Signed-off-by: David Zafman <dzafman@redhat.com>
This reverts commit 5ba6286834.
Signed-off-by: David Zafman <dzafman@redhat.com>
Conflicts:
qa/run-standalone.sh (resetting core_pattern moved to function)
mgr/dashboard: Add UI to configure the telemetry mgr plugin
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Patrick Nawracay <pnawracay@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
Otherwise it's annoying because the class list changes between luminous and nautilus,
and we don't want to futz around with changing this setting during the upgrade.
The problematic classes are 'cas' (added) and 'sdk' (not enabled by default but
included by the cls/ workunit).
Signed-off-by: Sage Weil <sage@redhat.com>
The luminous version is (1) not what we want and (2) will fail because
ceph_test_rados_api_tier no longer exists in master.
Signed-off-by: Sage Weil <sage@redhat.com>
This caused qa/standalone/misc/test-ceph-helpers.sh to fail with
"MGR_MODULE_DEPENDENCY 8 mgr modules have failed dependencies"
Fixes: http://tracker.ceph.com/issues/38262
Signed-off-by: David Zafman <dzafman@redhat.com>
Allow a user of the orchestrator interface to express that the inventory
query should not read from any cached inventory state.
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
* refs/pull/26336/head:
qa/tasks/keystone.py: no need for notcmalloc in example
qa/suites/rgw/tempest/tasks/rgw_tempest: no need for notcmalloc
Reviewed-by: Alfredo Deza <adeza@redhat.com>
Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
* refs/pull/25977/head:
qa/suites: exclude new packages when installing old versions
rpm: add dependency on python-kubernetes module to ceph-mgr-rook package
rpm,deb: add rbd_support module to ceph-mgr
packaging: split ceph-mgr diskprediction and rook plugins into own packages
Reviewed-by: Tim Serong <tserong@suse.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
* refs/pull/26059/head:
mon/MonClient: fix keepalive with v2 auth
msg/async/ProtocolV2: reject peer_addrs of -
msg/async/ProtocolV2: clean up feature management
mon/MonClient: set up rotating_secrets, etc before msgr ready
msg/async: let client specify preferred order of modes
msg/async/ProtocolV2: include entity_name, features in reconnect
msg/async/ProtocolV2: fix write_lock usage around AckFrame
qa/suites/rados/verify/validator/valgrind: debug refs = 5
qa/standalone/ceph-helpers: fix health_ok test
auth/AuthRegistry: only complain about disabling cephx if cephx was enabled
auth/AuthRegistry: fix locking for get_supported_methods()
auth: remove AUTH_UNKNOWN weirdness, hardcoded defaults.
msg/async/ProtocolV2: remove unused get_auth_allowed_methods
osd: set up messener auth_* before setting dispatcher (and going 'ready')
mon/AuthMonitor: request max_global_id increase from peon in tick
mon: prime MgrClient only after messengers are initialized
qa/suites/rados/workloads/rados_api_tests.yaml: debug mgrc = 20 on mon
auth: document Auth{Client,Server} interfaces
auth: future-proof AUTH_MODE_* a bit in case we need to change the encoding byte
mon/MonClient: request monmap on open instead of ping
mgr/PyModuleRegistry: add details for MGR_MODULE_{DEPENDENCY,ERROR}
crimson: fix build
mon/MonClient: finsih authenticate() only after we get monmap; fix 'tell mgr'
mon: add auth_lock to protect auth_meta manipulation
ceph-mon: set up auth before binding
mon: defer initial connection auth attempts until initial quorum is formed
mon/MonClient: make MonClientPinger an AuthCleint
ceph_test_msgr: use DummyAuth
auth/DummyAuth: dummy auth server and client for test code
mon/Monitor: fix leak of auth_handler if we error out
doc/dev/cephx: re-wordwrap
doc/dev/cephx: document nautilus change to cephx
vstart.sh: fix --msgr2 option
msg/async/ProtocolV2: use shared_ptr to manage auth_meta
auth/Auth{Client,Server}: pass auth_meta in explicitly
mon/MonClient: behave if authorizer can't be built (yet)
osd: set_auth_server on client_messenger
common/ceph_context: get_moduel_type() for seastar cct
auth: make connection_secret a std::string
auth,msg/async/ProtocolV2: negotiate connection modes
auth/AuthRegistry: refactor handling of auth_*_requred options
osd,mgr,mds: remove unused authorize registries
switch monc, daemons to use new msgr2 auth frame exchange
doc/dev/msgr2: update docs to match implementation for auth frames
auth/AuthClientHandler: add build_initial_request hook
msg/Messenger: attach auth_client and/or auth_server to each Messenger
auth: introduce AuthClient and AuthServer handlers
auth: codify AUTH_MODE_AUTHORIZER
msg/Connection: track peer_id (id portion of entity_name_t) for msgr2
auth/AuthAuthorizeHandler: add get_supported_methods()
auth/AuthAuthorizeHandler: fix args for verify_authorizer()
auth: constify bufferlist arg to AuthAuthorizer::add_challenge()
auth/cephx: share all tickets and connection_secret in initial reply
msg/async,auth: add AuthConnectionMeta to Protocol
auth/AuthClientHandler: pass in session_key, connection_secret pointers
auth/AuthServiceHandler: take session_key and connection_secret as args
auth/cephx: pass more specific type into build_session_auth_info
mon/Session: separate session creation, peer ident, and registration
mon/AuthMonitor: bump max_global_id from on_active() and tick()
mon/AuthMonitor: be more careful with max_global_id
mon: only all ms_handle_authentication() if auth method says we're done
mon/AuthMonitor: fix "finished with auth" condition check
auth: clean up AuthServiceHandler::handle_request() args
auth: clean up AuthServiceHandler::start_session()
mon/AuthMonitor: drop unused op arg to assign_global_id()
msg/async: separate TAG_AUTH_REQUEST_MORE and TAG_AUTH_REPLY_MORE
msg/async: consolidate authorizer checks
msg/async: move get_auth_allowed into ProtocolV2.cc
mon/MonClient: trivial cleanup
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Stopping the osd daemon won't reliably get you HEALTH_WARN or ERR; you have
to make sure it is also marked down.
Signed-off-by: Sage Weil <sage@redhat.com>
Seeing some hangs when the mon is forwarding mgr commands (pg deep-scrub)
to the mgr. This is a buggy test (it should send it to the mgr directly)
but it is helpful to verify the mon forwarding behavior works.
Signed-off-by: Sage Weil <sage@redhat.com>
mgr/orchestrator: make use of @CLICommand
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Reviewed-by: Noah Watkins <noahwatkins@gmail.com>
krbd was being tested with filestore, up until recently when the
default for osd_objectstore was changed to bluestore. This broke
rbd_simple_big.yaml because bluestore_block_size defaults to 10G.
Pick up the sepia setting of 90G from bluestore-bitmap.yaml.
Run fsx subsuite with both filestore and bluestore.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
failure_reason: '"2019-02-03 22:52:41.561332 osd.10 (osd.10) 790 : cluster [WRN] slow
request 30.154662 seconds old, received at 2019-02-03 22:52:11.406639: osd_op(client.56148.0:39092
8.9 8.70387d99 (undecoded) ondisk+retry+write+known_if_redirected e1372) currently
waiting for peered" in cluster log'
We're restarting OSDs, and may see slow requests in the process.
Signed-off-by: Sage Weil <sage@redhat.com>
Discard no longer guarantees zeroing; use BLKZEROOUT and "fallocate -z"
instead (blkdiscard(8) in xenial doesn't support -z).
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
mgr/orchestrator: Unify `osd create` and `osd add`
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Also:
* Added some more tests
* Better validation of drive groups
* Simplified `TestWriteCompletion`
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
* refs/pull/26038/head:
mds: simplify recall warnings
mds: add extra details for cache drop output
qa: test mds_max_caps_per_client conf
mds: limit maximum number of caps held by session
mds: adapt drop cache for incremental recall
mds: recall caps incrementally
mds: adapt drop cache for incremental trim
mds: add throttle for trimming MDCache
mds: cleanup SessionMap init
mds: cleanup Session init
Reviewed-by: Zheng Yan <zyan@redhat.com>
Instead of a timeout and complicated decisions about whether the client is
releasing caps in an expeditious fashion, just use a DecayCounter that tracks
the number of caps we've recalled. This counter is decremented whenever the
client releases caps. If the counter passes a threshold, then we raise the
warning.
Similar reworking is done for the steady-state recall of client caps. Another
release DecayCounter is added so we can tell when the client is not releasing
any more caps.
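
A minimal sketch of the idea, not the MDS implementation: the class and
method names, the half-life, and the threshold below are all hypothetical,
and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
import math


class DecayCounter:
    """Counter whose value decays exponentially with a half-life in seconds."""

    def __init__(self, half_life):
        self.rate = math.log(2) / half_life
        self.value = 0.0
        self.last = 0.0

    def _decay(self, now):
        # Shrink the stored value for the time elapsed since the last update.
        elapsed = max(0.0, now - self.last)
        self.value *= math.exp(-self.rate * elapsed)
        self.last = now

    def hit(self, now, amount=1.0):
        """Decay to 'now', then add 'amount' (may be negative, floor at 0)."""
        self._decay(now)
        self.value = max(0.0, self.value + amount)
        return self.value

    def get(self, now):
        self._decay(now)
        return self.value


class RecallTracker:
    """Tracks recalled caps for one session and decides when to warn."""

    def __init__(self, half_life=60.0, warn_threshold=1000.0):
        self.recalled = DecayCounter(half_life)
        self.warn_threshold = warn_threshold

    def on_recall(self, now, ncaps):
        self.recalled.hit(now, ncaps)

    def on_release(self, now, ncaps):
        # Releases pay down the outstanding recall count.
        self.recalled.hit(now, -ncaps)

    def should_warn(self, now):
        return self.recalled.get(now) > self.warn_threshold
```

In this sketch a client that keeps releasing caps keeps the counter below
the threshold, so no separate timeout bookkeeping is needed.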
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
-filter out mons from other clusters
-fix parsing of mon name from role
Fixes: http://tracker.ceph.com/issues/38115
Signed-off-by: Casey Bodley <cbodley@redhat.com>
* When the creation of the cluster is delegated to vstart_runner.py
(--create or --create-target-only), the number of MGRs required
is calculated by the script, so tests are no longer skipped
due to an insufficient number of MGRs.
* Additionally, this issue is not reproducible anymore:
Fixes: https://tracker.ceph.com/issues/37964
* Fixed typo: TEUTHOLOFY_PY_REQS
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
As with trimming, use DecayCounters to throttle the number of caps we recall,
both globally and per-session.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The pool and namespace can now be specified in
<pool-name>[/<namespace-name>] format as positional
arguments.
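
As a rough illustration of that format, the following sketch splits such a
positional argument into its two parts. The function name and the error
handling are assumptions made for the example; the source does not say how
the real command validates its input.

```python
def parse_pool_spec(spec):
    """Split '<pool-name>[/<namespace-name>]' into (pool, namespace or None)."""
    pool, sep, namespace = spec.partition("/")
    if not pool:
        raise ValueError(f"missing pool name in {spec!r}")
    if sep and not namespace:
        raise ValueError(f"empty namespace in {spec!r}")
    if "/" in namespace:
        raise ValueError(f"too many '/' separators in {spec!r}")
    # No '/' at all means the namespace was omitted.
    return pool, (namespace or None)
```

For example, `parse_pool_spec("rbd/ns1")` yields the pool `rbd` and the
namespace `ns1`, while `parse_pool_spec("rbd")` yields no namespace.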
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Add trigger_deep_scrub osd command for testing
Publish stats when trigger_scrub/trigger_deep_scrub is used for testing
Add optional argument to trigger_scrub/trigger_deep_scrub
for amount of extra time to change last scrub stamps
Signed-off-by: David Zafman <dzafman@redhat.com>
`cache drop` is a long running command that will block the asok interface
(while the tell version does not). Attempting to abort the command with ^C or
equivalents will simply cause the `ceph` command to exit but won't stop the
asok command handler from waiting for the cache drop operation to complete.
Instead, just allow the tell version.
Fixes: http://tracker.ceph.com/issues/38020
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/26012/head:
qa: add test that down fs does not ERR
mon/MDSMonitor: skip offline ERR for down fs
Reviewed-by: Douglas Fuller <dfuller@redhat.com>
* refs/pull/25973/head:
qa: use simpler fs fail to bring fs down
MDSMonitor: add fs fail command
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Douglas Fuller <dfuller@redhat.com>
osd: Deny reservation if expected backfill size would put us over bac…
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
* refs/pull/25900/head:
qa/tasks/ceph.py: bracket addrvecs in mon_host etc
vstart.sh: bracket addrvec on mon_host for msgr2-only mode
unittest_addrs: entity_addr_t: strengthen tests slightly
common/ceph_argparse: make parse_ip_port_vec handle list of addrs or addrvecs
common/ceph_argparse: parse_ip_port_vec returns addrvecs, not addrds
msg/msg_types: entity_addrvec_t: require brackets for size >1
msg/msg_types: entity_addrvec_t: allow brackets when parsing addrvec to match output
msg/msg_types: entity_addrvec_t: allow only ',' as an addrvec separator
msg/msg_types: entity_addr_t: we should not parse an addrvec
msg/msg_types: entity_addr_t: fix empty string parse cases
msg/msg_types: entity_addr_t: is_ipv6() and is_ipv4()
Reviewed-by: Ricardo Dias <rdias@suse.com>
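
The addrvec rules named in the commit titles above (brackets allowed,
brackets required for more than one address, ',' as the only separator) can
be sketched as follows. This is an illustrative Python model, not the C++
parser in msg/msg_types; the function name is hypothetical, individual
addresses are kept as opaque strings, and IPv6 literals (which use their
own brackets) are deliberately out of scope.

```python
def parse_addrvec(text):
    """Return the list of address strings in an addrvec such as
    '[v2:1.2.3.4:3300,v1:1.2.3.4:6789]' or a single bare 'v1:1.2.3.4:6789'."""
    text = text.strip()
    opens = text.startswith("[")
    closes = text.endswith("]")
    if opens != closes:
        raise ValueError(f"unbalanced brackets in {text!r}")
    body = text[1:-1] if opens else text
    # ',' is the only accepted separator between addresses.
    addrs = body.split(",")
    if any(not a for a in addrs):
        raise ValueError(f"empty address in {text!r}")
    # A vector with more than one address must be bracketed.
    if len(addrs) > 1 and not opens:
        raise ValueError(f"addrvec with {len(addrs)} addrs needs brackets")
    return addrs
```

Requiring brackets for multi-address vectors is what lets a list of
addrvecs, as in mon_host, be split unambiguously.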
* refs/pull/25849/head:
qa/suites/rados/upgrade: one mon per node, and enable-msgr2 at end
qa/rados/thrash-old-clients: avoid msgr2
mon: make bootstrap rank check more robust
mon: clean up probe debug output a bit
msg/async: use v1 for v1 <-> [v2,v1] peers
msg/async/AsyncMessenger: drop single-use _send_to
mon/HealthMonitor: raise MON_MSGR2_NOT_ENABLED if mons not bound to msgr2
doc/rados/operations/health-checks: document MON_* health warnings
mon/MonMapMonitor: add 'mon enable-msgr2' command
mon: respawn if rank addr changes
mon/MonMap: calc_addr_mons() after setting rank addrvec
Reviewed-by: Ricardo Dias <rdias@suse.com>
The data digest in the object info may be inconsistent with the digest
computed from the object data, in which case deep-scrub fails even
though all three replicas have the same object data.
Fixes: https://tracker.ceph.com/issues/37935
Signed-off-by: Li Yichao <liyichao.good@gmail.com>
With automatic balancing on and the mode set to upmap, the
balancer fails silently if min_compat_client is lower than
luminous.
You can't figure that out unless you take a closer look at the
mgr log, which is super annoying.
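
The precondition itself is simple to state. The sketch below is only an
illustration of the check, not the balancer module's code: the function
name and the message text are hypothetical, and the release list stops at
nautilus.

```python
# Ceph release names in chronological order, up to nautilus.
CEPH_RELEASES = [
    "argonaut", "bobtail", "cuttlefish", "dumpling", "emperor", "firefly",
    "giant", "hammer", "infernalis", "jewel", "kraken", "luminous",
    "mimic", "nautilus",
]


def upmap_precondition_error(mode, min_compat_client):
    """Return an error string if upmap balancing cannot work, else None.

    Raises ValueError (from list.index) for an unknown release name.
    """
    if mode != "upmap":
        return None
    have = CEPH_RELEASES.index(min_compat_client)
    need = CEPH_RELEASES.index("luminous")
    if have < need:
        return (f"min_compat_client {min_compat_client} is older than "
                f"luminous, which upmap mode requires")
    return None
```

Returning a message to the caller, rather than only logging it, is one way
to make the failure visible without reading the mgr log.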
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
If the ms_bind_msgr2 option is enabled, and all mons are nautilus,
raise a health alert if any mons aren't bound to msgr2 addresses.
Whitelist tests that set mon_bind_addrvec=false or mon_bind_msgr2=false.
Signed-off-by: Sage Weil <sage@redhat.com>