92 Commits

Author SHA1 Message Date
Sage Weil
a4a3a3c0a0 qa/suites/rados/singleton/all/thrash-eio: whitelist 'slow request'
Signed-off-by: Sage Weil <sage@redhat.com>
2020-02-24 08:47:43 -06:00
Sridhar Seshasayee
e527067666 qa: Whitelist 'slow request' within a bunch of tests
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2020-02-24 19:59:56 +05:30
Sage Weil
3e85a09ad2 Merge PR #33328 into master
* refs/pull/33328/head:
	osd/OSD: Log slow ops/types to cluster logs

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-02-23 15:12:48 -06:00
Sage Weil
f8d0e3d73a qa/suites/rados: disable device scraping
Some tests require that no pools exist, so keep device scraping from breaking them.

Signed-off-by: Sage Weil <sage@redhat.com>
2020-02-19 15:31:26 -06:00
Sridhar Seshasayee
d20f57000b osd/OSD: Log slow ops/types to cluster logs
In addition to logging slow ops in mon and osd specific log files,
re-introduce logging the same information along with slow op type
details to cluster logs as well. The objective is to make debugging
slow ops easier.

Modify the log whitelisting string to "slow request" within qa suites in
order to make the search for the new warning log message within the
cluster log successful. This should not cause any issue as it's a
substring of the earlier string.

Fixes: https://tracker.ceph.com/issues/43975
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2020-02-19 14:31:48 +05:30
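A whitelist entry only has to match part of a log line. Assume the qa harness scans the cluster log the way the `ceph` task does: an `egrep` for `[ERR]`, `[WRN]` or `[SEC]` lines, piped through one `egrep -v` per whitelist entry. Under that assumption, a small sketch can check the substring argument above. The log lines and the older `slow requests` entry are illustrative assumptions, and `first_unwhitelisted` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch of an "egrep PATTERN log | egrep -v EXCLUDE ..." whitelist check.
# The log lines below are invented; only the matching logic matters here.

# first_unwhitelisted LOG PATTERN [EXCLUDE...]
# Print the first line of LOG that matches PATTERN and none of the EXCLUDEs.
first_unwhitelisted() {
    local log="$1" pattern="$2" out ex
    shift 2
    out="$(grep -E -- "$pattern" "$log" || true)"
    for ex in "$@"; do
        # Each whitelist entry is an ERE that may match anywhere in the line.
        out="$(printf '%s\n' "$out" | grep -Ev -- "$ex" || true)"
    done
    printf '%s\n' "$out" | head -n 1
}

log="$(mktemp)"
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
cluster [WRN] Health check failed: 3 slow requests are blocked > 32 sec
osd.2 [WRN] slow request osd_op(client.4123.0:7 2.3 rbd_data.1 [write 0~4096]) initiated 31.2s ago
cluster [INF] osd.1 boot
EOF

errors='\[ERR\]|\[WRN\]|\[SEC\]'
# The longer entry misses the singular per-op message ...
first_unwhitelisted "$log" "$errors" 'slow requests'
# ... while its substring covers both lines, so nothing is printed.
first_unwhitelisted "$log" "$errors" 'slow request'
```

With the longer entry, the per-op `slow request` line would surface as an unexpected warning. The shorter entry suppresses both messages.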
Sage Weil
f4156aea10 qa/suites/rados/singleton/all/lost-unfound*: whitelist SLOW_OPS
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-29 07:11:15 -06:00
Sage Weil
14d1490f58 Merge PR #32898 into master
* refs/pull/32898/head:
	qa/suites/rados/singleton/all/recovery-preemption: fix pg log length

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-01-27 10:37:16 -06:00
Sage Weil
695d0be225 qa/suites/rados/singleton/all/recovery-preemption: fix pg log length
This was broken by the variable PG log lengths in
9c69c2f7cc585b5e13e4d1b0432016d38135a3de.

Disable the new option to get (roughly) the old behavior, or at least the
short logs that we want to trigger some backfill.

Fixes: https://tracker.ceph.com/issues/43810
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-27 07:42:50 -06:00
Kefu Chai
fdc1e88b87 qa/workunits/rados/test_crash.sh: do not fail if coredump not found
Fixes: https://tracker.ceph.com/issues/43653
Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-01-27 01:20:56 +08:00
Sage Weil
290992e519 qa/workunits/rados/test_crash.sh: suppress core files
The cores will make teuthology fail the job--and we don't want them for
this test, where we are deliberately causing crashes.

Fixes: https://tracker.ceph.com/issues/43653
Signed-off-by: Sage Weil <sage@redhat.com>
2020-01-19 17:12:09 -06:00
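A common way to stop deliberately crashing processes from leaving core files behind is to lower the core-size limit to zero in the shell that starts them. The sketch below assumes that approach. It is not necessarily what test_crash.sh does, and `crash_without_core` is a hypothetical helper. The kernel skips the core file when RLIMIT_CORE is 0 and core_pattern names a file. A piped core_pattern handler may apply its own policy.

```shell
#!/usr/bin/env bash
# crash_without_core CMD [ARGS...]: run CMD with RLIMIT_CORE set to 0.
# Hypothetical helper; the limit change is confined to a subshell.
crash_without_core() {
    (
        ulimit -c 0   # no core file from anything started in this subshell
        "$@"
    )
}

crash_without_core sh -c 'ulimit -c'   # the child sees a limit of 0
ulimit -c                              # the caller's limit is unchanged

# A deliberate SIGSEGV still kills the child but leaves no core file behind.
crash_without_core sh -c 'kill -s SEGV $$' || echo "child exited with status $?"
```

The subshell keeps the rest of the script, and anything it starts later, on the original limit.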
Sage Weil
71d74aa8c6 qa: more tries for mon tell when injecting msgr failures
With failure injection, the default of 2 tries isn't quite enough.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-11 14:16:42 -05:00
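The reasoning is that an operation subject to injected failures needs more attempts before it is declared failed. A generic bash retry wrapper illustrates the idea. `retry`, `flaky` and `RETRY_DELAY` are hypothetical names. The commit itself only raises the number of tries the qa code uses for `mon tell`.

```shell
#!/usr/bin/env bash
# retry TRIES CMD [ARGS...]: run CMD until it succeeds or TRIES attempts
# have been made; return CMD's last exit status. Hypothetical helper.
retry() {
    local tries="$1" i status=1
    shift
    for ((i = 1; i <= tries; i++)); do
        "$@" && return 0
        status=$?
        echo "attempt $i/$tries failed (status $status)" >&2
        # Pause between attempts; RETRY_DELAY is a made-up knob for this sketch.
        if ((i < tries)); then sleep "${RETRY_DELAY:-1}"; fi
    done
    return "$status"
}

# Simulate a command hit by injected failures: it fails twice, then succeeds.
calls=0
flaky() {
    calls=$((calls + 1))
    [ "$calls" -ge 3 ]
}

RETRY_DELAY=0
retry 2 flaky || echo "2 tries were not enough"
calls=0
retry 5 flaky && echo "succeeded after $calls calls"
```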
Sage Weil
bbc7bb5a22 Merge PR #30217 into master
* refs/pull/30217/head:
	crimson: common/admin_socket kludge so that it builds
	mon/MonClient: fix sending mon command to a specific rank
	src/.gitignore: ignore .tox
	mon/MonClient: interpret numeric mon target name as rank
	mgr,mgr/MgrClient: use fsid to signal mon-mgr vs cli MCommands
	qa/workunits/cephtool: fix error checks for 'ceph daemon' commands
	common/ceph_context: make 'config unset' idempotent
	qa/tasks/dump_stuck: mon.a, not mon.0
	qa/suites/rados/singleton/all/admin-socket: fix test
	common/config: EPERM setting config option after startup
	qa/workunits/cephtool/test.sh: fix tell output error check
	common/admin_socket: pass Formatter from generic infrastructure
	common/admin_socket: pass ostream to call() for error output
	os/bluestore: fix asok hook return value
	rgw: fix asok return value
	common/ceph_context: return error code from asok commands
	test/pybind/test_rados: fix accidental mon tell test
	mon: print entity_name along with caps to debug log
	PendingReleaseNotes: notes about asok changes
	mgr/MgrClient: empty target string for 'tell' means active mgr
	common/admin_socket: report error code as part of output string
	osd: change trigger_[deep_]scrub commands to a pg tell command
	osd: remove old command workqueue, threadpool
	osd: drop MMonCommand handling
	osdc/Objecter: resend OSD tell commands on EAGAIN
	osd: route tell commands to asok; migrate commands
	osd: use unique_ptr<Formatter> for asok_command
	common/ceph_context: add generic asok 'injectargs'
	common/admin_socket: allow dup prefixes
	common/admin_socket: refactor with sync and async execute_command variants
	common/admin_socket: pass input bufferlist
	osd: transition to call_async() for asok
	common/admin_socket: support alternative call_async()
	mon/MonClient: send tell commands out of band via MCommand
	mon: accept tell commands via MCommand and send them to asok handler
	common/admin_socket: return int from hook call()
	mgr/DaemonServer: route MCommand (for octopus+) to asok commands
	do not use 'ceph tell mgr'
	pybind/ceph_argparse: disambiguate mgr tell and CLI commands
	ceph: make 'ceph tell mgr.*' send to the active mgr
	ceph: send 'ceph tell mgr.X' to the right mgr
	librados: add rados_mgr_command_target
	mgr/MgrClient: add start_command variant that takes a target
	common/admin_socket: drop unregister_command(); use per-hook variant
	common/admin_socket: drop explicit prefix arg to register_command
	common/admin_socket: simplify command routing
	common/admin_socket: add ability to process MCommand via asok queue
	common/admin_socket: pass cmdvec to execute_command
	common/admin_socket: use pipe for general wakeup
	include/compat: add flags arg to pipe_cloexec
	common/admin_socket: drop unused args

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-10-06 09:08:28 -05:00
David Zafman
fdf93add0b Merge pull request #30714 from dzafman/wip-41743
test: Ignore OSD_SLOW_PING_TIME* if injecting socket failures

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-10-04 18:28:48 -07:00
Sage Weil
7b644f599b qa/suites/rados/singleton/all/admin-socket: fix test
We can't set the filestore option: filestore isn't active, so nothing
observes the option, and an unobserved option can't be changed.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-10-04 09:07:03 -05:00
David Zafman
ded58ef91d test: Ignore OSD_SLOW_PING_TIME* if injecting socket failures
Fixes: https://tracker.ceph.com/issues/41743

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-10-03 09:09:10 -07:00
Sage Weil
764dc0d2cd qa/suites/rados/singleton/all/ec-lost-unfound: no rbd pool
This can interfere with the test.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-09-26 09:46:10 -05:00
Sridhar Seshasayee
1034782d4c mon/OSDMonitor: Add standalone test for mon_memory_target
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2019-08-06 20:22:16 +05:30
Sage Weil
9257175f08 qa/suites/rados/singleton/all/test-crash: whitelist RECENT_CRASH
Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-22 17:14:01 -05:00
myoungwon oh
89f41ad9ba qa/suite: add dedup test
Signed-off-by: Myoungwon Oh <omwmw@sk.com>
2019-02-09 12:45:10 +09:00
David Zafman
99ddd3666b Merge pull request #22797 from dzafman/wip-19753
osd: Deny reservation if expected backfill size would put us over bac…

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-01-18 07:42:00 -08:00
Sage Weil
c18a5d2e1c qa/tasks/rebuild_mondb: use monmap to properly name the mons
We used to rely on the monmap bootstrap code to magically create a valid
monmap with named mons because our old-style ceph.conf had mon_addr
values in each mon.foo section.  Instead, just feed it a real monmap
from pre-destruction.

In practice, a user can manually generate this monmap, or rename the
mons after the fact with --inject-monmap, or whatever.  Out of scope
for this test, so we just do the simplest thing to make the rebuild test
work.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-11 16:10:14 -06:00
Sage Weil
e069c30cb3 Merge remote-tracking branch 'private/wip-mon-kv-fix' into wip-mimic-4
2019-01-04 14:03:56 -06:00
Sage Weil
d518eb6cac qa/msgr: move msgr facet into generic re-usable dir
Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-03 11:17:38 -06:00
Sage Weil
16980bd12f qa/suites/rados: replace mon_seesaw.py task with a small bash script
The teuthology test did not like the change to remove 'mon addr' from
ceph.conf.  The standalone script is easier to test.

Note that it avoids mon names 'a', 'b', 'c' since the MonMap::build_initial
uses those.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-03 11:17:31 -06:00
Sage Weil
b8d45b262c qa/suites/rados/singleton/all/pg-autoscaler: whitelist health warnings
Signed-off-by: Sage Weil <sage@redhat.com>
2018-12-19 14:37:01 -06:00
Sage Weil
2cd1ca6625 qa/suites/rados: add simple pg-autoscaler test
Signed-off-by: Sage Weil <sage@redhat.com>
2018-12-18 13:30:54 -06:00
David Zafman
316f039dfd test: Add singleton rados suite test for backfill full
This injects backfill full as opposed to lowering the backfill full ratio

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-12-18 09:30:44 -08:00
Joao Eduardo Luis
5fff611041 mon/config-key: limit caps allowed to access the store
Henceforth, we'll require explicit `allow` caps for commands, or for the
config-key service. Blanket caps are no longer allowed for the
config-key service, except for 'allow *'.

Signed-off-by: Joao Eduardo Luis <joao@suse.de>
2018-10-17 14:42:15 +01:00
Sage Weil
4e5f2bb596 qa/suites/rados/singleton/reg11184: remove old test
This bug was about filtering missing and divergent when doing a partial
PG import.  We don't support partial PG imports any more, so this can
go away!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-20 12:58:00 -05:00
Sage Weil
35820f4b88 mon/AuthMonitor: raise health warning on invalid caps
Raise a health warning if we have invalid (unparsable) caps in the auth
database.  Include a simple test.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-08-31 15:54:58 -05:00
Dan Mick
298a1d92d2 qa/suites/rados, qa/workunits/rados: Add suite/workunit for ceph-crash
Signed-off-by: Dan Mick <dan.mick@redhat.com>
2018-08-13 13:53:26 -07:00
Patrick Donnelly
b39f9d06dc qa: fix symlinks indirectly pointing at qa to .qa
Building on the previous commit.

Command used:

$ find suites/ -type l -and -not -name .qa -execdir ~/fix.sh {} \;

fix.sh:
    #!/bin/bash

    link="$(readlink "$1")"

    echo $link
    dirlink="$(dirname "$link")"
    baselink="$(basename "$link")"

    while true; do
        echo $dirlink
        if [ "$dirlink" -ef ~/ceph/qa ]; then
            ln -nsf ".qa/$baselink" "$1"
            exit
        else
            baselink="$(basename "$dirlink")/$baselink"
            dirlink="$(dirname "$dirlink")"
            if [ "$dirlink" -ef . ]; then
                break
            fi
        fi
    done

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-06-26 11:48:38 -07:00
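The loop in fix.sh moves one path component at a time from the directory part of a link's target into a relative suffix. It stops when the directory part is the qa root and then re-points the link through `.qa`. The sketch below wraps the same walk in a function so it can run on a scratch tree. `relink_via_qa` is a hypothetical name, and the qa root is a parameter instead of the hard-coded `~/ceph/qa`. The extra `/` check is an addition: without it, an absolute target outside the qa root would loop forever.

```shell
#!/usr/bin/env bash
# relink_via_qa QA_ROOT LINK: the fix.sh walk as a function.
# QA_ROOT must be absolute because the walk runs from LINK's directory,
# like `find -execdir`.
relink_via_qa() {
    local qa_root="$1" link="$2" name target dirlink baselink
    (
        cd "$(dirname "$link")" || exit 1
        name="$(basename "$link")"
        target="$(readlink "$name")"
        dirlink="$(dirname "$target")"
        baselink="$(basename "$target")"
        while true; do
            if [ "$dirlink" -ef "$qa_root" ]; then
                # The target sits under the qa root: point it through .qa instead.
                ln -nsf ".qa/$baselink" "$name"
                exit 0
            fi
            # Move one component from the directory part to the relative part.
            baselink="$(basename "$dirlink")/$baselink"
            dirlink="$(dirname "$dirlink")"
            # Reached the link's own directory (or /) without finding qa: leave it.
            if [ "$dirlink" -ef . ] || [ "$dirlink" = / ]; then
                exit 0
            fi
        done
    )
}

# Scratch tree: a suite fragment that reaches qa/tasks through ../../../
root="$(mktemp -d)"
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/qa/tasks" "$root/qa/suites/rados/x"
touch "$root/qa/tasks/foo.yaml"
ln -s ../../../tasks/foo.yaml "$root/qa/suites/rados/x/foo.yaml"

relink_via_qa "$root/qa" "$root/qa/suites/rados/x/foo.yaml"
readlink "$root/qa/suites/rados/x/foo.yaml"   # now .qa/tasks/foo.yaml
```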
Patrick Donnelly
716db6e2fd qa: add .qa helper link
This utilizes the recent feature in teuthology [1] to skip hidden files in
suites when building the job matrix.

The idea of this change is to enable referring to the top-level qa directory in a
position-independent way, so that copying a suite to another location does not
break any symlinks.

[1] https://github.com/ceph/teuthology/pull/1185

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-06-26 11:33:48 -07:00
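The position independence comes from chaining. Each directory's `.qa` points at its parent's `.qa`, and the top-level one points at the qa directory itself. As a result, `.qa/` resolves to the qa root from any depth. A copied subtree still resolves as long as its new parent also has a `.qa`. The sketch below builds such a chain in a scratch directory. The exact link targets (`../.qa/` below the top, `.` at the top) are an assumption about the layout, not something stated in the commit, and `thrasher.yaml` is a made-up file.

```shell
#!/usr/bin/env bash
# Scratch qa tree with a .qa link in every directory (assumed layout).
root="$(mktemp -d)"
trap 'rm -rf "$root"' EXIT
qa="$root/qa"
mkdir -p "$qa/tasks" "$qa/suites/rados/singleton/all"
touch "$qa/tasks/thrasher.yaml"   # hypothetical shared fragment

ln -s . "$qa/.qa"                 # top level: .qa is qa itself
for d in suites suites/rados suites/rados/singleton suites/rados/singleton/all; do
    ln -s ../.qa/ "$qa/$d/.qa"    # below: defer to the parent's .qa
done

# A deeply nested fragment names a shared file without counting "../".
ln -s .qa/tasks/thrasher.yaml "$qa/suites/rados/singleton/all/thrasher.yaml"

# Copy the suite one level deeper (symlinks copied as symlinks). The copied
# links keep resolving because the new parent directory also has a .qa link.
mkdir -p "$qa/suites/copy"
ln -s ../.qa/ "$qa/suites/copy/.qa"
cp -R -P "$qa/suites/rados" "$qa/suites/copy/"

[ -e "$qa/suites/copy/rados/singleton/all/thrasher.yaml" ] && echo "copied link still resolves"
(cd "$qa/suites/copy/rados/singleton/all/.qa" && pwd -P)   # the qa directory
```

Because teuthology skips hidden files when building the job matrix, the `.qa` entries never show up as facets.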
David Zafman
918921ab2f test: Need to escape parens in log-whitelist for grep
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-05-21 09:47:59 -07:00
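As the subject says, whitelist entries end up in grep. Assuming extended-regex mode, as with `egrep`, `(` opens a group rather than matching a literal parenthesis. An entry that targets a parenthesized health-check code such as `(OSD_SLOW_PING_TIME` therefore has to be written `\(OSD_SLOW_PING_TIME`. A quick demonstration with an invented log line:

```shell
#!/usr/bin/env bash
# Whitelist entries reach grep -E as extended regexes, so "(" opens a group.
line='cluster [WRN] Health check failed: Slow OSD heartbeats on back (OSD_SLOW_PING_TIME_BACK)'

# Escaped: matches the literal "(OSD_SLOW_PING_TIME" text in the line.
printf '%s\n' "$line" | grep -E '\(OSD_SLOW_PING_TIME'

# Unescaped: the "(" starts an unbalanced group, which grep rejects as a bad regex.
printf '%s\n' "$line" | grep -E '(OSD_SLOW_PING_TIME' \
    || echo "grep failed with status $?"
```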
Yuri Weinstein
9f2c485942 tests/qa: adding rados/.. dirs
Signed-off-by: Yuri Weinstein <yweinste@redhat.com>
2018-05-11 14:03:15 -07:00
Kefu Chai
966c76330b qa: reduce "mon client hunt interval max multiple" to 2 for all clients
Because with a high failure rate, we need to connect to the mon more
frequently when the connection fails.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-04-26 12:04:49 +08:00
Sage Weil
35c14a0162 qa/suites/rados/singleton/all/random-eio: whitelist eio error message
"cluster [ERR] 2.1 shard 1: soid 2:8007ad8d:::benchmark_data_smithi115_12935_object2439:head candidate had a read error"

is normal when we're injecting EIO.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-09 07:40:23 -05:00
Gregory Farnum
6d2e4c9b7b Merge pull request #19973 from liewegas/wip-peering-fast-dispatch
osd: fast dispatch of peering events and pg_map + osd sharded wq refactor

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2018-04-06 11:48:11 -07:00
Sage Weil
b235a3f62a qa/suites/rados/singleton/all/ec-lost-unfound: whitelist SLOW_OPS
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-06 10:38:45 -05:00
Sage Weil
3b3c32f643 qa/suites/rados/singleton/all/recovery_preemption: whitelist SLOW_OPS
Recovery and peering can be slow enough with all the logging enabled to
trigger a slow ops warning.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 16:24:31 -05:00
Sage Weil
29a885c915 qa/suites/rados/singleton/all/recovery_preemption: make test more reliable
A 30-second run did only 7000 ops, which means ~50 log entries per PG...
not enough to trigger backfill.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
c3589df320 qa/suites/rados/singleton/all/mon-seesaw: whitelist PG_AVAILABILITY
The seesaw might delay pg creation by more than 60s.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
c77e19c9f2 qa: test config CLI interface
Signed-off-by: Sage Weil <sage@redhat.com>
2018-03-06 14:44:49 -06:00
Nathan Cutler
b69530e647 tests: rados suite: drop rest-api test cases
Fixes: http://tracker.ceph.com/issues/21264
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2018-03-06 06:58:59 +01:00
Sage Weil
c2d28e2750 Merge pull request #18971 from liewegas/wip-pg-scrub-preempt
osd/PG: allow scrub preemption

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-01-18 12:37:48 -06:00
Sage Weil
5ac3bfa34c qa/suites/rados/singleton/all/divergent_priors*: unsquelch osd debug
Signed-off-by: Sage Weil <sage@redhat.com>
2018-01-16 21:52:09 -06:00
David Zafman
c77941f593 qa: Ignore degraded PGs when injecting random eio errors
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-01-14 18:17:23 -08:00
Sage Weil
f33ab7e03a Merge remote-tracking branch 'gh/mimic-dev1'
2017-12-20 15:08:30 -06:00
Kefu Chai
6b3d0f61f9 qa: decrease the msg_inject_socket_failures from 1/500 to 1/1000
Fixes: http://tracker.ceph.com/issues/22093
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-12-15 14:21:43 +08:00
John Spray
91655239fa Merge pull request #19114 from tchaikov/wip-rm-request-slow
mgr/PGMap: drop REQUEST_{SLOW,STUCK} HEALTH_WARNs

Reviewed-by: John Spray <john.spray@redhat.com>
2017-12-13 11:46:34 +00:00