40 Commits

Author SHA1 Message Date
David Zafman
23ed63e15f Merge pull request #22441 from ErwanAliasr1/evelu-makecheck
Improving make check reliability

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: David Zafman <dzafman@redhat.com>
2018-06-28 14:55:12 -04:00
David Zafman
f0964beac5 qa: For teuthology copy logs to teuthology expected location
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-06-25 18:06:01 -07:00
Erwan Velu
57df91380b qa/standalone/ceph-helpers.sh: Setup ulimit in setup()
If ulimit is set to 1024, ceph-osd will segfault with the
following error:
    filestore(td/smoke/0)  error (24) Too many open files not handled on operation 0x55565d1fd004 (2182.1.0, or op 0, counting from 0)

This patch ensures that a valid ulimit value is set before the ceph daemons are set up in tests.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-06-25 22:09:14 +02:00
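The idea in the commit above can be sketched as follows. This is a minimal illustration, not the code from the patch: the function name `ensure_nofile_limit` and the default of 4096 are assumptions, since the commit message only says that 1024 is too low.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from ceph-helpers.sh): make sure the soft
# limit on open file descriptors is at least $1 before starting daemons.
ensure_nofile_limit() {
    local wanted=${1:-4096}   # assumed default, not stated in the commit
    local soft hard
    soft=$(ulimit -S -n)
    hard=$(ulimit -H -n)
    # Nothing to do if the soft limit is already high enough.
    if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$wanted" ]; then
        return 0
    fi
    # An unprivileged process cannot raise the soft limit above the hard one.
    if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$wanted" ]; then
        wanted=$hard
    fi
    ulimit -S -n "$wanted"
}
```

Calling such a helper at the top of a setup function means every daemon started afterwards inherits the raised limit.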
Erwan Velu
7b0d1c8b8a qa/standalone/ceph-helpers.sh: Thinner resolution in get_timeout_delays()
get_timeout_delays() is a generic function that computes delays for a long
period of time without saturating the CPU in busy loops.

It works well when the timeout is short. For example, requesting a
20-second timeout yields the series "0.1 0.2 0.4 0.8 1.6 3.2 6.4 7.3 ".
Here the longest delay between two loops is 7.3 seconds, which is perfectly fine.

When the timeout reaches 300 seconds, the same code produces the following
series: "0.1 0.2 0.4 0.8 1.6 3.2 6.4 12.8 25.6 51.2 102.4 95.3 "
In this example, some delays are nearly 2 minutes long!

That is inefficient: the expected event could arrive just after such a
long sleep begins, so the test sleeps for a minute or more for nothing.
On a local system that may be acceptable, but on a CI where every job
behaves this way, the overall result is quite inefficient because of the
useless waits.

This patch adds a maximum acceptable delay between two loops while
keeping the same ramp-up behavior.

For the same 300-second timeout, with MAX_TIMEOUT set to 10, we
now have the following series: "0.1 0.2 0.4 0.8 1.6 3.2 6.4 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 7.3"
The long 12.8/25.6/51.2/102.4/95.3 delays are gone, replaced by a
series of 10-second delays. It is up to each test to judge how soon its
awaited event is likely to complete.

The patch itself sets MAX_TIMEOUT to 15 seconds.
Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-06-25 22:09:14 +02:00
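A capped ramp-up like the one described above could be sketched as follows. This is an illustration under assumptions, not the patched get_timeout_delays(): the function name `capped_delays` and its argument order are hypothetical, and awk is used for the floating-point arithmetic.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a capped exponential ramp-up.
# Usage: capped_delays <timeout> [first_delay] [max_delay]
capped_delays() {
    local timeout=$1 first=${2:-0.1} max=${3:-15}
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -v timeout="$timeout" -v first="$first" -v max="$max" 'BEGIN {
        total = 0
        delay = first
        # Double the delay each round, but never beyond the cap.
        while (total + delay < timeout) {
            printf "%g ", delay
            total += delay
            delay *= 2
            if (delay > max) delay = max
        }
        # The last delay is whatever remains of the timeout.
        printf "%g\n", timeout - total
    }'
}
```

Calling `capped_delays 20 0.1 10` prints `0.1 0.2 0.4 0.8 1.6 3.2 6.4 7.3`, the same values as the 20-second series in the commit message. With a 300-second timeout and the same cap, no single delay exceeds 10 seconds.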
Sage Weil
3cd7d5eb22 Merge PR #22343 into master
* refs/pull/22343/head:
	qa/standalone remove ceph-disk from activate_osd helper
	cmake: remove subman.sh tests
	test remove ceph-disk directory
	debian: remove ceph_detect_init python files from base
	qa/standalone remove virtualenv paths for ceph-disk and ceph-detect-init
	debian: remove ceph-disk ceph-detect-init python files
	rpm: remove ceph-disk ceph-detect-init python files
	alpine: remove ceph-disk ceph-detect-init python files
	alpine: remove ceph-osd and parttypeuuid udev rules
	debian: remove ceph-osd and parttypeuuid udev rules
	rpm: remove ceph-osd and parttypeuuid udev rules
	ceph-helpers.sh: remove ceph-disk, set up osds directly
	CMakeLists.txt: add back CEPH_BUILD_VIRTUALENV
	alpine: remove ceph-disk, add ceph-volume in APKBUILD.in
	upstart: remove ceph-disk activation call
	doc/install add anchor for manual osd deployment in freebsd guide
	doc/dev remove ceph-disk from freebsd guide, link to manual reference
	doc/dev/config-key remove ceph-disk references
	doc/dev remove ceph-disk.rst
	doc/dev: change ceph-disk suite examples for ceph-deploy
	doc/man_index: remove ceph-disk, ceph-detect-init refs
	doc/install: remove ceph-disk from freebsd examples
	doc/rados remove ceph-disk from man references
	doc/man remove ceph-disk ref from ceph-volume-systemd
	doc/man: update reference from ceph-disk to ceph-volume
	doc/man: remove ceph-disk, ceph-detect-init from cmake
	doc/man/ceph-volume remove doc reference to ceph-disk
	doc/man: remove ceph-disk, ceph-detect-init
	qa/suites: remove ceph-disk
	qa/run-standalone.sh: remove requirement for ceph-detect-init virtualenv
	qa/workunits: remove ceph-detect-init from rbdmapfile test
	qa/workunits: remove ceph-detect-init from ceph-helpers-root.sh
	qa/workunits: remove ceph-disk
	build: remove ceph-disk from freebsd script
	cmake: remove ceph-disk, ceph-detect-init tox tests
	init-ceph: remove ceph-disk
	cmake: remove top-level entries for ceph-disk, ceph-detect-init
	debian: remove ceph-detect-init references
	debian: remove ceph-disk references
	src: remove ceph-detect-init tool
	rpm: remove ceph-disk, ceph-detect-init from spec file
	test: remove subman script
	script: remove subman script
	udev: remove parttypeuuid rules for ceph-disk
	tool remove ceph-disk from ps-ceph.pl
	upstart: remove ceph-disk conf file
	systemd: remove ceph-disk from CMakeLists
	systemd: remove ceph-disk service
	udev: remove ceph-disk rules
	src: remove ceph-disk tool
2018-06-19 07:07:55 -05:00
David Zafman
f886ebba08 test: Fix some function descriptions
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-06-18 14:09:14 -07:00
Erwan Velu
2ce480b8fd qa/standalone/ceph-helpers.sh: Fixing comment for wait_for_health()
wait_for_health doesn't check if the cluster is making progress. So
let's adjust the comment accordingly.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-06-14 11:06:52 +02:00
Erwan Velu
62d2646c30 qa/standalone/ceph-helpers.sh: Defining custom timeout for wait_for_clean()
wait_for_clean() uses the default timeout of 300 seconds (5 minutes).

wait_for_clean() tries to find a clean status within that timeout
_or_ resets its counter if any progress was made between loops.

When the cluster is sane, recovery should complete in less than
5 minutes, but if the cluster died, waiting 5 minutes for nothing is
inefficient.

This patch defines a custom timeout for wait_for_clean() so that it
does not wait much more than 1 min 30 s (90 seconds). If no progress is
made in that period, there is very little chance that it will reach a
valid state anyway.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2018-06-14 11:06:52 +02:00
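The "timeout that resets on progress" behavior described above can be sketched generically. This is an illustration only: `wait_with_progress` and its two callback arguments are hypothetical names, not functions from ceph-helpers.sh, and the interval is assumed to be a whole number of seconds.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: wait until "$is_clean" succeeds, giving up only when
# "$progress_marker" has printed the same value for $timeout seconds.
wait_with_progress() {
    local timeout=$1 is_clean=$2 progress_marker=$3
    local interval=${4:-1}
    local waited=0 last current
    last=$($progress_marker)
    while ! $is_clean; do
        current=$($progress_marker)
        if [ "$current" != "$last" ]; then
            # Progress was made since the last loop: restart the countdown.
            waited=0
            last=$current
        elif [ "$waited" -ge "$timeout" ]; then
            # No progress for the whole timeout: stop waiting.
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

With this shape, a shorter timeout such as 90 seconds bounds only the time spent without progress, so a slow but advancing recovery is not cut off.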
Alfredo Deza
5b3a540045 qa/standalone remove ceph-disk from activate_osd helper
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-06-13 15:16:27 -04:00
Alfredo Deza
aa4f5569c3 qa/standalone remove virtualenv paths for ceph-disk and ceph-detect-init
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-06-13 15:16:27 -04:00
Dan Mick
50f2b72f2f ceph-helpers.sh: remove ceph-disk, set up osds directly
Signed-off-by: Dan Mick <dan.mick@redhat.com>
2018-06-13 15:16:26 -04:00
Nathan Cutler
f03b9028f5 qa/standalone/ceph-helpers.sh: provide argument to dirname
Fixes: http://tracker.ceph.com/issues/23805
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2018-04-20 10:10:15 +02:00
David Zafman
ce9c029858 test: Eliminate use of bc (use awk) in get_timeout_delays()
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-28 10:24:33 -07:00
David Zafman
51b740ad41 test: Fail upon flush_pg_stats timeout
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-11 16:26:11 -07:00
Sage Weil
5ee5bbace1 qa/standalone: drop CEPH_LIB hacks
Signed-off-by: Sage Weil <sage@redhat.com>
2018-03-06 14:44:49 -06:00
Kefu Chai
ac56a202fd qa/standalone: extract delete_pool()
Some tests, like osd-backfill-stats.sh, use delete_pool() but do not
define it, while other standalone tests each define it separately.
It is simpler to consolidate these definitions in ceph-helpers.sh.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2018-02-28 15:40:28 +08:00
Patrick Donnelly
46c25abd1c test/encoding: refactor to avoid escaping shell magic
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-02-07 18:03:05 -08:00
David Zafman
aeba36a660 ceph-helpers.sh: Add flush_pg_stats() to wait_for_clean() to make it reliable
osd-scrub-repair.sh: Fixes for omap keys landing on different OSDs due to flush

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-01-14 18:17:23 -08:00
Sage Weil
f33ab7e03a Merge remote-tracking branch 'gh/mimic-dev1'
2017-12-20 15:08:30 -06:00
Sage Weil
06b7707cee Merge pull request #19456 from liewegas/wip-22373
qa/standalone/ceph-helpers: pass --verbose to ceph-disk
2017-12-19 11:55:07 -06:00
Kefu Chai
2ceff9eb4e qa/standalone: pass options using --<option-name>=<value>
not `--<option-name> <value>`; otherwise `ceph-authtool` would error
out:

$ CEPH_ARGS='--osd-map-max-advance 1000' bin/ceph-authtool --gen-print-key
bin/ceph-authtool: unexpected '1000'
usage: ceph-authtool keyringfile [OPTIONS]...
....

but with the `--<option-name>=<value>` syntax, it works:

$ CEPH_ARGS='--osd-map-max-advance=1000' bin/ceph-authtool --gen-print-key
AQBAhTNamf5+ABAASkAp/6IGq7LkUTEOMp/fgw==

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-12-15 16:19:15 +08:00
Kefu Chai
4e621762ed qa/standalone/ceph-helpers.sh: silence ceph-disk DEPRECATION_WARNING
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-12-13 19:42:50 +08:00
Sage Weil
86dc162686 qa/standalone/ceph-helpers: pass --verbose to ceph-disk
Signed-off-by: Sage Weil <sage@redhat.com>
2017-12-12 12:56:45 -06:00
Sage Weil
c6529ad93e qa/standalone/ceph-helpers.sh: fix full ratio ordering
Signed-off-by: Sage Weil <sage@redhat.com>
2017-11-29 16:07:12 -06:00
Sage Weil
15b63d6795 qa/standalone/scrub/osd-scrub-repair: no -y to diff
With -y you can't see the entire line when it is long, which is
needed to identify the diff failure in
http://tracker.ceph.com/issues/21618

Instead, let the interactive user specify the option if they want it.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-10-03 14:35:35 -05:00
Kefu Chai
279d2980fa qa/standalone/ceph-helpers.sh: pass btrfs subvolume options the right way
With the latest btrfs-progs, the command fails with:

$ sudo btrfs subvolume list . -t
btrfs subvolume list: too many arguments

So we need to pass `-t` right after the `list` subcommand.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-09-15 12:19:50 +08:00
Kefu Chai
0c47aa8217 qa: respect $TEMPDIR
ceph-disk and ceph-detect-init are built in $TEMPDIR if it's defined.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-09-15 12:19:50 +08:00
Kefu Chai
30b5b4627c Merge pull request #16494 from asomers/bin_bash
misc: Fix bash path in shebangs

Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-08-27 10:14:14 +08:00
David Zafman
e24ac51a82 qa: Fix broken test_activate_osd() due to missing space
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 12:37:05 -07:00
David Zafman
ae2c5331fb qa: Fix races with waiting for scrubs
trigger_scrub sets the last_scrub_stamp backwards to force a scheduled
scrub.  In a small window this stamp could get propagated to the mgr.
A test failure occurred because wait_for_scrub() was confused by seeing
a date that moved backwards.

The most critical change is having wait_for_scrub() make sure that the
date advances past the previous value.

A test failed because the random backoff kept delaying the triggered
scrub, so set osd_scrub_backoff throughout.

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 12:37:05 -07:00
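The difference between "the stamp changed" and "the stamp advanced" can be shown with a small sketch. The function name `stamp_advanced` is hypothetical, and the sketch assumes both stamps use the same fixed-width, sortable format as the dates on this page, so that string order matches time order.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: succeed only if $2 is strictly later than $1.
# A plain inequality test would also accept a stamp that moved backwards,
# which is the confusion described in the commit message.
stamp_advanced() {
    local previous=$1 current=$2
    # \> compares the two strings; with fixed-width stamps in
    # year-month-day order this matches chronological order.
    [ "$current" \> "$previous" ]
}
```

For example, `stamp_advanced "2017-08-10 12:37:05" "2017-08-09 00:00:00"` fails even though the two stamps differ.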
David Zafman
dddda523d1 qa: Testing of ceph-helpers.sh, teardown on fail to dump logs, save cores
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 12:37:05 -07:00
David Zafman
229de6b71d qa: Add support for core dumps
Save core dumps when running tests locally
Dump logs to output whenever cores seen

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 12:37:04 -07:00
David Zafman
61bfd236ad qa: Raise mon-data-avail-warn to pass tests with less space
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 08:30:47 -07:00
David Zafman
574b3cd3d4 qa: Add common generalized inject_eio() to ceph-helpers.sh
Retry for a while to allow pool to appear

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 08:30:47 -07:00
David Zafman
69413618a0 qa: ceph-helpers.sh fixes
Add missing teardown to cleanup test directory
Fix pgid due to elimination of initial default pool
Testing could never fail because the return value of run_tests was ignored

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-10 08:30:47 -07:00
David Zafman
99ad4bbd91 qa: Add create_pool() which sleeps 1 second like python variant
wait_for_clean() can miss the new pool if it races with pool create.

Fixes: http://tracker.ceph.com/issues/20465

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-04 06:38:09 -07:00
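A wrapper of the kind the commit title describes might look like the sketch below. It is an illustration, not necessarily the committed code: the name `create_pool_sketch` is hypothetical, and it assumes the `ceph` CLI is on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: create a pool, then pause briefly so that a
# following wait_for_clean() does not race with pool creation.
create_pool_sketch() {
    # All arguments are passed through to "ceph osd pool create".
    ceph osd pool create "$@" || return 1
    sleep 1
}
```

A test would call it like `create_pool_sketch mypool 8`, where the pool name and placement group count are only examples.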
David Zafman
4314cdd666 qa: Dump logs after daemons are killed to make sure everything is flushed
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-08-04 06:38:09 -07:00
Alan Somers
3aae5ca6fd scripts: fix bash path in shebangs
/bin/bash is a Linuxism.  Other operating systems install bash to
different paths.  Use /usr/bin/env in shebangs to find bash.

Signed-off-by: Alan Somers <asomers@gmail.com>
2017-07-27 13:24:26 -06:00
Sage Weil
cabad62242 qa/standalone/ceph-helpers: factor rbd pool create out of run_mon
Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:50 -04:00
Sage Weil
71ea171604 qa: move ceph-helpers and misc src/test/*.sh tests to qa/standalone
- stop running via make check
- add teuthology yamls to run them
- disable ceph_objectstore_tool.py for now (too slow for make check, and
we can't use vstart in teuthology via a package install)
- drop cephtool tests since those are already covered by other teuthology
tests
- leave a handful of (fast!) ceph-helpers tests for make check for minimal
integration tests.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-07-24 22:11:49 -04:00