* refs/pull/33809/head:
qa/standalone/scrub/osd-scrub-repair: force osdmap prop to osds
qa/standalone/scrub/osd-scrub-test: wait longer for update
Reviewed-by: David Zafman <dzafman@redhat.com>
Adds option `mon_allow_pool_size_one` which will be disabled by default
to ensure pools are not configured without replicas.
If the user still wants to use pool size 1, they will have to change the
value of `mon_allow_pool_size_one` to true and then have to pass flag
`--yes-i-really-mean-it` to cli command:
Example:
`ceph osd pool test set size 1 --yes-i-really-mean-it`
Fixes: https://tracker.ceph.com/issues/44025
Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
test: Expect being off by up to 2 and make sure all PGs are active+clean
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
The -N option to vstart.sh was removed, use -k
Old hinfo_key binary happen to be utf-8 decodable, now it
throws an exception trying to decode it. Use new
option to ceph-objectstore-tool to treat stdout as a terminal
and convert binary data to base64.
Signed-off-by: David Zafman <dzafman@redhat.com>
One of our customers wants to verify the data safety of Ceph during scaling
the cluster up, and the test case looks like:
- keep checking the status of a speficied pg, who's up is [1, 2, 3]
- add more osds: up [1, 2, 3] -> up [1, 4, 5], acting = [1, 2, 3], backfill_targets = [4, 5],
pg is remapped
- stop osd.2: up [1, 4, 5], acting = [1, 3], backfill_targets = [4, 5], pg is undersized
- restart osd.2, acting will stay unchanged as 2 belongs to neither current up nor acting set,
hence leaving the corresponding pg pinning undersized for a long time until all backfill
targets completes
It does not pose any critical problem -- we'll end up getting that pg back into active + clean,
except that the long live DEGRADED warnings keep bothering our customer who cares about data
safety more than any thing else.
The right way to achieve the above goal is for:
boost::statechart::result PeeringState::Active::react(const MNotifyRec& notevt)
to check whether the newly booted node could be validly chosen for the acting set and
request a new temp mapping. The new temp mapping would then trigger a real interval change
that will get rid of the DEGRADED warning.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
Signed-off-by: Yan Jun <yan.jun8@zte.com.cn>
Make sure PGs peer (simply flushing state to mon isn't enough).
Fixes: https://tracker.ceph.com/issues/43721
Signed-off-by: Sage Weil <sage@redhat.com>
To avoid confusion fix function names in osd-backfill-space.sh for how
they actually work.
Fixes: https://tracker.ceph.com/issues/43592
Signed-off-by: David Zafman <dzafman@redhat.com>
There were a couple of problems found by flake8 in the qa/
directory (most of them fixed now). Enabling flake8 during the usual
check runs hopefully avoids adding new issues in the future.
Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
* refs/pull/32138/head:
ceph-daemon: combine SUDO and ARGS into a single var
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Sage Weil <sage@redhat.com>
- reduce the amount of typing/noise for each CEPH_DAEMON invocation
- ensure the `--image` param is passed to each test invocation
- allow passing additional args to ceph-daemon via CEPH_DAEMON_ARGS
Signed-off-by: Michael Fritch <mfritch@suse.com>
The py2 ConfigParser doesn't like whitespace before the config option
name. (The py3 version doesn't care.) Filter it out before parsing.
Signed-off-by: Sage Weil <sage@redhat.com>
I thought I took this out of the PR but somehow it got merged in... must
have repushed and old branch and not realized. :/
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/32039/head:
test: Improve races by using kill_daemons which waits for OSDs terminate
test: run-standalone.sh: Only run execs in the subdirectories of qa/standalone
test: Use activate_osd() when restarting OSDs
test: osd-scrub-snaps.sh: Fix race with osd restart and doing a scrub
Reviewed-by: Neha Ojha <nojha@redhat.com>
Allow passing CEPH_DAEMON via the environment or default to using the
script from the standard location.
Signed-off-by: Michael Fritch <mfritch@suse.com>
We need to pay attention to account for CRUSH_ITEM_NONE entries in the
EC PG acting set.
Fixes: https://tracker.ceph.com/issues/43151
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/31869/head:
ceph-daemon: bootstrap: deploy initial mon via deploy_daemon()
qa/standalone/test_ceph_daemon.sh: more $SUDO
ceph-daemon: configure firewalld for new daemon deploys
ceph-daemon: name mgr the same way mgr/ssh does
Reviewed-by: Michael Fritch <mfritch@suse.com>
* refs/pull/31913/head:
ceph-daemon: Allow env var for setting the used image
Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Instead of always adding "--image my-custom-image" when calling
ceph-daemon with a non-standard image, allow to set the environment
variable called CEPH_DAEMON_IMAGE which will adjust the --image
default.
That way, the command line arguments when using ceph-daemon with a
custom image are a bit shorter.
Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
The deepsea.tgz tar contains actual device nodes for the OSD block devices
(not symlinks or files). Must be root to untar.
Signed-off-by: Sage Weil <sage@redhat.com>
mktemp creates these files, so we have to pass --allow-overwrite (or
delete them after we get the unique name but before we write to them--this
is easier).
Broken by c7fe27a72a
Signed-off-by: Sage Weil <sage@redhat.com>
osd/OSDMap: Show health warning if a pool is configured with size 1
Reviewed-by: Sage Weil <sweil@redhat.com>
Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Introduce a config option called 'mon_warn_on_pool_no_redundancy' that is
used to show a health warning if any pool in the ceph cluster is
configured with a size of 1. The user can mute/unmute the warning using
'ceph health mute/unmute POOL_NO_REDUNDANCY'.
Add standalone test to verify warning on setting pool size=1. Set the
associated warning to 'false' in ceph.conf.template under qa/tasks so
that existing tests do not break.
Fixes: https://tracker.ceph.com/issues/41666
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Moving ceph-daemon into src/ceph-daemon/ makes it simpler to add extra
code (eg. tox.ini, README, unittests, ...) specific to ceph-daemon.
That way related files are in a single directory.
Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
Instead of hardcoding the images, make them configureable via
environment variables.
That way, downstream can use the script with custom images.
Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
When running test_ceph_daemon.sh from the root dir and not setting
$CEPH_DAEMON manually, the call hangs at:
$ ./qa/standalone/test_ceph_daemon.sh
[...]
+ for p in $PYTHONS
+ echo '=== re-running with python3 ==='
=== re-running with python3 ===
++ which python3
+ ln -s /usr/bin/python3 /tmp/tmp.6hneCsNMio/python
+ echo '#!/tmp/tmp.6hneCsNMio/python'
+ cat
Check that there is a ceph-daemon found before continue.
Signed-off-by: Thomas Bechtold <tbechtold@suse.com>
It's usually okay to use the mon. key for CLI commands, except we had a
mgr but that prevented you from issuing mgr commands correctly. We have
the new client.admin key available, so use that instead.
Update tests to not --skip-ssh (now that it doesn't hang).
Signed-off-by: Sage Weil <sage@redhat.com>
Sometimes we run containers on a host that doesn't have a crash dir set
up (becuase no daemon has been deployed). Examples include shell and
ceph-volume.
Signed-off-by: Sage Weil <sage@redhat.com>
Skip new check for meta collection
test:
Turn off osd_pool_default_pg_autoscale_mode just like bash tests do
Fix test by checking for new error message
Caused by: f88b353454
Fixes: https://tracker.ceph.com/issues/42476
Signed-off-by: David Zafman <dzafman@redhat.com>
* refs/pull/30859/head:
auth: EACCES, not EPERM
mon: shunt old tell commands from cli interface to asok
mon: allow mgr to tell mon.foo smart
mon: include quorum features in quorum_status
qa/workunits/mon/caps.sh: fix test
ceph_test_rados_api_cmd: fix MonDescribe test
Merge branch 'vstart-fs-auth' of git://github.com/batrick/ceph into wip-cleanup-mon-asok
test/pybind/test_ceph_argparse: fix tests
vstart: add volume client keys to keyring
vstart: use fs authorize to create master client key
vstart: redirect some output to stderr
vstart: output command strings to stderr
qa/workunits/cephtool/test.sh: fix 'quorum enter' caller
qa: change mon_status calls to quorum_status or tell commands
mon: fix 'heap ...' command
mon: consolidate 'sync force' commands
mon: allow asok commands to return an error code
mon: move 'quorum enter|exit' and 'mon_status' to asok
mon: fix 'smart' asok command
mon: remove old 'config set' and 'injectargs'
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
It's not identical to enter. enter seems more intuitive to me, but that
may be because I'm not a longtime docker user.
Signed-off-by: Sage Weil <sage@redhat.com>
- sudo as needed
- clean up afterward
There is still a bit of missing coverage, but this captures most of it.
Signed-off-by: Sage Weil <sage@redhat.com>
This is cleaner. All users are currently standalone tests; updated.
It also means that *all* commands that have a name=pgid arg are pg tell
commands.
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/30475/head:
qa/standalone/ceph-helpers: default pg autoscale mode off for standalone
os/bluestore: fix objectstore_blackhole read-after-write
test,misc: do not specify pg_num per pool
mgr/volumes: do not specify pg_num
pybind/ceph_volume_client: do not specify pg_num for new pools
doc: remove all pg_num arguments to 'osd pool create'
mon: do not require pg_num to 'osd pool create'
common: default pg_autoscale_mode=on for new pools
Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
This is similar to how recovery reservations are split between
local and remote.
It was the case that scrubs_pending was used for reservations at
the replicas as well as at the primary while requesting reservations
from the replicas. There was no need for scrubs_pending to turn
into scrubs_active at the primary as nothing treated that value
as special. scrubber.active = true when scrubbing is
actually going.
Now scurbber.local_reserved indicates scrubs_local incremented
Now scrubber.remote_reserved indicates scrubs_remote incremented
Fixes: https://tracker.ceph.com/issues/41669
Signed-off-by: David Zafman <dzafman@redhat.com>
feature: Health warnings on long network ping times, add "dump_osd_network" to get a report
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>