Previously the test could still race: unmap_device returned success
because the device was no longer found in `rbd-nbd list-mapped` (the nbd
device had been removed), but the test then failed because the process
was still present in the ps table.
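The following is a minimal sketch of the kind of wait the test needs. After the unmap reports success, it polls until the process has actually left the process table. The helper name `wait_for_pid_exit`, the one-second poll interval, and the use of `kill -0` (rather than parsing `ps` output) are assumptions for illustration. This is not the actual fix.

```bash
#!/usr/bin/env bash

# Hypothetical helper: wait up to TIMEOUT seconds for PID to disappear from
# the process table. Returns 0 once it is gone, 1 if it is still running.
wait_for_pid_exit() {
    local pid=$1 timeout=${2:-30} i
    for ((i = 0; i <= timeout; i++)); do
        # "kill -0" delivers no signal; it only tests whether the PID exists.
        # Note: it also fails for processes we may not signal, so run this
        # as root or as the daemon's owner.
        if ! kill -0 "$pid" 2>/dev/null; then
            return 0
        fi
        (( i < timeout )) && sleep 1
    done
    return 1
}
```

To use it, record the daemon's PID before unmapping, then call `wait_for_pid_exit "$pid" 30` once the unmap succeeds and before asserting that the process is gone.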
Fixes: https://tracker.ceph.com/issues/47394
Signed-off-by: Mykola Golub <mgolub@suse.com>
Since cephadm is py3-based and py2 is EOL, this patch
removes the py2 test iteration from test_adoption.sh.
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
Since py2 is EOL and cephadm requires py3 anyway, this
patch removes the py2 test iteration from the functional
testing suite.
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
Previously, the peer uuid variable was empty, which caused the removal
of the duplicate peer to fail.
Fixes: https://tracker.ceph.com/issues/47007
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Sometimes when teuthology machines are provisioned, the command
`hostname --fqdn` does not provide a fully qualified domain name but
instead just the hostname (e.g., smithi149 instead of
smithi149.front.sepia.ceph.com). This prevents the teuthology test for
rgw-orphan-list from running successfully (for example, the hostname
was, for some reason, misinterpreted as the bucket name in the
request).
This commit checks whether the hostname derived from `hostname --fqdn`
contains any '.' characters and, if it does not, appends
".front.sepia.ceph.com" to the hostname. This is a hack, but until
teuthology machines are configured appropriately it seems to be a
reasonable work-around.
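The check can be sketched as a small shell function. The function name and the use of a `case` pattern are assumptions for illustration. Only the `hostname --fqdn` call and the `.front.sepia.ceph.com` suffix come from the description above.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print a host name that contains at least one dot,
# appending the sepia lab domain when only a short name is available.
# An explicit argument overrides the `hostname --fqdn` lookup (handy for tests).
fqdn_or_sepia_default() {
    local host=${1:-$(hostname --fqdn)}
    case $host in
        *.*) printf '%s\n' "$host" ;;                        # already qualified
        *)   printf '%s\n' "${host}.front.sepia.ceph.com" ;; # short name only
    esac
}
```

Calling it with no argument, e.g. `host=$(fqdn_or_sepia_default)`, performs the lookup and applies the fallback.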
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
I haven't seen it be an issue, but I'm worried a slight difference in ping
report timing might result in flapping leaders even with the new
ignore-out-of-quorum code.
Imagine DCs A, B, C where A and B are netsplit: C might first elect A, then
receive a propose from B immediately after a successful ping reply that gives
B a better score than A, so that B wins the election; then A could do
the same, and so on.
With the default config (12-hour half-life, 2-second ping interval), a single
ping can change the score by at most 0.00002314814. Therefore a code default
of .0001 and a config default of .0005 should leave plenty of room to prevent
that in sane monitor configurations, while still responding quickly once
connections are restored.
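This is a rough check of that margin, using only the numbers given above. It assumes each ping moves the score by at most that amount, always in the favorable direction. The results show how many consecutive pings it takes to close each threshold at the 2-second interval:

```latex
\begin{aligned}
\Delta_{\max} &= 0.00002314814 \approx \tfrac{1}{43200} \\
\frac{0.0001}{\Delta_{\max}} &\approx 4.32 \;\Rightarrow\; \ge 5 \text{ pings} \approx 10\,\text{s} \\
\frac{0.0005}{\Delta_{\max}} &\approx 21.6 \;\Rightarrow\; \ge 22 \text{ pings} \approx 44\,\text{s}
\end{aligned}
```

So a single well-timed ping cannot flip the ordering under either default.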
Plus of course this only applies to out-of-quorum monitors to peons, so if
a monitor manages to contact the leader they will be allowed to join
instantly.
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
- add write_or_get method
- fix PrimaryPG caller to use write_or_get
- remove old method it previously called that did weird things
- cls_chunk_refcount_* -> cls_cas_chunk_*
- add _ref suffix for get and put to avoid confusion (get/put could mean
read/write)
- some comments
- move (internal) refcount representation into separate header
Signed-off-by: Sage Weil <sage@newdream.net>
The "proxy" and "forward" cache-tier modes have been completely removed,
so it's sufficient to test once that they cannot be set.
Fixes: a0a3ed324a
Signed-off-by: Nathan Cutler <ncutler@suse.com>
It's no longer necessary to handle Xenial as a special case.
Fixes: https://tracker.ceph.com/issues/45561
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
We should be building the version of rocksdb the release is pinned to,
not master. Let's just update the rocksdb submodule and clone that.
Fixes: https://tracker.ceph.com/issues/44981
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
Add teuthology test for `rgw-orphan-list` in a new tool suite under
rgw. It only needs to be tested under one configuration, and the new
tool sub-suite can be used by other tooling in the
future. radosgw-admin `radoslist` is tested indirectly through
`rgw-orphan-list` and therefore does not need its own test.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
We need to temporarily disable "exit on error" mode so that the script
does not abort when `rbd mirror pool peer add` returns an "already exists"
error code.
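In a `set -e` script, one way to express this is to switch exit-on-error off around that single command and then decide which failures are acceptable. Several parts of this sketch are assumptions: the helper name, the stand-in `fake_peer_add` (used instead of a real `rbd mirror pool peer add` call), and matching on the text "already exists". The helper also assumes the calling script normally runs with `set -e`.

```bash
#!/usr/bin/env bash
set -e

# Hypothetical helper: run a command with exit-on-error temporarily disabled.
# Succeeds if the command succeeds or its output says "already exists";
# otherwise prints the output and returns the command's own exit status.
run_allow_exists() {
    local out rc
    set +e                 # a failing command must not abort the script here
    out=$("$@" 2>&1)
    rc=$?
    set -e                 # restore exit-on-error (assumes the caller uses it)
    if [ "$rc" -eq 0 ] || [[ $out == *"already exists"* ]]; then
        return 0
    fi
    printf '%s\n' "$out" >&2
    return "$rc"
}

# Stand-in for `rbd mirror pool peer add` when the peer is already
# present; the message text is made up for this example.
fake_peer_add() {
    echo "peer already exists" >&2
    return 1
}

run_allow_exists fake_peer_add
echo "peer add tolerated; script continues"
```

Without the `set +e`, the failing command substitution would terminate the script before the exit status could be inspected.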
Signed-off-by: Mykola Golub <mgolub@suse.com>
In recent versions, `rbd list-mapped` no longer prints trailing
whitespace at the end of the line.
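A whitespace-tolerant check avoids depending on that detail. This sketch assumes the mapped device path is the last column of each output line. The function name and the sample lines in the test are made up for illustration.

```bash
#!/usr/bin/env bash

# Hypothetical helper: read list-mapped style output on stdin and succeed if
# DEV is the last field of some line. awk's default field splitting ignores
# leading and trailing blanks, so a trailing space no longer matters.
device_is_listed() {
    local dev=$1
    awk -v dev="$dev" '$NF == dev { found = 1 } END { exit !found }'
}
```

Pipe the list-mapped output into it, e.g. `... | device_is_listed /dev/nbd0`. The exact field comparison also avoids false matches such as `/dev/nbd0` against `/dev/nbd01`.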
Fixes: https://tracker.ceph.com/issues/45305
Signed-off-by: Mykola Golub <mgolub@suse.com>
Give the monitoring stack (node-exporter, prom, grafana) a few more
retries to become available before giving up.
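The general retry pattern can be sketched as below. The `retry` helper, its arguments, and wrapping an HTTP probe are assumptions for illustration. The actual attempt counts and checks used for node-exporter, Prometheus and Grafana are not given here.

```bash
#!/usr/bin/env bash

# Hypothetical helper: run a command up to MAX_TRIES times, sleeping DELAY
# seconds between failed attempts. Returns 0 on the first success, else the
# exit status of the last attempt.
retry() {
    local max_tries=$1 delay=$2 attempt rc=0
    shift 2
    for ((attempt = 1; attempt <= max_tries; attempt++)); do
        "$@" && return 0
        rc=$?
        if (( attempt < max_tries )); then
            echo "attempt $attempt/$max_tries failed (rc=$rc), retrying in ${delay}s" >&2
            sleep "$delay"
        fi
    done
    return "$rc"
}
```

For example, `retry 10 5 curl -fsS "$GRAFANA_URL"` would probe a (placeholder) Grafana URL up to ten times, five seconds apart. Raising the attempt count gives slow-starting containers more time without changing the probe itself.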
Signed-off-by: Michael Fritch <mfritch@suse.com>
fb4311f5 fixed this for setup, but the "remove mirroring pool"
test needs fixing too.
Fixes: https://tracker.ceph.com/issues/44938
Signed-off-by: Mykola Golub <mgolub@suse.com>
The isa/jerasure codec 'technique' is obtained by the following statement,
"eval technique_parameter=\$${plugin}2technique_${technique}",
which generates a variable name such as "isa2technique_vandermonde"
and assigns that variable's value to "technique_parameter".
A variable such as "isa2technique_vandermonde" should have a preset value, but it does not,
so "technique_parameter" ends up empty.
Running the script prints the following error message and exits:
isa technique= is not a valid coding technique. Choose one of the following: reed_sol_van,cauchy
To fix the bug, specify preset values for the variables that "technique_parameter" is read from:
+ isa2technique_vandermonde='reed_sol_van'
+ isa2technique_cauchy='cauchy'
+ jerasure2technique_vandermonde='reed_sol_van'
+ jerasure2technique_cauchy='cauchy_good'
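With the presets in place, the lookup resolves as expected. The sketch below wraps the script's `eval` statement in a hypothetical `lookup_technique` function so it can be exercised on its own.

```bash
#!/usr/bin/env bash

# Preset values from the fix: <plugin>2technique_<technique> -> codec technique.
isa2technique_vandermonde='reed_sol_van'
isa2technique_cauchy='cauchy'
jerasure2technique_vandermonde='reed_sol_van'
jerasure2technique_cauchy='cauchy_good'

# Hypothetical wrapper around the script's indirect lookup.
lookup_technique() {
    local plugin=$1 technique=$2 technique_parameter
    # The first expansion builds e.g. "technique_parameter=$isa2technique_vandermonde";
    # eval then expands that variable and performs the assignment.
    eval technique_parameter=\$${plugin}2technique_${technique}
    printf '%s\n' "$technique_parameter"
}

lookup_technique isa vandermonde   # prints reed_sol_van
```

In bash, the same lookup can be written without `eval` using indirect expansion: `name=${plugin}2technique_${technique}; echo "${!name}"`. This form avoids evaluating arbitrary input.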
Signed-off-by: lijiaxu <lijiaxu@cmss.chinamobile.com>
We might race with the remote rbd-mirror daemon creating a
tx-only peer when adding a new peer. Therefore, delete the
tx-only peer and attempt to re-create it.
Fixes: https://tracker.ceph.com/issues/44938
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Print warning when using cephadm from master
See also "use quay octopus tip until 15.2 tag is available"
* a9b15c7e1a0c14376cd66f166370694294398494.
See also "update default container images"
* 1f05f7578794380f969a7e93db07345626b3e4df.
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
The free-form journal replay status description is now JSON-encoded. The
"master"/"mirror" designators have been changed to "primary"/"non_primary"
to better align with RBD terminology.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
verify whether min_size is recalculated when osd
pool size is changed.
fixes: https://tracker.ceph.com/issues/44862
Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
Passing an empty 'args' dict as a data argument when calling
requests.get somehow confuses the transaction, causing it to fail. Pass
'None' instead.
Fixes: https://tracker.ceph.com/issues/43720
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
* refs/pull/34105/head:
Merge PR #34042 into octopus
Merge PR #33959 into octopus
Merge PR #34067 into octopus
mgr/DaemonServer: add explicit check that acting matches for merge
Merge pull request #34040 from dillaman/wip-44396-partial-fix
Merge PR #34098 into octopus
mgr/rook: list rgw services
mgr/rook: tolerate timestamps that are None
mgr/orch: add 'subcluster' property to RGWSpec
mgr/rook: do not create radosgw pools
mgr/rook: refactor apply/add for rgw
Merge PR #34082 into octopus
Merge PR #34068 into octopus
cephadm: relabel /etc/ganesha mount
Merge PR #34046 into octopus
Merge PR #34092 into octopus
Merge pull request #33719 from ukernel/wip-44416
rbd-mirror: leader watcher should not cancel get locker if locker is invalid
rbd-mirror: snapshot sync request needs to check for interruption
librbd: request exclusive lock when moving to trash
rbd-mirror: basic integration with sync throttling
rbd-mirror: don't prematurely finish snapshot replay loop
rbd-mirror: pass InstanceWatcher to snapshot Replayer
doc/releases/octopus.rst: add note about ec recovery below min_size
mgr/cephadm: configure rgw_frontends for rgw service
cephadm: switch grafana image to the ceph repo
Merge PR #34034 into octopus
qa/suites/rados/cephadm/upgrade: update starting version
Merge PR #33540 into octopus
Merge PR #34023 into octopus
Merge PR #34044 into octopus
Merge PR #34030 into octopus
doc/orchestrator: update rgw creation
mgr/cephadm: clean up client.crash.* container_image settings after upgrade
cephadm: make add-repo --release and --version independent
cephadm: env over last used
mgr/orch: accept port and ssl flags to 'apply rgw'
mgr/orch: 'ceph upgrade ...' -> 'ceph orch upgrade ...'
cephadm: fall back to default for infer_image
cephadm: remove outdated check
cephadm: consolidate default image logic
remove ceph_test_rados_watch_notify
python-common/ceph/deployment/service_spec: add ssl to RGWSpec
cephadm: only infer image for shell, run, inspect-image, pull, ceph-volume
mgr/test_orchestrator: fix service filtering when using dummy data
mgr/dashboard: fix adding/removing host errors
mgr/rook: fix 'orch ps' for osds
qa: fix all the fsx.sh-invoking yaml files to install dependencies
mds: pass proper MutationImpl::LockOp to Locker::wrlock_start()
Reviewed-by: Kiefer Chang <kiefer.chang@suse.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
* refs/pull/34060/head:
Merge PR #34027 into octopus
Merge PR #34045 into octopus
Merge pull request #34035 from dillaman/wip-rbd-permissions
mgr/progress: fix duration strings
Merge PR #34014 into octopus
Merge PR #34001 into octopus
Merge PR #34011 into octopus
qa/workunits/rbd: use context managers to control Rados lifespan
Merge pull request #34032 from dillaman/wip-rbd-octopus-docs
doc/releases/octopus: add additional RBD improvements
qa/workunits/cephadm/test_cephadm: mark services unmanaged for test
mgr/cephadm: do not reconfig unmanaged services
Merge PR #33981 into octopus
Merge pull request #34018 from ajarr/octopus-subvolume-clone-cancel
qa/workunits/cephadm/test_cephadm: output file for pub key
Merge PR #33866 into octopus
Merge PR #34005 into octopus
Merge PR #34013 into octopus
mgr/cephadm: pytest: Enable SpecStore
mgr/orchestrator: add test for default implementation for apply()
python-common: validate ServiceSpec.service_type
fixup mgr/cephadm: Fix ceph orch apply -i
mgr/dashbaord: orchestrator service: Revert wait_api_result to a single completion
mgr/orchestrator: `orch daemon add` accepts a yaml
mgr/cephadm: apply_drivegroups() returns a single Completion
mgr/cephadm: remove `trivial_result()`
mgr/cephadm: Fix `ceph orch apply -i`
Merge pull request #33994 from dillaman/wip-librbd-poll-event-race
doc: document `clone cancel` command
test: add `clone cancel` tests
mgr/volumes: introduce "clone cancel" volume command
mgr/volumes: allow canceling a single asynchronous job for a volume
mgr/volumes: helper for looking up a clone entry index
mgr/volumes: periodically check if clone operations should be canceled
mgr/volumes: periodically check if copy operations should be canceled
mgr/volumes: introduce 'canceled' state in clone op state machine
qa/suites/rados/verify/validater/valgrind: tolerate SLOW_OPS
qa/suites/rados/verify/validater/valgrind: less bluestore logging
qa/suites/rados/verify/validater: increase heartbeat grace
Revert "qa/suites/rados/verify: debug_ms = 1, osd_heartbeat_grace = 60"
Revert "qa/suites/rados/verify/validator/valgrind: debug refs = 5"
ceph_test_watch_notify: try notify 10x if ALLOW_TIMEOUTS is set
ceph_test_rados_api_misc: ShutdownRace timeout if ALLOW_TIMEOUTS is set
qa/suites/rados/verify: set ALLOW_TIMEOUTS for workunits
doc/install: edits
doc/cephadm: more edits
doc/cephadm/install: edits
doc/cephadm/adoption: improvements
doc/cephadm/install: a few edits
doc/cephadm/install: do not install ceph-common on host (by default)
doc/cephadm: drop os recs link
doc/cephadm/upgrade: improvements
doc/cephadm/upgrade: document upgrade
doc/cephadm/install: revamp install docs
doc: reorganize cephadm docs
doc/cephadm/administration: update docs on customizing SSH config
doc/cephadm/administration: add a note about the 'removed' dir
mgr/balancer: tolerate pgs outside of target weight map
qa/workunits/cephadm/test_cephadm: --skip-monitoring-stack
Merge PR #33974 into octopus
Merge PR #33442 into octopus
Merge PR #33997 into octopus
Merge PR #34000 into octopus
use quay octopus tip until 15.2 tag is available
python-common: reduce output of ServiceSpec.to_json()
python-common,mgr/cephadm: move assert_valid_host to service_spec
mgr/cephadm: add HostAssignment.validate()
mgr/dashboard: adapt create_osds interface change
mon/MgrMonitor: make 'mgr fail' work with no arguments
cephadm: add allow_ptrace option to enable SYS_PTRACE
update default container images
mgr/cephadm: limit number of times check host is performed in the serve loop
Merge PR #33961 into octopus
Merge PR #33952 into octopus
Merge PR #33990 into octopus
Merge PR #33955 into octopus
Merge PR #33936 into octopus
mgr/orch: add --all-available-devices to 'orch apply osd'
qa/workunits/cephadm: --skip-mon-network when using 127.0.0.1
cephadm: add tests
qa/tasks/cephadm: pass -v to bootstrap
mgr/cephadm: only try to place mons on hosts matching public_network
mgr/cephadm: keep track of host networks, ips
cephadm: automatically infer mon public_network, if we can
cephadm: add list-networks command
cephadm: bootstrap: deploy monitoring stack by default
librbd: defer event socket completion until after callback issued
cephadm: add-repo: add --version
mgr/cephadm: respect 'unmanaged' flag in spec
mgr/orch: orch ls: show <no spec> or <unmanaged> as appropriate
mgr/orch: orch ls: rename SPEC -> PLACEMENT
mgr/orch: add 'unmanaged' property to ServiceSpec
cephadm: rename distro args in repo methods
mgr/orch: combine 'orch daemon add <type> ...' into one command
mgr/orch: combine 'orch apply <type> [<placement>]' into one command
Reviewed-by: Laura Paduano <lpaduano@suse.com>
* refs/pull/34027/head:
qa/workunits/cephadm/test_cephadm: mark services unmanaged for test
mgr/cephadm: do not reconfig unmanaged services
qa/workunits/cephadm/test_cephadm: output file for pub key
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Specify either --release name (to get the latest) or --version x.y.z to
get a specific version.
Adapt to updated locations on download.ceph.com so that we don't need to
know the release name for a specific x.y.z release.
Signed-off-by: Sage Weil <sage@redhat.com>
This is an old test; we have good watch/notify coverage in the newer
tests, and it is buggy.
Fixes: https://tracker.ceph.com/issues/43861
Signed-off-by: Sage Weil <sage@redhat.com>
There is a potential race in which the expected exceptions are
thrown and Python shuts down while librados background threads are
still running. Ensure that librados is properly shut down before
exiting Python.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
We are deploying containers manually. Mark them unmanaged so that we
do not fight against mgr/cephadm cleaning up orphan daemons.
Signed-off-by: Sage Weil <sage@redhat.com>
- For tests, use bleeding-edge octopus branch
- For production defaults, use ceph/ceph:v15.2 tag
- For bootstrap, grab cephadm script from latest octopus branch
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/33952/head:
qa/workunits/cephadm: --skip-mon-network when using 127.0.0.1
cephadm: add tests
qa/tasks/cephadm: pass -v to bootstrap
mgr/cephadm: only try to place mons on hosts matching public_network
mgr/cephadm: keep track of host networks, ips
cephadm: automatically infer mon public_network, if we can
cephadm: add list-networks command
Reviewed-by: Sebastian Wagner <swagner@suse.com>
This was present, but a no-op.
By default, install cephadm.
Optionally take a list of packages to install instead (e.g., ceph-common).
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/33064/head:
cephadm: add version to `command_ls` output
cephadm: add type checking to `update_filewalld`
cephadm: allow prepare-host to start an enabled service
cephadm: add type checking for `check_host` and `prepare_host`
cephadm: generalize logic for checking and enabling units
cephadm: add 'CEPH_CONF' to the NFS ganesha container envs
cephadm: trim nfs.json sample
qa/workunits/cephadm/test_cephadm.sh: systemctl stop nfs-server
qa/workunits/cephadm/test_cephadm.sh: make pgs available
cephadm: add some log lines
cephadm: check port in use
cephadm: add/remove nfs ganesha grace
cephadm: update firewalld with nfs service
qa/workunits/cephadm/test_cephadm.sh: add nfs-ganesha test
cephadm: add ganasha.conf
cephadm: add NFSGanesha deployment type
cephadm: consolidate list of supported daemons
cephadm: use keyword instead of positional args
Reviewed-by: Sebastian Wagner <swagner@suse.com>
When a non-global level does not have a schedule and a higher level
is used as the parent, the listing wrongly included schedules from
all branches under the parent instead of only the one of interest.
Signed-off-by: Mykola Golub <mgolub@suse.com>
We normalize object-locator to object_locator when parsing command-line
options, but object-locator is more consistent with the other options
supported by the "rados" CLI, and "-" is easier to type than "_". The
dash form is also more widely used in command-line options.
Signed-off-by: Kefu Chai <kchai@redhat.com>
The OpenStack tempest tests do not stay stable and break approximately
every six months. Remove the test suite for now.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The unmap action only sends a signal to the kernel to notify the
rbd-nbd daemon to disconnect. Therefore, an unmap followed by an
immediate re-map to the same device might fail because the unmap is
still in progress.
Fixes: https://tracker.ceph.com/issues/44567
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Ensure that snapshot-based mirroring is tested in different RBD image
feature combinations.
Fixes: https://tracker.ceph.com/issues/44396
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Adds option `mon_allow_pool_size_one`, which is disabled by default
to ensure pools are not configured without replicas.
If the user still wants to use pool size 1, they have to set
`mon_allow_pool_size_one` to true and then pass the
`--yes-i-really-mean-it` flag to the CLI command.
Example:
`ceph osd pool set test size 1 --yes-i-really-mean-it`
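Enabling a size-1 pool therefore takes two steps. This sketch assumes the option is set cluster-wide through the centralized config database. It uses `test` as the pool name, as in the example; adjust both to the actual setup.

```bash
# Allow pools without replicas (mon_allow_pool_size_one is disabled by default).
ceph config set global mon_allow_pool_size_one true
# Then explicitly confirm the size change for the pool named "test".
ceph osd pool set test size 1 --yes-i-really-mean-it
```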
Fixes: https://tracker.ceph.com/issues/44025
Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
* refs/pull/33636/head:
qa: add upgrade test for volume upgrade from legacy
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
This tests that volumes created using the ceph_volume_client.py library
continue to be accessible/function via the Nautilus/Octopus ceph-mgr
volumes plugin.
Fixes: https://tracker.ceph.com/issues/42723
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/33138/head:
common/TextTable: only pad between columns
mgr/status: align with ceph table style
mgr/osd_perf_query: make table match ceph style
mgr: adjust tables to have 2 space column separation
common/TextTable: default to 2 spaces separating columns
Reviewed-by: Sebastian Wagner <swagner@suse.com>
The 'ceph' CLI and 'rbd mirror pool/image status' commands should revert
to using the admin user so that they have proper credentials for the cluster.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
This ticket seems to suggest that (1) the root cause is related to an
exec that is orphaned and screws up the container state (due to, e.g., ssh
dropping, or a timeout), (2) -f may be needed, sometimes, to recover, and
(3) newer versions fix it.
https://github.com/containers/libpod/issues/3226
Way back in 26f9fe54cb we found that using
-f the first time around was a Bad Idea, so we'd rather avoid this.
Instead, just avoid triggering the bug.
Signed-off-by: Sage Weil <sage@redhat.com>