* refs/pull/46421/head:
doc/dev: move option -R to a different section of doc
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
It's incorrect to pass the option "-R fail" to a teuthology-suite
command meant to trigger tests for the first time: teuthology-suite
fails if "-R" is passed without "-r". Therefore, move this option and
its description from the section on triggering tests for the first
time to the section dedicated to re-running tests.
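For example, re-running only the failed jobs of an existing run looks
something like this (a sketch; <run-name> is a placeholder):

$ teuthology-suite -r <run-name> -R fail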
Signed-off-by: Rishabh Dave <ridave@redhat.com>
* refs/pull/46516/head:
doc/dev/developer_guide/testing_integration_tests: document how to test custom kernels
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
CreatePrimaryRequest::unlink_peer() invoked via the "rbd mirror image
snapshot" command or via the rbd_support mgr module when creating a new
scheduled mirror snapshot at rbd_mirroring_max_mirroring_snapshots
capacity on the primary cluster can race with Replayer::unlink_peer()
invoked by rbd-mirror when finishing syncing an older snapshot on the
secondary cluster. Consider the following:
[ primary: primary-snap1, primary-snap2, primary-snap3
secondary: non-primary-snap1 (complete), non-primary-snap2 (syncing) ]
0. rbd-mirror is syncing snap1..snap2 delta
1. rbd_support creates primary-snap4
2. due to rbd_mirroring_max_mirroring_snapshots == 3, rbd_support picks
primary-snap3 for unlinking
3. rbd-mirror finishes syncing snap1..snap2 delta and marks
non-primary-snap2 complete
[ snap1 (the old base) is no longer needed on either cluster ]
4. rbd-mirror unlinks and removes primary-snap1
5. rbd-mirror removes non-primary-snap1
6. rbd-mirror picks snap2 as the new base
7. rbd-mirror creates non-primary-snap3 and starts syncing snap2..snap3
delta
[ primary: primary-snap2, primary-snap3, primary-snap4
secondary: non-primary-snap2 (complete), non-primary-snap3 (syncing) ]
8. rbd_support unlinks and removes primary-snap3 which is in-use by
rbd-mirror
If snap trimming on the primary cluster kicks in soon enough, the
secondary image becomes corrupted: rbd-mirror would eventually finish
"syncing" non-primary-snap3 and mark it complete in spite of bogus data
in the HEAD -- the primary cluster OSDs would start returning ENOENT
for snap trimmed objects. Luckily, rbd-mirror's attempt to pick snap3
as the new base would wedge the replayer with "split-brain detected:
failed to find matching non-primary snapshot in remote image" error.
Before commit a888bff8d0 ("librbd/mirror: tweak which snapshot is
unlinked when at capacity") this could happen pretty much all the time,
as it was the second oldest snapshot that was unlinked. That commit
changed it to the third oldest snapshot, narrowing the race but
leaving it still very much possible to hit.
Unfortunately this race condition appears to be inherent to the way
snapshot-based mirroring is currently implemented:
a. when mirror snapshots are created on the producer side of the
snapshot queue, they are already linked
b. mirror snapshots can be concurrently unlinked/removed on both
sides of the snapshot queue by non-cooperating clients (local
rbd_mirror_image_create_snapshot() vs remote rbd-mirror)
c. with mirror peer links off the list due to (a), there is no
existing way for rbd-mirror to persistently mark a snapshot as
in-use
As a workaround, bump rbd_mirroring_max_mirroring_snapshots to 5 and
always unlink the newest snapshot (i.e. slot 4) instead of the third
oldest snapshot (i.e. slot 2). Hopefully this gives enough leeway,
as rbd-mirror would need to sync two snapshots (i.e. transition from
syncing 0-1 to 1-2 and then to 2-3) before potentially colliding with
rbd_mirror_image_create_snapshot() on slot 4.
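For reference, this is roughly how a mirror snapshot at capacity can be
triggered by hand (a sketch; pool and image names are placeholders):

$ rbd mirror image snapshot mypool/myimage
$ rbd snap ls --all mypool/myimage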
Fixes: https://tracker.ceph.com/issues/55803
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
This PR corrects some usage errors in the "Memory" section
of the hardware-recommendations.rst file. It also closes
some parentheses that were opened but never closed.
Signed-off-by: Zac Dover <zac.dover@gmail.com>
Set custom metadata on the snapshot as a key-value pair using::
$ ceph fs subvolume snapshot metadata set <vol_name> <subvol_name> <snap_name> <key_name> <value> [--group_name <subvol_group_name>]
note: If the key_name already exists then the old value will get replaced by the new value.
note: The key_name and value should be strings of ASCII characters (as specified in Python's string.printable). The key_name is case-insensitive and is always stored in lower case.
note: Custom metadata on a snapshot is not preserved when snapshotting the subvolume, and hence is also not preserved when cloning the subvolume snapshot.
Get custom metadata set on the snapshot using the metadata key::
$ ceph fs subvolume snapshot metadata get <vol_name> <subvol_name> <snap_name> <key_name> [--group_name <subvol_group_name>]
List custom metadata (key-value pairs) set on the snapshot using::
$ ceph fs subvolume snapshot metadata ls <vol_name> <subvol_name> <snap_name> [--group_name <subvol_group_name>]
Remove custom metadata set on the snapshot using the metadata key::
$ ceph fs subvolume snapshot metadata rm <vol_name> <subvol_name> <snap_name> <key_name> [--group_name <subvol_group_name>] [--force]
Using the '--force' flag allows the command to succeed even if the metadata key does not exist; without it, the command would fail.
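For example, with hypothetical names (volume "cephfs", subvolume "sv1",
snapshot "s1"); the outputs shown are indicative:

$ ceph fs subvolume snapshot metadata set cephfs sv1 s1 location sydney
$ ceph fs subvolume snapshot metadata get cephfs sv1 s1 location
sydney
$ ceph fs subvolume snapshot metadata ls cephfs sv1 s1
{"location": "sydney"}
$ ceph fs subvolume snapshot metadata rm cephfs sv1 s1 location --force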
Fixes: https://tracker.ceph.com/issues/55401
Signed-off-by: Nikhilkumar Shelke <nshelke@redhat.com>
as per https://www.json.org/json-en.html, JSON encodes bool as
true or false, without quotes. before this change, the quotes
were always added when encoding boolean values.
but this change is not backward compatible.
encode_json()'s bool overload is used by rgw. rgw uses JSONObj
defined in common/ceph_json.h to decode JSON-encoded structs,
and it does not differentiate bool from str when decoding a boolean
value, even though it could have checked the "quoted" member variable
of JSONObj to validate the type of the value. so we should be fine.
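for illustration, a struct with a boolean "enabled" field would be
encoded as follows (indicative output):

before: {"enabled": "true"}
after:  {"enabled": true}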
Fixes: https://tracker.ceph.com/issues/55189
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
I'm changing "3" to "three" for two reasons:
1. It's correct.
2. This allows me to test backports into Octopus, Pacific, and Quincy.
I am particularly interested to see what happens when I attempt
the backport into Octopus, because backports into Octopus have
failed. This will provide me with another unit of data.
Signed-off-by: Zac Dover <zac.dover@gmail.com>
This is one in a set of PRs meant to keep the Basic
Workflow in the Developer guide current. It refines
the English in the "Integration Tests AKA ceph-qa-suite"
section of "Basic Workflow".
Several other small updates like this are expected. I
intend to avoid refining half of the page in one commit,
as I did last month when I refined the first half of the
basic workflow.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@gmail.com>
Before the Lua script is executed, we save the tracer's runtime configuration value and then decide whether to trace the request based on that value, which may have been changed during Lua execution. This lets us enable or disable tracing for a request even if the tracer is in the opposite state at the time.
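For example, a prerequest Lua script could opt only specific requests
into tracing (a sketch that assumes the Request.Trace.Enable and
Request.RGWOp fields exposed to RGW Lua scripts):

-- enable tracing only for object uploads, even if the tracer
-- is otherwise disabled for this RGW instance
if Request.RGWOp == "put_obj" then
  Request.Trace.Enable = true
end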
Signed-off-by: Omri Zeneva <ozeneva@redhat.com>
This PR updates the basic-workflow.rst file
to serve the needs of people in 2022 who were not
present at jump street.
The text has been refined up to (but not including) the section
called "Integration Tests".
Signed-off-by: Zac Dover <zac.dover@gmail.com>
Add additional explanation about stretch mode, as suggested by Anthony,
and rework the layout of the argument details a bit.
Signed-off-by: Eneko Lacunza <elacunza@binovo.es>