If two StartScrub messages are received in quick succession, the earlier
one might clear the queued_or_active flag when it is rejected for being
from an old interval.
When that happens, a third scrub request will actually be allowed to go
through while the scrubber is still handling the second one.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
As the state of 'being registered in the OSD's scrub queue'
corresponds to the PrimaryActive FSM state.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
set_contents causes an overflow at times because
alloc_extent is allowed to use a uint32_t length.
Specifically, in the random_writes case, PADDING_SIZE is 256<<10 (262144),
whereas set_contents's len is a uint16_t.
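A minimal illustration of the truncation (a sketch of the arithmetic only,
expressed in Python for brevity, not code from the test):

  PADDING_SIZE = 256 << 10          # 262144 does not fit in 16 bits
  len16 = PADDING_SIZE & 0xffff     # what a uint16_t len would receive
  assert len16 == 0                 # the length silently wraps to 0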
Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
There is a limitation in tracking the modified region using the existing
deltas, because we cannot get the correct region in two cases: 1) after
replay is done, and 2) duplicate_for_write. This commit introduces a
modified region to solve the problem.
Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
Follow the same formula to build up obj_state and version_id
at all call sites.
Resolves: rhbz#2163667
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
…ca()
Intelligently push an object to a replica. Make use of existing
clones/heads and dup data ranges where possible.
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Edit the section "Add/Remove a Key" in doc/radosgw/admin.rst. Each
operation (e.g. "Adding an S3 key pair for a user", "Removing an S3 key
pair for a user") now has its own subsection. This increased granularity
should make it easier in the future to link to each of these specific
operations, if needed.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
Make minor corrections to doc/releases/reef.rst. These corrections were
suggested by Anthony D'Atri in https://github.com/ceph/ceph/pull/55049.
Signed-off-by: Zac Dover <zac.dover@proton.me>
When the field was "secs" instead, we could hit
Traceback (most recent call last):
File "/usr/share/ceph/mgr/cephadm/serve.py", line 1380, in _run_cephadm_json
out, err, code = await self._run_cephadm(
File "/usr/share/ceph/mgr/cephadm/serve.py", line 1525, in _run_cephadm
raise OrchestratorError(
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 2, stderr: usage: cephadm
[-h] [--image IMAGE] [--docker] [--data-dir DATA_DIR]
[--log-dir LOG_DIR] [--logrotate-dir LOGROTATE_DIR]
[--sysctl-dir SYSCTL_DIR] [--unit-dir UNIT_DIR] [--verbose]
[--timeout TIMEOUT] [--retry RETRY] [--env ENV] [--no-container-init]
[--no-cgroups-split]
{version,pull,inspect-image, . . .
...
cephadm: error: argument --timeout: invalid int value: '295.0'
where the value ends up as a floating-point value
after being converted to a string (which is necessary to actually
pass it to the binary). By setting the field to an
int, we should be able to avoid this.
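A minimal standalone sketch of the failure (illustrative only; the real
flag is assembled by cephadm's own argument plumbing), assuming --timeout
is declared with type=int as the error message indicates:

  import argparse

  parser = argparse.ArgumentParser()
  parser.add_argument('--timeout', type=int)   # rejects non-integer strings
  timeout = str(295.0)                         # a 'secs' value stringified: '295.0'
  parser.parse_args(['--timeout', timeout])    # exits 2: invalid int value: '295.0'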
Signed-off-by: Adam King <adking@redhat.com>
Adds a test that will set the default cephadm command
timeout and then force a timeout to occur by holding
the cephadm lock and triggering a device refresh.
This works because cephadm ceph-volume commands
require the cephadm lock to run, so the command will
time out waiting for the lock to become available.
Signed-off-by: Adam King <adking@redhat.com>
On Python 3.6, which Ceph currently uses for its
container builds (they are based on CentOS 8 Stream builds,
hence the Python version), the exception raised by a timeout
from a concurrent.futures.Future is successfully caught by
looking for asyncio.TimeoutError. However, in builds with
later Python versions, e.g. 3.9.16, the timeout is no
longer caught. This results in situations like
Traceback (most recent call last):
File "/usr/share/ceph/mgr/cephadm/utils.py", line 79, in do_work
return f(*arg)
File "/usr/share/ceph/mgr/cephadm/serve.py", line 241, in refresh
r = self._refresh_host_devices(host)
File "/usr/share/ceph/mgr/cephadm/serve.py", line 352, in _refresh_host_devices
devices = self.mgr.wait_async(self._run_cephadm_json(
File "/usr/share/ceph/mgr/cephadm/module.py", line 635, in wait_async
return self.event_loop.get_result(coro, timeout)
File "/usr/share/ceph/mgr/cephadm/ssh.py", line 63, in get_result
return future.result(timeout)
File "/lib64/python3.9/concurrent/futures/_base.py", line 448, in result
raise TimeoutError()
concurrent.futures._base.TimeoutError
which causes the cephadm module to crash whenever one of these
command timeouts happens. This patch also catches the
newer exception type so it works on later Python versions as well.
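A minimal sketch of the kind of change described, assuming the wait still
goes through future.result(timeout) as in the traceback above (names
simplified; the error handling here is illustrative):

  import asyncio
  import concurrent.futures

  # On Python <= 3.7 asyncio.TimeoutError is an alias of
  # concurrent.futures.TimeoutError; on 3.8+ they are distinct classes,
  # and future.result(timeout) raises the concurrent.futures variant.
  def get_result(future: concurrent.futures.Future, timeout: float):
      try:
          return future.result(timeout)
      except (asyncio.TimeoutError, concurrent.futures.TimeoutError):
          # handle the timeout instead of letting it crash the module
          return None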
Signed-off-by: Adam King <adking@redhat.com>