* create an instance of the QuiesceDbManager in the rank
* update membership with a new mdsmap
* add an admin socket command for sending requests to the manager
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
When used as a decorator, it saves one indented try-catch block inside the decorated method.
This can be applied to most of the methods in the file, which is left for a separate refactoring commit.
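A minimal Python sketch of such a decorator shows how one wrapper replaces the per-method try/except block. The decorator name `handle_op_errors`, the `OpError` exception and the `(retcode, stdout, stderr)` return convention are hypothetical and only illustrate the pattern:

```python
import functools
import logging

log = logging.getLogger(__name__)


class OpError(Exception):
    """Hypothetical error carrying an errno-style code."""

    def __init__(self, errno: int, msg: str) -> None:
        super().__init__(msg)
        self.errno = errno


def handle_op_errors(func):
    """Run `func`; turn an OpError into a (-errno, "", message) result."""
    @functools.wraps(func)  # keep the wrapped method's name and docstring
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except OpError as e:
            log.error("%s failed: %s", func.__name__, e)
            return -e.errno, "", str(e)
    return wrapper


class Handler:
    @handle_op_errors
    def create(self, name: str):
        # No indented try/except needed here: the decorator handles OpError.
        if not name:
            raise OpError(22, "name must not be empty")
        return 0, f"created {name}", ""
```

Each decorated method keeps only its happy path, and the error translation lives in one place.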
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
with the parameter set, the message won't be held on to when the remote end resets
or fails to reconnect.
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
QuiesceAgent is the layer that converts updates from the QuiesceDb
into calls to the QuiesceProtocol APIs, and then sends async acks
back to the db manager following the quiesce protocol events.
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Quiesce DB is one of the components of the "Consistent Snapshots" epic.
The solution is discussed in a slide deck that @redhat users can view:
https://docs.google.com/presentation/d/1wE3-e9AAme7Q3qmeshUSthJoQGw7-fKTrtS9PsdAIVo/edit?usp=sharing
This commit focuses on the replicated quiesce database maintained by the MDS rank cluster.
One of the major goals was to design the component so that it can be easily tested
outside of the MDS infrastructure, which is why the communication layer
has been abstracted out by introducing just two communication callbacks
that will need to be implemented by the infrastructure.
Most of the component code is delivered in a single coherent commit, along with the unit tests.
Other commits will be dedicated to integration with the MDS infrastructure and other changes
that can't be attributed to the core quiesce db code or its tests.
The quiesce db component is composed of the following major parts/actors:
* QuiesceDbManager is the main actor, implementing both the leader and the replica roles.
Normally, there will be an instance of the manager per MDS rank, although given the
decoupling of the infrastructure and the manager, one can run any number of instances
on a single node, which is how the tests work.
* The manager interfaces with its environment via two main APIs: one with the infrastructure,
which provides communication and cluster configuration (actor 2), and one with the quiesce db
client, which is responsible for quiescing the roots (actor 3).
** ClusterMembership is how the manager is configured to be part of a (virtual) cluster.
This structure delivers information about the other peers and the leader, and provides
two communication APIs: send_listing_to, for db replication from the leader to the replicas,
and send_ack, for reporting quiesce success from the agents.
** Client Interface consists of a QuiesceMap notify callback and a dedicated manager
method to submit asynchronous acks following the agent (rank) quiesce progress.
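A toy Python model of the two-callback design can make the decoupling concrete. It is a sketch under stated assumptions, not the Ceph C++ code: `ToyQuiesceDbManager`, `make_cluster`, `submit_request`, `on_listing`, `submit_agent_ack`, `on_ack` and the string states are hypothetical; only the `send_listing_to` and `send_ack` callback names come from the description above. Wiring the callbacks to in-memory function calls lets several managers run in one process, which is the property the tests rely on.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ClusterMembership:
    """Hypothetical stand-in for the ClusterMembership structure."""
    me: int
    leader: int
    members: List[int]
    # Leader -> replica: push a copy of the whole db listing.
    send_listing_to: Callable[[int, Dict[str, str]], None]
    # Agent -> leader: report quiesce progress (set id -> state).
    send_ack: Callable[[Dict[str, str]], None]


class ToyQuiesceDbManager:
    """Toy manager that plays either the leader or a replica role."""

    def __init__(self) -> None:
        self.membership: Optional[ClusterMembership] = None
        self.db: Dict[str, str] = {}               # set id -> state
        self.acks: Dict[int, Dict[str, str]] = {}  # leader only: rank -> last ack

    def update_membership(self, membership: ClusterMembership) -> None:
        self.membership = membership

    def is_leader(self) -> bool:
        return self.membership.me == self.membership.leader

    def submit_request(self, set_id: str, state: str) -> None:
        """Leader only: update the db, then replicate it to every replica."""
        if not self.is_leader():
            raise RuntimeError("requests must go to the leader")
        self.db[set_id] = state
        for peer in self.membership.members:
            if peer != self.membership.me:
                self.membership.send_listing_to(peer, dict(self.db))

    def on_listing(self, listing: Dict[str, str]) -> None:
        """Replica: adopt the listing received from the leader."""
        self.db = dict(listing)

    def submit_agent_ack(self, ack: Dict[str, str]) -> None:
        """Called by the local agent; the ack is forwarded via send_ack."""
        self.membership.send_ack(ack)

    def on_ack(self, from_rank: int, ack: Dict[str, str]) -> None:
        """Leader: remember the latest ack reported by a rank."""
        self.acks[from_rank] = dict(ack)


def make_cluster(ranks: List[int], leader: int) -> Dict[int, ToyQuiesceDbManager]:
    """Run several managers in one process by wiring the callbacks to
    plain function calls instead of network messages."""
    managers = {r: ToyQuiesceDbManager() for r in ranks}
    for r, mgr in managers.items():
        mgr.update_membership(ClusterMembership(
            me=r,
            leader=leader,
            members=list(ranks),
            send_listing_to=lambda peer, listing: managers[peer].on_listing(listing),
            # Default argument binds this rank now (avoids late-binding closures).
            send_ack=lambda ack, r=r: managers[leader].on_ack(r, ack),
        ))
    return managers


cluster = make_cluster([0, 1, 2], leader=0)
cluster[0].submit_request("set1", "QUIESCING")     # leader replicates to 1 and 2
cluster[2].submit_agent_ack({"set1": "QUIESCED"})  # rank 2's agent reports back
```

Because the manager only ever talks to the outside world through the two callbacks, swapping the lambdas for real messaging is the only change needed to move from a test harness to a cluster.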
The API of the quiesce db is described in the slide deck mentioned above. The full scope
of capabilities is encapsulated in a single QuiesceDbRequest structure. This should
simplify the implementation of other components that will have to propagate the functionality
to the administrator user of the volumes plugin.
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Remove mention of the "PG calc" tool from the documentation. I have
removed all mentions of it in a single commit so that they can easily be
restored later if we decide that we need the tool.
Signed-off-by: Zac Dover <zac.dover@proton.me>
this allows us to use newer liburing features. Seastar uses
some of them, and they are not provided by liburing 0.7.
in this change, `--use-libc` is passed to configure. otherwise
liburing is not linked against libc, and symbols like memset()
won't be available when compiling liburing.so with -fPIC using
clang, which does not pull in libc in that case.
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
We're getting warnings about node16 being deprecated. The workflow doesn't use node
directly, but through external actions. Moving to node20 requires
changing the setup-python version; Bhacaz/checkout-files is deprecated and
recommends switching to actions/checkout.
Signed-off-by: Dan Mick <dmick@redhat.com>
Add a manual RADOSGW installation procedure to
doc/install/manual-deployment.rst. This procedure was developed by Janne
Johansson and reported to the ceph-users mailing list on 29 Jan 2024
here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/LB3YRIKAPOHXYCW7MKLVUJPYWYRQVARU/
Co-authored-by: Janne Johansson <icepic.dz@gmail.com>
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
The rbd-wnbd daemon currently caches one rados context per cluster.
However, it registers its hooks against the global context's
admin socket, which won't be available. For this reason,
the "rbd-wnbd stats" command no longer works.
To address this issue, we'll ensure that rbd-wnbd sets command hooks
against the right admin socket instance, leveraging the image
context.
Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
For each rbd-wnbd mapping we set an admin socket hook that can
be used to retrieve IO stats.
Now that the same daemon is reused for multiple mappings, we need
to distinguish the images when receiving a "stats" request.
For this reason, we'll add the image identifier to "wnbd stats"
admin socket commands.
Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
The "rbd-wnbd unmap" command is currently telling the WNBD driver
to remove the mapping without contacting the rbd-wnbd daemon
and waiting for it to perform its cleanup.
For this reason, attempting to delete the image immediately after
unmapping it can fail due to existing watchers.
As a temporary solution, we'll retry the image remove operation.
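The retry can be sketched as a small generic helper, assuming the removal raises a distinguishable exception while watchers remain. `retry`, `ImageBusyError` and the attempt/delay defaults are hypothetical, not the actual test code. The helper does not log the expected failures; it only re-raises the last one once the attempts are exhausted.

```python
import time


class ImageBusyError(Exception):
    """Hypothetical stand-in for 'the image still has watchers'."""


def retry(func, exc_types, attempts=5, delay=1.0, sleep=time.sleep):
    """Call func(); on exc_types wait `delay` seconds and try again.

    The last exception is re-raised once `attempts` calls have failed.
    Expected failures are not logged, since they are part of normal operation.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exc_types:
            if attempt == attempts:
                raise
            sleep(delay)


# Usage: a simulated removal that fails twice while "watchers" linger.
calls = []


def remove_image():
    calls.append(1)
    if len(calls) < 3:
        raise ImageBusyError("image has watchers")
    return "removed"


result = retry(remove_image, ImageBusyError, attempts=5, delay=0.01)
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.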
At a later time, we'll update the "rbd-wnbd unmap" command to go
through the rbd-wnbd daemon, ensuring that all the necessary
cleanup is performed before returning.
While at it, we're dropping a redundant LOG.error call so that we
won't print expected exceptions.
Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
This commit will store the mapping config in the Windows registry
only after initializing the mapping. This ensures that we aren't
replacing the registry settings for already mapped images.
We'll also check if the registry setting was added by us before
cleaning it up.
Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
We're currently using one rbd-wnbd process per image mapping.
Since OSD connections aren't shared across those processes,
we end up with an excessive number of TCP sessions, potentially
exceeding Windows limits:
https://ask.cloudbase.it/question/3598/ceph-for-windows-tcp-session-count/
In order to improve rbd-wnbd's scalability, we're going to use
a single process per host (unless "-f" is passed when mapping the
image, in which case the daemon will run as part of the same
process). This allows OSD sessions to be shared across image
mappings.
Another advantage is that the "ceph-rbd" service starts faster,
especially when having a large number of image mappings.
Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
We're moving most of the WNBD mapping handling to a separate
class called RbdMapping. This simplifies cleanup and makes it
easier to reuse.
The WnbdHandler class covers WNBD-specific operations and IO
callbacks while the RbdMapping wrapper will take care of RBD
operations.
A subsequent change will make use of it while switching from
one process per mapping to a single process per host.
While at it, we're also moving the rbd-wnbd config helpers
to separate files.
Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
Improve the entry for "MDS" in doc/glossary.rst by linking to the
"ceph-mds" man page and mentioning the relationship between clients and
MDS (or MDSes).
Signed-off-by: Zac Dover <zac.dover@proton.me>
... to rbd and krbd suites respectively.
This allows the compare-mirror-image tests introduced in ea3a567
to be run against various kernel branches, e.g., the testing branch.
It also allows the diff_continuous test in rbd_suite to run against the
distro kernel.
Fixes: https://tracker.ceph.com/issues/64574
Signed-off-by: Ramana Raja <rraja@redhat.com>