Replaced the content of `HACKING.rst` in the dashboard source code
directory with a pointer to the new location in the developer guide.
Updated references in `README.rst` to also point to the online versions
of these files.
Fixes: https://tracker.ceph.com/issues/47396
Signed-off-by: Lenz Grimmer <lgrimmer@suse.com>
mds_cluster.mds_fail() runs command "mds fail" not "fs fail". The reason
for failure was PR #32581 which accidentally changed the return code
from 0 to EINVAL. Since this was reversed in PR #37159, the change
introduced by 04ed58f is not only incorrect but also redundant.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
mgr/dashboard: Disabling the form inputs for the read_only modals
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Tiago Melo <tmelo@suse.com>
Added a separate endpoint for osd/histogram - api/osd/{svc_id}/histogram
Fixes: https://tracker.ceph.com/issues/46898
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
In filesystem.py, don't set value of reset_obj_attrs to False.
Fixes: https://tracker.ceph.com/issues/47526
Signed-off-by: Rishabh Dave <ridave@redhat.com>
When checking whether a certain fs subcommand can and should be executed
in FSCommands.cc, check permissions in "profile_grants" too when the caps
for that entity contain a cap profile.
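The check described above can be sketched roughly as follows. This is an illustrative Python analogue only, not the C++ code in FSCommands.cc: apart from the name `profile_grants`, which comes from the message above, every class, field, and the prefix-matching rule are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Grant:
    """Hypothetical stand-in for a single capability grant."""
    command_prefix: str   # command prefix covered by this grant (assumption)
    allow: bool = True


@dataclass
class EntityCaps:
    """Hypothetical caps of one entity: explicit grants plus the grants
    that a cap profile expands into."""
    grants: list = field(default_factory=list)
    profile_grants: list = field(default_factory=list)

    def is_capable(self, command: str) -> bool:
        # Consult the explicit grants and the profile-expanded grants;
        # a match in either list authorizes the command.
        for grant in (*self.grants, *self.profile_grants):
            if grant.allow and command.startswith(grant.command_prefix):
                return True
        return False


# An entity whose permissions come only from a cap profile.
caps = EntityCaps(profile_grants=[Grant("fs ")])
allowed = caps.is_capable("fs authorize")
```

The point of the sketch is only that a lookup restricted to `grants` would wrongly reject an entity whose permissions come entirely from a profile.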
Fixes: https://tracker.ceph.com/issues/47423
Signed-off-by: Rishabh Dave <ridave@redhat.com>
This allows a specific IOContext to be used regardless of the image's
current read and write snapshot state.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Deep-copy will require the ability to issue IOs against arbitrary
IOContexts via the image-extent IO dispatcher.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
This was a legacy implementation where it was assigned by the ImageRequestWQ
and therefore needs to be part of the factory methods.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Now that we don't need to worry about read requests issuing a finish
callback, we can use a simple counter to track in-flight writes.
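A counter-based tracker of the kind described above might look like the following. This is a minimal Python sketch under assumptions, not the librbd implementation: the class and method names are hypothetical, and it simplifies by letting a flush wait for every write in flight, including writes started after the flush was requested.

```python
import threading


class InFlightWriteTracker:
    """Hypothetical tracker: counts in-flight writes and fires pending
    flush callbacks once the count drops to zero."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0
        self._waiters = []

    def start_write(self):
        with self._lock:
            self._in_flight += 1

    def finish_write(self):
        with self._lock:
            self._in_flight -= 1
            if self._in_flight > 0:
                return
            # Last write finished: take ownership of the pending callbacks.
            waiters, self._waiters = self._waiters, []
        # Run callbacks outside the lock so they may start new writes.
        for on_flushed in waiters:
            on_flushed()

    def flush(self, on_flushed):
        with self._lock:
            if self._in_flight > 0:
                # Writes are still pending: defer the callback.
                self._waiters.append(on_flushed)
                return
        # Nothing in flight: complete immediately.
        on_flushed()
```

Because reads never register with the tracker, a single integer is enough; no per-request bookkeeping is needed.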
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Any dispatch layer can now directly place itself in the finish
callback handler chain without the use of the generic callback.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
This mimics the design from the object dispatcher and will allow
for simplified in-flight IO tracking.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
We can now pass the flush through the exclusive-lock dispatch layer
to ensure all in-flight IOs have been processed.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Only flush requests coming from the refresh state machine or from the
exclusive-lock dispatch layer initialization should be ignored. This is
because both can be initiated from the refresh state machine and
would therefore deadlock.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The exclusive-lock dispatch layer should be locked and flushed to
ensure no IO is waiting for a refresh. Once that is complete, interlock
with the refresh state machine and re-flush one last time with the
refresh dispatch layer skipped.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The exclusive-lock dispatch layer will already block IOs as required
so this second layer of blocking just increases the complexity and
the potential for deadlocks when attempting to flush.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
If the exclusive-lock layer is being initialized/shut down at image
open/close, there is no IO flowing so there is no need to flush.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
This will allow improved tracking and bypassing of a flush request
that might cause IO deadlocks in dispatch layers.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
These packages are needed in order to scrape device health metrics from
devices used by OSD and MON daemons.
smartmontools' smartctl is what we use in order to scrape devices' SMART
attributes and general health metrics.
In addition, we use the nvme-cli tool on NVMe devices, which fetches
vendor-specific NVMe-related health metrics.
Ceph relies on these tools for proper functioning of the underlying
layers of the devicehealth mgr module, and of other mgr modules which use
devicehealth functionality (such as diskprediction_local, telemetry,
dashboard).
Essentially, most devicehealth commands rely on proper functioning of
smartctl; without it they lack the device health metrics.
For example, if smartctl is missing, the commands:
    ceph device scrape-daemon-health-metrics <who>
    ceph device scrape-health-metrics [<devid>]
will not be able to scrape health metrics, and the command:
    ceph device predict-life-expectancy <devid>
will not provide any meaningful output (since there are no metrics).
In short, when we scrape a device by its daemon (be it an OSD or a MON):
    ceph device scrape-daemon-health-metrics <who>
the devicehealth module command eventually invokes a
block_device_get_metrics() call in either osd/OSD.cc or mon/Monitor.cc,
which wraps calls to both
    block_device_run_smartctl() (spawns smartctl)
    block_device_run_vendor_nvme() (spawns nvme)
in common/blkdev.cc.
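The wrapping described above can be illustrated with a rough Python analogue. This is only a sketch under assumptions and not the C++ code in common/blkdev.cc: the function names, the `nvme_smart_log` key, and the exact command lines are hypothetical choices. In particular, the generic `nvme smart-log` subcommand stands in here for the vendor-specific call.

```python
import json
import subprocess


def run_command(argv):
    """Run a command and return its stdout as text ('' if the tool is missing)."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        # Tool not installed: the caller simply gets no metrics from it.
        return ""
    return result.stdout


def _parse_json(text):
    """Return the parsed JSON object, or None if text is not a JSON object."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def get_device_metrics(device, runner=run_command):
    """Collect smartctl metrics and, where available, NVMe metrics."""
    # smartctl supplies the base document (assumed invocation).
    metrics = _parse_json(runner(["smartctl", "-a", "--json", device])) or {}
    # nvme output is attached under a hypothetical key when present.
    nvme = _parse_json(runner(["nvme", "smart-log", device, "--output-format=json"]))
    if nvme is not None:
        metrics["nvme_smart_log"] = nvme
    return metrics
```

Passing the runner as a parameter keeps the sketch testable without either tool installed; if both tools are missing, the result is an empty dict, which mirrors the "no metrics" outcome described above.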
Minimum version requirements:
'smartmontools' is the package name, which contains two utility
programs: 'smartd' and 'smartctl'. Ceph uses the latter.
Version 6.7 of smartctl first introduced the --json option (beta), which
outputs the metrics in JSON format. A few adjustments were made after
that, and the feature was officially released in smartctl version 7.0.
Since we rely on the JSON format to process the metrics, we must have
smartmontools' smartctl version >= 7.
That said, we deliberately do not specify a smartmontools version here,
since the following scenario is possible:
1. We specify the smartmontools version to be >= 7.
2. smartmontools 7 is not available yet in RHEL 8 / CentOS 8.
3. A user installs, for example, ceph-osd via rpm.
4. smartmontools is not installed (since version >= 7 is not available
   in this repo yet).
5. The user then upgrades to 8.3 (which should have smartmontools >= 7),
   but smartmontools does not get upgraded (since it is not installed).
In the scenario where we do not specify a version, smartmontools 6.6
will be installed, but it will be upgraded to >= 7 when a user upgrades
(and if it's a fresh installation - version >= 7 would be installed
anyway).
nvme-cli does not have a minimum version.
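Since the package dependency carries no version constraint, the smartctl >= 7 requirement discussed above could be checked at runtime instead. The following Python sketch is one way to do that and is not part of this change; it assumes the output of `smartctl --version` begins with `smartctl <major>.<minor>`, and the function names are hypothetical.

```python
import re

# JSON output left beta in smartctl 7.0, per the text above.
MIN_SMARTCTL_VERSION = (7, 0)


def parse_smartctl_version(version_output):
    """Extract (major, minor) from the start of `smartctl --version` output.

    Returns None if the text does not begin with 'smartctl <major>.<minor>'.
    """
    match = re.match(r"smartctl (\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_json(version_output):
    """True if the reported smartctl version has the released --json option."""
    version = parse_smartctl_version(version_output)
    # Tuples compare element by element, so (6, 6) < (7, 0) <= (7, 1).
    return version is not None and version >= MIN_SMARTCTL_VERSION


# A 6.6 installation, as in the scenario above, fails the check.
old_is_ok = supports_json("smartctl 6.6")
```

Comparing integer tuples avoids the pitfalls of comparing version strings lexically.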
We use 'Recommends' for both rpm and deb packages since we do not want
the installation to fail in case of conflicts. 'Recommends' is a weak
dependency: the package is installed when possible, but is skipped if it
conflicts with other dependencies.
It's worth mentioning that smartmontools and nvme-cli dependencies exist
in ceph-container builds.
We add them here for the cases of bare metal installations.
In the future we will add a separate package (with smartmontools and
nvme-cli dependencies) that can be installed on any node (running
rbd-mirror, rgw, mds, mgr, etc.), in order to be able to collect the
health metrics of its devices and offer their life expectancy
prediction.
Fixes: https://tracker.ceph.com/issues/47479
Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
A Ceph AIO installation on a single node or on multiple nodes does not
work well with loopback mounts; in particular, it always deadlocks
during a graceful system reboot.
`rbdmap.service` already handles graceful system reboot properly, as
shown below:
[Unit]
After=network-online.target
Before=remote-fs-pre.target
Wants=network-online.target remote-fs-pre.target
[Service]
ExecStart=/usr/bin/rbdmap map
ExecReload=/usr/bin/rbdmap map
ExecStop=/usr/bin/rbdmap unmap-all
This PR introduces:
- `ceph-mon.target`: Ensure startup after `network-online.target` and
  before `remote-fs-pre.target`
- `ceph-*.target`: Ensure startup after `ceph-mon.target` and before
  `remote-fs-pre.target`
- `rbdmap.service`: Once all `_netdev` mounts have been unmounted by
  `remote-fs.target`, ensure all RBD images are unmapped BEFORE any Ceph
  components under `ceph.target` are stopped during shutdown
The logic has been proven as a proof of concept by
<https://github.com/alvistack/ansible-role-ceph_common/tree/develop>;
it also works as expected with a Ceph + Kubernetes deployment by
<https://github.com/alvistack/ansible-collection-kubernetes/tree/develop>.
Deadlocks no longer occur during graceful system reboot, for both
single-node and multi-node AIO setups with loopback mounts.
Also see:
- <https://github.com/ceph/ceph/pull/36776>
- <https://github.com/etcd-io/etcd/pull/12259>
- <https://github.com/cri-o/cri-o/pull/4128>
- <https://github.com/kubernetes/release/pull/1504>
Fixes: https://tracker.ceph.com/issues/47528
Signed-off-by: Wong Hoi Sing Edison <hswong3i@gmail.com>
* refs/pull/37163/head:
mds: silence warning ‘MDSRank::fs_name’ will be initialized after [-Wreorder]
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
This was a remnant of the original implementation for the image
dispatch spec. Now it more closely aligns with the object dispatch
spec.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
If the IO that attempts to acquire the exclusive lock fails,
any queued IO will not be retried, leading to a deadlock.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
distinguish unfound + impossible to find, vs start some down OSDs to get
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>