Make grammar improvements (and correct a verb disagreement) in the
section "Placement Groups Never Get Clean" in
doc/rados/troubleshooting/troubleshooting-pg.rst.
Signed-off-by: Zac Dover <zac.dover@proton.me>
Add a second method of changing the value of osd_deep_scrub_interval to
remedy the condition indicated by the "PGs not deep-scrubbed in time"
warning.
This procedure was developed by Eugen Block, and is at the time of this
commit available on his blog at
https://heiterbiswolkig.blogs.nde.ag/2024/09/06/pgs-not-deep-scrubbed-in-time/
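The value of osd_deep_scrub_interval is given in seconds, so changing it usually starts with a conversion from days. The Python sketch below is only an illustration and is not Eugen Block's procedure. The helper names are hypothetical, the 14-day figure is an arbitrary example, and the formatted `ceph config set` line assumes the option is set centrally for the `osd` section.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def deep_scrub_interval_seconds(days: float) -> int:
    """Convert an interval in days to the seconds value the option expects."""
    if days <= 0:
        raise ValueError("interval must be positive")
    return int(days * SECONDS_PER_DAY)


def config_set_command(days: float, who: str = "osd") -> str:
    """Format (but do not run) a `ceph config set` command line."""
    seconds = deep_scrub_interval_seconds(days)
    return f"ceph config set {who} osd_deep_scrub_interval {seconds}"


# 14 days is an arbitrary example value, not a recommendation.
print(config_set_command(14))
```

The sketch only builds the command string. Whether a longer interval is appropriate depends on the cluster and on the procedure described in the documentation.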
Co-authored-by: Eugen Block <eblock@nde.ag>
Signed-off-by: Zac Dover <zac.dover@proton.me>
Add a procedure to doc/rados/operations/health-warnings.rst that
explains how to remedy the "X PGs not deep-scrubbed in time" health
warning.
This procedure was developed by Eugen Block, and is at the time of this
commit available on his blog at
https://heiterbiswolkig.blogs.nde.ag/2024/09/06/pgs-not-deep-scrubbed-in-time/
Co-authored-by: Eugen Block <eblock@nde.ag>
Signed-off-by: Zac Dover <zac.dover@proton.me>
Add a link to the page about Messenger v2 to the end of
doc/rados/configuration/mon-lookup-dns.rst.
Fixes: https://tracker.ceph.com/issues/58752
Signed-off-by: Zac Dover <zac.dover@proton.me>
Based on tests performed at scale on an HDD-based cluster, it was found
that scheduling with mClock was not optimal with multiple OSD shards. For
example, in the scaled cluster with multiple OSD node failures, the client
throughput was inconsistent across test runs, and multiple slow requests
were reported.
However, the same test with a single OSD shard and with multiple worker
threads yielded significantly better results in terms of consistency of
client and recovery throughput across multiple test runs.
For more details see https://tracker.ceph.com/issues/66289.
Therefore, as an interim measure until the issue with multiple OSD shards
(or multiple mClock queues per OSD) is investigated and fixed, the
following change to the default HDD OSD shard configuration is made:
- osd_op_num_shards_hdd = 1 (was 5)
- osd_op_num_threads_per_shard_hdd = 5 (was 1)
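The effect of the two values above can be checked with a little arithmetic. The Python sketch below assumes that the number of op worker threads per OSD is the number of shards multiplied by the threads per shard, and that each shard has its own mClock queue. The helper name is hypothetical and is not part of Ceph.

```python
def total_op_threads(num_shards: int, threads_per_shard: int) -> int:
    """Worker threads per OSD, assuming shards * threads-per-shard."""
    return num_shards * threads_per_shard


# Values taken from this commit message (HDD defaults).
old = {"osd_op_num_shards_hdd": 5, "osd_op_num_threads_per_shard_hdd": 1}
new = {"osd_op_num_shards_hdd": 1, "osd_op_num_threads_per_shard_hdd": 5}

for label, cfg in (("old", old), ("new", new)):
    threads = total_op_threads(
        cfg["osd_op_num_shards_hdd"],
        cfg["osd_op_num_threads_per_shard_hdd"],
    )
    # One queue per shard is an assumption based on the wording above.
    queues = cfg["osd_op_num_shards_hdd"]
    print(f"{label}: {threads} worker threads, {queues} mClock queue(s)")
```

Under these assumptions the total thread count stays at 5, while the number of queues that mClock must schedule across drops from 5 to 1.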
The other changes in this commit include:
- Doc change to the OSD and mClock config reference describing
this change.
- OSD troubleshooting entry on the procedure to change the shard
configuration for clusters affected by this issue running on older
releases.
- Add release note for this change.
Fixes: https://tracker.ceph.com/issues/66289
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Explain how to deal with "unfound objects" when restarting OSDs in a
cache-tiered environment.
Fixes: https://tracker.ceph.com/issues/44286
Signed-off-by: Zac Dover <zac.dover@proton.me>
Document how to manually pass the search domain to "mon_dns_srv_name" in
doc/rados/configuration/mon-lookup-dns.rst.
This commit is made in response to a request that Lander Duncan made on
the [ceph-users] mailing list. The request can be seen here:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/F7V4CWLIYCAJ4JXI2JLNY6QPCFPR4SLA/
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
In this example, librados2-devel installs only C header files on Fedora 40.
Therefore, I added libradospp-devel to the command to include the C++ header
files.
Signed-off-by: Pere Diaz Bou <pere-altea@hotmail.com>
Credit Prashant D for creating the stretch-mode workaround procedure for
retrieving the correct size of datacenters.
Follows: https://github.com/ceph/ceph/pull/58109
Signed-off-by: Zac Dover <zac.dover@proton.me>
Make minor changes to the "Debugging Slow Requests" section of
doc/rados/troubleshooting/troubleshooting-osd.rst in preparation
for an expansion of this section in response to a request from Joel
Davidow.
Signed-off-by: Zac Dover <zac.dover@proton.me>
Incorporate Anthony D'Atri's suggestions in
https://github.com/ceph/ceph/pull/58057
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
Add a method for defining a CRUSH rule that returns the actual value of
the total available size.
Fixes: https://tracker.ceph.com/issues/56650
Signed-off-by: Zac Dover <zac.dover@proton.me>
Add an explanation that directs the reader to replace the "X" part of
the command "ceph tell mon.X mon_status" with the value specific to the
reader's Ceph cluster, which is probably not "X".
In the future, such replaceable strings in commands may be bounded by
angle brackets ("<" and ">").
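As a small illustration of the substitution described above, the Python sketch below fills a monitor ID into the command. The helper name and the example ID "a" are hypothetical. The real ID must be taken from the reader's own cluster.

```python
def mon_status_command(mon_id: str) -> str:
    """Return the mon_status command with the "X" placeholder replaced."""
    if not mon_id:
        raise ValueError("a monitor ID is required")
    return f"ceph tell mon.{mon_id} mon_status"


# "a" is only an example ID; use the ID of a monitor in your cluster.
print(mon_status_command("a"))
```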
This improvement to the documentation was suggested on the [ceph-users]
email list by Joel Davidow. This email, an absolute model of user
engagement with an upstream project, can be reviewed here:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/KF67F5TXFSSTPXV7EKL6JKLA5KZQDLDQ/
Signed-off-by: Zac Dover <zac.dover@proton.me>
Add "pg-states" and "pg-concepts" to the left tree pane on
docs.ceph.com.
This commit has been made in response to an upstream request made in
https://pad.ceph.com/p/Report_Documentation_Bugs.
Signed-off-by: Zac Dover <zac.dover@proton.me>
Add an explanation of leader-peon conditions that obtain when the
cluster is in the "HEALTH_OK" state. Previously, the text discussed
these two monitor states only in the context of a health detail entry.
This improvement to the documentation was suggested on the [ceph-users]
email list by Joel Davidow. This email, an absolute model of user
engagement with an upstream project, can be reviewed here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/KF67F5TXFSSTPXV7EKL6JKLA5KZQDLDQ/
I list Joel Davidow here as the co-author for the sake of getting this
change into the documentation more expediently. Although he is listed as
the co-author, he is the true author.
Co-authored-by: Joel Davidow <jdavidow@nso.edu>
Signed-off-by: Zac Dover <zac.dover@proton.me>
Add the following options to
doc/rados/configuration/network-config-ref.rst:
- public_network_interface
- cluster_network_interface
These additions were made in response to a request from Blaine Gardner.
Signed-off-by: Zac Dover <zac.dover@proton.me>
Add the command for stopping a monitor to the procedure that explains
how to inject a monmap into a monitor.
Zac of the future: cf. 05 Aug 2023.
Signed-off-by: Zac Dover <zac.dover@proton.me>
Make the changes suggested by Anthony D'Atri in
https://github.com/ceph/ceph/pull/57022.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
As stated in the commit-message line, this corrects the typo "cepg" to
the correct string "ceph".
This typo was discovered by https://github.com/test-erik and was
brought to our attention way back in
https://github.com/ceph/ceph/pull/50420.
Signed-off-by: Zac Dover <zac.dover@proton.me>
Incorporate Anthony D'Atri's suggestions from
https://github.com/ceph/ceph/pull/57022 into the text in
doc/rados/troubleshooting/troubleshooting-pg.rst.
Signed-off-by: Zac Dover <zac.dover@proton.me>
Remove references to dual-stack mode in
doc/rados/configuration/network-config-ref.rst and
doc/rados/configuration/msgr2.rst. This feature seems to have been
planned but never to have been completely implemented.
See the tracker issue listed below for an email exchange detailing the
confusion caused by the presence in the documentation of this
now-removed information.
Fixes: https://tracker.ceph.com/issues/65631
Signed-off-by: Zac Dover <zac.dover@proton.me>
Incorporate the material in /doc/rados/operations/pg-repair into
/doc/rados/troubleshooting/troubleshooting-pg. Remove
/doc/rados/operations/pg-repair from the documentation. Redirect all
links to the old location to the new location.
Signed-off-by: Zac Dover <zac.dover@proton.me>
Add "ceph osd crush rename bucket" command. This commit is made in
response to a request from Michele Giacomoli.
Fixes: https://tracker.ceph.com/issues/65599
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
* Fix incorrect syntax
* Use underscores for config options, as other Ceph docs do
* Fix incorrect statement that crush_location_hook adds fields; it replaces them
* Explain that `root=default host=HOSTNAME` is not set if `crush_location` is given
* Remove duplication across sections
* Point out that `root=default` is important
Signed-off-by: Niklas Hambüchen <mail@nh2.me>