1853 Commits

Author SHA1 Message Date
Zac Dover
2baa027b13 doc/rados: edit "Placement Groups Never Get Clean"
Make grammar improvements (and correct a verb disagreement) in the
section "Placement Groups Never Get Clean" in
doc/rados/troubleshooting/troubleshooting-pg.rst.

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-09-29 16:21:50 +10:00
Sridhar Seshasayee
aa19cd30da
Merge pull request #58509 from sseshasa/wip-hdd-osd-shard-params-for-mclock
common/options: Change HDD OSD shard configuration defaults for mClock

Reviewed-by: Mark Nelson <mark.a.nelson@gmail.com>
Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
2024-09-25 12:34:41 +05:30
Zac Dover
a159821ddf doc/rados: add confval directives to health-checks
Add confval directives to doc/rados/operations/health-checks.rst, as
requested by Anthony D'Atri here: https://github.com/ceph/ceph/pull/59635#pullrequestreview-2286205705

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-09-18 21:36:24 +10:00
Zac Dover
f57e99e173 doc/rados: add osd_deep_scrub_interval setting operation
Add a second method of changing the value of osd_deep_scrub_interval to
remedy the condition indicated by the "PGs not deep-scrubbed in time"
warning.

This procedure was developed by Eugen Block, and is at the time of this
commit available on his blog at
https://heiterbiswolkig.blogs.nde.ag/2024/09/06/pgs-not-deep-scrubbed-in-time/

Co-authored-by: Eugen Block <eblock@nde.ag>
Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-09-15 21:33:55 +10:00
Zac Dover
d620a51c30 doc/rados: add "pgs not deep scrubbed in time" info
Add a procedure to doc/rados/operations/health-warnings.rst that
explains how to remedy the "X PGs not deep-scrubbed in time" health
warning.

This procedure was developed by Eugen Block, and is at the time of this
commit available on his blog at
https://heiterbiswolkig.blogs.nde.ag/2024/09/06/pgs-not-deep-scrubbed-in-time/

Co-authored-by: Eugen Block <eblock@nde.ag>
Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-09-06 22:43:59 +10:00
Zac Dover
81f9d064f7 doc/rados: add link to messenger v2 info in mon-lookup-dns.rst
Add a link to the page about Messenger v2 to the end of
doc/rados/configuration/mon-lookup-dns.rst.

Fixes: https://tracker.ceph.com/issues/58752

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-09-05 17:59:15 +10:00
Anthony D'Atri
2aa82539ed doc/rados/operations: Improve health-checks.rst
Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
2024-09-03 22:10:27 +10:00
Sridhar Seshasayee
0d81e72137 common/options: Change HDD OSD shard configuration defaults for mClock
Based on tests performed at scale on an HDD-based cluster, it was found
that scheduling with mClock was not optimal with multiple OSD shards. For
example, in the scaled cluster with multiple OSD node failures, the client
throughput was found to be inconsistent across test runs, coupled with
multiple reported slow requests.

However, the same test with a single OSD shard and with multiple worker
threads yielded significantly better results in terms of consistency of
client and recovery throughput across multiple test runs.

For more details see https://tracker.ceph.com/issues/66289.

Therefore, as an interim measure until the issue with multiple OSD shards
(or multiple mClock queues per OSD) is investigated and fixed, the
following change to the default HDD OSD shard configuration is made:

 - osd_op_num_shards_hdd = 1 (was 5)
 - osd_op_num_threads_per_shard_hdd = 5 (was 1)

The other changes in this commit include:
 - Doc change to the OSD and mClock config reference describing
   this change.
 - OSD troubleshooting entry on the procedure to change the shard
   configuration for clusters affected by this issue running on older
   releases.
 - Add release note for this change.

Fixes: https://tracker.ceph.com/issues/66289
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>

# Conflicts:
#	doc/rados/troubleshooting/troubleshooting-osd.rst
2024-09-03 11:09:08 +05:30
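
The arithmetic behind the default change in 0d81e72137 can be sketched as follows. This is only an illustration, assuming that the worker-thread count per HDD OSD is the product of the two options named in the commit message. The helper names are hypothetical, and the commands are built as strings rather than executed. The troubleshooting entry mentioned in the commit remains the authoritative procedure for older releases.

```python
# Option names and values are taken from the commit message above.
# The helper functions are hypothetical and exist only for illustration.
OLD_DEFAULTS = {"osd_op_num_shards_hdd": 5, "osd_op_num_threads_per_shard_hdd": 1}
NEW_DEFAULTS = {"osd_op_num_shards_hdd": 1, "osd_op_num_threads_per_shard_hdd": 5}


def total_worker_threads(cfg):
    """Assumed model: worker threads per HDD OSD = shards * threads per shard."""
    return cfg["osd_op_num_shards_hdd"] * cfg["osd_op_num_threads_per_shard_hdd"]


def config_set_commands(cfg, who="osd"):
    """Build (but do not run) one `ceph config set` argv list per option."""
    return [["ceph", "config", "set", who, name, str(value)]
            for name, value in cfg.items()]


if __name__ == "__main__":
    # Both layouts give the same thread count; only the number of shards differs.
    print(total_worker_threads(OLD_DEFAULTS), total_worker_threads(NEW_DEFAULTS))
    for argv in config_set_commands(NEW_DEFAULTS):
        print(" ".join(argv))
```

Under this model the total thread count is unchanged by the commit; what changes is that all threads now serve a single shard (and so a single mClock queue) per HDD OSD.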
Zac Dover
f01d7a8d5b doc/rados: document unfound object cache-tiering scenario
Explain how to deal with "unfound objects" when restarting OSDs in a
cache-tiered environment.

Fixes: https://tracker.ceph.com/issues/44286

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-08-20 22:54:23 +10:00
Anthony D'Atri
fda2db5ac7 doc: Harmonize 'mountpoint'
Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
2024-08-18 11:23:39 -04:00
Zac Dover
1ca89e6ca3 doc/glossary: add "flapping OSD"
Add an entry for "Flapping OSD" to the glossary.

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-08-15 04:08:14 +10:00
Kamoltat Sirivadhna
aa1d8cf4fa docs/rados/operations/stretch-mode: warn device class is not supported
Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
2024-08-07 19:20:41 +00:00
Kamoltat (Junior) Sirivadhna
6a0d503a59
Merge pull request #56233 from kamoltat/wip-ksirivad-fix-64802
RADOS: Generalize stretch mode pg temp handling to be usable without stretch mode
Samuel Just <sjust@redhat.com>
2024-08-07 09:45:54 -04:00
Anthony D'Atri
62562ec65e doc/rados/operations: remove vanity cluster name reference from crush-map.rst
Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
2024-07-30 20:45:11 -04:00
Kamoltat
fb0011a692 doc/rados/operations/pools.rst: Added docs for stretch pool set|unset|show
Fixes: https://tracker.ceph.com/issues/64802

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2024-07-17 22:12:10 +00:00
Adam Kupczyk
103cd8e78b
Merge pull request #57722 from sajibreadd/wip-62500
os/bluestore: Warning added for slow operations and stalled read
2024-07-17 11:57:45 +02:00
Zac Dover
98938a0312 doc/rados: document manually passing search domain
Document how to manually pass the search domain to "mon_dns_srv_name" in
doc/rados/configuration/mon-lookup-dns.rst.

This commit is made in response to a request by Lander Duncan that was made on the [ceph-users] mailing list, and can be seen here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/F7V4CWLIYCAJ4JXI2JLNY6QPCFPR4SLA/

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-07-04 03:52:55 +10:00
Pere Diaz Bou
7e87441601 doc/rados: update how to install c++ header files
In this example, librados2-devel installs only C header files on Fedora 40,
so libradospp-devel was added to the command to include C++ header files.

Signed-off-by: Pere Diaz Bou <pere-altea@hotmail.com>
2024-06-26 15:57:47 +02:00
sajibreadd
73b80a9a2c Warning added for slow operations and stalled reads in BlueStore. The user can control how long the warning persists after the last occurrence and the number of operations that serves as the threshold for the warning.
Fixes: https://tracker.ceph.com/issues/62500
Signed-off-by: Md Mahamudur Rahaman Sajib <mahamudur.sajib@croit.io>
2024-06-26 14:47:03 +06:00
Zac Dover
cf5ce305b2
Merge pull request #58226 from zdover23/wip-doc-2024-06-24-rados-troubleshooting-osd-debugging-slow-requests
doc/rados: edit troubleshooting-osd.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2024-06-26 15:53:07 +10:00
Zac Dover
2e777cb4f8 doc/rados: credit Prashant for a procedure
Credit Prashant D for creating the stretch-mode workaround procedure for
retrieving the correct size of datacenters.

Follows: https://github.com/ceph/ceph/pull/58109

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-06-25 17:19:49 +10:00
Zac Dover
8b211b9c7f doc/rados: edit troubleshooting-osd.rst
Make minor changes to the "Debugging Slow Requests" section of
doc/rados/troubleshooting/troubleshooting-osd.rst in preparation
for an expansion of this section in response to a request from Joel
Davidow.

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-06-24 20:32:30 +10:00
Patrick Donnelly
0efe88dd5d
Merge PR #58121 into main
* refs/pull/58121/head:
	doc: add documentation for `ceph auth rotate`
	PendingReleaseNotes: add note for new `auth rotate`
	qa: test `auth rotate`
	mon/AuthMonitor: add `ceph auth rotate` command

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2024-06-23 14:32:00 -04:00
Zac Dover
e46e6cd30a
Merge pull request #58156 from zdover23/wip-doc-2024-06-20-rados-troubleshooting-mon
doc/rados: followup to PR#58057

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2024-06-22 16:21:20 +10:00
Patrick Donnelly
b871bbebe0
doc: add documentation for ceph auth rotate
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-06-20 21:31:13 -04:00
Zac Dover
2e999a26ef doc/rados: followup to PR#58057
Incorporate Anthony D'Atri's suggestions in
https://github.com/ceph/ceph/pull/58057

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-06-20 21:43:53 +10:00
Zac Dover
007385a3ef doc/rados: add stretch_rule workaround
Add a method for defining a CRUSH rule that returns the actual value of
the total available size.

Fixes: https://tracker.ceph.com/issues/56650

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-06-18 15:22:41 +10:00
Zac Dover
d071ad2575 doc/rados: explain replaceable parts of command
Add an explanation that directs the reader to replace the "X" part of
the command "ceph tell mon.X mon_status" with the value specific to the
reader's Ceph cluster (which is probably not "X").

In the future, such replaceable strings in commands may be bounded by
angle brackets ("<" and ">").

This improvement to the documentation was suggested on the [ceph-users]
email list by Joel Davidow. This email, an absolute model of user
engagement with an upstream project, can be reviewed here:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/KF67F5TXFSSTPXV7EKL6JKLA5KZQDLDQ/

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-06-15 21:55:18 +10:00
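
The substitution described in d071ad2575 can be sketched as below. The function name is hypothetical. Only the command form "ceph tell mon.X mon_status" comes from the commit message, and the command is built as an argument list rather than executed.

```python
def mon_status_command(mon_id):
    """Return the argv for `ceph tell mon.<mon_id> mon_status`.

    `mon_id` stands in for the "X" placeholder and must be the ID of a
    monitor in the reader's own cluster.
    """
    if not mon_id:
        raise ValueError("a monitor ID is required in place of the X placeholder")
    return ["ceph", "tell", f"mon.{mon_id}", "mon_status"]


if __name__ == "__main__":
    # "a" is an example monitor ID, not a value from the commit.
    print(" ".join(mon_status_command("a")))
```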
Zac Dover
0629f47faf doc/rados: add pg-states and pg-concepts to tree
Add "pg-states" and "pg-concepts" to the left tree pane on
docs.ceph.com.

This commit has been made in response to a request from the upstream
made in https://pad.ceph.com/p/Report_Documentation_Bugs.

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-06-13 21:29:47 +10:00
Zac Dover
6fb9a5ef81 doc/rados: improve leader/peon monitor explanation
Add an explanation of leader-peon conditions that obtain when the
cluster is in the "HEALTH_OK" state. Previously, the text discussed
these two monitor states only in the context of a health detail entry.

This improvement to the documentation was suggested on the [ceph-users]
email list by Joel Davidow. This email, an absolute model of user
engagement with an upstream project, can be reviewed here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/KF67F5TXFSSTPXV7EKL6JKLA5KZQDLDQ/

I will list Joel Davidow here as the co-author for the sake of more
expediently getting this change into the documentation, but though he is
listed as the co-author, he is the true author.

Co-authored-by: Joel Davidow <jdavidow@nso.edu>
Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-06-11 08:16:34 +10:00
Zac Dover
33bc1a0241 doc/rados: add options to network config ref
Add the following options to
doc/rados/configuration/network-config-ref.rst:

- public_network_interface
- cluster_network_interface

These additions were made in response to a request from Blaine Gardner.

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-06-05 14:30:01 +10:00
Zac Dover
c032188d66 doc/rados: add stop monitor command
Add the command for stopping a monitor to the procedure that explains
how to inject a monmap into a monitor.

Zac of the future: cf. 05 Aug 2023.

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-06-03 10:23:43 +10:00
Patrick Donnelly
8664fe9c06
doc: document new --output-file switch
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-05-10 20:59:38 -04:00
Zac Dover
93898d8083 doc/rados: PR#57022 unfinished business
Make the changes suggested by Anthony D'Atri in
https://github.com/ceph/ceph/pull/57022.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-05-03 15:32:28 +10:00
Zac Dover
ddef880947 doc/rados: s/cepgsqlite/cephsqlite/
As stated in the commit-message line, this corrects the typo "cepg" to
the correct string "ceph".

This typo was discovered by https://github.com/test-erik and this was
brought to our attention way back in
https://github.com/ceph/ceph/pull/50420.

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-05-02 02:46:49 +10:00
Piotr Parczewski
048f6e539b
doc/rados/operations: rephrase OSDs peering
Signed-off-by: Piotr Parczewski <piotr@stackhpc.com>
2024-04-30 12:56:44 +02:00
Anthony D'Atri
88eddb27f6
Merge pull request #57071 from zdover23/wip-doc-2024-04-24-rados-troubleshooting-pg
doc/rados: improve t-shooting pg
2024-04-24 09:21:27 -04:00
Zac Dover
44085c0dc9 doc/rados: improve t-shooting pg
Incorporate Anthony D'Atri's suggestions from
https://github.com/ceph/ceph/pull/57022 into the text in
doc/rados/troubleshooting/troubleshooting-pg.rst.

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-04-24 21:48:20 +10:00
Zac Dover
c65d2056c2 doc/rados: remove dual-stack docs
Remove references to dual-stack mode in
doc/rados/configuration/network-config-ref.rst and
doc/rados/configuration/msgr2.rst. This feature seems to have been
planned but never to have been completely implemented.

See the tracker issue listed below for an email exchange detailing the
confusion caused by the presence in the documentation of this
now-removed information.

Fixes: https://tracker.ceph.com/issues/65631

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-04-23 16:37:27 +10:00
Zac Dover
56e81df3ae
Merge pull request #57022 from zdover23/wip-doc-2024-04-22-rados-operations-pg-troubleshooting
doc/rados: remove redundant pg repair commands

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2024-04-23 00:59:35 +10:00
Pierre Riteau
23d2740241 doc/rados: fix outdated value for ms_bind_port_max
The highest port number used by OSD or MDS daemons was increased from
7300 to 7568 in [1] but the documentation still refers to 7300 in
multiple locations.

[1] https://github.com/ceph/ceph/pull/42210

Fixes: https://tracker.ceph.com/issues/65609
Signed-off-by: Pierre Riteau <pierre@stackhpc.com>
2024-04-22 11:28:53 +02:00
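
One practical consequence of 23d2740241 is that a firewall rule written against the outdated upper bound leaves part of the daemon port range closed. The check below is a minimal sketch. The lower bound of 6800 is an assumption (the commonly documented default for ms_bind_port_min) and does not appear in the commit, and the function name is hypothetical.

```python
MS_BIND_PORT_MIN = 6800  # assumed default lower bound; not stated in the commit
MS_BIND_PORT_MAX = 7568  # current upper bound, per the commit message
OLD_PORT_MAX = 7300      # outdated value that the documentation still cited


def uncovered_ports(rule_low, rule_high,
                    port_min=MS_BIND_PORT_MIN, port_max=MS_BIND_PORT_MAX):
    """Return the (low, high) sub-ranges of [port_min, port_max] that an
    allow rule for [rule_low, rule_high] does not cover."""
    gaps = []
    if rule_low > port_min:
        gaps.append((port_min, min(rule_low - 1, port_max)))
    if rule_high < port_max:
        gaps.append((max(rule_high + 1, port_min), port_max))
    return gaps


if __name__ == "__main__":
    # A rule based on the old documentation leaves the top of the range closed.
    print(uncovered_ports(MS_BIND_PORT_MIN, OLD_PORT_MAX))
    print(uncovered_ports(MS_BIND_PORT_MIN, MS_BIND_PORT_MAX))
```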
Zac Dover
3c2e8d35a9 doc/rados: remove redundant pg repair commands
Incorporate the material in /doc/rados/operations/pg-repair into
/doc/rados/troubleshooting/troubleshooting-pg. Remove
/doc/rados/operations/pg-repair from the documentation. Redirect all
links to the old location to the new location.

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-04-22 17:10:06 +10:00
Zac Dover
1030b572fa doc/rados: add bucket rename command
Add "ceph osd crush rename bucket" command. This commit is made in
response to a request from Michele Giacomoli.

Fixes: https://tracker.ceph.com/issues/65599

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-04-22 16:16:54 +10:00
Niklas Hambüchen
d91e75e1e9 doc/rados/operations: Improve crush_location docs
* Fix incorrect syntax
* Use underscores for config options, like other ceph docs did
* Fix incorrect statement that crush_location_hook adds fields; it replaces them
* Explain `root=default host=HOSTNAME` is not set if `crush_location` is given
* Remove duplication across sections
* Point out that `root=default` is important

Signed-off-by: Niklas Hambüchen <mail@nh2.me>
2024-03-30 23:11:17 +01:00
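
The behavior summarized in d91e75e1e9 can be illustrated with a small model. It assumes that a crush_location value is a whitespace-separated list of key=value pairs, as in the `root=default host=HOSTNAME` example from the commit message. The function names are hypothetical and this is not Ceph's own parser.

```python
def parse_crush_location(value):
    """Parse 'key=value key=value ...' into a dict (assumed syntax)."""
    pairs = {}
    for token in value.split():
        key, sep, val = token.partition("=")
        if not sep or not key or not val:
            raise ValueError(f"malformed crush_location token: {token!r}")
        pairs[key] = val
    return pairs


def effective_location(configured, hostname):
    """Model the point made in the commit: the default
    `root=default host=HOSTNAME` applies only when crush_location is unset.
    A configured value replaces the default instead of adding to it."""
    if configured:
        return parse_crush_location(configured)
    return {"root": "default", "host": hostname}


if __name__ == "__main__":
    # "node1" and "rack=r1" are example values, not taken from the commit.
    print(effective_location(None, "node1"))
    print(effective_location("rack=r1 host=node1", "node1"))
```

In the second call the result has no `root` key, which is why the commit stresses that `root=default` has to be written out explicitly when crush_location is set.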
Taha Jahangir
3cd39e3582
docs/rados: remove incorrect ceph command
The removed line was the (incorrectly changed) output of the previous command.

Signed-off-by: Taha Jahangir <mtjahangir@gmail.com>
2024-03-25 13:32:12 +03:30
Zac Dover
063fb89b21
Merge pull request #56287 from rzarzynski/wip-ec-profile-set-paranoid-on-override
mon, doc: overriding ec profile requires --yes-i-really-mean-it

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2024-03-25 06:57:30 +10:00
Zac Dover
5d300db825 doc/start: link to mon map command
Link to the "ceph mon stat" command when "Intro to Ceph" document first
mentions Monitor Maps.

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-03-22 08:14:57 +10:00
Casey Bodley
0c72fcc26a
Merge pull request #56008 from kchheda3/wip-notification-subsys
rgw/notification: add rgw notification specific debug log subsystem

Reviewed-by: Yuval Lifshitz <ylifshit@redhat.com>
2024-03-21 15:08:35 +00:00
kchheda3
429967917b rgw/notification: add rgw notification specific debug log subsystem.
decorate the events with event details while logging.

Signed-off-by: kchheda3 <kchheda3@bloomberg.net>
2024-03-19 13:54:35 -04:00
Radoslaw Zarzynski
629ba7bd34 mon, doc: overriding ec profile requires --yes-i-really-mean-it
This is per https://tracker.ceph.com/issues/64333#note-17 describing
driving factors of a catastrophic cluster failure.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2024-03-19 14:12:38 +00:00