Commit Graph

137794 Commits

Author SHA1 Message Date
Samuel Just
dc3cc16691 osd/scheduler: remove unused PGOpItem::maybe_get_mosd_op
Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:22:00 +05:30
Samuel Just
d52cadbe61 osd/scheduler: remove OpQueueable::get_order_locker() and supporting machinery
Apparently unused.

Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:22:00 +05:30
Samuel Just
04e470d12c osd/scheduler: remove OpQueueable::get_op_type() and supporting machinery
Apparently unused.

Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:22:00 +05:30
Samuel Just
8302349296 PeeringState::clamp_recovery_priority: use std::clamp
Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:22:00 +05:30
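The refactor above can be sketched as follows; the bound constants here are illustrative stand-ins rather than Ceph's actual recovery-priority constants:

```cpp
#include <algorithm>

// Illustrative bounds, not Ceph's actual OSD recovery-priority constants.
constexpr int RECOVERY_PRIORITY_MIN = 0;
constexpr int RECOVERY_PRIORITY_MAX = 254;

// std::clamp replaces equivalent hand-written min/max logic in one call.
int clamp_recovery_priority(int priority) {
  return std::clamp(priority, RECOVERY_PRIORITY_MIN, RECOVERY_PRIORITY_MAX);
}
```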
Sridhar Seshasayee
d29548aca8 doc: Modify mClock configuration documentation to reflect new cost model
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2023-05-08 16:22:00 +05:30
Sridhar Seshasayee
b6a442c7cc osd: Retain overridden mClock recovery settings across osd restarts
Fix an issue where an overridden mClock recovery setting (set prior to
an osd restart) could be lost after an osd restart.

For example, consider that prior to an osd restart the option
'osd_max_backfills' was successfully set to a value different from the
mClock default. If the osd was then restarted for some reason, the
boot-up sequence incorrectly reset the backfill value to the mClock
default within the async local/remote reservers. This fix ensures that
no change is made if the current overridden value is different from the
mClock default.

Modify an existing standalone test to verify that the local and remote
async reservers are updated to the desired number of backfills under
normal conditions and also across osd restarts.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2023-05-08 16:22:00 +05:30
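A minimal sketch of the fix's boot-time logic, with illustrative types and names rather than Ceph's actual reserver API:

```cpp
// Stand-in for an async local/remote reserver (illustrative only).
struct Reserver {
  unsigned max_backfills;
};

// On boot, push the mClock default into the reservers only when the
// configured value still equals that default; a differing value means
// the user overrode it, and the override must survive the restart.
void apply_backfills_on_boot(Reserver& local, Reserver& remote,
                             unsigned configured, unsigned mclock_default) {
  if (configured != mclock_default) {
    return;  // overridden: make no change
  }
  local.max_backfills = mclock_default;
  remote.max_backfills = mclock_default;
}
```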
Sridhar Seshasayee
f93fd394b1 osd: Set default max active recovery and backfill limits for mClock
Client ops are sensitive to recovery load, so recovery limits must be set
carefully for osds whose underlying device is an HDD. Tests revealed that
recoveries with osd_max_backfills = 10 and osd_recovery_max_active_hdd = 5
were still aggressive and overwhelmed client ops. The built-in defaults
for mClock are now set to:

    1) osd_recovery_max_active_hdd = 3
    2) osd_recovery_max_active_ssd = 10
    3) osd_max_backfills = 3

The above may be modified if necessary by setting
osd_mclock_override_recovery_settings option.

Fixes: https://tracker.ceph.com/issues/58529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2023-05-08 16:21:59 +05:30
Samuel Just
0754de111a osd/scheduler/mClockScheduler: make is_rotational const
Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:21:59 +05:30
Samuel Just
589e1d9c24 osd/scheduler/mClockScheduler: simplify profile handling
Previously, setting default configs from the configured profile was
split across:
- enable_mclock_profile_settings
- set_mclock_profile - sets mclock_profile class member
- set_*_allocations - updates client_allocs class member
- set_profile_config - sets profile based on client_allocs class member

This made tracing the effect of changing the profile challenging,
because state was passed through class member variables.

Instead, define a simple profile_t with three constexpr values
corresponding to the three profiles and handle it all in a single
set_config_defaults_from_profile() method.

Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:21:59 +05:30
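The described consolidation might look roughly like this; the field names and numeric allocations are illustrative assumptions, not Ceph's actual profile values:

```cpp
#include <string_view>

// One plain value type instead of state spread across several setters.
struct profile_t {
  double client_res, client_lim;          // client reservation/limit
  double background_res, background_lim;  // recovery/backfill reservation/limit
};

// One constexpr instance per profile (values made up for illustration).
constexpr profile_t balanced{0.5, 0.99, 0.25, 0.75};
constexpr profile_t high_client_ops{0.6, 0.99, 0.2, 0.7};
constexpr profile_t high_recovery_ops{0.3, 0.8, 0.6, 0.99};

// All profile handling collapses into a single lookup, so the effect of
// changing the profile is traceable in one place.
profile_t set_config_defaults_from_profile(std::string_view name) {
  if (name == "high_client_ops") return high_client_ops;
  if (name == "high_recovery_ops") return high_recovery_ops;
  return balanced;
}
```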
Sridhar Seshasayee
514cb598fb osd: Modify mClock scheduler's cost model to represent cost in bytes
The mClock scheduler's cost model for HDDs/SSDs is modified and now
represents the cost of an IO in terms of bytes.

The cost parameters, namely osd_mclock_cost_per_io_usec_[hdd|ssd]
and osd_mclock_cost_per_byte_usec_[hdd|ssd], which represented the cost
of an IO in seconds, were inaccurate and have therefore been removed.

The new model considers the following aspects of an osd to calculate
the cost of an IO:

 - osd_mclock_max_capacity_iops_[hdd|ssd] (existing option)
   The measured random write IOPS at 4 KiB block size. This is
   measured during OSD boot-up using OSD bench tool.
 - osd_mclock_max_sequential_bandwidth_[hdd|ssd] (new config option)
   The maximum sequential bandwidth of the underlying device.
   The cost calculation assumes 150 MiB/s for HDDs and 750 MiB/s
   for SSDs.

The following important changes are made to arrive at the overall
cost of an IO:

1. Represent QoS reservation and limit config parameter as proportion:
The reservation and limit parameters are now set in terms of a
proportion of the OSD's max IOPS capacity. The earlier representation
was in terms of IOPS per OSD shard which required the user to perform
calculations before setting the parameter. Representing the
reservation and limit in terms of proportions is much more intuitive
and simpler for a user.

2. Cost per IO Calculation:
Using the above config options, osd_bandwidth_cost_per_io for the osd is
calculated and set. It is the ratio of the max sequential bandwidth and
the max random write iops of the osd. It is a constant and represents the
base cost of an IO in terms of bytes. This is added to the actual size of
the IO (in bytes) to represent the overall cost of the IO operation. See
mClockScheduler::calc_scaled_cost().

3. Cost calculation in Bytes:
The settings for reservation and limit, expressed as a fraction of the
OSD's maximum IOPS capacity, are converted to Bytes/sec before updating the
mClock server's ClientInfo structure. This is done for each OSD op shard
using osd_bandwidth_capacity_per_shard shown below:

    (res|lim)  = (IOPS proportion) * osd_bandwidth_capacity_per_shard
    (Bytes/sec)   (unitless)             (bytes/sec)

The above result is updated within the mClock server's ClientInfo
structure for different op_scheduler_class operations. See
mClockScheduler::ClientRegistry::update_from_config().

The overall cost of an IO operation (in seconds) is finally determined
during the tag calculations performed in the mClock server. See
crimson::dmclock::RequestTag::tag_calc() for more details.

4. Profile Allocations:
Optimize mClock profile allocations due to the change in the cost model
and lower recovery cost.

5. Modify standalone tests to reflect the change in the QoS config
parameter representation of reservation and limit options.

Fixes: https://tracker.ceph.com/issues/58529
Fixes: https://tracker.ceph.com/issues/59080
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2023-05-08 16:21:59 +05:30
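Putting steps 2 and 3 above together, a sketch of the byte-based cost calculation; the HDD bandwidth default comes from the text, while the IOPS figure is a made-up example measurement:

```cpp
#include <cstdint>

// Documented default: 150 MiB/s sequential bandwidth for HDDs.
constexpr double osd_mclock_max_sequential_bandwidth = 150.0 * 1024 * 1024;
// Assumed 4 KiB random-write IOPS as measured by OSD bench (illustrative).
constexpr double osd_mclock_max_capacity_iops = 315.0;

// Step 2: constant base cost of any IO, in bytes -- the ratio of the max
// sequential bandwidth to the max random write IOPS.
constexpr double osd_bandwidth_cost_per_io =
    osd_mclock_max_sequential_bandwidth / osd_mclock_max_capacity_iops;

// Overall cost = base per-IO cost + the IO's actual size in bytes
// (cf. mClockScheduler::calc_scaled_cost()).
double calc_scaled_cost(uint64_t io_size_bytes) {
  return osd_bandwidth_cost_per_io + static_cast<double>(io_size_bytes);
}

// Step 3: a reservation/limit proportion (unitless) converts to bytes/sec
// by scaling the per-shard bandwidth capacity.
double to_bytes_per_sec(double proportion, double capacity_per_shard) {
  return proportion * capacity_per_shard;
}
```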
Sridhar Seshasayee
e3ccb80bbc osd: update PGRecovery queue item cost to reflect object size
Previously, we used a static value of osd_recovery_cost (20M
by default) for PGRecovery. For pools with relatively small
objects, this causes mclock to backfill very very slowly as
20M massively overestimates the amount of IO each recovery
queue operation requires. Instead, add a cost_per_object
parameter to OSDService::awaiting_throttle and set it to the
average object size in the PG being queued.

Fixes: https://tracker.ceph.com/issues/58606
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2023-05-08 16:21:59 +05:30
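The idea reduces to a few lines; the helper name below is illustrative, standing in for the cost_per_object parameter passed to OSDService::awaiting_throttle:

```cpp
#include <cstdint>

// Cost per queued recovery item: the PG's average object size, instead of
// a fixed osd_recovery_cost (20M by default), which massively overestimated
// the IO per operation for pools with small objects.
uint64_t cost_per_object(uint64_t pg_num_bytes, uint64_t pg_num_objects) {
  return pg_num_objects ? pg_num_bytes / pg_num_objects : 0;
}
```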
Sridhar Seshasayee
a8832bbd24 osd: update OSDService::queue_recovery_context to specify cost
Previously, we always queued this with cost osd_recovery_cost which
defaults to 20M. With mclock, this caused these items to be delayed
heavily. Instead, base the cost on the operation queued.

Fixes: https://tracker.ceph.com/issues/58606
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2023-05-08 16:21:59 +05:30
Sridhar Seshasayee
7dc3024355 osd/osd_types: use appropriate cost value for PullOp
See included comments -- previous values did not account for object
size.  This causes problems for mClock, which is much stricter
in how it interprets costs.

Fixes: https://tracker.ceph.com/issues/58607
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2023-05-08 16:21:59 +05:30
Sridhar Seshasayee
ee26df6d56 osd/osd_types: use appropriate cost value for PushReplyOp
See included comments -- previous values did not account for object
size.  This causes problems for mClock, which is much stricter
in how it interprets costs.

Fixes: https://tracker.ceph.com/issues/58529
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2023-05-08 16:21:59 +05:30
zdover23
e87b2e0d60
Merge pull request #51322 from zdover23/wip-doc-2023-05-03-rados-operations-stretch-mode-stretch-mode-issues
doc/rados: stretch-mode: stretch cluster issues

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2023-05-07 16:19:11 +10:00
Laura Flores
ecfeb18192
Merge pull request #50983 from ljflores/wip-ceph-release-notes 2023-05-06 23:38:26 -05:00
Anthony D'Atri
ca5f8473d6
Merge pull request #51370 from anthonyeleven/anthonyeleven/correct-space-amp
doc/rados/configuration: correct space amp in bluestore-config-ref.rst
2023-05-06 11:28:53 -04:00
Svelar
b7747a40e3
Merge pull request #50392 from Svelar/seastore-cephadm
ceph-volume: assign seastore as object store backend when deploying crimson-osd using LVM with cephadm
2023-05-06 14:22:12 +08:00
Anthony D'Atri
79256c1213 doc/rados/configuration: correct space amp in bluestore-config-ref.rst
Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
2023-05-05 20:43:15 -04:00
Anthony D'Atri
55de546717
Merge pull request #51359 from zdover23/wip-doc-2023-05-05-cephfs-troubleshooting-post-upgrade-inaccessible-filesystems
doc/cephfs: repairing inaccessible FSes
2023-05-05 20:10:36 -04:00
Zac Dover
2430127c6e doc/cephfs: repairing inaccessible FSes
Add a procedure to doc/cephfs/troubleshooting.rst that explains how to
restore access to FileSystems that became inaccessible after
post-Nautilus upgrades. The procedure included here was written by Harry
G Coin, and merely lightly edited by me. I include him here as a
"co-author", but it should be noted that he did the heavy lifting on
this.

See the email thread here for more context:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/HS5FD3QFR77NAKJ43M2T5ZC25UYXFLNW/

Co-authored-by: Harry G Coin <hgcoin@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
2023-05-06 09:40:05 +10:00
Laura Flores
c20d876c69 script: improve author scraping on cherry picks
Signed-off-by: Laura Flores <lflores@redhat.com>
2023-05-05 18:31:53 +00:00
Laura Flores
c18a75a43e script: handle a corner case for author in cherry-picked PRs
Signed-off-by: Laura Flores <lflores@redhat.com>
2023-05-05 18:06:30 +00:00
Laura Flores
2189038722 script: fix author and title for cherry picks
Signed-off-by: Laura Flores <lflores@redhat.com>
2023-05-05 17:44:23 +00:00
Laura Flores
f6f6671b8e
Merge pull request #51146 from ceph/wip-yuriw-release-process-main
docs: added note for the TAG option
2023-05-05 12:06:49 -05:00
Anthony D'Atri
170f74d2a1
Merge pull request #51348 from jamesorlakin/hotfix/doc-weightset-osd-tree-command
doc: Use `ceph osd crush tree` command to display weight set weights
2023-05-04 16:53:33 -04:00
Samuel Just
19b6c9bfef
Merge pull request #51333 from Matan-B/wip-matanb-c-objclass-compile
crimson/osd/objclass: Fix compilation warning

Reviewed-by: Samuel Just <sjust@redhat.com>
2023-05-04 11:18:54 -07:00
James Lakin
15c3d72a43 doc: Use ceph osd crush tree command to display weight set weights
The previous `ceph osd tree` doesn't show pool-defined weight-sets as the above documentation suggests.

Signed-off-by: James Lakin <james@jameslakin.co.uk>
2023-05-04 18:48:22 +01:00
Casey Bodley
be31d3fb55
Merge pull request #50507 from cbodley/wip-rgw-api-zero
rgw/rest: add 'zero' rest api

Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
2023-05-04 13:09:46 -04:00
Nizamudeen A
55d3f5cfcd
Merge pull request #50183 from rhcs-dashboard/edit-ceph-authx-user
mgr/dashboard: Edit ceph authx users

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
2023-05-04 20:59:45 +05:30
Venky Shankar
e398efcb32 Merge PR #51224 into main
* refs/pull/51224/head:
	doc: add a note for minimum compatible python version and supported distros
	tools/cephfs/top/CMakeList.txt: check the minimum compatible python version for cephfs-top

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2023-05-04 18:23:20 +05:30
Venky Shankar
c6c1f3366c Merge PR #51281 into main
* refs/pull/51281/head:
	dokan: handle std::stoul exceptions

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2023-05-04 18:22:08 +05:30
Soumya Koduri
9d61a2ee45
Merge pull request #50676 from soumyakoduri/wip-skoduri-archive
rgw/archive: Disable logging for archive zone

Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
2023-05-04 16:34:05 +05:30
Soumya Koduri
0df2e7a41a
Merge pull request #51192 from soumyakoduri/wip-skoduri-cloudtier-sync
rgw/cloud-transition: Handle cloud-tiered objects in a multisite environment

Reviewed-by:  Casey Bodley <cbodley@redhat.com>
Reviewed-by:  Matt Benjamin <mbenjamin@redhat.com>
2023-05-04 16:16:37 +05:30
Pedro Gonzalez Gomez
8177a748bd mgr/dashboard: Edit ceph authx users
Signed-off-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>
Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
2023-05-04 11:44:05 +02:00
Aashish Sharma
ae02dd40be
Merge pull request #50643 from rhcs-dashboard/dashboard-edit-zone
mgr/dashboard: add support for editing RGW zone

Reviewed-by: Nizamudeen A <nia@redhat.com>
2023-05-04 11:48:42 +05:30
zdover23
b42a7305ba
Merge pull request #51292 from zdover23/wip-doc-2023-04-30-rados-operations-stretch-mode-limitations
doc/rados: edit stretch-mode.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2023-05-04 11:08:53 +10:00
Yuri Weinstein
50c7d6bee4
Merge pull request #50861 from weixinwei/master
osd: avoid watcher remains after "rados watch" is interrupted

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2023-05-03 15:06:33 -07:00
Adam King
2f3afa76ee
Merge pull request #51226 from jsoref/spelling-orchestrator
orchestrator: Fix spelling

Reviewed-by: Adam King <adking@redhat.com>
2023-05-03 17:31:04 -04:00
Adam King
d748daba2f
Merge pull request #50976 from phlogistonjohn/jjm-issue59270-inbuf
pybind/mgr: improve error when inbuf is given to commands that don't use it

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>
2023-05-03 17:29:10 -04:00
Adam King
d2a8edfba8
Merge pull request #50868 from rhcs-dashboard/update-monitoring-stack
mgr/cephadm: update monitoring stack versions 

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
2023-05-03 17:27:26 -04:00
Adam King
a98b42b4e2
Merge pull request #50613 from adk3798/grafana-anonymous
mgr/cephadm: allow configuring anonymous access for grafana

Reviewed-by: Redouane Kachach <rkachach@redhat.com>
2023-05-03 17:24:20 -04:00
Adam King
9f3d21e020
Merge pull request #47199 from adk3798/osp-nfs-ha
mgr/cephadm: support for nfs backed by VIP

Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Redouane Kachach <rkachach@redhat.com>
2023-05-03 17:18:27 -04:00
Yuri Weinstein
d29d6e964d
Merge pull request #50418 from NitzanMordhai/wip-nitzan-blocklist-addr-valid-command
pybind/argparse: blocklist ip validation


Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2023-05-03 12:32:58 -07:00
Yuri Weinstein
ef3554214a
Merge pull request #50344 from rzarzynski/wip-msg-random-nonces
msg: always generate random nonce; don't try to reuse PID

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>
2023-05-03 12:32:22 -07:00
Yuri Weinstein
83bb1d7380
Merge pull request #49885 from aclamk/wip-aclamk-bs-improve-fragm-score
BlueStore: Improve fragmentation score metric

Reviewed-by: Igor Fedotov <ifedotov@suse.com>
2023-05-03 12:31:15 -07:00
avanthakkar
bcc92adb96 mgr/dashboard: add support for editing RGW zone
Fixes: https://tracker.ceph.com/issues/59328
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Co-authored-by: Aashish Sharma <aasharma@redhat.com>
2023-05-03 20:45:00 +05:30
Matan
0416d783dd
Merge pull request #51312 from Matan-B/wip-matanb-c-message-con
crimson/osd/ops_executer: Fix usage of Message's connection

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Radosław Zarzyński <rzarzyns@redhat.com>
2023-05-03 18:14:35 +03:00
Zac Dover
6c1baffb85 doc/rados: stretch-mode: stretch cluster issues
Edit "Stretch Cluster Issues", which might better be called "Netsplits"
or "Recognizing Netsplits".

Signed-off-by: Zac Dover <zac.dover@proton.me>
2023-05-03 23:23:20 +10:00
Matan Breizman
99a8d19b05 crimson/osd/objclass: Compilation warning
```
In copy constructor ‘ceph::buffer::v15_2_0::list::list(const ceph::buffer::v15_2_0::list&)’,
    inlined from ‘OSDOp::OSDOp(const OSDOp&)’ at ../src/osd/osd_types.h:4081:8,
    inlined from ‘int cls_cxx_snap_revert(cls_method_context_t, snapid_t)’ at ../src/crimson/osd/objclass.cc:279:37:
../src/include/buffer.h:945:20: warning: ‘op.OSDOp::indata.ceph::buffer::v15_2_0::list::_len’ is used uninitialized [-Wuninitialized]
  945 |         _len(other._len),
      |              ~~~~~~^~~~
../src/crimson/osd/objclass.cc: In function ‘int cls_cxx_snap_revert(cls_method_context_t, snapid_t)’:
../src/crimson/osd/objclass.cc:279:9: note: ‘op’ declared here
  279 |   OSDOp op{op = CEPH_OSD_OP_ROLLBACK};
      |
```

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2023-05-03 13:10:58 +00:00