Commit Graph

137854 Commits

Author SHA1 Message Date
Aishwarya Mathuria
a7c0029ecc qa/tasks: Change default mClock profile to high_recovery_ops
With the new mClock default profile, tests were failing with "Exiting scrub checking -- not all pgs scrubbed" due to slower scrubs.
Changing the default profile to high_recovery_ops for testing purposes will fix this issue.

Fixes: https://tracker.ceph.com/issues/61228
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
2023-05-18 09:32:20 +05:30
Yingxin
785f3ec474
Merge pull request #47749 from xxhdx1985126/wip-intra-fixedkvbtree-pointers-2
crimson/os/seastore/btree: link fixedkvbtree's nodes and logical extents with forward and backward pointers, and drop the pin_set

Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
2023-05-09 16:37:41 +08:00
Yuval Lifshitz
1fd8e52113
Merge pull request #51308 from jzhu116-bloomberg/wip-59592
rgw/notification: remove non x-amz-meta-* attributes from bucket notifications
2023-05-09 10:34:36 +03:00
Xuehan Xu
33b56a04d5 crimson/tools/store_nbd: read logical extents via TransactionManager::read_pin()

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2023-05-09 05:57:45 +00:00
Xuehan Xu
62974a6589 crimson/os/seastore/cache: add comment about backref_extent_entry_t
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2023-05-09 05:57:45 +00:00
Xuehan Xu
3c4f8c7613 test/crimson/seastore: complement lba test with logical extents
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2023-05-09 05:57:45 +00:00
Xuehan Xu
302bc3c2d9 test/crimson/seastore: check intra-fixedkv-btree parent->child trackers during unittests
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2023-05-09 05:57:45 +00:00
Xuehan Xu
4a3dfc0f63 crimson/os/seastore/btree: drop btree_pin_set_t
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2023-05-09 05:57:45 +00:00
Xuehan Xu
89c2d0b3af crimson/os/seastore/transaction_manager: follow leaf<->logical extent pointers to read extent
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2023-05-09 05:57:43 +00:00
Xuehan Xu
cce850d756 crimson/os/seastore/lba_manager: link lba leaf nodes with logical extents by pointers
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2023-05-09 05:56:59 +00:00
Xuehan Xu
55e1924e38 crimson/os/seastore/btree: "templatize" btree leaf node to distinguish leaf nodes with(out) children
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2023-05-09 05:56:59 +00:00
Xuehan Xu
4d9b60e750 crimson/os/seastore/btree: link fixed-kv-btree and root_block with pointers
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2023-05-09 05:56:59 +00:00
Xuehan Xu
25b001db29 crimson/os/seastore: more debug logs
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2023-05-09 05:56:59 +00:00
Xuehan Xu
45440fadd2 crimson/os/seastore/backref_manager: retrieve live backref extents through the backref tree
After introducing intra-fixed-kv-btree parent-child pointers, we need to keep the
invariant that it's only when extents are not in transactions' read_set that
we can directly query cache with inspecting the transaction.

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2023-05-09 05:56:59 +00:00
Xuehan Xu
c29051c4c7 crimson/os/seastore/btree: avoid searching transactions' read_set when retrieving btree nodes
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2023-05-09 05:56:59 +00:00
Xuehan Xu
7c3305f014 crimson/os/seastore/btree: search fixed-kv-btree by parent<->child pointers
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2023-05-09 05:55:53 +00:00
Xuehan Xu
686d120653 crimson/os/seastore/cache: invalidate out-dated extent when initiating Cache
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2023-05-09 05:55:53 +00:00
Xuehan Xu
1b4c591ef5 crimson/os/seastore/cached_extent: improve the representation of "has_been_invalidated"
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2023-05-09 05:55:53 +00:00
Xuehan Xu
a86c7bd651 crimson/os/seastore/btree: don't go to leaf nodes when updating internal mappings
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2023-05-09 05:55:53 +00:00
Xuehan Xu
71051f997f crimson/os/seastore/btree: introduce parent<->child pointers for fixed-kv-btree nodes
maintain correct parent<->child pointers when modifying the btree

Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
2023-05-09 05:55:53 +00:00
Yingxin
ca164fda37
Merge pull request #51355 from aravind-wdc/wip-crimson-zbd
crimson/os/seastore: enable SMR HDD

Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
2023-05-09 11:29:54 +08:00
zdover23
045a4e8484
Merge pull request #51392 from parth-gr/rgw-mutisite-ceph-doc
doc: update multisite doc

Reviewed-by: Jiffin Tony Thottan <jthottan@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Zac Dover <zac.dover@proton.me>
2023-05-09 12:37:40 +10:00
Kamoltat Sirivadhna
78a43309b2
Merge pull request #50857 from kamoltat/wip-ksirivad-iswriteable
mon/Monitor.cc: exit function if !osdmon()->is_writeable()
Reviewed-by: Gregory Farnum <gfarnum@redhat.com>
2023-05-08 21:04:59 -04:00
zdover23
008c644f46
Merge pull request #51394 from rzarzynski/wip-doc-encode-stdoptional
doc/dev/encoding.txt: update per std::optional

Reviewed-by: Zac Dover <zac.dover@proton.me>
2023-05-09 10:53:06 +10:00
Ilya Dryomov
791abfc83f
Merge pull request #51365 from nbalacha/fix-remove-unused-type
librbd: remove unused enum WriteOpType

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2023-05-08 21:24:28 +02:00
Radoslaw Zarzynski
a9d46ad6dd
Merge pull request #49975 from sseshasa/wip-fix-mclk-rec-backfill-cost
osd: mClock recovery/backfill cost fixes

Reviewed-by: Sam Just <sjust@redhat.com>
Reviewed-by: Radosław Zarzyński <rzarzyns@redhat.com>
2023-05-08 20:22:11 +02:00
Matan
612c81d210
Merge pull request #51381 from Matan-B/wip-matanb-c-blocklist-fix
crimson/osd/osd_operations/client_request: Fix client blocklisting

Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
2023-05-08 19:48:28 +03:00
Daniel Gryniewicz
09f857ec3e
Merge pull request #43245 from thiagoarrais/docs-java-examples
[rgw]: Update AWS SDK in Java examples
2023-05-08 11:47:15 -04:00
Radoslaw Zarzynski
622829cebc doc/dev/encoding.txt: update per std::optional
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2023-05-08 14:41:22 +00:00
parth-gr
edab93b2f1 doc: update multisite doc
cmd for getting zone group was spelled incorrectly
Updated to radosgw-admin

Signed-off-by: parth-gr <paarora@redhat.com>
2023-05-08 19:31:41 +05:30
N Balachandran
97c96408ae librbd: remove unused enum type WriteOpType
This removes the unused enum WriteOpType from
the librbd deep_copy code.

Signed-off-by: N Balachandran <nibalach@redhat.com>
2023-05-08 18:54:35 +05:30
zdover23
4040f12347
Merge pull request #51387 from zdover23/wip-doc-2023-05-08-rados-operations-stretch-mode-other-commands
doc/rados: stretch-mode.rst (other commands)

Reviewed-by: Cole Mitchell <cole.mitchell.ceph@gmail.com>
2023-05-08 22:48:30 +10:00
Zac Dover
fde33f1a5b doc/rados: stretch-mode.rst (other commands)
Edit the "Other Commands" section of
doc/rados/operations/stretch-mode.rst.

Signed-off-by: Zac Dover <zac.dover@proton.me>
2023-05-08 21:08:49 +10:00
Sridhar Seshasayee
4c22fcfbe8 qa/: Override mClock profile to 'high_recovery_ops' for qa tests
The qa tests are not client I/O centric and mostly focus on triggering
recovery/backfills and monitoring them for completion within a finite amount
of time. The same holds true for scrub operations.

Therefore, an mClock profile that optimizes background operations is a
better fit for qa related tests. The osd_mclock_profile is therefore
globally overridden to the 'high_recovery_ops' profile for the Rados suite as
it fits the requirement.

Also, many standalone tests expect recovery and scrub operations to
complete within a finite time. To ensure this, the osd_mclock_profile
option is set to 'high_recovery_ops' as part of the run_osd() function
in ceph-helpers.sh.

A subset of standalone tests explicitly used the 'high_recovery_ops' profile.
Since the profile is now set as part of run_osd(), the earlier overrides
are redundant and therefore removed from the tests.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2023-05-08 16:22:00 +05:30
Sridhar Seshasayee
b701fbc01d doc/: Modify mClock configuration documentation to reflect profile changes
Modify the relevant documentation to reflect:

- change in the default mClock profile to 'balanced'
- new allocations for ops across mClock profiles
- change in the osd_max_backfills limit
- miscellaneous changes related to warnings.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2023-05-08 16:22:00 +05:30
Sridhar Seshasayee
2245a84520 common/options/osd.yaml.in: Change mclock max sequential bandwidth for SSDs
The osd_mclock_max_sequential_bandwidth_ssd is changed to 1200 MiB/s as
a reasonable middle ground considering the broad range of SSD capabilities.
This allows mClock's cost model to extract the SSD's capability
depending on the cost of the IO being performed.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2023-05-08 16:22:00 +05:30
Sridhar Seshasayee
da48fff4b3 osd/: Retain the default osd_max_backfills limit of 1 for mClock
The earlier limit of 3 was still aggressive enough to have an impact on
the client and other competing operations. Retain the current default
for mClock. This can be modified if necessary after setting the
osd_mclock_override_recovery_settings option.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2023-05-08 16:22:00 +05:30
Samuel Just
5a649f3c94 common/options/osd.yaml.in: change mclock profile default to balanced
Let's use the middle profile as the default.
Modify the standalone tests accordingly.

Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2023-05-08 16:22:00 +05:30
Samuel Just
e326af9178 osd/scheduler/mClockScheduler: avoid limits for recovery
Now that recovery operations are split between background_recovery and
background_best_effort, rebalance qos params to avoid penalizing
background_recovery while idle.

Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:22:00 +05:30
Samuel Just
97b27cd57d osd/: add counters for ops delayed due to degraded|unreadable target
Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:22:00 +05:30
Samuel Just
e148d34472 osd/: add counters for queue latency for PGRecovery[Context]
Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:22:00 +05:30
Samuel Just
5ebc7d319b osd/: add per-op latency averages for each recovery related message
Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:22:00 +05:30
Samuel Just
a1f0ccf74a osd/: differentiate priority for PGRecovery[Context]
PGs with degraded objects should be higher priority.

Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:22:00 +05:30
Samuel Just
69a17bdb0f osd/: add MSG_OSD_PG_(BACKFILL|BACKFILL_REMOVE|SCAN) as recovery messages
Otherwise, these end up as PGOpItem and therefore as immediate:

class PGOpItem : public PGOpQueueable {
...
  op_scheduler_class get_scheduler_class() const final {
    auto type = op->get_req()->get_type();
    if (type == CEPH_MSG_OSD_OP ||
        type == CEPH_MSG_OSD_BACKOFF) {
      return op_scheduler_class::client;
    } else {
      return op_scheduler_class::immediate;
    }
  }
...
};

This was probably causing a bunch of extra interference with client
ops.

Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:22:00 +05:30
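The routing problem described in commit 69a17bdb0f above can be illustrated with a small, self-contained sketch. This is not the Ceph implementation: the enum names, `is_recovery_msg()` and `classify()` are hypothetical stand-ins chosen for illustration, and the only behavior taken from the commit message is that backfill, backfill-remove and scan messages should be treated as recovery traffic instead of falling through to the immediate class.

```cpp
#include <cassert>

// Hypothetical message types (illustrative only, not Ceph's real constants).
enum class MsgType { OsdOp, OsdBackoff, PgBackfill, PgBackfillRemove, PgScan, Other };

// Hypothetical scheduler classes, named after those mentioned in the log.
enum class SchedClass { Client, BackgroundRecovery, Immediate };

// Assumption: these three message types count as recovery messages.
constexpr bool is_recovery_msg(MsgType t) {
  switch (t) {
  case MsgType::PgBackfill:
  case MsgType::PgBackfillRemove:
  case MsgType::PgScan:
    return true;
  default:
    return false;
  }
}

// Recovery messages are checked first, so they no longer reach the
// catch-all branch that returns the immediate class.
constexpr SchedClass classify(MsgType t) {
  if (is_recovery_msg(t)) {
    return SchedClass::BackgroundRecovery;
  }
  if (t == MsgType::OsdOp || t == MsgType::OsdBackoff) {
    return SchedClass::Client;
  }
  return SchedClass::Immediate;
}

// Small self-check of the sketch.
inline void check_classify() {
  assert(classify(MsgType::PgScan) == SchedClass::BackgroundRecovery);
  assert(classify(MsgType::OsdOp) == SchedClass::Client);
  assert(classify(MsgType::Other) == SchedClass::Immediate);
}
```

Checking the recovery predicate before the client test keeps the catch-all branch for genuinely unclassified messages only.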
Samuel Just
3a87ddd6f1 osd/: differentiate scheduler class for undersized/degraded vs data movement
Recovery operations on pgs/objects that have fewer than the configured
number of copies should be treated more urgently than operations on
pgs/objects that simply need to be moved to a new location.

Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:22:00 +05:30
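The distinction drawn in commit 3a87ddd6f1 above might be sketched as follows. This is a minimal illustration under stated assumptions, not the Ceph code: `PgState`, its fields and `recovery_class()` are hypothetical, and the mapping of under-replicated PGs to background_recovery and of pure data movement to background_best_effort is an assumption based on the two class names mentioned elsewhere in this log.

```cpp
#include <cassert>

// Class names taken from the log; the mapping below is an assumption.
enum class SchedClass { BackgroundRecovery, BackgroundBestEffort };

// Hypothetical summary of a PG's replication state.
struct PgState {
  unsigned live_copies;        // copies currently available
  unsigned configured_copies;  // copies the pool is configured to keep
};

// A PG with fewer copies than configured is treated as more urgent than
// one that only needs its data moved to a new location.
constexpr SchedClass recovery_class(const PgState& pg) {
  return pg.live_copies < pg.configured_copies
             ? SchedClass::BackgroundRecovery
             : SchedClass::BackgroundBestEffort;
}

// Small self-check of the sketch.
inline void check_recovery_class() {
  assert(recovery_class(PgState{2, 3}) == SchedClass::BackgroundRecovery);
  assert(recovery_class(PgState{3, 3}) == SchedClass::BackgroundBestEffort);
}
```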
Samuel Just
1defccb36c osd/.../OpSchedulerItem: add MSG_OSD_PG_PULL to is_recovery_msg
Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:22:00 +05:30
Samuel Just
e75f480a18 osd/: move PGRecoveryMsg check from osd into PGRecoveryMsg::is_recovery_msg
Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:22:00 +05:30
Samuel Just
c20bc71992 osd/: move get_recovery_op_priority into PeeringState next to get_*_priority
Consolidate methods governing recovery scheduling in PeeringState.

Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:22:00 +05:30
Samuel Just
d5d7ab57d5 osd/scheduler: simplify qos specific params in OpSchedulerItem
is_qos_item() was only used in operator<< for OpSchedulerItem.  However,
it's actually useful to see priority for mclock items since it affects
whether it goes into the immediate queues and, for some types, the
class.  Unconditionally display both class_id and priority.

Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:22:00 +05:30
Samuel Just
dc3cc16691 osd/scheduler: remove unused PGOpItem::maybe_get_mosd_op
Signed-off-by: Samuel Just <sjust@redhat.com>
2023-05-08 16:22:00 +05:30