.. _rgw_dynamic_bucket_index_resharding:

===================================
RGW Dynamic Bucket Index Resharding
===================================

.. versionadded:: Luminous

A large bucket index can lead to performance problems. Bucket index sharding
was introduced to address this problem. Until Luminous, changing the number of
bucket index shards (resharding) had to be done offline. Starting with
Luminous, online bucket resharding is supported.

Each bucket index shard can handle its entries efficiently up to a certain
threshold of entries. If this threshold is exceeded, the system could encounter
performance issues. The dynamic resharding feature detects this situation and
automatically increases the number of shards used by the bucket index, which
reduces the number of entries in each bucket index shard. This process is
transparent to the user.

The detection process runs:

1. When new objects are added to the bucket.
2. In a background process that periodically scans all the buckets. This is
   needed in order to deal with existing buckets in the system that are not
   being updated.

A bucket that requires resharding is added to the ``reshard_log`` queue and is
scheduled to be resharded later. The reshard threads run in the background and
execute the scheduled resharding, one at a time.

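The threshold check described above can be illustrated with a little
arithmetic. The following is only a sketch, not the RGW implementation: the
function names are hypothetical, entries are assumed to be spread evenly across
shards, and the value 100000 is the documented default of
``rgw_max_objs_per_shard``. How RGW actually chooses the new shard count is not
described here, so ``min_shards`` only gives a lower bound.

.. code-block:: python

   def needs_reshard(num_objects, num_shards, max_objs_per_shard=100000):
       """Return True if the average number of entries per shard exceeds the threshold."""
       return num_objects > num_shards * max_objs_per_shard


   def min_shards(num_objects, max_objs_per_shard=100000):
       """Smallest shard count that keeps each shard at or below the threshold.

       Assumes an even distribution of entries; uses integer ceiling division.
       """
       return max(1, (num_objects + max_objs_per_shard - 1) // max_objs_per_shard)


   # A bucket with 16 shards and 2,000,000 objects is over the default
   # threshold (16 * 100000 = 1,600,000) and needs at least 20 shards.
   print(needs_reshard(2_000_000, 16))
   print(min_shards(2_000_000))
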
Multisite
=========

Dynamic resharding is not supported in a multisite environment.

Configuration
=============

Enable or disable dynamic bucket index resharding:

- ``rgw_dynamic_resharding``: true/false, default: true.

Parameters that control the resharding process in the Ceph configuration file:

- ``rgw_reshard_num_logs``: number of shards for the resharding log, default: 16.

- ``rgw_reshard_bucket_lock_duration``: duration of the lock on the bucket object during resharding, default: 120 seconds.

- ``rgw_max_objs_per_shard``: maximum number of objects per bucket index shard, default: 100000 objects.

- ``rgw_reshard_thread_interval``: maximum time between rounds of reshard thread processing, default: 600 seconds.

Admin commands
==============

Add a bucket to the resharding queue
------------------------------------

::

   # radosgw-admin reshard add --bucket <bucket_name> --num-shards <new number of shards>

List resharding queue
---------------------

::

   # radosgw-admin reshard list

Process/Schedule a bucket resharding
------------------------------------

::

   # radosgw-admin reshard process

Bucket resharding status
------------------------

::

   # radosgw-admin reshard status --bucket <bucket_name>

Cancel pending bucket resharding
--------------------------------

Ongoing bucket resharding operations cannot be cancelled. ::

   # radosgw-admin reshard cancel --bucket <bucket_name>

Manual bucket resharding
------------------------

::

   # radosgw-admin bucket reshard --bucket <bucket_name> --num-shards <new number of shards>

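When resharding is driven from a script, the two commands that take a shard
count can be assembled programmatically. The helper below is a minimal sketch:
``reshard_command`` and ``run_reshard`` are hypothetical names, and only the
subcommands and flags documented above are used. Building the command as an
argument list avoids shell quoting problems with unusual bucket names.

.. code-block:: python

   import subprocess


   def reshard_command(bucket, num_shards, queued=False):
       """Build the radosgw-admin argument list for resharding one bucket.

       queued=False uses ``bucket reshard`` (manual resharding);
       queued=True uses ``reshard add`` (adds the bucket to the resharding queue).
       """
       if num_shards < 1:
           raise ValueError("num_shards must be a positive integer")
       subcommand = ["reshard", "add"] if queued else ["bucket", "reshard"]
       return ["radosgw-admin", *subcommand,
               "--bucket", bucket, "--num-shards", str(num_shards)]


   def run_reshard(bucket, num_shards, queued=False):
       """Run the command; raises CalledProcessError on a non-zero exit status."""
       subprocess.run(reshard_command(bucket, num_shards, queued), check=True)

For example, ``run_reshard("mybucket", 32, queued=True)`` would queue the
hypothetical bucket ``mybucket`` for resharding to 32 shards.
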
Troubleshooting
===============

Clusters running releases prior to Luminous 12.2.11 and Mimic 13.2.5 left
behind stale bucket instance entries, which were not cleaned up automatically.
The issue also affected lifecycle policies, which were no longer applied to
resharded buckets. Both issues can be worked around with a couple of
``radosgw-admin`` commands.

Stale Instance Management
-------------------------

::

   # radosgw-admin reshard stale-instances list

This lists the stale instances in a cluster that are ready to be cleaned up.
Note that these instances should be cleaned up only on a single-site cluster.
The cleanup can be done with the following command:

::

   # radosgw-admin reshard stale-instances rm

Lifecycle fixes
---------------

For clusters that had resharded bucket instances, it is highly likely that the
old lifecycle processes flagged and deleted lifecycle processing because the
bucket instance changed during a reshard. This is fixed for newer clusters
(from 13.2.6 and 12.2.12), but older buckets that had lifecycle policies and
underwent a reshard have to be fixed manually by issuing the following
command:

::

   # radosgw-admin lc reshard fix --bucket {bucketname}

As a convenience, if the ``--bucket`` argument is omitted, this command tries
to fix lifecycle policies for all the buckets in the cluster.

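The optional ``--bucket`` argument can be captured in a small helper when the
fix is scripted. This is a sketch under the assumption that the command is
invoked exactly as documented above; ``lc_reshard_fix_command`` is a
hypothetical name.

.. code-block:: python

   def lc_reshard_fix_command(bucket=None):
       """Build the argument list for ``radosgw-admin lc reshard fix``.

       With bucket=None the ``--bucket`` argument is left out, so the command
       targets all buckets in the cluster.
       """
       command = ["radosgw-admin", "lc", "reshard", "fix"]
       if bucket is not None:
           command += ["--bucket", bucket]
       return command

The resulting list can be passed to ``subprocess.run(..., check=True)`` on a
host where ``radosgw-admin`` is available.
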