Merge pull request #27250 from ivancich/wip-update-resharding-docs

rgw: updates to resharding documentation

Reviewed-by: Adam Emerson <aemerson@redhat.com>
J. Eric Ivancich 2019-03-28 17:18:58 -04:00 committed by GitHub
commit 0262ed3173

@@ -9,46 +9,49 @@ RGW Dynamic Bucket Index Resharding
A large bucket index can lead to performance problems. In order
to address this problem we introduced bucket index sharding.
Until Luminous, changing the number of bucket shards (resharding)
needed to be done offline, from Luminous we support
needed to be done offline. Starting with Luminous we support
online bucket resharding.
Each bucket index shard can handle its entries efficiently up until
reaching a certain threshold of entries. If this threshold is exceeded the system
could encounter performance issues.
The dynamic resharding feature detects this situation and increases
automatically the number of shards used by the bucket index,
reaching a certain threshold number of entries. If this threshold is exceeded the system
can encounter performance issues.
The dynamic resharding feature detects this situation and automatically
increases the number of shards used by the bucket index,
resulting in the reduction of the number of entries in each bucket index shard.
This process is transparent to the user.
The detection process runs:
1. When new objects are added to the bucket
2. In a background process that periodically scans all the buckets
This is needed in order to deal with existing buckets in the system that are not being updated.
A bucket that requires resharding is added to the ``reshard_log`` queue and will be
1. when new objects are added to the bucket and
2. in a background process that periodically scans all the buckets.
The background process is needed in order to deal with existing buckets in the system that are not being updated.
A bucket that requires resharding is added to the resharding queue and will be
scheduled to be resharded later.
The reshard threads run in the background and execute the scheduled resharding, one at a time.
The reshard threads run in the background and execute the scheduled resharding tasks, one at a time.
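The detection and scheduling flow described above can be pictured with a small Python model. This is only an illustrative sketch, not RGW's implementation: the names ``needs_resharding``, ``ReshardQueue`` and ``scan`` are hypothetical, and the check assumes that entries are spread evenly across shards, so that comparing the total object count against ``num_shards * max_objs_per_shard`` approximates the per-shard threshold.

```python
from collections import deque


def needs_resharding(num_objects, num_shards, max_objs_per_shard=100000):
    """Return True when the average entries per shard exceed the threshold.

    Assumption: objects are evenly distributed over the index shards.
    The default mirrors the documented default of rgw_max_objs_per_shard.
    """
    return num_objects > num_shards * max_objs_per_shard


class ReshardQueue:
    """Hypothetical stand-in for the resharding queue."""

    def __init__(self):
        self._pending = deque()

    def schedule(self, bucket):
        # A bucket is queued at most once while it is waiting.
        if bucket not in self._pending:
            self._pending.append(bucket)

    def process_one(self):
        # Scheduled tasks are executed one at a time, oldest first.
        return self._pending.popleft() if self._pending else None


def scan(buckets, queue):
    """Model of the periodic background scan over all buckets."""
    for name, (num_objects, num_shards) in buckets.items():
        if needs_resharding(num_objects, num_shards):
            queue.schedule(name)


# Example data: bucket name -> (object count, current shard count).
buckets = {"small": (50_000, 1), "large": (2_500_000, 16)}
queue = ReshardQueue()
scan(buckets, queue)
```

In this model only ``large`` ends up on the queue, because 2,500,000 objects exceed 16 shards times 100,000 objects per shard.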
Multisite
=========
Dynamic resharding is not supported in multisite environment.
Dynamic resharding is not supported in a multisite environment.
Configuration
=============
Enable/Disable Dynamic bucket index resharding:
Enable/Disable dynamic bucket index resharding:
-``rgw_dynamic_resharding``: true/false, default: true.
- ``rgw_dynamic_resharding``: true/false, default: true
Parameters to control the resharding process in Ceph configuration fie:
Configuration options that control the resharding process:
-``rgw_reshard_num_logs``: number of shards for the resharding log, default: 16
- ``rgw_reshard_num_logs``: number of shards for the resharding queue, default: 16
-``rgw_reshard_bucket_lock_duration``: duration of lock on bucket obj during resharding, default: 120 seconds.
- ``rgw_reshard_bucket_lock_duration``: duration, in seconds, of lock on bucket obj during resharding, default: 120 seconds
-``rgw_max_objs_per_shard``: maximum number of objects per bucket index shard, default: 100000 objects.
- ``rgw_max_objs_per_shard``: maximum number of objects per bucket index shard before resharding is triggered, default: 100000 objects
-``rgw_reshard_thread_interval``: maximum time between rounds of reshard thread processing, default: 600 seconds
- ``rgw_reshard_thread_interval``: maximum time, in seconds, between rounds of resharding queue processing, default: 600 seconds
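As a rough aid for reasoning about ``rgw_max_objs_per_shard``, the following Python sketch estimates the smallest shard count that keeps every shard at or below the threshold. The helper ``min_shards_for`` is hypothetical, and the ceiling division is an assumption made for illustration; it is not necessarily the formula RGW uses when it picks a new shard count.

```python
def min_shards_for(num_objects, max_objs_per_shard=100000):
    """Smallest shard count with at most max_objs_per_shard objects per shard.

    Assumption: objects are evenly distributed over the shards.
    The default mirrors the documented default of rgw_max_objs_per_shard.
    """
    if max_objs_per_shard <= 0:
        raise ValueError("max_objs_per_shard must be positive")
    # Ceiling division without floating point; never fewer than one shard.
    return max(1, -(-num_objects // max_objs_per_shard))
```

For example, ``min_shards_for(2_500_000)`` evaluates to 25 with the default threshold, while a bucket holding exactly 100,000 objects still fits in a single shard.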
Admin commands
@@ -68,8 +71,8 @@ List resharding queue
# radosgw-admin reshard list
Process/Schedule a bucket resharding
------------------------------------
Process tasks on the resharding queue
-------------------------------------
::
@@ -85,12 +88,12 @@ Bucket resharding status
Cancel pending bucket resharding
--------------------------------
Ongoing bucket resharding operations cannot be cancelled. ::
Note: Ongoing bucket resharding operations cannot be cancelled. ::
# radosgw-admin reshard cancel --bucket <bucket_name>
Manual bucket resharding
------------------------
Manual immediate bucket resharding
----------------------------------
::
@@ -101,20 +104,21 @@ Troubleshooting
===============
Clusters prior to Luminous 12.2.11 and Mimic 13.2.5 left behind stale bucket
instance entries that weren't automatically cleaned up. The issue also affected
LifeCycle policies which weren't applied to resharded buckets anymore. Both of
instance entries, which were not automatically cleaned up. The issue also affected
LifeCycle policies, which were not applied to resharded buckets anymore. Both of
these issues can be worked around using a couple of radosgw-admin commands.
Stale Instance Management
Stale instance management
-------------------------
List the stale instances in a cluster that are ready to be cleaned up.
::
# radosgw-admin reshard stale-instances list
This lists the stale instances in a cluster that are ready to be cleaned up.
Please note that the cleanup of these instances should be done only on a single
site cluster. The cleanup can be done by the following command:
Clean up the stale instances in a cluster. Note: cleanup of these
instances should only be done on a single site cluster.
::
@@ -124,11 +128,13 @@ site cluster. The cleanup can be done by the following command:
Lifecycle fixes
---------------
For clusters which had resharded instances, it is highly likely that the old
lifecycle processes would've flagged and deleted lifecycle processing as the
For clusters that had resharded instances, it is highly likely that the old
lifecycle processes would have flagged and deleted lifecycle processing as the
bucket instance changed during a reshard. While this is fixed for newer clusters
(from 13.2.6 and 12.2.12), older buckets which had lifecycle policies and
would've undergone reshard will have to be manually fixed by issuing the following command
(from Mimic 13.2.6 and Luminous 12.2.12), older buckets that had lifecycle policies and
that have undergone resharding will have to be manually fixed.
The command to do so is:
::
@@ -136,4 +142,4 @@ would've undergone reshard will have to be manually fixed by issuing the followi
As a convenience wrapper, if the ``--bucket`` argument is dropped then this
command will try and fix LC policies for all the buckets in the cluster.
command will try and fix lifecycle policies for all the buckets in the cluster.