
.. _rgw_dynamic_bucket_index_resharding:

===================================
RGW Dynamic Bucket Index Resharding
===================================

.. versionadded:: Luminous

A large bucket index can lead to performance problems. Bucket index sharding
was introduced to address this. Before Luminous, changing the number of bucket
index shards (resharding) had to be done offline. Starting with Luminous,
online bucket resharding is supported.

Each bucket index shard handles its entries efficiently until the number of
entries reaches a certain threshold. Once this threshold is exceeded, the
system can encounter performance issues.
The dynamic resharding feature detects this situation and automatically
increases the number of shards used by the bucket index, which reduces the
number of entries in each bucket index shard.
This process is transparent to the user.
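
As a rough illustration of the threshold check, the sketch below computes the
smallest shard count that keeps every shard at or below a per-shard limit. It
assumes objects are spread evenly across shards. The function names are
hypothetical and this is not the RGW implementation, which may pick a
different target shard count.

```python
# Default of rgw_max_objs_per_shard (see the Configuration section).
MAX_OBJS_PER_SHARD = 100000


def needs_resharding(num_objects, current_shards,
                     max_objs_per_shard=MAX_OBJS_PER_SHARD):
    """Return True if the shards cannot hold num_objects within the limit.

    Assumption: objects are distributed evenly over the shards.
    """
    return num_objects > current_shards * max_objs_per_shard


def min_shards_needed(num_objects, max_objs_per_shard=MAX_OBJS_PER_SHARD):
    """Smallest shard count that keeps each shard at or under the limit."""
    if max_objs_per_shard <= 0:
        raise ValueError("max_objs_per_shard must be positive")
    # Integer ceiling division; a bucket always has at least one shard.
    return max(1, (num_objects + max_objs_per_shard - 1) // max_objs_per_shard)
```

With the default limit, a bucket holding 1,600,001 objects in 16 shards is
over the threshold under this model and would need at least 17 shards.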

The detection process runs:

1. When new objects are added to the bucket.
2. In a background process that periodically scans all the buckets. The
   background process is needed to deal with existing buckets in the system
   that are not being updated.

A bucket that requires resharding is added to the ``reshard_log`` queue and is
scheduled to be resharded later.
The reshard threads run in the background and execute the scheduled resharding
operations one at a time.
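
The flow described above can be modelled in a few lines. This is a toy,
in-memory sketch: the class and method names are hypothetical and do not
correspond to RGW internals, and the oldest-first ordering is an assumption.
It only illustrates that detection queues a bucket and that queued buckets
are processed one at a time.

```python
from collections import deque


class ReshardQueue:
    """Toy stand-in for the reshard_log queue (not the RGW data structure)."""

    def __init__(self, max_objs_per_shard=100000):
        self.max_objs_per_shard = max_objs_per_shard
        self._queue = deque()   # buckets waiting to be resharded
        self._queued = set()    # avoids queueing the same bucket twice

    def check_bucket(self, name, num_objects, num_shards):
        """Detection step: queue the bucket if its shards are over the limit."""
        over_limit = num_objects > num_shards * self.max_objs_per_shard
        if over_limit and name not in self._queued:
            self._queue.append(name)
            self._queued.add(name)
        return over_limit

    def process_one(self):
        """Reshard-thread step: take a single scheduled bucket, oldest first."""
        if not self._queue:
            return None
        name = self._queue.popleft()
        self._queued.discard(name)
        return name
```

In this model, ``check_bucket`` would be called both when objects are added
and from the periodic scan, while ``process_one`` plays the role of one round
of a reshard thread.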

Multisite
=========
Dynamic resharding is not supported in a multisite environment.


Configuration
=============

Enable or disable dynamic bucket index resharding:

- ``rgw_dynamic_resharding``: true/false, default: true.

Parameters that control the resharding process in the Ceph configuration file:

- ``rgw_reshard_num_logs``: number of shards for the resharding log, default: 16.

- ``rgw_reshard_bucket_lock_duration``: duration of the lock on the bucket object during resharding, default: 120 seconds.

- ``rgw_max_objs_per_shard``: maximum number of objects per bucket index shard, default: 100000 objects.

- ``rgw_reshard_thread_interval``: maximum time between rounds of reshard thread processing, default: 600 seconds.
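
To show how these options might look in a configuration file, the sketch
below parses an INI-style fragment with Python's ``configparser`` and falls
back to the defaults listed above. The section name
``client.rgw.gateway-node1`` is a placeholder and ``load_reshard_settings`` is
a hypothetical helper. Ceph's own parser may accept syntax that
``configparser`` does not, so treat this as an illustration only.

```python
import configparser

# Example fragment; the section name is a placeholder for an RGW instance.
SAMPLE_CONF = """
[client.rgw.gateway-node1]
rgw_dynamic_resharding = true
rgw_reshard_num_logs = 16
rgw_reshard_bucket_lock_duration = 120
rgw_max_objs_per_shard = 100000
rgw_reshard_thread_interval = 600
"""


def load_reshard_settings(conf_text, section):
    """Read the resharding options from one section, using documented defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    sec = parser[section]  # raises KeyError if the section is missing
    return {
        "rgw_dynamic_resharding":
            sec.getboolean("rgw_dynamic_resharding", fallback=True),
        "rgw_reshard_num_logs":
            sec.getint("rgw_reshard_num_logs", fallback=16),
        "rgw_reshard_bucket_lock_duration":
            sec.getint("rgw_reshard_bucket_lock_duration", fallback=120),
        "rgw_max_objs_per_shard":
            sec.getint("rgw_max_objs_per_shard", fallback=100000),
        "rgw_reshard_thread_interval":
            sec.getint("rgw_reshard_thread_interval", fallback=600),
    }
```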


Admin commands
==============

Add a bucket to the resharding queue
------------------------------------

::

   # radosgw-admin reshard add --bucket <bucket_name> --num-shards <new number of shards>

List resharding queue
---------------------

::

   # radosgw-admin reshard list

Process/Schedule a bucket resharding
------------------------------------

::

   # radosgw-admin reshard process

Bucket resharding status
------------------------

::

   # radosgw-admin reshard status --bucket <bucket_name>

Cancel pending bucket resharding
--------------------------------

Ongoing bucket resharding operations cannot be cancelled. ::

   # radosgw-admin reshard cancel --bucket <bucket_name>

Manual bucket resharding
------------------------

::

   # radosgw-admin bucket reshard --bucket <bucket_name> --num-shards <new number of shards>
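
For scripted use, the ``reshard`` subcommands above can be assembled
programmatically. The sketch below only builds argument lists from the
subcommands and flags shown in this section. The helper names are
hypothetical, and actually running a command assumes ``radosgw-admin`` is
installed on the host with suitable admin credentials.

```python
import subprocess

# Subcommands of `radosgw-admin reshard` shown in this section.
_ACTIONS = ("add", "list", "process", "status", "cancel")
_NEEDS_BUCKET = ("add", "status", "cancel")


def reshard_argv(action, bucket=None, num_shards=None):
    """Build the argument list for `radosgw-admin reshard <action>`."""
    if action not in _ACTIONS:
        raise ValueError("unknown reshard action: %r" % (action,))
    argv = ["radosgw-admin", "reshard", action]
    if action in _NEEDS_BUCKET:
        if not bucket:
            raise ValueError("%s requires a bucket name" % action)
        argv += ["--bucket", bucket]
    if action == "add":
        if num_shards is None or num_shards < 1:
            raise ValueError("add requires a positive shard count")
        argv += ["--num-shards", str(num_shards)]
    return argv


def run(argv):
    """Run the command and return its standard output (needs radosgw-admin)."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, ``reshard_argv("add", "mybucket", 32)`` yields the arguments for
queueing ``mybucket`` with 32 shards. Passing the list to ``run`` executes
it without going through a shell, which avoids quoting problems with bucket
names.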


Troubleshooting
===============

Clusters running releases prior to Luminous 12.2.11 and Mimic 13.2.5 left
behind stale bucket instance entries that were not automatically cleaned up.
The issue also affected lifecycle policies, which were no longer applied to
resharded buckets. Both issues can be worked around with a couple of
``radosgw-admin`` commands.

Stale Instance Management
-------------------------

::

   # radosgw-admin reshard stale-instances list

This command lists the stale instances in a cluster that are ready to be
cleaned up. Note that these instances should be cleaned up only on a
single-site cluster. The cleanup can be done with the following command:

::

   # radosgw-admin reshard stale-instances rm


Lifecycle fixes
---------------

In clusters that had resharded bucket instances, it is highly likely that the
old lifecycle processes flagged and deleted lifecycle processing for those
buckets, because the bucket instance changed during a reshard. This is fixed
in newer releases (from 13.2.6 and 12.2.12), but older buckets that had
lifecycle policies and were resharded have to be fixed manually by issuing the
following command:

::

   # radosgw-admin lc reshard fix --bucket <bucket_name>


As a convenience, if the ``--bucket`` argument is omitted, this command tries
to fix lifecycle policies for all the buckets in the cluster.