diff --git a/ceph.spec.in b/ceph.spec.in
index 6ee83cc3995..8024b99749a 100644
--- a/ceph.spec.in
+++ b/ceph.spec.in
@@ -1502,6 +1502,7 @@ exit 0
 %{_mandir}/man8/ceph-authtool.8*
 %{_mandir}/man8/ceph-conf.8*
 %{_mandir}/man8/ceph-dencoder.8*
+%{_mandir}/man8/ceph-diff-sorted.8*
 %{_mandir}/man8/ceph-rbdnamer.8*
 %{_mandir}/man8/ceph-syn.8*
 %{_mandir}/man8/ceph-post-file.8*
@@ -1514,6 +1515,7 @@ exit 0
 %{_mandir}/man8/rbd-replay.8*
 %{_mandir}/man8/rbd-replay-many.8*
 %{_mandir}/man8/rbd-replay-prep.8*
+%{_mandir}/man8/rgw-orphan-list.8*
 %dir %{_datadir}/ceph/
 %{_datadir}/ceph/known_hosts_drop.ceph.com
 %{_datadir}/ceph/id_rsa_drop.ceph.com
diff --git a/debian/radosgw.install b/debian/radosgw.install
index b2930879adc..1b19d292db4 100644
--- a/debian/radosgw.install
+++ b/debian/radosgw.install
@@ -6,4 +6,6 @@ usr/bin/radosgw-object-expirer
 usr/bin/radosgw-token
 usr/bin/rgw-orphan-list
 usr/lib/libradosgw.so*
+usr/share/man/man8/ceph-diff-sorted.8
 usr/share/man/man8/radosgw.8
+usr/share/man/man8/rgw-orphan-list.8
diff --git a/doc/man/8/CMakeLists.txt b/doc/man/8/CMakeLists.txt
index 02655a8cd6d..96feaeb6fd6 100644
--- a/doc/man/8/CMakeLists.txt
+++ b/doc/man/8/CMakeLists.txt
@@ -49,7 +49,9 @@ endif()
 if(WITH_RADOSGW)
   list(APPEND man_srcs
     radosgw.rst
-    radosgw-admin.rst)
+    radosgw-admin.rst
+    rgw-orphan-list.rst
+    ceph-diff-sorted.rst)
 endif()
 
 if(WITH_RBD)
diff --git a/doc/man/8/ceph-diff-sorted.rst b/doc/man/8/ceph-diff-sorted.rst
new file mode 100644
index 00000000000..99e9583363e
--- /dev/null
+++ b/doc/man/8/ceph-diff-sorted.rst
@@ -0,0 +1,71 @@
+:orphan:
+
+==========================================================
+ ceph-diff-sorted -- compare two sorted files line by line
+==========================================================
+
+.. program:: ceph-diff-sorted
+
+Synopsis
+========
+
+| **ceph-diff-sorted** *file1* *file2*
+
+Description
+===========
+
+:program:`ceph-diff-sorted` is a simplified *diff* utility optimized
+for comparing two files with lines that are lexically sorted.
+
+The output is simplified in comparison to that of the standard `diff`
+tool available in POSIX systems. Angle brackets ('<' and '>') are used
+to show lines that appear in one file but not the other. The output is
+not compatible with the `patch` tool.
+
+This tool was created in order to perform diffs of large files (e.g.,
+containing billions of lines) that the standard `diff` tool cannot
+handle efficiently. Knowing that the lines are sorted allows this to
+be done efficiently with minimal memory overhead.
+
+The sorting of each file needs to be done lexically. Most POSIX
+systems use the *LANG* environment variable to determine the `sort`
+tool's sorting order. To sort lexically we would need something such
+as::
+
+        $ LANG=C sort some-file.txt >some-file-sorted.txt
+
+Examples
+========
+
+Compare two files::
+
+        $ ceph-diff-sorted fileA.txt fileB.txt
+
+Exit Status
+===========
+
+When complete, the exit status will be set to one of the following:
+
+0
+  files same
+1
+  files different
+2
+  usage problem (e.g., wrong number of command-line arguments)
+3
+  problem opening input file
+4
+  bad file content (e.g., unsorted order or empty lines)
+
+
+Availability
+============
+
+:program:`ceph-diff-sorted` is part of Ceph, a massively scalable,
+open-source, distributed storage system. Please refer to the Ceph
+documentation at http://ceph.com/docs for more information.
+
+See also
+========
+
+:doc:`rgw-orphan-list <rgw-orphan-list>`\(8)
diff --git a/doc/man/8/rgw-orphan-list.rst b/doc/man/8/rgw-orphan-list.rst
new file mode 100644
index 00000000000..408242da277
--- /dev/null
+++ b/doc/man/8/rgw-orphan-list.rst
@@ -0,0 +1,69 @@
+:orphan:
+
+==================================================================
+ rgw-orphan-list -- list rados objects that are not indexed by rgw
+==================================================================
+
+.. program:: rgw-orphan-list
+
+Synopsis
+========
+
+| **rgw-orphan-list**
+
+Description
+===========
+
+:program:`rgw-orphan-list` is an *EXPERIMENTAL* RADOS gateway user
+administration utility. It produces a listing of RADOS objects that
+are not directly or indirectly referenced through the bucket indexes
+on a pool. It places the results and intermediate files on the local
+filesystem rather than on the Ceph cluster itself, and therefore will
+not itself consume additional cluster storage.
+
+In theory, orphans should not exist. However, because Ceph evolves
+rapidly, bugs do crop up, and they may result in orphans that are left
+behind.
+
+In its current form this utility does not take any command-line
+arguments or options. It will list the available pools and prompt the
+user to enter the pool they would like to list orphans for.
+
+Behind the scenes it runs `rados ls` and `radosgw-admin bucket
+radoslist ...` and produces a list of those entries that appear in the
+former but not the latter. Those entries are presumed to be the
+orphans.
+
+Warnings
+========
+
+This utility is currently considered *EXPERIMENTAL*.
+
+This utility will produce false orphan entries for unindexed buckets
+since such buckets have no bucket indices that can provide the
+starting point for tracing.
+
+Options
+=======
+
+At present there are no options.
+
+Examples
+========
+
+Launch the tool::
+
+        $ rgw-orphan-list
+
+Availability
+============
+
+:program:`rgw-orphan-list` is part of Ceph, a massively scalable, open-source,
+distributed storage system. Please refer to the Ceph documentation at
+http://ceph.com/docs for more information.
+
+See also
+========
+
+:doc:`radosgw-admin <radosgw-admin>`\(8)
+:doc:`ceph-diff-sorted <ceph-diff-sorted>`\(8)
diff --git a/doc/man_index.rst b/doc/man_index.rst
index 0c54f32fb6c..56c9564dbfd 100644
--- a/doc/man_index.rst
+++ b/doc/man_index.rst
@@ -40,4 +40,6 @@
    man/8/rbd-replay
    man/8/rbd
    man/8/rbdmap
+   man/8/rgw-orphan-list
    man/8/ceph-immutable-object-cache
+   man/8/ceph-diff-sorted
diff --git a/doc/radosgw/index.rst b/doc/radosgw/index.rst
index e6c1ab538ea..bf1392d8451 100644
--- a/doc/radosgw/index.rst
+++ b/doc/radosgw/index.rst
@@ -71,6 +71,7 @@ you may write data with one API and retrieve it with the other.
    STS Lite
    Keycloak
    Role
+   Orphan List and Associated Tooling <orphans>
    troubleshooting
    Manpage radosgw <../../man/8/radosgw>
    Manpage radosgw-admin <../../man/8/radosgw-admin>
diff --git a/doc/radosgw/orphans.rst b/doc/radosgw/orphans.rst
new file mode 100644
index 00000000000..9a77d60de47
--- /dev/null
+++ b/doc/radosgw/orphans.rst
@@ -0,0 +1,115 @@
+==================================
+Orphan List and Associated Tooling
+==================================
+
+.. versionadded:: Luminous
+
+.. contents::
+
+Orphans are RADOS objects that are left behind after their associated
+RGW objects are removed. Normally these RADOS objects are removed
+automatically, either immediately or through a process known as
+"garbage collection". Over the history of RGW, however, there may have
+been bugs that prevented these RADOS objects from being deleted, and
+these RADOS objects may be consuming space on the Ceph cluster without
+being of any use. From the perspective of RGW, we call such RADOS
+objects "orphans".
+
+Orphans Find -- DEPRECATED
+--------------------------
+
+The `radosgw-admin` tool has three subcommands to help manage
+orphans; however, these subcommands are (or will soon be)
+deprecated because of the problems described below. These
+subcommands are::
+
+    # radosgw-admin orphans find ...
+    # radosgw-admin orphans finish ...
+    # radosgw-admin orphans list-jobs ...
+
+There are two key problems with these subcommands, however. First,
+these subcommands have not been actively maintained and therefore have
+not tracked RGW as it has evolved in terms of features and updates. As
+a result, the confidence that these subcommands can accurately identify
+true orphans is presently low.
+
+Second, these subcommands store intermediate results on the cluster
+itself. This can be problematic when cluster administrators are
+confronting insufficient storage space and want to remove orphans as a
+means of addressing the issue. The intermediate results could strain
+the existing cluster storage capacity even further.
+
+For these reasons "orphans find" has been deprecated.
+
+Orphan List
+-----------
+
+Because "orphans find" has been deprecated, RGW now includes an
+additional tool -- 'rgw-orphan-list'. When run, it will list the
+available pools and prompt the user to enter the name of the data
+pool. At that point the tool will, perhaps after an extended period of
+time, produce a local file containing the RADOS objects from the
+designated pool that appear to be orphans. The administrator is free
+to examine this file and then decide on a course of action, perhaps
+removing those RADOS objects from the designated pool.
+
+All intermediate results are stored on the local file system rather
+than the Ceph cluster. So running the 'rgw-orphan-list' tool should
+have no appreciable impact on the amount of cluster storage consumed.
+
+WARNING: Experimental Status
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The 'rgw-orphan-list' tool is new and therefore currently considered
+experimental. The list of orphans produced should be "sanity checked"
+before being used for a large delete operation.
+
+WARNING: Specifying a Data Pool
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If a pool other than an RGW data pool is specified, the results of the
+tool will be erroneous. All RADOS objects found on such a pool will
+falsely be designated as orphans.
+
+WARNING: Unindexed Buckets
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+RGW allows for unindexed buckets, that is, buckets that do not maintain
+an index of their contents. This is not a typical configuration, but
+it is supported. Because the 'rgw-orphan-list' tool uses the bucket
+indices to determine what RADOS objects should exist, objects in the
+unindexed buckets will falsely be listed as orphans.
+
+
+RADOS List
+----------
+
+One of the sub-steps in computing a list of orphans is to map each
+RGW object into its corresponding set of RADOS objects. This
+mapping is performed by the following subcommand of
+'radosgw-admin'::
+
+    # radosgw-admin bucket radoslist [--bucket={bucket-name}]
+
+The subcommand will produce a list of RADOS objects that support all
+of the RGW objects. If a bucket is specified then the subcommand will
+only produce a list of RADOS objects that correspond back to the RGW
+objects in the specified bucket.
+
+Note: Shared Bucket Markers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some administrators will be aware of the coding schemes used to name
+the RADOS objects that correspond to RGW objects, which include a
+"marker" unique to a given bucket.
+
+RADOS objects that correspond with the contents of one RGW bucket,
+however, may contain a marker that specifies a different bucket. This
+behavior is a consequence of the "shallow copy" optimization used by
+RGW. When larger objects are copied from bucket to bucket, only the
+"head" objects are actually copied, and the tail objects are
+shared. Those shared objects will contain the marker of the original
+bucket.
+
+.. _Data Layout in RADOS : ../layout
+.. _Pool Placement and Storage Classes : ../placement
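
Note on the comparison step: the ceph-diff-sorted man page in this patch describes a line-by-line comparison of two lexically sorted files, and rgw-orphan-list is documented as keeping entries that appear in `rados ls` but not in `radosgw-admin bucket radoslist`. A single-pass merge is one way such a comparison can be done with minimal memory. The following is a minimal Python sketch under stated assumptions, not the implementation shipped in this patch: the function name `diff_sorted` and the sample object names are hypothetical, both inputs are assumed to be already sorted in code-point order (what `LANG=C sort` yields for ASCII input), and the validation behind exit status 4 (unsorted input, empty lines) is deliberately left out.

```python
from typing import Iterable, Iterator, Optional, Tuple


def diff_sorted(left: Iterable[str], right: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield ('<', line) for lines only in `left`, ('>', line) for lines only in `right`.

    Hypothetical helper for illustration. Both inputs must already be sorted;
    only one line from each side is held in memory at a time.
    """
    it_left, it_right = iter(left), iter(right)
    a: Optional[str] = next(it_left, None)
    b: Optional[str] = next(it_right, None)

    # Advance whichever side holds the smaller line; equal lines are skipped.
    while a is not None and b is not None:
        if a == b:
            a = next(it_left, None)
            b = next(it_right, None)
        elif a < b:
            yield ("<", a)
            a = next(it_left, None)
        else:
            yield (">", b)
            b = next(it_right, None)

    # Whatever remains on one side has no counterpart on the other.
    while a is not None:
        yield ("<", a)
        a = next(it_left, None)
    while b is not None:
        yield (">", b)
        b = next(it_right, None)


if __name__ == "__main__":
    # Made-up object names: the first list stands in for `rados ls` output,
    # the second for `radosgw-admin bucket radoslist` output.
    in_pool = ["obj1", "obj2", "obj3"]
    indexed = ["obj1", "obj3"]
    for marker, line in diff_sorted(in_pool, indexed):
        print(marker, line)
```

In this sketch, lines tagged '<' play the role of orphan candidates: they are present in the pool listing but not reachable through the bucket indexes. Because the function accepts any iterable, open file objects can be passed directly so that large listings are streamed rather than loaded whole.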