doc/rgw: add docs for rgw-orphan-list and ceph-diff-sorted

Add man pages and documentation for both tools.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
J. Eric Ivancich 2020-03-25 13:39:51 -04:00
parent e396064d9a
commit 9d5e9c3031
8 changed files with 265 additions and 1 deletions

View File

@ -1502,6 +1502,7 @@ exit 0
%{_mandir}/man8/ceph-authtool.8*
%{_mandir}/man8/ceph-conf.8*
%{_mandir}/man8/ceph-dencoder.8*
%{_mandir}/man8/ceph-diff-sorted.8*
%{_mandir}/man8/ceph-rbdnamer.8*
%{_mandir}/man8/ceph-syn.8*
%{_mandir}/man8/ceph-post-file.8*
@ -1514,6 +1515,7 @@ exit 0
%{_mandir}/man8/rbd-replay.8*
%{_mandir}/man8/rbd-replay-many.8*
%{_mandir}/man8/rbd-replay-prep.8*
%{_mandir}/man8/rgw-orphan-list.8*
%dir %{_datadir}/ceph/
%{_datadir}/ceph/known_hosts_drop.ceph.com
%{_datadir}/ceph/id_rsa_drop.ceph.com

View File

@ -6,4 +6,6 @@ usr/bin/radosgw-object-expirer
usr/bin/radosgw-token
usr/bin/rgw-orphan-list
usr/lib/libradosgw.so*
usr/share/man/man8/ceph-diff-sorted.8
usr/share/man/man8/radosgw.8
usr/share/man/man8/rgw-orphan-list.8

View File

@ -49,7 +49,9 @@ endif()
if(WITH_RADOSGW)
list(APPEND man_srcs
radosgw.rst
radosgw-admin.rst)
radosgw-admin.rst
rgw-orphan-list.rst
ceph-diff-sorted.rst)
endif()
if(WITH_RBD)

View File

@ -0,0 +1,71 @@
:orphan:
==========================================================
ceph-diff-sorted -- compare two sorted files line by line
==========================================================
.. program:: ceph-diff-sorted
Synopsis
========
| **ceph-diff-sorted** *file1* *file2*
Description
===========
:program:`ceph-diff-sorted` is a simplified *diff* utility optimized
for comparing two files with lines that are lexically sorted.
The output is simplified in comparison to that of the standard `diff`
tool available in POSIX systems. Angle brackets ('<' and '>') are used
to show lines that appear in one file but not the other. The output is
not compatible with the `patch` tool.
This tool was created in order to perform diffs of large files (e.g.,
containing billions of lines) that the standard `diff` tool cannot
handle efficiently. Knowing that the lines are sorted allows this to
be done efficiently with minimal memory overhead.
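The idea of comparing two sorted inputs in a single pass can be sketched in Python as follows. This is only an illustration of the general merge-style technique, not the tool's actual implementation. The function name ``diff_sorted`` is hypothetical, and the use of '<' for lines found only in the first input and '>' for lines found only in the second is an assumption borrowed from the standard `diff` convention. Lines are compared as raw bytes, which matches the ordering produced by a C-locale sort.

```python
import io


def diff_sorted(file_a, file_b):
    """Yield ("<", line) for lines only in file_a and (">", line) for
    lines only in file_b. Both inputs must be binary file objects whose
    lines are already in ascending byte order (assumed, not checked)."""

    def entries(f):
        # Strip the trailing newline so the final line compares cleanly.
        for raw in f:
            yield raw.rstrip(b"\n")

    iter_a, iter_b = entries(file_a), entries(file_b)
    a, b = next(iter_a, None), next(iter_b, None)

    # Only the current line of each input is held in memory at any time.
    while a is not None or b is not None:
        if b is None or (a is not None and a < b):
            yield ("<", a)          # present only in the first input
            a = next(iter_a, None)
        elif a is None or b < a:
            yield (">", b)          # present only in the second input
            b = next(iter_b, None)
        else:
            # Same line in both inputs: advance both, report nothing.
            a, b = next(iter_a, None), next(iter_b, None)


if __name__ == "__main__":
    left = io.BytesIO(b"apple\nbanana\ncherry\n")
    right = io.BytesIO(b"banana\ncherry\ndate\n")
    for side, line in diff_sorted(left, right):
        print(side, line.decode())
```

Because each input is read strictly forward, memory use stays constant regardless of file size, which is the property the description above relies on.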
The sorting of each file needs to be done lexically. Most POSIX
systems use the *LANG* environment variable to determine the `sort`
tool's sorting order, unless *LC_ALL* or *LC_COLLATE* is set and
overrides it. To sort lexically we would need something such as:
$ LANG=C sort some-file.txt >some-file-sorted.txt
Examples
========
Compare two files::
$ ceph-diff-sorted fileA.txt fileB.txt
Exit Status
===========
When complete, the exit status will be set to one of the following:
0
files same
1
files different
2
usage problem (e.g., wrong number of command-line arguments)
3
problem opening input file
4
bad file content (e.g., unsorted order or empty lines)
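A script that wraps the tool may want to turn these exit statuses into readable messages. The following Python sketch shows one way to do so. The helper names ``describe_exit_status`` and ``compare`` are hypothetical; the status codes and their meanings are taken from the list above. Note that a status of 1 means only that the files differ, so the call must not treat every non-zero status as a failure.

```python
import subprocess

# Meanings copied from the Exit Status section of this man page.
EXIT_MEANINGS = {
    0: "files same",
    1: "files different",
    2: "usage problem",
    3: "problem opening input file",
    4: "bad file content",
}


def describe_exit_status(code):
    """Map an exit status to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit status {code}")


def compare(file1, file2):
    """Run ceph-diff-sorted on two files and return a tuple of
    (status description, captured standard output).

    check=True is deliberately not used, because exit status 1
    ("files different") is a normal outcome rather than an error."""
    result = subprocess.run(
        ["ceph-diff-sorted", file1, file2],
        capture_output=True,
        text=True,
    )
    return describe_exit_status(result.returncode), result.stdout
```

For example, ``compare("fileA.txt", "fileB.txt")`` would return the description for whatever status the tool reported, along with the lines it printed.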
Availability
============
:program:`ceph-diff-sorted` is part of Ceph, a massively scalable,
open-source, distributed storage system. Please refer to the Ceph
documentation at http://ceph.com/docs for more information.
See also
========
:doc:`rgw-orphan-list <rgw-orphan-list>`\(8)

View File

@ -0,0 +1,69 @@
:orphan:
==================================================================
rgw-orphan-list -- list rados objects that are not indexed by rgw
==================================================================
.. program:: rgw-orphan-list
Synopsis
========
| **rgw-orphan-list**
Description
===========
:program:`rgw-orphan-list` is an *EXPERIMENTAL* RADOS gateway user
administration utility. It produces a listing of rados objects that
are not directly or indirectly referenced through the bucket indexes
on a pool. It places the results and intermediate files on the local
filesystem rather than on the ceph cluster itself, and therefore will
not itself consume additional cluster storage.
In theory orphans should not exist. However, because Ceph evolves
rapidly, bugs do crop up, and they may result in orphans that are left
behind.
In its current form this utility does not take any command-line
arguments or options. It will list the available pools and prompt the
user to enter the pool they would like to list orphans for.
Behind the scenes it runs `rados ls` and `radosgw-admin bucket
radoslist ...` and produces a list of those entries that appear in the
former but not the latter. Those entries are presumed to be the
orphans.
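The comparison described above amounts to a set difference between the two listings. A minimal Python sketch of that idea, using small in-memory lists, is shown here. The function name ``presumed_orphans`` and the object names in the example are hypothetical. The utility itself works on files stored on the local filesystem rather than on in-memory sets, so this sketch shows only the logic, not how the tool is implemented.

```python
def presumed_orphans(rados_ls_entries, radoslist_entries):
    """Return, in sorted order, the entries that appear in the full pool
    listing but are not referenced by any bucket index."""
    referenced = set(radoslist_entries)
    return sorted(e for e in set(rados_ls_entries) if e not in referenced)


if __name__ == "__main__":
    # Hypothetical object names, for illustration only.
    pool_listing = ["obj-a", "obj-b", "obj-c", "obj-d"]
    index_listing = ["obj-a", "obj-c"]
    for name in presumed_orphans(pool_listing, index_listing):
        print(name)
```

Anything that the bucket indexes cannot reach ends up in the result, which is why objects in unindexed buckets, or objects in a pool that is not an RGW data pool, are reported as orphans even though they are not.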
Warnings
========
This utility is currently considered *EXPERIMENTAL*.
This utility will produce false orphan entries for unindexed buckets
since such buckets have no bucket indices that can provide the
starting point for tracing.
Options
=======
At present there are no options.
Examples
========
Launch the tool::
$ rgw-orphan-list
Availability
============
:program:`rgw-orphan-list` is part of Ceph, a massively scalable, open-source,
distributed storage system. Please refer to the Ceph documentation at
http://ceph.com/docs for more information.
See also
========
:doc:`radosgw-admin <radosgw-admin>`\(8)
:doc:`ceph-diff-sorted <ceph-diff-sorted>`\(8)

View File

@ -40,4 +40,6 @@
man/8/rbd-replay
man/8/rbd
man/8/rbdmap
man/8/rgw-orphan-list
man/8/ceph-immutable-object-cache
man/8/ceph-diff-sorted

View File

@ -71,6 +71,7 @@ you may write data with one API and retrieve it with the other.
STS Lite <STSLite>
Keycloak <keycloak>
Role <role>
Orphan List and Associated Tooling <orphans>
troubleshooting
Manpage radosgw <../../man/8/radosgw>
Manpage radosgw-admin <../../man/8/radosgw-admin>

doc/radosgw/orphans.rst Normal file
View File

@ -0,0 +1,115 @@
==================================
Orphan List and Associated Tooling
==================================
.. versionadded:: Luminous
.. contents::
Orphans are RADOS objects that are left behind after their associated
RGW objects are removed. Normally these RADOS objects are removed
automatically, either immediately or through a process known as
"garbage collection". Over the history of RGW, however, there may have
been bugs that prevented these RADOS objects from being deleted, and
these RADOS objects may be consuming space on the Ceph cluster without
being of any use. From the perspective of RGW, we call such RADOS
objects "orphans".
Orphans Find -- DEPRECATED
--------------------------
The `radosgw-admin` tool has three subcommands to help manage
orphans; however, these subcommands are (or will soon be)
deprecated. These subcommands are:
::
# radosgw-admin orphans find ...
# radosgw-admin orphans finish ...
# radosgw-admin orphans list-jobs ...
There are two key problems with these subcommands. First,
these subcommands have not been actively maintained and therefore have
not tracked RGW as it has evolved in terms of features and updates. As
a result the confidence that these subcommands can accurately identify
true orphans is presently low.
Second, these subcommands store intermediate results on the cluster
itself. This can be problematic when cluster administrators are
confronting insufficient storage space and want to remove orphans as a
means of addressing the issue. The intermediate results could strain
the existing cluster storage capacity even further.
For these reasons "orphans find" has been deprecated.
Orphan List
-----------
Because "orphans find" has been deprecated, RGW now includes an
additional tool -- 'rgw-orphan-list'. When run it will list the
available pools and prompt the user to enter the name of the data
pool. At that point the tool will, perhaps after an extended period of
time, produce a local file containing the RADOS objects from the
designated pool that appear to be orphans. The administrator is free
to examine this file and then decide on a course of action, perhaps
removing those RADOS objects from the designated pool.
All intermediate results are stored on the local file system rather
than the Ceph cluster. So running the 'rgw-orphan-list' tool should
have no appreciable impact on the amount of cluster storage consumed.
WARNING: Experimental Status
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The 'rgw-orphan-list' tool is new and therefore currently considered
experimental. The list of orphans produced should be "sanity checked"
before being used for a large delete operation.
WARNING: Specifying a Data Pool
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If a pool other than an RGW data pool is specified, the results of the
tool will be erroneous. All RADOS objects found on such a pool will
falsely be designated as orphans.
WARNING: Unindexed Buckets
~~~~~~~~~~~~~~~~~~~~~~~~~~
RGW allows for unindexed buckets, that is, buckets that do not maintain
an index of their contents. This is not a typical configuration, but
it is supported. Because the 'rgw-orphan-list' tool uses the bucket
indices to determine what RADOS objects should exist, objects in the
unindexed buckets will falsely be listed as orphans.
RADOS List
----------
One of the sub-steps in computing a list of orphans is to map each RGW
object into its corresponding set of RADOS objects. This is done using
a subcommand of 'radosgw-admin'.
::
# radosgw-admin bucket radoslist [--bucket={bucket-name}]
The subcommand will produce a list of RADOS objects that support all
of the RGW objects. If a bucket is specified then the subcommand will
only produce a list of RADOS objects that correspond back to the RGW
objects in the specified bucket.
Note: Shared Bucket Markers
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some administrators will be aware of the coding schemes used to name
the RADOS objects that correspond to RGW objects, which include a
"marker" unique to a given bucket.
RADOS objects that correspond with the contents of one RGW bucket,
however, may contain a marker that specifies a different bucket. This
behavior is a consequence of the "shallow copy" optimization used by
RGW. When larger objects are copied from bucket to bucket, only the
"head" objects are actually copied, and the tail objects are
shared. Those shared objects will contain the marker of the original
bucket.
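The effect of shared tail objects on a listing can be illustrated with a short Python sketch that groups RADOS object names by their marker. It assumes, as a simplification that should be checked against the data layout documentation, that the marker is the portion of the name before the first underscore. The helper names and all object names below are hypothetical.

```python
from collections import defaultdict


def marker_of(rados_object_name):
    """Return the text before the first underscore, which this sketch
    ASSUMES to be the bucket marker; return None if there is none."""
    marker, sep, _rest = rados_object_name.partition("_")
    return marker if sep else None


def group_by_marker(rados_object_names):
    """Group RADOS object names by their (assumed) bucket marker."""
    groups = defaultdict(list)
    for name in rados_object_names:
        groups[marker_of(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical listing for one bucket whose own marker is
    # "default.200.2". The tail object still carries the marker of the
    # bucket that the object was originally copied from.
    listing = [
        "default.200.2_copy-of-photo.jpg",
        "default.100.1__shadow_.abc_1",
    ]
    for marker, names in group_by_marker(listing).items():
        print(marker, names)
```

Seeing more than one marker in the listing for a single bucket is therefore expected after a shallow copy and does not by itself indicate a problem.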
.. _Data Layout in RADOS : ../layout
.. _Pool Placement and Storage Classes : ../placement