=========================
Cloud Sync Module
=========================
.. versionadded:: Mimic

This sync module syncs zone data to a remote cloud service. The sync is unidirectional;
data is not synced back from the remote zone. The goal of this sync module is to
enable syncing data to multiple cloud providers. Currently, the supported
cloud providers are those that are compatible with AWS (S3).

A user for the remote cloud object store service needs to be configured. Sync operations will
be done under that user. Since different cloud services impose limits on the number
of buckets that each user can create, the source objects and buckets are mapped to
different (configurable) buckets and objects. It is possible to configure different targets
for different buckets and bucket prefixes. Note that source ACLs will not be preserved.
It is possible to map permissions of specific source users to specific destination users.

Due to API limitations, there is no way to preserve the original object modification time and
ETag. The cloud sync module stores these as separate metadata attributes on the destination
objects.

Cloud Sync Tier Type Configuration
-------------------------------------
Trivial Configuration:
~~~~~~~~~~~~~~~~~~~~~~

::

    {
      "connection": {
        "access_key": <access>,
        "secret": <secret>,
        "endpoint": <endpoint>,
        "host_style": <path | virtual>,
      },
      "acls": [ { "type": <id | email | uri>,
                  "source_id": <source_id>,
                  "dest_id": <dest_id> } ... ],
      "target_path": <target_path>,
    }

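As a rough illustration of the trivial form, the sketch below builds such a configuration as a Python dictionary and prints it as strict JSON for inspection. Every value in it (the keys, the endpoint URL, the user IDs, and the target path) is a made-up placeholder, and ``to_json`` is a hypothetical helper, not part of Ceph.

```python
import json

# Hypothetical placeholder values; substitute real credentials and endpoint.
trivial_config = {
    "connection": {
        "access_key": "EXAMPLEACCESSKEY",
        "secret": "EXAMPLESECRET",
        "endpoint": "https://s3.example.com",
        "host_style": "path",  # or "virtual"
    },
    "acls": [
        # Map the source user "alice" to the destination user "cloud-alice".
        {"type": "id", "source_id": "alice", "dest_id": "cloud-alice"},
    ],
    "target_path": "rgwx-${zone}-${sid}/${owner}/${bucket}",
}


def to_json(config):
    """Serialize the configuration as strict JSON (no trailing commas)."""
    return json.dumps(config, indent=2)


print(to_json(trivial_config))
```

The template above uses trailing commas and ``<placeholder>`` markers, so it is a schema outline rather than valid JSON; the dictionary form makes the nesting explicit.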
Non Trivial Configuration:
~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    {
      "default": {
        "connection": {
          "access_key": <access>,
          "secret": <secret>,
          "endpoint": <endpoint>,
          "host_style": <path | virtual>,
        },
        "acls": [
          {
            "type": <id | email | uri>,   # optional, default is id
            "source_id": <id>,
            "dest_id": <id>
          } ... ],
        "target_path": <path>             # optional
      },
      "connections": [
        {
          "connection_id": <id>,
          "access_key": <access>,
          "secret": <secret>,
          "endpoint": <endpoint>,
          "host_style": <path | virtual>, # optional
        } ... ],
      "acl_profiles": [
        {
          "acls_id": <id>,                # acl mappings
          "acls": [ {
            "type": <id | email | uri>,
            "source_id": <id>,
            "dest_id": <id>
          } ... ]
        }
      ],
      "profiles": [
        {
          "source_bucket": <source>,
          "connection_id": <connection_id>,
          "acls_id": <mappings_id>,
          "target_path": <dest>,          # optional
        } ... ],
    }

.. Note:: Trivial configuration can coincide with the non-trivial one.

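In the non-trivial form, each entry in ``profiles`` refers to a connection and an ACL profile by ID. The following sketch is a sanity check one might run on a configuration before applying it: it reports profiles whose ``connection_id`` or ``acls_id`` does not match any defined entry. The function name and the sample IDs are hypothetical. Profiles that omit an ID are skipped, on the assumption (not confirmed here) that they use the ``default`` section.

```python
def find_dangling_references(config):
    """Return messages for profiles that reference undefined IDs."""
    connection_ids = {c["connection_id"] for c in config.get("connections", [])}
    acls_ids = {a["acls_id"] for a in config.get("acl_profiles", [])}

    problems = []
    for profile in config.get("profiles", []):
        bucket = profile.get("source_bucket", "<unknown>")
        conn = profile.get("connection_id")
        acls = profile.get("acls_id")
        # An omitted ID is assumed to mean "use the default section".
        if conn is not None and conn not in connection_ids:
            problems.append(f"{bucket}: unknown connection_id {conn!r}")
        if acls is not None and acls not in acls_ids:
            problems.append(f"{bucket}: unknown acls_id {acls!r}")
    return problems


# Hypothetical sample configuration; all IDs and names are placeholders.
example = {
    "connections": [{"connection_id": "conn-a"}],
    "acl_profiles": [{"acls_id": "acls-a", "acls": []}],
    "profiles": [
        {"source_bucket": "logs-*", "connection_id": "conn-a", "acls_id": "acls-a"},
        {"source_bucket": "images", "connection_id": "conn-missing", "acls_id": "acls-a"},
    ],
}

for message in find_dangling_references(example):
    print(message)
```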
* ``connection`` (container)

Represents a connection to the remote cloud service. Contains ``connection_id``, ``access_key``,
``secret``, ``endpoint``, and ``host_style``.

* ``access_key`` (string)

The remote cloud access key that will be used for a specific connection.

* ``secret`` (string)

The secret key for the remote cloud service.

* ``endpoint`` (string)

URL of the remote cloud service endpoint.

* ``host_style`` (path | virtual)

Type of host style to be used when accessing the remote cloud endpoint (default: ``path``).

* ``acls`` (array)

Contains a list of ``acl_mappings``.

* ``acl_mapping`` (container)

Each ``acl_mapping`` structure contains ``type``, ``source_id``, and ``dest_id``. These
define the ACL mutation that will be applied to each object. An ACL mutation converts a source
user ID to a destination ID.

* ``type`` (id | email | uri)

ACL type: ``id`` defines a user by ID, ``email`` defines a user by email, and ``uri`` defines a user by ``uri`` (group).

* ``source_id`` (string)

ID of the user in the source zone.

* ``dest_id`` (string)

ID of the user in the destination.

* ``target_path`` (string)

A string that defines how the target path is created. The target path specifies a prefix to which
the source object name is appended. The target path can include any of the following
variables:

- ``sid``: unique string that represents the sync instance ID
- ``zonegroup``: the zonegroup name
- ``zonegroup_id``: the zonegroup ID
- ``zone``: the zone name
- ``zone_id``: the zone ID
- ``bucket``: source bucket name
- ``owner``: source bucket owner ID

For example: ``target_path = rgwx-${zone}-${sid}/${owner}/${bucket}``

* ``acl_profiles`` (array)

An array of ``acl_profile``.

* ``acl_profile`` (container)

Each profile contains an ``acls_id`` (string) that identifies the profile, and an ``acls`` array that
holds a list of ``acl_mappings``.

* ``profiles`` (array)

A list of profiles. Each profile contains the following:

- ``source_bucket``: either a bucket name, or a bucket prefix (if it ends with ``*``) that defines the source bucket(s) for this profile
- ``target_path``: as defined above
- ``connection_id``: ID of the connection that will be used for this profile
- ``acls_id``: ID of the ACLs profile that will be used for this profile

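To make the ``target_path`` variables and the ``source_bucket`` prefix rule more concrete, here is a minimal sketch. It uses Python's ``string.Template`` as a stand-in for the substitution, since it happens to share the ``${name}`` syntax; it is not the parser used by the sync module. The function names and all sample values are hypothetical.

```python
from string import Template


def expand_target_path(target_path, variables):
    """Substitute ${name} variables; raises KeyError for an unknown name."""
    return Template(target_path).substitute(variables)


def profile_matches(source_bucket, bucket):
    """A trailing '*' marks a prefix; otherwise the name must match exactly."""
    if source_bucket.endswith("*"):
        return bucket.startswith(source_bucket[:-1])
    return bucket == source_bucket


# Hypothetical values for the variables listed above.
variables = {
    "sid": "1a2b",
    "zonegroup": "us",
    "zonegroup_id": "zg1",
    "zone": "us-east",
    "zone_id": "z1",
    "bucket": "photos",
    "owner": "alice",
}

print(expand_target_path("rgwx-${zone}-${sid}/${owner}/${bucket}", variables))
print(profile_matches("logs-*", "logs-2024"))
```

Under these sample values the example path expands to ``rgwx-us-east-1a2b/alice/photos``. Note that ``string.Template`` also accepts forms such as ``$zone`` without braces, which the real module may not.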
S3 Specific Configurables:
~~~~~~~~~~~~~~~~~~~~~~~~~~
Currently cloud sync will only work with backends that are compatible with AWS S3. There are
a few configurables that can be used to tweak its behavior when accessing these cloud services:

::

    {
      "multipart_sync_threshold": {object_size},
      "multipart_min_part_size": {part_size}
    }

* ``multipart_sync_threshold`` (integer)

Objects this size or larger will be synced to the cloud using multipart upload.

* ``multipart_min_part_size`` (integer)

Minimum part size to use when syncing objects using multipart upload.

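The interaction of the two settings can be sketched as follows. The threshold comparison follows the description above (objects of that size or larger use multipart upload). The part count is only an estimate that assumes every part except possibly the last is exactly ``multipart_min_part_size``; the module may choose larger parts. ``plan_upload`` is a hypothetical helper and the sizes in the usage lines are arbitrary.

```python
def plan_upload(object_size, sync_threshold, min_part_size):
    """Return ("single", 1) or ("multipart", estimated_part_count)."""
    if min_part_size <= 0:
        raise ValueError("min_part_size must be positive")
    if object_size < sync_threshold:
        return ("single", 1)
    # Ceiling division: assumes parts of exactly min_part_size (an assumption).
    parts = max(1, (object_size + min_part_size - 1) // min_part_size)
    return ("multipart", parts)


MIB = 1024 * 1024
print(plan_upload(8 * MIB, 32 * MIB, 32 * MIB))    # below the threshold
print(plan_upload(100 * MIB, 32 * MIB, 32 * MIB))  # at or above the threshold
```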
How to Configure
~~~~~~~~~~~~~~~~

See `Multisite Configuration`_ for multisite configuration instructions. The cloud sync module
requires the creation of a new zone. The zone tier type needs to be defined as ``cloud``:

::

    # radosgw-admin zone create --rgw-zonegroup={zone-group-name} \
                                --rgw-zone={zone-name} \
                                --endpoints={http://fqdn}[,{http://fqdn}] \
                                --tier-type=cloud

The tier configuration can then be done using the following command:

::

    # radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
                                --rgw-zone={zone-name} \
                                --tier-config={key}={val}[,{key}={val}]

The ``key`` in the configuration specifies the config variable that needs to be updated, and
the ``val`` specifies its new value. Nested values can be accessed using a period. For example:

::

    # radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
                                --rgw-zone={zone-name} \
                                --tier-config=connection.access_key={key},connection.secret={secret}

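The period-separated key form can be generated mechanically from a nested structure. The sketch below flattens a nested dictionary into the ``{key}={val}`` pairs shown above and joins them into a single ``--tier-config`` argument. Both helper functions are hypothetical, arrays are not handled, and values containing ``,`` or ``=`` are not escaped, so treat it as an illustration of the key syntax only.

```python
def flatten_tier_config(config, prefix=""):
    """Flatten nested dicts into (dotted_key, value) pairs, in insertion order."""
    pairs = []
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            pairs.extend(flatten_tier_config(value, dotted))
        else:
            pairs.append((dotted, str(value)))
    return pairs


def tier_config_argument(config):
    """Build one --tier-config argument from a nested dictionary."""
    pairs = flatten_tier_config(config)
    return "--tier-config=" + ",".join(f"{k}={v}" for k, v in pairs)


# Placeholder credentials, for illustration only.
print(tier_config_argument({"connection": {"access_key": "AK", "secret": "SK"}}))
```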
Configuration array entries can be accessed by enclosing the index of the entry in square
brackets. A new array entry can be added by using ``[]``. An index value of ``-1`` references
the last entry in the array. At the moment it is not possible to create a new entry and reference it
again in the same command.

For example, creating a new profile for buckets starting with ``{prefix}``:

::

    # radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
                                --rgw-zone={zone-name} \
                                --tier-config=profiles[].source_bucket={prefix}'*'

    # radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
                                --rgw-zone={zone-name} \
                                --tier-config=profiles[-1].connection_id={conn_id},profiles[-1].acls_id={acls_id}

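The addressing rules for arrays can be modelled with a small function that applies one ``{key}={val}`` pair to an in-memory configuration. This is only a sketch of the syntax described above (``[]`` appends, ``[n]`` indexes, ``-1`` is the last entry); it is not the parser used by ``radosgw-admin``, and ``set_value`` is a hypothetical name.

```python
import re

# A segment is a field name, optionally followed by "[]" or "[<integer>]".
_SEGMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)(?:\[((?:-?\d+)?)\])?$")


def set_value(config, key, value):
    """Set `value` at the dotted `key`, creating containers as needed."""
    node = config
    segments = key.split(".")
    for pos, segment in enumerate(segments):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"bad key segment: {segment!r}")
        name, index = match.group(1), match.group(2)
        last = pos == len(segments) - 1

        if index is None:
            # Plain field: assign it, or descend into (or create) an object.
            if last:
                node[name] = value
            else:
                node = node.setdefault(name, {})
            continue

        array = node.setdefault(name, [])
        if index == "":
            # "[]" appends a new entry to the array.
            array.append(value if last else {})
            node = array[-1]
        elif last:
            array[int(index)] = value
        else:
            # "[n]" addresses an existing entry; -1 is the last one.
            node = array[int(index)]
    return config


config = {}
set_value(config, "profiles[].source_bucket", "logs-*")
set_value(config, "profiles[-1].connection_id", "conn-a")
set_value(config, "profiles[-1].acls_id", "acls-a")
print(config)
```

The three calls mirror the two commands above: the first creates the entry, and the later ones address it as the last element.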
An entry can be removed by using ``--tier-config-rm={key}``.

.. _Multisite Configuration: ./multisite