
==============
 Architecture
==============

:term:`Ceph` uniquely delivers **object, block, and file storage** in one
unified system. Ceph is highly reliable, easy to manage, and free. The power of
Ceph can transform your company's IT infrastructure and your ability to manage
vast amounts of data. Ceph delivers extraordinary scalability: thousands of
clients accessing petabytes to exabytes of data. A :term:`Ceph Node` leverages
commodity hardware and intelligent daemons, and a :term:`Ceph Storage Cluster`
accommodates large numbers of nodes, which communicate with each other to
replicate and redistribute data dynamically.

.. image:: images/stack.png

.. _arch-ceph-storage-cluster:

The Ceph Storage Cluster
========================

Ceph provides an infinitely scalable :term:`Ceph Storage Cluster` based upon
:abbr:`RADOS (Reliable Autonomic Distributed Object Store)`, a reliable,
distributed storage service that uses the intelligence in each of its nodes to
secure the data it stores and to provide that data to :term:`client`\s. See
Sage Weil's "`The RADOS Object Store
<https://ceph.io/en/news/blog/2009/the-rados-distributed-object-store/>`_" blog
post for a brief explanation of RADOS and see `RADOS - A Scalable, Reliable
Storage Service for Petabyte-scale Storage Clusters`_ for an exhaustive
explanation of :term:`RADOS`.

A Ceph Storage Cluster consists of multiple types of daemons:

- :term:`Ceph Monitor`
- :term:`Ceph OSD Daemon`
- :term:`Ceph Manager`
- :term:`Ceph Metadata Server`

.. _arch_monitor:

Ceph Monitors maintain the master copy of the cluster map, which they provide
to Ceph clients. The existence of multiple monitors in the Ceph cluster ensures
availability if one of the monitor daemons or its host fails.

A Ceph OSD Daemon checks its own state and the state of other OSDs and reports
back to monitors.

A Ceph Manager serves as an endpoint for monitoring, orchestration, and plug-in
modules.

A Ceph Metadata Server (MDS) manages file metadata when CephFS is used to
provide file services.

Storage cluster clients and :term:`Ceph OSD Daemon`\s use the CRUSH algorithm
to compute information about the location of data.  By using the CRUSH
algorithm, clients and OSDs avoid being bottlenecked by a central lookup table.
Ceph's high-level features include a native interface to the Ceph Storage
Cluster via ``librados`` and a number of service interfaces built on top of
``librados``.

Storing Data
------------

The Ceph Storage Cluster receives data from :term:`Ceph Client`\s--whether it
comes through a :term:`Ceph Block Device`, :term:`Ceph Object Storage`, the
:term:`Ceph File System`, or a custom implementation that you create by using
``librados``. The data received by the Ceph Storage Cluster is stored as RADOS
objects. Each object is stored on an :term:`Object Storage Device` (this is
also called an "OSD"). Ceph OSDs control read, write, and replication
operations on storage drives. The default BlueStore back end stores objects
in a monolithic, database-like fashion.

.. ditaa::

           /------\       +-----+       +-----+
           | obj  |------>| {d} |------>| {s} |
           \------/       +-----+       +-----+

            Object         OSD          Drive

Ceph OSD Daemons store data as objects in a flat namespace. This means that
objects are not stored in a hierarchy of directories. An object has an
identifier, binary data, and metadata consisting of name/value pairs.
:term:`Ceph Client`\s determine the semantics of the object data. For example,
CephFS uses metadata to store file attributes such as the file owner, the
created date, and the last modified date.


.. ditaa::

           /------+------------------------------+----------------\
           | ID   | Binary Data                  | Metadata       |
           +------+------------------------------+----------------+
           | 1234 | 0101010101010100110101010010 | name1 = value1 |
           |      | 0101100001010100110101010010 | name2 = value2 |
           |      | 0101100001010100110101010010 | nameN = valueN |
           \------+------------------------------+----------------/

.. note:: An object ID is unique across the entire cluster, not just the local
   filesystem.
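
As a rough illustration of the object model described above, the following
Python sketch represents an object as an identifier, binary data, and
name/value metadata, all held in a flat namespace. The names
``RadosLikeObject`` and ``FlatStore`` are hypothetical and were chosen for this
example only; they are not part of ``librados``.

.. code-block:: python

   from dataclasses import dataclass, field


   @dataclass
   class RadosLikeObject:
       """Hypothetical stand-in for a RADOS object (not a librados type)."""
       object_id: str
       data: bytes = b""
       metadata: dict = field(default_factory=dict)


   class FlatStore:
       """A flat namespace: objects are keyed by ID only, with no directories."""

       def __init__(self):
           self._objects = {}

       def put(self, obj):
           self._objects[obj.object_id] = obj

       def get(self, object_id):
           return self._objects[object_id]


   store = FlatStore()
   store.put(RadosLikeObject("1234", b"\x01\x02", {"owner": "alice"}))
   # A slash in an ID is just another character; it does not create a directory.
   store.put(RadosLikeObject("dir/1234", b"\x03"))

The interpretation of ``data`` and ``metadata`` is left to the caller, which
mirrors the statement above that clients determine the semantics of object
data.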


.. index:: architecture; high availability, scalability

.. _arch_scalability_and_high_availability:

Scalability and High Availability
---------------------------------

In traditional architectures, clients talk to a centralized component. This
centralized component might be a gateway, a broker, an API, or a facade. A
centralized component of this kind acts as a single point of entry to a complex
subsystem. Architectures that rely upon such a centralized component have a
single point of failure and incur limits to performance and scalability. If
the centralized component goes down, the whole system becomes unavailable.

Ceph eliminates this centralized component. This enables clients to interact
with Ceph OSDs directly. Ceph OSDs create object replicas on other Ceph Nodes
to ensure data safety and high availability. Ceph also uses a cluster of
monitors to ensure high availability. To eliminate centralization, Ceph uses an
algorithm called :abbr:`CRUSH (Controlled Replication Under Scalable Hashing)`.


.. index:: CRUSH; architecture

CRUSH Introduction
~~~~~~~~~~~~~~~~~~

Ceph Clients and Ceph OSD Daemons both use the :abbr:`CRUSH (Controlled
Replication Under Scalable Hashing)` algorithm to compute information about
object location instead of relying upon a central lookup table. CRUSH provides
a better data management mechanism than do older approaches, and CRUSH enables
massive scale by distributing the work to all the OSD daemons in the cluster
and all the clients that communicate with them. CRUSH uses intelligent data
replication to ensure resiliency, which is better suited to hyper-scale
storage. The following sections provide additional details on how CRUSH works.
For an in-depth, academic discussion of CRUSH, see `CRUSH - Controlled,
Scalable, Decentralized Placement of Replicated Data`_.

.. index:: architecture; cluster map

.. _architecture_cluster_map:

Cluster Map
~~~~~~~~~~~

In order for a Ceph cluster to function properly, Ceph Clients and Ceph OSDs
must have current information about the cluster's topology. Current information
is stored in the "Cluster Map", which is in fact a collection of five maps. The
five maps that constitute the cluster map are:

#. **The Monitor Map:** Contains the cluster ``fsid``, the position, the name,
   the address, and the TCP port of each monitor. The monitor map specifies the
   current epoch, the time of the monitor map's creation, and the time of the
   monitor map's last modification.  To view a monitor map, run ``ceph mon
   dump``.

#. **The OSD Map:** Contains the cluster ``fsid``, the time of the OSD map's
   creation, the time of the OSD map's last modification, a list of pools, a
   list of replica sizes, a list of PG numbers, and a list of OSDs and their
   statuses (for example, ``up``, ``in``). To view an OSD map, run ``ceph
   osd dump``.

#. **The PG Map:** Contains the PG version, its time stamp, the last OSD map
   epoch, the full ratios, and the details of each placement group. This
   includes the PG ID, the `Up Set`, the `Acting Set`, the state of the PG (for
   example, ``active + clean``), and data usage statistics for each pool.

#. **The CRUSH Map:** Contains a list of storage devices, the failure domain
   hierarchy (for example, ``device``, ``host``, ``rack``, ``row``, ``room``),
   and rules for traversing the hierarchy when storing data. To view a CRUSH
   map, run ``ceph osd getcrushmap -o {filename}`` and then decompile it by
   running ``crushtool -d {comp-crushmap-filename} -o
   {decomp-crushmap-filename}``. Use a text editor or ``cat`` to view the
   decompiled map.

#. **The MDS Map:** Contains the current MDS map epoch, when the map was
   created, and the last time it changed. It also contains the pool for
   storing metadata, a list of metadata servers, and which metadata servers
   are ``up`` and ``in``. To view an MDS map, execute ``ceph fs dump``.

Each map maintains a history of changes to its operating state. Ceph Monitors
maintain a master copy of the cluster map. This master copy includes the
cluster members, the state of the cluster, changes to the cluster, and
information recording the overall health of the Ceph Storage Cluster.

.. index:: high availability; monitor architecture

High Availability Monitors
~~~~~~~~~~~~~~~~~~~~~~~~~~

A Ceph Client must contact a Ceph Monitor and obtain a current copy of the
cluster map in order to read data from or to write data to the Ceph cluster.

It is possible for a Ceph cluster to function properly with only a single
monitor, but a Ceph cluster that has only a single monitor has a single point
of failure: if the monitor goes down, Ceph clients will be unable to read data
from or write data to the cluster.

Ceph leverages a cluster of monitors in order to increase reliability and fault
tolerance. When a cluster of monitors is used, however, one or more of the
monitors in the cluster can fall behind due to latency or other faults. Ceph
mitigates these negative effects by requiring multiple monitor instances to
agree about the state of the cluster. To establish consensus among the monitors
regarding the state of the cluster, Ceph uses the `Paxos`_ algorithm and a
majority of monitors (for example, one in a cluster that contains only one
monitor, two in a cluster that contains three monitors, three in a cluster that
contains five monitors, four in a cluster that contains six monitors, and so
on).
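
The majority rule in the examples above can be expressed as a one-line
calculation. The following Python sketch is illustrative only; the function
names are hypothetical and do not correspond to any Ceph API.

.. code-block:: python

   def quorum_size(monitor_count):
       """Smallest number of monitors that forms a strict majority."""
       if monitor_count < 1:
           raise ValueError("a cluster needs at least one monitor")
       return monitor_count // 2 + 1


   def tolerated_failures(monitor_count):
       """Monitors that can fail while a majority can still be formed."""
       return monitor_count - quorum_size(monitor_count)


   for n in (1, 3, 5, 6):
       print(n, quorum_size(n), tolerated_failures(n))

By this arithmetic, a cluster of six monitors tolerates the same number of
monitor failures (two) as a cluster of five.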

See the `Monitor Config Reference`_ for more detail on configuring monitors.

.. index:: architecture; high availability authentication

.. _arch_high_availability_authentication:

High Availability Authentication
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``cephx`` authentication system is used by Ceph to authenticate users and
daemons and to protect against man-in-the-middle attacks.

.. note:: The ``cephx`` protocol does not address data encryption in transport
   (for example, SSL/TLS) or encryption at rest.

``cephx`` uses shared secret keys for authentication. This means that both the
client and the monitor cluster keep a copy of the client's secret key.

The ``cephx`` protocol makes it possible for each party to prove to the other
that it has a copy of the key without revealing it. This provides mutual
authentication and allows the cluster to confirm (1) that the user has the
secret key and (2) that the user can be confident that the cluster has a copy
of the secret key.
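
The idea of proving possession of a shared key without revealing it can be
illustrated with a generic challenge-response exchange. The following Python
sketch uses HMAC-SHA256 from the standard library. It is an illustrative
stand-in and not the actual ``cephx`` wire protocol, and all function names in
it are hypothetical.

.. code-block:: python

   import hashlib
   import hmac
   import secrets


   def make_challenge():
       """Return a fresh random nonce for the other party to answer."""
       return secrets.token_bytes(16)


   def prove(shared_key, challenge):
       """Answer a challenge with a keyed digest; the key itself is never sent."""
       return hmac.new(shared_key, challenge, hashlib.sha256).digest()


   def verify(shared_key, challenge, response):
       """Check a response using a constant-time comparison."""
       expected = hmac.new(shared_key, challenge, hashlib.sha256).digest()
       return hmac.compare_digest(expected, response)


   key = secrets.token_bytes(32)          # held by both client and monitor

   # The monitor challenges the client ...
   mon_challenge = make_challenge()
   client_ok = verify(key, mon_challenge, prove(key, mon_challenge))

   # ... and the client challenges the monitor, giving mutual authentication.
   client_challenge = make_challenge()
   monitor_ok = verify(key, client_challenge, prove(key, client_challenge))

   # A party holding a different key cannot produce a valid response.
   wrong_key = secrets.token_bytes(32)
   impostor_ok = verify(key, mon_challenge, prove(wrong_key, mon_challenge))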

As stated in :ref:`Scalability and High Availability
<arch_scalability_and_high_availability>`, Ceph does not have any centralized
interface between clients and the Ceph object store. By avoiding such a
centralized interface, Ceph avoids the bottlenecks that attend such centralized
interfaces. However, this means that clients must interact directly with OSDs.
Direct interactions between Ceph clients and OSDs require authenticated
connections. The ``cephx`` authentication system establishes and sustains these
authenticated connections.

The ``cephx`` protocol operates in a manner similar to `Kerberos`_.

A user invokes a Ceph client to contact a monitor. Unlike Kerberos, each
monitor can authenticate users and distribute keys, which means that there is
no single point of failure and no bottleneck when using ``cephx``. The monitor
returns an authentication data structure that is similar to a Kerberos ticket.
This authentication data structure contains a session key for use in obtaining
Ceph services. The session key is itself encrypted with the user's permanent
secret key, which means that only the user can request services from the Ceph
Monitors. The client then uses the session key to request services from the
monitors, and the monitors provide the client with a ticket that authenticates
the client against the OSDs that actually handle data. Ceph Monitors and OSDs
share a secret, which means that the clients can use the ticket provided by the
monitors to authenticate against any OSD or metadata server in the cluster.

Like Kerberos tickets, ``cephx`` tickets expire. An attacker cannot use an
expired ticket or session key that has been obtained surreptitiously. This form
of authentication prevents attackers who have access to the communications
medium from creating bogus messages under another user's identity and prevents
attackers from altering another user's legitimate messages, as long as the
user's secret key is not divulged before it expires.

An administrator must set up users before using ``cephx``.  In the following
diagram, the ``client.admin`` user invokes ``ceph auth get-or-create-key`` from
the command line to generate a username and secret key. Ceph's ``auth``
subsystem generates the username and key, stores a copy on the monitor(s), and
transmits the user's secret back to the ``client.admin`` user. This means that
the client and the monitor share a secret key.

.. note:: The ``client.admin`` user must provide the user ID and
   secret key to the user in a secure manner.

.. ditaa::

           +---------+     +---------+
           | Client  |     | Monitor |
           +---------+     +---------+
                |  request to   |
                | create a user |
                |-------------->|----------+ create user
                |               |          | and
                |<--------------|<---------+ store key
                | transmit key  |
                |               |

Here is how a client authenticates with a monitor. The client passes the user
name to the monitor. The monitor generates a session key that is encrypted with
the secret key associated with the ``username``. The monitor transmits the
encrypted ticket to the client. The client uses the shared secret key to
decrypt the payload. The session key identifies the user, and this act of
identification will last for the duration of the session.  The client requests
a ticket for the user, and the ticket is signed with the session key. The
monitor generates a ticket and uses the user's secret key to encrypt it. The
encrypted ticket is transmitted to the client. The client decrypts the ticket
and uses it to sign requests to OSDs and to metadata servers in the cluster.

.. ditaa::

           +---------+     +---------+
           | Client  |     | Monitor |
           +---------+     +---------+
                |  authenticate |
                |-------------->|----------+ generate and
                |               |          | encrypt
                |<--------------|<---------+ session key
                | transmit      |
                | encrypted     |
                | session key   |
                |               |
                |-----+ decrypt |
                |     | session |
                |<----+ key     |
                |               |
                |  req. ticket  |
                |-------------->|----------+ generate and
                |               |          | encrypt
                |<--------------|<---------+ ticket
                | recv. ticket  |
                |               |
                |-----+ decrypt |
                |     | ticket  |
                |<----+         |


The ``cephx`` protocol authenticates ongoing communications between the clients
and Ceph daemons. After initial authentication, each message sent between a
client and a daemon is signed using a ticket that can be verified by monitors,
OSDs, and metadata daemons. This ticket is verified by using the secret shared
between the client and the daemon.

.. ditaa::

           +---------+     +---------+     +-------+     +-------+
           |  Client |     | Monitor |     |  MDS  |     |  OSD  |
           +---------+     +---------+     +-------+     +-------+
                |  request to   |              |             |
                | create a user |              |             |
                |-------------->| mon and      |             |
                |<--------------| client share |             |
                |    receive    | a secret.    |             |
                | shared secret |              |             |
                |               |<------------>|             |
                |               |<-------------+------------>|
                |               | mon, mds,    |             |
                | authenticate  | and osd      |             |
                |-------------->| share        |             |
                |<--------------| a secret     |             |
                |  session key  |              |             |
                |               |              |             |
                |  req. ticket  |              |             |
                |-------------->|              |             |
                |<--------------|              |             |
                | recv. ticket  |              |             |
                |               |              |             |
                |   make request (CephFS only) |             |
                |----------------------------->|             |
                |<-----------------------------|             |
                | receive response (CephFS only)             |
                |                                            |
                |                make request                |
                |------------------------------------------->|
                |<-------------------------------------------|
                               receive response
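
The per-message signing and ticket expiry described above can be sketched in
Python as follows. This is a simplified illustration that assumes HMAC-SHA256
signatures and a plain numeric expiry time; it does not reproduce the actual
``cephx`` message format, and the function names and sample values are
hypothetical.

.. code-block:: python

   import hashlib
   import hmac
   import time


   def sign_message(session_key, payload):
       """Return a signature that binds the payload to the session key."""
       return hmac.new(session_key, payload, hashlib.sha256).hexdigest()


   def verify_message(session_key, payload, signature, expires_at, now=None):
       """Accept a message only if it is unexpired and the signature matches."""
       now = time.time() if now is None else now
       if now >= expires_at:
           return False                   # expired tickets are rejected
       expected = sign_message(session_key, payload)
       return hmac.compare_digest(expected, signature)


   session_key = b"example-session-key"   # placeholder value for illustration
   expires_at = 1000.0                    # hypothetical expiry timestamp
   msg = b"write object john"
   sig = sign_message(session_key, msg)

   accepted = verify_message(session_key, msg, sig, expires_at, now=999.0)
   tampered = verify_message(session_key, b"write object paul", sig,
                             expires_at, now=999.0)
   expired = verify_message(session_key, msg, sig, expires_at, now=1000.0)

In this sketch, a message is rejected both when its content has been altered
after signing and when the expiry time has passed.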

This authentication protects only the connections between Ceph clients and Ceph
daemons. The authentication is not extended beyond the Ceph client. If a user
accesses the Ceph client from a remote host, ``cephx`` authentication will not be
applied to the connection between the user's host and the client host.

See `Cephx Config Guide`_ for more on configuration details.

See `User Management`_ for more on user management.

See :ref:`A Detailed Description of the Cephx Authentication Protocol
<cephx_2012_peter>` for more on the distinction between authorization and
authentication and for a step-by-step explanation of the setup of ``cephx``
tickets and session keys.

.. index:: architecture; smart daemons and scalability

Smart Daemons Enable Hyperscale
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A feature of many storage clusters is a centralized interface that keeps track
of the nodes that clients are permitted to access. Such centralized
architectures provide services to clients by means of a double dispatch. At the
petabyte-to-exabyte scale, such double dispatches are a significant
bottleneck.

Ceph obviates this bottleneck: Ceph's OSD Daemons and Ceph clients are
cluster-aware. Like Ceph clients, each Ceph OSD Daemon is aware of other Ceph
OSD Daemons in the cluster. This enables Ceph OSD Daemons to interact directly
with other Ceph OSD Daemons and to interact directly with Ceph Monitors.  Being
cluster-aware makes it possible for Ceph clients to interact directly with Ceph
OSD Daemons.

Because Ceph clients, Ceph monitors, and Ceph OSD daemons interact with one
another directly, Ceph OSD daemons can make use of the aggregate CPU and RAM
resources of the nodes in the Ceph cluster. This means that a Ceph cluster can
easily perform tasks that a cluster with a centralized interface would struggle
to perform. The ability of Ceph nodes to make use of the computing power of
the greater cluster provides several benefits:

#. **OSDs Service Clients Directly:** Network devices can support only a
   limited number of concurrent connections. Because Ceph clients contact
   Ceph OSD daemons directly without first connecting to a central interface,
   Ceph enjoys improved performance and increased system capacity relative to
   storage redundancy strategies that include a central interface. Ceph clients
   maintain sessions only when needed, and maintain those sessions with only
   particular Ceph OSD daemons, not with a centralized interface.

#. **OSD Membership and Status**: When Ceph OSD Daemons join a cluster, they
   report their status. At the lowest level, the Ceph OSD Daemon status is
   ``up`` or ``down``: this reflects whether the Ceph OSD daemon is running and
   able to service Ceph Client requests. If a Ceph OSD Daemon is ``down`` and
   ``in`` the Ceph Storage Cluster, this status may indicate the failure of the
   Ceph OSD Daemon. If a Ceph OSD Daemon is not running because it has crashed,
   the Ceph OSD Daemon cannot notify the Ceph Monitor that it is ``down``. The
   OSDs periodically send messages to the Ceph Monitor (in releases prior to
   Luminous, this was done by means of ``MPGStats``, and beginning with the
   Luminous release, this has been done with ``MOSDBeacon``). If the Ceph
   Monitors receive no such message after a configurable period of time,
   then they mark the OSD ``down``. This mechanism is a failsafe, however.
   Normally, Ceph OSD Daemons determine if a neighboring OSD is ``down`` and
   report it to the Ceph Monitors. This contributes to making Ceph Monitors
   lightweight processes. See `Monitoring OSDs`_ and `Heartbeats`_ for
   additional details.

#. **Data Scrubbing:** To maintain data consistency, Ceph OSD Daemons scrub
   RADOS objects. Ceph OSD Daemons compare the metadata of their own local
   objects against the metadata of the replicas of those objects, which are
   stored on other OSDs. Scrubbing occurs on a per-Placement-Group basis, finds
   mismatches in object size and finds metadata mismatches, and is usually
   performed daily. Ceph OSD Daemons perform deeper scrubbing by comparing the
   data in objects, bit-for-bit, against their checksums. Deep scrubbing finds
   bad sectors on drives that are not detectable with light scrubs. See `Data
   Scrubbing`_ for details on configuring scrubbing.

#. **Replication:** Data replication involves a collaboration between Ceph
   Clients and Ceph OSD Daemons. Ceph OSD Daemons use the CRUSH algorithm to
   determine the storage location of object replicas. Ceph clients use the
   CRUSH algorithm to determine the storage location of an object, then the
   object is mapped to a pool and to a placement group, and then the client
   consults the CRUSH map to identify the placement group's primary OSD.

   After identifying the target placement group, the client writes the object
   to the identified placement group's primary OSD. The primary OSD then
   consults its own copy of the CRUSH map to identify secondary and tertiary
   OSDs, replicates the object to the placement groups in those secondary and
   tertiary OSDs, confirms that the object was stored successfully in the
   secondary and tertiary OSDs, and reports to the client that the object
   was stored successfully.

.. ditaa::

             +----------+
             |  Client  |
             |          |
             +----------+
                 *  ^
      Write (1)  |  |  Ack (6)
                 |  |
                 v  *
            +-------------+
            | Primary OSD |
            |             |
            +-------------+
              *  ^   ^  *
    Write (2) |  |   |  |  Write (3)
       +------+  |   |  +------+
       |  +------+   +------+  |
       |  | Ack (4)  Ack (5)|  |
       v  *                 *  v
 +---------------+   +---------------+
 | Secondary OSD |   | Tertiary OSD  |
 |               |   |               |
 +---------------+   +---------------+

By performing this act of data replication, Ceph OSD Daemons relieve Ceph
clients of the burden of replicating data.
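
The write path shown in the diagram can be sketched as a small in-memory
simulation. The ``Osd`` class and the ``primary_write`` function below are
hypothetical names used for illustration. The sketch assumes that every replica
write succeeds and omits networking, CRUSH lookups, and failure handling.

.. code-block:: python

   class Osd:
       """Hypothetical in-memory OSD that stores objects in a dictionary."""

       def __init__(self, name):
           self.name = name
           self.objects = {}

       def store(self, object_id, data):
           self.objects[object_id] = data
           return True                    # acknowledgement


   def primary_write(primary, replicas, object_id, data):
       """Write to the primary, replicate to the others, then ack the client.

       The caller receives a positive acknowledgement only after every
       replica has acknowledged the write.
       """
       primary.store(object_id, data)
       acks = [osd.store(object_id, data) for osd in replicas]
       return all(acks)


   primary = Osd("primary")
   secondary = Osd("secondary")
   tertiary = Osd("tertiary")
   acked = primary_write(primary, [secondary, tertiary], "john", b"payload")

The client makes a single call, and the fan-out to the secondary and tertiary
OSDs happens on the primary's side.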

Dynamic Cluster Management
--------------------------

In the `Scalability and High Availability`_ section, we explained how Ceph uses
CRUSH, cluster topology, and intelligent daemons to scale and maintain high
availability. Key to Ceph's design is the autonomous, self-healing, and
intelligent Ceph OSD Daemon. Let's take a deeper look at how CRUSH enables
modern cloud storage infrastructures to place data, rebalance the cluster, and
recover from faults adaptively.

.. index:: architecture; pools

About Pools
~~~~~~~~~~~

The Ceph storage system supports the notion of 'Pools', which are logical
partitions for storing objects.

Ceph Clients retrieve a `Cluster Map`_ from a Ceph Monitor, and write RADOS
objects to pools. The way that Ceph places the data in the pools is determined
by the pool's ``size`` or number of replicas, the CRUSH rule, and the number of
placement groups in the pool.

.. ditaa::

            +--------+  Retrieves  +---------------+
            | Client |------------>|  Cluster Map  |
            +--------+             +---------------+
                 |
                 v      Writes
              /-----\
              | obj |
              \-----/
                 |      To
                 v
            +--------+           +---------------+
            |  Pool  |---------->|  CRUSH Rule   |
            +--------+  Selects  +---------------+


Pools set at least the following parameters:

- Ownership/Access to Objects
- The Number of Placement Groups, and
- The CRUSH Rule to Use.

See `Set Pool Values`_ for details.


.. index:: architecture; placement group mapping

Mapping PGs to OSDs
~~~~~~~~~~~~~~~~~~~

Each pool has a number of placement groups (PGs) within it. CRUSH dynamically
maps PGs to OSDs. When a Ceph Client stores objects, CRUSH maps each RADOS
object to a PG.

This mapping of RADOS objects to PGs implements an abstraction and indirection
layer between Ceph OSD Daemons and Ceph Clients. The Ceph Storage Cluster must
be able to grow (or shrink) and redistribute data adaptively when the internal
topology changes.

If the Ceph Client "knew" which Ceph OSD Daemons were storing which objects, a
tight coupling would exist between the Ceph Client and the Ceph OSD Daemon.
But Ceph avoids any such tight coupling. Instead, the CRUSH algorithm maps each
RADOS object to a placement group and then maps each placement group to one or
more Ceph OSD Daemons. This "layer of indirection" allows Ceph to rebalance
dynamically when new Ceph OSD Daemons and their underlying OSD devices come
online. The following diagram shows how the CRUSH algorithm maps objects to
placement groups, and how it maps placement groups to OSDs.

.. ditaa::

           /-----\  /-----\  /-----\  /-----\  /-----\
           | obj |  | obj |  | obj |  | obj |  | obj |
           \-----/  \-----/  \-----/  \-----/  \-----/
              |        |        |        |        |
              +--------+--------+        +---+----+
              |                              |
              v                              v
   +-----------------------+      +-----------------------+
   |  Placement Group #1   |      |  Placement Group #2   |
   |                       |      |                       |
   +-----------------------+      +-----------------------+
               |                              |
               |      +-----------------------+---+
        +------+------+-------------+             |
        |             |             |             |
        v             v             v             v
   /----------\  /----------\  /----------\  /----------\
   |          |  |          |  |          |  |          |
   |  OSD #1  |  |  OSD #2  |  |  OSD #3  |  |  OSD #4  |
   |          |  |          |  |          |  |          |
   \----------/  \----------/  \----------/  \----------/

The client uses its copy of the cluster map and the CRUSH algorithm to compute
precisely which OSD it will use when reading or writing a particular object.

.. index:: architecture; calculating PG IDs

Calculating PG IDs
~~~~~~~~~~~~~~~~~~

When a Ceph Client binds to a Ceph Monitor, it retrieves the latest version of
the `Cluster Map`_. When a client has been equipped with a copy of the cluster
map, it is aware of all the monitors, OSDs, and metadata servers in the
cluster. **However, even equipped with a copy of the latest version of the
cluster map, the client doesn't know anything about object locations.**

**Object locations must be computed.**

The client requires only the object ID and the name of the pool in order to
compute the object location.

Ceph stores data in named pools (for example, "liverpool"). When a client
stores a named object (for example, "john", "paul", "george", or "ringo") it
calculates a placement group by using the object name, a hash code, the number
of PGs in the pool, and the pool name. Ceph clients use the following steps to
compute PG IDs.

#. The client inputs the pool name and the object ID (for example: pool =
   "liverpool" and object-id = "john").
#. Ceph hashes the object ID.
#. Ceph calculates the hash, modulo the number of PGs, to get a PG ID (for
   example: ``58``).
#. Ceph uses the pool name to retrieve the pool ID (for example: "liverpool" =
   ``4``).
#. Ceph prepends the pool ID to the PG ID (for example: ``4.58``).
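
The steps above can be sketched in Python as follows. This is a minimal
illustration and not Ceph's implementation: CRC32 is used as a stand-in hash,
the pool table is a plain dictionary, and the PG count of 128 is an assumed
value. The function name ``compute_pg_id`` is hypothetical.

.. code-block:: python

   import zlib


   def compute_pg_id(pool_name, object_id, pools):
       """Return a "<pool id>.<pg>" string for an object.

       ``pools`` maps a pool name to a (pool_id, pg_count) tuple. CRC32 is an
       assumption made for this sketch, not the hash that Ceph uses.
       """
       pool_id, pg_count = pools[pool_name]
       object_hash = zlib.crc32(object_id.encode("utf-8"))
       pg = object_hash % pg_count        # hash modulo the number of PGs
       return f"{pool_id}.{pg}"           # prepend the pool ID


   pools = {"liverpool": (4, 128)}        # pool ID 4; 128 PGs is an assumption
   for name in ("john", "paul", "george", "ringo"):
       print(name, compute_pg_id("liverpool", name, pools))

Because the result depends only on the object ID and the pool's entry in the
table, any client that holds the same table computes the same PG ID without
asking a server.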

It is much faster to compute object locations than to perform an object location
query over a chatty session. The :abbr:`CRUSH (Controlled Replication Under
Scalable Hashing)` algorithm allows a client to compute where objects are
expected to be stored, and enables the client to contact the primary OSD to
store or retrieve the objects.

.. index:: architecture; PG Peering

Peering and Sets
~~~~~~~~~~~~~~~~

In previous sections, we noted that Ceph OSD Daemons check each other's
heartbeats and report back to Ceph Monitors. Ceph OSD daemons also 'peer',
which is the process of bringing all of the OSDs that store a Placement Group
(PG) into agreement about the state of all of the RADOS objects (and their
metadata) in that PG. Ceph OSD Daemons `Report Peering Failure`_ to the Ceph
Monitors. Peering issues usually resolve themselves; however, if the problem
persists, you may need to refer to the `Troubleshooting Peering Failure`_
section.

.. note:: PGs that agree on the state of the cluster do not necessarily have
   the current data yet.

The Ceph Storage Cluster was designed to store at least two copies of an object
(that is, ``size = 2``), which is the minimum requirement for data safety. For
high availability, a Ceph Storage Cluster should store more than two copies of
an object (that is, ``size = 3`` and ``min_size = 2``) so that it can continue
to run in a ``degraded`` state while maintaining data safety.

.. warning:: Although we say here that R2 (replication with two copies) is the
   minimum requirement for data safety, R3 (replication with three copies) is
   recommended. On a long enough timeline, data stored with an R2 strategy will
   be lost.
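
The relationship between the replica count and the minimum replica count can
be sketched as a simple classification. The function name ``pg_io_state`` and
the three labels it returns are simplifications made for this example. The
sketch assumes that a placement group with fewer available replicas than the
minimum stops serving I/O.

.. code-block:: python

   def pg_io_state(up_replicas, size=3, min_size=2):
       """Classify a placement group by how many replicas are available.

       The returned labels are simplified for this sketch.
       """
       if up_replicas >= size:
           return "clean"
       if up_replicas >= min_size:
           return "degraded"              # still serving I/O
       return "inactive"                  # too few replicas to serve I/O


   for replicas in (3, 2, 1):
       print(replicas, pg_io_state(replicas))

With ``size = 3`` and a minimum of two replicas, losing one replica leaves the
placement group degraded but usable, which is the behavior described above.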

As explained in the diagram in `Smart Daemons Enable Hyperscale`_, we do not
name the Ceph OSD Daemons specifically (for example, ``osd.0``, ``osd.1``,
etc.), but rather refer to them as *Primary*, *Secondary*, and so forth. By
convention, the *Primary* is the first OSD in the *Acting Set*, and is
responsible for orchestrating the peering process for each placement group
where it acts as the *Primary*. The *Primary* is the **ONLY** OSD in a given
placement group that accepts client-initiated writes to objects.

The set of OSDs that is responsible for a placement group is called the
*Acting Set*. The term "*Acting Set*" can refer either to the Ceph OSD Daemons
that are currently responsible for the placement group, or to the Ceph OSD
Daemons that were responsible for a particular placement group as of some
epoch.

The Ceph OSD daemons that are part of an *Acting Set* might not always be
``up``. When an OSD in the *Acting Set* is ``up``, it is part of the *Up Set*.
The *Up Set* is an important distinction, because Ceph can remap PGs to other
Ceph OSD Daemons when an OSD fails.

.. note:: Consider a hypothetical *Acting Set* for a PG that contains
   ``osd.25``, ``osd.32``, and ``osd.61``. The first OSD (``osd.25``) is the
   *Primary*. If that OSD fails, the Secondary (``osd.32``) becomes the
   *Primary*, and ``osd.25`` is removed from the *Up Set*.
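
The failover in the note above can be sketched in a few lines of Python. The
helper ``up_set`` is hypothetical and models only the ordering convention; in a
real cluster CRUSH may also map a replacement OSD into the set.

.. code-block:: python

   def up_set(acting_set, failed):
       """Return the OSDs of the acting set that are still up, in order."""
       return [osd for osd in acting_set if osd not in failed]

   acting = ["osd.25", "osd.32", "osd.61"]
   survivors = up_set(acting, failed={"osd.25"})
   primary = survivors[0]  # by convention, the first OSD is the Primary
   print(survivors, primary)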

.. index:: architecture; Rebalancing

Rebalancing
~~~~~~~~~~~

When you add a Ceph OSD Daemon to a Ceph Storage Cluster, the cluster map gets
updated with the new OSD. As described in `Calculating PG IDs`_, the cluster
map is an input to the placement calculations, so changing it changes object
placement. The following diagram depicts the rebalancing process (albeit rather
crudely, since it is substantially less impactful with large clusters), in
which some, but not all, of the PGs migrate from existing OSDs (OSD 1 and
OSD 2) to the new OSD (OSD 3). Even when rebalancing, CRUSH is stable. Many of
the placement groups remain in their original configuration, and each OSD gets
some added capacity, so there are no load spikes on the new OSD after
rebalancing is complete.


.. ditaa::

           +--------+     +--------+
   Before  |  OSD 1 |     |  OSD 2 |
           +--------+     +--------+
           |  PG #1 |     | PG #6  |
           |  PG #2 |     | PG #7  |
           |  PG #3 |     | PG #8  |
           |  PG #4 |     | PG #9  |
           |  PG #5 |     | PG #10 |
           +--------+     +--------+

           +--------+     +--------+     +--------+
    After  |  OSD 1 |     |  OSD 2 |     |  OSD 3 |
           +--------+     +--------+     +--------+
           |  PG #1 |     | PG #7  |     |  PG #3 |
           |  PG #2 |     | PG #8  |     |  PG #6 |
           |  PG #4 |     | PG #10 |     |  PG #9 |
           |  PG #5 |     |        |     |        |
           |        |     |        |     |        |
           +--------+     +--------+     +--------+
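
The stability property described above can be illustrated with a toy placement
function. The sketch below uses rendezvous (highest-random-weight) hashing as a
stand-in for CRUSH, which is an assumption made for brevity; CRUSH itself is
hierarchical and weight-aware. The function ``place`` and the PG count are
hypothetical.

.. code-block:: python

   import hashlib

   def place(pg_id, osds):
       """Pick the OSD with the highest hash score for this PG."""
       def score(osd):
           digest = hashlib.sha256(f"{pg_id}:{osd}".encode()).digest()
           return int.from_bytes(digest[:8], "big")
       return max(osds, key=score)

   pgs = range(1000)
   before = {pg: place(pg, [1, 2]) for pg in pgs}
   after = {pg: place(pg, [1, 2, 3]) for pg in pgs}  # OSD 3 is added

   # PGs whose owner changed when the new OSD joined the cluster.
   moved = [pg for pg in pgs if before[pg] != after[pg]]
   print(f"{len(moved)} of {len(pgs)} PGs moved")
   print("all moved PGs landed on OSD 3:", all(after[pg] == 3 for pg in moved))

In this model a PG moves only if the new OSD scores highest for it, so about a
third of the PGs migrate and none of them shuffle between the existing OSDs.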


.. index:: architecture; Data Scrubbing

Data Consistency
~~~~~~~~~~~~~~~~

As part of maintaining data consistency and cleanliness, Ceph OSDs also scrub
objects within placement groups. That is, Ceph OSDs compare object metadata in
one placement group with its replicas in placement groups stored in other
OSDs. Scrubbing (usually performed daily) catches OSD bugs or filesystem
errors, which are often the result of hardware issues. OSDs also perform deeper
scrubbing by comparing data in objects bit-for-bit. Deep scrubbing (by default
performed weekly) finds bad blocks on a drive that weren't apparent in a light
scrub.
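
The difference between the two scrub levels can be sketched as follows. This
is an illustration only: the functions ``light_scrub`` and ``deep_scrub`` are
hypothetical, object size stands in for "metadata", SHA-256 stands in for
whatever checksum the OSD uses, and the first replica is simply assumed to be
the reference copy.

.. code-block:: python

   import hashlib

   def light_scrub(replicas):
       """Flag replicas whose metadata (here: size) differs from the reference."""
       reference = replicas[0][1]
       return [osd for osd, data in replicas[1:] if len(data) != len(reference)]

   def deep_scrub(replicas):
       """Flag replicas whose content digest differs from the reference."""
       reference = hashlib.sha256(replicas[0][1]).hexdigest()
       return [osd for osd, data in replicas[1:]
               if hashlib.sha256(data).hexdigest() != reference]

   # osd.3 holds a copy with one corrupted byte but the correct length.
   replicas = [("osd.1", b"hello world"),
               ("osd.2", b"hello world"),
               ("osd.3", b"hellp world")]
   print("light scrub:", light_scrub(replicas))
   print("deep scrub:", deep_scrub(replicas))

The corrupted copy passes the size comparison and is caught only by the
content comparison, which is why deep scrubbing is the more expensive check.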

See `Data Scrubbing`_ for details on configuring scrubbing.


.. index:: erasure coding

Erasure Coding
--------------

An erasure coded pool stores each object as ``K+M`` chunks. Each object is
divided into ``K`` data chunks and ``M`` coding chunks. The pool is configured
to have a size of ``K+M`` so that each chunk is stored in an OSD in the acting
set. The rank of the chunk is stored as an attribute of the object.

For instance, an erasure coded pool can be created to use five OSDs
(``K+M = 5``) and sustain the loss of two of them (``M = 2``). Data may be
unavailable until (``K+1``) shards are restored.

Reading and Writing Encoded Chunks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the object **NYAN** containing ``ABCDEFGHI`` is written to the pool, the erasure
encoding function splits the content into three data chunks simply by dividing
the content in three: the first contains ``ABC``, the second ``DEF`` and the
last ``GHI``. The content will be padded if the content length is not a multiple
of ``K``. The function also creates two coding chunks: the fourth with ``YXY``
and the fifth with ``QGC``. Each chunk is stored in an OSD in the acting set.
The chunks are stored in objects that have the same name (**NYAN**) but reside
on different OSDs. The order in which the chunks were created must be preserved
and is stored as an attribute of the object (``shard_t``), in addition to its
name. Chunk 1 contains ``ABC`` and is stored on **OSD5** while chunk 4 contains
``YXY`` and is stored on **OSD3**.


.. ditaa::

                            +-------------------+
                       name |       NYAN        |
                            +-------------------+
                    content |     ABCDEFGHI     |
                            +--------+----------+
                                     |
                                     |
                                     v
                              +------+------+
              +---------------+ encode(3,2) +-----------+
              |               +--+--+---+---+           |
              |                  |  |   |               |
              |          +-------+  |   +-----+         |
              |          |          |         |         |
           +--v---+   +--v---+   +--v---+  +--v---+  +--v---+
     name  | NYAN |   | NYAN |   | NYAN |  | NYAN |  | NYAN |
           +------+   +------+   +------+  +------+  +------+
    shard  |  1   |   |  2   |   |  3   |  |  4   |  |  5   |
           +------+   +------+   +------+  +------+  +------+
  content  | ABC  |   | DEF  |   | GHI  |  | YXY  |  | QGC  |
           +--+---+   +--+---+   +--+---+  +--+---+  +--+---+
              |          |          |         |         |
              |          |          v         |         |
              |          |       +--+---+     |         |
              |          |       | OSD1 |     |         |
              |          |       +------+     |         |
              |          |                    |         |
              |          |       +------+     |         |
              |          +------>| OSD2 |     |         |
              |                  +------+     |         |
              |                               |         |
              |                  +------+     |         |
              |                  | OSD3 |<----+         |
              |                  +------+               |
              |                                         |
              |                  +------+               |
              |                  | OSD4 |<--------------+
              |                  +------+
              |
              |                  +------+
              +----------------->| OSD5 |
                                 +------+
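
The data-chunk half of the encoding step can be sketched in Python. The
function ``split_into_data_chunks`` is hypothetical, and the use of zero bytes
for padding is an assumption. The sketch does not compute coding chunks: those
come from the erasure code plugin, and ``YXY`` and ``QGC`` above are
illustrative values.

.. code-block:: python

   def split_into_data_chunks(content, k):
       """Split content into k equal chunks, padding the tail with zero bytes."""
       chunk_size = -(-len(content) // k)          # ceiling division
       padded = content.ljust(chunk_size * k, b"\x00")
       return [padded[i * chunk_size:(i + 1) * chunk_size] for i in range(k)]

   # The list index preserves the chunk order, as the shard attribute does.
   for shard, chunk in enumerate(split_into_data_chunks(b"ABCDEFGHI", 3), start=1):
       print(shard, chunk)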


When the object **NYAN** is read from the erasure coded pool, the decoding
function reads three chunks: chunk 1 containing ``ABC``, chunk 3 containing
``GHI`` and chunk 4 containing ``YXY``. Then, it rebuilds the original content
of the object ``ABCDEFGHI``. The decoding function is informed that chunks 2
and 5 are missing (they are called 'erasures'). Chunk 5 could not be read
because **OSD4** is out. The decoding function can be called as soon as
three chunks are read: **OSD2** was the slowest and its chunk was not taken into
account.

.. ditaa::

	                         +-------------------+
	                    name |       NYAN        |
	                         +-------------------+
	                 content |     ABCDEFGHI     |
	                         +---------+---------+
	                                   ^
	                                   |
	                                   |
	                           +-------+-------+
	                           |  decode(3,2)  |
	            +------------->+  erasures 2,5 +<-+
	            |              |               |  |
	            |              +-------+-------+  |
	            |                      ^          |
	            |                      |          |
	            |                      |          |
	         +--+---+   +------+   +---+--+   +---+--+
	   name  | NYAN |   | NYAN |   | NYAN |   | NYAN |
	         +------+   +------+   +------+   +------+
	  shard  |  1   |   |  2   |   |  3   |   |  4   |
	         +------+   +------+   +------+   +------+
	content  | ABC  |   | DEF  |   | GHI  |   | YXY  |
	         +--+---+   +--+---+   +--+---+   +--+---+
	            ^          .          ^          ^
	            |    TOO   .          |          |
	            |    SLOW  .       +--+---+      |
	            |          ^       | OSD1 |      |
	            |          |       +------+      |
	            |          |                     |
	            |          |       +------+      |
	            |          +-------| OSD2 |      |
	            |                  +------+      |
	            |                                |
	            |                  +------+      |
	            |                  | OSD3 |------+
	            |                  +------+
	            |
	            |                  +------+
	            |                  | OSD4 | OUT
	            |                  +------+
	            |
	            |                  +------+
	            +------------------| OSD5 |
	                               +------+


Interrupted Full Writes
~~~~~~~~~~~~~~~~~~~~~~~

In an erasure coded pool, the primary OSD in the up set receives all write
operations. It is responsible for encoding the payload into ``K+M`` chunks and
sending them to the other OSDs. It is also responsible for maintaining an
authoritative version of the placement group logs.

In the following diagram, an erasure coded placement group has been created with
``K = 2, M = 1`` and is supported by three OSDs, two for ``K`` and one for
``M``. The acting set of the placement group is made of **OSD 1**, **OSD 2** and
**OSD 3**. An object has been encoded and stored in the OSDs: the chunk
``D1v1`` (i.e. Data chunk number 1, version 1) is on **OSD 1**, ``D2v1`` on
**OSD 2** and ``C1v1`` (i.e. Coding chunk number 1, version 1) on **OSD 3**. The
placement group logs on each OSD are identical (i.e. ``1,1`` for epoch 1,
version 1).


.. ditaa::

     Primary OSD

   +-------------+
   |    OSD 1    |             +-------------+
   |         log |  Write Full |             |
   |  +----+     |<------------+ Ceph Client |
   |  |D1v1| 1,1 |      v1     |             |
   |  +----+     |             +-------------+
   +------+------+
          |
          |
          |          +-------------+
          |          |    OSD 2    |
          |          |         log |
          +--------->+  +----+     |
          |          |  |D2v1| 1,1 |
          |          |  +----+     |
          |          +-------------+
          |
          |          +-------------+
          |          |    OSD 3    |
          |          |         log |
          +--------->|  +----+     |
                     |  |C1v1| 1,1 |
                     |  +----+     |
                     +-------------+

**OSD 1** is the primary and receives a **WRITE FULL** from a client, which
means the payload is to replace the object entirely instead of overwriting a
portion of it. Version 2 (v2) of the object is created to override version 1
(v1). **OSD 1** encodes the payload into three chunks: ``D1v2`` (i.e. Data
chunk number 1 version 2) will be on **OSD 1**, ``D2v2`` on **OSD 2** and
``C1v2`` (i.e. Coding chunk number 1 version 2) on **OSD 3**. Each chunk is sent
to the target OSD, including the primary OSD which is responsible for storing
chunks in addition to handling write operations and maintaining an authoritative
version of the placement group logs. When an OSD receives the message
instructing it to write the chunk, it also creates a new entry in the placement
group logs to reflect the change. For instance, as soon as **OSD 3** stores
``C1v2``, it adds the entry ``1,2`` (i.e. epoch 1, version 2) to its logs.
Because the OSDs work asynchronously, some chunks may still be in flight (such
as ``D2v2``) while others are acknowledged and persisted to storage drives
(such as ``C1v2`` and ``D1v2``).

.. ditaa::

     Primary OSD

   +-------------+
   |    OSD 1    |
   |         log |
   |  +----+     |             +-------------+
   |  |D1v2| 1,2 |  Write Full |             |
   |  +----+     +<------------+ Ceph Client |
   |             |      v2     |             |
   |  +----+     |             +-------------+
   |  |D1v1| 1,1 |
   |  +----+     |
   +------+------+
          |
          |
          |           +------+------+
          |           |    OSD 2    |
          |  +------+ |         log |
          +->| D2v2 | |  +----+     |
          |  +------+ |  |D2v1| 1,1 |
          |           |  +----+     |
          |           +-------------+
          |
          |           +-------------+
          |           |    OSD 3    |
          |           |         log |
          |           |  +----+     |
          |           |  |C1v2| 1,2 |
          +---------->+  +----+     |
                      |             |
                      |  +----+     |
                      |  |C1v1| 1,1 |
                      |  +----+     |
                      +-------------+


If all goes well, the chunks are acknowledged on each OSD in the acting set and
the logs' ``last_complete`` pointer can move from ``1,1`` to ``1,2``.

.. ditaa::

     Primary OSD

   +-------------+
   |    OSD 1    |
   |         log |
   |  +----+     |             +-------------+
   |  |D1v2| 1,2 |  Write Full |             |
   |  +----+     +<------------+ Ceph Client |
   |             |      v2     |             |
   |  +----+     |             +-------------+
   |  |D1v1| 1,1 |
   |  +----+     |
   +------+------+
          |
          |           +-------------+
          |           |    OSD 2    |
          |           |         log |
          |           |  +----+     |
          |           |  |D2v2| 1,2 |
          +---------->+  +----+     |
          |           |             |
          |           |  +----+     |
          |           |  |D2v1| 1,1 |
          |           |  +----+     |
          |           +-------------+
          |
          |           +-------------+
          |           |    OSD 3    |
          |           |         log |
          |           |  +----+     |
          |           |  |C1v2| 1,2 |
          +---------->+  +----+     |
                      |             |
                      |  +----+     |
                      |  |C1v1| 1,1 |
                      |  +----+     |
                      +-------------+


Finally, the files used to store the chunks of the previous version of the
object can be removed: ``D1v1`` on **OSD 1**, ``D2v1`` on **OSD 2** and ``C1v1``
on **OSD 3**.

.. ditaa::

     Primary OSD

   +-------------+
   |    OSD 1    |
   |         log |
   |  +----+     |
   |  |D1v2| 1,2 |
   |  +----+     |
   +------+------+
          |
          |
          |          +-------------+
          |          |    OSD 2    |
          |          |         log |
          +--------->+  +----+     |
          |          |  |D2v2| 1,2 |
          |          |  +----+     |
          |          +-------------+
          |
          |          +-------------+
          |          |    OSD 3    |
          |          |         log |
          +--------->|  +----+     |
                     |  |C1v2| 1,2 |
                     |  +----+     |
                     +-------------+


But accidents happen. If **OSD 1** goes down while ``D2v2`` is still in flight,
the object's version 2 is partially written: **OSD 3** has one chunk, but that
is not enough to recover. Two chunks, ``D1v2`` and ``D2v2``, are lost, and the
erasure coding parameters ``K = 2``, ``M = 1`` require that at least two chunks
be available to rebuild the third. **OSD 4** becomes the new primary and finds
that the ``last_complete`` log entry (i.e., all objects before this entry were
known to be available on all OSDs in the previous acting set) is ``1,1``, which
will be the head of the new authoritative log.

.. ditaa::

   +-------------+
   |    OSD 1    |
   |   (down)    |
   | c333        |
   +------+------+
          |
          |           +-------------+
          |           |    OSD 2    |
          |           |         log |
          |           |  +----+     |
          +---------->+  |D2v1| 1,1 |
          |           |  +----+     |
          |           |             |
          |           +-------------+
          |
          |           +-------------+
          |           |    OSD 3    |
          |           |         log |
          |           |  +----+     |
          |           |  |C1v2| 1,2 |
          +---------->+  +----+     |
                      |             |
                      |  +----+     |
                      |  |C1v1| 1,1 |
                      |  +----+     |
                      +-------------+
     Primary OSD
   +-------------+
   |    OSD 4    |
   |         log |
   |             |
   |         1,1 |
   |             |
   +------+------+


The log entry ``1,2`` found on **OSD 3** is divergent from the new authoritative
log provided by **OSD 4**: it is discarded and the file containing the ``C1v2``
chunk is removed. The ``D1v1`` chunk is rebuilt with the ``decode`` function of
the erasure coding library during scrubbing and stored on the new primary
**OSD 4**.


.. ditaa::

     Primary OSD

   +-------------+
   |    OSD 4    |
   |         log |
   |  +----+     |
   |  |D1v1| 1,1 |
   |  +----+     |
   +------+------+
          ^
          |
          |          +-------------+
          |          |    OSD 2    |
          |          |         log |
          +----------+  +----+     |
          |          |  |D2v1| 1,1 |
          |          |  +----+     |
          |          +-------------+
          |
          |          +-------------+
          |          |    OSD 3    |
          |          |         log |
          +----------|  +----+     |
                     |  |C1v1| 1,1 |
                     |  +----+     |
                     +-------------+

   +-------------+
   |    OSD 1    |
   |   (down)    |
   | c333        |
   +-------------+
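
The recovery steps above can be sketched in Python under strong simplifying
assumptions. XOR parity is used here as the simplest possible ``K = 2``,
``M = 1`` code; the plugin a pool actually uses may work differently. The
chunk contents, the helper ``xor_bytes`` and the log representation as
``(epoch, version)`` tuples are all hypothetical.

.. code-block:: python

   def xor_bytes(a, b):
       """XOR two equally sized chunks byte by byte."""
       return bytes(x ^ y for x, y in zip(a, b))

   # Version 1 of the object: two data chunks and one coding chunk.
   d1v1, d2v1 = b"AB", b"CD"
   c1v1 = xor_bytes(d1v1, d2v1)

   # Logs on the surviving OSDs after OSD 1 went down.
   logs = {"osd.2": [(1, 1)], "osd.3": [(1, 1), (1, 2)]}
   authoritative_head = (1, 1)  # last_complete found by the new primary

   # Entries newer than the authoritative head are divergent and discarded.
   divergent = {osd: [entry for entry in entries if entry > authoritative_head]
                for osd, entries in logs.items()}

   # The lost data chunk is rebuilt from the two surviving version 1 chunks.
   rebuilt_d1v1 = xor_bytes(d2v1, c1v1)
   print(divergent, rebuilt_d1v1)

Only **OSD 3** holds a divergent entry in this model, and the rebuilt chunk
equals the original ``D1v1`` because XOR parity is its own inverse.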

See `Erasure Code Notes`_ for additional details.


Cache Tiering
-------------

.. note:: Cache tiering is deprecated in Reef.

A cache tier provides Ceph Clients with better I/O performance for a subset of
the data stored in a backing storage tier. Cache tiering involves creating a
pool of relatively fast/expensive storage devices (e.g., solid state drives)
configured to act as a cache tier, and a backing pool of either erasure-coded
or relatively slower/cheaper devices configured to act as an economical storage
tier. The Ceph objecter handles where to place the objects, and the tiering
agent determines when to flush objects from the cache to the backing storage
tier. As a result, the cache tier and the backing storage tier are completely
transparent to Ceph clients.


.. ditaa::

           +-------------+
           | Ceph Client |
           +------+------+
                  ^
     Tiering is   |
    Transparent   |              Faster I/O
        to Ceph   |           +---------------+
     Client Ops   |           |               |
                  |    +----->+   Cache Tier  |
                  |    |      |               |
                  |    |      +-----+---+-----+
                  |    |            |   ^
                  v    v            |   |   Active Data in Cache Tier
           +------+----+--+         |   |
           |   Objecter   |         |   |
           +-----------+--+         |   |
                       ^            |   |   Inactive Data in Storage Tier
                       |            v   |
                       |      +-----+---+-----+
                       |      |               |
                       +----->|  Storage Tier |
                              |               |
                              +---------------+
                                 Slower I/O

See `Cache Tiering`_ for additional details.  Note that Cache Tiers can be
tricky and their use is now discouraged.


.. index:: Extensibility, Ceph Classes

Extending Ceph
--------------

You can extend Ceph by creating shared object classes called 'Ceph Classes'.
Ceph loads ``.so`` classes stored in the ``osd class dir`` directory dynamically
(i.e., ``$libdir/rados-classes`` by default). When you implement a class, you
can create new object methods that have the ability to call the native methods
in the Ceph Object Store, or other class methods you incorporate via libraries
or create yourself.

On writes, Ceph Classes can call native or class methods, perform any series of
operations on the inbound data and generate a resulting write transaction that
Ceph will apply atomically.

On reads, Ceph Classes can call native or class methods, perform any series of
operations on the outbound data and return the data to the client.

.. topic:: Ceph Class Example

   A Ceph class for a content management system that presents pictures of a
   particular size and aspect ratio could take an inbound bitmap image, crop it
   to a particular aspect ratio, resize it and embed an invisible copyright or
   watermark to help protect the intellectual property; then, save the
   resulting bitmap image to the object store.

See ``src/objclass/objclass.h``, ``src/fooclass.cc`` and ``src/barclass`` for
exemplary implementations.


Summary
-------

Ceph Storage Clusters are dynamic--like a living organism. Whereas many storage
appliances do not fully utilize the CPU and RAM of a typical commodity server,
Ceph does. From heartbeats, to peering, to rebalancing the cluster or
recovering from faults, Ceph offloads work from clients (and from a centralized
gateway, which doesn't exist in the Ceph architecture) and uses the computing
power of the OSDs to perform the work. When referring to `Hardware
Recommendations`_ and the `Network Config Reference`_, be cognizant of the
foregoing concepts to understand how Ceph utilizes computing resources.

.. index:: Ceph Protocol, librados

Ceph Protocol
=============

Ceph Clients use the native protocol for interacting with the Ceph Storage
Cluster. Ceph packages this functionality into the ``librados`` library so that
you can create your own custom Ceph Clients. The following diagram depicts the
basic architecture.

.. ditaa::

            +---------------------------------+
            |  Ceph Storage Cluster Protocol  |
            |           (librados)            |
            +---------------------------------+
            +---------------+ +---------------+
            |      OSDs     | |    Monitors   |
            +---------------+ +---------------+


Native Protocol and ``librados``
--------------------------------

Modern applications need a simple object storage interface with asynchronous
communication capability. The Ceph Storage Cluster provides such an interface,
which gives direct, parallel access to objects throughout the cluster and
supports the following operations:


- Pool Operations
- Snapshots and Copy-on-write Cloning
- Read/Write Objects

  - Create or Remove
  - Entire Object or Byte Range
  - Append or Truncate

- Create/Set/Get/Remove XATTRs
- Create/Set/Get/Remove Key/Value Pairs
- Compound operations and dual-ack semantics
- Object Classes

.. index:: architecture; watch/notify

Object Watch/Notify
-------------------

A client can register a persistent interest in an object (a *watch*) and keep a
session to the primary OSD open. A client can then send a notification message
and a payload to all watchers and be informed when the watchers have received
the notification. This enables a client to use any object as a
synchronization/communication channel.


.. ditaa::

           +----------+     +----------+     +----------+     +---------------+
           | Client 1 |     | Client 2 |     | Client 3 |     | OSD:Object ID |
           +----------+     +----------+     +----------+     +---------------+
                 |                |                |                  |
                 |                |                |                  |
                 |                |  Watch Object  |                  |
                 |--------------------------------------------------->|
                 |                |                |                  |
                 |<---------------------------------------------------|
                 |                |   Ack/Commit   |                  |
                 |                |                |                  |
                 |                |  Watch Object  |                  |
                 |                |---------------------------------->|
                 |                |                |                  |
                 |                |<----------------------------------|
                 |                |   Ack/Commit   |                  |
                 |                |                |   Watch Object   |
                 |                |                |----------------->|
                 |                |                |                  |
                 |                |                |<-----------------|
                 |                |                |    Ack/Commit    |
                 |                |     Notify     |                  |
                 |--------------------------------------------------->|
                 |                |                |                  |
                 |<---------------------------------------------------|
                 |                |     Notify     |                  |
                 |                |                |                  |
                 |                |<----------------------------------|
                 |                |     Notify     |                  |
                 |                |                |<-----------------|
                 |                |                |      Notify      |
                 |                |       Ack      |                  |
                 |----------------+---------------------------------->|
                 |                |                |                  |
                 |                |       Ack      |                  |
                 |                +---------------------------------->|
                 |                |                |                  |
                 |                |                |        Ack       |
                 |                |                |----------------->|
                 |                |                |                  |
                 |<---------------+----------------+------------------|
                 |                     Complete
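
The message flow in the diagram can be modeled with a small in-process
simulation. This is not the librados watch/notify API: the class
``WatchedObject``, its methods and the callback convention are hypothetical
and exist only to show the order of events.

.. code-block:: python

   class WatchedObject:
       """Toy model of an object that tracks watchers and fans out notifications."""

       def __init__(self):
           self.watchers = {}

       def watch(self, client, callback):
           self.watchers[client] = callback   # register persistent interest
           return "ack"

       def notify(self, payload):
           # Deliver the payload to every watcher and collect their acks.
           return [client for client, callback in self.watchers.items()
                   if callback(payload)]

   received = {}

   def make_callback(name):
       def on_notify(payload):
           received[name] = payload
           return True                        # acknowledge the notification
       return on_notify

   obj = WatchedObject()
   for name in ("client1", "client2", "client3"):
       obj.watch(name, make_callback(name))

   # Client 1 notifies; the call completes once every watcher has acked.
   acks = obj.notify(b"refresh")
   print(acks, received)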

.. index:: architecture; Striping

Data Striping
-------------

Storage devices have throughput limitations, which impact performance and
scalability. So storage systems often support `striping`_--storing sequential
pieces of information across multiple storage devices--to increase throughput
and performance. The most common form of data striping comes from `RAID`_.
The RAID type most similar to Ceph's striping is `RAID 0`_, or a 'striped
volume'. Ceph's striping offers the throughput of RAID 0 striping, the
reliability of n-way RAID mirroring and faster recovery.

Ceph provides three types of clients: Ceph Block Device, Ceph File System, and
Ceph Object Storage. A Ceph Client converts its data from the representation
format it provides to its users (a block device image, RESTful objects, CephFS
filesystem directories) into objects for storage in the Ceph Storage Cluster.

.. tip:: The objects Ceph stores in the Ceph Storage Cluster are not striped.
   Ceph Object Storage, Ceph Block Device, and the Ceph File System stripe their
   data over multiple Ceph Storage Cluster objects. Ceph Clients that write
   directly to the Ceph Storage Cluster via ``librados`` must perform the
   striping (and parallel I/O) for themselves to obtain these benefits.

The simplest Ceph striping format involves a stripe count of 1 object. Ceph
Clients write stripe units to a Ceph Storage Cluster object until the object is
at its maximum capacity, and then create another object for additional stripes
of data. The simplest form of striping may be sufficient for small block device
images, S3 or Swift objects and CephFS files. However, this simple form doesn't
take maximum advantage of Ceph's ability to distribute data across placement
groups, and consequently doesn't improve performance very much. The following
diagram depicts the simplest form of striping:

.. ditaa::

                        +---------------+
                        |  Client Data  |
                        |     Format    |
                        | cCCC          |
                        +---------------+
                                |
                       +--------+-------+
                       |                |
                       v                v
                 /-----------\    /-----------\
                 | Begin cCCC|    | Begin cCCC|
                 | Object  0 |    | Object  1 |
                 +-----------+    +-----------+
                 |  stripe   |    |  stripe   |
                 |  unit 1   |    |  unit 5   |
                 +-----------+    +-----------+
                 |  stripe   |    |  stripe   |
                 |  unit 2   |    |  unit 6   |
                 +-----------+    +-----------+
                 |  stripe   |    |  stripe   |
                 |  unit 3   |    |  unit 7   |
                 +-----------+    +-----------+
                 |  stripe   |    |  stripe   |
                 |  unit 4   |    |  unit 8   |
                 +-----------+    +-----------+
                 | End cCCC  |    | End cCCC  |
                 | Object 0  |    | Object 1  |
                 \-----------/    \-----------/


If you anticipate large image sizes, large S3 or Swift objects (e.g., video),
or large CephFS directories, you may see considerable read/write performance
improvements by striping client data over multiple objects within an object set.
Significant write performance gains occur when the client writes the stripe
units to their corresponding objects in parallel. Since objects get mapped to
different placement groups and further mapped to different OSDs, each write
occurs in parallel at the maximum write speed. A write to a single drive would
be limited by the head movement (e.g. 6ms per seek) and bandwidth of that one
device (e.g. 100MB/s). By spreading that write over multiple objects (which map
to different placement groups and OSDs), Ceph can reduce the number of seeks per
drive and combine the throughput of multiple drives to achieve much faster write
(or read) speeds.

.. note:: Striping is independent of object replicas. Since CRUSH
   replicates objects across OSDs, stripes get replicated automatically.

In the following diagram, client data gets striped across an object set
(``object set 1`` in the following diagram) consisting of 4 objects, where the
first stripe unit is ``stripe unit 0`` in ``object 0``, and the fourth stripe
unit is ``stripe unit 3`` in ``object 3``. After writing the fourth stripe
unit, the client determines if the object set is full. If the object set is not
full, the client begins writing a stripe unit to the first object again
(``object 0`` in the following diagram). If the object set is full, the client
creates a new object set (``object set 2`` in the following diagram), and
begins writing to the first stripe unit (``stripe unit 16``) in the first
object in the new object set (``object 4`` in the diagram below).
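
The placement rule just described can also be written as arithmetic. The
function ``locate_stripe_unit`` and its parameter names are hypothetical, and
the object set index it returns is zero-based, whereas the diagram labels
object sets starting from 1.

.. code-block:: python

   def locate_stripe_unit(n, stripe_count, units_per_object):
       """Map stripe unit n to (object set index, object number, row in object)."""
       units_per_set = stripe_count * units_per_object
       object_set, index_in_set = divmod(n, units_per_set)
       # Units are dealt round-robin across the objects of the set.
       row, column = divmod(index_in_set, stripe_count)
       object_no = object_set * stripe_count + column
       return object_set, object_no, row

   # Four objects per set and four stripe units per object, as in the diagram.
   for unit in (0, 3, 4, 15, 16):
       print(unit, locate_stripe_unit(unit, stripe_count=4, units_per_object=4))

Stripe unit 16 is the first unit that no longer fits in the first object set,
so it lands in row 0 of object 4.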

.. ditaa::

                          +---------------+
                          |  Client Data  |
                          |     Format    |
                          | cCCC          |
                          +---------------+
                                  |
       +-----------------+--------+--------+-----------------+
       |                 |                 |                 |     +--\
       v                 v                 v                 v        |
 /-----------\     /-----------\     /-----------\     /-----------\  |
 | Begin cCCC|     | Begin cCCC|     | Begin cCCC|     | Begin cCCC|  |
 | Object 0  |     | Object  1 |     | Object  2 |     | Object  3 |  |
 +-----------+     +-----------+     +-----------+     +-----------+  |
 |  stripe   |     |  stripe   |     |  stripe   |     |  stripe   |  |
 |  unit 0   |     |  unit 1   |     |  unit 2   |     |  unit 3   |  |
 +-----------+     +-----------+     +-----------+     +-----------+  |
 |  stripe   |     |  stripe   |     |  stripe   |     |  stripe   |  +-\
 |  unit 4   |     |  unit 5   |     |  unit 6   |     |  unit 7   |    | Object
 +-----------+     +-----------+     +-----------+     +-----------+    +- Set
 |  stripe   |     |  stripe   |     |  stripe   |     |  stripe   |    |   1
 |  unit 8   |     |  unit 9   |     |  unit 10  |     |  unit 11  |  +-/
 +-----------+     +-----------+     +-----------+     +-----------+  |
 |  stripe   |     |  stripe   |     |  stripe   |     |  stripe   |  |
 |  unit 12  |     |  unit 13  |     |  unit 14  |     |  unit 15  |  |
 +-----------+     +-----------+     +-----------+     +-----------+  |
 | End cCCC  |     | End cCCC  |     | End cCCC  |     | End cCCC  |  |
 | Object 0  |     | Object 1  |     | Object 2  |     | Object 3  |  |
 \-----------/     \-----------/     \-----------/     \-----------/  |
                                                                      |
                                                                   +--/

                                                                   +--\
                                                                      |
 /-----------\     /-----------\     /-----------\     /-----------\  |
 | Begin cCCC|     | Begin cCCC|     | Begin cCCC|     | Begin cCCC|  |
 | Object  4 |     | Object  5 |     | Object  6 |     | Object  7 |  |
 +-----------+     +-----------+     +-----------+     +-----------+  |
 |  stripe   |     |  stripe   |     |  stripe   |     |  stripe   |  |
 |  unit 16  |     |  unit 17  |     |  unit 18  |     |  unit 19  |  |
 +-----------+     +-----------+     +-----------+     +-----------+  |
 |  stripe   |     |  stripe   |     |  stripe   |     |  stripe   |  +-\
 |  unit 20  |     |  unit 21  |     |  unit 22  |     |  unit 23  |    | Object
 +-----------+     +-----------+     +-----------+     +-----------+    +- Set
 |  stripe   |     |  stripe   |     |  stripe   |     |  stripe   |    |   2
 |  unit 24  |     |  unit 25  |     |  unit 26  |     |  unit 27  |  +-/
 +-----------+     +-----------+     +-----------+     +-----------+  |
 |  stripe   |     |  stripe   |     |  stripe   |     |  stripe   |  |
 |  unit 28  |     |  unit 29  |     |  unit 30  |     |  unit 31  |  |
 +-----------+     +-----------+     +-----------+     +-----------+  |
 | End cCCC  |     | End cCCC  |     | End cCCC  |     | End cCCC  |  |
 | Object 4  |     | Object 5  |     | Object 6  |     | Object 7  |  |
 \-----------/     \-----------/     \-----------/     \-----------/  |
                                                                      |
                                                                   +--/

Three important variables determine how Ceph stripes data:

- **Object Size:** Objects in the Ceph Storage Cluster have a maximum
  configurable size (e.g., 2MB, 4MB, etc.). The object size should be large
  enough to accommodate many stripe units, and should be a multiple of
  the stripe unit.

- **Stripe Width:** Stripes have a configurable unit size (e.g., 64 KB).
  The Ceph Client divides the data it will write to objects into equally
  sized stripe units, except for the last stripe unit. A stripe width
  should be a fraction of the Object Size so that an object may contain
  many stripe units.

- **Stripe Count:** The Ceph Client writes a sequence of stripe units
  over a series of objects determined by the stripe count. The series
  of objects is called an object set. After the Ceph Client writes to
  the last object in the object set, it returns to the first object in
  the object set.

.. important:: Test the performance of your striping configuration before
   putting your cluster into production. You CANNOT change these striping
   parameters after you stripe the data and write it to objects.
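
The interaction of the three variables can be sketched as a small function
that maps a byte offset in the client data to a position in the diagram above.
This is a minimal sketch, not Ceph's own code: the names ``StripeLocation`` and
``locate`` are hypothetical, ``stripe_unit`` stands for the unit size described
under **Stripe Width**, and all numbering is zero-based (so the diagram's
``object set 2`` is object set ``1`` here).

.. code-block:: python

   from typing import NamedTuple

   class StripeLocation(NamedTuple):
       object_set: int      # zero-based object set index
       object_no: int       # object number, as labelled in the diagram
       stripe_unit_no: int  # stripe unit number, as labelled in the diagram
       object_offset: int   # byte offset inside that object

   def locate(offset, stripe_unit, stripe_count, object_size):
       """Map a byte offset in the client data to its object and stripe unit."""
       if object_size % stripe_unit != 0:
           raise ValueError("object size must be a multiple of the stripe unit")
       units_per_object = object_size // stripe_unit
       unit_no = offset // stripe_unit          # which stripe unit overall
       stripe_no = unit_no // stripe_count      # which row across the object set
       stripe_pos = unit_no % stripe_count      # which column (object) in the row
       object_set = stripe_no // units_per_object
       object_no = object_set * stripe_count + stripe_pos
       object_offset = ((stripe_no % units_per_object) * stripe_unit
                        + offset % stripe_unit)
       return StripeLocation(object_set, object_no, unit_no, object_offset)

With a 64 KiB stripe unit, a stripe count of 4, and 256 KiB objects (four
stripe units per object, as in the diagram), ``locate(16 * 65536, 65536, 4,
262144)`` returns ``StripeLocation(object_set=1, object_no=4,
stripe_unit_no=16, object_offset=0)``, which matches ``stripe unit 16`` at the
start of ``object 4``.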

Once the Ceph Client has striped data to stripe units and mapped the stripe
units to objects, Ceph's CRUSH algorithm maps the objects to placement groups,
and the placement groups to Ceph OSD Daemons before the objects are stored on a
storage drive.

.. note:: Since a client writes to a single pool, all data striped into objects
   gets mapped to placement groups in the same pool. The objects therefore use
   the same CRUSH map and the same access controls.
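
The two-step mapping described above (object to placement group, placement
group to OSDs) can be illustrated with a toy model. This sketch does **not**
implement CRUSH or Ceph's object hash: it substitutes SHA-256 for the hash and
rendezvous (highest-random-weight) hashing for the placement step, and the
function names are hypothetical. It only shows that placement is computed from
the object name rather than looked up in a central table.

.. code-block:: python

   import hashlib

   def object_to_pg(pool_id, object_name, pg_num):
       """Hash the object name into one of the pool's placement groups."""
       digest = hashlib.sha256(object_name.encode()).digest()
       pg_index = int.from_bytes(digest[:4], "little") % pg_num
       return f"{pool_id}.{pg_index:x}"   # "<pool>.<pg in hex>" style identifier

   def pg_to_osds(pg_id, osds, replicas):
       """Stand-in for CRUSH: rank OSDs by a per-PG hash, keep the top ones."""
       def score(osd):
           return hashlib.sha256(f"{pg_id}:{osd}".encode()).digest()
       return sorted(osds, key=score, reverse=True)[:replicas]

   # Any client that knows the pool and the OSD list computes the same answer.
   pg = object_to_pg(pool_id=1, object_name="object 4", pg_num=128)
   acting = pg_to_osds(pg, osds=list(range(6)), replicas=3)

Because both functions are deterministic, every client that starts from the
same inputs arrives at the same placement group and the same set of OSDs.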


.. index:: architecture; Ceph Clients

.. _architecture_ceph_clients:

Ceph Clients
============

Ceph Clients include a number of service interfaces. These include:

- **Block Devices:** The :term:`Ceph Block Device` (a.k.a., RBD) service
  provides resizable, thin-provisioned block devices that can be snapshotted
  and cloned. Ceph stripes a block device across the cluster for high
  performance. Ceph supports both kernel objects (KO) and a QEMU hypervisor
  that uses ``librbd`` directly--avoiding the kernel object overhead for
  virtualized systems.

- **Object Storage:** The :term:`Ceph Object Storage` (a.k.a., RGW) service
  provides RESTful APIs with interfaces that are compatible with Amazon S3
  and OpenStack Swift.

- **Filesystem:** The :term:`Ceph File System` (CephFS) service provides
  a POSIX-compliant filesystem usable with ``mount`` or as
  a filesystem in user space (FUSE).

Ceph can run additional instances of OSDs, MDSs, and monitors for scalability
and high availability. The following diagram depicts the high-level
architecture.

.. ditaa::

            +--------------+  +----------------+  +-------------+
            | Block Device |  | Object Storage |  |   CephFS    |
            +--------------+  +----------------+  +-------------+

            +--------------+  +----------------+  +-------------+
            |    librbd    |  |     librgw     |  |  libcephfs  |
            +--------------+  +----------------+  +-------------+

            +---------------------------------------------------+
            |      Ceph Storage Cluster Protocol (librados)     |
            +---------------------------------------------------+

            +---------------+ +---------------+ +---------------+
            |      OSDs     | |      MDSs     | |    Monitors   |
            +---------------+ +---------------+ +---------------+


.. index:: architecture; Ceph Object Storage

Ceph Object Storage
-------------------

The Ceph Object Storage daemon, ``radosgw``, is an HTTP server that provides
a RESTful_ HTTP API to store objects and metadata. It layers on top of the Ceph
Storage Cluster with its own data formats, and maintains its own user database,
authentication, and access control. The RADOS Gateway uses a unified namespace,
which means you can use either the OpenStack Swift-compatible API or the Amazon
S3-compatible API. For example, you can write data using the S3-compatible API
with one application and then read data using the Swift-compatible API with
another application.

.. topic:: S3/Swift Objects and Store Cluster Objects Compared

   Ceph's Object Storage uses the term *object* to describe the data it stores.
   S3 and Swift objects are not the same as the objects that Ceph writes to the
   Ceph Storage Cluster. Ceph Object Storage objects are mapped to Ceph Storage
   Cluster objects. The S3 and Swift objects do not necessarily
   correspond in a 1:1 manner with an object stored in the storage cluster. It
   is possible for an S3 or Swift object to map to multiple Ceph objects.

See `Ceph Object Storage`_ for details.


.. index:: Ceph Block Device; block device; RBD; Rados Block Device

Ceph Block Device
-----------------

A Ceph Block Device stripes a block device image over multiple objects in the
Ceph Storage Cluster, where each object gets mapped to a placement group and
distributed, and the placement groups are spread across separate ``ceph-osd``
daemons throughout the cluster.

.. important:: Striping allows RBD block devices to perform better than a single
   server could!
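
To illustrate how a block device image spreads over objects, the sketch below
maps byte offsets in an image to object numbers. It assumes a 4 MiB object
size and the simplest layout (one stripe unit per object); the object name
pattern and the ``rbd_data.example`` prefix are illustrative assumptions, and
the function names are hypothetical.

.. code-block:: python

   OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size: 4 MiB

   def rbd_object_for_offset(image_offset, object_size=OBJECT_SIZE,
                             prefix="rbd_data.example"):
       """Return (object name, offset inside the object) for an image offset."""
       object_no = image_offset // object_size
       return f"{prefix}.{object_no:016x}", image_offset % object_size

   def objects_touched(offset, length, object_size=OBJECT_SIZE):
       """List the object numbers covered by a read or write of length > 0."""
       first = offset // object_size
       last = (offset + length - 1) // object_size
       return list(range(first, last + 1))

Under these assumptions, a 16 MiB write at offset 0 touches objects 0 through
3. Each of those objects can map to a different placement group and therefore
to different ``ceph-osd`` daemons, which is where the parallelism comes from.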

Thin-provisioned snapshottable Ceph Block Devices are an attractive option for
virtualization and cloud computing. In virtual machine scenarios, people
typically deploy a Ceph Block Device with the ``rbd`` network storage driver in
QEMU/KVM, where the host machine uses ``librbd`` to provide a block device
service to the guest. Many cloud computing stacks use ``libvirt`` to integrate
with hypervisors. You can use thin-provisioned Ceph Block Devices with QEMU and
``libvirt`` to support OpenStack, OpenNebula, and CloudStack,
among other solutions.

While we do not provide ``librbd`` support with other hypervisors at this time,
you may also use Ceph Block Device kernel objects to provide a block device to a
client. Other virtualization technologies such as Xen can access the Ceph Block
Device kernel object(s). This is done with the ``rbd`` command-line tool.


.. index:: CephFS; Ceph File System; libcephfs; MDS; metadata server; ceph-mds

.. _arch-cephfs:

Ceph File System
----------------

The Ceph File System (CephFS) provides a POSIX-compliant filesystem as a
service that is layered on top of the object-based Ceph Storage Cluster.
CephFS files get mapped to objects that Ceph stores in the Ceph Storage
Cluster. Ceph Clients mount a CephFS filesystem as a kernel object or as
a Filesystem in User Space (FUSE).

.. ditaa::

            +-----------------------+  +------------------------+
            | CephFS Kernel Object  |  |      CephFS FUSE       |
            +-----------------------+  +------------------------+

            +---------------------------------------------------+
            |            CephFS Library (libcephfs)             |
            +---------------------------------------------------+

            +---------------------------------------------------+
            |      Ceph Storage Cluster Protocol (librados)     |
            +---------------------------------------------------+

            +---------------+ +---------------+ +---------------+
            |      OSDs     | |      MDSs     | |    Monitors   |
            +---------------+ +---------------+ +---------------+


The Ceph File System service includes the Ceph Metadata Server (MDS) deployed
with the Ceph Storage Cluster. The purpose of the MDS is to store all the
filesystem metadata (directories, file ownership, access modes, etc.) in
high-availability Ceph Metadata Servers where the metadata resides in memory.
The reason for the MDS (a daemon called ``ceph-mds``) is that simple filesystem
operations like listing a directory or changing a directory (``ls``, ``cd``)
would tax the Ceph OSD Daemons unnecessarily. So separating the metadata from
the data means that the Ceph File System can provide high performance services
without taxing the Ceph Storage Cluster.

CephFS separates the metadata from the data, storing the metadata in the MDS,
and storing the file data in one or more objects in the Ceph Storage Cluster.
The Ceph filesystem aims for POSIX compatibility. ``ceph-mds`` can run as a
single process, or it can be distributed out to multiple physical machines,
either for high availability or for scalability.

- **High Availability**: The extra ``ceph-mds`` instances can be `standby`,
  ready to take over the duties of any failed ``ceph-mds`` that was
  `active`. This is easy because all the data, including the journal, is
  stored on RADOS. The transition is triggered automatically by ``ceph-mon``.

- **Scalability**: Multiple ``ceph-mds`` instances can be `active`, and they
  will split the directory tree into subtrees (and shards of a single
  busy directory), effectively balancing the load amongst all `active`
  servers.

Combinations of `standby` and `active` instances are possible: for example,
running 3 `active` ``ceph-mds`` instances for scaling and one `standby`
instance for high availability.
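
The role changes described above can be pictured with a toy model. This is
only a sketch of the bookkeeping: in a real cluster the transition is driven by
``ceph-mon`` and the replacement daemon recovers state from the journal stored
in RADOS. The function name, the daemon names, and the dictionary
representation are hypothetical.

.. code-block:: python

   def fail_mds(daemons, failed):
       """Return the daemon roles after `failed` stops.

       If the failed daemon was active and a standby exists, the first
       standby found is promoted to active.
       """
       state = dict(daemons)                 # leave the caller's dict untouched
       was_active = state.pop(failed) == "active"
       if was_active:
           for name, role in state.items():
               if role == "standby":
                   state[name] = "active"    # promote one standby
                   break
       return state

   # Three active daemons for scaling and one standby for high availability.
   cluster = {"a": "active", "b": "active", "c": "active", "d": "standby"}
   after = fail_mds(cluster, "b")

After ``b`` fails, ``d`` is promoted, so three daemons are still `active` but
no `standby` remains until another ``ceph-mds`` instance is started.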


.. _RADOS - A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters: https://ceph.io/assets/pdfs/weil-rados-pdsw07.pdf
.. _Paxos: https://en.wikipedia.org/wiki/Paxos_(computer_science)
.. _Monitor Config Reference: ../rados/configuration/mon-config-ref
.. _Monitoring OSDs and PGs: ../rados/operations/monitoring-osd-pg
.. _Heartbeats: ../rados/configuration/mon-osd-interaction
.. _Monitoring OSDs: ../rados/operations/monitoring-osd-pg/#monitoring-osds
.. _CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data: https://ceph.io/assets/pdfs/weil-crush-sc06.pdf
.. _Data Scrubbing: ../rados/configuration/osd-config-ref#scrubbing
.. _Report Peering Failure: ../rados/configuration/mon-osd-interaction#osds-report-peering-failure
.. _Troubleshooting Peering Failure: ../rados/troubleshooting/troubleshooting-pg#placement-group-down-peering-failure
.. _Ceph Authentication and Authorization: ../rados/operations/auth-intro/
.. _Hardware Recommendations: ../start/hardware-recommendations
.. _Network Config Reference: ../rados/configuration/network-config-ref
.. _striping: https://en.wikipedia.org/wiki/Data_striping
.. _RAID: https://en.wikipedia.org/wiki/RAID
.. _RAID 0: https://en.wikipedia.org/wiki/RAID_0#RAID_0
.. _Ceph Object Storage: ../radosgw/
.. _RESTful: https://en.wikipedia.org/wiki/RESTful
.. _Erasure Code Notes: https://github.com/ceph/ceph/blob/40059e12af88267d0da67d8fd8d9cd81244d8f93/doc/dev/osd_internals/erasure_coding/developer_notes.rst
.. _Cache Tiering: ../rados/operations/cache-tiering
.. _Set Pool Values: ../rados/operations/pools#set-pool-values
.. _Kerberos: https://en.wikipedia.org/wiki/Kerberos_(protocol)
.. _Cephx Config Guide: ../rados/configuration/auth-config-ref
.. _User Management: ../rados/operations/user-management
-												:doc: Rewrote architecture paper. Still needs some work.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2012-09-18 18:08:23 +00:00
+								==============
-												doc: Updated architecture document.

fixes: #2968

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2013-05-15 00:05:43 +00:00
+								 Architecture
-												:doc: Rewrote architecture paper. Still needs some work.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2012-09-18 18:08:23 +00:00
+								==============
-												doc: Updated architecture document.

fixes: #2968

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2013-05-15 00:05:43 +00:00
+								:term:`Ceph` uniquely delivers **object, block, and file storage** in one
 								unified system. Ceph is highly reliable, easy to manage, and free. The power of
 								Ceph can transform your company's IT infrastructure and your ability to manage
 								vast amounts of data. Ceph delivers extraordinary scalability–thousands of
 								clients accessing petabytes to exabytes of data. A :term:`Ceph Node` leverages
 								commodity hardware and intelligent daemons, and a :term:`Ceph Storage Cluster`
 								accommodates large numbers of nodes, which communicate with each other to
-												doc: Added erasure coding and cache tiering notes. Special thanks to Loic Dachary.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2014-05-06 10:53:03 +00:00
+								replicate and redistribute data dynamically.
-												:doc: Rewrote architecture paper. Still needs some work.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2012-09-18 18:08:23 +00:00
-												doc: Updated architecture document.

fixes: #2968

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2013-05-15 00:05:43 +00:00
+								.. image:: images/stack.png
-												doc: Added a striping section for Architecture.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2012-12-04 04:48:02 +00:00
-												doc/glossary: define "Ceph Storage Cluster"

Add "Ceph Storage Cluster" to the glossary.

Signed-off-by: Zac Dover <zac.dover@gmail.com>

											
										
										
											2022-11-22 04:02:34 +00:00
+								.. _arch-ceph-storage-cluster:
-												:doc: Rewrote architecture paper. Still needs some work.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2012-09-18 18:08:23 +00:00
-												doc: Updated architecture document.

fixes: #2968

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2013-05-15 00:05:43 +00:00
+								The Ceph Storage Cluster
 								========================
 								Ceph provides an infinitely scalable :term:`Ceph Storage Cluster` based upon
-												doc/architecture.rst: improve rados definition

Improve the definition of RADOS, and link to information about RADOS.

Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2024-01-28 19:33:58 +00:00
+								:abbr:`RADOS (Reliable Autonomic Distributed Object Store)`, a reliable,
 								distributed storage service that uses the intelligence in each of its nodes to
-												doc/architecture: correct typo

s/client/clients/ where necessary, and add a link to the glossary.

Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2024-03-06 11:40:10 +00:00
+								secure the data it stores and to provide that data to :term:`client`\s. See
 								Sage Weil's "`The RADOS Object Store
-												doc/architecture.rst: improve rados definition

Improve the definition of RADOS, and link to information about RADOS.

Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2024-01-28 19:33:58 +00:00
+								<https://ceph.io/en/news/blog/2009/the-rados-distributed-object-store/>`_" blog
 								post for a brief explanation of RADOS and see `RADOS - A Scalable, Reliable
 								Storage Service for Petabyte-scale Storage Clusters`_ for an exhaustive
 								explanation of :term:`RADOS`.
-												doc: Added erasure coding and cache tiering notes. Special thanks to Loic Dachary.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2014-05-06 10:53:03 +00:00
-												doc: object -> file -> disk is wrong for bluestore

Address tracker 23443

Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>

doc: object -> file -> disk is wrong for bluestore

Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>

											
										
										
											2020-11-19 06:57:54 +00:00
+								A Ceph Storage Cluster consists of multiple types of daemons:
-												doc: Added erasure coding and cache tiering notes. Special thanks to Loic Dachary.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2014-05-06 10:53:03 +00:00
-												doc: Put architectural details of authentication in to architecture doc.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2014-08-14 03:27:31 +00:00
+								- :term:`Ceph Monitor`
-												doc: Fixed syntax error.

Signed-off-by: John Wilkins <jowilki@redhat.com>

											
										
										
											2014-09-11 17:50:42 +00:00
+								- :term:`Ceph OSD Daemon`
-												doc: object -> file -> disk is wrong for bluestore

Address tracker 23443

Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>

doc: object -> file -> disk is wrong for bluestore

Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>

											
										
										
											2020-11-19 06:57:54 +00:00
+								- :term:`Ceph Manager`
 								- :term:`Ceph Metadata Server`
-												doc: Updated architecture document.

fixes: #2968

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2013-05-15 00:05:43 +00:00
-												doc/glossary: add "Quorum" to glossary

Add the term "Quorum" to the glossary and link to the part of
architecture.rst concerning Monitors. The sticky header at the top of
the docs.ceph.com website gets in the way of the location linked to in
this commit, but fatigue and disgust prevent me from spending time today
trial-and-erroring my way through the hostile and ill-documented
wilderness of scroll-margin so that the link goes where it should.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2023-11-14 13:40:42 +00:00
+								.. _arch_monitor:
-												doc/architecture.rst - edit up to "Cluster Map"

Edit doc/architecture.rst up to "Cluster Map", but not including
"Cluster Map".

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2023-09-10 03:10:09 +00:00
+								Ceph Monitors maintain the master copy of the cluster map, which they provide
-												doc/architecture: improve some paragraphs

Improve paragraphs under the heading "The Ceph Storage Cluster". Remove
a sentence that was pleonastic in its context in the paragraph.

Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2024-01-30 09:51:53 +00:00
+								to Ceph clients. The existence of multiple monitors in the Ceph cluster ensures
 								availability if one of the monitor daemons or its host fails.
-												doc: Put architectural details of authentication in to architecture doc.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2014-08-14 03:27:31 +00:00
-												docs: Add information about OpenNebula integration

- Exclude doc build output from git
- Fix missing doc build dependency
- Also includes some involuntary automatically persistent linting by vscode

Co-authored-by: Ilya Dryomov <idryomov@redhat.com>
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Co-authored-by: Zac Dover <zac.dover@proton.me>
Signed-off-by: Daniel Clavijo <dclavijo@opennebula.io>

											
										
										
											2023-12-15 15:54:02 +00:00
+								A Ceph OSD Daemon checks its own state and the state of other OSDs and reports
-												doc: Put architectural details of authentication in to architecture doc.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2014-08-14 03:27:31 +00:00
+								back to monitors.
-												doc/architecture.rst - edit up to "Cluster Map"

Edit doc/architecture.rst up to "Cluster Map", but not including
"Cluster Map".

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2023-09-10 03:10:09 +00:00
+								A Ceph Manager serves as an endpoint for monitoring, orchestration, and plug-in
-												doc: object -> file -> disk is wrong for bluestore

Address tracker 23443

Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>

doc: object -> file -> disk is wrong for bluestore

Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>

											
										
										
											2020-11-19 06:57:54 +00:00
+								modules.
 								A Ceph Metadata Server (MDS) manages file metadata when CephFS is used to
 								provide file services.
-												doc/architecture.rst - edit up to "Cluster Map"

Edit doc/architecture.rst up to "Cluster Map", but not including
"Cluster Map".

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2023-09-10 03:10:09 +00:00
+								Storage cluster clients and :term:`Ceph OSD Daemon`\s use the CRUSH algorithm
-												doc/architecture: fix spelling and syntax

Fix the spelling of the word "algorithm" (which was "algoritm") and make
a sentence a little more natural.

Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2024-03-04 13:09:20 +00:00
+								to compute information about the location of data.  By using the CRUSH
 								algorithm, clients and OSDs avoid being bottlenecked by a central lookup table.
-												doc/architecture: improve some paragraphs

Improve paragraphs under the heading "The Ceph Storage Cluster". Remove
a sentence that was pleonastic in its context in the paragraph.

Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2024-01-30 09:51:53 +00:00
+								Ceph's high-level features include a native interface to the Ceph Storage
-												doc/architecture: fix spelling and syntax

Fix the spelling of the word "algorithm" (which was "algoritm") and make
a sentence a little more natural.

Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2024-03-04 13:09:20 +00:00
+								Cluster via ``librados`` and a number of service interfaces built on top of
-												doc/architecture: improve some paragraphs

Improve paragraphs under the heading "The Ceph Storage Cluster". Remove
a sentence that was pleonastic in its context in the paragraph.

Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2024-01-30 09:51:53 +00:00
+								``librados``.
-												:doc: Rewrote architecture paper. Still needs some work.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2012-09-18 18:08:23 +00:00
-												doc: Updated architecture document.

fixes: #2968

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2013-05-15 00:05:43 +00:00
+								Storing Data
 								------------
-												doc/glossary.rst: remove duplicates

This commit removes similar but distinct entries for the following:
   * CephFS
   * Ceph Client

Removal of a glossary term that is referred to in the body of the
documentation suite requires the alteration of the text string
that refers to the glossary term. Alterations of this kind have
been made to doc/architecture.rst and doc/rados/api/index.rst.

Signed-off-by: Zac Dover <zac.dover@gmail.com>

											
										
										
											2022-10-03 12:51:35 +00:00
+								The Ceph Storage Cluster receives data from :term:`Ceph Client`\s--whether it
-												doc: Updated architecture document.

fixes: #2968

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2013-05-15 00:05:43 +00:00
+								comes through a :term:`Ceph Block Device`, :term:`Ceph Object Storage`, the
-												doc/architecture.rst - edit up to "Cluster Map"

Edit doc/architecture.rst up to "Cluster Map", but not including
"Cluster Map".

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2023-09-10 03:10:09 +00:00
+								:term:`Ceph File System`, or a custom implementation that you create by using
 								``librados``. The data received by the Ceph Storage Cluster is stored as RADOS
 								objects. Each object is stored on an :term:`Object Storage Device` (this is
 								also called an "OSD"). Ceph OSDs control read, write, and replication
-												docs: Add information about OpenNebula integration

- Exclude doc build output from git
- Fix missing doc build dependency
- Also includes some involuntary automatically persistent linting by vscode

Co-authored-by: Ilya Dryomov <idryomov@redhat.com>
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Co-authored-by: Zac Dover <zac.dover@proton.me>
Signed-off-by: Daniel Clavijo <dclavijo@opennebula.io>

											
										
										
											2023-12-15 15:54:02 +00:00
+								operations on storage drives. The default BlueStore back end stores objects
-												doc/architecture.rst - edit up to "Cluster Map"

Edit doc/architecture.rst up to "Cluster Map", but not including
"Cluster Map".

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2023-09-10 03:10:09 +00:00
+								in a monolithic, database-like fashion.
-												:doc: Rewrote architecture paper. Still needs some work.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2012-09-18 18:08:23 +00:00
-												doc: use plantweb as fallback of sphinx-ditaa

RTD does not support installing system packages, the only ways to install
dependencies are setuptools and pip. while ditaa is a tool written in
Java. so we need to find a native python tool allowing us to render ditaa
images. plantweb is able to the web service for rendering the ditaa
diagram. so let's use it as a fallback if "ditaa" is not around.

also start a new line after the directive, otherwise planweb server will
return 500 at seeing the diagram.

Signed-off-by: Kefu Chai <kchai@redhat.com>

											
										
										
											2020-04-09 13:25:39 +00:00
+								.. ditaa::
-												doc/architecture.rst - edit up to "Cluster Map"

Edit doc/architecture.rst up to "Cluster Map", but not including
"Cluster Map".

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2023-09-10 03:10:09 +00:00
+								           /------\       +-----+       +-----+
 								           | obj  |------>| {d} |------>| {s} |
 								           \------/       +-----+       +-----+
-												docs: Add information about OpenNebula integration

- Exclude doc build output from git
- Fix missing doc build dependency
- Also includes some involuntary automatically persistent linting by vscode

Co-authored-by: Ilya Dryomov <idryomov@redhat.com>
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Co-authored-by: Zac Dover <zac.dover@proton.me>
Signed-off-by: Daniel Clavijo <dclavijo@opennebula.io>

											
										
										
											2023-12-15 15:54:02 +00:00
-												doc: object -> file -> disk is wrong for bluestore

Address tracker 23443

Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>

doc: object -> file -> disk is wrong for bluestore

Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>

											
										
										
											2020-11-19 06:57:54 +00:00
+								            Object         OSD          Drive
-												:doc: Rewrote architecture paper. Still needs some work.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2012-09-18 18:08:23 +00:00
-												doc/architecture.rst - edit up to "Cluster Map"

Edit doc/architecture.rst up to "Cluster Map", but not including
"Cluster Map".

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2023-09-10 03:10:09 +00:00
+								Ceph OSD Daemons store data as objects in a flat namespace. This means that
 								objects are not stored in a hierarchy of directories. An object has an
 								identifier, binary data, and metadata consisting of name/value pairs.
 								:term:`Ceph Client`\s determine the semantics of the object data. For example,
 								CephFS uses metadata to store file attributes such as the file owner, the
 								created date, and the last modified date.
.. ditaa::

           /------+------------------------------+----------------\
           | ID   | Binary Data                  | Metadata       |
           +------+------------------------------+----------------+
           | 1234 | 0101010101010100110101010010 | name1 = value1 |
           |      | 0101100001010100110101010010 | name2 = value2 |
           |      | 0101100001010100110101010010 | nameN = valueN |
           \------+------------------------------+----------------/

.. note:: An object ID is unique across the entire cluster, not just the local
   filesystem.

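The object layout described above can be pictured as a small data structure.
The following is a minimal Python sketch, not Ceph's implementation:
``StoredObject`` and ``FlatNamespace`` are hypothetical names chosen for
illustration, and a plain dictionary stands in for an OSD's object store.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """Hypothetical model of an object: an ID, binary data, and metadata."""
    object_id: str
    data: bytes
    metadata: dict = field(default_factory=dict)  # name/value pairs


class FlatNamespace:
    """Hypothetical flat namespace: IDs map straight to objects."""

    def __init__(self):
        self._objects = {}

    def put(self, obj):
        self._objects[obj.object_id] = obj

    def get(self, object_id):
        return self._objects[object_id]


store = FlatNamespace()
store.put(StoredObject("1234", b"\x01\x02\x03", {"name1": "value1"}))
# An ID that contains "/" is just another key; it implies no directory.
store.put(StoredObject("a/b", b"", {"owner": "alice"}))
```

Because lookups go by ID alone, any meaning attached to the metadata (for
example, file ownership) is left to the client that wrote the object.
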
.. index:: architecture; high availability, scalability

.. _arch_scalability_and_high_availability:

Scalability and High Availability
---------------------------------

In traditional architectures, clients talk to a centralized component. This
centralized component might be a gateway, a broker, an API, or a facade. A
centralized component of this kind acts as a single point of entry to a complex
subsystem. Architectures that rely upon such a centralized component have a
single point of failure and incur limits to performance and scalability. If
the centralized component goes down, the whole system becomes unavailable.

Ceph eliminates this centralized component. This enables clients to interact
with Ceph OSDs directly. Ceph OSDs create object replicas on other Ceph Nodes
to ensure data safety and high availability. Ceph also uses a cluster of
monitors to ensure high availability. To eliminate centralization, Ceph uses an
algorithm called :abbr:`CRUSH (Controlled Replication Under Scalable Hashing)`.

.. index:: CRUSH; architecture

CRUSH Introduction
~~~~~~~~~~~~~~~~~~

Ceph Clients and Ceph OSD Daemons both use the :abbr:`CRUSH (Controlled
Replication Under Scalable Hashing)` algorithm to compute information about
object location instead of relying upon a central lookup table. CRUSH provides
a better data management mechanism than do older approaches, and CRUSH enables
massive scale by distributing the work to all the OSD daemons in the cluster
and all the clients that communicate with them. CRUSH uses intelligent data
replication to ensure resiliency, which is better suited to hyper-scale
storage. The following sections provide additional details on how CRUSH works.
For an in-depth, academic discussion of CRUSH, see `CRUSH - Controlled,
Scalable, Decentralized Placement of Replicated Data`_.

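The idea of computing an object's location instead of looking it up can be
illustrated with a much simpler technique. The sketch below uses rendezvous
(highest-random-weight) hashing. It is **not** the CRUSH algorithm and it
ignores failure domains and weights. The function name ``place`` and the OSD
names are assumptions made for illustration only.

```python
import hashlib


def place(object_id, osds, replicas=3):
    """Pick `replicas` OSDs for an object using rendezvous hashing.

    This is NOT CRUSH; it only shows that a location can be computed
    from shared inputs without consulting a central lookup table.
    """
    def score(osd):
        digest = hashlib.sha256(f"{object_id}:{osd}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The highest-scoring OSDs win, regardless of input order.
    return sorted(osds, key=score, reverse=True)[:replicas]


osds = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]
# Two independent callers with the same inputs compute the same placement.
client_view = place("1234", osds)
osd_view = place("1234", list(reversed(osds)))
```

Every party that holds the same list of OSDs arrives at the same answer on its
own, which is the property that lets the placement work be spread across all
clients and daemons.
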
.. index:: architecture; cluster map

.. _architecture_cluster_map:

Cluster Map
~~~~~~~~~~~

In order for a Ceph cluster to function properly, Ceph Clients and Ceph OSDs
must have current information about the cluster's topology. Current information
is stored in the "Cluster Map", which is in fact a collection of five maps. The
five maps that constitute the cluster map are:

#. **The Monitor Map:** Contains the cluster ``fsid``, the position, the name,
   the address, and the TCP port of each monitor. The monitor map specifies the
   current epoch, the time of the monitor map's creation, and the time of the
   monitor map's last modification. To view a monitor map, run ``ceph mon
   dump``.

#. **The OSD Map:** Contains the cluster ``fsid``, the time of the OSD map's
   creation, the time of the OSD map's last modification, a list of pools, a
   list of replica sizes, a list of PG numbers, and a list of OSDs and their
   statuses (for example, ``up``, ``in``). To view an OSD map, run ``ceph
   osd dump``.

#. **The PG Map:** Contains the PG version, its time stamp, the last OSD map
   epoch, the full ratios, and the details of each placement group. This
   includes the PG ID, the `Up Set`, the `Acting Set`, the state of the PG (for
   example, ``active+clean``), and data usage statistics for each pool.

#. **The CRUSH Map:** Contains a list of storage devices, the failure domain
   hierarchy (for example, ``device``, ``host``, ``rack``, ``row``, ``room``),
   and rules for traversing the hierarchy when storing data. To view a CRUSH
   map, run ``ceph osd getcrushmap -o {filename}`` and then decompile it by
   running ``crushtool -d {comp-crushmap-filename} -o
   {decomp-crushmap-filename}``. Use a text editor or ``cat`` to view the
   decompiled map.

#. **The MDS Map:** Contains the current MDS map epoch, when the map was
   created, and the last time it changed. It also contains the pool for
   storing metadata, a list of metadata servers, and which metadata servers
   are ``up`` and ``in``. To view an MDS map, run ``ceph fs dump``.

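The commands named in the list above can be gathered in a small helper. This is
a sketch that assumes the ``ceph`` and ``crushtool`` command-line tools are
installed and can reach a running cluster. The names ``VIEW_COMMANDS``,
``crush_commands``, and ``view_map`` are hypothetical, and the list above names
no command for the PG map, so none is included here.

```python
import subprocess

# Commands taken from the list above, as argument vectors.
VIEW_COMMANDS = {
    "monitor": ["ceph", "mon", "dump"],
    "osd": ["ceph", "osd", "dump"],
    "mds": ["ceph", "fs", "dump"],
}


def crush_commands(compiled, decompiled):
    """Return the two commands that fetch and then decompile the CRUSH map."""
    return [
        ["ceph", "osd", "getcrushmap", "-o", compiled],
        ["crushtool", "-d", compiled, "-o", decompiled],
    ]


def view_map(name):
    """Run the command for one map and return its text output.

    Assumes the ceph CLI is installed and can reach a running cluster.
    """
    result = subprocess.run(
        VIEW_COMMANDS[name], check=True, capture_output=True, text=True
    )
    return result.stdout
```

On a host with cluster access, ``view_map("monitor")`` would return the same
text that ``ceph mon dump`` prints.
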
Each map maintains a history of changes to its operating state. Ceph Monitors
maintain a master copy of the cluster map. This master copy includes the
cluster members, the state of the cluster, changes to the cluster, and
information recording the overall health of the Ceph Storage Cluster.

.. index:: high availability; monitor architecture

High Availability Monitors
~~~~~~~~~~~~~~~~~~~~~~~~~~

A Ceph Client must contact a Ceph Monitor and obtain a current copy of the
cluster map in order to read data from or to write data to the Ceph cluster.

It is possible for a Ceph cluster to function properly with only a single
monitor, but a Ceph cluster that has only a single monitor has a single point
of failure: if the monitor goes down, Ceph clients will be unable to read data
from or write data to the cluster.

Ceph leverages a cluster of monitors in order to increase reliability and fault
tolerance. When a cluster of monitors is used, however, one or more of the
monitors in the cluster can fall behind due to latency or other faults. Ceph
mitigates these negative effects by requiring multiple monitor instances to
agree about the state of the cluster. To establish consensus among the monitors
regarding the state of the cluster, Ceph uses the `Paxos`_ algorithm and a
majority of monitors (for example, one in a cluster that contains only one
monitor, two in a cluster that contains three monitors, three in a cluster that
contains five monitors, four in a cluster that contains six monitors, and so
on).

See the `Monitor Config Reference`_ for more detail on configuring monitors.

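The majority counts quoted above follow from simple integer arithmetic. The
sketch below reproduces them. The function names ``quorum_size`` and
``tolerated_failures`` are hypothetical, and the code covers only the
arithmetic, not Paxos itself.

```python
def quorum_size(monitor_count):
    """Smallest number of monitors that forms a strict majority."""
    if monitor_count < 1:
        raise ValueError("a cluster needs at least one monitor")
    return monitor_count // 2 + 1


def tolerated_failures(monitor_count):
    """How many monitors can fail while a majority still remains."""
    return monitor_count - quorum_size(monitor_count)


# The cluster sizes used as examples in the text.
sizes = {n: quorum_size(n) for n in (1, 3, 5, 6)}
```

By this arithmetic a six-monitor cluster tolerates two failures, the same as a
five-monitor cluster, so adding a sixth monitor does not by itself improve
fault tolerance.
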
.. index:: architecture; high availability authentication

.. _arch_high_availability_authentication:

High Availability Authentication
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``cephx`` authentication system is used by Ceph to authenticate users and
daemons and to protect against man-in-the-middle attacks.

.. note:: The ``cephx`` protocol does not address data encryption in transport
   (for example, SSL/TLS) or encryption at rest.

``cephx`` uses shared secret keys for authentication. This means that both the
client and the monitor cluster keep a copy of the client's secret key.

The ``cephx`` protocol makes it possible for each party to prove to the other
that it has a copy of the key without revealing it. This provides mutual
authentication: the cluster can confirm that the user has the secret key, and
the user can be confident that the cluster has a copy of the secret key.

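Proving possession of a shared key without sending it can be illustrated with
a keyed-hash challenge and response. The sketch below is a generic
HMAC-based exchange and is **not** the ``cephx`` wire protocol. The names
``prove`` and ``verify`` are hypothetical, and both parties are simulated in
one process.

```python
import hashlib
import hmac
import secrets


def prove(secret_key, challenge):
    """Answer a challenge with a keyed digest; the key itself is never sent."""
    return hmac.new(secret_key, challenge, hashlib.sha256).digest()


def verify(secret_key, challenge, response):
    """Check a response using a constant-time comparison."""
    return hmac.compare_digest(prove(secret_key, challenge), response)


shared_key = secrets.token_bytes(32)  # held by both client and monitor

# The monitor challenges the client...
monitor_challenge = secrets.token_bytes(16)
client_ok = verify(shared_key, monitor_challenge,
                   prove(shared_key, monitor_challenge))

# ...and the client challenges the monitor, which makes the check mutual.
client_challenge = secrets.token_bytes(16)
monitor_ok = verify(shared_key, client_challenge,
                    prove(shared_key, client_challenge))

# A party holding a different key cannot produce a valid response.
impostor_ok = verify(shared_key, monitor_challenge,
                     prove(secrets.token_bytes(32), monitor_challenge))
```

Only the random challenges and the digests cross the wire in such an exchange,
so an eavesdropper does not learn the key.
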
As stated in :ref:`Scalability and High Availability
<arch_scalability_and_high_availability>`, Ceph does not have any centralized
interface between clients and the Ceph object store. By avoiding such a
centralized interface, Ceph avoids the bottlenecks that attend such centralized
interfaces. However, this means that clients must interact directly with OSDs.
Direct interactions between Ceph clients and OSDs require authenticated
connections. The ``cephx`` authentication system establishes and sustains these
authenticated connections.

The ``cephx`` protocol operates in a manner similar to `Kerberos`_.

A user invokes a Ceph client to contact a monitor. Unlike Kerberos, each
monitor can authenticate users and distribute keys, which means that there is
no single point of failure and no bottleneck when using ``cephx``. The monitor
returns an authentication data structure that is similar to a Kerberos ticket.
This authentication data structure contains a session key for use in obtaining
Ceph services. The session key is itself encrypted with the user's permanent
secret key, which means that only the user can request services from the Ceph
Monitors. The client then uses the session key to request services from the
monitors, and the monitors provide the client with a ticket that authenticates
the client against the OSDs that actually handle data. Ceph Monitors and OSDs
share a secret, which means that the clients can use the ticket provided by the
monitors to authenticate against any OSD or metadata server in the cluster.

Like Kerberos tickets, ``cephx`` tickets expire. An attacker cannot use an
expired ticket or session key that has been obtained surreptitiously. This form
of authentication prevents attackers who have access to the communications
medium from creating bogus messages under another user's identity and prevents
attackers from altering another user's legitimate messages, as long as the
user's secret key is not divulged before it expires.

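The effect of ticket expiry and of the secret shared by monitors and OSDs can
be sketched with a signed, time-limited token. This is an illustration under
stated assumptions, not the ``cephx`` ticket format: the functions
``issue_ticket`` and ``accept_ticket``, the JSON body, the user name, and the
placeholder secret are all hypothetical.

```python
import hashlib
import hmac
import json
import time


def issue_ticket(shared_secret, user, lifetime_seconds, now=None):
    """Monitor side: sign a ticket naming the user and an expiry time."""
    issued = time.time() if now is None else now
    body = json.dumps(
        {"user": user, "expires": issued + lifetime_seconds},
        sort_keys=True,
    ).encode()
    tag = hmac.new(shared_secret, body, hashlib.sha256).hexdigest()
    return body, tag


def accept_ticket(shared_secret, body, tag, now=None):
    """OSD side: accept only an unmodified, unexpired ticket."""
    expected = hmac.new(shared_secret, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return False  # altered or forged
    current = time.time() if now is None else now
    return current < json.loads(body)["expires"]


secret = b"monitor-and-osd-shared-secret"  # placeholder value
body, tag = issue_ticket(secret, "client.alice", lifetime_seconds=60, now=1000)
fresh = accept_ticket(secret, body, tag, now=1030)
expired = accept_ticket(secret, body, tag, now=1061)
tampered = accept_ticket(secret, body.replace(b"alice", b"admin"), tag, now=1030)
```

In this sketch the verifier needs only the shared secret and a clock; it does
not have to contact the issuer to reject a stale or altered ticket.
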
An administrator must set up users before using ``cephx``. In the following
diagram, the ``client.admin`` user invokes ``ceph auth get-or-create-key`` from
the command line to generate a username and secret key. Ceph's ``auth``
subsystem generates the username and key, stores a copy on the monitor(s), and
transmits the user's secret back to the ``client.admin`` user. This means that
the client and the monitor share a secret key.

.. note:: The ``client.admin`` user must provide the user ID and
   secret key to the user in a secure manner.


.. ditaa::

           +---------+     +---------+
           | Client  |     | Monitor |
           +---------+     +---------+
                |  request to   |
                | create a user |
                |-------------->|----------+ create user
                |               |          | and
                |<--------------|<---------+ store key
                | transmit key  |
                |               |

Here is how a client authenticates with a monitor. The client passes the user
name to the monitor. The monitor generates a session key and encrypts it with
the secret key associated with the ``username``. The monitor transmits the
encrypted session key to the client. The client uses the shared secret key to
decrypt the payload. The session key identifies the user, and this act of
identification lasts for the duration of the session. The client then requests
a ticket for the user, signing the request with the session key. The monitor
generates a ticket and uses the user's secret key to encrypt it. The encrypted
ticket is transmitted to the client. The client decrypts the ticket and uses it
to sign requests to OSDs and to metadata servers in the cluster.

.. ditaa::

           +---------+     +---------+
           | Client  |     | Monitor |
           +---------+     +---------+
                |  authenticate |
                |-------------->|----------+ generate and
                |               |          | encrypt
                |<--------------|<---------+ session key
                | transmit      |
                | encrypted     |
                | session key   |
                |               |
                |-----+ decrypt |
                |     | session |
                |<----+ key     |
                |               |
                |  req. ticket  |
                |-------------->|----------+ generate and
                |               |          | encrypt
                |<--------------|<---------+ ticket
                | recv. ticket  |
                |               |
                |-----+ decrypt |
                |     | ticket  |
                |<----+         |
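
The exchange above can be sketched in a few lines of Python. This is a toy
model only: ``ToyMonitor``, ``toy_cipher``, and the user name ``client.alice``
are hypothetical names invented for this illustration, and the XOR keystream
stands in for real encryption. It is not the cipher or wire format that
``cephx`` uses, and it omits the signing of the ticket request.

```python
import hashlib
import secrets


def toy_cipher(key, nonce, data):
    """XOR data with a SHA-256-derived keystream (toy stand-in for a real cipher).

    Applying it twice with the same key and nonce returns the original data.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyMonitor:
    """Stores each user's secret key and issues encrypted session keys and tickets."""

    def __init__(self):
        self.user_secrets = {}   # username -> secret key shared with that user
        self.session_keys = {}   # username -> current session key

    def create_user(self, username):
        # Generate the secret, keep a copy, and hand it back to the caller.
        secret = secrets.token_bytes(32)
        self.user_secrets[username] = secret
        return secret

    def authenticate(self, username):
        # Generate a session key and encrypt it with the user's secret key.
        session_key = secrets.token_bytes(32)
        self.session_keys[username] = session_key
        nonce = secrets.token_bytes(16)
        return nonce, toy_cipher(self.user_secrets[username], nonce, session_key)

    def issue_ticket(self, username):
        # Generate a ticket and encrypt it with the user's secret key.
        ticket = b"ticket:" + username.encode() + b":" + secrets.token_bytes(16)
        nonce = secrets.token_bytes(16)
        return nonce, toy_cipher(self.user_secrets[username], nonce, ticket)


monitor = ToyMonitor()
secret = monitor.create_user("client.alice")

# Authenticate: only a holder of the shared secret can recover the session key.
nonce, encrypted_key = monitor.authenticate("client.alice")
session_key = toy_cipher(secret, nonce, encrypted_key)

# Request a ticket and decrypt it with the same shared secret.
nonce, encrypted_ticket = monitor.issue_ticket("client.alice")
ticket = toy_cipher(secret, nonce, encrypted_ticket)
```

The point of the sketch is that the secret key itself never crosses the
network after user creation: the monitor proves nothing to a client that
cannot decrypt what it receives.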

The ``cephx`` protocol authenticates ongoing communications between the clients
and Ceph daemons. After initial authentication, each message sent between a
client and a daemon is signed using a ticket that can be verified by monitors,
OSDs, and metadata daemons. This ticket is verified by using the secret shared
between the client and the daemon.

.. ditaa::

           +---------+     +---------+     +-------+     +-------+
           |  Client |     | Monitor |     |  MDS  |     |  OSD  |
           +---------+     +---------+     +-------+     +-------+
                |  request to   |              |             |
                | create a user |              |             |
                |-------------->| mon and      |             |
                |<--------------| client share |             |
                |    receive    | a secret.    |             |
                | shared secret |              |             |
                |               |<------------>|             |
                |               |<-------------+------------>|
                |               | mon, mds,    |             |
                | authenticate  | and osd      |             |
                |-------------->| share        |             |
                |<--------------| a secret     |             |
                |  session key  |              |             |
                |               |              |             |
                |  req. ticket  |              |             |
                |-------------->|              |             |
                |<--------------|              |             |
                | recv. ticket  |              |             |
                |               |              |             |
                |   make request (CephFS only) |             |
                |----------------------------->|             |
                |<-----------------------------|             |
                | receive response (CephFS only)             |
                |                                            |
                |                make request                |
                |------------------------------------------->|
                |<-------------------------------------------|
                               receive response
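
The idea of signing each message with a shared secret can be illustrated with
a generic message authentication code. The sketch below uses HMAC-SHA256 from
the Python standard library purely as an assumption for illustration; the
function names ``sign_message`` and ``verify_message`` are hypothetical, and
the actual ``cephx`` signature scheme may differ.

```python
import hashlib
import hmac


def sign_message(shared_key, payload):
    """Return a signature that only a holder of shared_key can produce."""
    return hmac.new(shared_key, payload, hashlib.sha256).digest()


def verify_message(shared_key, payload, signature):
    """Recompute the signature and compare it in constant time."""
    expected = sign_message(shared_key, payload)
    return hmac.compare_digest(expected, signature)


# The client signs a request; the daemon verifies it with the same key.
shared_key = b"example-shared-secret"
request = b"read object foo"
signature = sign_message(shared_key, request)
```

A daemon that holds the same key accepts the request only if
``verify_message`` returns ``True``; a tampered payload or a signature made
with a different key is rejected.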

This authentication protects only the connections between Ceph clients and Ceph
daemons. The authentication is not extended beyond the Ceph client. If a user
accesses the Ceph client from a remote host, cephx authentication will not be
applied to the connection between the user's host and the client host.

See `Cephx Config Guide`_ for more on configuration details.

See `User Management`_ for more on user management.

See :ref:`A Detailed Description of the Cephx Authentication Protocol
<cephx_2012_peter>` for more on the distinction between authorization and
authentication and for a step-by-step explanation of the setup of ``cephx``
tickets and session keys.

.. index:: architecture; smart daemons and scalability

Smart Daemons Enable Hyperscale
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A feature of many storage clusters is a centralized interface that keeps track
of the nodes that clients are permitted to access. Such centralized
architectures provide services to clients by means of a double dispatch. At the
petabyte-to-exabyte scale, such double dispatches are a significant
bottleneck.

Ceph obviates this bottleneck: Ceph's OSD Daemons and Ceph clients are
cluster-aware. Like Ceph clients, each Ceph OSD Daemon is aware of other Ceph
OSD Daemons in the cluster. This enables Ceph OSD Daemons to interact directly
with other Ceph OSD Daemons and to interact directly with Ceph Monitors. Being
cluster-aware makes it possible for Ceph clients to interact directly with Ceph
OSD Daemons.

Because Ceph clients, Ceph monitors, and Ceph OSD daemons interact with one
another directly, Ceph OSD daemons can make use of the aggregate CPU and RAM
resources of the nodes in the Ceph cluster. This means that a Ceph cluster can
easily perform tasks that a cluster with a centralized interface would struggle
to perform. The ability of Ceph nodes to make use of the computing power of
the greater cluster provides several benefits:

#. **OSDs Service Clients Directly:** Network devices can support only a
   limited number of concurrent connections. Because Ceph clients contact
   Ceph OSD daemons directly without first connecting to a central interface,
   Ceph enjoys improved performance and increased system capacity relative to
   storage redundancy strategies that include a central interface. Ceph clients
   maintain sessions only when needed, and maintain those sessions with only
   particular Ceph OSD daemons, not with a centralized interface.

#. **OSD Membership and Status:** When Ceph OSD Daemons join a cluster, they
   report their status. At the lowest level, the Ceph OSD Daemon status is
   ``up`` or ``down``: this reflects whether the Ceph OSD daemon is running and
   able to service Ceph Client requests. If a Ceph OSD Daemon is ``down`` and
   ``in`` the Ceph Storage Cluster, this status may indicate the failure of the
   Ceph OSD Daemon. If a Ceph OSD Daemon is not running because it has crashed,
   the Ceph OSD Daemon cannot notify the Ceph Monitor that it is ``down``. The
   OSDs periodically send messages to the Ceph Monitor (in releases prior to
   Luminous, this was done by means of ``MPGStats``, and beginning with the
   Luminous release, this has been done with ``MOSDBeacon``). If the Ceph
   Monitors receive no such message after a configurable period of time,
   then they mark the OSD ``down``. This mechanism is a failsafe, however.
   Normally, Ceph OSD Daemons determine if a neighboring OSD is ``down`` and
   report it to the Ceph Monitors. This contributes to making Ceph Monitors
   lightweight processes. See `Monitoring OSDs`_ and `Heartbeats`_ for
   additional details.

#. **Data Scrubbing:** To maintain data consistency, Ceph OSD Daemons scrub
   RADOS objects. Ceph OSD Daemons compare the metadata of their own local
   objects against the metadata of the replicas of those objects, which are
   stored on other OSDs. Scrubbing occurs on a per-Placement-Group basis, finds
   mismatches in object size and finds metadata mismatches, and is usually
   performed daily. Ceph OSD Daemons perform deeper scrubbing by comparing the
   data in objects, bit-for-bit, against their checksums. Deep scrubbing finds
   bad sectors on drives that are not detectable with light scrubs. See `Data
   Scrubbing`_ for details on configuring scrubbing.

#. **Replication:** Data replication involves a collaboration between Ceph
   Clients and Ceph OSD Daemons. Ceph OSD Daemons use the CRUSH algorithm to
   determine the storage location of object replicas. Ceph clients use the
   CRUSH algorithm to determine the storage location of an object, then the
   object is mapped to a pool and to a placement group, and then the client
   consults the CRUSH map to identify the placement group's primary OSD.
   After identifying the target placement group, the client writes the object
   to the identified placement group's primary OSD. The primary OSD then
   consults its own copy of the CRUSH map to identify secondary and tertiary
   OSDs, replicates the object to the placement groups in those secondary and
   tertiary OSDs, confirms that the object was stored successfully in the
   secondary and tertiary OSDs, and reports to the client that the object
   was stored successfully.

.. ditaa::

             +----------+
             |  Client  |
             |          |
             +----------+
                 *  ^
      Write (1)  |  |  Ack (6)
                 |  |
                 v  *
            +-------------+
            | Primary OSD |
            |             |
            +-------------+
              *  ^   ^  *
    Write (2) |  |   |  |  Write (3)
       +------+  |   |  +------+
       |  +------+   +------+  |
       |  | Ack (4)  Ack (5)|  |
       v  *                 *  v
 +---------------+   +---------------+
 | Secondary OSD |   | Tertiary OSD  |
 |               |   |               |
 +---------------+   +---------------+

By performing this act of data replication, Ceph OSD Daemons relieve Ceph
clients of the burden of replicating data.
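
The write path described above (client to primary OSD, primary to secondary
and tertiary OSDs, acknowledgements back up the chain) can be sketched as
follows. Everything here is hypothetical: ``ToyOSD``, ``acting_set``,
``primary_write``, and ``client_write`` are names invented for this
illustration, and the hash-based placement is a simple stand-in, not the CRUSH
algorithm.

```python
import hashlib


class ToyOSD:
    """An in-memory stand-in for an OSD that stores objects by name."""

    def __init__(self, osd_id):
        self.osd_id = osd_id
        self.store = {}

    def write_replica(self, name, data):
        # Store the replica and acknowledge the write.
        self.store[name] = data
        return True


def acting_set(object_name, osds, pg_count=8, replicas=3):
    """Map an object to a placement group, then to an ordered list of OSDs.

    Hash-based stand-in for CRUSH: the first OSD returned is the primary.
    """
    digest = hashlib.sha256(object_name.encode()).digest()
    pg = int.from_bytes(digest[:4], "big") % pg_count
    start = pg % len(osds)
    return [osds[(start + i) % len(osds)] for i in range(replicas)]


def primary_write(primary, replicas, name, data):
    # The primary stores the object, replicates it, and collects the acks.
    primary.store[name] = data
    acks = [osd.write_replica(name, data) for osd in replicas]
    return all(acks)


def client_write(object_name, data, osds):
    # The client computes the placement itself and talks only to the primary.
    primary, *replicas = acting_set(object_name, osds)
    return primary_write(primary, replicas, object_name, data)


cluster = [ToyOSD(i) for i in range(5)]
acked = client_write("my-object", b"payload", cluster)
holders = [osd.osd_id for osd in cluster if "my-object" in osd.store]
```

In this model the client issues a single write and receives a single
acknowledgement, while ``holders`` ends up listing three distinct OSDs.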

Dynamic Cluster Management
--------------------------
											2013-05-15 00:05:43 +00:00
+								In the `Scalability and High Availability`_ section, we explained how Ceph uses
-												doc/architecture: edit several sections

Edit the following sections in doc/architecture.rst:

 1. Dynamic Cluster Management
 2. About Pools
 3. Mapping PGs to OSDs

The tone of "Dynamic Cluster Management" remains a bit too close to the
tone of marketing material, in my opinion, but I will return to firm it
up when I have finished a once-over of architecture.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2023-09-30 04:58:41 +00:00
+								CRUSH, cluster topology, and intelligent daemons to scale and maintain high
-												doc: Updated architecture document.

fixes: #2968

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2013-05-15 00:05:43 +00:00
+								availability. Key to Ceph's design is the autonomous, self-healing, and
 								intelligent Ceph OSD Daemon. Let's take a deeper look at how CRUSH works to
-												doc/architecture: edit several sections

Edit the following sections in doc/architecture.rst:

 1. Dynamic Cluster Management
 2. About Pools
 3. Mapping PGs to OSDs

The tone of "Dynamic Cluster Management" remains a bit too close to the
tone of marketing material, in my opinion, but I will return to firm it
up when I have finished a once-over of architecture.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>

											
										
										
											2023-09-30 04:58:41 +00:00
+								enable modern cloud storage infrastructures to place data, rebalance the
 								cluster, and adaptively place and balance data and recover from faults.

.. index:: architecture; pools

About Pools
~~~~~~~~~~~

The Ceph storage system supports the notion of 'Pools', which are logical
partitions for storing objects.

Ceph Clients retrieve a `Cluster Map`_ from a Ceph Monitor, and write RADOS
objects to pools. The way that Ceph places the data in the pools is determined
by the pool's ``size`` or number of replicas, the CRUSH rule, and the number of
placement groups in the pool.

.. ditaa::

            +--------+  Retrieves  +---------------+
            | Client |------------>|  Cluster Map  |
            +--------+             +---------------+
                 |
                 v      Writes
              /-----\
              | obj |
              \-----/
                 |      To
                 v
            +--------+           +---------------+
            |  Pool  |---------->|  CRUSH Rule   |
            +--------+  Selects  +---------------+

Pools set at least the following parameters:

- Ownership/Access to Objects
- The Number of Placement Groups, and
- The CRUSH Rule to Use.

See `Set Pool Values`_ for details.

.. index:: architecture; placement group mapping

Mapping PGs to OSDs
~~~~~~~~~~~~~~~~~~~

Each pool has a number of placement groups (PGs) within it. CRUSH dynamically
maps PGs to OSDs. When a Ceph Client stores objects, CRUSH maps each RADOS
object to a PG.

This mapping of RADOS objects to PGs implements an abstraction and indirection
layer between Ceph OSD Daemons and Ceph Clients. The Ceph Storage Cluster must
be able to grow (or shrink) and redistribute data adaptively when the internal
topology changes.

If the Ceph Client "knew" which Ceph OSD Daemons were storing which objects, a
tight coupling would exist between the Ceph Client and the Ceph OSD Daemon.
But Ceph avoids any such tight coupling. Instead, the CRUSH algorithm maps each
RADOS object to a placement group and then maps each placement group to one or
more Ceph OSD Daemons. This "layer of indirection" allows Ceph to rebalance
dynamically when new Ceph OSD Daemons and their underlying OSD devices come
online. The following diagram shows how the CRUSH algorithm maps objects to
placement groups, and how it maps placement groups to OSDs.

.. ditaa::

           /-----\  /-----\  /-----\  /-----\  /-----\
           | obj |  | obj |  | obj |  | obj |  | obj |
           \-----/  \-----/  \-----/  \-----/  \-----/
              |        |        |        |        |
              +--------+--------+        +---+----+
              |                              |
              v                              v
   +-----------------------+      +-----------------------+
   |  Placement Group #1   |      |  Placement Group #2   |
   |                       |      |                       |
   +-----------------------+      +-----------------------+
               |                              |
               |      +-----------------------+---+
        +------+------+-------------+             |
        |             |             |             |
        v             v             v             v
   /----------\  /----------\  /----------\  /----------\
   |          |  |          |  |          |  |          |
   |  OSD #1  |  |  OSD #2  |  |  OSD #3  |  |  OSD #4  |
   |          |  |          |  |          |  |          |
   \----------/  \----------/  \----------/  \----------/

The client uses its copy of the cluster map and the CRUSH algorithm to compute
precisely which OSD it will use when reading or writing a particular object.
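
The two-stage mapping described in this section (object to PG, then PG to
OSDs) can be sketched in a few lines of Python. This is a simplified
illustration under stated assumptions, not Ceph's implementation:
``zlib.crc32`` stands in for Ceph's object hash, and rendezvous
(highest-random-weight) hashing stands in for CRUSH, which additionally takes
the cluster hierarchy, device weights, and CRUSH rules into account. The
function names ``object_to_pg``, ``pg_to_osds``, and ``locate`` are
hypothetical.

```python
import hashlib
import zlib
from typing import List, Tuple


def object_to_pg(object_name: str, pg_num: int) -> int:
    """Stage 1: map an object name to a PG (stand-in hash, plain modulo)."""
    return zlib.crc32(object_name.encode("utf-8")) % pg_num


def pg_to_osds(pg: int, osds: List[int], size: int) -> List[int]:
    """Stage 2: map a PG to `size` distinct OSDs.

    Rendezvous hashing is an assumption used here in place of CRUSH: every
    OSD gets a pseudo-random score for this PG, and the highest-scoring OSDs
    are selected. The first OSD in the result plays the role of the Primary.
    """
    def score(osd: int) -> int:
        digest = hashlib.sha256(f"{pg}:{osd}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return sorted(osds, key=score, reverse=True)[:size]


def locate(object_name: str, pg_num: int, osds: List[int],
           size: int) -> Tuple[int, List[int]]:
    """Compute (PG, OSD list) for an object without asking any server."""
    pg = object_to_pg(object_name, pg_num)
    return pg, pg_to_osds(pg, osds, size)


if __name__ == "__main__":
    # Count how many PGs change their OSD set when a fifth OSD is added.
    before = {pg: pg_to_osds(pg, [1, 2, 3, 4], 2) for pg in range(64)}
    after = {pg: pg_to_osds(pg, [1, 2, 3, 4, 5], 2) for pg in range(64)}
    moved = sum(before[pg] != after[pg] for pg in range(64))
    print(f"{moved} of 64 PGs changed their OSD set after adding OSD 5")
```

Because clients only ever compute "object to PG" and "PG to OSDs", adding an
OSD changes the second stage for some PGs while the first stage stays
untouched. In this sketch, every PG either keeps its OSD list or gains the new
OSD.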

.. index:: architecture; calculating PG IDs

Calculating PG IDs
~~~~~~~~~~~~~~~~~~

When a Ceph Client binds to a Ceph Monitor, it retrieves the latest version of
the `Cluster Map`_. When a client has been equipped with a copy of the cluster
map, it is aware of all the monitors, OSDs, and metadata servers in the
cluster. **However, even equipped with a copy of the latest version of the
cluster map, the client doesn't know anything about object locations.**

**Object locations must be computed.**

The client requires only the object ID and the name of the pool in order to
compute the object location.

Ceph stores data in named pools (for example, "liverpool"). When a client
stores a named object (for example, "john", "paul", "george", or "ringo") it
calculates a placement group by using the object name, a hash code, the number
of PGs in the pool, and the pool name. Ceph clients use the following steps to
compute PG IDs.

#. The client inputs the pool name and the object ID (for example: pool =
   "liverpool" and object-id = "john").
#. Ceph hashes the object ID.
#. Ceph calculates the hash, modulo the number of PGs (for example: ``58``), to
   get a PG ID.
#. Ceph uses the pool name to retrieve the pool ID (for example: "liverpool" =
   ``4``).
#. Ceph prepends the pool ID to the PG ID (for example: ``4.58``).

It is much faster to compute object locations than to perform an object
location query over a chatty session. The :abbr:`CRUSH (Controlled Replication
Under Scalable Hashing)` algorithm allows a client to compute where objects are
expected to be stored, and enables the client to contact the primary OSD to
store or retrieve the objects.
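
The steps above can be sketched as follows. This is a minimal illustration,
not Ceph's code: ``zlib.crc32`` is an assumed stand-in for Ceph's own object
hash, a plain modulo is used where Ceph uses its own variant, and the
``POOLS`` table (including the PG count of 128) is hypothetical data that a
real client would take from the cluster map.

```python
import zlib

# Hypothetical pool table: pool name -> (pool ID, number of PGs).
# A real client obtains this information from the cluster map.
POOLS = {"liverpool": (4, 128)}


def compute_pg_id(pool_name: str, object_id: str) -> str:
    """Return a '<pool ID>.<PG>' string for the given pool and object ID."""
    # Hash the object ID (stand-in hash, not the one Ceph uses).
    object_hash = zlib.crc32(object_id.encode("utf-8"))
    # Use the pool name to look up the pool ID and the number of PGs.
    pool_id, pg_num = POOLS[pool_name]
    # Take the hash modulo the number of PGs to get the PG within the pool.
    pg = object_hash % pg_num
    # Prepend the pool ID to the PG.
    return f"{pool_id}.{pg}"


if __name__ == "__main__":
    for name in ("john", "paul", "george", "ringo"):
        print(name, "->", compute_pg_id("liverpool", name))
```

The function needs nothing but the pool name, the object ID, and data that is
already in the client's copy of the cluster map, which is why no lookup
service has to be contacted to find an object.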

.. index:: architecture; PG Peering

Peering and Sets
~~~~~~~~~~~~~~~~

In previous sections, we noted that Ceph OSD Daemons check each other's
heartbeats and report back to Ceph Monitors. Ceph OSD Daemons also 'peer',
which is the process of bringing all of the OSDs that store a Placement Group
(PG) into agreement about the state of all of the RADOS objects (and their
metadata) in that PG. Ceph OSD Daemons `Report Peering Failure`_ to the Ceph
Monitors. Peering issues usually resolve themselves; however, if the problem
persists, you may need to refer to the `Troubleshooting Peering Failure`_
section.

.. Note:: PGs that agree on the state of the cluster do not necessarily have
   the current data yet.

The Ceph Storage Cluster was designed to store at least two copies of an object
(that is, ``size = 2``), which is the minimum requirement for data safety. For
high availability, a Ceph Storage Cluster should store more than two copies of
an object (that is, ``size = 3`` and ``min size = 2``) so that it can continue
to run in a ``degraded`` state while maintaining data safety.

.. warning:: Although we say here that R2 (replication with two copies) is the
   minimum requirement for data safety, R3 (replication with three copies) is
   recommended. On a long enough timeline, data stored with an R2 strategy will
   be lost.
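
The relationship between ``size`` and ``min size`` can be illustrated with a
small sketch. The rule assumed here, that a PG keeps serving I/O as long as at
least ``min size`` replicas are available, is a simplification of Ceph's PG
state handling. The function name ``pg_availability`` and the three labels it
returns are hypothetical and are not Ceph state names.

```python
def pg_availability(size: int, min_size: int, replicas_available: int) -> str:
    """Classify a PG by how many of its `size` replicas are available."""
    if not 1 <= min_size <= size:
        raise ValueError("expected 1 <= min_size <= size")
    if not 0 <= replicas_available <= size:
        raise ValueError("expected 0 <= replicas_available <= size")
    if replicas_available == size:
        return "fully replicated"   # every copy is present
    if replicas_available >= min_size:
        return "degraded"           # still serving I/O with fewer copies
    return "blocked"                # too few copies to accept I/O safely


if __name__ == "__main__":
    for available in (3, 2, 1):
        print(available, pg_availability(size=3, min_size=2,
                                         replicas_available=available))
```

With ``size = 3`` and ``min size = 2``, losing one replica leaves the PG
running in a degraded state, while losing two replicas stops I/O in this model
until a replica returns.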

As explained in the diagram in `Smart Daemons Enable Hyperscale`_, we do not
name the Ceph OSD Daemons specifically (for example, ``osd.0``, ``osd.1``,
etc.), but rather refer to them as *Primary*, *Secondary*, and so forth. By
convention, the *Primary* is the first OSD in the *Acting Set*, and is
responsible for orchestrating the peering process for each placement group
where it acts as the *Primary*. The *Primary* is the **ONLY** OSD in a given
placement group that accepts client-initiated writes to objects.

The set of OSDs that is responsible for a placement group is called the
*Acting Set*. The term "*Acting Set*" can refer either to the Ceph OSD Daemons
that are currently responsible for the placement group, or to the Ceph OSD
Daemons that were responsible for a particular placement group as of some
epoch.

The Ceph OSD Daemons that are part of an *Acting Set* might not always be
``up``. When an OSD in the *Acting Set* is ``up``, it is part of the *Up Set*.
The *Up Set* is an important distinction, because Ceph can remap PGs to other
Ceph OSD Daemons when an OSD fails.

.. note:: Consider a hypothetical *Acting Set* for a PG that contains
   ``osd.25``, ``osd.32`` and ``osd.61``. The first OSD (``osd.25``) is the
   *Primary*. If that OSD fails, the Secondary (``osd.32``) becomes the
   *Primary*, and ``osd.25`` is removed from the *Up Set*.
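
The example in the note can be replayed with a short sketch. It only mirrors
the ordering rule described in this section (the first ``up`` OSD acts as the
*Primary*). In a real cluster the sets are produced by CRUSH and the peering
process, and Ceph may also remap the PG to other OSDs. The helper names
``up_set`` and ``primary`` are hypothetical.

```python
from typing import Dict, List, Optional


def up_set(acting_set: List[int], is_up: Dict[int, bool]) -> List[int]:
    """Return the OSDs of the acting set that are currently up, in order."""
    return [osd for osd in acting_set if is_up.get(osd, False)]


def primary(acting_set: List[int], is_up: Dict[int, bool]) -> Optional[int]:
    """Return the first up OSD, or None if no OSD in the set is up."""
    candidates = up_set(acting_set, is_up)
    return candidates[0] if candidates else None


if __name__ == "__main__":
    acting = [25, 32, 61]
    state = {25: True, 32: True, 61: True}
    print(up_set(acting, state), primary(acting, state))

    state[25] = False  # osd.25 fails
    print(up_set(acting, state), primary(acting, state))
```

After ``osd.25`` is marked down, the sketch reports ``osd.32`` as the
*Primary* and no longer lists ``osd.25`` in the *Up Set*, matching the note.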

.. index:: architecture; Rebalancing

Rebalancing
~~~~~~~~~~~

When you add a Ceph OSD Daemon to a Ceph Storage Cluster, the cluster map gets
updated with the new OSD. Referring back to `Calculating PG IDs`_, this changes
the cluster map. Consequently, it changes object placement, because it changes
an input for the calculations. The following diagram depicts the rebalancing
process (albeit rather crudely, since it is substantially less impactful with
large clusters) where some, but not all, of the PGs migrate from existing OSDs
(OSD 1 and OSD 2) to the new OSD (OSD 3). Even when rebalancing, CRUSH is
stable. Many of the placement groups remain in their original configuration,
and each OSD gets some added capacity, so there are no load spikes on the
new OSD after rebalancing is complete.
.. ditaa::

           +--------+     +--------+
   Before  |  OSD 1 |     |  OSD 2 |
           +--------+     +--------+
           |  PG #1 |     | PG #6  |
           |  PG #2 |     | PG #7  |
           |  PG #3 |     | PG #8  |
           |  PG #4 |     | PG #9  |
           |  PG #5 |     | PG #10 |
           +--------+     +--------+

           +--------+     +--------+     +--------+
    After  |  OSD 1 |     |  OSD 2 |     |  OSD 3 |
           +--------+     +--------+     +--------+
           |  PG #1 |     | PG #7  |     |  PG #3 |
           |  PG #2 |     | PG #8  |     |  PG #6 |
           |  PG #4 |     | PG #10 |     |  PG #9 |
           |  PG #5 |     |        |     |        |
           |        |     |        |     |        |
           +--------+     +--------+     +--------+

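
The stability property described in this section can be illustrated with a toy
model. The sketch below is **not** CRUSH: as an assumption made purely for
illustration, it substitutes rendezvous (highest-random-weight) hashing, and
the names ``score``, ``place`` and ``moved_pgs`` are hypothetical. It shows
only that adding an OSD moves a fraction of the PGs, and that every PG that
moves lands on the new OSD.

.. code-block:: python

   import hashlib


   def score(pg_id: int, osd: str) -> int:
       """Deterministic pseudo-random score for a (PG, OSD) pair."""
       digest = hashlib.sha256(f"{pg_id}:{osd}".encode()).digest()
       return int.from_bytes(digest[:8], "big")


   def place(pg_id: int, osds: list[str]) -> str:
       """Map a PG to the OSD with the highest score (rendezvous hashing)."""
       return max(osds, key=lambda osd: score(pg_id, osd))


   def moved_pgs(pg_count: int, before: list[str],
                 after: list[str]) -> dict[int, tuple[str, str]]:
       """Return {pg_id: (old_osd, new_osd)} for each PG whose placement changed."""
       moves = {}
       for pg_id in range(pg_count):
           old, new = place(pg_id, before), place(pg_id, after)
           if old != new:
               moves[pg_id] = (old, new)
       return moves


   if __name__ == "__main__":
       moves = moved_pgs(1024, ["osd.1", "osd.2"], ["osd.1", "osd.2", "osd.3"])
       print(f"{len(moves)} of 1024 PGs moved, all of them to osd.3")

In this toy model a PG changes placement only when the new OSD scores highest
for it, so the existing OSDs never exchange PGs with each other.
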
.. index:: architecture; Data Scrubbing

Data Consistency
~~~~~~~~~~~~~~~~

As part of maintaining data consistency and cleanliness, Ceph OSDs also scrub
objects within placement groups. That is, Ceph OSDs compare object metadata in
one placement group with its replicas in placement groups stored in other
OSDs. Scrubbing (usually performed daily) catches OSD bugs or filesystem
errors, which are often the result of hardware issues. OSDs also perform deeper
scrubbing by comparing data in objects bit-for-bit. Deep scrubbing (by default
performed weekly) finds bad blocks on a drive that weren't apparent in a light
scrub.

See `Data Scrubbing`_ for details on configuring scrubbing.

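
The difference between a light scrub and a deep scrub can be sketched as
follows. This is a simplified model, not the OSD implementation: the
``Replica`` class is hypothetical, object size stands in for "object metadata",
and a SHA-256 digest stands in for the bit-for-bit comparison.

.. code-block:: python

   import hashlib
   from dataclasses import dataclass


   @dataclass(frozen=True)
   class Replica:
       """Hypothetical stand-in for one OSD's copy of an object."""
       osd: str
       data: bytes

       @property
       def size(self) -> int:
           return len(self.data)


   def light_scrub(replicas: list[Replica]) -> bool:
       """Compare metadata only (here: object size). True means consistent."""
       return len({r.size for r in replicas}) == 1


   def deep_scrub(replicas: list[Replica]) -> bool:
       """Compare object contents via a digest. True means consistent."""
       return len({hashlib.sha256(r.data).digest() for r in replicas}) == 1


   if __name__ == "__main__":
       replicas = [
           Replica("osd.1", b"ABCDEFGHI"),
           Replica("osd.2", b"ABCDEFGHI"),
           Replica("osd.3", b"ABCDEFGHX"),  # same size, one corrupted byte
       ]
       print("light scrub consistent:", light_scrub(replicas))
       print("deep scrub consistent:", deep_scrub(replicas))

A replica with a corrupted byte but an unchanged size passes the light check
and fails the deep check, which is why the more expensive deep scrub is still
needed.
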
.. index:: erasure coding

Erasure Coding
--------------

An erasure coded pool stores each object as ``K+M`` chunks: the object is
divided into ``K`` data chunks and ``M`` coding chunks. The pool is configured
to have a size of ``K+M`` so that each chunk is stored in an OSD in the acting
set. The rank of the chunk is stored as an attribute of the object.

For instance, an erasure coded pool can be created to use five OSDs
(``K+M = 5``) and sustain the loss of two of them (``M = 2``). Data may be
unavailable until ``K+1`` shards are restored.

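
The arithmetic behind the five-OSD example can be written out as a small
helper. The function name ``ec_profile`` and its return keys are hypothetical
and exist only for this illustration. It derives ``K`` from the pool size and
``M``, and reports how much raw capacity is consumed per unit of usable data.

.. code-block:: python

   def ec_profile(pool_size: int, m: int) -> dict[str, float]:
       """Derive K and the raw-to-usable ratio for a pool of size K+M."""
       k = pool_size - m
       if k < 1 or m < 0:
           raise ValueError("need at least one data chunk and M >= 0")
       return {
           "k": k,
           "m": m,
           "tolerated_losses": m,            # any M chunks may be lost
           "raw_per_usable": (k + m) / k,    # storage overhead factor
       }


   if __name__ == "__main__":
       print(ec_profile(pool_size=5, m=2))

For ``K+M = 5`` and ``M = 2`` this gives ``K = 3`` and a raw-to-usable ratio of
5/3, compared with a ratio of 3 for a replicated pool that keeps three copies.
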
Reading and Writing Encoded Chunks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the object **NYAN** containing ``ABCDEFGHI`` is written to the pool, the
erasure encoding function splits the content into three data chunks simply by
dividing the content in three: the first contains ``ABC``, the second ``DEF``
and the last ``GHI``. The content is padded if its length is not a multiple of
``K``. The function also creates two coding chunks: the fourth with ``YXY`` and
the fifth with ``QGC``. Each chunk is stored in an OSD in the acting set. The
chunks are stored in objects that have the same name (**NYAN**) but reside on
different OSDs. The order in which the chunks were created must be preserved
and is stored as an attribute of the object (``shard_t``), in addition to its
name. Chunk 1 contains ``ABC`` and is stored on **OSD5** while chunk 4 contains
``YXY`` and is stored on **OSD3**.

.. ditaa::

                            +-------------------+
                       name |       NYAN        |
                            +-------------------+
                    content |     ABCDEFGHI     |
                            +--------+----------+
                                     |
                                     |
                                     v
                              +------+------+
              +---------------+ encode(3,2) +-----------+
              |               +--+--+---+---+           |
              |                  |  |   |               |
              |          +-------+  |   +-----+         |
              |          |          |         |         |
           +--v---+   +--v---+   +--v---+  +--v---+  +--v---+
     name  | NYAN |   | NYAN |   | NYAN |  | NYAN |  | NYAN |
           +------+   +------+   +------+  +------+  +------+
    shard  |  1   |   |  2   |   |  3   |  |  4   |  |  5   |
           +------+   +------+   +------+  +------+  +------+
  content  | ABC  |   | DEF  |   | GHI  |  | YXY  |  | QGC  |
           +--+---+   +--+---+   +--+---+  +--+---+  +--+---+
              |          |          |         |         |
              |          |          v         |         |
              |          |       +--+---+     |         |
              |          |       | OSD1 |     |         |
              |          |       +------+     |         |
              |          |                    |         |
              |          |       +------+     |         |
              |          +------>| OSD2 |     |         |
              |                  +------+     |         |
              |                               |         |
              |                  +------+     |         |
              |                  | OSD3 |<----+         |
              |                  +------+               |
              |                                         |
              |                  +------+               |
              |                  | OSD4 |<--------------+
              |                  +------+
              |
              |                  +------+
              +----------------->| OSD5 |
                                 +------+

When the object **NYAN** is read from the erasure coded pool, the decoding
function reads three chunks: chunk 1 containing ``ABC``, chunk 3 containing
``GHI`` and chunk 4 containing ``YXY``. It then rebuilds the original content
of the object, ``ABCDEFGHI``. The decoding function is informed that chunks 2
and 5 are missing (they are called 'erasures'). Chunk 5 could not be read
because **OSD4** is out. The decoding function can be called as soon as three
chunks are read: **OSD2** was the slowest and its chunk was not taken into
account.

.. ditaa::

                            +-------------------+
                       name |       NYAN        |
                            +-------------------+
                    content |     ABCDEFGHI     |
                            +---------+---------+
                                      ^
                                      |
                                      |
                              +-------+-------+
                              |  decode(3,2)  |
               +------------->+  erasures 2,5 +<-+
               |              |               |  |
               |              +-------+-------+  |
               |                      ^          |
               |                      |          |
               |                      |          |
            +--+---+   +------+   +---+--+   +---+--+
      name  | NYAN |   | NYAN |   | NYAN |   | NYAN |
            +------+   +------+   +------+   +------+
     shard  |  1   |   |  2   |   |  3   |   |  4   |
            +------+   +------+   +------+   +------+
   content  | ABC  |   | DEF  |   | GHI  |   | YXY  |
            +--+---+   +--+---+   +--+---+   +--+---+
               ^          .          ^          ^
               |    TOO   .          |          |
               |    SLOW  .       +--+---+      |
               |          ^       | OSD1 |      |
               |          |       +------+      |
               |          |                     |
               |          |       +------+      |
               |          +-------| OSD2 |      |
               |                  +------+      |
               |                                |
               |                  +------+      |
               |                  | OSD3 |------+
               |                  +------+
               |
               |                  +------+
               |                  | OSD4 | OUT
               |                  +------+
               |
               |                  +------+
               +------------------| OSD5 |
                                  +------+

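
The split, pad and rebuild steps can be sketched in a few lines of Python.
This is a deliberately reduced model and an assumption of this example, not
Ceph's encoder: it uses a single XOR coding chunk (``M = 1``) instead of the
``encode(3,2)`` code in the diagrams, so it can repair only one erasure and
does not reproduce the ``YXY`` and ``QGC`` chunks. The function names are
hypothetical and ranks are 0-based.

.. code-block:: python

   from functools import reduce


   def split(content: bytes, k: int, pad: bytes = b"\0") -> list[bytes]:
       """Pad content to a multiple of k and cut it into k equal data chunks."""
       size = -(-len(content) // k)  # ceiling division
       padded = content.ljust(size * k, pad)
       return [padded[i * size:(i + 1) * size] for i in range(k)]


   def xor(chunks: list[bytes]) -> bytes:
       """Byte-wise XOR of equally sized chunks."""
       return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


   def encode(content: bytes, k: int) -> list[bytes]:
       """Return k data chunks followed by one XOR coding chunk (M = 1)."""
       data = split(content, k)
       return data + [xor(data)]


   def decode(shards: dict[int, bytes], k: int) -> bytes:
       """Rebuild the padded content from any k of the k+1 shards."""
       missing = [rank for rank in range(k) if rank not in shards]
       if len(shards) < k or len(missing) > 1:
           raise ValueError("not enough shards to decode")
       if missing:
           # XOR of the surviving data chunks and the coding chunk
           # yields the single missing data chunk.
           shards = {**shards, missing[0]: xor(list(shards.values()))}
       return b"".join(shards[rank] for rank in range(k))


   if __name__ == "__main__":
       shards = encode(b"ABCDEFGHI", k=3)
       # Rank 1 (the second chunk, "DEF") is treated as an erasure.
       survivors = {0: shards[0], 2: shards[2], 3: shards[3]}
       print(decode(survivors, k=3))

As in the read example, decoding proceeds as soon as ``K`` chunks are
available; the shard that arrives late or not at all is simply treated as an
erasure.
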
Interrupted Full Writes
~~~~~~~~~~~~~~~~~~~~~~~

In an erasure coded pool, the primary OSD in the up set receives all write
operations. It is responsible for encoding the payload into ``K+M`` chunks and
sending them to the other OSDs. It is also responsible for maintaining an
authoritative version of the placement group logs.

In the following diagram, an erasure coded placement group has been created with
``K = 2, M = 1`` and is supported by three OSDs, two for ``K`` and one for
``M``. The acting set of the placement group is made of **OSD 1**, **OSD 2** and
**OSD 3**. An object has been encoded and stored in the OSDs: the chunk
``D1v1`` (i.e. Data chunk number 1, version 1) is on **OSD 1**, ``D2v1`` on
**OSD 2** and ``C1v1`` (i.e. Coding chunk number 1, version 1) on **OSD 3**. The
placement group logs on each OSD are identical (i.e. ``1,1`` for epoch 1,
version 1).

.. ditaa::

     Primary OSD

   +-------------+
   |    OSD 1    |             +-------------+
   |         log |  Write Full |             |
   |  +----+     |<------------+ Ceph Client |
   |  |D1v1| 1,1 |      v1     |             |
   |  +----+     |             +-------------+
   +------+------+
          |
          |
          |          +-------------+
          |          |    OSD 2    |
          |          |         log |
          +--------->+  +----+     |
          |          |  |D2v1| 1,1 |
          |          |  +----+     |
          |          +-------------+
          |
          |          +-------------+
          |          |    OSD 3    |
          |          |         log |
          +--------->|  +----+     |
                     |  |C1v1| 1,1 |
                     |  +----+     |
                     +-------------+

**OSD 1** is the primary and receives a **WRITE FULL** from a client, which
means the payload is to replace the object entirely instead of overwriting a
portion of it. Version 2 (v2) of the object is created to override version 1
(v1). **OSD 1** encodes the payload into three chunks: ``D1v2`` (i.e. Data
chunk number 1, version 2) will be on **OSD 1**, ``D2v2`` on **OSD 2** and
``C1v2`` (i.e. Coding chunk number 1, version 2) on **OSD 3**. Each chunk is sent
to the target OSD, including the primary OSD, which is responsible for storing
chunks in addition to handling write operations and maintaining an authoritative
version of the placement group logs. When an OSD receives the message
instructing it to write the chunk, it also creates a new entry in the placement
group logs to reflect the change. For instance, as soon as **OSD 3** stores
``C1v2``, it adds the entry ``1,2`` (i.e. epoch 1, version 2) to its logs.
Because the OSDs work asynchronously, some chunks may still be in flight (such
as ``D2v2``) while others are acknowledged and persisted to storage drives
(such as ``C1v2`` and ``D1v2``).

.. ditaa::

     Primary OSD

   +-------------+
   |    OSD 1    |
   |         log |
   |  +----+     |             +-------------+
   |  |D1v2| 1,2 |  Write Full |             |
   |  +----+     +<------------+ Ceph Client |
   |             |      v2     |             |
   |  +----+     |             +-------------+
   |  |D1v1| 1,1 |
   |  +----+     |
   +------+------+
          |
          |
          |           +------+------+
          |           |    OSD 2    |
          |  +------+ |         log |
          +->| D2v2 | |  +----+     |
          |  +------+ |  |D2v1| 1,1 |
          |           |  +----+     |
          |           +-------------+
          |
          |           +-------------+
          |           |    OSD 3    |
          |           |         log |
          |           |  +----+     |
          |           |  |C1v2| 1,2 |
          +---------->+  +----+     |
                      |             |
                      |  +----+     |
                      |  |C1v1| 1,1 |
                      |  +----+     |
                      +-------------+

If all goes well, the chunks are acknowledged on each OSD in the acting set and
the logs' ``last_complete`` pointer can move from ``1,1`` to ``1,2``.

.. ditaa::

     Primary OSD

   +-------------+
   |    OSD 1    |
   |         log |
   |  +----+     |             +-------------+
   |  |D1v2| 1,2 |  Write Full |             |
   |  +----+     +<------------+ Ceph Client |
   |             |      v2     |             |
   |  +----+     |             +-------------+
   |  |D1v1| 1,1 |
   |  +----+     |
   +------+------+
          |
          |           +-------------+
          |           |    OSD 2    |
          |           |         log |
          |           |  +----+     |
          |           |  |D2v2| 1,2 |
          +---------->+  +----+     |
          |           |             |
          |           |  +----+     |
          |           |  |D2v1| 1,1 |
          |           |  +----+     |
          |           +-------------+
          |
          |           +-------------+
          |           |    OSD 3    |
          |           |         log |
          |           |  +----+     |
          |           |  |C1v2| 1,2 |
          +---------->+  +----+     |
                      |             |
                      |  +----+     |
                      |  |C1v1| 1,1 |
                      |  +----+     |
                      +-------------+

 								Finally, the files used to store the chunks of the previous version of the
 								object can be removed: ``D1v1`` on **OSD 1**, ``D2v1`` on **OSD 2** and ``C1v1``
 								on **OSD 3**.
 								.. ditaa::
-												doc: use plantweb as fallback of sphinx-ditaa

RTD does not support installing system packages, the only ways to install
dependencies are setuptools and pip. while ditaa is a tool written in
Java. so we need to find a native python tool allowing us to render ditaa
images. plantweb is able to the web service for rendering the ditaa
diagram. so let's use it as a fallback if "ditaa" is not around.

also start a new line after the directive, otherwise planweb server will
return 500 at seeing the diagram.

Signed-off-by: Kefu Chai <kchai@redhat.com>

											
										
										
											2020-04-09 13:25:39 +00:00
-												doc: Added erasure coding and cache tiering notes. Special thanks to Loic Dachary.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2014-05-06 10:53:03 +00:00
+								     Primary OSD
-												docs: Add information about OpenNebula integration

- Exclude doc build output from git
- Fix missing doc build dependency
- Also includes some involuntary automatically persistent linting by vscode

Co-authored-by: Ilya Dryomov <idryomov@redhat.com>
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Co-authored-by: Zac Dover <zac.dover@proton.me>
Signed-off-by: Daniel Clavijo <dclavijo@opennebula.io>

											
										
										
											2023-12-15 15:54:02 +00:00
-												doc: Added erasure coding and cache tiering notes. Special thanks to Loic Dachary.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2014-05-06 10:53:03 +00:00
+								   +-------------+
-												doc :- fixing image in section ERASURE CODING

Description:- When write is completed successfully and previous chunks of previous version of object is removed - log should show 1,2 and no need of indicating - write operation.
Signed-off-by: Rachana Patel <rachana83.patel@gmail.com>

											
										
										
											2016-01-20 18:38:34 +00:00
+								   |    OSD 1    |
 								   |         log |
 								   |  +----+     |
 								   |  |D1v2| 1,2 |
 								   |  +----+     |
-												doc: Added erasure coding and cache tiering notes. Special thanks to Loic Dachary.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

											
										
										
											2014-05-06 10:53:03 +00:00
+								   +------+------+
 								          |
 								          |
 								          |          +-------------+
 								          |          |    OSD 2    |
 								          |          |         log |
 								          +--------->+  +----+     |
-												doc :- fixing image in section ERASURE CODING

Description:- When write is completed successfully and previous chunks of previous version of object is removed - log should show 1,2 and no need of indicating - write operation.
Signed-off-by: Rachana Patel <rachana83.patel@gmail.com>

											
										
										
											2016-01-20 18:38:34 +00:00
+								          |          |  |D2v2| 1,2 |
 								          |          |  +----+     |
 								          |          +-------------+
 								          |
 								          |          +-------------+
 								          |          |    OSD 3    |
 								          |          |         log |
 								          +--------->|  +----+     |
 								                     |  |C1v2| 1,2 |
 								                     |  +----+     |
 								                     +-------------+
 								But accidents happen. If **OSD 1** goes down while ``D2v2`` is still in flight,
 								the object's version 2 is only partially written: **OSD 3** holds one chunk, but
 								that is not enough to recover. Two chunks (``D1v2`` and ``D2v2``) are lost, and the
 								erasure coding parameters ``K = 2``, ``M = 1`` require at least two chunks to be
 								available to rebuild the third. **OSD 4** becomes the new primary and finds that
 								the ``last_complete`` log entry (i.e., all objects before this entry were known
 								to be available on all OSDs in the previous acting set) is ``1,1``, which
 								becomes the head of the new authoritative log.
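The arithmetic behind that conclusion can be sketched in a few lines of Python. This is only an illustration of the counting argument, not Ceph's peering code: the dictionary layout and the function names ``recoverable_versions`` and ``authoritative_version`` are hypothetical, and the OSD names are taken from the example above.

```python
from collections import defaultdict

K = 2  # number of chunks needed to rebuild an object (K = 2, M = 1 above)

# Chunks that survive on each reachable OSD, keyed by object version.
# OSD 1 is down, so it contributes nothing.
surviving_chunks = {
    "osd.2": {1: ["D2v1"]},
    "osd.3": {1: ["C1v1"], 2: ["C1v2"]},
}


def recoverable_versions(chunks_by_osd, k):
    """Return the sorted versions that still have at least k chunks."""
    count = defaultdict(int)
    for versions in chunks_by_osd.values():
        for version, chunks in versions.items():
            count[version] += len(chunks)
    return sorted(v for v, n in count.items() if n >= k)


def authoritative_version(chunks_by_osd, k):
    """Newest version that can still be rebuilt, or None if none can."""
    versions = recoverable_versions(chunks_by_osd, k)
    return versions[-1] if versions else None


print(authoritative_version(surviving_chunks, K))  # prints 1
```

Version 2 has a single surviving chunk (``C1v2``), so only version 1 can be rebuilt, which matches the ``1,1`` head of the authoritative log.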
 								.. ditaa::
 								   +-------------+
 								   |    OSD 1    |
 								   |   (down)    |
 								   | c333        |
 								   +------+------+
 								          |
 								          |           +-------------+
 								          |           |    OSD 2    |
 								          |           |         log |
 								          |           |  +----+     |
 								          +---------->+  |D2v1| 1,1 |
 								          |           |  +----+     |
 								          |           |             |
 								          |           +-------------+
 								          |
 								          |           +-------------+
 								          |           |    OSD 3    |
 								          |           |         log |
 								          |           |  +----+     |
 								          |           |  |C1v2| 1,2 |
 								          +---------->+  +----+     |
 								                      |             |
 								                      |  +----+     |
 								                      |  |C1v1| 1,1 |
 								                      |  +----+     |
 								                      +-------------+
 								     Primary OSD
 								   +-------------+
 								   |    OSD 4    |
 								   |         log |
 								   |             |
 								   |         1,1 |
 								   |             |
 								   +------+------+
 								The log entry 1,2 found on **OSD 3** is divergent from the new authoritative log
 								provided by **OSD 4**: it is discarded and the file containing the ``C1v2``
 								chunk is removed. The ``D1v1`` chunk is rebuilt with the ``decode`` function of
 								the erasure coding library during scrubbing and stored on the new primary
 								**OSD 4**.
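To make the rebuild step concrete, here is a minimal sketch that uses a single XOR parity chunk as a stand-in for a ``K = 2``, ``M = 1`` code. It is an assumption made for illustration: Ceph delegates encoding and decoding to its erasure code plugins, and the ``encode`` and ``decode`` helpers below are hypothetical, not the library's API.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equally sized chunks byte by byte."""
    if len(a) != len(b):
        raise ValueError("chunks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode(d1: bytes, d2: bytes) -> bytes:
    """Compute the coding chunk C1 from the two data chunks."""
    return xor_bytes(d1, d2)


def decode(chunk_a: bytes, chunk_b: bytes) -> bytes:
    """Rebuild the missing chunk from any two surviving chunks."""
    return xor_bytes(chunk_a, chunk_b)


# Version 1 of the object, as it was written before the failure.
d1v1 = b"ABCD"
d2v1 = b"EFGH"
c1v1 = encode(d1v1, d2v1)

# OSD 1 (holding D1v1) is down: rebuild D1v1 from D2v1 and C1v1.
rebuilt = decode(d2v1, c1v1)
assert rebuilt == d1v1
```

With XOR parity, any one of the three chunks can be recomputed from the other two, which is the property the example relies on.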
 								.. ditaa::
 								     Primary OSD
 								   +-------------+
 								   |    OSD 4    |
 								   |         log |
 								   |  +----+     |
 								   |  |D1v1| 1,1 |
 								   |  +----+     |
 								   +------+------+
 								          ^
 								          |
 								          |          +-------------+
 								          |          |    OSD 2    |
 								          |          |         log |
 								          +----------+  +----+     |
 								          |          |  |D2v1| 1,1 |
 								          |          |  +----+     |
 								          |          +-------------+
 								          |
 								          |          +-------------+
 								          |          |    OSD 3    |
 								          |          |         log |
 								          +----------|  +----+     |
 								                     |  |C1v1| 1,1 |
 								                     |  +----+     |
 								                     +-------------+
 								   +-------------+
 								   |    OSD 1    |
 								   |   (down)    |
 								   | c333        |
 								   +-------------+
 								See `Erasure Code Notes`_ for additional details.
 								Cache Tiering
 								-------------
 								.. note:: Cache tiering is deprecated in Reef.
 								A cache tier provides Ceph Clients with better I/O performance for a subset of
 								the data stored in a backing storage tier. Cache tiering involves creating a
 								pool of relatively fast/expensive storage devices (e.g., solid state drives)
 								configured to act as a cache tier, and a backing pool of either erasure-coded
 								or relatively slower/cheaper devices configured to act as an economical storage
 								tier. The Ceph objecter determines where objects are placed, and the tiering
 								agent determines when to flush objects from the cache to the backing storage
 								tier. As a result, the cache tier and the backing storage tier are completely
 								transparent to Ceph clients.
 								.. ditaa::
 								           +-------------+
 								           | Ceph Client |
 								           +------+------+
 								                  ^
 								     Tiering is   |
 								    Transparent   |              Faster I/O
 								        to Ceph   |           +---------------+
 								     Client Ops   |           |               |
 								                  |    +----->+   Cache Tier  |
 								                  |    |      |               |
 								                  |    |      +-----+---+-----+
 								                  |    |            |   ^
 								                  v    v            |   |   Active Data in Cache Tier
 								           +------+----+--+         |   |
 								           |   Objecter   |         |   |
 								           +-----------+--+         |   |
 								                       ^            |   |   Inactive Data in Storage Tier
 								                       |            v   |
 								                       |      +-----+---+-----+
 								                       |      |               |
 								                       +----->|  Storage Tier |
 								                              |               |
 								                              +---------------+
 								                                 Slower I/O
 								See `Cache Tiering`_ for additional details. Note that cache tiers can be
 								tricky to operate, and their use is now discouraged.
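The division of labor between a fast tier and a backing tier can be illustrated with a toy in-memory model. This is a sketch under loud assumptions: the ``TieredStore`` class is hypothetical, it evicts the least recently used object once a fixed capacity is exceeded, and it does not model how Ceph's tiering agent actually decides when to flush.

```python
from collections import OrderedDict


class TieredStore:
    """Toy model: a small fast cache in front of a large slow backing store."""

    def __init__(self, cache_capacity):
        self.cache_capacity = cache_capacity
        self.cache = OrderedDict()  # fast tier, oldest access first
        self.backing = {}           # slow, economical tier

    def write(self, name, data):
        # Writes land in the cache tier first.
        self.cache[name] = data
        self.cache.move_to_end(name)
        self._flush_if_needed()

    def read(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)  # mark as recently used
            return self.cache[name]
        data = self.backing[name]         # raises KeyError if unknown
        self.cache[name] = data           # promote into the cache tier
        self._flush_if_needed()
        return data

    def _flush_if_needed(self):
        # Flush the least recently used objects to the backing tier.
        while len(self.cache) > self.cache_capacity:
            name, data = self.cache.popitem(last=False)
            self.backing[name] = data


store = TieredStore(cache_capacity=2)
for name, data in [("a", b"1"), ("b", b"2"), ("c", b"3")]:
    store.write(name, data)

# "a" was flushed to the backing tier, but the caller cannot tell.
assert store.read("a") == b"1"
```

The caller only ever uses ``read`` and ``write``; which tier serves the request is hidden, which is the sense in which tiering is transparent to clients.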
 								.. index:: Extensibility, Ceph Classes
 								Extending Ceph
 								--------------
 								You can extend Ceph by creating shared object classes called 'Ceph Classes'.
 								Ceph loads ``.so`` classes stored in the ``osd class dir`` directory dynamically
 								(i.e., ``$libdir/rados-classes`` by default). When you implement a class, you
 								can create new object methods that have the ability to call the native methods
 								in the Ceph Object Store, or other class methods you incorporate via libraries
 								or create yourself.
 								On writes, Ceph Classes can call native or class methods, perform any series of
 								operations on the inbound data, and generate a resulting write transaction that
 								Ceph will apply atomically.
 								On reads, Ceph Classes can call native or class methods, perform any series of
 								operations on the outbound data, and return the data to the client.
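The write and read paths described above can be modeled with a small Python simulation. Real Ceph Classes are compiled shared objects, so everything here is a hypothetical stand-in: ``ToyObjectStore``, the ``demo`` class name, and the two methods exist only to show a method transforming data and the store applying the resulting write as a single step.

```python
class ToyObjectStore:
    """In-memory stand-in for an object store with pluggable class methods."""

    def __init__(self):
        self.objects = {}   # object id -> bytes
        self.methods = {}   # (class name, method name) -> callable

    def register(self, cls_name, method_name, fn):
        self.methods[(cls_name, method_name)] = fn

    def call(self, oid, cls_name, method_name, indata=b""):
        fn = self.methods[(cls_name, method_name)]
        # The method returns (new_content or None, reply). Nothing is
        # written until it returns, so a failing method changes nothing.
        new_content, reply = fn(self.objects.get(oid), indata)
        if new_content is not None:
            self.objects[oid] = new_content
        return reply


def write_upper(current, indata):
    """Write method: transform the inbound data before it is stored."""
    if not indata:
        raise ValueError("empty payload")
    return indata.upper(), b""


def read_head(current, indata):
    """Read method: return only the first N bytes of the stored object."""
    n = int(indata.decode())
    return None, (current or b"")[:n]


store = ToyObjectStore()
store.register("demo", "write_upper", write_upper)
store.register("demo", "read_head", read_head)

store.call("obj", "demo", "write_upper", b"hello")
print(store.call("obj", "demo", "read_head", b"3"))  # prints b'HEL'
```

A call that raises, such as ``write_upper`` with an empty payload, leaves the stored object untouched because the write is applied only after the method returns.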
 								.. topic:: Ceph Class Example
 								   A Ceph class for a content management system that presents pictures of a
 								   particular size and aspect ratio could take an inbound bitmap image, crop it
 								   to a particular aspect ratio, resize it, and embed an invisible copyright or
 								   watermark to help protect the intellectual property; it could then save the
 								   resulting bitmap image to the object store.
 								See ``src/objclass/objclass.h``, ``src/fooclass.cc`` and ``src/barclass`` for
 								example implementations.
 								Summary
 								-------
 								Ceph Storage Clusters are dynamic, like a living organism. Whereas many storage
 								appliances do not fully utilize the CPU and RAM of a typical commodity server,
 								Ceph does. From heartbeats, to peering, to rebalancing the cluster or
 								recovering from faults, Ceph offloads work from clients (and from a centralized
 								gateway, which doesn't exist in the Ceph architecture) and uses the computing
 								power of the OSDs to perform the work. When referring to `Hardware
 								Recommendations`_ and the `Network Config Reference`_, be cognizant of the
 								foregoing concepts to understand how Ceph utilizes computing resources.
 								.. index:: Ceph Protocol, librados
 								Ceph Protocol
 								=============
 								Ceph Clients use the native protocol for interacting with the Ceph Storage
 								Cluster. Ceph packages this functionality into the ``librados`` library so that
 								you can create your own custom Ceph Clients. The following diagram depicts the
 								basic architecture.
 								.. ditaa::
 								            +---------------------------------+
 								            |  Ceph Storage Cluster Protocol  |
 								            |           (librados)            |
 								            +---------------------------------+
 								            +---------------+ +---------------+
 								            |      OSDs     | |    Monitors   |
 								            +---------------+ +---------------+
 								Native Protocol and ``librados``
 								--------------------------------
 								Modern applications need a simple object storage interface with asynchronous
 								communication capability. The Ceph Storage Cluster provides such an interface.
 								It offers direct, parallel access to objects throughout the cluster and
 								supports the following operations:
 								- Pool Operations
 								- Snapshots and Copy-on-write Cloning
 								- Read/Write Objects
 								  - Create or Remove
 								  - Entire Object or Byte Range
 								  - Append or Truncate
 								- Create/Set/Get/Remove XATTRs
 								- Create/Set/Get/Remove Key/Value Pairs
 								- Compound operations and dual-ack semantics
 								- Object Classes
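A few of the operations listed above can be sketched with the Python binding for ``librados`` (the ``rados`` module). The configuration path ``/etc/ceph/ceph.conf``, the pool name ``mypool``, and the object and attribute names are assumptions for illustration. The helper ``roundtrip`` is hypothetical and accepts any object that exposes the same methods as an I/O context, so it can be exercised without a running cluster.

```python
def roundtrip(ioctx, name, data, xattr_key, xattr_value):
    """Write an object, tag it with an xattr, read both back, then remove it."""
    ioctx.write_full(name, data)                    # write the entire object
    ioctx.set_xattr(name, xattr_key, xattr_value)   # set an extended attribute
    try:
        return ioctx.read(name), ioctx.get_xattr(name, xattr_key)
    finally:
        ioctx.remove_object(name)                   # clean up the object


def run_against_cluster(conffile="/etc/ceph/ceph.conf", pool="mypool"):
    """Run the round trip against a real cluster (needs the rados module)."""
    import rados  # imported lazily so the sketch loads without Ceph installed

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            return roundtrip(ioctx, "greeting", b"hello", "lang", b"en")
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

Calling ``run_against_cluster()`` requires a reachable cluster, an existing pool, and suitable client credentials; ``roundtrip`` alone can be tested against an in-memory stand-in.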
 								.. index:: architecture; watch/notify
 								Object Watch/Notify
 								-------------------
 								A client can register a persistent interest in an object and keep a session to
 								the primary OSD open. The client can send a notification message and a payload to
 								all watchers and receive confirmation when the watchers have received the
 								notification. This enables a client to use any object as a
 								synchronization/communication channel.
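The message flow can be mimicked with a small in-process model. This is a simulation only: the ``WatchedObject`` class and its methods are hypothetical and stand in for the object held by the primary OSD. It does not use the ``librados`` watch/notify calls.

```python
class WatchedObject:
    """Toy stand-in for an object that clients can watch and notify."""

    def __init__(self, oid):
        self.oid = oid
        self.watchers = {}  # watcher name -> callback

    def watch(self, name, callback):
        """Register a persistent interest in the object."""
        self.watchers[name] = callback
        return "ack"

    def notify(self, sender, payload):
        """Deliver the payload to every watcher and collect their acks."""
        acks = []
        for name, callback in self.watchers.items():
            callback(sender, payload)
            acks.append(name)
        return acks  # the notifier learns which watchers were reached


inboxes = {"client1": [], "client2": [], "client3": []}
obj = WatchedObject("sync-object")

for client, inbox in inboxes.items():
    # Bind the inbox as a default argument so each callback keeps its own.
    obj.watch(client, lambda sender, payload, inbox=inbox: inbox.append((sender, payload)))

acks = obj.notify("client1", b"refresh")
print(acks)  # prints ['client1', 'client2', 'client3']
```

In this model, as in the diagram that follows, the notifying client is itself a watcher and therefore also receives the notification.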
 								.. ditaa::
 								           +----------+     +----------+     +----------+     +---------------+
 								           | Client 1 |     | Client 2 |     | Client 3 |     | OSD:Object ID |
 								           +----------+     +----------+     +----------+     +---------------+
 								                 |                |                |                  |
 								                 |                |                |                  |
 								                 |                |  Watch Object  |                  |
 								                 |--------------------------------------------------->|
 								                 |                |                |                  |
 								                 |<---------------------------------------------------|
 								                 |                |   Ack/Commit   |                  |
 								                 |                |                |                  |
 								                 |                |  Watch Object  |                  |
 								                 |                |---------------------------------->|
 								                 |                |                |                  |
 								                 |                |<----------------------------------|
 								                 |                |   Ack/Commit   |                  |
 								                 |                |                |   Watch Object   |
 								                 |                |                |----------------->|
 								                 |                |                |                  |
 								                 |                |                |<-----------------|
 								                 |                |                |    Ack/Commit    |
 								                 |                |     Notify     |                  |
 								                 |--------------------------------------------------->|
 								                 |                |                |                  |
 								                 |<---------------------------------------------------|
 								                 |                |     Notify     |                  |
 								                 |                |                |                  |
 								                 |                |<----------------------------------|
 								                 |                |     Notify     |                  |
 								                 |                |                |<-----------------|
 								                 |                |                |      Notify      |
 								                 |                |       Ack      |                  |
 								                 |----------------+---------------------------------->|
 								                 |                |                |                  |
 								                 |                |       Ack      |                  |
 								                 |                +---------------------------------->|
 								                 |                |                |                  |
 								                 |                |                |        Ack       |
 								                 |                |                |----------------->|
 								                 |                |                |                  |
 								                 |<---------------+----------------+------------------|
 								                 |                     Complete

 								.. index:: architecture; Striping

 								Data Striping
 								-------------

 								Storage devices have throughput limitations, which impact performance and
 								scalability. To increase throughput and performance, storage systems often
 								support `striping`_, which stores sequential pieces of information across
 								multiple storage devices. The most common form of data striping comes from
 								`RAID`_. The RAID type most similar to Ceph's striping is `RAID 0`_, or a
 								'striped volume'. Ceph's striping offers the throughput of RAID 0 striping,
 								the reliability of n-way RAID mirroring, and faster recovery.

 								Ceph provides three types of clients: Ceph Block Device, Ceph File System, and
 								Ceph Object Storage. A Ceph Client converts its data from the representation
 								format it provides to its users (a block device image, RESTful objects, CephFS
 								file system directories) into objects for storage in the Ceph Storage Cluster.

 								.. tip:: The objects Ceph stores in the Ceph Storage Cluster are not striped.
 								   Ceph Object Storage, Ceph Block Device, and the Ceph File System stripe their
 								   data over multiple Ceph Storage Cluster objects. Ceph Clients that write
 								   directly to the Ceph Storage Cluster via ``librados`` must perform the
 								   striping (and parallel I/O) for themselves to obtain these benefits.

 								The simplest Ceph striping format involves a stripe count of 1 object. Ceph
 								Clients write stripe units to a Ceph Storage Cluster object until the object is
 								at its maximum capacity, and then create another object for additional stripes
 								of data. The simplest form of striping may be sufficient for small block device
 								images, S3 or Swift objects, and CephFS files. However, this simple form doesn't
 								take maximum advantage of Ceph's ability to distribute data across placement
 								groups, and consequently doesn't improve performance very much. The following
 								diagram depicts the simplest form of striping:

 								.. ditaa::

 								                        +---------------+
 								                        |  Client Data  |
 								                        |     Format    |
 								                        | cCCC          |
 								                        +---------------+
 								                                |
 								                       +--------+-------+
 								                       |                |
 								                       v                v
 								                 /-----------\    /-----------\
 								                 | Begin cCCC|    | Begin cCCC|
 								                 | Object  0 |    | Object  1 |
 								                 +-----------+    +-----------+
 								                 |  stripe   |    |  stripe   |
 								                 |  unit 1   |    |  unit 5   |
 								                 +-----------+    +-----------+
 								                 |  stripe   |    |  stripe   |
 								                 |  unit 2   |    |  unit 6   |
 								                 +-----------+    +-----------+
 								                 |  stripe   |    |  stripe   |
 								                 |  unit 3   |    |  unit 7   |
 								                 +-----------+    +-----------+
 								                 |  stripe   |    |  stripe   |
 								                 |  unit 4   |    |  unit 8   |
 								                 +-----------+    +-----------+
 								                 | End cCCC  |    | End cCCC  |
 								                 | Object 0  |    | Object 1  |
 								                 \-----------/    \-----------/
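
As a minimal sketch of the simplest striping format shown above (a stripe count of 1), the following Python splits a byte buffer into consecutive objects, filling each object to capacity before starting the next. The function name ``simple_stripe`` and the tiny ``object_size`` are illustrative assumptions and are not part of any Ceph API.

```python
def simple_stripe(data: bytes, object_size: int) -> list[bytes]:
    """Split data into consecutive objects of at most object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Each slice models one object; only the last one may be partially filled.
    return [data[i:i + object_size] for i in range(0, len(data), object_size)]


# Hypothetical 10-byte payload with a 4-byte object size.
for number, chunk in enumerate(simple_stripe(b"abcdefghij", object_size=4)):
    print(f"object {number}: {chunk!r}")
```

Because every object is filled sequentially, a client reading or writing a contiguous range touches one object at a time, which is why this form gains little from parallel I/O.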

 								If you anticipate large image sizes, large S3 or Swift objects (e.g., video),
 								or large CephFS directories, you may see considerable read/write performance
 								improvements by striping client data over multiple objects within an object set.
 								Significant write performance gains occur when the client writes the stripe
 								units to their corresponding objects in parallel. Since objects get mapped to
 								different placement groups and further mapped to different OSDs, each write
 								occurs in parallel at the maximum write speed. A write to a single drive would
 								be limited by the head movement (e.g. 6ms per seek) and bandwidth of that one
 								device. By spreading that write over multiple objects (which map to different
 								placement groups and OSDs), Ceph can reduce the number of seeks per drive and
 								combine the throughput of multiple drives to achieve much faster write (or read)
 								speeds.

 								.. note:: Striping is independent of object replicas. Since CRUSH
 								   replicates objects across OSDs, stripes get replicated automatically.

 								In the following diagram, client data gets striped across an object set
 								(``object set 1`` in the following diagram) consisting of 4 objects, where the
 								first stripe unit is ``stripe unit 0`` in ``object 0``, and the fourth stripe
 								unit is ``stripe unit 3`` in ``object 3``. After writing the fourth stripe
 								unit, the client determines if the object set is full. If the object set is not
 								full, the client begins writing a stripe to the first object again (``object 0``
 								in the following diagram). If the object set is full, the client creates a new
 								object set (``object set 2`` in the following diagram), and begins writing the
 								first stripe unit (``stripe unit 16``) to the first object in the new object
 								set (``object 4`` in the diagram below).

 								.. ditaa::

 								                          +---------------+
 								                          |  Client Data  |
 								                          |     Format    |
 								                          | cCCC          |
 								                          +---------------+
 								                                  |
 								       +-----------------+--------+--------+-----------------+
 								       |                 |                 |                 |     +--\
 								       v                 v                 v                 v        |
 								 /-----------\     /-----------\     /-----------\     /-----------\  |
 								 | Begin cCCC|     | Begin cCCC|     | Begin cCCC|     | Begin cCCC|  |
 								 | Object 0  |     | Object  1 |     | Object  2 |     | Object  3 |  |
 								 +-----------+     +-----------+     +-----------+     +-----------+  |
 								 |  stripe   |     |  stripe   |     |  stripe   |     |  stripe   |  |
 								 |  unit 0   |     |  unit 1   |     |  unit 2   |     |  unit 3   |  |
 								 +-----------+     +-----------+     +-----------+     +-----------+  |
 								 |  stripe   |     |  stripe   |     |  stripe   |     |  stripe   |  +-\
 								 |  unit 4   |     |  unit 5   |     |  unit 6   |     |  unit 7   |    | Object
 								 +-----------+     +-----------+     +-----------+     +-----------+    +- Set
 								 |  stripe   |     |  stripe   |     |  stripe   |     |  stripe   |    |   1
 								 |  unit 8   |     |  unit 9   |     |  unit 10  |     |  unit 11  |  +-/
 								 +-----------+     +-----------+     +-----------+     +-----------+  |
 								 |  stripe   |     |  stripe   |     |  stripe   |     |  stripe   |  |
 								 |  unit 12  |     |  unit 13  |     |  unit 14  |     |  unit 15  |  |
 								 +-----------+     +-----------+     +-----------+     +-----------+  |
 								 | End cCCC  |     | End cCCC  |     | End cCCC  |     | End cCCC  |  |
 								 | Object 0  |     | Object 1  |     | Object 2  |     | Object 3  |  |
 								 \-----------/     \-----------/     \-----------/     \-----------/  |
 								                                                                      |
 								                                                                   +--/

 								                                                                   +--\
 								                                                                      |
 								 /-----------\     /-----------\     /-----------\     /-----------\  |
 								 | Begin cCCC|     | Begin cCCC|     | Begin cCCC|     | Begin cCCC|  |
 								 | Object  4 |     | Object  5 |     | Object  6 |     | Object  7 |  |
 								 +-----------+     +-----------+     +-----------+     +-----------+  |
 								 |  stripe   |     |  stripe   |     |  stripe   |     |  stripe   |  |
 								 |  unit 16  |     |  unit 17  |     |  unit 18  |     |  unit 19  |  |
 								 +-----------+     +-----------+     +-----------+     +-----------+  |
 								 |  stripe   |     |  stripe   |     |  stripe   |     |  stripe   |  +-\
 								 |  unit 20  |     |  unit 21  |     |  unit 22  |     |  unit 23  |    | Object
 								 +-----------+     +-----------+     +-----------+     +-----------+    +- Set
 								 |  stripe   |     |  stripe   |     |  stripe   |     |  stripe   |    |   2
 								 |  unit 24  |     |  unit 25  |     |  unit 26  |     |  unit 27  |  +-/
 								 +-----------+     +-----------+     +-----------+     +-----------+  |
 								 |  stripe   |     |  stripe   |     |  stripe   |     |  stripe   |  |
 								 |  unit 28  |     |  unit 29  |     |  unit 30  |     |  unit 31  |  |
 								 +-----------+     +-----------+     +-----------+     +-----------+  |
 								 | End cCCC  |     | End cCCC  |     | End cCCC  |     | End cCCC  |  |
 								 | Object 4  |     | Object 5  |     | Object 6  |     | Object 7  |  |
 								 \-----------/     \-----------/     \-----------/     \-----------/  |
 								                                                                      |
 								                                                                   +--/
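
The round-robin placement in the diagram above can be sketched in a few lines of Python. This is an illustration only: ``unit_location`` is a hypothetical helper, and it counts object sets from 0, whereas the diagram labels them 1 and 2.

```python
def unit_location(unit: int, stripe_count: int, units_per_object: int) -> tuple[int, int, int]:
    """Return (object_set, object_number, row) for a 0-based stripe unit index."""
    units_per_set = stripe_count * units_per_object
    object_set = unit // units_per_set       # 0-based; the diagram counts sets from 1
    within_set = unit % units_per_set
    row = within_set // stripe_count         # slot inside the object, top to bottom
    column = within_set % stripe_count       # position of the object inside its set
    object_number = object_set * stripe_count + column
    return object_set, object_number, row


# Parameters read off the diagram: 4 objects per set, 4 stripe units per object.
for unit in (0, 3, 4, 15, 16, 31):
    print(unit, unit_location(unit, stripe_count=4, units_per_object=4))
```

With these parameters, stripe unit 16 is the first unit that no longer fits in the first object set, so it lands in row 0 of object 4.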

 								Three important variables determine how Ceph stripes data:

 								- **Object Size:** Objects in the Ceph Storage Cluster have a maximum
 								  configurable size (e.g., 2MB or 4MB). The object size should be large
 								  enough to accommodate many stripe units, and should be a multiple of
 								  the stripe unit.

 								- **Stripe Width:** Stripes have a configurable unit size (e.g., 64KB).
 								  The Ceph Client divides the data it will write to objects into equally
 								  sized stripe units, except for the last stripe unit. A stripe width
 								  should be a fraction of the Object Size so that an object may contain
 								  many stripe units.

 								- **Stripe Count:** The Ceph Client writes a sequence of stripe units
 								  over a series of objects determined by the stripe count. The series
 								  of objects is called an object set. After the Ceph Client writes to
 								  the last object in the object set, it returns to the first object in
 								  the object set.

											
										
										
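The way the striping parameters interact can be sketched in a few lines of
Python. This is an illustration only, not Ceph's implementation: the name
``locate`` and its return type are hypothetical, and the sketch assumes that
the object size is an exact multiple of the stripe unit. Given a byte offset in
the client's data, it returns the object set, the index of the object within
the striped data, and the byte offset inside that object.

```python
from typing import NamedTuple


class StripeLocation(NamedTuple):
    object_set: int      # which object set the byte falls in
    object_index: int    # index of the object within the whole striped data
    object_offset: int   # byte offset inside that object


def locate(offset: int, stripe_unit: int, stripe_count: int,
           object_size: int) -> StripeLocation:
    """Map a byte offset in the client's data to an object and an offset in it."""
    if object_size % stripe_unit != 0:
        raise ValueError("object_size must be a multiple of stripe_unit")
    units_per_object = object_size // stripe_unit
    unit_no = offset // stripe_unit           # global stripe unit number
    stripe_no = unit_no // stripe_count       # which stripe (row) the unit is in
    position = unit_no % stripe_count         # which object of the set (column)
    object_set = stripe_no // units_per_object
    object_index = object_set * stripe_count + position
    unit_in_object = stripe_no % units_per_object
    object_offset = unit_in_object * stripe_unit + offset % stripe_unit
    return StripeLocation(object_set, object_index, object_offset)
```

With a 64 KB stripe unit, a stripe count of 4, and 256 KB objects,
``locate(262144, 65536, 4, 262144)`` lands in the first object of the first
object set, one stripe unit in, because the client has written one stripe unit
to each of the four objects and has returned to the first object.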

											
										
										
.. important:: Test the performance of your striping configuration before
   putting your cluster into production. You CANNOT change these striping
   parameters after you stripe the data and write it to objects.

Once the Ceph Client has striped data into stripe units and mapped the stripe
units to objects, Ceph's CRUSH algorithm maps the objects to placement groups,
and the placement groups to Ceph OSD Daemons, before the objects are stored on
a storage drive.

.. note:: Since a client writes to a single pool, all data striped into objects
   is mapped to placement groups in the same pool, so the objects use the same
   CRUSH map and the same access controls.

											
										
										
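The two-step mapping from objects to placement groups and from placement groups
to OSDs can be approximated with a toy model. The sketch below is not CRUSH and
does not use Ceph's hash function: it substitutes SHA-256 for the object hash
and rendezvous (highest-random-weight) hashing for the placement step, and the
function names ``object_to_pg`` and ``pg_to_osds`` are hypothetical. It shows
only that a location can be computed deterministically from the object name,
the placement group count, and the list of OSDs.

```python
import hashlib
from typing import List


def _h(*parts: str) -> int:
    """Stable 64-bit hash of the given strings (stand-in for Ceph's own hash)."""
    digest = hashlib.sha256("/".join(parts).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def object_to_pg(pool_id: int, object_name: str, pg_num: int) -> str:
    """Hash the object name into one of the pool's placement groups."""
    return f"{pool_id}.{_h(object_name) % pg_num:x}"


def pg_to_osds(pg_id: str, osd_ids: List[int], replicas: int) -> List[int]:
    """Pick `replicas` distinct OSDs for a PG with rendezvous hashing (not CRUSH)."""
    ranked = sorted(osd_ids, key=lambda osd: _h(pg_id, str(osd)), reverse=True)
    return ranked[:replicas]
```

Because every object in a pool is hashed against the same placement group
count, two clients that agree on the pool and the OSD list compute the same
placement for the same object name.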

											
										
										
.. index:: architecture; Ceph Clients

.. _architecture_ceph_clients:

Ceph Clients
============

											
										
										
Ceph Clients provide a number of service interfaces, including:

- **Block Devices:** The :term:`Ceph Block Device` (a.k.a., RBD) service
  provides resizable, thin-provisioned block devices that can be snapshotted
  and cloned. Ceph stripes a block device across the cluster for high
  performance. Ceph supports both kernel objects (KO) and a QEMU hypervisor
  that uses ``librbd`` directly, which avoids the kernel object overhead for
  virtualized systems.

- **Object Storage:** The :term:`Ceph Object Storage` (a.k.a., RGW) service
  provides RESTful APIs with interfaces that are compatible with Amazon S3
  and OpenStack Swift.

- **File System:** The :term:`Ceph File System` (CephFS) service provides
  a POSIX-compliant file system usable with ``mount`` or as
  a file system in user space (FUSE).

											
										
										
Ceph can run additional instances of OSDs, MDSs, and monitors for scalability
and high availability. The following diagram depicts the high-level
architecture.

.. ditaa::

            +--------------+  +----------------+  +-------------+
            | Block Device |  | Object Storage |  |   CephFS    |
            +--------------+  +----------------+  +-------------+

            +--------------+  +----------------+  +-------------+
            |    librbd    |  |     librgw     |  |  libcephfs  |
            +--------------+  +----------------+  +-------------+

            +---------------------------------------------------+
            |      Ceph Storage Cluster Protocol (librados)     |
            +---------------------------------------------------+

            +---------------+ +---------------+ +---------------+
            |      OSDs     | |      MDSs     | |    Monitors   |
            +---------------+ +---------------+ +---------------+

											
										
										
.. index:: architecture; Ceph Object Storage

Ceph Object Storage
-------------------

The Ceph Object Storage daemon, ``radosgw``, is a FastCGI service that provides
a RESTful_ HTTP API to store objects and metadata. It layers on top of the Ceph
Storage Cluster with its own data formats, and maintains its own user database,
authentication, and access control. The RADOS Gateway uses a unified namespace,
which means you can use either the OpenStack Swift-compatible API or the Amazon
S3-compatible API. For example, you can write data using the S3-compatible API
with one application and then read that data using the Swift-compatible API
with another application.

											
										
										
.. topic:: S3/Swift Objects and Storage Cluster Objects Compared

   Ceph's Object Storage uses the term *object* to describe the data it stores.
   S3 and Swift objects are not the same as the objects that Ceph writes to the
   Ceph Storage Cluster. Ceph Object Storage objects are mapped to Ceph Storage
   Cluster objects. The S3 and Swift objects do not necessarily correspond in
   a 1:1 manner with an object stored in the storage cluster. It is possible
   for an S3 or Swift object to map to multiple Ceph objects.

											
										
										
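The one-to-many relationship between an S3 or Swift object and storage cluster
objects can be pictured with a small sketch. Everything specific here is an
assumption made for illustration: the function name, the 4 MiB default piece
size, and the ``<key>.<n>`` naming scheme are hypothetical and do not describe
how ``radosgw`` actually lays out its data.

```python
from typing import Dict


def split_into_cluster_objects(key: str, data: bytes,
                               chunk_size: int = 4 * 1024 * 1024) -> Dict[str, bytes]:
    """Split one S3/Swift object into several storage-cluster-sized pieces."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    pieces: Dict[str, bytes] = {}
    for index, start in enumerate(range(0, len(data), chunk_size)):
        # Hypothetical naming scheme: "<key>.<piece number>".
        pieces[f"{key}.{index}"] = data[start:start + chunk_size]
    if not pieces:
        pieces[f"{key}.0"] = b""   # an empty object still gets one piece
    return pieces
```

A 10-byte payload split with ``chunk_size=4`` yields three pieces, which
together still represent a single object from the S3 or Swift client's point of
view.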

											
										
										
See `Ceph Object Storage`_ for details.

.. index:: Ceph Block Device; block device; RBD; Rados Block Device

Ceph Block Device
-----------------

A Ceph Block Device stripes a block device image over multiple objects in the
Ceph Storage Cluster. Each object is mapped to a placement group, and the
placement groups are spread across separate ``ceph-osd`` daemons throughout
the cluster.

											
										
										
.. important:: Striping allows RBD block devices to perform better than a single
   server could!

Thin-provisioned, snapshottable Ceph Block Devices are an attractive option for
virtualization and cloud computing. In virtual machine scenarios, people
typically deploy a Ceph Block Device with the ``rbd`` network storage driver in
QEMU/KVM, where the host machine uses ``librbd`` to provide a block device
service to the guest. Many cloud computing stacks use ``libvirt`` to integrate
with hypervisors. You can use thin-provisioned Ceph Block Devices with QEMU and
``libvirt`` to support OpenStack, OpenNebula, and CloudStack, among other
solutions.

While we do not provide ``librbd`` support with other hypervisors at this time,
you may also use Ceph Block Device kernel objects to provide a block device to a
client. Other virtualization technologies such as Xen can access the Ceph Block
Device kernel object(s). This is done with the command-line tool ``rbd``.

											
										
										
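Thin provisioning over striped objects can be modeled in a few lines. The class
below is a toy, in-memory stand-in and not ``librbd``: the name ``ThinImage``
and the 4 MiB default object size are assumptions made for this sketch. It
illustrates the idea that an image of a given size consumes backing objects
only for the regions that have been written, and that unwritten regions read
back as zeros.

```python
class ThinImage:
    """Toy block device image that allocates backing objects on first write."""

    def __init__(self, size: int, object_size: int = 4 * 1024 * 1024):
        self.size = size
        self.object_size = object_size
        self.objects = {}   # object number -> bytearray, created lazily

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write outside the image")
        pos = 0
        while pos < len(data):
            obj_no, obj_off = divmod(offset + pos, self.object_size)
            n = min(len(data) - pos, self.object_size - obj_off)
            obj = self.objects.get(obj_no)
            if obj is None:
                # First write to this region: allocate its backing object now.
                obj = self.objects[obj_no] = bytearray(self.object_size)
            obj[obj_off:obj_off + n] = data[pos:pos + n]
            pos += n

    def read(self, offset: int, length: int) -> bytes:
        if offset < 0 or offset + length > self.size:
            raise ValueError("read outside the image")
        out = bytearray()
        pos = 0
        while pos < length:
            obj_no, obj_off = divmod(offset + pos, self.object_size)
            n = min(length - pos, self.object_size - obj_off)
            obj = self.objects.get(obj_no)
            # Regions that were never written have no object and read as zeros.
            out += obj[obj_off:obj_off + n] if obj is not None else bytes(n)
            pos += n
        return bytes(out)
```

A write that crosses an object boundary touches two objects, and every other
part of the image remains unallocated until it is written.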

											
										
										
.. index:: CephFS; Ceph File System; libcephfs; MDS; metadata server; ceph-mds

.. _arch-cephfs:

Ceph File System
----------------

The Ceph File System (CephFS) provides a POSIX-compliant file system as a
service that is layered on top of the object-based Ceph Storage Cluster.
CephFS files are mapped to objects that Ceph stores in the Ceph Storage
Cluster. Ceph Clients mount a CephFS file system as a kernel object or as
a Filesystem in User Space (FUSE).

											
										
										
.. ditaa::

            +-----------------------+  +------------------------+
            | CephFS Kernel Object  |  |      CephFS FUSE       |
            +-----------------------+  +------------------------+

            +---------------------------------------------------+
            |            CephFS Library (libcephfs)             |
            +---------------------------------------------------+

            +---------------------------------------------------+
            |      Ceph Storage Cluster Protocol (librados)     |
            +---------------------------------------------------+

            +---------------+ +---------------+ +---------------+
            |      OSDs     | |      MDSs     | |    Monitors   |
            +---------------+ +---------------+ +---------------+
-												Doc: Restore the previous version of architecture.rst

it was accidentally overwritten with a version of the product
had a somewhat different audience/focus and a few sphinx
formatting errors.

I will cherry-pick the corrections in a subsequent commit.

The Ceph File System service includes the Ceph Metadata Server (MDS) deployed
with the Ceph Storage Cluster. The purpose of the MDS is to store all the
file system metadata (directories, file ownership, access modes, etc.) in
high-availability Ceph Metadata Servers where the metadata resides in memory.
The reason for the MDS (a daemon called ``ceph-mds``) is that simple file
system operations like listing a directory or changing a directory (``ls``,
``cd``) would tax the Ceph OSD Daemons unnecessarily. Separating the metadata
from the data means that the Ceph File System can provide high-performance
services without taxing the Ceph Storage Cluster.

CephFS separates the metadata from the data, storing the metadata in the MDS,
and storing the file data in one or more objects in the Ceph Storage Cluster.
The Ceph File System aims for POSIX compatibility. ``ceph-mds`` can run as a
single process, or it can be distributed out to multiple physical machines,
either for high availability or for scalability.
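
The separation of metadata from file data can be illustrated with a toy model.
The following is a minimal sketch, not CephFS code: ``ToyMDS``,
``ToyObjectStore``, ``client_read``, the object naming scheme, and the 4-byte
object size are all hypothetical and chosen only for illustration. It shows
that a directory listing is answered from in-memory metadata alone, while file
contents are fetched as one or more objects from the object store.

.. code-block:: python

   class ToyObjectStore:
       """Stands in for the Ceph Storage Cluster: a flat map of named objects."""

       def __init__(self):
           self.objects = {}
           self.reads = 0  # counts object reads, to show when the store is used

       def write(self, name, data):
           self.objects[name] = data

       def read(self, name):
           self.reads += 1
           return self.objects[name]


   class ToyMDS:
       """Stands in for an MDS: keeps directory and file metadata in memory."""

       def __init__(self, store, object_size=4):
           self.store = store
           self.object_size = object_size  # hypothetical, tiny for the demo
           self.dirs = {"/": []}           # directory path -> entry names
           self.files = {}                 # file path -> ordered object names

       def create(self, directory, name, data):
           path = directory.rstrip("/") + "/" + name
           names = []
           # Split the file data into fixed-size objects (made-up naming).
           for offset in range(0, len(data), self.object_size):
               obj = f"{path}.{offset // self.object_size:08x}"
               self.store.write(obj, data[offset:offset + self.object_size])
               names.append(obj)
           self.dirs[directory].append(name)
           self.files[path] = names

       def listdir(self, directory):
           # Answered from metadata only; the object store is not consulted.
           return sorted(self.dirs[directory])

       def lookup(self, path):
           # Returns where the data lives; it does not return the data itself.
           return list(self.files[path])


   def client_read(mds, store, path):
       """The caller, not the toy MDS, reads the data objects."""
       return b"".join(store.read(obj) for obj in mds.lookup(path))


   store = ToyObjectStore()
   mds = ToyMDS(store)
   mds.create("/", "hello.txt", b"hello world")

   listing = mds.listdir("/")        # metadata-only operation
   reads_after_ls = store.reads      # still zero object reads
   content = client_read(mds, store, "/hello.txt")

In this sketch the listing costs no object reads, whereas reading the
11-byte file touches three objects.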

- **High Availability**: The extra ``ceph-mds`` instances can be `standby`,
  ready to take over the duties of any failed ``ceph-mds`` that was
  `active`. This is easy because all the data, including the journal, is
  stored on RADOS. The transition is triggered automatically by ``ceph-mon``.

- **Scalability**: Multiple ``ceph-mds`` instances can be `active`, and they
  will split the directory tree into subtrees (and shards of a single
  busy directory), effectively balancing the load amongst all `active`
  servers.

Combinations of `standby` and `active` instances are possible: for example,
running three `active` ``ceph-mds`` instances for scaling, and one `standby`
instance for high availability.
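
The combination of `active` and `standby` instances can be sketched as a small
simulation. This is a toy model under stated assumptions, not the logic that
``ceph-mon`` actually runs: ``ToyMDSCluster``, its ``fail`` method, and the
daemon names are hypothetical. It models three `active` ranks plus one
`standby`, and promotes the standby into the rank of an `active` daemon that
fails.

.. code-block:: python

   class ToyMDSCluster:
       """Tracks which daemon holds each active rank and which are standby."""

       def __init__(self, active, standby):
           self.ranks = dict(enumerate(active))  # rank number -> daemon name
           self.standbys = list(standby)

       def fail(self, daemon):
           """Mark a daemon as failed and promote a standby if one exists.

           Returns the affected rank, or None if the daemon was a standby.
           """
           if daemon in self.standbys:
               self.standbys.remove(daemon)
               return None
           for rank, holder in self.ranks.items():
               if holder == daemon:
                   if self.standbys:
                       # The replacement takes over the same rank. The text
                       # above explains why this is easy in Ceph: the data,
                       # including the journal, is stored on RADOS.
                       self.ranks[rank] = self.standbys.pop(0)
                   else:
                       self.ranks[rank] = None  # no standby: rank stays down
                   return rank
           raise ValueError(f"unknown daemon: {daemon}")


   cluster = ToyMDSCluster(
       active=["mds.a", "mds.b", "mds.c"],
       standby=["mds.d"],
   )
   failed_rank = cluster.fail("mds.b")

After the call, rank 1 is held by ``mds.d`` and no standby remains, so a
second failure would leave a rank without a holder until another daemon is
added.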

.. _RADOS - A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters: https://ceph.io/assets/pdfs/weil-rados-pdsw07.pdf
.. _Paxos: https://en.wikipedia.org/wiki/Paxos_(computer_science)
.. _Monitor Config Reference: ../rados/configuration/mon-config-ref
.. _Monitoring OSDs and PGs: ../rados/operations/monitoring-osd-pg
.. _Heartbeats: ../rados/configuration/mon-osd-interaction
.. _Monitoring OSDs: ../rados/operations/monitoring-osd-pg/#monitoring-osds
.. _CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data: https://ceph.io/assets/pdfs/weil-crush-sc06.pdf
.. _Data Scrubbing: ../rados/configuration/osd-config-ref#scrubbing
.. _Report Peering Failure: ../rados/configuration/mon-osd-interaction#osds-report-peering-failure
.. _Troubleshooting Peering Failure: ../rados/troubleshooting/troubleshooting-pg#placement-group-down-peering-failure
.. _Ceph Authentication and Authorization: ../rados/operations/auth-intro/
.. _Hardware Recommendations: ../start/hardware-recommendations
.. _Network Config Reference: ../rados/configuration/network-config-ref
.. _Data Scrubbing: ../rados/configuration/osd-config-ref#scrubbing
.. _striping: https://en.wikipedia.org/wiki/Data_striping
.. _RAID: https://en.wikipedia.org/wiki/RAID
.. _RAID 0: https://en.wikipedia.org/wiki/RAID_0#RAID_0
.. _Ceph Object Storage: ../radosgw/
.. _RESTful: https://en.wikipedia.org/wiki/RESTful
.. _Erasure Code Notes: https://github.com/ceph/ceph/blob/40059e12af88267d0da67d8fd8d9cd81244d8f93/doc/dev/osd_internals/erasure_coding/developer_notes.rst
.. _Cache Tiering: ../rados/operations/cache-tiering
.. _Set Pool Values: ../rados/operations/pools#set-pool-values
.. _Kerberos: https://en.wikipedia.org/wiki/Kerberos_(protocol)
.. _Cephx Config Guide: ../rados/configuration/auth-config-ref
.. _User Management: ../rados/operations/user-management