diff --git a/.gitignore b/.gitignore index b01aef839be..c74ad2efd69 100644 --- a/.gitignore +++ b/.gitignore @@ -83,3 +83,17 @@ GTAGS # Python building things where it shouldn't /src/python-common/build/ .cache + +# Doc build output +src/pybind/cephfs/build/ +src/pybind/cephfs/cephfs.c +src/pybind/cephfs/cephfs.egg-info/ +src/pybind/rados/build/ +src/pybind/rados/rados.c +src/pybind/rados/rados.egg-info/ +src/pybind/rbd/build/ +src/pybind/rbd/rbd.c +src/pybind/rbd/rbd.egg-info/ +src/pybind/rgw/build/ +src/pybind/rgw/rgw.c +src/pybind/rgw/rgw.egg-info/ diff --git a/doc/architecture.rst b/doc/architecture.rst index 983cec2300a..1be58f68292 100644 --- a/doc/architecture.rst +++ b/doc/architecture.rst @@ -37,7 +37,7 @@ to Ceph clients. Provisioning multiple monitors within the Ceph cluster ensures availability in the event that one of the monitor daemons or its host fails. The Ceph monitor provides copies of the cluster map to storage cluster clients. -A Ceph OSD Daemon checks its own state and the state of other OSDs and reports +A Ceph OSD Daemon checks its own state and the state of other OSDs and reports back to monitors. A Ceph Manager serves as an endpoint for monitoring, orchestration, and plug-in @@ -61,7 +61,7 @@ comes through a :term:`Ceph Block Device`, :term:`Ceph Object Storage`, the ``librados``. The data received by the Ceph Storage Cluster is stored as RADOS objects. Each object is stored on an :term:`Object Storage Device` (this is also called an "OSD"). Ceph OSDs control read, write, and replication -operations on storage drives. The default BlueStore back end stores objects +operations on storage drives. The default BlueStore back end stores objects in a monolithic, database-like fashion. .. ditaa:: @@ -69,7 +69,7 @@ in a monolithic, database-like fashion. /------\ +-----+ +-----+ | obj |------>| {d} |------>| {s} | \------/ +-----+ +-----+ - + Object OSD Drive Ceph OSD Daemons store data as objects in a flat namespace. This means that @@ -85,10 +85,10 @@ created date, and the last modified date. /------+------------------------------+----------------\ | ID | Binary Data | Metadata | +------+------------------------------+----------------+ - | 1234 | 0101010101010100110101010010 | name1 = value1 | + | 1234 | 0101010101010100110101010010 | name1 = value1 | | | 0101100001010100110101010010 | name2 = value2 | | | 0101100001010100110101010010 | nameN = valueN | - \------+------------------------------+----------------/ + \------+------------------------------+----------------/ .. note:: An object ID is unique across the entire cluster, not just the local filesystem. @@ -147,14 +147,14 @@ five maps that constitute the cluster map are: the address, and the TCP port of each monitor. The monitor map specifies the current epoch, the time of the monitor map's creation, and the time of the monitor map's last modification. To view a monitor map, run ``ceph mon - dump``. - + dump``. + #. **The OSD Map:** Contains the cluster ``fsid``, the time of the OSD map's creation, the time of the OSD map's last modification, a list of pools, a list of replica sizes, a list of PG numbers, and a list of OSDs and their statuses (for example, ``up``, ``in``). To view an OSD map, run ``ceph - osd dump``. - + osd dump``. + #. **The PG Map:** Contains the PG version, its time stamp, the last OSD map epoch, the full ratios, and the details of each placement group. 
This includes the PG ID, the `Up Set`, the `Acting Set`, the state of the PG (for @@ -168,8 +168,8 @@ five maps that constitute the cluster map are: {decomp-crushmap-filename}``. Use a text editor or ``cat`` to view the decompiled map. -#. **The MDS Map:** Contains the current MDS map epoch, when the map was - created, and the last time it changed. It also contains the pool for +#. **The MDS Map:** Contains the current MDS map epoch, when the map was + created, and the last time it changed. It also contains the pool for storing metadata, a list of metadata servers, and which metadata servers are ``up`` and ``in``. To view an MDS map, execute ``ceph fs dump``. @@ -212,13 +212,13 @@ High Availability Authentication ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The ``cephx`` authentication system is used by Ceph to authenticate users and -daemons and to protect against man-in-the-middle attacks. +daemons and to protect against man-in-the-middle attacks. -.. note:: The ``cephx`` protocol does not address data encryption in transport +.. note:: The ``cephx`` protocol does not address data encryption in transport (for example, SSL/TLS) or encryption at rest. ``cephx`` uses shared secret keys for authentication. This means that both the -client and the monitor cluster keep a copy of the client's secret key. +client and the monitor cluster keep a copy of the client's secret key. The ``cephx`` protocol makes it possible for each party to prove to the other that it has a copy of the key without revealing it. This provides mutual @@ -235,7 +235,7 @@ Direct interactions between Ceph clients and OSDs require authenticated connections. The ``cephx`` authentication system establishes and sustains these authenticated connections. -The ``cephx`` protocol operates in a manner similar to `Kerberos`_. +The ``cephx`` protocol operates in a manner similar to `Kerberos`_. A user invokes a Ceph client to contact a monitor. Unlike Kerberos, each monitor can authenticate users and distribute keys, which means that there is @@ -248,7 +248,7 @@ Monitors. The client then uses the session key to request services from the monitors, and the monitors provide the client with a ticket that authenticates the client against the OSDs that actually handle data. Ceph Monitors and OSDs share a secret, which means that the clients can use the ticket provided by the -monitors to authenticate against any OSD or metadata server in the cluster. +monitors to authenticate against any OSD or metadata server in the cluster. Like Kerberos tickets, ``cephx`` tickets expire. An attacker cannot use an expired ticket or session key that has been obtained surreptitiously. This form @@ -264,8 +264,8 @@ subsystem generates the username and key, stores a copy on the monitor(s), and transmits the user's secret back to the ``client.admin`` user. This means that the client and the monitor share a secret key. -.. note:: The ``client.admin`` user must provide the user ID and - secret key to the user in a secure manner. +.. note:: The ``client.admin`` user must provide the user ID and + secret key to the user in a secure manner. .. ditaa:: @@ -275,7 +275,7 @@ the client and the monitor share a secret key. | request to | | create a user | |-------------->|----------+ create user - | | | and + | | | and |<--------------|<---------+ store key | transmit key | | | @@ -298,25 +298,25 @@ and uses it to sign requests to OSDs and to metadata servers in the cluster. 
+---------+ +---------+ | authenticate | |-------------->|----------+ generate and - | | | encrypt + | | | encrypt |<--------------|<---------+ session key | transmit | | encrypted | | session key | - | | + | | |-----+ decrypt | - | | session | - |<----+ key | + | | session | + |<----+ key | | | | req. ticket | |-------------->|----------+ generate and - | | | encrypt + | | | encrypt |<--------------|<---------+ ticket | recv. ticket | - | | + | | |-----+ decrypt | - | | ticket | - |<----+ | + | | ticket | + |<----+ | The ``cephx`` protocol authenticates ongoing communications between the clients @@ -331,7 +331,7 @@ between the client and the daemon. | Client | | Monitor | | MDS | | OSD | +---------+ +---------+ +-------+ +-------+ | request to | | | - | create a user | | | + | create a user | | | |-------------->| mon and | | |<--------------| client share | | | receive | a secret. | | @@ -339,7 +339,7 @@ between the client and the daemon. | |<------------>| | | |<-------------+------------>| | | mon, mds, | | - | authenticate | and osd | | + | authenticate | and osd | | |-------------->| share | | |<--------------| a secret | | | session key | | | @@ -355,7 +355,7 @@ between the client and the daemon. | receive response (CephFS only) | | | | make request | - |------------------------------------------->| + |------------------------------------------->| |<-------------------------------------------| receive response @@ -364,7 +364,7 @@ daemons. The authentication is not extended beyond the Ceph client. If a user accesses the Ceph client from a remote host, cephx authentication will not be applied to the connection between the user's host and the client host. -See `Cephx Config Guide`_ for more on configuration details. +See `Cephx Config Guide`_ for more on configuration details. See `User Management`_ for more on user management. @@ -418,7 +418,7 @@ the greater cluster provides several benefits: Monitors receive no such message after a configurable period of time, then they mark the OSD ``down``. This mechanism is a failsafe, however. Normally, Ceph OSD Daemons determine if a neighboring OSD is ``down`` and - report it to the Ceph Monitors. This contributes to making Ceph Monitors + report it to the Ceph Monitors. This contributes to making Ceph Monitors lightweight processes. See `Monitoring OSDs`_ and `Heartbeats`_ for additional details. @@ -465,7 +465,7 @@ the greater cluster provides several benefits: Write (2) | | | | Write (3) +------+ | | +------+ | +------+ +------+ | - | | Ack (4) Ack (5)| | + | | Ack (4) Ack (5)| | v * * v +---------------+ +---------------+ | Secondary OSD | | Tertiary OSD | @@ -492,7 +492,7 @@ About Pools The Ceph storage system supports the notion of 'Pools', which are logical partitions for storing objects. - + Ceph Clients retrieve a `Cluster Map`_ from a Ceph Monitor, and write RADOS objects to pools. The way that Ceph places the data in the pools is determined by the pool's ``size`` or number of replicas, the CRUSH rule, and the number of @@ -513,12 +513,12 @@ placement groups in the pool. +--------+ +---------------+ | Pool |---------->| CRUSH Rule | +--------+ Selects +---------------+ - + Pools set at least the following parameters: - Ownership/Access to Objects -- The Number of Placement Groups, and +- The Number of Placement Groups, and - The CRUSH Rule to Use. See `Set Pool Values`_ for details. @@ -531,12 +531,12 @@ Mapping PGs to OSDs Each pool has a number of placement groups (PGs) within it. CRUSH dynamically maps PGs to OSDs. 
When a Ceph Client stores objects, CRUSH maps each RADOS -object to a PG. +object to a PG. This mapping of RADOS objects to PGs implements an abstraction and indirection layer between Ceph OSD Daemons and Ceph Clients. The Ceph Storage Cluster must be able to grow (or shrink) and redistribute data adaptively when the internal -topology changes. +topology changes. If the Ceph Client "knew" which Ceph OSD Daemons were storing which objects, a tight coupling would exist between the Ceph Client and the Ceph OSD Daemon. @@ -565,11 +565,11 @@ placement groups, and how it maps placement groups to OSDs. +------+------+-------------+ | | | | | v v v v - /----------\ /----------\ /----------\ /----------\ + /----------\ /----------\ /----------\ /----------\ | | | | | | | | | OSD #1 | | OSD #2 | | OSD #3 | | OSD #4 | | | | | | | | | - \----------/ \----------/ \----------/ \----------/ + \----------/ \----------/ \----------/ \----------/ The client uses its copy of the cluster map and the CRUSH algorithm to compute precisely which OSD it will use when reading or writing a particular object. @@ -583,7 +583,7 @@ When a Ceph Client binds to a Ceph Monitor, it retrieves the latest version of the `Cluster Map`_. When a client has been equipped with a copy of the cluster map, it is aware of all the monitors, OSDs, and metadata servers in the cluster. **However, even equipped with a copy of the latest version of the -cluster map, the client doesn't know anything about object locations.** +cluster map, the client doesn't know anything about object locations.** **Object locations must be computed.** @@ -626,7 +626,7 @@ persists, you may need to refer to the `Troubleshooting Peering Failure`_ section. .. Note:: PGs that agree on the state of the cluster do not necessarily have - the current data yet. + the current data yet. The Ceph Storage Cluster was designed to store at least two copies of an object (that is, ``size = 2``), which is the minimum requirement for data safety. For @@ -656,7 +656,7 @@ epoch. The Ceph OSD daemons that are part of an *Acting Set* might not always be ``up``. When an OSD in the *Acting Set* is ``up``, it is part of the *Up Set*. The *Up Set* is an important distinction, because Ceph can remap PGs to other -Ceph OSD Daemons when an OSD fails. +Ceph OSD Daemons when an OSD fails. .. note:: Consider a hypothetical *Acting Set* for a PG that contains ``osd.25``, ``osd.32`` and ``osd.61``. The first OSD (``osd.25``), is the @@ -676,7 +676,7 @@ process (albeit rather crudely, since it is substantially less impactful with large clusters) where some, but not all of the PGs migrate from existing OSDs (OSD 1, and OSD 2) to the new OSD (OSD 3). Even when rebalancing, CRUSH is stable. Many of the placement groups remain in their original configuration, -and each OSD gets some added capacity, so there are no load spikes on the +and each OSD gets some added capacity, so there are no load spikes on the new OSD after rebalancing is complete. @@ -823,7 +823,7 @@ account. | | | | | +-------+-------+ | | ^ | - | | | + | | | | | | +--+---+ +------+ +---+--+ +---+--+ name | NYAN | | NYAN | | NYAN | | NYAN | @@ -876,7 +876,7 @@ version 1). .. ditaa:: Primary OSD - + +-------------+ | OSD 1 | +-------------+ | log | Write Full | | @@ -921,7 +921,7 @@ as ``D2v2`` ) while others are acknowledged and persisted to storage drives .. 
ditaa:: Primary OSD - + +-------------+ | OSD 1 | | log | @@ -930,11 +930,11 @@ as ``D2v2`` ) while others are acknowledged and persisted to storage drives | +----+ +<------------+ Ceph Client | | | v2 | | | +----+ | +-------------+ - | |D1v1| 1,1 | - | +----+ | - +------+------+ - | - | + | |D1v1| 1,1 | + | +----+ | + +------+------+ + | + | | +------+------+ | | OSD 2 | | +------+ | log | @@ -962,7 +962,7 @@ the logs' ``last_complete`` pointer can move from ``1,1`` to ``1,2``. .. ditaa:: Primary OSD - + +-------------+ | OSD 1 | | log | @@ -971,10 +971,10 @@ the logs' ``last_complete`` pointer can move from ``1,1`` to ``1,2``. | +----+ +<------------+ Ceph Client | | | v2 | | | +----+ | +-------------+ - | |D1v1| 1,1 | - | +----+ | - +------+------+ - | + | |D1v1| 1,1 | + | +----+ | + +------+------+ + | | +-------------+ | | OSD 2 | | | log | @@ -986,7 +986,7 @@ the logs' ``last_complete`` pointer can move from ``1,1`` to ``1,2``. | | |D2v1| 1,1 | | | +----+ | | +-------------+ - | + | | +-------------+ | | OSD 3 | | | log | @@ -1007,7 +1007,7 @@ on **OSD 3**. .. ditaa:: Primary OSD - + +-------------+ | OSD 1 | | log | @@ -1050,7 +1050,7 @@ will be the head of the new authoritative log. | (down) | | c333 | +------+------+ - | + | | +-------------+ | | OSD 2 | | | log | @@ -1059,7 +1059,7 @@ will be the head of the new authoritative log. | | +----+ | | | | | +-------------+ - | + | | +-------------+ | | OSD 3 | | | log | @@ -1079,20 +1079,20 @@ will be the head of the new authoritative log. | 1,1 | | | +------+------+ - + The log entry 1,2 found on **OSD 3** is divergent from the new authoritative log provided by **OSD 4**: it is discarded and the file containing the ``C1v2`` chunk is removed. The ``D1v1`` chunk is rebuilt with the ``decode`` function of -the erasure coding library during scrubbing and stored on the new primary +the erasure coding library during scrubbing and stored on the new primary **OSD 4**. .. ditaa:: Primary OSD - + +-------------+ | OSD 4 | | log | @@ -1140,7 +1140,7 @@ configured to act as a cache tier, and a backing pool of either erasure-coded or relatively slower/cheaper devices configured to act as an economical storage tier. The Ceph objecter handles where to place the objects and the tiering agent determines when to flush objects from the cache to the backing storage -tier. So the cache tier and the backing storage tier are completely transparent +tier. So the cache tier and the backing storage tier are completely transparent to Ceph clients. @@ -1150,14 +1150,14 @@ to Ceph clients. | Ceph Client | +------+------+ ^ - Tiering is | + Tiering is | Transparent | Faster I/O to Ceph | +---------------+ - Client Ops | | | + Client Ops | | | | +----->+ Cache Tier | | | | | | | +-----+---+-----+ - | | | ^ + | | | ^ v v | | Active Data in Cache Tier +------+----+--+ | | | Objecter | | | @@ -1198,11 +1198,11 @@ operations on the outbound data and return the data to the client. A Ceph class for a content management system that presents pictures of a particular size and aspect ratio could take an inbound bitmap image, crop it - to a particular aspect ratio, resize it and embed an invisible copyright or - watermark to help protect the intellectual property; then, save the + to a particular aspect ratio, resize it and embed an invisible copyright or + watermark to help protect the intellectual property; then, save the resulting bitmap image to the object store. 
-See ``src/objclass/objclass.h``, ``src/fooclass.cc`` and ``src/barclass`` for +See ``src/objclass/objclass.h``, ``src/fooclass.cc`` and ``src/barclass`` for exemplary implementations. @@ -1279,7 +1279,7 @@ synchronization/communication channel. +----------+ +----------+ +----------+ +---------------+ | | | | | | | | - | | Watch Object | | + | | Watch Object | | |--------------------------------------------------->| | | | | |<---------------------------------------------------| @@ -1295,7 +1295,7 @@ synchronization/communication channel. | | | | | | |<-----------------| | | | Ack/Commit | - | | Notify | | + | | Notify | | |--------------------------------------------------->| | | | | |<---------------------------------------------------| @@ -1305,7 +1305,7 @@ synchronization/communication channel. | | Notify | | | | |<-----------------| | | | Notify | - | | Ack | | + | | Ack | | |----------------+---------------------------------->| | | | | | | Ack | | @@ -1313,7 +1313,7 @@ synchronization/communication channel. | | | | | | | Ack | | | |----------------->| - | | | | + | | | | |<---------------+----------------+------------------| | Complete @@ -1331,13 +1331,13 @@ volume'. Ceph's striping offers the throughput of RAID 0 striping, the reliability of n-way RAID mirroring and faster recovery. Ceph provides three types of clients: Ceph Block Device, Ceph File System, and -Ceph Object Storage. A Ceph Client converts its data from the representation +Ceph Object Storage. A Ceph Client converts its data from the representation format it provides to its users (a block device image, RESTful objects, CephFS -filesystem directories) into objects for storage in the Ceph Storage Cluster. +filesystem directories) into objects for storage in the Ceph Storage Cluster. -.. tip:: The objects Ceph stores in the Ceph Storage Cluster are not striped. - Ceph Object Storage, Ceph Block Device, and the Ceph File System stripe their - data over multiple Ceph Storage Cluster objects. Ceph Clients that write +.. tip:: The objects Ceph stores in the Ceph Storage Cluster are not striped. + Ceph Object Storage, Ceph Block Device, and the Ceph File System stripe their + data over multiple Ceph Storage Cluster objects. Ceph Clients that write directly to the Ceph Storage Cluster via ``librados`` must perform the striping (and parallel I/O) for themselves to obtain these benefits. 
@@ -1380,7 +1380,7 @@ diagram depicts the simplest form of striping: | End cCCC | | End cCCC | | Object 0 | | Object 1 | \-----------/ \-----------/ - + If you anticipate large images sizes, large S3 or Swift objects (e.g., video), or large CephFS directories, you may see considerable read/write performance @@ -1420,16 +1420,16 @@ stripe (``stripe unit 16``) in the first object in the new object set (``object +-----------------+--------+--------+-----------------+ | | | | +--\ v v v v | - /-----------\ /-----------\ /-----------\ /-----------\ | + /-----------\ /-----------\ /-----------\ /-----------\ | | Begin cCCC| | Begin cCCC| | Begin cCCC| | Begin cCCC| | | Object 0 | | Object 1 | | Object 2 | | Object 3 | | +-----------+ +-----------+ +-----------+ +-----------+ | | stripe | | stripe | | stripe | | stripe | | | unit 0 | | unit 1 | | unit 2 | | unit 3 | | +-----------+ +-----------+ +-----------+ +-----------+ | - | stripe | | stripe | | stripe | | stripe | +-\ + | stripe | | stripe | | stripe | | stripe | +-\ | unit 4 | | unit 5 | | unit 6 | | unit 7 | | Object - +-----------+ +-----------+ +-----------+ +-----------+ +- Set + +-----------+ +-----------+ +-----------+ +-----------+ +- Set | stripe | | stripe | | stripe | | stripe | | 1 | unit 8 | | unit 9 | | unit 10 | | unit 11 | +-/ +-----------+ +-----------+ +-----------+ +-----------+ | @@ -1437,36 +1437,36 @@ stripe (``stripe unit 16``) in the first object in the new object set (``object | unit 12 | | unit 13 | | unit 14 | | unit 15 | | +-----------+ +-----------+ +-----------+ +-----------+ | | End cCCC | | End cCCC | | End cCCC | | End cCCC | | - | Object 0 | | Object 1 | | Object 2 | | Object 3 | | + | Object 0 | | Object 1 | | Object 2 | | Object 3 | | \-----------/ \-----------/ \-----------/ \-----------/ | | +--/ - + +--\ | - /-----------\ /-----------\ /-----------\ /-----------\ | + /-----------\ /-----------\ /-----------\ /-----------\ | | Begin cCCC| | Begin cCCC| | Begin cCCC| | Begin cCCC| | - | Object 4 | | Object 5 | | Object 6 | | Object 7 | | + | Object 4 | | Object 5 | | Object 6 | | Object 7 | | +-----------+ +-----------+ +-----------+ +-----------+ | | stripe | | stripe | | stripe | | stripe | | | unit 16 | | unit 17 | | unit 18 | | unit 19 | | +-----------+ +-----------+ +-----------+ +-----------+ | - | stripe | | stripe | | stripe | | stripe | +-\ + | stripe | | stripe | | stripe | | stripe | +-\ | unit 20 | | unit 21 | | unit 22 | | unit 23 | | Object +-----------+ +-----------+ +-----------+ +-----------+ +- Set - | stripe | | stripe | | stripe | | stripe | | 2 + | stripe | | stripe | | stripe | | stripe | | 2 | unit 24 | | unit 25 | | unit 26 | | unit 27 | +-/ +-----------+ +-----------+ +-----------+ +-----------+ | | stripe | | stripe | | stripe | | stripe | | | unit 28 | | unit 29 | | unit 30 | | unit 31 | | +-----------+ +-----------+ +-----------+ +-----------+ | | End cCCC | | End cCCC | | End cCCC | | End cCCC | | - | Object 4 | | Object 5 | | Object 6 | | Object 7 | | + | Object 4 | | Object 5 | | Object 6 | | Object 7 | | \-----------/ \-----------/ \-----------/ \-----------/ | | +--/ -Three important variables determine how Ceph stripes data: +Three important variables determine how Ceph stripes data: - **Object Size:** Objects in the Ceph Storage Cluster have a maximum configurable size (e.g., 2MB, 4MB, etc.). The object size should be large @@ -1474,24 +1474,24 @@ Three important variables determine how Ceph stripes data: the stripe unit. 
- **Stripe Width:** Stripes have a configurable unit size (e.g., 64kb). - The Ceph Client divides the data it will write to objects into equally - sized stripe units, except for the last stripe unit. A stripe width, - should be a fraction of the Object Size so that an object may contain + The Ceph Client divides the data it will write to objects into equally + sized stripe units, except for the last stripe unit. A stripe width, + should be a fraction of the Object Size so that an object may contain many stripe units. - **Stripe Count:** The Ceph Client writes a sequence of stripe units - over a series of objects determined by the stripe count. The series - of objects is called an object set. After the Ceph Client writes to + over a series of objects determined by the stripe count. The series + of objects is called an object set. After the Ceph Client writes to the last object in the object set, it returns to the first object in the object set. - + .. important:: Test the performance of your striping configuration before putting your cluster into production. You CANNOT change these striping parameters after you stripe the data and write it to objects. Once the Ceph Client has striped data to stripe units and mapped the stripe units to objects, Ceph's CRUSH algorithm maps the objects to placement groups, -and the placement groups to Ceph OSD Daemons before the objects are stored as +and the placement groups to Ceph OSD Daemons before the objects are stored as files on a storage drive. .. note:: Since a client writes to a single pool, all data striped into objects @@ -1515,23 +1515,23 @@ Ceph Clients include a number of service interfaces. These include: that uses ``librbd`` directly--avoiding the kernel object overhead for virtualized systems. -- **Object Storage:** The :term:`Ceph Object Storage` (a.k.a., RGW) service +- **Object Storage:** The :term:`Ceph Object Storage` (a.k.a., RGW) service provides RESTful APIs with interfaces that are compatible with Amazon S3 - and OpenStack Swift. - -- **Filesystem**: The :term:`Ceph File System` (CephFS) service provides - a POSIX compliant filesystem usable with ``mount`` or as + and OpenStack Swift. + +- **Filesystem**: The :term:`Ceph File System` (CephFS) service provides + a POSIX compliant filesystem usable with ``mount`` or as a filesystem in user space (FUSE). Ceph can run additional instances of OSDs, MDSs, and monitors for scalability and high availability. The following diagram depicts the high-level -architecture. +architecture. .. ditaa:: +--------------+ +----------------+ +-------------+ | Block Device | | Object Storage | | CephFS | - +--------------+ +----------------+ +-------------+ + +--------------+ +----------------+ +-------------+ +--------------+ +----------------+ +-------------+ | librbd | | librgw | | libcephfs | @@ -1563,10 +1563,10 @@ another application. .. topic:: S3/Swift Objects and Store Cluster Objects Compared Ceph's Object Storage uses the term *object* to describe the data it stores. - S3 and Swift objects are not the same as the objects that Ceph writes to the + S3 and Swift objects are not the same as the objects that Ceph writes to the Ceph Storage Cluster. Ceph Object Storage objects are mapped to Ceph Storage - Cluster objects. The S3 and Swift objects do not necessarily - correspond in a 1:1 manner with an object stored in the storage cluster. It + Cluster objects. The S3 and Swift objects do not necessarily + correspond in a 1:1 manner with an object stored in the storage cluster. 
It is possible for an S3 or Swift object to map to multiple Ceph objects. See `Ceph Object Storage`_ for details. @@ -1582,7 +1582,7 @@ Ceph Storage Cluster, where each object gets mapped to a placement group and distributed, and the placement groups are spread across separate ``ceph-osd`` daemons throughout the cluster. -.. important:: Striping allows RBD block devices to perform better than a single +.. important:: Striping allows RBD block devices to perform better than a single server could! Thin-provisioned snapshottable Ceph Block Devices are an attractive option for @@ -1591,7 +1591,8 @@ typically deploy a Ceph Block Device with the ``rbd`` network storage driver in QEMU/KVM, where the host machine uses ``librbd`` to provide a block device service to the guest. Many cloud computing stacks use ``libvirt`` to integrate with hypervisors. You can use thin-provisioned Ceph Block Devices with QEMU and -``libvirt`` to support OpenStack and CloudStack among other solutions. +``libvirt`` to support OpenStack, OpenNebula and CloudStack +among other solutions. While we do not provide ``librbd`` support with other hypervisors at this time, you may also use Ceph Block Device kernel objects to provide a block device to a @@ -1616,7 +1617,7 @@ a Filesystem in User Space (FUSE). +-----------------------+ +------------------------+ | CephFS Kernel Object | | CephFS FUSE | - +-----------------------+ +------------------------+ + +-----------------------+ +------------------------+ +---------------------------------------------------+ | CephFS Library (libcephfs) | @@ -1645,9 +1646,9 @@ CephFS separates the metadata from the data, storing the metadata in the MDS, and storing the file data in one or more objects in the Ceph Storage Cluster. The Ceph filesystem aims for POSIX compatibility. ``ceph-mds`` can run as a single process, or it can be distributed out to multiple physical machines, -either for high availability or for scalability. +either for high availability or for scalability. -- **High Availability**: The extra ``ceph-mds`` instances can be `standby`, +- **High Availability**: The extra ``ceph-mds`` instances can be `standby`, ready to take over the duties of any failed ``ceph-mds`` that was `active`. This is easy because all the data, including the journal, is stored on RADOS. The transition is triggered automatically by ``ceph-mon``. diff --git a/doc/install/index.rst b/doc/install/index.rst index d8e9ca3a63e..82585edd8b8 100644 --- a/doc/install/index.rst +++ b/doc/install/index.rst @@ -4,13 +4,13 @@ Installing Ceph =============== -There are multiple ways to install Ceph. +There are multiple ways to install Ceph. Recommended methods ~~~~~~~~~~~~~~~~~~~ :ref:`Cephadm ` is a tool that can be used to -install and manage a Ceph cluster. +install and manage a Ceph cluster. * cephadm supports only Octopus and newer releases. * cephadm is fully integrated with the orchestration API and fully supports the @@ -59,6 +59,8 @@ tool that can be used to quickly deploy clusters. It is deprecated. `github.com/openstack/puppet-ceph `_ installs Ceph via Puppet. +`OpenNebula HCI clusters `_ deploys Ceph on various cloud platforms. + Ceph can also be :ref:`installed manually `. diff --git a/doc/rbd/index.rst b/doc/rbd/index.rst index 4a8029bbaee..96f1e138978 100644 --- a/doc/rbd/index.rst +++ b/doc/rbd/index.rst @@ -32,9 +32,9 @@ the ``librbd`` library. 
Ceph's block devices deliver high performance with vast scalability to `kernel modules`_, or to :abbr:`KVMs (kernel virtual machines)` such as `QEMU`_, and -cloud-based computing systems like `OpenStack`_ and `CloudStack`_ that rely on -libvirt and QEMU to integrate with Ceph block devices. You can use the same cluster -to operate the :ref:`Ceph RADOS Gateway `, the +cloud-based computing systems like `OpenStack`_, `OpenNebula`_ and `CloudStack`_ +that rely on libvirt and QEMU to integrate with Ceph block devices. You can use +the same cluster to operate the :ref:`Ceph RADOS Gateway `, the :ref:`Ceph File System `, and Ceph block devices simultaneously. .. important:: To use Ceph Block Devices, you must have access to a running @@ -69,4 +69,5 @@ to operate the :ref:`Ceph RADOS Gateway `, the .. _kernel modules: ./rbd-ko/ .. _QEMU: ./qemu-rbd/ .. _OpenStack: ./rbd-openstack +.. _OpenNebula: https://docs.opennebula.io/stable/open_cluster_deployment/storage_setup/ceph_ds.html .. _CloudStack: ./rbd-cloudstack diff --git a/doc/rbd/libvirt.rst b/doc/rbd/libvirt.rst index e3523f8a800..a55a4f95b79 100644 --- a/doc/rbd/libvirt.rst +++ b/doc/rbd/libvirt.rst @@ -4,11 +4,11 @@ .. index:: Ceph Block Device; livirt -The ``libvirt`` library creates a virtual machine abstraction layer between -hypervisor interfaces and the software applications that use them. With -``libvirt``, developers and system administrators can focus on a common +The ``libvirt`` library creates a virtual machine abstraction layer between +hypervisor interfaces and the software applications that use them. With +``libvirt``, developers and system administrators can focus on a common management framework, common API, and common shell interface (i.e., ``virsh``) -to many different hypervisors, including: +to many different hypervisors, including: - QEMU/KVM - XEN @@ -18,7 +18,7 @@ to many different hypervisors, including: Ceph block devices support QEMU/KVM. You can use Ceph block devices with software that interfaces with ``libvirt``. The following stack diagram -illustrates how ``libvirt`` and QEMU use Ceph block devices via ``librbd``. +illustrates how ``libvirt`` and QEMU use Ceph block devices via ``librbd``. .. ditaa:: @@ -41,10 +41,11 @@ illustrates how ``libvirt`` and QEMU use Ceph block devices via ``librbd``. The most common ``libvirt`` use case involves providing Ceph block devices to -cloud solutions like OpenStack or CloudStack. The cloud solution uses +cloud solutions like OpenStack, OpenNebula or CloudStack. The cloud solution uses ``libvirt`` to interact with QEMU/KVM, and QEMU/KVM interacts with Ceph block -devices via ``librbd``. See `Block Devices and OpenStack`_ and `Block Devices -and CloudStack`_ for details. See `Installation`_ for installation details. +devices via ``librbd``. See `Block Devices and OpenStack`_, +`Block Devices and OpenNebula`_ and `Block Devices and CloudStack`_ for details. +See `Installation`_ for installation details. You can also use Ceph block devices with ``libvirt``, ``virsh`` and the ``libvirt`` API. See `libvirt Virtualization API`_ for details. @@ -62,12 +63,12 @@ Configuring Ceph To configure Ceph for use with ``libvirt``, perform the following steps: -#. `Create a pool`_. The following example uses the +#. `Create a pool`_. The following example uses the pool name ``libvirt-pool``.:: ceph osd pool create libvirt-pool - Verify the pool exists. :: + Verify the pool exists. 
:: ceph osd lspools @@ -80,23 +81,23 @@ To configure Ceph for use with ``libvirt``, perform the following steps: and references ``libvirt-pool``. :: ceph auth get-or-create client.libvirt mon 'profile rbd' osd 'profile rbd pool=libvirt-pool' - - Verify the name exists. :: - + + Verify the name exists. :: + ceph auth ls - **NOTE**: ``libvirt`` will access Ceph using the ID ``libvirt``, - not the Ceph name ``client.libvirt``. See `User Management - User`_ and - `User Management - CLI`_ for a detailed explanation of the difference - between ID and name. + **NOTE**: ``libvirt`` will access Ceph using the ID ``libvirt``, + not the Ceph name ``client.libvirt``. See `User Management - User`_ and + `User Management - CLI`_ for a detailed explanation of the difference + between ID and name. -#. Use QEMU to `create an image`_ in your RBD pool. +#. Use QEMU to `create an image`_ in your RBD pool. The following example uses the image name ``new-libvirt-image`` and references ``libvirt-pool``. :: qemu-img create -f rbd rbd:libvirt-pool/new-libvirt-image 2G - Verify the image exists. :: + Verify the image exists. :: rbd -p libvirt-pool ls @@ -111,7 +112,7 @@ To configure Ceph for use with ``libvirt``, perform the following steps: admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok The ``client.libvirt`` section name should match the cephx user you created - above. + above. If SELinux or AppArmor is enabled, note that this could prevent the client process (qemu via libvirt) from doing some operations, such as writing logs or operate the images or admin socket to the destination locations (``/var/ @@ -123,7 +124,7 @@ Preparing the VM Manager ======================== You may use ``libvirt`` without a VM manager, but you may find it simpler to -create your first domain with ``virt-manager``. +create your first domain with ``virt-manager``. #. Install a virtual machine manager. See `KVM/VirtManager`_ for details. :: @@ -131,7 +132,7 @@ create your first domain with ``virt-manager``. #. Download an OS image (if necessary). -#. Launch the virtual machine manager. :: +#. Launch the virtual machine manager. :: sudo virt-manager @@ -142,12 +143,12 @@ Creating a VM To create a VM with ``virt-manager``, perform the following steps: -#. Press the **Create New Virtual Machine** button. +#. Press the **Create New Virtual Machine** button. #. Name the new virtual machine domain. In the exemplary embodiment, we use the name ``libvirt-virtual-machine``. You may use any name you wish, - but ensure you replace ``libvirt-virtual-machine`` with the name you - choose in subsequent commandline and configuration examples. :: + but ensure you replace ``libvirt-virtual-machine`` with the name you + choose in subsequent commandline and configuration examples. :: libvirt-virtual-machine @@ -155,9 +156,9 @@ To create a VM with ``virt-manager``, perform the following steps: /path/to/image/recent-linux.img - **NOTE:** Import a recent image. Some older images may not rescan for + **NOTE:** Import a recent image. Some older images may not rescan for virtual devices properly. - + #. Configure and start the VM. #. You may use ``virsh list`` to verify the VM domain exists. :: @@ -179,11 +180,11 @@ you that root privileges are required. For a reference of ``virsh`` commands, refer to `Virsh Command Reference`_. -#. Open the configuration file with ``virsh edit``. :: +#. Open the configuration file with ``virsh edit``. :: sudo virsh edit {vm-domain-name} - Under ```` there should be a ```` entry. 
:: + Under ```` there should be a ```` entry. :: /usr/bin/kvm @@ -196,18 +197,18 @@ commands, refer to `Virsh Command Reference`_. Replace ``/path/to/image/recent-linux.img`` with the path to the OS image. - The minimum kernel for using the faster ``virtio`` bus is 2.6.25. See + The minimum kernel for using the faster ``virtio`` bus is 2.6.25. See `Virtio`_ for details. - **IMPORTANT:** Use ``sudo virsh edit`` instead of a text editor. If you edit - the configuration file under ``/etc/libvirt/qemu`` with a text editor, - ``libvirt`` may not recognize the change. If there is a discrepancy between - the contents of the XML file under ``/etc/libvirt/qemu`` and the result of - ``sudo virsh dumpxml {vm-domain-name}``, then your VM may not work + **IMPORTANT:** Use ``sudo virsh edit`` instead of a text editor. If you edit + the configuration file under ``/etc/libvirt/qemu`` with a text editor, + ``libvirt`` may not recognize the change. If there is a discrepancy between + the contents of the XML file under ``/etc/libvirt/qemu`` and the result of + ``sudo virsh dumpxml {vm-domain-name}``, then your VM may not work properly. - -#. Add the Ceph RBD image you created as a ```` entry. :: + +#. Add the Ceph RBD image you created as a ```` entry. :: @@ -216,21 +217,21 @@ commands, refer to `Virsh Command Reference`_. - Replace ``{monitor-host}`` with the name of your host, and replace the - pool and/or image name as necessary. You may add multiple ```` + Replace ``{monitor-host}`` with the name of your host, and replace the + pool and/or image name as necessary. You may add multiple ```` entries for your Ceph monitors. The ``dev`` attribute is the logical - device name that will appear under the ``/dev`` directory of your - VM. The optional ``bus`` attribute indicates the type of disk device to - emulate. The valid settings are driver specific (e.g., "ide", "scsi", + device name that will appear under the ``/dev`` directory of your + VM. The optional ``bus`` attribute indicates the type of disk device to + emulate. The valid settings are driver specific (e.g., "ide", "scsi", "virtio", "xen", "usb" or "sata"). - + See `Disks`_ for details of the ```` element, and its child elements and attributes. - + #. Save the file. -#. If your Ceph Storage Cluster has `Ceph Authentication`_ enabled (it does by - default), you must generate a secret. :: +#. If your Ceph Storage Cluster has `Ceph Authentication`_ enabled (it does by + default), you must generate a secret. :: cat > secret.xml < @@ -249,11 +250,11 @@ commands, refer to `Virsh Command Reference`_. ceph auth get-key client.libvirt | sudo tee client.libvirt.key -#. Set the UUID of the secret. :: +#. Set the UUID of the secret. :: sudo virsh secret-set-value --secret {uuid of secret} --base64 $(cat client.libvirt.key) && rm client.libvirt.key secret.xml - You must also set the secret manually by adding the following ```` + You must also set the secret manually by adding the following ```` entry to the ```` element you entered earlier (replacing the ``uuid`` value with the result from the command line example above). :: @@ -266,14 +267,14 @@ commands, refer to `Virsh Command Reference`_. - `` exists:: - + virsh domblklist {vm-domain-name} --details -If everything looks okay, you may begin using the Ceph block device +If everything looks okay, you may begin using the Ceph block device within your VM. .. _Installation: ../../install .. _libvirt Virtualization API: http://www.libvirt.org .. _Block Devices and OpenStack: ../rbd-openstack +.. 
_Block Devices and OpenNebula: https://docs.opennebula.io/stable/open_cluster_deployment/storage_setup/ceph_ds.html#datastore-internals .. _Block Devices and CloudStack: ../rbd-cloudstack .. _Create a pool: ../../rados/operations/pools#create-a-pool .. _Create a Ceph User: ../../rados/operations/user-management#add-a-user diff --git a/doc/rbd/rbd-snapshot.rst b/doc/rbd/rbd-snapshot.rst index 120dd8ec125..4a4309f8e7d 100644 --- a/doc/rbd/rbd-snapshot.rst +++ b/doc/rbd/rbd-snapshot.rst @@ -10,7 +10,7 @@ you can create snapshots of images to retain point-in-time state history. Ceph also supports snapshot layering, which allows you to clone images (for example, VM images) quickly and easily. Ceph block device snapshots are managed using the ``rbd`` command and several higher-level interfaces, including `QEMU`_, -`libvirt`_, `OpenStack`_, and `CloudStack`_. +`libvirt`_, `OpenStack`_, `OpenNebula`_ and `CloudStack`_. .. important:: To use RBD snapshots, you must have a running Ceph cluster. @@ -18,14 +18,14 @@ the ``rbd`` command and several higher-level interfaces, including `QEMU`_, .. note:: Because RBD is unaware of any file system within an image (volume), snapshots are merely `crash-consistent` unless they are coordinated within the mounting (attaching) operating system. We therefore recommend that you - pause or stop I/O before taking a snapshot. - + pause or stop I/O before taking a snapshot. + If the volume contains a file system, the file system should be in an internally consistent state before a snapshot is taken. Snapshots taken without write quiescing could need an `fsck` pass before they are mounted again. To quiesce I/O you can use `fsfreeze` command. See the `fsfreeze(8)` - man page for more details. - + man page for more details. + For virtual machines, `qemu-guest-agent` can be used to automatically freeze file systems when creating a snapshot. @@ -44,7 +44,7 @@ Cephx Notes When `cephx`_ authentication is enabled (it is by default), you must specify a user name or ID and a path to the keyring containing the corresponding key. See -:ref:`User Management ` for details. +:ref:`User Management ` for details. .. prompt:: bash $ @@ -83,7 +83,7 @@ For example: .. prompt:: bash $ rbd snap create rbd/foo@snapname - + List Snapshots -------------- @@ -135,7 +135,7 @@ name, the image name, and the snap name: .. prompt:: bash $ rbd snap rm {pool-name}/{image-name}@{snap-name} - + For example: .. prompt:: bash $ @@ -186,20 +186,20 @@ snapshot simplifies semantics, making it possible to create clones rapidly. | | to Parent | | | (read only) | | (writable) | +-------------+ +-------------+ - + Parent Child .. note:: The terms "parent" and "child" refer to a Ceph block device snapshot (parent) and the corresponding image cloned from the snapshot (child). These terms are important for the command line usage below. - + Each cloned image (child) stores a reference to its parent image, which enables the cloned image to open the parent snapshot and read it. A copy-on-write clone of a snapshot behaves exactly like any other Ceph block device image. You can read to, write from, clone, and resize cloned images. There are no special restrictions with cloned images. However, the -copy-on-write clone of a snapshot depends on the snapshot, so you must +copy-on-write clone of a snapshot depends on the snapshot, so you must protect the snapshot before you clone it. The diagram below depicts this process. @@ -222,7 +222,7 @@ have performed these steps, you can begin cloning the snapshot. 
| | | | +----------------------------+ +-----------------------------+ | - +--------------------------------------+ + +--------------------------------------+ | v +----------------------------+ +-----------------------------+ @@ -265,7 +265,7 @@ Protecting a Snapshot --------------------- Clones access the parent snapshots. All clones would break if a user -inadvertently deleted the parent snapshot. To prevent data loss, you must +inadvertently deleted the parent snapshot. To prevent data loss, you must protect the snapshot before you can clone it: .. prompt:: bash $ @@ -290,13 +290,13 @@ protect the snapshot before you can clone it: .. prompt:: bash $ rbd clone {pool-name}/{parent-image-name}@{snap-name} {pool-name}/{child-image-name} - + For example: .. prompt:: bash $ rbd clone rbd/foo@snapname rbd/bar - + .. note:: You may clone a snapshot from one pool to an image in another pool. For example, you may maintain read-only images and snapshots as templates in @@ -364,5 +364,6 @@ For example: .. _cephx: ../../rados/configuration/auth-config-ref/ .. _QEMU: ../qemu-rbd/ .. _OpenStack: ../rbd-openstack/ +.. _OpenNebula: https://docs.opennebula.io/stable/management_and_operations/vm_management/vm_instances.html?highlight=ceph#managing-disk-snapshots .. _CloudStack: ../rbd-cloudstack/ .. _libvirt: ../libvirt/ diff --git a/doc/start/documenting-ceph.rst b/doc/start/documenting-ceph.rst index d94e87f6d74..fef870f0086 100644 --- a/doc/start/documenting-ceph.rst +++ b/doc/start/documenting-ceph.rst @@ -5,7 +5,7 @@ ================== You can help the Ceph project by contributing to the documentation. Even -small contributions help the Ceph project. +small contributions help the Ceph project. The easiest way to suggest a correction to the documentation is to send an email to `ceph-users@ceph.io`. Include the string "ATTN: DOCS" or @@ -27,7 +27,7 @@ Location of the Documentation in the Repository =============================================== The Ceph documentation source is in the ``ceph/doc`` directory of the Ceph -repository. Python Sphinx renders the source into HTML and manpages. +repository. Python Sphinx renders the source into HTML and manpages. Viewing Old Ceph Documentation ============================== @@ -113,27 +113,27 @@ this, you must: The Ceph documentation is organized by component: -- **Ceph Storage Cluster:** The Ceph Storage Cluster documentation is +- **Ceph Storage Cluster:** The Ceph Storage Cluster documentation is in the ``doc/rados`` directory. - -- **Ceph Block Device:** The Ceph Block Device documentation is in + +- **Ceph Block Device:** The Ceph Block Device documentation is in the ``doc/rbd`` directory. - -- **Ceph Object Storage:** The Ceph Object Storage documentation is in + +- **Ceph Object Storage:** The Ceph Object Storage documentation is in the ``doc/radosgw`` directory. -- **Ceph File System:** The Ceph File System documentation is in the +- **Ceph File System:** The Ceph File System documentation is in the ``doc/cephfs`` directory. - + - **Installation (Quick):** Quick start documentation is in the ``doc/start`` directory. - + - **Installation (Manual):** Documentaton concerning the manual installation of Ceph is in the ``doc/install`` directory. - + - **Manpage:** Manpage source is in the ``doc/man`` directory. -- **Developer:** Developer documentation is in the ``doc/dev`` +- **Developer:** Developer documentation is in the ``doc/dev`` directory. 
- **Images:** Images including JPEG and PNG files are stored in the @@ -152,7 +152,7 @@ are in the current release. ``main`` is the most commonly used branch. : git checkout main -When you make changes to documentation that affect an upcoming release, use +When you make changes to documentation that affect an upcoming release, use the ``next`` branch. ``next`` is the second most commonly used branch. : .. prompt:: bash $ @@ -206,8 +206,8 @@ or a table of contents entry. The ``index.rst`` file of a top-level directory usually contains a TOC, where you can add the new file name. All documents must have a title. See `Headings`_ for details. -Your new document doesn't get tracked by ``git`` automatically. When you want -to add the document to the repository, you must use ``git add +Your new document doesn't get tracked by ``git`` automatically. When you want +to add the document to the repository, you must use ``git add {path-to-filename}``. For example, from the top level directory of the repository, adding an ``example.rst`` file to the ``rados`` subdirectory would look like this: @@ -307,6 +307,7 @@ the following packages are required: - graphviz - ant - ditaa +- cython3 .. raw:: html @@ -354,7 +355,7 @@ distributions, execute the following: .. prompt:: bash $ sudo apt-get install gcc python-dev python3-pip libxml2-dev libxslt-dev doxygen graphviz ant ditaa - sudo apt-get install python3-sphinx python3-venv + sudo apt-get install python3-sphinx python3-venv cython3 For Fedora distributions, execute the following: @@ -436,39 +437,39 @@ Ceph documentation commits are simple, but follow a strict convention: - A commit MUST have a comment. - A commit comment MUST be prepended with ``doc:``. (strict) - The comment summary MUST be one line only. (strict) -- Additional comments MAY follow a blank line after the summary, +- Additional comments MAY follow a blank line after the summary, but should be terse. - A commit MAY include ``Fixes: https://tracker.ceph.com/issues/{bug number}``. - Commits MUST include ``Signed-off-by: Firstname Lastname ``. (strict) -.. tip:: Follow the foregoing convention particularly where it says - ``(strict)`` or you will be asked to modify your commit to comply with +.. tip:: Follow the foregoing convention particularly where it says + ``(strict)`` or you will be asked to modify your commit to comply with this convention. -The following is a common commit comment (preferred):: +The following is a common commit comment (preferred):: doc: Fixes a spelling error and a broken hyperlink. - + Signed-off-by: John Doe -The following comment includes a reference to a bug. :: +The following comment includes a reference to a bug. :: doc: Fixes a spelling error and a broken hyperlink. Fixes: https://tracker.ceph.com/issues/1234 - + Signed-off-by: John Doe The following comment includes a terse sentence following the comment summary. -There is a carriage return between the summary line and the description:: +There is a carriage return between the summary line and the description:: doc: Added mon setting to monitor config reference - + Describes 'mon setting', which is a new setting added to config_opts.h. - + Signed-off-by: John Doe @@ -477,7 +478,7 @@ To commit changes, execute the following: .. prompt:: bash $ git commit -a - + An easy way to manage your documentation commits is to use visual tools for ``git``. 
For example, ``gitk`` provides a graphical interface for viewing the @@ -504,7 +505,7 @@ Then, execute: cd {git-ceph-repo-path} gitk - + Finally, select **File->Start git gui** to activate the graphical user interface. @@ -546,15 +547,15 @@ commits will be squashed into a single commit. #. Make the commits that you will later squash. #. Make the first commit. - + :: - + doc/glossary: improve "CephX" entry - + Improve the glossary entry for "CephX". - + Signed-off-by: Zac Dover - + # Please enter the commit message for your changes. Lines starting # with '#' will be ignored, and an empty message aborts the commit. # @@ -562,18 +563,18 @@ commits will be squashed into a single commit. # Changes to be committed: # modified: glossary.rst # - + #. Make the second commit. - + :: - + doc/glossary: add link to architecture doc - + Add a link to a section in the architecture document, which link will be used in the process of improving the "CephX" glossary entry. - + Signed-off-by: Zac Dover - + # Please enter the commit message for your changes. Lines starting # with '#' will be ignored, and an empty message aborts the commit. # @@ -582,18 +583,18 @@ commits will be squashed into a single commit. # # Changes to be committed: # modified: architecture.rst - + #. Make the third commit. - + :: - + doc/glossary: link to Arch doc in "CephX" glossary - + Link to the Architecture document from the "CephX" entry in the Glossary. - + Signed-off-by: Zac Dover - + # Please enter the commit message for your changes. Lines starting # with '#' will be ignored, and an empty message aborts the commit. # @@ -604,24 +605,24 @@ commits will be squashed into a single commit. # modified: glossary.rst #. There are now three commits in the feature branch. We will now begin the - process of squashing them into a single commit. - - #. Run the command ``git rebase -i main``, which rebases the current branch + process of squashing them into a single commit. + + #. Run the command ``git rebase -i main``, which rebases the current branch (the feature branch) against the ``main`` branch: .. prompt:: bash - + git rebase -i main - + #. A list of the commits that have been made to the feature branch now appear, and looks like this: :: - + pick d395e500883 doc/glossary: improve "CephX" entry pick b34986e2922 doc/glossary: add link to architecture doc pick 74d0719735c doc/glossary: link to Arch doc in "CephX" glossary - + # Rebase 0793495b9d1..74d0719735c onto 0793495b9d1 (3 commands) # # Commands: @@ -650,7 +651,7 @@ commits will be squashed into a single commit. # # If you remove a line here THAT COMMIT WILL BE LOST. - Find the part of the screen that says "pick". This is the part that you will + Find the part of the screen that says "pick". This is the part that you will alter. There are three commits that are currently labeled "pick". We will choose one of them to remain labeled "pick", and we will label the other two commits "squash". @@ -662,7 +663,7 @@ commits will be squashed into a single commit. pick d395e500883 doc/glossary: improve "CephX" entry squash b34986e2922 doc/glossary: add link to architecture doc squash 74d0719735c doc/glossary: link to Arch doc in "CephX" glossary - + # Rebase 0793495b9d1..74d0719735c onto 0793495b9d1 (3 commands) # # Commands: @@ -699,34 +700,34 @@ commits will be squashed into a single commit. like this: :: - + # This is a combination of 3 commits. # This is the 1st commit message: - + doc/glossary: improve "CephX" entry - + Improve the glossary entry for "CephX". 
- + Signed-off-by: Zac Dover - + # This is the commit message #2: - + doc/glossary: add link to architecture doc - + Add a link to a section in the architecture document, which link will be used in the process of improving the "CephX" glossary entry. - + Signed-off-by: Zac Dover - + # This is the commit message #3: - + doc/glossary: link to Arch doc in "CephX" glossary - + Link to the Architecture document from the "CephX" entry in the Glossary. - + Signed-off-by: Zac Dover - + # Please enter the commit message for your changes. Lines starting # with '#' will be ignored, and an empty message aborts the commit. # @@ -742,17 +743,17 @@ commits will be squashed into a single commit. # Changes to be committed: # modified: doc/architecture.rst # modified: doc/glossary.rst - - #. The commit messages have been revised into the simpler form presented here: - + + #. The commit messages have been revised into the simpler form presented here: + :: - + doc/glossary: improve "CephX" entry - + Improve the glossary entry for "CephX". - + Signed-off-by: Zac Dover - + # Please enter the commit message for your changes. Lines starting # with '#' will be ignored, and an empty message aborts the commit. # @@ -771,13 +772,13 @@ commits will be squashed into a single commit. #. Force push the squashed commit from your local working copy to the remote upstream branch. The force push is necessary because the newly squashed commit - does not have an ancestor in the remote. If that confuses you, just run this + does not have an ancestor in the remote. If that confuses you, just run this command and don't think too much about it: - .. prompt:: bash $ + .. prompt:: bash $ git push -f - + :: Enumerating objects: 9, done. @@ -821,17 +822,17 @@ Review the following style guides to maintain this consistency. Headings -------- -#. **Document Titles:** Document titles use the ``=`` character overline and - underline with a leading and trailing space on the title text line. +#. **Document Titles:** Document titles use the ``=`` character overline and + underline with a leading and trailing space on the title text line. See `Document Title`_ for details. #. **Section Titles:** Section tiles use the ``=`` character underline with no - leading or trailing spaces for text. Two carriage returns should precede a + leading or trailing spaces for text. Two carriage returns should precede a section title (unless an inline reference precedes it). See `Sections`_ for details. -#. **Subsection Titles:** Subsection titles use the ``_`` character underline - with no leading or trailing spaces for text. Two carriage returns should +#. **Subsection Titles:** Subsection titles use the ``_`` character underline + with no leading or trailing spaces for text. Two carriage returns should precede a subsection title (unless an inline reference precedes it). @@ -843,18 +844,18 @@ a command line interface without leading or trailing white space. Where possible, we prefer to maintain this convention with text, lists, literal text (exceptions allowed), tables, and ``ditaa`` graphics. -#. **Paragraphs**: Paragraphs have a leading and a trailing carriage return, - and should be 80 characters wide or less so that the documentation can be +#. **Paragraphs**: Paragraphs have a leading and a trailing carriage return, + and should be 80 characters wide or less so that the documentation can be read in native format in a command line terminal. #. 
**Literal Text:** To create an example of literal text (e.g., command line usage), terminate the preceding paragraph with ``::`` or enter a carriage return to create an empty line after the preceding paragraph; then, enter ``::`` on a separate line followed by another empty line. Then, begin the - literal text with tab indentation (preferred) or space indentation of 3 + literal text with tab indentation (preferred) or space indentation of 3 characters. -#. **Indented Text:** Indented text such as bullet points +#. **Indented Text:** Indented text such as bullet points (e.g., ``- some text``) may span multiple lines. The text of subsequent lines should begin at the same character position as the text of the indented text (less numbers, bullets, etc.). @@ -867,13 +868,13 @@ possible, we prefer to maintain this convention with text, lists, literal text #. **Numbered Lists:** Numbered lists should use autonumbering by starting a numbered indent with ``#.`` instead of the actual number so that - numbered paragraphs can be repositioned without requiring manual + numbered paragraphs can be repositioned without requiring manual renumbering. -#. **Code Examples:** Ceph supports the use of the - ``.. code-block::`` role, so that you can add highlighting to - source examples. This is preferred for source code. However, use of this - tag will cause autonumbering to restart at 1 if it is used as an example +#. **Code Examples:** Ceph supports the use of the + ``.. code-block::`` role, so that you can add highlighting to + source examples. This is preferred for source code. However, use of this + tag will cause autonumbering to restart at 1 if it is used as an example within a numbered list. See `Showing code examples`_ for details. @@ -894,12 +895,12 @@ The Ceph project uses `paragraph level markup`_ to highlight points. #. **Version Added:** Use the ``.. versionadded::`` directive for new features or configuration settings so that users know the minimum release for using a feature. - + #. **Version Changed:** Use the ``.. versionchanged::`` directive for changes in usage or configuration settings. -#. **Deprecated:** Use the ``.. deprecated::`` directive when CLI usage, - a feature or a configuration setting is no longer preferred or will be +#. **Deprecated:** Use the ``.. deprecated::`` directive when CLI usage, + a feature or a configuration setting is no longer preferred or will be discontinued. #. **Topic:** Use the ``.. topic::`` directive to encapsulate text that is @@ -917,7 +918,7 @@ Every document (every ``.rst`` file) in the Sphinx-controlled Ceph documentation suite must be linked either (1) from another document in the documentation suite or (2) from a table of contents (TOC). If any document in the documentation suite is not linked in this way, the ``build-doc`` script -generates warnings when it tries to build the documentation. +generates warnings when it tries to build the documentation. The Ceph project uses the ``.. toctree::`` directive. See `The TOC tree`_ for details. When rendering a table of contents (TOC), specify the ``:maxdepth:`` @@ -943,16 +944,16 @@ to refer explicitly to the title of the section being linked to. For example, RST that links to the Sphinx Python Document Generator homepage and generates a sentence reading "Click here to learn more about Python -Sphinx." looks like this: +Sphinx." looks like this: :: ``Click `here `_ to learn more about Python - Sphinx.`` + Sphinx.`` And here it is, rendered: -Click `here `_ to learn more about Python Sphinx. 
+Click `here `_ to learn more about Python Sphinx. Pay special attention to the underscore after the backtick. If you forget to include it and this is your first day working with RST, there's a chance that @@ -998,8 +999,8 @@ addresses external to the Ceph documentation: `inline text `_ .. note:: Do not fail to include the space between the inline text and the - less-than sign. - + less-than sign. + Do not fail to include the underscore after the final backtick. To link to addresses that are external to the Ceph documentation, include a @@ -1041,7 +1042,7 @@ Link to target with inline text:: :ref:`inline text` -.. note:: +.. note:: There is no space between "inline text" and the angle bracket that immediately follows it. This is precisely the opposite of :ref:`the @@ -1053,7 +1054,7 @@ Escaping Bold Characters within Words ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This section explains how to make certain letters within a word bold while -leaving the other letters in the word regular (non-bold). +leaving the other letters in the word regular (non-bold). The following single-line paragraph provides an example of this: diff --git a/doc/start/hardware-recommendations.rst b/doc/start/hardware-recommendations.rst index c1bff769b28..e41c2eaa45e 100644 --- a/doc/start/hardware-recommendations.rst +++ b/doc/start/hardware-recommendations.rst @@ -5,17 +5,17 @@ ========================== Ceph is designed to run on commodity hardware, which makes building and -maintaining petabyte-scale data clusters flexible and economically feasible. -When planning your cluster's hardware, you will need to balance a number +maintaining petabyte-scale data clusters flexible and economically feasible. +When planning your cluster's hardware, you will need to balance a number of considerations, including failure domains, cost, and performance. -Hardware planning should include distributing Ceph daemons and -other processes that use Ceph across many hosts. Generally, we recommend -running Ceph daemons of a specific type on a host configured for that type -of daemon. We recommend using separate hosts for processes that utilize your -data cluster (e.g., OpenStack, CloudStack, Kubernetes, etc). +Hardware planning should include distributing Ceph daemons and +other processes that use Ceph across many hosts. Generally, we recommend +running Ceph daemons of a specific type on a host configured for that type +of daemon. We recommend using separate hosts for processes that utilize your +data cluster (e.g., OpenStack, OpenNebula, CloudStack, Kubernetes, etc). The requirements of one Ceph cluster are not the same as the requirements of -another, but below are some general guidelines. +another, but below are some general guidelines. .. tip:: check out the `ceph blog`_ too. @@ -106,7 +106,7 @@ that the OSD attempts to consume by changing the :confval:`osd_memory_target` configuration option. - Setting the :confval:`osd_memory_target` below 2GB is not - recommended. Ceph may fail to keep the memory consumption under 2GB and + recommended. Ceph may fail to keep the memory consumption under 2GB and extremely slow performance is likely. - Setting the memory target between 2GB and 4GB typically works but may result @@ -118,7 +118,7 @@ configuration option. OSD performance. - Setting the :confval:`osd_memory_target` higher than 4GB can improve - performance when there many (small) objects or when large (256GB/OSD + performance when there are many (small) objects or when large (256GB/OSD or more) data sets are processed.
This is especially true with fast NVMe OSDs. @@ -130,7 +130,7 @@ configuration option. fragmented huge pages. Modern versions of Ceph disable transparent huge pages at the application level to avoid this, but that does not guarantee that the kernel will immediately reclaim unmapped memory. The OSD - may still at times exceed its memory target. We recommend budgeting + may still at times exceed its memory target. We recommend budgeting at least 20% extra memory on your system to prevent OSDs from going OOM (**O**\ut **O**\f **M**\emory) during temporary spikes or due to delay in the kernel reclaiming freed pages. That 20% value might be more or less than @@ -193,11 +193,11 @@ per gigabyte (i.e., $150 / 3072 = 0.0488). In the foregoing example, using the .. tip:: Hosting multiple OSDs on a single SAS / SATA HDD is **NOT** a good idea. -.. tip:: Hosting an OSD with monitor, manager, or MDS data on a single +.. tip:: Hosting an OSD with monitor, manager, or MDS data on a single drive is also **NOT** a good idea. .. tip:: With spinning disks, the SATA and SAS interface increasingly - becomes a bottleneck at larger capacities. See also the `Storage Networking + becomes a bottleneck at larger capacities. See also the `Storage Networking Industry Association's Total Cost of Ownership calculator`_. @@ -219,7 +219,7 @@ Solid State Drives ------------------ Ceph performance is much improved when using solid-state drives (SSDs). This -reduces random access time and reduces latency while increasing throughput. +reduces random access time and reduces latency while increasing throughput. SSDs cost more per gigabyte than do HDDs but SSDs often offer access times that are, at a minimum, 100 times faster than HDDs. @@ -236,10 +236,10 @@ to many of the limitations of HDDs. SSDs do have significant limitations though. When evaluating SSDs, it is important to consider the performance of sequential and random reads and writes. -.. important:: We recommend exploring the use of SSDs to improve performance. +.. important:: We recommend exploring the use of SSDs to improve performance. However, before making a significant investment in SSDs, we **strongly recommend** reviewing the performance metrics of an SSD and testing the - SSD in a test configuration in order to gauge performance. + SSD in a test configuration in order to gauge performance. Relatively inexpensive SSDs may appeal to your sense of economy. Use caution. Acceptable IOPS are not the only factor to consider when selecting SSDs for @@ -317,7 +317,7 @@ An HBA-free system may also cost hundreds of US dollars less every year if one purchases an annual maintenance contract or extended warranty. .. tip:: The `Ceph blog`_ is often an excellent source of information on Ceph - performance issues. See `Ceph Write Throughput 1`_ and `Ceph Write + performance issues. See `Ceph Write Throughput 1`_ and `Ceph Write Throughput 2`_ for additional details. @@ -490,7 +490,7 @@ The faster that a placement group (PG) can recover from a degraded state to an ``active + clean`` state, the better. Notably, fast recovery minimizes the likelihood of multiple, overlapping failures that can cause data to become temporarily unavailable or even lost. Of course, when provisioning your -network, you will have to balance price against performance. +network, you will have to balance price against performance. Some deployment tools employ VLANs to make hardware and network cabling more manageable. 
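As a rough illustration of the arithmetic behind the memory and cost guidance in the hardware-recommendations hunks above — the $150 / 3072 = 0.0488 price-per-gigabyte example and the recommendation to budget at least 20% extra RAM on top of the per-OSD memory target — the following sketch can be used. The OSD count, drive price, and the 4 GB per-OSD target used here are assumed example figures, not values taken from the patch.

.. code-block:: python

   # Illustrative arithmetic only: the drive price, capacity, OSD count, and
   # per-OSD memory target are assumed example figures.

   def price_per_gb(price_usd: float, capacity_gb: float) -> float:
       """Cost per gigabyte of a drive, e.g. 150 / 3072 ~= 0.0488 USD/GB."""
       return price_usd / capacity_gb

   def host_ram_budget_gb(osds_per_host: int,
                          osd_memory_target_gb: float = 4.0,
                          headroom: float = 0.20) -> float:
       """Per-host RAM to budget for OSD daemons: the per-OSD memory target
       plus roughly 20% extra for spikes and delayed kernel page reclaim."""
       return osds_per_host * osd_memory_target_gb * (1.0 + headroom)

   print(f"3 TB drive at $150: ${price_per_gb(150, 3072):.4f} per GB")
   print(f"12 OSDs per host:   about {host_ram_budget_gb(12):.0f} GB of RAM")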
VLANs that use the 802.1q protocol require VLAN-capable NICs and @@ -520,7 +520,7 @@ carefully consider before deploying a large scale data cluster. Additionally, BMCs as of 2023 rarely sport network connections faster than 1 Gb/s, so dedicated and inexpensive 1 Gb/s switches for BMC administrative traffic may reduce costs by wasting fewer expensive ports on faster host switches. - + Failure Domains ===============
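Before the failure-domain discussion, the price-versus-performance trade-off raised in the network section above can be made concrete with a similar back-of-the-envelope sketch. The link speeds, usable-bandwidth factor, and data volume below are assumed example figures, not measurements.

.. code-block:: python

   # Illustrative only: link speeds, the usable-bandwidth factor, and the
   # data volume are assumed example figures for a rough comparison.

   def transfer_hours(data_tb: float, link_gbps: float,
                      efficiency: float = 0.7) -> float:
       """Hours needed to move data_tb terabytes over a link_gbps Gb/s link,
       assuming only a fraction `efficiency` of raw bandwidth is usable."""
       bits = data_tb * 1e12 * 8          # terabytes -> bits
       usable_bps = link_gbps * 1e9 * efficiency
       return bits / usable_bps / 3600.0

   for gbps in (1, 10, 25):
       print(f"{gbps:>2} Gb/s link: about {transfer_hours(10, gbps):.1f} h "
             f"to move 10 TB")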