======================
Ceph Product Overview
======================

About this Document
===================

This document describes the features and benefits of the Ceph Unified
Distributed Storage System and explains why it is superior to other
storage systems. The intended audience is sales and marketing personnel,
new customers, and anyone who needs a basic overview of the system's
features and functionality.

Introduction to Ceph
====================

Ceph is a unified, distributed storage system that operates on a large
number of hosts connected by a network. Ceph has been designed to
accommodate multiple petabytes of storage with ease. Because file sizes
and network capacities keep growing, Ceph is well positioned to
accommodate them with its unique self-healing, self-replicating
architecture. Customers that need to move large amounts of data, such
as media and entertainment companies, can benefit greatly from this
product. Ceph is also dynamic: there is no need to cache data as with
traditional client-server architectures.

Benefits of Using Ceph
----------------------

Ceph's flexible and scalable architecture translates into cost savings
for users. Its powerful load balancing technology ensures the highest
performance in terms of both speed and reliability. Nodes can be added
"on the fly" with no impact on the system. In the case of node failure,
the load is redistributed with no degradation to the system. Failure
detection is rapid, and nodes that were temporarily cut off from the
network are efficiently re-added, so the cluster recovers immediately.

Manageability
-------------

Ceph is easy to manage, requiring little or no system administrative
intervention. Its powerful placement algorithm and intelligent nodes
manage data seamlessly across any node configuration. It also offers
multiple access methods to its object storage, block storage, and file
system, as shown in Figure 1.

.. image:: CephConfig.jpg


RADOS
=====

The Reliable Autonomic Distributed Object Store (RADOS) provides a
scalable object storage management platform. RADOS allows Object
Storage Devices (OSDs) to operate autonomously when recovering from
failures or migrating data to expand clusters. RADOS employs the
intelligence present in the storage nodes themselves to maximize
scalability.

The RADOS Block Device (RBD) provides a block device interface to a
Linux machine while striping the data across multiple RADOS objects for
improved performance. To the user, RBD is transparent: the entire Ceph
cluster appears as a single hard drive that is always up and has no
size limit. RBD is supported for Linux kernels 2.6.37 and higher. Each
RBD device contains a directory with its files and information.

RADOS Gateway
=============

``radosgw`` is an S3-compatible RESTful HTTP service for object
storage, built on RADOS.

Hypervisor Support
==================

RBD supports the QEMU processor emulator and the Kernel-based Virtual
Machine (KVM) virtualization infrastructure for the Linux kernel. In
practice the two are used together: KVM provides hardware-assisted
virtualization in the kernel, while QEMU provides device emulation,
including the ``rbd`` storage driver.

KVM RBD
-------

The Linux Kernel-based Virtual Machine (KVM) RBD support stripes data
across multiple distributed RADOS objects for improved performance.
KVM RBD is supported for Linux kernels 2.6.37 and higher.

KVM is a hypervisor built into the Linux kernel itself; through the
QEMU ``rbd`` driver it uses the ``librbd`` user-space library to access
RBD images.

QEMU RBD
--------

QEMU RBD stripes a VM block device over objects stored in the Ceph
distributed object store, providing shared block storage that
facilitates VM migration between hosts. QEMU interfaces with the
``librbd`` user-space library to store its virtual machine images.
.. _monitor:
Monitor cluster
===============
``ceph-mon`` is a lightweight daemon that provides a consensus for
distributed decision making in a Ceph/RADOS cluster.
It is also the initial point of contact for new clients, and will hand
out information about the topology of the cluster, such as the
``osdmap``.
You normally run 3 ``ceph-mon`` daemons, on 3 separate physical machines,
isolated from each other; for example, in different racks or rows.
You could run just 1 instance, but that means giving up on high
availability.
You may use the same hosts for ``ceph-mon`` and other purposes.
``ceph-mon`` processes talk to each other using a Paxos_\-style
protocol. They discover each other via the ``[mon.X] mon addr`` fields
in ``ceph.conf``.
.. todo:: What about ``monmap``? Fact check.
Any decision requires the majority of the ``ceph-mon`` processes to be
healthy and communicating with each other. For this reason, you never
want an even number of ``ceph-mon``\s: an even-sized cluster can split
into two equal halves, neither of which forms a majority, and it
tolerates no more failures than the next smaller odd-sized cluster.
.. _Paxos: http://en.wikipedia.org/wiki/Paxos_algorithm
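
As an illustration of how a client bootstraps through the monitor
cluster, here is a minimal sketch using the ``python-rados`` bindings;
it assumes the bindings are installed and that ``/etc/ceph/ceph.conf``
(a placeholder path) names the monitors and a readable client keyring.

.. code-block:: python

   import rados

   # Read monitor addresses and authentication settings from ceph.conf;
   # the path used here is a placeholder for your own setup.
   cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')

   # connect() contacts the monitor cluster, authenticates, and fetches
   # the current cluster maps (such as the osdmap).
   cluster.connect()
   try:
       print("cluster fsid:", cluster.get_fsid())
       print("pools:", cluster.list_pools())
   finally:
       cluster.shutdown()
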
.. todo:: explain monmap
.. index:: RADOS, OSD, ceph-osd, object
.. _rados:
RADOS
=====
``ceph-osd`` is the storage daemon that provides the RADOS service. It
uses ``ceph-mon`` for cluster membership, services object
read/write/etc. requests from clients, and peers with other
``ceph-osd``\s for data replication.
The data model is fairly simple on this level. There are multiple
named pools, and within each pool there are named objects, in a flat
namespace (no directories). Each object has both data and metadata.
The data for an object is a single, potentially big, series of
bytes. Additionally, the series may be sparse: it may have holes that
read back as binary zeros but take up no actual storage.
The metadata is an unordered set of key-value pairs. Its semantics
are completely up to the client; for example, the Ceph filesystem uses
metadata to store file ownership and similar attributes.
.. todo:: Verify that metadata is unordered.
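
To make this data model concrete, here is a minimal sketch using the
``python-rados`` bindings; the configuration path, the pool name
``data``, the object name, and the xattr key are placeholder examples,
and the pool is assumed to already exist.

.. code-block:: python

   import rados

   cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
   cluster.connect()
   try:
       # Open an I/O context on a named pool; objects live in a flat
       # namespace inside the pool.
       ioctx = cluster.open_ioctx('data')
       try:
           # The object's data is a single series of bytes...
           ioctx.write_full('example-object', b'hello rados')
           # ...and its metadata is a set of key-value pairs (xattrs).
           ioctx.set_xattr('example-object', 'owner', b'alice')

           print(ioctx.read('example-object'))
           print(ioctx.get_xattr('example-object', 'owner'))
       finally:
           ioctx.close()
   finally:
       cluster.shutdown()
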
Underneath, ``ceph-osd`` stores the data on a local filesystem. We
recommend using Btrfs_, but any POSIX filesystem that has extended
attributes should work (see :ref:`xattr`).
.. _Btrfs: http://en.wikipedia.org/wiki/Btrfs
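
If you are unsure whether a candidate backing filesystem accepts
extended attributes, a quick probe along the following lines can
confirm it before deploying ``ceph-osd`` there; this is only a sketch
(Python 3 on Linux), and the directory path is a placeholder.

.. code-block:: python

   import os

   def supports_xattr(path):
       """Return True if the filesystem holding `path` accepts user xattrs."""
       probe = os.path.join(path, '.xattr-probe')
       with open(probe, 'w'):
           pass
       try:
           os.setxattr(probe, 'user.test', b'1')
           return os.getxattr(probe, 'user.test') == b'1'
       except OSError:
           return False
       finally:
           os.remove(probe)

   print(supports_xattr('/var/lib/ceph/osd'))  # placeholder path
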
.. todo:: write about access control
.. todo:: explain osdmap
.. todo:: explain plugins ("classes")
.. index:: Ceph filesystem, Ceph Distributed File System, MDS, ceph-mds
.. _cephfs:
Ceph filesystem
===============
The Ceph filesystem service is provided by a daemon called
``ceph-mds``. It uses RADOS to store all the filesystem metadata
(directories, file ownership, access modes, etc.), and directs clients
to access RADOS directly for the file contents.
The Ceph filesystem aims for POSIX compatibility, except for a few
chosen differences. See :doc:`/appendix/differences-from-posix`.
``ceph-mds`` can run as a single process, or it can be distributed out to
multiple physical machines, either for high availability or for
scalability.
For high availability, the extra ``ceph-mds`` instances can be `standby`,
ready to take over the duties of any failed ``ceph-mds`` that was
`active`. This is easy because all the data, including the journal, is
stored on RADOS. The transition is triggered automatically by
``ceph-mon``.
For scalability, multiple ``ceph-mds`` instances can be `active`, and they
will split the directory tree into subtrees (and shards of a single
busy directory), effectively balancing the load amongst all `active`
servers.
Combinations of `standby` and `active` etc. are possible, for example
running 3 `active` ``ceph-mds`` instances for scaling and one `standby`.
To control the number of `active` ``ceph-mds``\es, see
:doc:`/ops/manage/grow/mds`.
.. topic:: Status as of 2011-09:

   Multiple `active` ``ceph-mds`` operation is stable under normal
   circumstances, but some failure scenarios may still cause
   operational issues.
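
Clients more commonly mount the filesystem via the kernel client or
``ceph-fuse``; for programmatic access, the following is a rough sketch
using the ``cephfs`` Python bindings to ``libcephfs``. The exact
binding API may differ between releases, and the configuration file
path and the paths created are placeholders.

.. code-block:: python

   import os
   import cephfs

   # Bind to libcephfs; the configuration file path is a placeholder.
   fs = cephfs.LibCephFS(conffile='/etc/ceph/ceph.conf')

   # mount() goes through ceph-mon and ceph-mds for metadata; file data
   # is then read and written directly against RADOS.
   fs.mount()
   try:
       fs.mkdir('/example', 0o755)
       fd = fs.open('/example/hello.txt', os.O_CREAT | os.O_WRONLY, 0o644)
       fs.write(fd, b'hello cephfs', 0)
       fs.close(fd)
       print(fs.stat('/example/hello.txt'))
   finally:
       fs.unmount()
       fs.shutdown()
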
.. todo:: document `standby-replay`
.. todo:: mds.0 vs mds.alpha etc details
.. index:: RADOS Gateway, radosgw
.. _radosgw:
``radosgw``
===========
``radosgw`` is a FastCGI service that provides a RESTful_ HTTP API to
store objects and metadata. It layers on top of RADOS with its own
data formats, and maintains its own user database, authentication,
access control, and so on.
.. _RESTful: http://en.wikipedia.org/wiki/RESTful
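
Because the API is S3-compatible, a stock S3 client can talk to
``radosgw`` once a gateway user exists. The sketch below uses the
``boto3`` library; the endpoint URL, port, and credentials are
placeholders for values from your own gateway.

.. code-block:: python

   import boto3

   # Point a standard S3 client at the radosgw endpoint instead of AWS.
   s3 = boto3.client(
       's3',
       endpoint_url='http://gateway.example.com:7480',  # placeholder
       aws_access_key_id='ACCESS_KEY',                  # placeholder
       aws_secret_access_key='SECRET_KEY',              # placeholder
   )

   s3.create_bucket(Bucket='demo-bucket')
   s3.put_object(Bucket='demo-bucket', Key='hello.txt', Body=b'hello radosgw')
   print(s3.get_object(Bucket='demo-bucket', Key='hello.txt')['Body'].read())
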
.. index:: RBD, Rados Block Device
.. _rbd:
Rados Block Device (RBD)
========================
In virtual machine scenarios, RBD is typically used via the ``rbd``
network storage driver in Qemu/KVM, where the host machine uses
``librbd`` to provide a block device service to the guest.
Alternatively, as no direct ``librbd`` support is available in Xen,
the Linux kernel can act as the RBD client and provide a real block
device on the host machine, which can then be accessed by the
virtualization layer. This is done with the command-line tool ``rbd``
(see :doc:`/ops/rbd`).

The latter is also useful in non-virtualized scenarios.
Internally, RBD stripes the device image over multiple RADOS objects,
each typically located on a separate ``ceph-osd``, allowing it to perform
better than a single server could.
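
For completeness, here is a minimal sketch of creating and writing an
RBD image through ``librbd``'s Python bindings; the pool name ``rbd``,
the image name, and its size are arbitrary examples, and the pool is
assumed to already exist.

.. code-block:: python

   import rados
   import rbd

   cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
   cluster.connect()
   try:
       ioctx = cluster.open_ioctx('rbd')
       try:
           # Create a 1 GiB image; librbd stripes it over many RADOS objects.
           rbd.RBD().create(ioctx, 'example-image', 1024 ** 3)
           with rbd.Image(ioctx, 'example-image') as image:
               image.write(b'hello rbd', 0)
               print(image.read(0, 9))
       finally:
           ioctx.close()
   finally:
       cluster.shutdown()
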
Client
======
.. todo:: cephfs, ceph-fuse, librados, libcephfs, librbd
.. todo:: Summarize how much Ceph trusts the client, for what parts (security vs reliability).
TODO
====
.. todo:: Example scenarios Ceph projects are/not suitable for