============================
Frequently Asked Questions
============================

These questions have been frequently asked on the ceph-users and ceph-devel
mailing lists, the IRC channel, and on the `Ceph.com`_ blog.

.. _Ceph.com: http://ceph.com

Is Ceph Production-Quality?
===========================
Ceph's object store (RADOS) is production ready. Large-scale storage systems (i.e.,
petabytes of data) use Ceph's RESTful Object Gateway (RGW), which provides APIs
compatible with Amazon's S3 and OpenStack's Swift. Many deployments also use
the Ceph Block Device (RBD), including deployments of OpenStack and CloudStack.
`Inktank`_ provides commercial support for the Ceph object store, Object
Gateway, block devices, and CephFS running with a single metadata server.

The CephFS POSIX-compliant filesystem is functionally complete and has been
evaluated by a large community of users. There are production systems using
CephFS with a single metadata server. The Ceph community is actively testing
clusters with multiple metadata servers for quality assurance. Once CephFS
passes QA muster when running with multiple metadata servers, `Inktank`_ will
provide commercial support for CephFS with multiple metadata servers, too.

.. _Inktank: http://inktank.com

What Kind of Hardware Does Ceph Require?
========================================
Ceph runs on commodity hardware. A typical configuration involves a
rack mountable server with a baseboard management controller, multiple
processors, multiple drives, and multiple NICs. There are no requirements for
proprietary hardware. For details, see `Ceph Hardware Recommendations`_.

What Kind of OS Does Ceph Require?
==================================
Ceph runs on Linux for both the client and server side.

Ceph runs on Debian/Ubuntu distributions, which you can install from `APT
packages`_.

Ceph also runs on Fedora and Enterprise Linux derivatives (RHEL, CentOS) using
`RPM packages`_.

You can also download Ceph source `tarballs`_ and build Ceph for your
distribution. See `Installation`_ for details.

.. _try-ceph:

How Can I Give Ceph a Try?
==========================
Follow our `Quick Start`_ guides. They will get you up and running quickly
without requiring deeper knowledge of Ceph. Our `Quick Start`_ guides will also
help you avoid a few issues related to limited deployments. If you choose to
stray from the Quick Starts, there are a few things you need to know.

We recommend using at least two hosts and a recent Linux kernel. With older
kernels, Ceph can deadlock if you try to mount CephFS or RBD client services on
the same host that runs your test Ceph cluster. This is not a Ceph-related
issue. It is related to memory pressure and the kernel's need to reclaim memory.
Recent kernels with up-to-date ``glibc`` and ``syncfs(2)`` reduce this issue
considerably. However, a memory pool large enough to handle incoming requests is
the only thing that guarantees against the deadlock occurring. When you run Ceph
clients on a Ceph cluster machine, loopback NFS can experience a similar problem
related to buffer cache management in the kernel. You can avoid these scenarios
entirely by using a separate client host, which is more realistic for deployment
scenarios anyway.

We recommend using at least two OSDs with at least two replicas of the data.
OSDs report other OSDs to the monitor, and also interact with other OSDs when
replicating data. If you have only one OSD, there is no second OSD to check its
heartbeat, and no other OSD to tell it which placement groups it should have, so
a placement group can remain stuck "stale" forever. These are not likely to be
issues in production.

Finally, `Quick Start`_ guides are a way to get you up and running quickly. To
build performant systems, you'll need a drive for each OSD, and you will likely
benefit by writing the OSD journal to a separate drive from the OSD data.

How Many OSDs Can I Run per Host?
=================================
Theoretically, a host can run as many OSDs as the hardware can support. Many
vendors market storage hosts that have large numbers of drives (e.g., 36 drives)
capable of supporting many OSDs. We don't recommend a huge number of OSDs per
host though. Ceph was designed to distribute the load across what we call
"failure domains." See `CRUSH Maps`_ for details.
At the petabyte scale, hardware failure is an expectation, not a freak
occurrence. Failure domains include datacenters, rooms, rows, racks, and network
switches. In a single host, power supplies, motherboards, NICs, and drives are
all potential points of failure.

If you place a large percentage of your OSDs on a single host and that host
fails, a large percentage of your OSDs will fail too. Having too large a
percentage of a cluster's OSDs on a single host can cause disruptive data
migration and long recovery times during host failures. We encourage
diversifying the risk across failure domains, and that includes making
reasonable tradeoffs regarding the number of OSDs per host.

Can I Use the Same Drive for Multiple OSDs?
===========================================
Yes. **Please don't do this!** Except for initial evaluations of Ceph, we do not
recommend running multiple OSDs on the same drive. In fact, we recommend
**exactly** the opposite. Only run one OSD per drive. For better performance,
run journals on a separate drive from the OSD drive, and consider using SSDs for
journals. Run operating systems on a separate drive from any drive storing data
for Ceph.

Storage drives are a performance bottleneck. Total throughput is an important
consideration. Sequential reads and writes are important considerations too.
When you run multiple OSDs per drive, you split up the total throughput between
competing OSDs, which can slow performance considerably.

Why Do You Recommend One Drive Per OSD?
=======================================
Ceph OSD performance is one of the most common requests for assistance, and
running an OS, a journal, and an OSD on the same disk is frequently the
impediment to high performance. Total throughput and simultaneous reads and
writes are major bottlenecks. If you journal data, run an OS, or run multiple
OSDs on the same drive, you will very likely see performance degrade
significantly--especially under high loads.

Running multiple OSDs on a single drive is fine for evaluation purposes. We
even encourage that in our `5-minute quick start`_. However, just because it
works does NOT mean that it will provide acceptable performance in an
operational cluster.

What Underlying Filesystem Do You Recommend?
============================================
Currently, we recommend using XFS as the underlying filesystem for OSD drives.
We think ``btrfs`` will become the optimal filesystem. However, we still
encounter enough issues that we do not recommend it for production systems yet.
See `Filesystem Recommendations`_ for details.

How Does Ceph Ensure Data Integrity Across Replicas?
====================================================
Ceph periodically scrubs placement groups to ensure that they contain the same
information. Low-level or deep scrubbing reads the object data in each replica
of the placement group to ensure that the data is identical across replicas.
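
If you do not want to wait for the periodic scrub, you can also request one
yourself. The sketch below uses the Python ``rados`` binding's ``mon_command()``
call to issue the same request as the ``ceph pg deep-scrub`` CLI command. The
placement group ID ``2.1f`` is a placeholder, and whether this particular
command is accepted by the monitors or routed elsewhere depends on your Ceph
release, so treat it as an illustration only.

.. code-block:: python

    import json
    import rados

    # Connect with the cluster configuration and default credentials.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Ask for a deep scrub of one placement group (the ID is a placeholder).
    cmd = json.dumps({'prefix': 'pg deep-scrub', 'pgid': '2.1f'})
    ret, out, errs = cluster.mon_command(cmd, b'')
    print(ret, errs)

    cluster.shutdown()
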
How Many NICs Per Host?
=======================
You can use one :abbr:`NIC (Network Interface Card)` per machine. We recommend a
minimum of two NICs: one for a public (front-side) network and one for a cluster
(back-side) network. When you write an object from the client to the primary
OSD, that single write only accounts for the bandwidth consumed during one leg
of the transaction. If you store multiple copies (usually 2-3 copies in a
typical cluster), the primary OSD makes a write request to your secondary and
tertiary OSDs. So your back-end network traffic can dwarf your front-end network
traffic on writes very easily.

What Kind of Network Throughput Do I Need?
==========================================
Network throughput requirements depend on your load. We recommend starting with
a minimum of 1Gb Ethernet. 10Gb Ethernet is more expensive, but often comes with
some additional advantages, including virtual LANs (VLANs). VLANs can
dramatically reduce the cabling requirements when you run front-side, back-side
and other special purpose networks.

The number of object copies (replicas) you create is an important factor,
because replication becomes a larger network load than the initial write itself
when making multiple copies (e.g., triplicate). Network traffic between Ceph and
a cloud-based system such as OpenStack or CloudStack may also become a factor.
Some deployments even run a separate NIC for management APIs.

Finally, load spikes are a factor too. Certain times of the day, week or month
you may see load spikes. You must plan your network capacity to meet those load
spikes in order for Ceph to perform well. This means that excess capacity may
remain idle or unused during low load times.

Can Ceph Support Multiple Data Centers?
=======================================
Yes, but with safeguards to ensure data safety. When a client writes data to
Ceph, the primary OSD will not acknowledge the write to the client until the
secondary OSDs have written the replicas synchronously. See `How Ceph Scales`_
for details.

The Ceph community is working to ensure that OSD/monitor heartbeats and peering
processes operate effectively with the additional latency that may occur when
deploying hardware in different geographic locations. See `Monitor/OSD
Interaction`_ for details.

If your data centers have dedicated bandwidth and low latency, you can
distribute your cluster across data centers easily. If you use a WAN over the
Internet, you may need to configure Ceph to ensure effective peering, heartbeat
acknowledgement and writes to ensure the cluster performs well with additional
WAN latency.

The Ceph community is working on an asynchronous write capability via the Ceph
Object Gateway (RGW), which will provide an eventually consistent copy of data
for disaster recovery purposes. This will work only with data read and written
via the Object Gateway. Work is also starting on a similar capability for Ceph
block devices, which are managed via the various cloud stacks.

How Does Ceph Authenticate Users?
=================================
Ceph provides an authentication framework called ``cephx`` that operates in a
manner similar to Kerberos. The principal difference is that Ceph's
authentication system is distributed too, so that it doesn't constitute a single
point of failure. For details, see `Ceph Authentication & Authorization`_.
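
For example, a ``librados`` client authenticates by presenting a cephx user
name together with that user's secret key, which is normally kept in a keyring
file. The following sketch uses the Python binding; the ``client.admin`` user
and the keyring path are examples that you would replace with your own
credentials.

.. code-block:: python

    import rados

    # The user name and keyring path below are examples only.
    cluster = rados.Rados(name='client.admin',
                          conffile='/etc/ceph/ceph.conf',
                          conf={'keyring': '/etc/ceph/ceph.client.admin.keyring'})
    cluster.connect()
    print("Connected to cluster:", cluster.get_fsid())
    cluster.shutdown()
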
Does Ceph Authentication Provide Multi-tenancy?
===============================================
Ceph provides authentication at the `pool`_ level, which may be sufficient
for multi-tenancy in limited cases. Ceph plans on developing authentication
namespaces within pools in future releases, so that Ceph is well-suited for
multi-tenancy within pools.

Can Ceph use other Multi-tenancy Modules?
=========================================
The Bobtail release of Ceph integrates the Object Gateway with OpenStack's Keystone.
See `Keystone Integration`_ for details.

.. _Keystone Integration: ../radosgw/config#integrating-with-openstack-keystone

Does Ceph Enforce Quotas?
=========================
Currently, Ceph doesn't provide enforced storage quotas. The Ceph community has
discussed enforcing user quotas within CephFS.

Does Ceph Track Per User Usage?
===============================
The CephFS filesystem provides user-based usage tracking on a subtree basis.
RADOS Gateway also provides detailed per-user usage tracking. RBD and the
underlying object store do not track per user statistics. The underlying object
store provides storage capacity utilization statistics.

Does Ceph Provide Billing?
==========================
Usage information is available via a RESTful API for the Ceph Object Gateway,
which can be integrated into billing systems. Usage tracking at the RADOS pool
level is not currently available, but it is on the roadmap.

Can Ceph Export a Filesystem via NFS or Samba/CIFS?
===================================================
Ceph doesn't export CephFS via NFS or Samba. However, you can use a gateway to
serve a CephFS filesystem to NFS or Samba clients.

Can I Access Ceph via a Hypervisor?
===================================
Currently, the `QEMU`_ hypervisor can interact with the Ceph `block device`_.
The :abbr:`KVM (Kernel Virtual Machine)` `module`_ and the ``librbd`` library
allow you to use QEMU with Ceph. Most Ceph deployments use the ``librbd``
library.

Cloud solutions like `OpenStack`_ and `CloudStack`_ interact with `libvirt`_ and
QEMU as a means of integrating with Ceph. The Ceph community is also looking to
support the Xen hypervisor in a future release.

There is interest in support for VMware, but there is no deep-level integration
between VMware and Ceph as yet.

Can Block, CephFS, and Gateway Clients Share Data?
==================================================
For the most part, no. You cannot write data to Ceph using RBD and access the
same data via CephFS, for example. You cannot write data with the RADOS Gateway
and read it with RBD. However, you can write data with the RADOS Gateway
S3-compatible API and read the same data using the RADOS Gateway
Swift-compatible API.

RBD, CephFS and the RADOS Gateway each have their own namespace. The way they
store data differs significantly enough that it isn't possible to use the
clients interchangeably. However, you can use all three types of clients, as
well as clients you develop yourself via ``librados``, simultaneously on the
same cluster.
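
One way to see this separation on a running cluster is to list the pools: RBD,
CephFS and the RADOS Gateway each keep their data in their own pools. The
sketch below uses the Python ``librados`` binding; which pool names you see
depends on the services you have deployed.

.. code-block:: python

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Typical names include 'rbd' for block devices, 'data' and 'metadata'
    # for CephFS, and '.rgw'-prefixed pools for the RADOS Gateway.
    print(cluster.list_pools())

    cluster.shutdown()
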
Which Ceph Clients Support Striping?
====================================
RBD, CephFS, and the RADOS Gateway all provide striping capability. For details
on striping, see `Striping`_.

What Programming Languages can Interact with the Object Store?
==============================================================
Ceph's ``librados`` is written in the C programming language. There are
interfaces for other languages, including:

- C++
- Java
- PHP
- Python
- Ruby
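
As a brief example, here is a minimal session with the Python binding that
writes an object and reads it back. The pool name ``data`` and the object name
are placeholders; any pool your user can access will do.

.. code-block:: python

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Open an I/O context on an existing pool (the pool name is an example).
    ioctx = cluster.open_ioctx('data')
    ioctx.write_full('hello-object', b'Hello, RADOS!')
    print(ioctx.read('hello-object'))

    ioctx.close()
    cluster.shutdown()
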
Can I Develop a Client With Another Language?
=============================================
Ceph does not have many native bindings for ``librados`` at this time. If you'd
like to fork Ceph and build a wrapper to the C or C++ versions of ``librados``,
please check out the `Ceph repository`_. You can also use any language that can
call the native ``librados`` C/C++ bindings (e.g., you can access them from
within Perl).
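
As a sketch of what such a wrapper looks like, the snippet below calls the C
``librados`` API directly from Python via ``ctypes``, without using any native
binding. The shared library name and the configuration path are assumptions
that may differ on your system.

.. code-block:: python

    import ctypes

    # Load the C librados shared library (the soname may vary by platform).
    lib = ctypes.CDLL('librados.so.2')

    cluster = ctypes.c_void_p()
    # rados_create(), rados_conf_read_file() and rados_connect() are part of
    # the public C API; each returns 0 on success.
    assert lib.rados_create(ctypes.byref(cluster), None) == 0
    assert lib.rados_conf_read_file(cluster, b'/etc/ceph/ceph.conf') == 0
    assert lib.rados_connect(cluster) == 0
    print("connected")

    lib.rados_shutdown(cluster)
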
Do Ceph Clients Run on Windows?
===============================
No, and there are no immediate plans to support Windows clients. However, you
may be able to emulate a Linux environment on a Windows host. For example,
Cygwin may make it feasible to use ``librados`` in an emulated environment.

How can I add a question to this list?
======================================
If you'd like to add a question to this list (hopefully with an
accompanying answer!), you can find this page in the ``doc/`` directory of our
main git repository:

`https://github.com/ceph/ceph/blob/master/doc/faq.rst`_

We use Sphinx to manage our documentation, and this page is generated
from reStructuredText source. See the section on Building Ceph
Documentation for the build procedure.

.. _Ceph Hardware Recommendations: ../install/hardware-recommendations
.. _APT packages: ../install/debian
.. _RPM packages: ../install/rpm
.. _tarballs: ../install/get-tarballs
.. _Installation: ../install
.. _CRUSH Maps: ../rados/operations/crush-map
.. _5-minute quick start: ../start/quick-start
.. _How Ceph Scales: ../architecture#how-ceph-scales
.. _Monitor/OSD Interaction: ../rados/configuration/mon-osd-interaction
.. _Ceph Authentication & Authorization: ../rados/operations/auth-intro
.. _Ceph repository: https://github.com/ceph/ceph
.. _QEMU: ../rbd/qemu-rbd
.. _block device: ../rbd
.. _module: ../rbd/rbd-ko
.. _libvirt: ../rbd/libvirt
.. _OpenStack: ../rbd/rbd-openstack
.. _CloudStack: ../rbd/rbd-cloudstack
.. _pool: ../rados/operations/pools
.. _Striping: ../architecture#how-ceph-clients-stripe-data
.. _https://github.com/ceph/ceph/blob/master/doc/faq.rst: https://github.com/ceph/ceph/blob/master/doc/faq.rst
.. _Filesystem Recommendations: ../rados/configuration/filesystem-recommendations
.. _Quick Start: ../start