Ceph is a distributed storage system, so it depends upon networks to peer with
OSDs, replicate objects, recover from faults and check heartbeats. Networking
issues can cause OSD latency and flapping OSDs. See `Flapping OSDs`_ for
details.
Ensure that Ceph processes and Ceph-dependent processes are connected and/or
listening. ::

    netstat -a | grep ceph
    netstat -l | grep ceph
    sudo netstat -p | grep ceph

Check network statistics. ::

    netstat -s
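
The listening check above can be narrowed to sockets owned by Ceph daemons.
The following is a minimal sketch, not an official tool: ``ceph_listeners`` is
a hypothetical helper name, and it assumes the column layout that
``netstat -tlnp`` prints on Linux (state in column 6, PID/program in column 7).

```shell
# Read `netstat -tlnp`-style text on stdin and print "address program"
# for every listening socket whose owning program name contains "ceph-".
# Assumption: Linux net-tools column layout (state = $6, PID/program = $7).
ceph_listeners() {
    awk '$6 == "LISTEN" && $7 ~ /ceph-/ { print $4, $7 }'
}
```

For example, ``sudo netstat -tlnp | ceph_listeners`` should list one line per
listening Ceph socket if the daemons are up.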
Drive Configuration
-------------------
A storage drive should only support one OSD. Sequential read and sequential
write throughput can bottleneck if other processes share the drive, including
journals, operating systems, monitors, other OSDs and non-Ceph processes.
Ceph acknowledges writes *after* journaling, so fast SSDs are an attractive
option to accelerate the response time--particularly when using the ``ext4`` or
XFS filesystems. By contrast, the ``btrfs`` filesystem can write and journal
simultaneously.
.. note:: Partitioning a drive does not change its total throughput or
   sequential read/write limits. Running a journal in a separate partition
   may help, but you should prefer a separate physical drive.
Bad Sectors / Fragmented Disk
-----------------------------
Check your disks for bad sectors and fragmentation. Either can cause total
throughput to drop substantially.
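
One place to look for failing sectors is the kernel log. The sketch below
summarizes I/O errors per device; ``io_errors_by_device`` is a hypothetical
helper, and it assumes the kernel reports errors with the text
``I/O error, dev <name>``, which may differ between kernel versions.

```shell
# Read kernel log text on stdin and print each device that reported an
# "I/O error", followed by the number of matching lines, most errors first.
# Assumption: messages contain "I/O error, dev <name>," as many kernels print.
io_errors_by_device() {
    grep 'I/O error, dev ' |
        sed 's/.*I\/O error, dev \([^, ]*\).*/\1/' |
        sort | uniq -c | sort -rn |
        awk '{ print $2, $1 }'
}
```

A typical invocation would be ``dmesg | io_errors_by_device``. No output means
that no matching messages were found, not that the drive is healthy.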
Co-resident Monitors/OSDs
-------------------------
Monitors are generally lightweight processes, but they issue many ``fsync()``
calls, which can interfere with other workloads, particularly if monitors run on
the same drive as your OSDs. Additionally, if you run monitors on the same host
as the OSDs, you may incur performance issues related to:

- Running an older kernel (pre-3.0)
- Running Argonaut with an old ``glibc``
- Running a kernel with no ``syncfs(2)`` syscall

In these cases, multiple OSDs running on the same host can drag each other down
by doing lots of commits. That often leads to bursty writes.
Co-resident Processes
---------------------
Spinning up co-resident processes such as a cloud-based solution, virtual
machines and other applications that write data to Ceph while operating on the
same hardware as OSDs can introduce significant OSD latency. Generally, we
recommend optimizing a host for use with Ceph and using other hosts for other
processes. The practice of separating Ceph operations from other applications
may help improve performance and may streamline troubleshooting and maintenance.
Logging Levels
--------------
If you turned logging levels up to track an issue and then forgot to turn
logging levels back down, the OSD may be writing a large volume of log data to
disk. If you intend to keep logging levels high, consider mounting a dedicated
drive at the default path for logging (i.e., ``/var/log/ceph/$cluster-$name.log``).
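
To see whether logging is a plausible culprit, you can check which log files
are largest. This is a rough sketch: ``largest_logs`` is a hypothetical helper,
and the default directory ``/var/log/ceph`` is assumed from the default log
path mentioned above.

```shell
# Print the size in KiB and the path of the N largest files under a log
# directory, largest first.
# Defaults (assumptions): directory /var/log/ceph, 5 files.
largest_logs() {
    dir=${1:-/var/log/ceph}
    count=${2:-5}
    find "$dir" -type f -exec du -k {} + | sort -rn | head -n "$count"
}
```

Running ``largest_logs`` a few minutes apart gives a crude idea of how fast a
given daemon's log is growing.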
Recovery Throttling
-------------------
Depending upon your configuration, Ceph may reduce recovery rates to maintain
performance or it may increase recovery rates to the point that recovery
impacts OSD performance. Check to see if the OSD is recovering.
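
A quick way to check for recovery is to look for recovery or backfill states
in the placement group summary. The helper below is only a sketch:
``pgs_recovering`` is a hypothetical name, and it assumes that the state names
in the summary contain ``recover`` or ``backfill`` while recovery is under way.

```shell
# Read a placement group state summary on stdin and succeed if any state
# mentions recovery or backfill.
# Assumption: active recovery shows up as a state containing one of these words.
pgs_recovering() {
    grep -Eiq 'recover|backfill'
}
```

For example: ``ceph pg stat | pgs_recovering && echo "recovery in progress"``.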
Kernel Version
--------------
Check the kernel version you are running. Older kernels may not receive
new backports that Ceph depends upon for better performance.
Kernel Issues with SyncFS
-------------------------
Try running one OSD per host to see if performance improves. Hosts with old
kernels may lack the ``syncfs(2)`` syscall, or may ship a version of ``glibc``
too old to support it.
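
The kernel and ``glibc`` versions can be checked against the releases that
introduced ``syncfs(2)``: Linux 2.6.39 and glibc 2.14, according to the
``syncfs(2)`` manual page. The sketch below uses two hypothetical helpers,
``version_ge`` and ``syncfs_supported``, and relies on ``sort -V`` from GNU
coreutils. It compares version numbers only and does not detect
distribution backports.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (GNU coreutils) for version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeed if a kernel release ($1) and a glibc version ($2) are both new
# enough for syncfs(2): Linux 2.6.39 and glibc 2.14.
syncfs_supported() {
    kernel=${1%%-*}   # drop any suffix after the first "-", e.g. "-generic"
    version_ge "$kernel" 2.6.39 && version_ge "$2" 2.14
}
```

On a glibc-based host this could be called as
``syncfs_supported "$(uname -r)" "$(getconf GNU_LIBC_VERSION | awk '{ print $2 }')"``.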
Filesystem Issues
-----------------
Currently, we recommend deploying clusters with XFS or ext4. The btrfs
filesystem has many attractive features, but bugs in the filesystem may
lead to performance issues.
Insufficient RAM
----------------
We recommend 1 GB of RAM per OSD daemon. You may notice that during normal
operations, the OSD only uses a fraction of that amount (e.g., 100-200 MB).
Unused RAM makes it tempting to use the excess RAM for co-resident applications,
VMs and so forth. However, when OSDs go into recovery mode, their memory
utilization spikes. If there is no RAM available, OSD performance will slow
considerably.
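
The rule of thumb above can be turned into a quick host check. This is a
sketch under stated assumptions: ``osd_ram_needed_kb`` and ``host_has_osd_ram``
are hypothetical helpers, 1 GB is treated as 1024 * 1024 KiB, and total memory
is read from the ``MemTotal`` line of ``/proc/meminfo``, which is somewhat
lower than the installed RAM.

```shell
# Print the RAM, in KiB, suggested for $1 OSD daemons at 1 GB per daemon.
osd_ram_needed_kb() {
    echo $(( $1 * 1024 * 1024 ))
}

# Succeed if the MemTotal value in a meminfo-style file ($2, default
# /proc/meminfo) is at least the amount suggested for $1 OSD daemons.
host_has_osd_ram() {
    total_kb=$(awk '/^MemTotal:/ { print $2 }' "${2:-/proc/meminfo}")
    [ "$total_kb" -ge "$(osd_ram_needed_kb "$1")" ]
}
```

For example, ``host_has_osd_ram "$(pgrep -c ceph-osd)" || echo "less than 1 GB of RAM per OSD"``.
The check ignores memory needed by monitors and other co-resident processes.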
Old Requests or Slow Requests
-----------------------------
If a ``ceph-osd`` daemon is slow to respond to a request, it will generate log messages
complaining about requests that are taking too long. The warning threshold
defaults to 30 seconds, and is configurable via the ``osd op complaint time``
option. When this happens, the cluster log will receive messages.
Legacy versions of Ceph complain about 'old requests'::

    osd.0 192.168.106.220:6800/18813 312 : [WRN] old request osd_op(client.5099.0:790 fatty_26485_object789 [write 0~4096] 2.5e54f643) v4 received at 2012-03-06 15:42:56.054801 currently waiting for sub ops

New versions of Ceph complain about 'slow requests'::

    {date} {osd.num} [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.005692 secs
    {date} {osd.num} [WRN] slow request 30.005692 seconds old, received at {date-time}: osd_op(client.4240.0:8 benchmark_data_ceph-1_39426_object7 [write 0~4194304] 0.69848840) v4 currently waiting for subops from [610]
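
To gauge how severe the slow requests are, you can extract the reported ages
from a log. The following sketch assumes the 'slow request' message format
shown above; ``slow_request_ages`` is a hypothetical helper name.

```shell
# Read OSD or cluster log text on stdin and print the age, in seconds, of
# every "slow request <n> seconds old" warning, oldest request first.
# Summary lines ("1 slow requests, 1 included below") are ignored.
slow_request_ages() {
    sed -n 's/.*slow request \([0-9.]*\) seconds old.*/\1/p' | sort -rn
}
```

For example, ``slow_request_ages < /var/log/ceph/ceph-osd.0.log | head -n 5``
prints the five oldest requests, assuming a cluster named ``ceph`` and the
default log path for ``osd.0``.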
Possible causes include:

- A bad drive (check ``dmesg`` output)
- A bug in the kernel file system (check ``dmesg`` output)
- An overloaded cluster (check system load, ``iostat``, etc.)
- A bug in the ``ceph-osd`` daemon

Possible solutions:

- Remove VMs and cloud solutions from Ceph hosts
- Upgrade the kernel
- Upgrade Ceph
- Restart OSDs

Flapping OSDs
=============
We recommend using both a public (front-end) network and a cluster (back-end)
network so that you can better meet the capacity requirements of object replication. Another
advantage is that you can run a cluster network such that it isn't connected to
the internet, thereby preventing some denial of service attacks. When OSDs peer
and check heartbeats, they use the cluster (back-end) network when it's available.
See `Monitor/OSD Interaction`_ for details.
However, if the cluster (back-end) network fails or develops significant latency
while the public (front-end) network operates optimally, OSDs currently do not
handle this situation well. What happens is that OSDs mark each other ``down``
on the monitor, while marking themselves ``up``. We call this scenario 'flapping'.

If something is causing OSDs to 'flap' (repeatedly getting marked ``down`` and then
``up`` again), you can force the monitors to stop the flapping with::

    ceph osd set noup      # prevent OSDs from getting marked up
    ceph osd set nodown    # prevent OSDs from getting marked down

These flags are recorded in the osdmap structure::

    ceph osd dump | grep flags
    flags no-up,no-down

You can clear the flags with::

    ceph osd unset noup
    ceph osd unset nodown

Two other flags are supported: ``noin`` prevents booting OSDs from being
marked ``in`` (allocated data), and ``noout`` prevents ``down`` OSDs from
eventually being marked ``out`` (regardless of the current value of
``mon osd down out interval``).

.. note:: ``noup``, ``noout``, and ``nodown`` are temporary in the
   sense that once the flags are cleared, the action they were blocking
   should occur shortly after. The ``noin`` flag, on the other hand,
   prevents OSDs from being marked ``in`` on boot, and any daemons that
   started while the flag was set will remain that way.
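
Because these flags are easy to forget once set, it can help to list them one
per line before and after troubleshooting. This is a minimal sketch:
``osdmap_flags`` is a hypothetical helper, and it assumes that the dump
contains a line starting with ``flags`` followed by a comma-separated list, as
in the example above.

```shell
# Read `ceph osd dump` text on stdin and print each OSD map flag on its
# own line. Prints nothing if the dump has no "flags" line.
osdmap_flags() {
    awk '$1 == "flags" { print $2 }' | tr ',' '\n'
}
```

For example, ``ceph osd dump | osdmap_flags`` lists whatever flags are still
set after you have finished troubleshooting.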