From 7948e13b9d12f3da676e93347b3f18151c424488 Mon Sep 17 00:00:00 2001
From: John Wilkins
Date: Thu, 28 Aug 2014 17:25:07 -0700
Subject: [PATCH] doc: Added sysctl max thread count discussion.

Fixes: #6142

Signed-off-by: John Wilkins
---
 .../troubleshooting/troubleshooting-osd.rst      | 16 ++++++++++++++++
 doc/start/hardware-recommendations.rst           | 10 ++++++++++
 2 files changed, 26 insertions(+)

diff --git a/doc/rados/troubleshooting/troubleshooting-osd.rst b/doc/rados/troubleshooting/troubleshooting-osd.rst
index 8fe25f40aeb..e67038c6875 100644
--- a/doc/rados/troubleshooting/troubleshooting-osd.rst
+++ b/doc/rados/troubleshooting/troubleshooting-osd.rst
@@ -134,6 +134,20 @@ If you start your cluster and an OSD won't start, check the following:
   actual mounts, you may have trouble starting OSDs. If you want to store the
   journal on a block device, you should partition your journal disk and assign
   one partition per OSD.
+
+- **Check Max Threadcount:** If you have a node with a lot of OSDs, you may be
+  hitting the default maximum number of threads (e.g., usually 32k), especially
+  during recovery. You can increase the number of threads using ``sysctl`` to
+  see if increasing the maximum number of threads to the maximum possible
+  number of threads allowed (i.e., 4194303) will help. For example::
+
+    sysctl -w kernel.pid_max=4194303
+
+  If increasing the maximum thread count resolves the issue, you can make it
+  permanent by including a ``kernel.pid_max`` setting in the
+  ``/etc/sysctl.conf`` file. For example::
+
+    kernel.pid_max = 4194303
 
 - **Kernel Version:** Identify the kernel version and distribution you
   are using. Ceph uses some third party tools by default, which may be
@@ -145,6 +159,8 @@ If you start your cluster and an OSD won't start, check the following:
   (if it isn't already), and try again. If it segment faults again,
   contact the ceph-devel email list and provide your Ceph configuration
   file, your monitor output and the contents of your log file(s).
+
+
 If you cannot resolve the issue and the email list isn't helpful, you may
 contact `Inktank`_ for support.
 
diff --git a/doc/start/hardware-recommendations.rst b/doc/start/hardware-recommendations.rst
index ffbc37a5890..da91af75fad 100644
--- a/doc/start/hardware-recommendations.rst
+++ b/doc/start/hardware-recommendations.rst
@@ -192,6 +192,16 @@ is up to date. See `OS Recommendations`_ for notes on ``glibc`` and
 ``syncfs(2)`` to ensure that your hardware performs as expected when running
 multiple OSDs per host.
 
+Hosts with high numbers of OSDs (e.g., > 20) may spawn a lot of threads,
+especially during recovery and rebalancing. Many Linux kernels default to
+a relatively small maximum number of threads (e.g., 32k). If you encounter
+problems starting up OSDs on hosts with a high number of OSDs, consider
+setting ``kernel.pid_max`` to a higher number of threads. The theoretical
+maximum is 4,194,303 threads. For example, you could add the following to
+the ``/etc/sysctl.conf`` file::
+
+    kernel.pid_max = 4194303
+
 Networks
 ========
 
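
Reviewer note: before raising ``kernel.pid_max`` as the patch suggests, it may help to see how close a host actually is to the limit. The following is a minimal sketch, not part of the patch. It assumes a Linux host with ``/proc`` mounted, and the helper names ``thread_usage_pct`` and ``count_threads`` are hypothetical. It counts one thread per ``/proc/<pid>/task/<tid>`` directory and compares the total with the value in ``/proc/sys/kernel/pid_max``.

```shell
#!/bin/sh
# Sketch (assumption: Linux with /proc mounted). Helper names are illustrative.

# Print the integer percentage of the limit currently in use.
# $1 = thread count, $2 = limit (e.g. the value of kernel.pid_max)
thread_usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# Count threads by counting per-thread directories under /proc.
# Uses a shell loop rather than ls so a large thread count cannot
# overflow the argument list of an external command.
count_threads() {
    n=0
    for t in /proc/[0-9]*/task/[0-9]*; do
        # Skip the unexpanded pattern (no match) and threads that just exited.
        [ -e "$t" ] && n=$((n + 1))
    done
    echo "$n"
}

# Only report when the kernel exposes the setting.
if [ -r /proc/sys/kernel/pid_max ]; then
    limit=$(cat /proc/sys/kernel/pid_max)
    threads=$(count_threads)
    echo "threads=${threads} pid_max=${limit} usage=$(thread_usage_pct "$threads" "$limit")%"
fi
```

Running this during recovery on a host with many OSDs gives a rough idea of whether the thread count is approaching the configured value. The count is a snapshot and can change while the loop runs, so treat the percentage as approximate.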