mirror of
https://github.com/ceph/ceph
synced 2025-01-25 12:34:46 +00:00
b3887379d6
Instead of the default 100ms pause before trying to restart an OSD, wait 20 seconds, and retry 30 times instead of 3. There is no scenario in which restarting an OSD almost immediately after it failed would give a better result. A failure to start may be due to a race with another systemd unit at boot time. For instance, if ceph-disk@.service is delayed, it may start after the OSD that needs it. A longer pause may give the racing service enough time to complete, so the next attempt to start the OSD may succeed. This is not a sound way to resolve a race; it only makes the OSD boot process less sensitive to it. In the example above, the proper fix is to enable --runtime ceph-osd@.service so that it cannot race at boot time. The wait delay should not be minutes long, in order to preserve the current runtime behavior. For instance, if an OSD is killed or fails and restarts only after 10 minutes, it will be marked down by the Ceph cluster. That change would not break anything, but it is significant and should be avoided. Refs: http://tracker.ceph.com/issues/17889 Signed-off-by: Loic Dachary <loic@dachary.org> |
---|---|---|
50-ceph.preset | ||
ceph | ||
ceph-disk@.service | ||
ceph-mds.target | ||
ceph-mds@.service | ||
ceph-mgr.target | ||
ceph-mgr@.service | ||
ceph-mon.target | ||
ceph-mon@.service | ||
ceph-osd.target | ||
ceph-osd@.service | ||
ceph-radosgw.target | ||
ceph-radosgw@.service | ||
ceph-rbd-mirror.target | ||
ceph-rbd-mirror@.service | ||
ceph.target | ||
ceph.tmpfiles.d | ||
CMakeLists.txt | ||
rbdmap.service |
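
The restart policy described in the commit message at the top of this listing could be expressed with systemd directives along the following lines. This is only a sketch, not the actual contents of ceph-osd@.service: the 20-second pause and the 30 attempts come from the commit message, while the drop-in path, `Restart=on-failure`, and the 30-minute counting window are assumptions made for illustration.

```ini
# Hypothetical drop-in, e.g. /etc/systemd/system/ceph-osd@.service.d/restart.conf
[Service]
# Assumption: restart only when the OSD exits abnormally.
Restart=on-failure
# From the commit message: pause 20 seconds between attempts
# (the systemd default is 100ms).
RestartSec=20s
# From the commit message: allow 30 start attempts instead of 3.
StartLimitBurst=30
# Assumption: window over which attempts are counted. It has to be longer
# than 30 x 20s = 10min, otherwise the burst limit could never be reached.
StartLimitInterval=30min
```

Newer systemd releases spell the last directive `StartLimitIntervalSec=` and expect it, together with `StartLimitBurst=`, in the `[Unit]` section; the older form shown here is still accepted for compatibility. With these values the OSD keeps retrying for roughly ten minutes before systemd gives up.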