An OSD may take tens of seconds to restart each time, so a respawn limit
of 5 in 30s does not stop the crash-on-startup respawn loop in many cases.
In particular, we'd like to catch the case where the internal heartbeats fail.
This should be enough for all but the most sluggish of OSDs and capture
many cases of failure shortly after startup.
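
To illustrate why a short window misses slow restart loops, here is a
simplified Python model of a respawn limiter. It is a sketch, not upstart's
actual code: the function name gives_up, the 10-second crash cadence, and the
"3 in 1800s" comparison value are all assumptions chosen for illustration (the
limit this change actually sets is not stated in this message).

```python
def gives_up(respawn_times, limit, interval):
    """Return True if a simplified respawn limiter would stop the job.

    respawn_times: ascending times (seconds) at which the daemon respawns.
    The job is stopped once more than `limit` respawns are counted before
    `interval` seconds have elapsed since the current window started.
    """
    window_start = None
    count = 0
    for t in respawn_times:
        if window_start is not None and t - window_start < interval:
            count += 1
            if count > limit:
                return True
        else:
            # First respawn, or the window expired: start counting afresh.
            window_start = t
            count = 1
    return False


# Hypothetical daemon that crashes roughly 10 seconds after every start.
crashes = [10 * i for i in range(100)]

loop_5_in_30 = gives_up(crashes, limit=5, interval=30)      # never trips
loop_3_in_1800 = gives_up(crashes, limit=3, interval=1800)  # trips
```

In this model the 30s window only ever sees three respawns before it resets,
so the limit of 5 is never exceeded, while a window much longer than the
restart time catches the loop on the fourth respawn.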
Fixes: #11798
Signed-off-by: Sage Weil <sage@redhat.com>
If ceph-mon segfaults, its socket file isn't removed.
Adding a remove in post-stop lets upstart clean the run directory properly.
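
The intent of the cleanup step can be sketched in Python. The real change is
an upstart post-stop stanza, not Python; the helper name remove_stale_socket
and the socket file name below are hypothetical, and a temporary directory
stands in for the run directory.

```python
import os
import tempfile


def remove_stale_socket(path):
    """Delete a leftover socket file; a missing file is not an error."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


# Temporary stand-in for the run directory and a leftover socket file.
run_dir = tempfile.mkdtemp()
sock = os.path.join(run_dir, "ceph-mon.a.asok")  # hypothetical file name
open(sock, "w").close()

first = remove_stale_socket(sock)   # file was left behind by a crash
second = remove_stale_socket(sock)  # already gone, still succeeds quietly
```

Running the cleanup unconditionally after every stop is safe because removing
a file that is already gone is treated as a no-op.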
Signed-off-by: Guilhem Lettron <guilhem@lettron.fr>
Use a 'ceph-mds' or 'ceph-mon' event to start instances instead of
explicitly calling start. This avoids the ugly is-this-already-running
check. [Thanks Guilhem Lettron for that!]
Make the -all job abstract (which means it stays started and can be
stopped). Trigger a helper task (-all-starter) to trigger instance
start. Make instances stop with the -all task. This allows you to do

    start ceph-mds-all
    stop ceph-mds-all

    start ceph-mds id=foo
    start ceph-mds-all
    stop ceph-mds id=bar
    stop ceph-mds-all

but not

    start ceph-mds id=foo
    stop ceph-mds-all

because ceph-mds-all isn't running. Not quite as flexible as sysvinit in
that regard, but good enough for me.
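
The start/stop behaviour described above can be modelled with a small Python
sketch. This is a toy model under stated assumptions, not upstart itself: the
class UpstartModel, its method names, and the configured ids "foo" and "bar"
are hypothetical, and the -all-starter helper is reduced to "start every
configured instance".

```python
class UpstartModel:
    """Toy model of the ceph-mds-all / ceph-mds job relationship."""

    def __init__(self, configured_ids):
        # Ids the -all-starter helper would find and emit events for.
        self.configured_ids = set(configured_ids)
        self.all_running = False
        self.instances = set()

    def start_all(self):
        if self.all_running:
            return False  # already started, nothing happens
        self.all_running = True
        # The helper task triggers one instance start per configured id;
        # instances that are already running are unaffected.
        self.instances |= self.configured_ids
        return True

    def stop_all(self):
        if not self.all_running:
            return False  # -all is not running, so instances are untouched
        self.all_running = False
        self.instances.clear()  # instances stop together with -all
        return True

    def start(self, mds_id):
        self.instances.add(mds_id)

    def stop(self, mds_id):
        self.instances.discard(mds_id)


# Works: -all is running when it is stopped, so everything goes down.
ok = UpstartModel(["foo", "bar"])
ok.start("foo")
ok.start_all()
ok.stop("bar")
ok_stopped = ok.stop_all()

# Does not work: -all was never started, so foo keeps running.
bad = UpstartModel(["foo"])
bad.start("foo")
bad_stopped = bad.stop_all()
```

In the second sequence stop_all returns False and "foo" stays in the set of
running instances, which is the limitation noted above.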
Fixes: #2414
Signed-off-by: Sage Weil <sage@inktank.com>