Once upon a time, we configured our init systems to only restart an OSD 3 times
in a 30 minute period. This made sure a permanently-slow OSD would stay dead,
and that an OSD which was dying on boot (but only after a long boot process)
would not insist on rejoining the cluster for *too* long.
In 62084375fa, Boris applied these same rules to
systemd in a great bid for init system consistency. Hurray!
Sadly, Loic discovered that the great dragons udev and ceph-disk were
susceptible to races under systemd (that we apparently didn't see with
the other init systems?), and our 3x start limit was preventing the
system from sorting them out. In b3887379d6
he configured the system to allow *30* restarts in 30 minutes, but no more
frequently than every 20 seconds.
So that resolved the race issue, which was far more immediately annoying
than any concern about OSDs sometimes taking too long to die. But I've started
hearing in-person reports about OSDs not failing hard and fast when they go bad,
and I attribute some of those reports to these init system differences.
Happily, we no longer rely on udev and ceph-disk, and ceph-volume shouldn't
be susceptible to the same race, so I think we can just go back to the old way.
Partly-reverts: b3887379d6
Partly-fixes: http://tracker.ceph.com/issues/24368
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
this change also fix the EnvironmentFile specified in rbdmap.service.
without this change EnvironmentFile in rbdmap.service is always
/etc/sysconfig/ceph even on debian derived distros. after this change,
this variable is /etc/default/ceph in rbdmap.service shipped by the deb
packages.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Allow all daemons drop privilege themselves, instead of letting
systemd do it.
Among other things, this means that admins can conditionally not
drop prives by setting
setuser match path = /var/lib/ceph/$type/$cluster-$id
in their ceph.conf to ease the pain of upgrade.
Signed-off-by: Sage Weil <sage@redhat.com>
Reviewed-by: Boris Ranto <branto@redhat.com>
Specify the nofile ulimit in one standard place, where everyone expects it
to be. Drop it from the ceph-osd unit file.
Leave upstart and sysvinit untouched for the time being to avoid compat
issues.
Signed-off-by: Sage Weil <sage@redhat.com>
Before this patch, the command 'logrotate -f /etc/logrotate.d/ceph'
was generating an error "Failed to reload ceph.target: Job type reload is not
applicable for unit ceph.target".
Before we issue systemctl reload, check that there is at least
one active ceph-* service. (The hyphen is significant.)
Since we use grep, make the grep package a dependency.
http://tracker.ceph.com/issues/12173Fixes: #12173
Signed-off-by: Tim Serong <tserong@suse.com>
Signed-off-by: Lars Marowsky-Bree <lmb@suse.com>
Signed-off-by: Nathan Cutler <ncutler@suse.com>
The libexec path is different for different distributions.
systemd. This path is defined by a new variable on the
configure path.
This variable can be set with enviroment SYSTEMD_LIBEXEC_DIR.
The parameter --with-systemd-libexec-dir overrides the enviroment
variable.
Appropriate conditionals are set for SUSE and RHEL derivatives.
This is then used to template out systemd/ceph-osd@.service
Signed-off-by: Owen Synge <osynge@suse.com>