blk/kernel: retry forever if bdev_flock_retry is 0

Retry forever if cct->_conf->bdev_flock_retry is 0.
systemd-udevd is most likely the reason why ceph-osd fails to
acquire the flock during "mkfs": it uses libblkid to probe
every block device whenever a device in the system changes.
When systemd-udevd starts looking at a device, it takes a
`LOCK_SH|LOCK_NB` lock and releases it as soon as it is done,
so it normally holds the lock only for a moment. See
ee0b9e721a/src/shared/lockfile-util.c (L18)
So we just need to retry a few times before we can acquire the
lock.
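
The retry rule described above can be sketched in isolation. This is a
minimal illustration, not the code in KernelDevice::_lock():
lock_with_retry, try_lock and retry_interval are hypothetical names, and
the lock attempt is passed in as a callback so the sketch does not depend
on a real block device.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical helper (an assumption, not Ceph's API): call try_lock until
// it succeeds. max_retry == 0 means "retry without limit"; otherwise give
// up with -EAGAIN after the initial attempt plus max_retry retries fail.
int lock_with_retry(const std::function<bool()>& try_lock,
                    uint64_t max_retry,
                    std::chrono::milliseconds retry_interval)
{
  assert(try_lock && "try_lock must be callable");
  uint64_t nr_tries = 0;
  for (;;) {
    if (try_lock()) {
      return 0;  // lock acquired
    }
    // Same shape as the condition in this commit: the counter is only
    // consulted (and incremented) when a limit is configured.
    if (max_retry > 0 && nr_tries++ == max_retry) {
      return -EAGAIN;
    }
    std::this_thread::sleep_for(retry_interval);
  }
}
```

A caller could pass a lambda that wraps a non-blocking flock(2) attempt
(LOCK_EX | LOCK_NB) on the device's file descriptor. With max_retry set
to 0, the loop ends only when the lock is acquired.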

Fixes: https://tracker.ceph.com/issues/46124
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai 2020-09-16 09:28:04 +08:00
parent ec7fa39624
commit 743b5bda65
2 changed files with 6 additions and 2 deletions

@@ -108,7 +108,7 @@ int KernelDevice::_lock()
       dout(1) << __func__ << " flock busy on " << path << dendl;
       if (const uint64_t max_retry =
           cct->_conf.get_val<uint64_t>("bdev_flock_retry");
-          nr_tries++ == max_retry) {
+          max_retry > 0 && nr_tries++ == max_retry) {
         return -EAGAIN;
       }
       double retry_interval =

@@ -4072,7 +4072,11 @@ std::vector<Option> get_global_options() {
     Option("bdev_flock_retry", Option::TYPE_UINT, Option::LEVEL_ADVANCED)
     .set_default(3)
-    .set_description("times to retry the flock"),
+    .set_description("times to retry the flock")
+    .set_long_description(
+        "The number of times to retry on getting the block device lock. "
+        "Programs such as systemd-udevd may compete with Ceph for this lock. "
+        "0 means 'unlimited'."),
     Option("bluefs_alloc_size", Option::TYPE_SIZE, Option::LEVEL_ADVANCED)
     .set_default(1_M)