crimson/os: synchronize producers with consumers in AlienStore's queues.

Some time ago we replaced the single, `boost::lockfree`-based queue
in `ThreadPool` with the in-house, lockish `ShardedWorkQueue` vector.
Unfortunately, pushing into such queue isn't synchronized with
consuming from it -- the former happens without locking the `mutex`.
As the underlying primitive behind `ShardedWorkQueue::pending` is
plain `std::deque`, it's unsafe to operate that way in multi-thread
environment. Indeed, weirdly looking crashes have been spotted at Sepia:

```
(virtualenv) rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-06-21_14:49:36-rados-master-distro-basic-smithi/6182668$ less ./remote/smithi196/log/ceph-osd.7.log.gz
...
 0# 0x000055862FD67ADF in ceph-osd
 1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
 2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
 3# 0x00007FB22CF36B20 in /lib64/libpthread.so.0
 4# 0x00005586357540E4 in ceph-osd
 5# 0x00007FB22CF36B20 in /lib64/libpthread.so.0
 6# pthread_cond_timedwait in /lib64/libpthread.so.0
 7# crimson::os::ThreadPool::loop(std::chrono::duration<long, std::ratio<1l, 1000l> >, unsigned long) in ceph-osd
 8# 0x00005586313E303B in ceph-osd
 9# 0x00007FB22CC51BA3 in /lib64/libstdc++.so.6
10# 0x00007FB22CF2C14A in /lib64/libpthread.so.0
11# clone in /lib64/libc.so.6
Fault at location: 0x18
daemon-helper: command crashed with signal 11
```

This fix introduces the synchronization to the `push_back()` method of
`ShardedWorkQueue`. The side effect is that it may stall the reactor.
Therefore, a follow-up change that switches to e.g. `boost::lockfree`
is expected.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
This commit is contained in:
Radoslaw Zarzynski 2021-06-21 21:16:35 +00:00
parent a5fd875665
commit 809c5d10a3

View File

@ -90,10 +90,14 @@ public:
return work_item;
}
void stop() {
std::unique_lock lock{mutex};
stopping = true;
cond.notify_all();
}
void push_back(WorkItem* work_item) {
// XXX: oops, we can stall the reactor!
// TODO: switch to boost::lockfree.
std::unique_lock lock{mutex};
pending.push_back(work_item);
cond.notify_one();
}