Dear list,
I ran into this while using the radosstriper C API. My program does
roughly the following:
rados_striper_aio_write
rados_aio_flush
rados_aio_wait_for_safe
rados_aio_release
rados_striper_destroy
rados_ioctx_destroy
rados_shutdown    (hangs here)
Most of the time this works well, but the program occasionally
hangs forever. Output of gstack:
Thread 1 (Thread 0x7fe0afba0760 (LWP 18509)):
#0  0x000000330f20822d in pthread_join () from /lib64/libpthread.so.0
#1  0x000000347566cea2 in Thread::join(void**) () from /usr/lib64/librados.so.2
#2  0x00000034755ac535 in librados::RadosClient::shutdown() () from /usr/lib64/librados.so.2
#3  0x0000003475592269 in rados_shutdown () from /usr/lib64/librados.so.2
#4  0x0000000000402349 in main ()

Thread 4 (Thread 0x7fe0ab14d700 (LWP 18541)):
#0  0x000000330f20e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000330f209508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x000000330f2093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000003475633af1 in Mutex::Lock(bool) () from /usr/lib64/librados.so.2
#4  0x00000034755abd37 in librados::RadosClient::put() () from /usr/lib64/librados.so.2
#5  0x0000003475592501 in librados::Rados::shutdown() () from /usr/lib64/librados.so.2
#6  0x00007fe0afbba9f7 in libradosstriper::RadosStriperImpl::CompletionData::~CompletionData() () from /usr/lib64/libradosstriper.so.1
#7  0x00007fe0afbbaad9 in libradosstriper::RadosStriperImpl::WriteCompletionData::~WriteCompletionData() () from /usr/lib64/libradosstriper.so.1
#8  0x00007fe0afbc1d75 in RefCountedObject::put() () from /usr/lib64/libradosstriper.so.1
#9  0x00007fe0afbc224d in libradosstriper::MultiAioCompletionImpl::safe_request(long) () from /usr/lib64/libradosstriper.so.1
#10 0x00000034755c5ce8 in librados::C_AioSafe::finish(int) () from /usr/lib64/librados.so.2
#11 0x00000034755a0e89 in Context::complete(int) () from /usr/lib64/librados.so.2
#12 0x000000347564d4c8 in Finisher::finisher_thread_entry() () from /usr/lib64/librados.so.2
#13 0x000000330f2079d1 in start_thread () from /lib64/libpthread.so.0
#14 0x000000330eae886d in clone () from /lib64/libc.so.6
librados::Rados::shutdown is evidently not thread-safe here, which is
why the program hangs forever. The cause is the order of operations when
a CompletionData is released: the completion first wakes up the caller
waiting in "rados_aio_wait_for_safe", and only afterwards does
CompletionData call put() to release its remaining references. If the
main thread (Thread 1 here) runs fast enough, rados_striper_destroy is
executed before the other thread (Thread 4 here) has dropped its
refcount. In that situation the main thread runs Rados::shutdown() while
the other thread is running Rados::shutdown() at the same time.
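The two stacks above can be pictured with a minimal sketch. This is not the librados code: ToyClient, worker_put and shutdown_joining_worker are hypothetical names, and the sketch assumes the client lock is held while the finisher thread is being joined. A bounded try_lock_for stands in for the plain lock that would never return.

```cpp
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a client whose shutdown joins a worker thread
// while holding the client lock (an assumption made for this sketch).
struct ToyClient {
    std::timed_mutex lock;         // stands in for the client mutex
    bool worker_got_lock = false;  // written by the worker, read after join

    // What the finisher thread does when the last completion is destroyed:
    // it drops a client reference, which needs the client lock.
    void worker_put() {
        // A plain lock() would block forever here; the bounded wait lets
        // the sketch terminate and report what happened.
        if (lock.try_lock_for(std::chrono::milliseconds(50))) {
            worker_got_lock = true;
            lock.unlock();
        }
    }

    // What the main thread does: take the lock, then join the worker.
    void shutdown_joining_worker() {
        std::lock_guard<std::timed_mutex> guard(lock);
        std::thread worker(&ToyClient::worker_put, this);
        worker.join();  // the worker cannot get the lock while we hold it
    }
};

// Returns true when the worker was unable to take the lock, i.e. when the
// real code would be stuck in pthread_mutex_lock as in Thread 4.
bool race_blocks_worker() {
    ToyClient c;
    c.shutdown_joining_worker();
    return !c.worker_got_lock;
}
```

With an ordinary mutex in place of the timed one, the joiner and the worker would wait on each other indefinitely, which matches the shape of the gstack output.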
My suggestion is to make RadosStriperImpl::aio_flush block until all
CompletionData objects have been released. This ensures that no other
thread will ever call rados_shutdown.
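The suggestion could look roughly like the following sketch, which counts outstanding completion objects and lets a flush wait until the count reaches zero. PendingCompletions and its methods are hypothetical names for illustration and are not part of libradosstriper.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical tracker: add() when a completion object is created,
// release() as the very last step of its destruction.
class PendingCompletions {
public:
    void add() {
        std::lock_guard<std::mutex> g(m_);
        ++pending_;
    }
    void release() {
        std::lock_guard<std::mutex> g(m_);
        if (--pending_ == 0)
            cv_.notify_all();
    }
    // What a blocking flush would call before returning to the caller.
    void wait_all_released() {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [this] { return pending_ == 0; });
    }
    int pending() {
        std::lock_guard<std::mutex> g(m_);
        return pending_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int pending_ = 0;
};

// Simulates one in-flight completion that a finisher thread releases a
// little later; returns the count observed once the flush has returned.
int flush_then_count() {
    PendingCompletions tracker;
    tracker.add();
    std::thread finisher([&tracker] {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        tracker.release();
    });
    tracker.wait_all_released();  // does not return while work is pending
    int left = tracker.pending();
    finisher.join();
    return left;
}
```

The point of the design is that teardown calls issued after the flush can no longer overlap with a completion destructor on the finisher thread, provided release() really is the last thing that destructor does.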
If we fail to cancel the tick_event, we rely on tick() itself to clear
tick_event. I'm not quite sure how we got this wrong in the previous
commit, but this boils down to two cases:
1) shutdown() successfully cancels the event and clears tick_event. tick()
never runs. tick_event == NULL when we finish.
2) shutdown() fails to cancel the event because it has already started. In
this case tick() itself is blocking (or about to block) waiting on the
rlock. When it does run it will clear tick_event itself, then see
initialized == 0 and exit without rescheduling.
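The two cases can be walked through with a small single-threaded model. ToyTicker, timer_start_tick and cancel_event are hypothetical names for this sketch, and a bool stands in for the tick_event pointer; only tick(), shutdown(), tick_event and the initialized flag come from the description above.

```cpp
// Hypothetical single-threaded model of the two shutdown cases.
struct ToyTicker {
    bool initialized = true;
    bool tick_event = false;    // true while an event is outstanding
    bool tick_started = false;  // true once the timer has picked it up
    int reschedules = 0;

    void schedule() {
        tick_event = true;
        tick_started = false;
    }

    // The timer dequeues the event; from here on it cannot be cancelled.
    void timer_start_tick() { tick_started = true; }

    bool cancel_event() const { return tick_event && !tick_started; }

    void shutdown() {
        initialized = false;
        if (cancel_event())
            tick_event = false;  // case 1: cancelled, tick() never runs
        // case 2: leave tick_event alone, tick() will clear it
    }

    void tick() {
        tick_event = false;  // tick() always clears its own event
        if (!initialized)
            return;          // shutting down: do not reschedule
        schedule();
        ++reschedules;
    }
};

// Case 1: the event is cancelled before it starts.
bool case_cancelled() {
    ToyTicker t;
    t.schedule();
    t.shutdown();
    return !t.tick_event && t.reschedules == 0;
}

// Case 2: the event has already started when shutdown() runs.
bool case_already_started() {
    ToyTicker t;
    t.schedule();
    t.timer_start_tick();
    t.shutdown();
    bool still_set = t.tick_event;  // shutdown() could not clear it
    t.tick();
    return still_set && !t.tick_event && t.reschedules == 0;
}
```

In both paths tick_event ends up cleared and nothing is rescheduled, which is the invariant the commit message relies on.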
Fixes: #9873
Signed-off-by: Sage Weil <sage@redhat.com>
If we have safe_callbacks==false, the stopping flag may have changed while
we were doing our callback. Recheck it and exit to avoid a deadlock on
shutdown.
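A minimal sketch of the recheck, assuming a timer loop that drops its lock around callbacks when safe_callbacks is false. ToyTimer, due and run_one_iteration are hypothetical names, and the condition-variable wait is replaced by a return value so the example terminates.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical model of a timer thread running with safe_callbacks == false.
struct ToyTimer {
    std::mutex lock;
    bool stopping = false;
    std::queue<std::function<void()>> due;  // callbacks that are already due

    void shutdown() {
        std::lock_guard<std::mutex> g(lock);
        stopping = true;  // real code would also signal the timer thread
    }

    // Runs the due callbacks once. Returns true if the loop would now go to
    // sleep waiting for the next event, false if it should exit instead.
    bool run_one_iteration() {
        std::unique_lock<std::mutex> l(lock);
        if (stopping)
            return false;
        while (!due.empty()) {
            std::function<void()> cb = std::move(due.front());
            due.pop();
            l.unlock();  // callbacks run without the timer lock
            cb();
            l.lock();
        }
        // The lock was dropped above, so stopping may have changed.
        if (stopping)
            return false;
        return true;
    }
};

// shutdown() happens while a callback is running without the lock.
bool sleeps_after_shutdown_in_callback() {
    ToyTimer t;
    t.due.push([&t] { t.shutdown(); });
    return t.run_one_iteration();
}

// No shutdown: the loop would go back to sleep as usual.
bool sleeps_without_shutdown() {
    ToyTimer t;
    t.due.push([] {});
    return t.run_one_iteration();
}
```

Without the second check the loop would go to sleep after the stop request had already been made, and a shutdown that joins the timer thread could then wait forever.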
Signed-off-by: Sage Weil <sage@redhat.com>
Certain versions of Apache 2.4 have an issue where the reason is not
sent back. Instead, just provide the reason explicitly.
Backport: firefly, giant
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
A stat-filling helper function was added, but only stat and lstat were
updated to use it. This patch makes fstat use it as well. Crucially,
fstat wasn't updating the mode flags.
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
LRC now uses Jerasure as the default EC backend, but it is possible to
switch to another backend such as ISA using the low-level
configuration. This commit adds documentation on how to specify the EC
backend in each LRC layer.
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
We are dropping the requirement for MON_CAP_R for MMonGetMap.
The reason is simple enough: clients may need to contact the monitors and
obtain the latest monmap before authenticating. This happens, for
instance, when a client calls MonClient::get_monmap_privately(). The
osd uses this function during mkfs, prior to initializing a keyring or
even so much as existing.
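The effect of the change can be sketched as a permission check that lets monmap requests through before any capability test. MsgType, Session and may_handle are hypothetical names for illustration only and do not mirror the monitor's actual dispatch code.

```cpp
// Hypothetical message kinds: the monmap request and everything else.
enum class MsgType { GetMap, Other };

// Hypothetical view of a client session as seen by the monitor.
struct Session {
    bool authenticated;
    bool has_mon_cap_r;  // stands in for holding MON_CAP_R
};

// Monmap requests are answered regardless of caps, so a client that has
// not authenticated yet can still learn the current monmap. In this
// sketch every other message still requires read capability.
bool may_handle(const Session& s, MsgType type) {
    if (type == MsgType::GetMap)
        return true;
    return s.authenticated && s.has_mon_cap_r;
}
```

For example, a session with no keyring (authenticated == false) gets true for MsgType::GetMap and false for MsgType::Other.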
Fixes: #9859
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>