If bl_map begins *after* the first item in buffer_extents, we want to
skip only the first buffer extent before doing 'continue' to loop to the
next one.
This fixes a crash caused by underflow with a pattern like:
2012-11-20 13:54:30.347861 7f9404ed6700 10 striper add_partial_sparse_result(0x1efa088) 192 covering {12288=192} (offset 2906) to [0,5286,38054,4288]
2012-11-20 13:54:30.347863 7f9404ed6700 20 striper t 0~5286 bl has 192 off 2906
2012-11-20 13:54:30.347866 7f9404ed6700 20 striper s gap 9382, skipping
2012-11-20 13:54:30.347867 7f9404ed6700 20 striper s has 192, copying
2012-11-20 13:54:30.347872 7f9404ed6700 20 striper t 9574~18446744073709547328 bl has 0 off 12480
2012-11-20 13:54:30.347874 7f9404ed6700 20 striper s at end
2012-11-20 13:54:30.347876 7f9404ed6700 20 striper t 38054~4288 bl has 0 off 12480
2012-11-20 13:54:30.347877 7f9404ed6700 20 striper s at end
Dan reproduced this with
./test_librbd_fsx -d -W -R -p 10 -t 1 -S 4 -N 300 rbd fsx
(although I was unable to do so).
Re-fixes #3428.
Reported-and-tested-by: Dan Mick <dan.mick@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
These tests are showing intermittent failures so we'll drop them
from the default list for the time being.
Signed-off-by: Alex Elder <elder@inktank.com>
Before the mon, and lockdep, in particular.
#0 __pthread_mutex_lock (mutex=0x30) at pthread_mutex_lock.c:50
#1 0x0000000000816092 in ceph::log::Log::submit_entry (this=0x0, e=0x2f4a270) at log/Log.cc:138
#2 0x00000000007ee0f8 in handle_fatal_signal (signum=11) at global/signal_handler.cc:100
#3 <signal handler called>
#4 0x00000000008e1300 in lockdep_will_lock (name=0x959aa7 "SignalHandler::lock", id=17) at common/lockdep.cc:163
#5 0x00000000008867fc in Mutex::_will_lock (this=0x2f20428) at ./common/Mutex.h:56
#6 0x0000000000886605 in Mutex::Lock (this=0x2f20428, no_lockdep=false) at common/Mutex.cc:81
#7 0x00000000007eeb95 in SignalHandler::entry (this=0x2f20300) at global/signal_handler.cc:198
#8 0x00000000008b0bd1 in Thread::_entry_func (arg=0x2f20300) at common/Thread.cc:43
#9 0x00007f36fefd6b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#10 0x00007f36fd80b6dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()
#0 0x00007f36fefd7e75 in pthread_join (threadid=139874129766144, thread_return=0x0) at pthread_join.c:89
#1 0x00000000008b11ec in Thread::join (this=0x2f20300, prval=0x0) at common/Thread.cc:130
#2 0x00000000007eeae7 in SignalHandler::shutdown (this=0x2f20300) at global/signal_handler.cc:186
#3 0x00000000007ee9cf in SignalHandler::~SignalHandler (this=0x2f20300, __in_chrg=<optimized out>) at global/signal_handler.cc:175
#4 0x00000000007eea58 in SignalHandler::~SignalHandler (this=0x2f20300, __in_chrg=<optimized out>) at global/signal_handler.cc:176
#5 0x00000000007ee643 in shutdown_async_signal_handler () at global/signal_handler.cc:324
#6 0x00000000006de9d2 in main (argc=7, argv=0x7fffbfb8a1e8) at ceph_mon.cc:439
Signed-off-by: Sage Weil <sage@inktank.com>
Move the failure logic into the caller, where we easier to do something
about it and return the right value to the caller.
Signed-off-by: Sage Weil <sage@inktank.com>
We need to drop the Message ref() here; the msgr owns one ref
independent of those from the intrusive_ptr's in the queue itself.
Signed-off-by: Sage Weil <sage@inktank.com>
radosgw depends on libresolv since since the commit 951c6be. So we need to
add -lresolve flags, or it cannot link right library.
Signed-off-by: Chen Baozi <baozich@gmail.com>
Avoid using shared-state rand() when picking monitors. This way we don't
screw with library users like test_librbd_fsx that rely on srand() and
rand() being deterministic.
Signed-off-by: Sage Weil <sage@inktank.com>
CID 743400 (#1 of 1): Missing unlock (LOCK)
At (5): Returning without unlocking "this->client_lock._m".
CID 743399 (#1 of 1): Missing unlock (LOCK)
At (5): Returning without unlocking "this->client_lock._m".
Signed-off-by: Sage Weil <sage@inktank.com>
It is possible for rebind() to fail, in which case the OSD will go through
it's shutdown procedure and call stop(). This is simpler than trying to
avoid calling stop() when rebind() fails.
Fixes: #3504
Signed-off-by: Sage Weil <sage@inktank.com>
This error left a completion that should have been attached
to the right BufferHead on the left BufferHead, which would
result in the completion never being called unless the buffers
were merged before it's original read completed. This would cause
a hang in any higher level waiting for a read to complete.
The existing loop went backwards (using a forward iterator),
but stopped when the iterator reached the beginning of the map,
or when a waiter belonged to the left BufferHead.
If the first list of waiters should have been moved to the right
BufferHead, it was skipped because at that point the iterator
was at the beginning of the map, which was the main condition
of the loop.
Restructure the waiters-moving loop to go forward in the map instead,
so it's harder to make an off-by-one error.
Possibly-fixes: #3286
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
If the desired offset overlaps a BH, data.lower_bound() will return
the element after it, since it's indexed by the start of a range.
The confusingly similarly named data_lower_bound() method fixes this,
and returns the correct starting element.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This isn't called because it's potentially expensive, but calling it
in various places can help future debugging.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This iterator is now reset on each run through the loop,
so there's no point in incrementing it here.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Skipping these callbacks when there's a racing write or
a gap in the results causes the original reads they represent
to never be completed. If the read falls within the range
of a BufferHead, retry all waiters no matter what.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
CID 743418 (#1 of 1): Dereference before null check (REVERSE_INULL)
Null-checking "argv" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
Signed-off-by: Sage Weil <sage@inktank.com>
CID 743433 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
At (2): Non-static class member "authorize_handler_registry" is not initialized in this constructor nor in any functions that it calls.
Signed-off-by: Sage Weil <sage@inktank.com>
CID 743405 (#2 of 2): Resource leak (RESOURCE_LEAK)
At (16): Handle variable "fd" going out of scope leaks the handle.
Signed-off-by: Sage Weil <sage@inktank.com>
CID 743406 (#3 of 3): Resource leak (RESOURCE_LEAK)
At (26): Handle variable "fd" going out of scope leaks the handle.
Signed-off-by: Sage Weil <sage@inktank.com>
CID 743435 (#1 of 1): Use after free (USE_AFTER_FREE)
At (68): Passing freed pointer "rd" as an argument to function "std::basic_ostream<char, std::char_traits<char> >::operator <<(void const *)".
Signed-off-by: Sage Weil <sage@inktank.com>
Use a fake writeback handler and respond to all requests with -ENOENT.
This tests that all operations will complete, and the cache doesn't
lose waiters or callbacks.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
There is a difference in naming conventions between debian and
rpm based distributions for this library. In configure.ac we
check first for boost_thread-mt, then if it's not found check
for boost_thread. A side effect of the AC_CEHCK_LIB macro is
to add the library to the $LIBS, so the explicit -llibboost_thread
in the Makefile has been removed.
(cherry picked from commit f0c7bb3630)