22470 Commits

Author SHA1 Message Date
Dan Mick
0dbf6e8906 test_librbd_fsx: Add OP_FLATTEN 2012-11-21 08:34:50 -08:00
Dan Mick
7021f1a27b test_librbd_fsx: consume saved-image files as test runs
Avoid consuming lots of disk space by holding only as many file
copies as needed (compare the n-2 file as we make clone n).
2012-11-21 08:34:47 -08:00
Sage Weil
b35e37fb73 osdc/Striper: fix handling for sparse reads in add_partial_sparse_result()
If bl_map begins *after* the first item in buffer_extents, we want to
skip only the first buffer extent before doing 'continue' to loop to the
next one.

This fixes a crash caused by underflow with a pattern like:

2012-11-20 13:54:30.347861 7f9404ed6700 10 striper add_partial_sparse_result(0x1efa088) 192 covering {12288=192} (offset 2906) to [0,5286,38054,4288]
2012-11-20 13:54:30.347863 7f9404ed6700 20 striper   t 0~5286 bl has 192 off 2906
2012-11-20 13:54:30.347866 7f9404ed6700 20 striper   s gap 9382, skipping
2012-11-20 13:54:30.347867 7f9404ed6700 20 striper   s has 192, copying
2012-11-20 13:54:30.347872 7f9404ed6700 20 striper   t 9574~18446744073709547328 bl has 0 off 12480
2012-11-20 13:54:30.347874 7f9404ed6700 20 striper   s at end
2012-11-20 13:54:30.347876 7f9404ed6700 20 striper   t 38054~4288 bl has 0 off 12480
2012-11-20 13:54:30.347877 7f9404ed6700 20 striper   s at end

Dan reproduced this with

 ./test_librbd_fsx -d -W -R -p 10 -t 1 -S 4 -N 300 rbd fsx

(although I was unable to do so).

Re-fixes #3428.

Reported-and-tested-by: Dan Mick <dan.mick@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-20 16:41:28 -08:00
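
The huge length in the `t 9574~18446744073709547328` trace line of commit b35e37fb73 is consistent with an unsigned 64-bit subtraction whose result would have been negative (5286 - 9574). The following is only a minimal sketch of that arithmetic, not the Striper code; `remaining_len` and `remaining_len_checked` are hypothetical helper names.

```cpp
#include <cstdint>

// Hypothetical helper, not from the Ceph source: bytes left in an extent
// that ends at `end` once the cursor has moved to `pos`.
// With unsigned operands, pos > end wraps around instead of going negative.
inline uint64_t remaining_len(uint64_t end, uint64_t pos) {
  return end - pos;
}

// Guarded variant: clamp to zero when the cursor is already past the end.
inline uint64_t remaining_len_checked(uint64_t end, uint64_t pos) {
  return pos >= end ? 0 : end - pos;
}
```

With the values from the trace, `remaining_len(5286, 9574)` yields 18446744073709547328, which is 2^64 - 4288, while the guarded variant returns 0.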
Alex Elder
700b5c0029 qa/run_xfstests.sh: drop tests 174 and 181
These tests are showing intermittent failures so we'll drop them
from the default list for the time being.

Signed-off-by: Alex Elder <elder@inktank.com>
2012-11-20 15:53:55 -06:00
Sage Weil
f8f452f324 Merge remote-tracking branch 'gh/wip-mon-parsing' into next 2012-11-18 21:20:36 -08:00
Sage Weil
34e5f9bbfc Merge branch 'wip-mon-leaks-fix' into next 2012-11-18 14:37:22 -08:00
Sage Weil
288db95aa9 mon: shutdown async signal handler sooner
Before the mon, and lockdep, in particular.

#0  __pthread_mutex_lock (mutex=0x30) at pthread_mutex_lock.c:50
#1  0x0000000000816092 in ceph::log::Log::submit_entry (this=0x0, e=0x2f4a270) at log/Log.cc:138
#2  0x00000000007ee0f8 in handle_fatal_signal (signum=11) at global/signal_handler.cc:100
#3  <signal handler called>
#4  0x00000000008e1300 in lockdep_will_lock (name=0x959aa7 "SignalHandler::lock", id=17) at common/lockdep.cc:163
#5  0x00000000008867fc in Mutex::_will_lock (this=0x2f20428) at ./common/Mutex.h:56
#6  0x0000000000886605 in Mutex::Lock (this=0x2f20428, no_lockdep=false) at common/Mutex.cc:81
#7  0x00000000007eeb95 in SignalHandler::entry (this=0x2f20300) at global/signal_handler.cc:198
#8  0x00000000008b0bd1 in Thread::_entry_func (arg=0x2f20300) at common/Thread.cc:43
#9  0x00007f36fefd6b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#10 0x00007f36fd80b6dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()

#0  0x00007f36fefd7e75 in pthread_join (threadid=139874129766144, thread_return=0x0) at pthread_join.c:89
#1  0x00000000008b11ec in Thread::join (this=0x2f20300, prval=0x0) at common/Thread.cc:130
#2  0x00000000007eeae7 in SignalHandler::shutdown (this=0x2f20300) at global/signal_handler.cc:186
#3  0x00000000007ee9cf in SignalHandler::~SignalHandler (this=0x2f20300, __in_chrg=<optimized out>) at global/signal_handler.cc:175
#4  0x00000000007eea58 in SignalHandler::~SignalHandler (this=0x2f20300, __in_chrg=<optimized out>) at global/signal_handler.cc:176
#5  0x00000000007ee643 in shutdown_async_signal_handler () at global/signal_handler.cc:324
#6  0x00000000006de9d2 in main (argc=7, argv=0x7fffbfb8a1e8) at ceph_mon.cc:439

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:34:35 -08:00
Sage Weil
45c652d772 mon/AuthMonitor: refactor assign_global_id
Move the failure logic into the caller, where it is easier to do something
about it and return the right value to the caller.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:01 -08:00
Sage Weil
92d6b8e636 mon/AuthMonitor: reorder session->put()
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:01 -08:00
Sage Weil
82042adfe0 msg/Pipe: remove useless reader_joining
We set it but do not read it.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:01 -08:00
Sage Weil
c07c93e01d msg/Pipe: join previous reader threads
We may stop and then restart the reader thread.  Join previous threads
before we create new ones.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
c4caf871aa msg/DispatchQueue: fix message leak from discard_queue()
We need to drop the Message ref() here; the msgr owns one ref
independent of those from the intrusive_ptr's in the queue itself.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
183953e14b msg/SimpleMessenger: use put() on local_connection
This aids leak debugging; not much else.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
3e2eb3a16b mon: clean up Subscription xlists
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
005967d256 mon: drop con->session reference in remove_session()
This captures all callers.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
6d3afce40f mon: sessions get cleaned up before dtor
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
e0e9a2dab7 msg/Pipe: don't leak session_security
Make sure we free old instances of session_security before we reset the
pointer.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
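
One way to get the "free the old instance before resetting the pointer" behaviour of commit e0e9a2dab7 is to let a smart pointer own the object. This is a sketch under assumptions, not the Pipe code: `SessionSecurity`, `Connection`, and `renegotiate` are hypothetical stand-ins, and the commit itself may well use a raw pointer with an explicit delete.

```cpp
#include <memory>

// Hypothetical stand-ins, not the Pipe or session handler classes.
struct SessionSecurity {
  static int live;                    // counts live instances for the demo
  SessionSecurity() { ++live; }
  ~SessionSecurity() { --live; }
};
int SessionSecurity::live = 0;

struct Connection {
  std::unique_ptr<SessionSecurity> session_security;

  // reset() stores the new instance and destroys the previous one (if any),
  // so replacing the handler repeatedly cannot leak.
  void renegotiate() {
    session_security.reset(new SessionSecurity);
  }
};
```

Calling `renegotiate()` several times on one `Connection` leaves exactly one live `SessionSecurity`.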
Joao Eduardo Luis
d005732553 mon: Monitor: make MSG_MON_PAXOS case a bit more consistent
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-11-18 08:28:59 -08:00
Joao Eduardo Luis
bbe2e1ad02 mon: Paxos{,Service}: finish contexts and put messages on shutdown
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-11-18 08:28:59 -08:00
Joao Eduardo Luis
9e3ceca055 mon: Monitor: finish contexts on shutdown
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-11-18 08:28:59 -08:00
Joao Eduardo Luis
900a0fa2d0 mon: Monitor: drop election messages if entity doesn't have enough caps
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-11-18 08:28:59 -08:00
Sage Weil
988f92a7fa mon: remove all sessions on shutdown
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:28:59 -08:00
Joao Eduardo Luis
5cf6c7e9be ceph_mon: cleanup on shutdown
Properly clean up the throttlers, 'g_ceph_context' and the
async signal handler.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-11-18 07:42:04 -08:00
Chen Baozi
68491afceb rgw: add -lresolv flags to Makefile.am
radosgw depends on libresolv since commit 951c6be, so we need to
add the -lresolv flag, or it cannot link against the right library.

Signed-off-by: Chen Baozi <baozich@gmail.com>
2012-11-17 23:21:08 -08:00
Sage Weil
7903aabeb1 mon/MonClient: use thread-safe RNG for picking monitors
Avoid using shared-state rand() when picking monitors.  This way we don't
screw with library users like test_librbd_fsx that rely on srand() and
rand() being deterministic.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-17 16:30:00 -08:00
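
The idea behind commit 7903aabeb1 can be illustrated with a generator that is private to the calling thread, so the process-wide state behind `srand()`/`rand()` is left alone. This is a minimal sketch, not the MonClient implementation; `pick_index` is a hypothetical helper and the choice of `std::mt19937` is an assumption.

```cpp
#include <cstddef>
#include <random>

// Hypothetical helper, not the MonClient code: pick an index in [0, n)
// using a generator owned by the calling thread. Requires n > 0.
// The shared state used by srand()/rand() is never read or modified.
inline std::size_t pick_index(std::size_t n) {
  thread_local std::mt19937 gen{std::random_device{}()};
  std::uniform_int_distribution<std::size_t> dist(0, n - 1);
  return dist(gen);
}
```

A caller that seeds with `srand()` and then calls `pick_index()` still sees the same `rand()` sequence as it would without the call.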
Sage Weil
f9fd0659cd Merge remote-tracking branch 'gh/wip-3431' into next 2012-11-16 21:26:30 -08:00
Josh Durgin
3610754a57 Makefile.am: fix LDADD for test_objectcacher_stress
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 17:14:06 -08:00
Sage Weil
e85c9e7b16 Merge branch 'wip-coverity' into next
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 17:36:34 -08:00
Sage Weil
12eb797fb8 client: fix lock leak in lazio_*() failure paths
CID 743400 (#1 of 1): Missing unlock (LOCK)
At (5): Returning without unlocking "this->client_lock._m".

CID 743399 (#1 of 1): Missing unlock (LOCK)
At (5): Returning without unlocking "this->client_lock._m".

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 17:36:16 -08:00
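
The "returning without unlocking" pattern flagged in commit 12eb797fb8 can be avoided structurally with a scoped guard. The sketch below is not the Ceph client code and not necessarily how the commit fixes it; `Client` and `guarded_op` here are hypothetical stand-ins.

```cpp
#include <mutex>

// Hypothetical stand-in for a client object; not the Ceph Client class.
struct Client {
  std::mutex client_lock;
  bool mounted = false;

  // The guard releases client_lock on every return path, including the
  // early error return, so no path can leave the mutex held.
  int guarded_op() {
    std::lock_guard<std::mutex> guard(client_lock);
    if (!mounted)
      return -1;                      // early failure: still unlocks
    return 0;
  }
};
```

After a failed `guarded_op()`, the mutex can be acquired again immediately, which is the property the Coverity report says was missing.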
Josh Durgin
78382fecaa Merge branch 'wip-oc-hang' into next
Reviewed-by: Sage Weil <sage.weil@inktank.com>
2012-11-16 16:43:07 -08:00
Sage Weil
be11c317bf upstart: set high open file limits
The default 1024 limit is easily hit on larger clusters.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 16:19:00 -08:00
Sage Weil
25f003ba5f msg/Accepter: only close socket if >= 0
It is possible for rebind() to fail, in which case the OSD will go through
its shutdown procedure and call stop().  This is simpler than trying to
avoid calling stop() when rebind() fails.

Fixes: #3504
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 16:10:56 -08:00
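
The guard described in commit 25f003ba5f can be sketched as follows. This is not the Accepter class; `Listener`, `listen_sd`, and the boolean return value are assumptions made for illustration.

```cpp
#include <unistd.h>

// Hypothetical stand-in for a listener; not the Accepter class.
struct Listener {
  int listen_sd = -1;                 // -1 means "no socket open"

  // Safe to call even if binding failed and no socket was ever opened.
  // Returns true if a descriptor was actually closed.
  bool stop() {
    if (listen_sd < 0)
      return false;
    ::close(listen_sd);
    listen_sd = -1;
    return true;
  }
};
```

Resetting the descriptor to -1 after closing also makes a second `stop()` call harmless.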
Sage Weil
30373ce872 osd: default journal size to 5GB
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 16:04:13 -08:00
Josh Durgin
a562518b6b librbd: take cache lock when discarding data from cache
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:27:52 -08:00
Josh Durgin
2e862f4d18 ObjectCacher: fix off-by-one error in split
This error left a completion that should have been attached
to the right BufferHead on the left BufferHead, which would
result in the completion never being called unless the buffers
were merged before its original read completed. This would cause
a hang in any higher level waiting for a read to complete.

The existing loop went backwards (using a forward iterator),
but stopped when the iterator reached the beginning of the map,
or when a waiter belonged to the left BufferHead.

If the first list of waiters should have been moved to the right
BufferHead, it was skipped because at that point the iterator
was at the beginning of the map, which was the main condition
of the loop.

Restructure the waiters-moving loop to go forward in the map instead,
so it's harder to make an off-by-one error.

Possibly-fixes: #3286
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
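
The forward-iterating loop described in commit 2e862f4d18 might look roughly like this. It is a simplified sketch, not the ObjectCacher code: the `Waiters` map keyed by offset, the string payloads, and `move_waiters` are all hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical simplification: waiters keyed by the offset they wait on.
using Waiters = std::map<uint64_t, std::vector<std::string>>;

// Move every waiter at or beyond `split_off` from `left` to `right`,
// walking forward from the first qualifying key. Because the loop never
// walks backwards, the first element of the map needs no special case.
inline void move_waiters(Waiters& left, Waiters& right, uint64_t split_off) {
  auto it = left.lower_bound(split_off);
  while (it != left.end()) {
    auto& dst = right[it->first];
    dst.insert(dst.end(), it->second.begin(), it->second.end());
    it = left.erase(it);              // erase returns the next element
  }
}
```

The case the commit message singles out, where the very first entry of the map belongs on the right side, is handled by the same loop body as every other entry.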
Josh Durgin
fdadefe331 ObjectCacher: begin at the right place when iterating over BufferHeads
If the desired offset overlaps a BH, data.lower_bound() will return
the element after it, since it's indexed by the start of a range.

The confusingly similarly named data_lower_bound() method fixes this,
and returns the correct starting element.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
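
The `lower_bound()` pitfall described in commit fdadefe331 applies to any map of ranges keyed by start offset. A minimal sketch of the usual correction follows; it is not the ObjectCacher `data_lower_bound()` implementation, and `Ranges` and `first_overlapping` are hypothetical names.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical simplification: ranges keyed by start offset, value = length.
using Ranges = std::map<uint64_t, uint64_t>;

// Return the first range that contains or follows `off`.
// Plain lower_bound(off) finds the first key >= off, which skips a range
// that starts before `off` but still covers it, so step back one entry
// and keep it if it overlaps.
inline Ranges::const_iterator first_overlapping(const Ranges& data,
                                                uint64_t off) {
  auto it = data.lower_bound(off);
  if (it != data.begin()) {
    auto prev = std::prev(it);
    if (prev->first + prev->second > off)
      return prev;
  }
  return it;
}
```

For ranges 0~100 and 100~50, a lookup at offset 50 with plain `lower_bound()` lands on the range starting at 100, while `first_overlapping()` returns the range starting at 0.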
Josh Durgin
20a0c56da6 ObjectCacher: add debug function to check BufferHead consistency
This isn't called because it's potentially expensive, but calling it
in various places can help future debugging.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
Josh Durgin
5d760b776a ObjectCacher: more debugging for read completions
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
Josh Durgin
c054ad6d48 ObjectCacher: assert lock is held everywhere
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
Josh Durgin
7570e6c894 ObjectCacher: debug read waiters
Now we can tell which ones will be called.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
Josh Durgin
8c961610a6 ObjectCacher: don't needlessly increment iterator
This iterator is now reset on each run through the loop,
so there's no point in incrementing it here.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
Josh Durgin
b948e4c91f ObjectCacher: retry reads when they are incomplete
Skipping these callbacks when there's a racing write or
a gap in the results causes the original reads they represent
to never be completed. If the read falls within the range
of a BufferHead, retry all waiters no matter what.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:27 -08:00
Sage Weil
8da6ddeaec common/ceph_argparse: fix malloc failure check
CID 743418 (#1 of 1): Dereference before null check (REVERSE_INULL)
Null-checking "argv" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 14:19:25 -08:00
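
The REVERSE_INULL report in commit 8da6ddeaec is about ordering: a null check is only useful if it comes before the first dereference of the pointer it tests. The sketch below shows that ordering for an allocation; it is not the ceph_argparse code, and `copy_argv` is a hypothetical helper.

```cpp
#include <cstdlib>

// Hypothetical helper, not the ceph_argparse code: copy an argument vector
// into a freshly allocated, null-terminated array.
// The result of malloc() is checked before it is first dereferenced.
inline const char** copy_argv(int argc, const char** argv) {
  const char** out =
      static_cast<const char**>(std::malloc(sizeof(char*) * (argc + 1)));
  if (!out)                           // check first ...
    return nullptr;
  for (int i = 0; i < argc; ++i)      // ... then dereference
    out[i] = argv[i];
  out[argc] = nullptr;
  return out;
}
```

The caller owns the returned array and releases it with `free()`.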
Sage Weil
e82ca0d5e0 mon/MonClient: initialize ptr in ctor
CID 743433 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
At (2): Non-static class member "authorize_handler_registry" is not initialized in this constructor nor in any functions that it calls.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 14:18:21 -08:00
Sage Weil
8f1f36d5a2 os/FileStore: fix fd leak in _rmattr
CID 743405 (#2 of 2): Resource leak (RESOURCE_LEAK)
At (16): Handle variable "fd" going out of scope leaks the handle.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 14:15:23 -08:00
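
Descriptor leaks on early-return paths, like the one reported in commit 8f1f36d5a2, can also be ruled out with a small RAII wrapper. This is a sketch under assumptions rather than the FileStore fix, which may simply add the missing close calls; `ScopedFd` and `touch_attr` are hypothetical.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical RAII wrapper; not part of FileStore.
class ScopedFd {
public:
  explicit ScopedFd(int fd) : fd_(fd) {}
  ~ScopedFd() { if (fd_ >= 0) ::close(fd_); }
  ScopedFd(const ScopedFd&) = delete;
  ScopedFd& operator=(const ScopedFd&) = delete;
  int get() const { return fd_; }
private:
  int fd_;
};

// Hypothetical operation with an early error return after open().
inline int touch_attr(const char* path, bool fail_early) {
  ScopedFd fd(::open(path, O_RDONLY));
  if (fd.get() < 0)
    return -1;                        // open failed, nothing to close
  if (fail_early)
    return -2;                        // fd is still closed by ~ScopedFd
  return 0;
}
```

Every return after a successful `open()` runs the destructor, so the handle cannot go out of scope while still open.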
Sage Weil
426b58da29 os/FileStore: fix fd leaks in _setattrs
CID 743406 (#3 of 3): Resource leak (RESOURCE_LEAK)
At (26): Handle variable "fd" going out of scope leaks the handle.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 14:14:17 -08:00
Sage Weil
1df38fd73d osdc/ObjectCacher: faux use-after-free
CID 743435 (#1 of 1): Use after free (USE_AFTER_FREE)
At (68): Passing freed pointer "rd" as an argument to function "std::basic_ostream<char, std::char_traits<char> >::operator <<(void const *)".

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 14:11:05 -08:00
Josh Durgin
9a10ebb2ef test: add ObjectCacher stress test that does not use a cluster
Use a fake writeback handler and respond to all requests with -ENOENT.
This tests that all operations will complete, and the cache doesn't
lose waiters or callbacks.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 13:23:26 -08:00
Josh Durgin
fd928b9b71 ObjectCacher: more debugging for BufferHeads
This is useful for checking for lost waiters.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 12:11:01 -08:00
Gary Lowell
8b187bd8ca build: update for boost_thread library.
There is a difference in naming conventions between debian and
rpm based distributions for this library.  In configure.ac we
check first for boost_thread-mt, then if it's not found check
for boost_thread.  A side effect of the AC_CHECK_LIB macro is
to add the library to the $LIBS, so the explicit -llibboost_thread
in the Makefile has been removed.
(cherry picked from commit f0c7bb3630)
2012-11-16 10:51:42 -08:00