22460 Commits

Author SHA1 Message Date
Sage Weil
34e5f9bbfc Merge branch 'wip-mon-leaks-fix' into next 2012-11-18 14:37:22 -08:00
Sage Weil
288db95aa9 mon: shutdown async signal handler sooner
Shut it down before the mon and, in particular, before lockdep.

#0  __pthread_mutex_lock (mutex=0x30) at pthread_mutex_lock.c:50
#1  0x0000000000816092 in ceph::log::Log::submit_entry (this=0x0, e=0x2f4a270) at log/Log.cc:138
#2  0x00000000007ee0f8 in handle_fatal_signal (signum=11) at global/signal_handler.cc:100
#3  <signal handler called>
#4  0x00000000008e1300 in lockdep_will_lock (name=0x959aa7 "SignalHandler::lock", id=17) at common/lockdep.cc:163
#5  0x00000000008867fc in Mutex::_will_lock (this=0x2f20428) at ./common/Mutex.h:56
#6  0x0000000000886605 in Mutex::Lock (this=0x2f20428, no_lockdep=false) at common/Mutex.cc:81
#7  0x00000000007eeb95 in SignalHandler::entry (this=0x2f20300) at global/signal_handler.cc:198
#8  0x00000000008b0bd1 in Thread::_entry_func (arg=0x2f20300) at common/Thread.cc:43
#9  0x00007f36fefd6b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#10 0x00007f36fd80b6dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()

#0  0x00007f36fefd7e75 in pthread_join (threadid=139874129766144, thread_return=0x0) at pthread_join.c:89
#1  0x00000000008b11ec in Thread::join (this=0x2f20300, prval=0x0) at common/Thread.cc:130
#2  0x00000000007eeae7 in SignalHandler::shutdown (this=0x2f20300) at global/signal_handler.cc:186
#3  0x00000000007ee9cf in SignalHandler::~SignalHandler (this=0x2f20300, __in_chrg=<optimized out>) at global/signal_handler.cc:175
#4  0x00000000007eea58 in SignalHandler::~SignalHandler (this=0x2f20300, __in_chrg=<optimized out>) at global/signal_handler.cc:176
#5  0x00000000007ee643 in shutdown_async_signal_handler () at global/signal_handler.cc:324
#6  0x00000000006de9d2 in main (argc=7, argv=0x7fffbfb8a1e8) at ceph_mon.cc:439

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:34:35 -08:00
Sage Weil
45c652d772 mon/AuthMonitor: refactor assign_global_id
Move the failure logic into the caller, where it is easier to do something
about it and return the right value to the caller.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:01 -08:00
Sage Weil
92d6b8e636 mon/AuthMonitor: reorder session->put()
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:01 -08:00
Sage Weil
82042adfe0 msg/Pipe: remove useless reader_joining
We set it but do not read it.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:01 -08:00
Sage Weil
c07c93e01d msg/Pipe: join previous reader threads
We may stop and then restart the reader thread.  Join previous threads
before we create new ones.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
c4caf871aa msg/DispatchQueue: fix message leak from discard_queue()
We need to drop the Message ref() here; the msgr owns one ref
independent of those from the intrusive_ptr's in the queue itself.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
183953e14b msg/SimpleMessenger: use put() on local_connection
This aids leak debugging; not much else.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
3e2eb3a16b mon: clean up Subscription xlists
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
005967d256 mon: drop con->session reference in remove_session()
This captures all callers.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
6d3afce40f mon: sessions get cleaned up before dtor
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
e0e9a2dab7 msg/Pipe: don't leak session_security
Make sure we free old instances of session_security before we reset the
pointer.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Joao Eduardo Luis
d005732553 mon: Monitor: make MSG_MON_PAXOS case a bit more consistent
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-11-18 08:28:59 -08:00
Joao Eduardo Luis
bbe2e1ad02 mon: Paxos{,Service}: finish contexts and put messages on shutdown
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-11-18 08:28:59 -08:00
Joao Eduardo Luis
9e3ceca055 mon: Monitor: finish contexts on shutdown
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-11-18 08:28:59 -08:00
Joao Eduardo Luis
900a0fa2d0 mon: Monitor: drop election messages if entity doesn't have enough caps
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-11-18 08:28:59 -08:00
Sage Weil
988f92a7fa mon: remove all sessions on shutdown
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:28:59 -08:00
Joao Eduardo Luis
5cf6c7e9be ceph_mon: cleanup on shutdown
Properly clean up the throttlers, 'g_ceph_context' and the
async_signal_handler.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-11-18 07:42:04 -08:00
Chen Baozi
68491afceb rgw: add -lresolv flags to Makefile.am
radosgw depends on libresolv since commit 951c6be. So we need to
add the -lresolv flag, or it cannot link against the right library.

Signed-off-by: Chen Baozi <baozich@gmail.com>
2012-11-17 23:21:08 -08:00
Sage Weil
7903aabeb1 mon/MonClient: use thread-safe RNG for picking monitors
Avoid using shared-state rand() when picking monitors.  This way we don't
screw with library users like test_librbd_fsx that rely on srand() and
rand() being deterministic.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-17 16:30:00 -08:00
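
For the mon/MonClient change above, a minimal sketch of the general idea: draw random picks from an engine owned by the caller, so the process-wide srand()/rand() sequence seen by library users is never touched. The helper name pick_index and the use of C++11 std::mt19937 are assumptions for illustration; the commit itself may use a different thread-safe generator.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical helper (not Ceph code): pick an index in [0, n) using an
// engine owned by the caller. It never calls rand() or srand(), so any
// sequence an application seeded with srand() stays deterministic.
std::size_t pick_index(std::mt19937 &rng, std::size_t n) {
  assert(n > 0);  // picking from an empty set is a caller bug
  std::uniform_int_distribution<std::size_t> dist(0, n - 1);
  return dist(rng);
}
```

A caller would keep one engine per client object (or per thread) and seed it once, for example from std::random_device.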
Sage Weil
f9fd0659cd Merge remote-tracking branch 'gh/wip-3431' into next 2012-11-16 21:26:30 -08:00
Josh Durgin
3610754a57 Makefile.am: fix LDADD for test_objectcacher_stress
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 17:14:06 -08:00
Sage Weil
e85c9e7b16 Merge branch 'wip-coverity' into next
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 17:36:34 -08:00
Sage Weil
12eb797fb8 client: fix lock leak in lazio_*() failure paths
CID 743400 (#1 of 1): Missing unlock (LOCK)
At (5): Returning without unlocking "this->client_lock._m".

CID 743399 (#1 of 1): Missing unlock (LOCK)
At (5): Returning without unlocking "this->client_lock._m".

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 17:36:16 -08:00
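
The "Missing unlock" reports above describe an early return that leaves a lock held. One common way to rule that class of bug out is a scoped guard, sketched below. ClientSketch, lazy_op and its error checks are hypothetical, and std::lock_guard stands in for whatever locking primitive the real client code uses.

```cpp
#include <cassert>
#include <cerrno>
#include <mutex>

// Hypothetical stand-in for a client object guarded by a single lock.
struct ClientSketch {
  std::mutex client_lock;

  // Every return path below releases client_lock, because the guard's
  // destructor runs whenever the function scope is left.
  int lazy_op(int fd, long len) {
    std::lock_guard<std::mutex> guard(client_lock);
    if (fd < 0)
      return -EBADF;   // failure path: lock still released
    if (len < 0)
      return -EINVAL;  // failure path: lock still released
    assert(fd >= 0 && len >= 0);  // invariants for the real work that would follow
    return 0;
  }
};
```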
Josh Durgin
78382fecaa Merge branch 'wip-oc-hang' into next
Reviewed-by: Sage Weil <sage.weil@inktank.com>
2012-11-16 16:43:07 -08:00
Sage Weil
be11c317bf upstart: set high open file limits
The default 1024 limit is easily hit on larger clusters.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 16:19:00 -08:00
Sage Weil
25f003ba5f msg/Accepter: only close socket if >= 0
It is possible for rebind() to fail, in which case the OSD will go through
its shutdown procedure and call stop().  This is simpler than trying to
avoid calling stop() when rebind() fails.

Fixes: #3504
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 16:10:56 -08:00
Sage Weil
30373ce872 osd: default journal size to 5GB
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 16:04:13 -08:00
Josh Durgin
a562518b6b librbd: take cache lock when discarding data from cache
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:27:52 -08:00
Josh Durgin
2e862f4d18 ObjectCacher: fix off-by-one error in split
This error left a completion that should have been attached
to the right BufferHead on the left BufferHead, which would
result in the completion never being called unless the buffers
were merged before its original read completed. This would cause
a hang in any higher level waiting for a read to complete.

The existing loop went backwards (using a forward iterator),
but stopped when the iterator reached the beginning of the map,
or when a waiter belonged to the left BufferHead.

If the first list of waiters should have been moved to the right
BufferHead, it was skipped because at that point the iterator
was at the beginning of the map, which was the main condition
of the loop.

Restructure the waiters-moving loop to go forward in the map instead,
so it's harder to make an off-by-one error.

Possibly-fixes: #3286
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
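
To illustrate the forward-iterating approach described in the split fix above, here is a minimal sketch. It assumes waiters are kept in a std::map keyed by offset, with callbacks reduced to integer ids; the Waiters type and move_waiters are hypothetical names, not the ObjectCacher code itself.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <map>

// Hypothetical layout: offset -> ids of callbacks waiting on that offset.
typedef std::map<uint64_t, std::list<int> > Waiters;

// Move every waiter at offset >= split_off from `left` to `right`.
// Walking forward from lower_bound() to end() has no special case for the
// first map entry, which is where a backwards loop can be off by one.
void move_waiters(Waiters &left, Waiters &right, uint64_t split_off) {
  Waiters::iterator it = left.lower_bound(split_off);
  while (it != left.end()) {
    std::list<int> &dst = right[it->first];
    dst.splice(dst.end(), it->second);  // transfer callbacks without copying
    it = left.erase(it);                // erase returns the next iterator
  }
  assert(left.empty() || left.rbegin()->first < split_off);
}
```

Splitting at offset 0 moves the very first entry too, which is the case the commit message says the old loop skipped.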
Josh Durgin
fdadefe331 ObjectCacher: begin at the right place when iterating over BufferHeads
If the desired offset overlaps a BH, data.lower_bound() will return
the element after it, since it's indexed by the start of a range.

The confusingly similarly named data_lower_bound() method fixes this,
and returns the correct starting element.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
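
The lower_bound() pitfall described above can be shown with a small sketch. It assumes extents are stored in a std::map keyed by start offset and that every extent has non-zero length. Extent, ExtentMap and covering_lower_bound are hypothetical names; this is not the actual data_lower_bound() implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical stand-in for a BufferHead: a byte range [start, start + length).
struct Extent {
  uint64_t start;
  uint64_t length;
};
typedef std::map<uint64_t, Extent> ExtentMap;  // keyed by start offset

// Return the first extent that ends after `off`: the extent covering `off`
// if there is one, otherwise the next extent to its right (or end()).
// Plain lower_bound(off) alone would skip an extent that starts before `off`
// but still overlaps it.
ExtentMap::const_iterator covering_lower_bound(const ExtentMap &m, uint64_t off) {
  ExtentMap::const_iterator it = m.lower_bound(off);
  if (it != m.begin() && (it == m.end() || it->first > off)) {
    ExtentMap::const_iterator prev = std::prev(it);
    if (prev->first + prev->second.length > off)
      it = prev;  // the previous extent overlaps `off`; start there
  }
  assert(it == m.end() || it->first + it->second.length > off);
  return it;
}
```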
Josh Durgin
20a0c56da6 ObjectCacher: add debug function to check BufferHead consistency
This isn't called because it's potentially expensive, but calling it
in various places can help future debugging.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
Josh Durgin
5d760b776a ObjectCacher: more debugging for read completions
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
Josh Durgin
c054ad6d48 ObjectCacher: assert lock is held everywhere
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
Josh Durgin
7570e6c894 ObjectCacher: debug read waiters
Now we can tell which ones will be called.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
Josh Durgin
8c961610a6 ObjectCacher: don't needlessly increment iterator
This iterator is now reset on each run through the loop,
so there's no point in incrementing it here.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
Josh Durgin
b948e4c91f ObjectCacher: retry reads when they are incomplete
Skipping these callbacks when there's a racing write or
a gap in the results causes the original reads they represent
to never be completed. If the read falls within the range
of a BufferHead, retry all waiters no matter what.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:27 -08:00
Sage Weil
8da6ddeaec common/ceph_argparse: fix malloc failure check
CID 743418 (#1 of 1): Dereference before null check (REVERSE_INULL)
Null-checking "argv" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 14:19:25 -08:00
Sage Weil
e82ca0d5e0 mon/MonClient: initialize ptr in ctor
CID 743433 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
At (2): Non-static class member "authorize_handler_registry" is not initialized in this constructor nor in any functions that it calls.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 14:18:21 -08:00
Sage Weil
8f1f36d5a2 os/FileStore: fix fd leak in _rmattr
CID 743405 (#2 of 2): Resource leak (RESOURCE_LEAK)
At (16): Handle variable "fd" going out of scope leaks the handle.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 14:15:23 -08:00
Sage Weil
426b58da29 os/FileStore: fix fd leaks in _setattrs
CID 743406 (#3 of 3): Resource leak (RESOURCE_LEAK)
At (26): Handle variable "fd" going out of scope leaks the handle.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 14:14:17 -08:00
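
The two RESOURCE_LEAK fixes above concern a file descriptor that goes out of scope without being closed on some path. A scoped wrapper is one general way to avoid this, sketched below for POSIX systems. ScopedFd and set_attr_sketch are hypothetical names and do not reflect how the FileStore commits fix the leaks.

```cpp
#include <cassert>
#include <cerrno>
#include <unistd.h>

// Hypothetical RAII wrapper: closes the descriptor when it leaves scope.
class ScopedFd {
  int fd_;
public:
  explicit ScopedFd(int fd) : fd_(fd) {}
  ~ScopedFd() {
    if (fd_ >= 0)  // only close a descriptor that was actually opened
      ::close(fd_);
  }
  ScopedFd(const ScopedFd &) = delete;
  ScopedFd &operator=(const ScopedFd &) = delete;
  int get() const { return fd_; }
};

// Hypothetical operation with an early error return. The descriptor is
// closed on both the error path and the success path.
int set_attr_sketch(int raw_fd, bool simulate_error) {
  assert(raw_fd >= 0);
  ScopedFd fd(raw_fd);
  if (simulate_error)
    return -EIO;  // early return: destructor still closes the descriptor
  return 0;
}
```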
Sage Weil
1df38fd73d osdc/ObjectCacher: faux use-after-free
CID 743435 (#1 of 1): Use after free (USE_AFTER_FREE)
At (68): Passing freed pointer "rd" as an argument to function "std::basic_ostream<char, std::char_traits<char> >::operator <<(void const *)".

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 14:11:05 -08:00
Josh Durgin
9a10ebb2ef test: add ObjectCacher stress test that does not use a cluster
Use a fake writeback handler and respond to all requests with -ENOENT.
This tests that all operations will complete, and the cache doesn't
lose waiters or callbacks.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 13:23:26 -08:00
Josh Durgin
fd928b9b71 ObjectCacher: more debugging for BufferHeads
This is useful for checking for lost waiters.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 12:11:01 -08:00
Gary Lowell
8b187bd8ca build: update for boost_thread library.
There is a difference in naming conventions between debian and
rpm based distributions for this library.  In configure.ac we
check first for boost_thread-mt, then if it's not found check
for boost_thread.  A side effect of the AC_CHECK_LIB macro is
to add the library to the $LIBS, so the explicit -llibboost_thread
in the Makefile has been removed.
(cherry picked from commit f0c7bb3630)
2012-11-16 10:51:42 -08:00
Sage Weil
71cfaf1cc5 os/FileStore: only try BTRFS_IOC_SUBVOL_CREATE on btrfs
Only try to create a btrfs subvolume if the fs is btrfs.  Otherwise, just
create a directory.  Then we can error out on *any* ioctl error, and not
rely on the ioctl error code to determine if we failed because we are on
a non-btrfs or a real error.

Fixes: #3052
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2012-11-15 16:50:55 -08:00
Samuel Just
918c58c85e PrioritizedQueue: remove internal lock, not used
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-11-15 16:02:34 -08:00
Samuel Just
b53e06cac8 DispatchQueue: lock DispatchQueue for get_queue_len()
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-11-15 15:59:18 -08:00
Alex Elder
659d4c25b2 run_xfstests.sh: activate more tests that now work
I've gone through the set of xfstests that were previously found to
not work.  Some of those now do work, and with the addition of an
option to pass to "mkfs.xfs" a large number of other tests now
produce expected output as well.

This patch updates the default list of tests to run to reflect
the result of this exercise.  The following 50 additional tests
are now run by default:

    029 074 078 084-087 100 105 117 121 124 126 129-134
    164 165 167 174 181 184 186 187 192 214-216 227 236
    237 241 243 245-249 257-259 261 277 278 280 285 286

Test 127 completed without error, but it took 1-3 hours, so I
kept it out of the list.

Signed-off-by: Alex Elder <elder@inktank.com>
2012-11-15 17:51:34 -06:00
Sage Weil
b40387de23 msg/Pipe: fix leak of Authorizer
Reported-by: Joao Luis <joao.luis@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-15 10:06:07 -08:00