Commit Graph

22596 Commits

Author SHA1 Message Date
Dan Mick
7021f1a27b test_librbd_fsx: consume saved-image files as test runs
Avoid consuming lots of disk space by holding only as many file
copies as needed (compare the n-2 file as we make clone n).
2012-11-21 08:34:47 -08:00
Sage Weil
b35e37fb73 osdc/Striper: fix handling for sparse reads in add_partial_sparse_result()
If bl_map begins *after* the first item in buffer_extents, we want to
skip only the first buffer extent before doing 'continue' to loop to the
next one.

This fixes a crash caused by underflow with a pattern like:

2012-11-20 13:54:30.347861 7f9404ed6700 10 striper add_partial_sparse_result(0x1efa088) 192 covering {12288=192} (offset 2906) to [0,5286,38054,4288]
2012-11-20 13:54:30.347863 7f9404ed6700 20 striper   t 0~5286 bl has 192 off 2906
2012-11-20 13:54:30.347866 7f9404ed6700 20 striper   s gap 9382, skipping
2012-11-20 13:54:30.347867 7f9404ed6700 20 striper   s has 192, copying
2012-11-20 13:54:30.347872 7f9404ed6700 20 striper   t 9574~18446744073709547328 bl has 0 off 12480
2012-11-20 13:54:30.347874 7f9404ed6700 20 striper   s at end
2012-11-20 13:54:30.347876 7f9404ed6700 20 striper   t 38054~4288 bl has 0 off 12480
2012-11-20 13:54:30.347877 7f9404ed6700 20 striper   s at end

Dan reproduced this with

 ./test_librbd_fsx -d -W -R -p 10 -t 1 -S 4 -N 300 rbd fsx

(although I was unable to do so).

Re-fixes #3428.

Reported-and-tested-by: Dan Mick <dan.mick@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-20 16:41:28 -08:00
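Note on the log above: the length in `t 9574~18446744073709547328` equals 2^64 - 4288, which is what a 64-bit unsigned length of 5286 becomes if it is reduced by 9574 (the 9382-byte gap plus the 192 bytes copied). That reading of the log is an interpretation, not something the commit message states. The sketch below only illustrates the wraparound arithmetic; both helper names are hypothetical and this is not the Striper fix itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: plain unsigned subtraction. When `consumed` exceeds
// `len`, the result wraps modulo 2^64 instead of going negative.
std::uint64_t remaining_after(std::uint64_t len, std::uint64_t consumed) {
  return len - consumed;
}

// Hypothetical guarded variant: clamp at zero rather than wrapping.
std::uint64_t remaining_after_clamped(std::uint64_t len,
                                      std::uint64_t consumed) {
  return consumed >= len ? 0 : len - consumed;
}
```

With `len = 5286` and `consumed = 9574`, the unguarded helper reproduces the value seen in the log, while the clamped variant yields 0.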
Noah Watkins
436baa0b47 java: add Java exception for ENOTDIR
This specialization is useful in the Hadoop CephFS shim. An lstat may
return ENOENT or ENOTDIR or some other IOException without a
specialization. In Hadoop we convert ENOTDIR into ENOENT.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-11-20 13:55:32 -08:00
Alex Elder
700b5c0029 qa/run_xfstests.sh: drop tests 174 and 181
These tests are showing intermittent failures so we'll drop them
from the default list for the time being.

Signed-off-by: Alex Elder <elder@inktank.com>
2012-11-20 15:53:55 -06:00
John Wilkins
57c8116c44 doc: filename change to fix a link.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-11-20 13:12:35 -08:00
John Wilkins
15f77131ec doc: fixed links that broke due to new IA.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-11-20 11:21:58 -08:00
John Wilkins
394768bcfe doc: Removed "deprecated" from toctree. Confused some users.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-11-19 16:44:45 -08:00
John Wilkins
739bca159e doc: Removing old/unused images.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-11-19 16:34:04 -08:00
Sage Weil
de12ae9803 Merge branch 'next'
2012-11-19 08:04:19 -08:00
Sage Weil
f8f452f324 Merge remote-tracking branch 'gh/wip-mon-parsing' into next
2012-11-18 21:20:36 -08:00
Sage Weil
34e5f9bbfc Merge branch 'wip-mon-leaks-fix' into next
2012-11-18 14:37:22 -08:00
Sage Weil
288db95aa9 mon: shutdown async signal handler sooner
Before the mon, and lockdep, in particular.

#0  __pthread_mutex_lock (mutex=0x30) at pthread_mutex_lock.c:50
#1  0x0000000000816092 in ceph::log::Log::submit_entry (this=0x0, e=0x2f4a270) at log/Log.cc:138
#2  0x00000000007ee0f8 in handle_fatal_signal (signum=11) at global/signal_handler.cc:100
#3  <signal handler called>
#4  0x00000000008e1300 in lockdep_will_lock (name=0x959aa7 "SignalHandler::lock", id=17) at common/lockdep.cc:163
#5  0x00000000008867fc in Mutex::_will_lock (this=0x2f20428) at ./common/Mutex.h:56
#6  0x0000000000886605 in Mutex::Lock (this=0x2f20428, no_lockdep=false) at common/Mutex.cc:81
#7  0x00000000007eeb95 in SignalHandler::entry (this=0x2f20300) at global/signal_handler.cc:198
#8  0x00000000008b0bd1 in Thread::_entry_func (arg=0x2f20300) at common/Thread.cc:43
#9  0x00007f36fefd6b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#10 0x00007f36fd80b6dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()

#0  0x00007f36fefd7e75 in pthread_join (threadid=139874129766144, thread_return=0x0) at pthread_join.c:89
#1  0x00000000008b11ec in Thread::join (this=0x2f20300, prval=0x0) at common/Thread.cc:130
#2  0x00000000007eeae7 in SignalHandler::shutdown (this=0x2f20300) at global/signal_handler.cc:186
#3  0x00000000007ee9cf in SignalHandler::~SignalHandler (this=0x2f20300, __in_chrg=<optimized out>) at global/signal_handler.cc:175
#4  0x00000000007eea58 in SignalHandler::~SignalHandler (this=0x2f20300, __in_chrg=<optimized out>) at global/signal_handler.cc:176
#5  0x00000000007ee643 in shutdown_async_signal_handler () at global/signal_handler.cc:324
#6  0x00000000006de9d2 in main (argc=7, argv=0x7fffbfb8a1e8) at ceph_mon.cc:439

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:34:35 -08:00
Sage Weil
45c652d772 mon/AuthMonitor: refactor assign_global_id
Move the failure logic into the caller, where it is easier to do something
about it and return the right value to the caller.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:01 -08:00
Sage Weil
92d6b8e636 mon/AuthMonitor: reorder session->put()
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:01 -08:00
Sage Weil
82042adfe0 msg/Pipe: remove useless reader_joining
We set it but do not read it.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:01 -08:00
Sage Weil
c07c93e01d msg/Pipe: join previous reader threads
We may stop and then restart the reader thread.  Join previous threads
before we create new ones.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
c4caf871aa msg/DispatchQueue: fix message leak from discard_queue()
We need to drop the Message ref() here; the msgr owns one ref
independent of those from the intrusive_ptr's in the queue itself.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
183953e14b msg/SimpleMessenger: use put() on local_connection
This aids leak debugging; not much else.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
3e2eb3a16b mon: clean up Subscription xlists
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
005967d256 mon: drop con->session reference in remove_session()
This captures all callers.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
6d3afce40f mon: sessions get cleaned up before dtor
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Sage Weil
e0e9a2dab7 msg/Pipe: don't leak session_security
Make sure we free old instances of session_security before we reset the
pointer.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:29:00 -08:00
Joao Eduardo Luis
d005732553 mon: Monitor: make MSG_MON_PAXOS case a bit more consistent
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-11-18 08:28:59 -08:00
Joao Eduardo Luis
bbe2e1ad02 mon: Paxos{,Service}: finish contexts and put messages on shutdown
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-11-18 08:28:59 -08:00
Joao Eduardo Luis
9e3ceca055 mon: Monitor: finish contexts on shutdown
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-11-18 08:28:59 -08:00
Joao Eduardo Luis
900a0fa2d0 mon: Monitor: drop election messages if entity doesn't have enough caps
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-11-18 08:28:59 -08:00
Sage Weil
988f92a7fa mon: remove all sessions on shutdown
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-18 08:28:59 -08:00
Joao Eduardo Luis
5cf6c7e9be ceph_mon: cleanup on shutdown
Properly clean up the throttlers, 'g_ceph_context' and the
async_signal_handler.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-11-18 07:42:04 -08:00
Chen Baozi
68491afceb rgw: add -lresolv flags to Makefile.am
radosgw depends on libresolv since commit 951c6be, so we need to
add the -lresolv flag, or it cannot link against the right library.

Signed-off-by: Chen Baozi <baozich@gmail.com>
2012-11-17 23:21:08 -08:00
Sage Weil
7903aabeb1 mon/MonClient: use thread-safe RNG for picking monitors
Avoid using shared-state rand() when picking monitors.  This way we don't
screw with library users like test_librbd_fsx that rely on srand() and
rand() being deterministic.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-17 16:30:00 -08:00
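A minimal sketch of the idea in the commit above, assuming a C++11 `<random>` engine owned by the caller; the commit does not say which mechanism MonClient actually uses, and the `MonitorPicker` class is hypothetical. Because the engine keeps its own state, picking a monitor does not disturb the process-wide `rand()` sequence that a library user may have seeded with `srand()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <random>

// Hypothetical picker that owns a private engine. Each thread (or each
// client object) is assumed to hold its own instance, so no state is shared.
class MonitorPicker {
 public:
  explicit MonitorPicker(unsigned seed) : engine_(seed) {}

  // Returns an index in [0, num_monitors). Requires num_monitors > 0.
  std::size_t pick(std::size_t num_monitors) {
    std::uniform_int_distribution<std::size_t> dist(0, num_monitors - 1);
    return dist(engine_);
  }

 private:
  std::mt19937 engine_;  // private state, independent of rand()/srand()
};
```

A caller that relies on a deterministic `srand()`/`rand()` sequence sees the same sequence whether or not `pick()` is called in between.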
Sage Weil
f9fd0659cd Merge remote-tracking branch 'gh/wip-3431' into next
2012-11-16 21:26:30 -08:00
Sage Weil
07c831acd5 upstart: fix limit lines
Two arguments.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 17:59:08 -08:00
Sage Weil
b4a769df32 upstart: add ceph-osd-all-starter.conf
Starter helper will start all osds that appear in /var/lib/ceph/osd/*,
as we do with the mons and mdss.

This will only proceed if the 'ready' file is there, which is currently
only touched by ceph-disk-activate.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 17:28:14 -08:00
Sage Weil
ff0a44bb81 upstart: make ceph-osd-all, ceph jobs
This will let you start/stop all daemons.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 17:28:14 -08:00
Josh Durgin
48295a188f Merge branch 'next'
2012-11-16 17:14:28 -08:00
Josh Durgin
3610754a57 Makefile.am: fix LDADD for test_objectcacher_stress
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 17:14:06 -08:00
Sage Weil
e85c9e7b16 Merge branch 'wip-coverity' into next
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 17:36:34 -08:00
Sage Weil
12eb797fb8 client: fix lock leak in lazio_*() failure paths
CID 743400 (#1 of 1): Missing unlock (LOCK)
At (5): Returning without unlocking "this->client_lock._m".

CID 743399 (#1 of 1): Missing unlock (LOCK)
At (5): Returning without unlocking "this->client_lock._m".

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 17:36:16 -08:00
Josh Durgin
5909997243 Merge branch 'next'
2012-11-16 16:44:41 -08:00
Josh Durgin
78382fecaa Merge branch 'wip-oc-hang' into next
Reviewed-by: Sage Weil <sage.weil@inktank.com>
2012-11-16 16:43:07 -08:00
Sage Weil
be11c317bf upstart: set high open file limits
The default 1024 limit is easily hit on larger clusters.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 16:19:00 -08:00
Sage Weil
25f003ba5f msg/Accepter: only close socket if >= 0
It is possible for rebind() to fail, in which case the OSD will go through
its shutdown procedure and call stop().  This is simpler than trying to
avoid calling stop() when rebind() fails.

Fixes: #3504
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 16:10:56 -08:00
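The guard described in the commit above can be sketched as follows, assuming a POSIX system. The `Listener` struct and its `listen_sd` member are hypothetical stand-ins, not the Accepter's real interface; the point is only that `stop()` stays safe to call when no socket was ever opened, for example after a failed rebind.

```cpp
#include <cassert>
#include <unistd.h>

// Hypothetical listener holding one descriptor; -1 means "no socket".
struct Listener {
  int listen_sd = -1;

  // Closes the descriptor only if it is valid. Returns true if a
  // descriptor was actually closed, false if there was nothing to do.
  bool stop() {
    if (listen_sd < 0) {
      return false;
    }
    ::close(listen_sd);
    listen_sd = -1;  // make repeated stop() calls harmless
    return true;
  }
};
```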
Sage Weil
30373ce872 osd: default journal size to 5GB
Signed-off-by: Sage Weil <sage@inktank.com>
2012-11-16 16:04:13 -08:00
Josh Durgin
a562518b6b librbd: take cache lock when discarding data from cache
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:27:52 -08:00
Josh Durgin
2e862f4d18 ObjectCacher: fix off-by-one error in split
This error left a completion that should have been attached
to the right BufferHead on the left BufferHead, which would
result in the completion never being called unless the buffers
were merged before its original read completed. This would cause
a hang in any higher level waiting for a read to complete.

The existing loop went backwards (using a forward iterator),
but stopped when the iterator reached the beginning of the map,
or when a waiter belonged to the left BufferHead.

If the first list of waiters should have been moved to the right
BufferHead, it was skipped because at that point the iterator
was at the beginning of the map, which was the main condition
of the loop.

Restructure the waiters-moving loop to go forward in the map instead,
so it's harder to make an off-by-one error.

Possibly-fixes: #3286
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
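A rough sketch of the forward-iterating approach described in the commit above, under simplifying assumptions: waiters are modelled as integer ids in a `std::map` keyed by offset, and `move_waiters_right` is a hypothetical name. This is not the ObjectCacher code. Starting from `lower_bound(split_off)` and walking forward treats the first map entry like any other, which is the case the backward loop skipped.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical model: offset -> ids of the waiters queued at that offset.
using Waiters = std::map<std::uint64_t, std::vector<int>>;

// Moves every waiter list at an offset >= split_off from `left` to `right`.
void move_waiters_right(Waiters& left, Waiters& right,
                        std::uint64_t split_off) {
  auto first = left.lower_bound(split_off);  // first entry to move
  for (auto it = first; it != left.end(); ++it) {
    right[it->first] = std::move(it->second);
  }
  left.erase(first, left.end());  // drop the moved entries in one step
}
```

If the split offset is at or before the first key, `lower_bound` returns `begin()` and the whole map moves, with no special case needed.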
Josh Durgin
fdadefe331 ObjectCacher: begin at the right place when iterating over BufferHeads
If the desired offset overlaps a BH, data.lower_bound() will return
the element after it, since it's indexed by the start of a range.

The confusingly similarly named data_lower_bound() method fixes this,
and returns the correct starting element.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
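The `lower_bound()` behaviour described in the commit above can be illustrated with a plain `std::map` keyed by range start. This is a sketch under assumptions: `RangeMap` and `range_lower_bound` are hypothetical stand-ins (start offset mapped to length), not the ObjectCacher's `data_lower_bound()` itself. The helper steps back one element when the preceding range still covers the requested offset.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical range index: key = start offset, value = length.
using RangeMap = std::map<std::uint64_t, std::uint64_t>;

// Returns the first range that ends after `offset`: either the range
// containing `offset`, or the next range starting at or after it.
RangeMap::const_iterator range_lower_bound(const RangeMap& ranges,
                                           std::uint64_t offset) {
  auto it = ranges.lower_bound(offset);  // first key >= offset
  if (it != ranges.begin() && (it == ranges.end() || it->first > offset)) {
    auto prev = std::prev(it);
    if (prev->first + prev->second > offset) {
      return prev;  // the previous range overlaps the offset
    }
  }
  return it;
}
```

For ranges starting at 0, 100 and 200 with lengths 100, 50 and 10, a plain `lower_bound(120)` lands on the range at 200, while the helper returns the range at 100 that actually contains offset 120.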
Josh Durgin
20a0c56da6 ObjectCacher: add debug function to check BufferHead consistency
This isn't called because it's potentially expensive, but calling it
in various places can help future debugging.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
Josh Durgin
5d760b776a ObjectCacher: more debugging for read completions
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
Josh Durgin
c054ad6d48 ObjectCacher: assert lock is held everywhere
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00
Josh Durgin
7570e6c894 ObjectCacher: debug read waiters
Now we can tell which ones will be called.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-11-16 15:22:38 -08:00