Commit Graph

32950 Commits

Author SHA1 Message Date
Sage Weil
072d3711d6 RWLock: make lockdep id mutable
This allows us to keep the lock/unlock methods const, as per commit
970d53fc0f.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-10 21:36:37 -07:00
Sage Weil
da0d38208b Revert "RWLock: don't assign the lockdep id more than once"
This reverts commit 957ac3cbe3.

It's important to assign these for all operations for cases where
g_lockdep isn't yet true when the constructor runs.  This is true
for the HeartbeatMap rwlock, among other things, as that thread
is created during early startup before lockdep is enabled.  All
of the lockdep hooks assume that they can assign ids on the fly
and not tracking them here breaks things.

Conflicts:

	src/common/RWLock.h
2014-04-10 21:34:51 -07:00
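The revert's reasoning can be sketched as below. This is a minimal illustration, not Ceph's RWLock. `RWLockSketch`, `g_lockdep_sketch`, `g_next_id` and `maybe_register()` are hypothetical names, and `std::shared_mutex` stands in for the underlying pthread rwlock. The id is checked on every lock operation, so a lock constructed before lockdep is enabled (as with the HeartbeatMap rwlock) is still registered once lockdep comes on. Making the id `mutable`, as commit 072d3711d6 does, lets the read-lock methods stay const.

```cpp
#include <cassert>
#include <shared_mutex>

// Hypothetical stand-ins for the global lockdep switch and id allocator.
// Lockdep starts disabled here, as it is during early startup.
bool g_lockdep_sketch = false;
int g_next_id = 0;

class RWLockSketch {
  mutable std::shared_mutex rw;
  mutable int id = -1;  // -1 means "not yet registered with lockdep"

  // Register lazily: a lock built before lockdep was enabled still gets
  // an id the first time it is used afterwards, and only once.
  void maybe_register() const {
    if (g_lockdep_sketch && id < 0)
      id = g_next_id++;
  }

 public:
  void get_read() const { maybe_register(); rw.lock_shared(); }
  void put_read() const { rw.unlock_shared(); }
  void get_write() { maybe_register(); rw.lock(); }
  void put_write() { rw.unlock(); }
  int lockdep_id() const { return id; }
};
```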
Sage Weil
632098f2a6 common_init: remove dup lockdep message
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-10 21:34:03 -07:00
John Wilkins
8c38ec7a7e Merge pull request #1646 from dmick/wip-erasure-doc
doc: Wordsmith the erasure-code doc a bit.
2014-04-10 20:02:56 -07:00
Dan Mick
3c54a49e39 Wordsmith the erasure-code doc a bit
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2014-04-10 19:55:52 -07:00
Yan, Zheng
f6c20730c1 mds: finish table servers recovery after creating newfs
Fixes: #8054
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-11 09:57:29 +08:00
Sage Weil
756e36260d Merge pull request #1643 from ceph/wip-8062
mon/OSDMonitor: ignore boot message from before last up_from

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-04-10 18:25:23 -07:00
Yan, Zheng
3db7486128 mds: issue new caps before starting log entry
Locker::issue_new_caps() calls Locker::eval(), which may dispatch
other requests.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-11 08:39:55 +08:00
David Zafman
07e8ee208e test: Add EC testing to ceph_test_rados_api_aio
Fixes: #7437

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-10 17:22:29 -07:00
David Zafman
69afc59b3e test: Add multiple write test cases to ceph_test_rados_api_aio
Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-10 17:22:29 -07:00
David Zafman
d99f1d9f68 test, librados: aio read *return_value consistency, fix ceph_test_rados_api_aio
test:
  Add set_completion*PP() functions to cast arg to correct class
  Add return_value checks
  Add some reads with buffers larger than object size
  Check buffer length on reads
librados:
  Make sure *return_value() has bytes read in all cases

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-10 17:22:29 -07:00
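The read return-value rule in d99f1d9f68 can be illustrated with a hypothetical `read_object()` helper over an in-memory object; this is an assumption for illustration, not the librados API. The value reported is the number of bytes actually read, which is smaller than the buffer whenever the buffer is larger than the remaining object data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical object read: copies at most len bytes starting at off and
// returns the number of bytes actually copied, which is what a completion's
// return value should carry in every case.
int read_object(const std::string& obj, char* buf, size_t len, size_t off) {
  if (off >= obj.size())
    return 0;  // reading past the end yields no data
  size_t n = std::min(len, obj.size() - off);
  std::memcpy(buf, obj.data() + off, n);
  return static_cast<int>(n);
}
```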
David Zafman
3d290c2fa6 test: Add EC unaligned append write test to ceph_test_rados_api_io
Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-10 17:22:29 -07:00
David Zafman
39bf68c3ce pybind, test: Add python binding for append and add to test
Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-10 17:22:29 -07:00
David Zafman
d211381470 pybind: Check that "key" is a string
Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-10 17:22:29 -07:00
David Zafman
98127202c2 librados, test: Have write, append and write_full return 0 on success
Make write, append and write_full consistently return 0 on success.
This covers the C (rados_*) variants, the C++ ctx variants,
and aio get_return_value() and rados_aio_get_return_value().

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-10 17:22:29 -07:00
Sage Weil
4c99e978a7 mon/OSDMonitor: ignore boot message from before last up_from
It is possible we will have a dup OSDBoot message queued up in the mon
and will process it again after that osd was marked up and then down.  If
that happens, we should ignore this message, not mark the osd back in with
the same address.

Fixes: #8062
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-10 13:34:58 -07:00
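The check in 4c99e978a7 can be pictured as a single epoch comparison. This is a hedged sketch: `ignore_stale_boot()` and its parameters are hypothetical, and it assumes the boot message carries the OSDMap epoch the OSD had when it sent the message.

```cpp
#include <cassert>
#include <cstdint>

using epoch_t = uint32_t;

// boot_epoch: map epoch the OSD had when it sent the boot message.
// up_from: epoch at which the monitor last marked that OSD up.
// A boot sent before the last up transition is a queued duplicate and
// must not mark the OSD up again with the same address.
bool ignore_stale_boot(epoch_t boot_epoch, epoch_t up_from) {
  return boot_epoch < up_from;
}
```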
Sage Weil
28371a2463 Merge pull request #1624 from ceph/wip-6789
mon: Monitor: suicide on start if mon has been removed from monmap

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-10 11:01:43 -07:00
Sage Weil
a8f0953974 osd/ReplicatedPG: adjust obc + snapset_obc locking strategy
Previously we assumed that if we had snapset_obc set, !exists on the head,
and if we got the snapdir lock, we were good to take the head lock too.
This is not the case when:

 - delete queued
   - takes wr lock on both head and snapdir
 - delete commits (but not yet applied)
 - stat
   - tries to take wr lock on head
     - blocks, toggles w=1 state on *head only*
 - copy-from
   - tries to take wr lock on snapdir, succeeds
   - tries to take wr lock on head, fails because w=1
     - fails the assert(got)

The problem is that the read and write paths are taking different locks
and we are expecting them to operate in synchrony.

Fix this by using the same ordering for reads as for writes: if the
snapset_obc is defined, take the read lock on that too, just as we do with
a write.

Fixes: #8046
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-10 10:55:55 -07:00
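The ordering fix in a8f0953974 can be sketched with a simplified, non-blocking lock state. `RWStateSketch` and `get_obc_locks()` are hypothetical and much simpler than the OSD's object-context locking; the point is only that reads and writes both try the snapdir lock before the head lock, and that a failed attempt releases what it took.

```cpp
#include <cassert>

// Hypothetical lock state: many readers or one writer, never blocking.
struct RWStateSketch {
  int readers = 0;
  bool writer = false;

  bool try_get(bool write) {
    if (writer || (write && readers > 0))
      return false;
    if (write)
      writer = true;
    else
      ++readers;
    return true;
  }
  void put(bool write) {
    if (write)
      writer = false;
    else
      --readers;
  }
};

// Same order for reads and writes: snapdir (when snapset_obc is set),
// then head. If head is unavailable, drop snapdir so nothing is left held.
bool get_obc_locks(RWStateSketch& head, RWStateSketch* snapset_obc, bool write) {
  if (snapset_obc && !snapset_obc->try_get(write))
    return false;
  if (!head.try_get(write)) {
    if (snapset_obc)
      snapset_obc->put(write);
    return false;
  }
  return true;
}
```

With this ordering a stat that arrives while a delete holds both write locks fails on snapdir first, so it never leaves state on the head alone.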
Joao Eduardo Luis
86b85947a2 mon: Monitor: suicide on start if mon has been removed from monmap
If the monitor has been marked as having been part of an existing quorum
and is no longer in the monmap, then it is safe to assume the monitor
was removed from the monmap.  In that event, do not allow the monitor
to start, as it will try to find its way into the quorum again (and
someone clearly stated they don't really want them there), unless
'mon force quorum join' is specified.

Fixes: #6789
Backport: dumpling, emperor

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2014-04-10 15:14:19 +01:00
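The startup rule in 86b85947a2 reduces to three conditions. This sketch uses a hypothetical `should_refuse_start()` with boolean inputs; only the 'mon force quorum join' option name comes from the commit message.

```cpp
#include <cassert>

// has_ever_joined: this monitor was part of an existing quorum before.
// in_monmap: it is still listed in the current monmap.
// force_quorum_join: the operator set 'mon force quorum join'.
bool should_refuse_start(bool has_ever_joined, bool in_monmap,
                         bool force_quorum_join) {
  // Joined before but now absent from the monmap implies it was removed.
  return has_ever_joined && !in_monmap && !force_quorum_join;
}
```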
Yan, Zheng
02048dcc30 mds: guarantee message ordering when importing non-auth caps
The current code allows importing non-auth caps while the inode is being
exported. This can break message ordering, because the corresponding cap
import messages are sent after the flush session messages, so they can
arrive at clients after the clients have already received cap import
messages from the inode's new auth MDS.

The quick fix is to ignore MExportCaps when the inode is frozen.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-10 19:46:56 +08:00
Sage Weil
cf69bdbd74 Merge pull request #1639 from ceph/wip-multimds
Wip multimds

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-09 21:19:42 -07:00
Yan, Zheng
ac51fcac6b mds: include truncate_seq/truncate_size in filelock's state
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-10 11:13:30 +08:00
Yan, Zheng
808ba130ef mds: remove wrong assertion for remote frozen authpin
For a cross-authority rename, the MDS first freezes the source inode's
authpin. This happens while the source dentry isn't locked, so by the time
the inode's authpin becomes frozen, the source dentry may have changed and
been linked to a different inode.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-10 11:13:20 +08:00
Sage Weil
860d72770c osdc/Objecter: move mapping into struct, helper
Move the common bits of Op and LingerOp into op_target_t and separate the
actual mapping calculation into calc_target().  This hugely simplifies
recalc_*op_target() by mostly just shuffling all of the same logic into
that helper.

There is one functional change in this patch: recalc_linger_op() is now
aware of the tiering logic that was previously only handled in
recalc_op_target().

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-09 18:02:27 -07:00
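The refactor in 860d72770c can be pictured as one shared target struct and one mapping helper. Everything below is a hypothetical simplification (`OpTargetSketch`, `PoolSketch`, `read_tier`/`write_tier` as plain pool ids); it only shows why moving the tiering logic into a shared `calc_target()` makes linger ops honor the tier overlay too.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical pool info with a cache-tier overlay (-1 means none).
struct PoolSketch {
  int64_t read_tier = -1;
  int64_t write_tier = -1;
};

// Fields shared by Op and LingerOp, in the spirit of op_target_t.
struct OpTargetSketch {
  int64_t base_pool;
  bool is_write;
  int64_t target_pool = -1;  // filled in by calc_target()
};

// One helper computes the mapping for both op kinds.
void calc_target(OpTargetSketch& t, const std::map<int64_t, PoolSketch>& pools) {
  t.target_pool = t.base_pool;
  auto it = pools.find(t.base_pool);
  if (it == pools.end())
    return;
  int64_t tier = t.is_write ? it->second.write_tier : it->second.read_tier;
  if (tier >= 0)
    t.target_pool = tier;  // redirect to the overlay tier
}

struct OpSketch { OpTargetSketch target; };
struct LingerOpSketch { OpTargetSketch target; };
```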
Gregory Farnum
5df98f47b9 Merge pull request #1637 from ceph/wip-8042
mon: fix election required_features checks

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-04-09 17:21:57 -07:00
Sage Weil
71d97f998a Merge pull request #1636 from ceph/wip-6480
fix auth races that may have led to qemu crashes

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-09 16:25:24 -07:00
Sage Weil
18642ed351 mon: tell peers missing features during probe
Use a new probe op to inform mons that they are missing features during
the earliest probe phase.  This prevents them from getting as far as
the sync entirely if they are too old.

We still need to refuse to speak to them if they try to call an election,
which they could do based on their replies from other peers.

Note that old clients will assert on getting a message type string they
don't understand, so we need to be careful not to send the probe reply
to older clients.  The feature bit we use is not precise in that it does
not cover recent dev releases, but it does work for dumpling and emperor.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-09 16:03:05 -07:00
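The two checks described in 18642ed351 are plain feature-mask tests. The bit values and names below are hypothetical placeholders, not Ceph's feature constants: one function computes which required bits a peer lacks, the other gates the new probe reply on a bit that older peers do not advertise.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits for illustration only.
constexpr uint64_t FEATURE_A = 1ull << 0;
constexpr uint64_t FEATURE_B = 1ull << 1;
constexpr uint64_t FEATURE_PROBE_OP = 1ull << 2;  // peer understands the new op

// Required bits the peer does not advertise.
uint64_t missing_features(uint64_t required, uint64_t peer) {
  return required & ~peer;
}

// Older peers would assert on an unknown op, so only send the
// missing-features reply to peers that can decode it.
bool can_send_missing_features_reply(uint64_t peer) {
  return (peer & FEATURE_PROBE_OP) != 0;
}
```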
Sage Weil
39ca440bfd mon: move required_features back into Monitor
This is simpler and cleaner.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-09 16:01:14 -07:00
Sage Weil
c8039ab857 mon: ignore sync clients without required_features
If we let them sync data they don't understand they will get confused
and crash.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-09 14:40:44 -07:00
Josh Durgin
50ed65fba3 auth: remove unused get_global_id() method
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:36 -07:00
Josh Durgin
b297689abf auth: make AuthClientHandler::validate_ticket() protected
It's just used internally. Make it private in the subclasses since
there's just one level of inheritance.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:36 -07:00
Josh Durgin
3ccef66276 auth: AuthClientHandler const cleanup
get_protocol(), build_request(), build_rotating_request(), and
build_authorizer() can all be declared const now.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:36 -07:00
Josh Durgin
9af10b2c9a auth: CephxProtocol const cleanup
need_key() and build_authorizer() can be const.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:36 -07:00
Josh Durgin
75948357ce utime: declare is_zero(), ceph_timespec(), and sleep() as const
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:36 -07:00
Josh Durgin
3119022dd4 auth: separate writes of build_request() into prepare_build_request()
validate_tickets() updates internal state, as does
tickets.get_handler(). Move them into a new method called before
build_request() so build_request() can be declared const.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:35 -07:00
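The split in 3119022dd4 follows a common const-correctness pattern: move state updates into a non-const preparation step so the builder itself can be const. `HandlerSketch` and its fields are hypothetical and stand in for the ticket validation the commit describes.

```cpp
#include <cassert>
#include <string>

class HandlerSketch {
  int ticket_version = 0;
  bool need_refresh = true;

 public:
  // Non-const: plays the role of validate_tickets()/get_handler(),
  // which update internal state.
  void prepare_build_request() {
    if (need_refresh) {
      ++ticket_version;
      need_refresh = false;
    }
  }
  // Const: only reads state, so it can run under a read lock.
  std::string build_request() const {
    return "request-v" + std::to_string(ticket_version);
  }
};
```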
Josh Durgin
970d53fc0f RWLock: make read locking methods const
This allows methods using RWLock for reading to be declared const.
There might be cases where we'd want to take a write lock in a const
method, but right now that's unnecessary, and I'd rather get a compile
error.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:35 -07:00
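A minimal sketch of what 970d53fc0f enables, assuming a hypothetical wrapper over `std::shared_mutex` rather than Ceph's RWLock: the read-lock methods are const (the underlying lock is `mutable`), the write-lock methods are not, so a const accessor can take a read lock while an attempted write lock inside a const method fails to compile.

```cpp
#include <cassert>
#include <shared_mutex>

class RWLockConstSketch {
  mutable std::shared_mutex rw;

 public:
  void get_read() const { rw.lock_shared(); }
  void put_read() const { rw.unlock_shared(); }
  void get_write() { rw.lock(); }   // non-const on purpose
  void put_write() { rw.unlock(); }
};

class CounterSketch {
  RWLockConstSketch lock;
  int value = 0;

 public:
  int get() const {  // allowed because the read-lock methods are const
    lock.get_read();
    int v = value;
    lock.put_read();
    return v;
  }
  void set(int v) {
    lock.get_write();
    value = v;
    lock.put_write();
  }
};
```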
Josh Durgin
957ac3cbe3 RWLock: don't assign the lockdep id more than once
This never does anything since lockdep_register() assigns an id >= 0
in the RWLock constructor. This also prevents methods from being
declared const.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:35 -07:00
Josh Durgin
4d3d89bf24 auth: remove unused tick() method
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:35 -07:00
Josh Durgin
2cc76bcd12 auth: add rwlock to AuthClientHandler to prevent races
For cephx, build_authorizer reads a bunch of state (especially the
current session_key) which can be updated by the MonClient. With no
locks held, Pipe::connect() calls SimpleMessenger::get_authorizer()
which ends up calling RadosClient::get_authorizer() and then
AuthClientHandler::build_authorizer(). This unsafe usage can lead to
crashes like:

Program terminated with signal 11, Segmentation fault.
0x00007fa0d2ddb7cb in ceph::buffer::ptr::release (this=0x7f987a5e3070) at common/buffer.cc:370
370 common/buffer.cc: No such file or directory.
in common/buffer.cc
(gdb) bt
0x00007fa0d2ddb7cb in ceph::buffer::ptr::release (this=0x7f987a5e3070) at common/buffer.cc:370
0x00007fa0d2ddec00 in ~ptr (this=0x7f989c03b830) at ./include/buffer.h:171
ceph::buffer::list::rebuild (this=0x7f989c03b830) at common/buffer.cc:817
0x00007fa0d2ddecb9 in ceph::buffer::list::c_str (this=0x7f989c03b830) at common/buffer.cc:1045
0x00007fa0d2ea4dc2 in Pipe::connect (this=0x7fa0c4307340) at msg/Pipe.cc:907
0x00007fa0d2ea7d73 in Pipe::writer (this=0x7fa0c4307340) at msg/Pipe.cc:1518
0x00007fa0d2eb44dd in Pipe::Writer::entry (this=<value optimized out>) at msg/Pipe.h:59
0x00007fa0e0f5f9d1 in start_thread (arg=0x7f987a5e4700) at pthread_create.c:301
0x00007fa0de560b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

and

Error in `qemu-system-x86_64': invalid fastbin entry (free): 0x00007ff12887ff20
*** ======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x80a46)[0x7ff3dea1fa46]
/usr/lib/librados.so.2(+0x29eb03)[0x7ff3e3d43b03]
/usr/lib/librados.so.2(_ZNK9CryptoKey7encryptEP11CephContextRKN4ceph6buffer4listERS4_RSs+0x71)[0x7ff3e3d42661]
/usr/lib/librados.so.2(_Z21encode_encrypt_enc_blIN4ceph6buffer4listEEvP11CephContextRKT_RK9CryptoKeyRS2_RSs+0xfe)[0x7ff3e3d417de]
/usr/lib/librados.so.2(_Z14encode_encryptIN4ceph6buffer4listEEiP11CephContextRKT_RK9CryptoKeyRS2_RSs+0xa2)[0x7ff3e3d41912]
/usr/lib/librados.so.2(_ZN19CephxSessionHandler12sign_messageEP7Message+0x242)[0x7ff3e3d40de2]
/usr/lib/librados.so.2(_ZN4Pipe6writerEv+0x92b)[0x7ff3e3e61b2b]
/usr/lib/librados.so.2(_ZN4Pipe6Writer5entryEv+0xd)[0x7ff3e3e6c7fd]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7f8e)[0x7ff3ded6ff8e]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7ff3dea99a0d]

Fix this by adding an rwlock to AuthClientHandler. A simpler fix would
be to move RadosClient::get_authorizer() into the MonClient, under
the MonClient lock, but this would not catch other uses of the
authorizer code, e.g. verify_authorizer(), and it would serialize
independent connection attempts.

This mainly matters for cephx, but the none and unknown protocols can
have their global_id reset as well.

Partially-fixes: #6480
Backport: dumpling, emperor
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:29:23 -07:00
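The locking scheme in 2cc76bcd12 can be sketched as a reader/writer split. `AuthHandlerSketch` is hypothetical and `std::shared_mutex` stands in for Ceph's RWLock: the MonClient-like path takes the exclusive lock to replace the session key, while connection threads take the shared lock to build authorizers, so independent connection attempts are not serialized against each other.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>

class AuthHandlerSketch {
  mutable std::shared_mutex lock;
  std::string session_key = "key-0";

 public:
  // Writer side: exclusive lock while swapping state.
  void set_session_key(const std::string& k) {
    std::unique_lock<std::shared_mutex> l(lock);
    session_key = k;
  }
  // Reader side: shared lock, so concurrent readers proceed in parallel
  // but never observe a key that is being replaced.
  std::string build_authorizer() const {
    std::shared_lock<std::shared_mutex> l(lock);
    return "auth(" + session_key + ")";
  }
};
```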
David Zafman
2e8035fabc osd: Fix appending write to return any error back to caller
Also, correct double bumping of num_writes

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-09 11:35:38 -07:00
David Zafman
3371a25115 test: Fix segfault in ceph_test_rados
Fixes: #8049

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-09 11:35:12 -07:00
David Zafman
edd542e420 tools: Improve ceph_scratchtoolpp
Minor output improvements
Remove clone_range code that was asserting

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-09 11:35:05 -07:00
Sage Weil
34d69cdcfa mon: refresh elector required_features when they change
Currently we only refresh required_features on Elector::start().  This
does not prevent an old peer from calling an election (even though they
won't succeed in joining the resulting quorum).

Fix this by updating the elector's features when they change.  This way we
don't allow a useless election cycle just to trigger that update in
start().

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-09 11:13:31 -07:00
Sage Weil
b3b502f132 mon/Elector: ignore ACK from peers without required features
If an old peer gets a PROPOSE from us, we need to be sure to ignore their
ACK.  Ignoring their PROPOSEs isn't sufficient to keep them out of a
quorum.

Fixes: #8042
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-09 11:09:14 -07:00
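The election fixes in 34d69cdcfa and b3b502f132 can be sketched together with a hypothetical `ElectorSketch`: the required feature mask is kept current, and an ACK only counts toward the quorum if the peer advertises every required bit.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

struct ElectorSketch {
  uint64_t required_features = 0;  // updated whenever requirements change
  std::set<int> acked_me;

  void handle_ack(int from, uint64_t peer_features) {
    if ((peer_features & required_features) != required_features)
      return;  // old peer: ignoring its ACK keeps it out of the quorum
    acked_me.insert(from);
  }
};
```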
Samuel Just
5a567c479f Merge pull request #1626 from ceph/wip-8031
osd: improve misdirected op checks

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-09 10:37:26 -07:00
Samuel Just
5b16650b42 Merge pull request #1627 from ceph/wip-8001
osd/PG: set CREATING pg state bit until we peer for the first time

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-09 10:34:54 -07:00
Samuel Just
2a9f5fd5ef Merge pull request #1631 from ceph/wip-8045
osd: fix check_osdmap_features deadlock

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-09 10:34:07 -07:00
Sage Weil
f1a8934060 Merge pull request #1632 from ceph/wip-5469
librbd: fix zero length request handling

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-09 08:14:28 -07:00
Alfredo Deza
dc4bbfa762 Merge pull request #1634 from ceph/wip-8028
rpm: add redhat-lsb dependency

Reviewed-by: Alfredo Deza <alfredo.deza@inktank.com>
2014-04-09 10:12:11 -04:00
Sage Weil
f1c6b65b47 ceph.spec.in: require redhat-lsb-core
We need this for /lib/lsb/init-functions.

Fixes: #8028
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-09 07:05:36 -07:00