32869 Commits

Author SHA1 Message Date
John Wilkins
6650c0e839 doc: Added new docs to index.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2014-04-14 09:18:26 -07:00
John Wilkins
1310af2336 doc: Reworked the simple configuration guide to be more generic.
Changes include removing keystone and putting it into a separate document,
removing user config and putting it into an admin guide, and creating
separate config examples for CentOS/RHEL and Debian/Ubuntu. Needs
clarification on chown/chmod.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2014-04-14 09:18:07 -07:00
John Wilkins
6853d21a50 doc: New admin guide for Ceph Object Gateway. Needs some clarification (todo).
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2014-04-14 09:16:00 -07:00
John Wilkins
e02b84589e doc: Admin API usage for quotas. Needs additional clarification on syntax.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2014-04-14 09:15:20 -07:00
Sage Weil
cf69bdbd74 Merge pull request #1639 from ceph/wip-multimds
Wip multimds

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-09 21:19:42 -07:00
Yan, Zheng
ac51fcac6b mds: include truncate_seq/truncate_size in filelock's state
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-10 11:13:30 +08:00
Yan, Zheng
808ba130ef mds: remove wrong assertion for remote frozen authpin
For a cross-authority rename, the MDS first freezes the source inode's
authpin. This happens while the source dentry isn't locked, so by the
time the inode's authpin becomes frozen, the source dentry may have
changed and be linked to a different inode.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-10 11:13:20 +08:00
Gregory Farnum
5df98f47b9 Merge pull request #1637 from ceph/wip-8042
mon: fix election required_features checks

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-04-09 17:21:57 -07:00
Sage Weil
71d97f998a Merge pull request #1636 from ceph/wip-6480
fix auth races that may have led to qemu crashes

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-09 16:25:24 -07:00
Sage Weil
18642ed351 mon: tell peers missing features during probe
Use a new probe op to inform mons that they are missing features during
the earliest probe phase.  This prevents them from getting as far as
the sync entirely if they are too old.

We still need to refuse to speak to them if they try to call an election,
which they could do based on their replies from other peers.

Note that old clients will assert on getting a message type string they
don't understand, so we need to be careful not to send the probe reply
to older clients.  The feature bit we use is not precise in that it does
not cover recent dev releases, but it does work for dumpling and emperor.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-09 16:03:05 -07:00
Sage Weil
39ca440bfd mon: move required_features back into Monitor
This is simpler and cleaner.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-09 16:01:14 -07:00
Sage Weil
c8039ab857 mon: ignore sync clients without required_features
If we let them sync data they don't understand they will get confused
and crash.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-09 14:40:44 -07:00
Josh Durgin
50ed65fba3 auth: remove unused get_global_id() method
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:36 -07:00
Josh Durgin
b297689abf auth: make AuthClientHandler::validate_ticket() protected
It's just used internally. Make it private in the subclasses since
there's just one level of inheritance.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:36 -07:00
Josh Durgin
3ccef66276 auth: AuthClientHandler const cleanup
get_protocol(), build_request(), build_rotating_request(), and
build_authorizer() can all be declared const now.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:36 -07:00
Josh Durgin
9af10b2c9a auth: CephxProtocol const cleanup
need_key() and build_authorizer() can be const.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:36 -07:00
Josh Durgin
75948357ce utime: declare is_zero(), ceph_timespec(), and sleep() as const
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:36 -07:00
Josh Durgin
3119022dd4 auth: separate writes of build_request() into prepare_build_request()
validate_tickets() updates internal state, as does
tickets.get_handler(). Move them into a new method called before
build_request() so build_request() can be declared const.
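
As a rough illustration of the split described above, the sketch below moves the state-changing work into a separate step so the request builder can be const. TicketClient and its members are hypothetical stand-ins, not the actual AuthClientHandler code.

```cpp
#include <string>

// Hypothetical stand-in for an auth client handler; not Ceph's class.
class TicketClient {
public:
  // Mutating step: refresh internal state before a request is built.
  void prepare_build_request() {
    ++validations_;
    // Assumption for the sketch: only the first validation finds no ticket.
    need_ticket_ = (validations_ == 1);
  }

  // Read-only step: can be const because it only reads members.
  std::string build_request() const {
    return need_ticket_ ? "GET_TICKET" : "USE_TICKET";
  }

  int validations() const { return validations_; }

private:
  int validations_ = 0;
  bool need_ticket_ = false;
};
```

Callers would invoke prepare_build_request() first and then call build_request() through a const reference.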

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:35 -07:00
Josh Durgin
970d53fc0f RWLock: make read locking methods const
This allows methods using RWLock for reading to be declared const.
There might be cases where we'd want to take a write lock in a const
method, but right now that's unnecessary, and I'd rather get a compile
error.
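
A minimal sketch of the idea, assuming a wrapper around std::shared_mutex rather than Ceph's own RWLock: the underlying lock is declared mutable so read-side methods can be const, while write-side methods stay non-const. RWLockSketch and Counter are hypothetical names.

```cpp
#include <shared_mutex>

// Hypothetical read/write lock wrapper; not Ceph's RWLock.
class RWLockSketch {
public:
  // Read-side methods are const: they change the lock, not the
  // logical state of whatever object owns the lock.
  void get_read() const { lock_.lock_shared(); }
  void put_read() const { lock_.unlock_shared(); }

  // Write-side methods are non-const, so taking a write lock from a
  // const method of the owner fails to compile.
  void get_write() { lock_.lock(); }
  void put_write() { lock_.unlock(); }

private:
  mutable std::shared_mutex lock_;
};

class Counter {
public:
  void set(int v) {
    lock_.get_write();
    value_ = v;
    lock_.put_write();
  }

  // Declared const; only needs the read side of the lock.
  int get() const {
    lock_.get_read();
    int v = value_;
    lock_.put_read();
    return v;
  }

private:
  RWLockSketch lock_;
  int value_ = 0;
};
```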

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:35 -07:00
Josh Durgin
957ac3cbe3 RWLock: don't assign the lockdep id more than once
This never does anything since lockdep_register() assigns an id >= 0
in the RWLock constructor. This also prevents methods from being
declared const.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:35 -07:00
Josh Durgin
4d3d89bf24 auth: remove unused tick() method
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:31:35 -07:00
Josh Durgin
2cc76bcd12 auth: add rwlock to AuthClientHandler to prevent races
For cephx, build_authorizer reads a bunch of state (especially the
current session_key) which can be updated by the MonClient. With no
locks held, Pipe::connect() calls SimpleMessenger::get_authorizer()
which ends up calling RadosClient::get_authorizer() and then
AuthClientHandler::build_authorizer(). This unsafe usage can lead to
crashes like:

Program terminated with signal 11, Segmentation fault.
0x00007fa0d2ddb7cb in ceph::buffer::ptr::release (this=0x7f987a5e3070) at common/buffer.cc:370
370 common/buffer.cc: No such file or directory.
in common/buffer.cc
(gdb) bt
0x00007fa0d2ddb7cb in ceph::buffer::ptr::release (this=0x7f987a5e3070) at common/buffer.cc:370
0x00007fa0d2ddec00 in ~ptr (this=0x7f989c03b830) at ./include/buffer.h:171
ceph::buffer::list::rebuild (this=0x7f989c03b830) at common/buffer.cc:817
0x00007fa0d2ddecb9 in ceph::buffer::list::c_str (this=0x7f989c03b830) at common/buffer.cc:1045
0x00007fa0d2ea4dc2 in Pipe::connect (this=0x7fa0c4307340) at msg/Pipe.cc:907
0x00007fa0d2ea7d73 in Pipe::writer (this=0x7fa0c4307340) at msg/Pipe.cc:1518
0x00007fa0d2eb44dd in Pipe::Writer::entry (this=<value optimized out>) at msg/Pipe.h:59
0x00007fa0e0f5f9d1 in start_thread (arg=0x7f987a5e4700) at pthread_create.c:301
0x00007fa0de560b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

and

Error in `qemu-system-x86_64': invalid fastbin entry (free): 0x00007ff12887ff20
*** ======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x80a46)[0x7ff3dea1fa46]
/usr/lib/librados.so.2(+0x29eb03)[0x7ff3e3d43b03]
/usr/lib/librados.so.2(_ZNK9CryptoKey7encryptEP11CephContextRKN4ceph6buffer4listERS4_RSs+0x71)[0x7ff3e3d42661]
/usr/lib/librados.so.2(_Z21encode_encrypt_enc_blIN4ceph6buffer4listEEvP11CephContextRKT_RK9CryptoKeyRS2_RSs+0xfe)[0x7ff3e3d417de]
/usr/lib/librados.so.2(_Z14encode_encryptIN4ceph6buffer4listEEiP11CephContextRKT_RK9CryptoKeyRS2_RSs+0xa2)[0x7ff3e3d41912]
/usr/lib/librados.so.2(_ZN19CephxSessionHandler12sign_messageEP7Message+0x242)[0x7ff3e3d40de2]
/usr/lib/librados.so.2(_ZN4Pipe6writerEv+0x92b)[0x7ff3e3e61b2b]
/usr/lib/librados.so.2(_ZN4Pipe6Writer5entryEv+0xd)[0x7ff3e3e6c7fd]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7f8e)[0x7ff3ded6ff8e]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7ff3dea99a0d]

Fix this by adding an rwlock to AuthClientHandler. A simpler fix would
be to move RadosClient::get_authorizer() into the MonClient() under
the MonClient lock, but this would not catch all uses of other
Authorizer, e.g. for verify_authorizer(), and it would serialize
independent connection attempts.

This mainly matters for cephx, but none and unknown can have the
global_id reset as well.

Partially-fixes: #6480
Backport: dumpling, emperor
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-09 14:29:23 -07:00
Sage Weil
34d69cdcfa mon: refresh elector required_features when they change
Currently we only refresh required_features on Elector::start().  This
does not prevent an old peer from calling an election (even though they
won't succeed in joining the resulting quorum).

Fix this by updating the elector's features when they change.  This way we
don't allow a useless election cycle just to trigger that update in
start().

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-09 11:13:31 -07:00
Sage Weil
b3b502f132 mon/Elector: ignore ACK from peers without required features
If an old peer gets a PROPOSE from us, we need to be sure to ignore their
ACK.  Ignoring their PROPOSEs isn't sufficient to keep them out of a
quorum.
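
The check described above can be sketched as a feature-bitmask test applied to incoming ACKs. ElectorSketch, its members, and the feature values used with it are hypothetical; this is not the actual Elector code.

```cpp
#include <cstdint>
#include <set>

// Hypothetical elector state; not Ceph's Elector.
struct ElectorSketch {
  uint64_t required_features = 0;
  std::set<int> acked;  // ranks whose ACKs have been accepted

  // A peer qualifies only if it advertises every required feature bit.
  static bool has_required(uint64_t peer_features, uint64_t required) {
    return (peer_features & required) == required;
  }

  // Returns true if the ACK was accepted, false if it was ignored.
  bool handle_ack(int rank, uint64_t peer_features) {
    if (!has_required(peer_features, required_features))
      return false;  // old peer: keep it out of the quorum
    acked.insert(rank);
    return true;
  }
};
```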

Fixes: #8042
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-09 11:09:14 -07:00
Samuel Just
5a567c479f Merge pull request #1626 from ceph/wip-8031
osd: improve misdirected op checks

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-09 10:37:26 -07:00
Samuel Just
5b16650b42 Merge pull request #1627 from ceph/wip-8001
osd/PG: set CREATING pg state bit until we peer for the first time

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-09 10:34:54 -07:00
Samuel Just
2a9f5fd5ef Merge pull request #1631 from ceph/wip-8045
osd: fix check_osdmap_features deadlock

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-09 10:34:07 -07:00
Sage Weil
f1a8934060 Merge pull request #1632 from ceph/wip-5469
librbd: fix zero length request handling

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-09 08:14:28 -07:00
Alfredo Deza
dc4bbfa762 Merge pull request #1634 from ceph/wip-8028
rpm: add redhat-lsb dependency

Reviewed-by: Alfredo Deza <alfredo.deza@inktank.com>
2014-04-09 10:12:11 -04:00
Sage Weil
f1c6b65b47 ceph.spec.in: require redhat-lsb-core
We need this for /lib/lsb/init-functions.

Fixes: #8028
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-09 07:05:36 -07:00
Sage Weil
1d0c62facf Merge pull request #1606 from ceph/wip-shrink-icache
client: try shrinking kernel inode cache when trimming session caps

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-09 06:55:33 -07:00
Sage Weil
e5f3eb8289 Merge pull request #1633 from ceph/wip-8004
client: wake up umount waiter if receiving session open message

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-08 20:48:12 -07:00
Yan, Zheng
02aedbc447 client: wake up umount waiter if receiving session open message
Wake up umount waiter if receiving session open message while
umounting. The umount waiter will re-close the session.

Fixes: #8004
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-09 11:23:48 +08:00
Josh Durgin
a8330f5cfd librbd: fix zero length request handling
Zero-length writes would hang because the completion was never
called. Reads would hit an assert about zero length in
Striper::file_to_extents().

Fix all of these cases by skipping zero-length extents. The completion
is created and finished when finish_adding_requests() is called. This
is slightly different from usual completions since it comes from the
same thread as the one scheduling the request, but zero-length aio
requests should never happen from things that might care about this,
like QEMU.

Writes and discards have had this bug since the beginning of
librbd. Reads might have avoided it until stripingv2 was added.
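
A rough sketch of the approach, under the assumption that a completion fires once no requests are pending and the submitter has finished adding requests. Extent, CompletionSketch, and submit() are hypothetical and do not mirror librbd's types.

```cpp
#include <cstdint>
#include <vector>

struct Extent {
  uint64_t offset;
  uint64_t length;
};

// Hypothetical completion: done once building has ended and nothing is pending.
class CompletionSketch {
public:
  void add_request() { ++pending_; }
  void complete_request() { --pending_; maybe_finish(); }
  void finish_adding_requests() { building_ = false; maybe_finish(); }
  bool done() const { return done_; }

private:
  void maybe_finish() {
    if (!building_ && pending_ == 0)
      done_ = true;
  }
  int pending_ = 0;
  bool building_ = true;
  bool done_ = false;
};

// Issues one request per non-empty extent; returns how many were issued.
int submit(const std::vector<Extent>& extents, CompletionSketch& c) {
  int issued = 0;
  for (const Extent& e : extents) {
    if (e.length == 0)
      continue;  // skip zero-length extents instead of hanging or asserting
    c.add_request();
    ++issued;
  }
  // With nothing issued, the completion finishes here, in the caller's thread.
  c.finish_adding_requests();
  return issued;
}
```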

Fixes: #5469
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-08 17:39:00 -07:00
Sage Weil
22a0c1fd5e osd: do not block when updating osdmap superblock features
We are holding osd_lock in check_osdmap_features, which means we cannot
block while waiting for filestore operations to flush/apply without
risking deadlock.

The important constraint is that we commit that the feature is enabled
before also committing anything that utilizes sharded objects.  The normal
commit sequencing does that already; there is no reason to block here.

Fixes: #8045
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-08 17:28:54 -07:00
John Wilkins
43f0519b98 doc: Made minor changes to quick start preflight for RHEL.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2014-04-08 15:54:17 -07:00
John Wilkins
ab7a25ce16 doc: Notes and minor modifications to gateway installation doc.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2014-04-08 15:53:32 -07:00
Josh Durgin
1d74170a4c pipe: only read AuthSessionHandler under pipe_lock
session_security, the AuthSessionHandler for a Pipe, is deleted and
recreated while the pipe_lock is held. read_message() is called
without pipe_lock held, and examines session_security. To make this
safe, make session_security a shared_ptr and take a reference to it
while the pipe_lock is still held, and use that shared_ptr in
read_message().
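
The pattern can be sketched as follows: copy the shared_ptr while holding the lock, then use the copy after the lock is released. PipeSketch, SessionHandlerSketch, and check_signature() are hypothetical names, and std::mutex stands in for pipe_lock.

```cpp
#include <memory>
#include <mutex>
#include <string>

// Hypothetical session handler; not Ceph's AuthSessionHandler.
struct SessionHandlerSketch {
  std::string key;
};

class PipeSketch {
public:
  // Replaces the handler while holding the lock.
  void reset_session(const std::string& key) {
    std::lock_guard<std::mutex> l(pipe_lock_);
    session_security_ =
        std::make_shared<SessionHandlerSketch>(SessionHandlerSketch{key});
  }

  // Takes a reference under the lock; the returned copy keeps the
  // handler alive even if reset_session() replaces it afterwards.
  std::shared_ptr<SessionHandlerSketch> get_session() const {
    std::lock_guard<std::mutex> l(pipe_lock_);
    return session_security_;
  }

private:
  mutable std::mutex pipe_lock_;
  std::shared_ptr<SessionHandlerSketch> session_security_;
};

// Runs without the lock, using only the snapshot passed in.
std::string check_signature(const std::shared_ptr<SessionHandlerSketch>& auth,
                            const std::string& msg) {
  return auth ? auth->key + ":" + msg : msg;
}
```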

This may have caused crashes like:

*** Error in `qemu-system-x86_64': invalid fastbin entry (free): 0x00007f42a4002de0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x80a46)[0x7f452f1f3a46]
/usr/lib/x86_64-linux-gnu/libnss3.so(PK11_FreeSymKey+0xa8)[0x7f452e72ff98]
/usr/lib/librados.so.2(+0x2a18cd)[0x7f453451a8cd]
/usr/lib/librados.so.2(_ZNK9CryptoKey7encryptEP11CephContextRKN4ceph6buffer4listERS4_RSs+0x71)[0x7f4534519421]
/usr/lib/librados.so.2(_Z21encode_encrypt_enc_blIN4ceph6buffer4listEEvP11CephContextRKT_RK9CryptoKeyRS2_RSs+0xfe)[0x7f453451859e]
/usr/lib/librados.so.2(_Z14encode_encryptIN4ceph6buffer4listEEiP11CephContextRKT_RK9CryptoKeyRS2_RSs+0xa2)[0x7f45345186d2]
/usr/lib/librados.so.2(_ZN19CephxSessionHandler23check_message_signatureEP7Message+0x246)[0x7f4534516866]
/usr/lib/librados.so.2(_ZN4Pipe12read_messageEPP7Message+0xdcc)[0x7f453462ecbc]
/usr/lib/librados.so.2(_ZN4Pipe6readerEv+0xa5c)[0x7f453464059c]
/usr/lib/librados.so.2(_ZN4Pipe6Reader5entryEv+0xd)[0x7f4534643ecd]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7f8e)[0x7f452f543f8e]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f452f26da0d]

Partially-fixes: #6480
Backport: dumpling, emperor
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-08 15:40:28 -07:00
Josh Durgin
26907e3dcd Merge pull request #1628 from ceph/wip-5835
update package descriptions

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-08 14:47:21 -07:00
Sage Weil
277e7ac46b debian: update ceph description
Fixes: #5835
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-08 14:19:38 -07:00
Sage Weil
72dc7327e9 ceph.spec: update ceph description
Fixes: #5835
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-08 14:18:44 -07:00
Samuel Just
4bb0628acf Merge pull request #1625 from ceph/wip-8019
osd: fix journal umount/mount weirdness

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-08 12:45:28 -07:00
Sage Weil
79ac2f79d6 osd/PG: set CREATING pg state bit until we peer for the first time
We send PG state updates to the monitor while creating a PG before the
actual creation has been finalized and persisted.  Because those updates
do not include the CREATING bit, the mon will remove the pgid from its
creating set.  If the OSD(s) crash before persisting that PG creation, the
PG will never get created.

Fix this by leaving the CREATING bit set on the primary as long as
last_epoch_started==0.  That is, until we successfully peer for the very
first time.  Only then do we clear the bit and tell the monitor its duty
is complete.
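
A minimal sketch of the rule, assuming the reported state is derived at the point where the update is sent. PGSketch and the state bit values are hypothetical, not Ceph's definitions.

```cpp
#include <cstdint>

// Hypothetical state bits; the values are illustrative only.
constexpr uint32_t STATE_CREATING = 1u << 0;
constexpr uint32_t STATE_ACTIVE   = 1u << 1;

struct PGSketch {
  bool is_primary = true;
  uint64_t last_epoch_started = 0;
  uint32_t state = STATE_CREATING;

  // State as it would be reported to the monitor: CREATING stays set
  // on the primary until the PG has peered at least once.
  uint32_t reported_state() const {
    uint32_t s = state & ~STATE_CREATING;
    if (is_primary && last_epoch_started == 0)
      s |= STATE_CREATING;
    return s;
  }

  // Called when peering succeeds in the given epoch.
  void on_peered(uint64_t epoch) {
    last_epoch_started = epoch;
    state |= STATE_ACTIVE;
  }
};
```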

Fixes: #8001
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-08 12:26:19 -07:00
Sage Weil
4de49e8676 os/FileStore: reset journal state on umount
We observed a sequence like:

 - replay journal
   - sets JournalingObjectStore applied_op_seq
 - umount
 - mount
   - initiate commit with previous applied_op_seq
 - replay journal
   - commit finishes
   - on replay commit, we fail assert op > committed_seq

Although strictly speaking the assert failure is harmless here, in general
we should not let state leak through from a previous mount into this
mount, or else assertions become more difficult to reason about.

Fixes: #8019
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-08 11:02:40 -07:00
Sage Weil
1cdb7381c7 vstart.sh: make crush location match up with what init-ceph does
This makes it so that ./init-ceph restart osd.0 won't modify the CRUSH
tree.  And in any case, the localhost/localrack thing we were doing before
was pretty useless.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-08 11:02:40 -07:00
Gregory Farnum
ddafcc3710 Merge pull request #1623 from ceph/wip-8026
mds: fix shared_ptr MDRequest bugs

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-04-08 10:43:14 -07:00
Sage Weil
667137cc91 Merge pull request #1621 from dachary/wip-7914
erasure-code: thread-safe initialization of gf-complete

This looks like a good interim solution until gf-complete exposes a simpler init function
that hides this.

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-08 10:14:46 -07:00
Sage Weil
d2edd9c1d3 osd: drop unused same_for_*() helpers
These were all identical and mostly served to obscure the actual logic,
which is now captured by can_discard_op() and the matching Objecter
code on the client side.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-08 09:06:32 -07:00
Sage Weil
5d6116199e osd: drop previous interval ops even if primary happens to be the same
If we have two consecutive intervals with the same primary, the client
will not resend the op and the same_primary_since epoch will not change,
and all is well.

If, however, we have 3 intervals, and the primary changes away and then
back to a particular OSD, the OSD will currently still process the old
request (assuming the timing works out) because it is currently the
primary.  This is unnecessary because the client will resend the request.
It may even introduce a hard-to-hit ordering problem since whether or not
the OSD processes the message becomes dependent on how many subsequent
maps it has consumed when the request is processed.

Instead, simplify the minor tangle of helpers by making a single simple
check that discards requests from before same_primary_since.  We can then
avoid using the same_for_*() helpers and drop the check from
handle_misdirected_op(), which is also nice because the name is now accurate
(it *only* deals with ops that are in fact misdirected, not just slow to
arrive).
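
The single check can be sketched as an epoch comparison. The name can_discard_op() comes from this series, but the signature and the OpSketch and PGIntervalSketch types below are assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical op and PG state; field names are illustrative.
struct OpSketch {
  uint64_t sent_epoch;  // map epoch the client had when it sent the op
};

struct PGIntervalSketch {
  uint64_t same_primary_since;  // first epoch of the current primary's interval
};

// An op sent before the current interval began is dropped, because the
// client will resend it, even if this OSD was also primary back then.
bool can_discard_op(const PGIntervalSketch& pg, const OpSketch& op) {
  return op.sent_epoch < pg.same_primary_since;
}
```

For example, if an OSD was primary during epochs 10 to 19, lost the role at epoch 20, and regained it at epoch 30, an op sent at epoch 15 is discarded while an op sent at epoch 30 is processed.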

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-08 09:06:32 -07:00
Sage Weil
d3833ddafe osd: make misdirected checks explicit about replicas, flags
Only allow read ops to target replicas if the necessary op flags are set.
The previous checks were very sloppy.
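
One way such an explicit check might look is sketched below. The flag names and is_misdirected() are hypothetical and do not correspond to Ceph's actual op flags or functions.

```cpp
#include <cstdint>

// Hypothetical op flags; not Ceph's definitions.
constexpr uint32_t FLAG_READ               = 1u << 0;
constexpr uint32_t FLAG_WRITE              = 1u << 1;
constexpr uint32_t FLAG_ALLOW_REPLICA_READ = 1u << 2;

// Returns true if an op arriving at this OSD should be treated as misdirected.
bool is_misdirected(bool is_primary, bool is_replica, uint32_t flags) {
  if (is_primary)
    return false;  // the primary may serve any op
  if (!is_replica)
    return true;   // this OSD is not in the acting set at all
  // A replica may serve only pure reads that explicitly opt in.
  bool read_only = (flags & FLAG_READ) && !(flags & FLAG_WRITE);
  return !(read_only && (flags & FLAG_ALLOW_REPLICA_READ));
}
```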

Fixes: #8031
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-08 09:06:32 -07:00