Commit Graph

23232 Commits

Author SHA1 Message Date
Sage Weil
483c6f76ad test_filejournal: optionally specify journal filename as an argument
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-02 13:39:05 -08:00
Sage Weil
c461e7fc1e test_filejournal: test journaling bl with >IOV_MAX segments
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-02 13:39:05 -08:00
Sage Weil
dda7b65189 os/FileJournal: limit size of aio submission
Limit size of each aio submission to IOV_MAX-1 (to be safe).  Take care to
only mark the last aio with the seq to signal completion.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-02 13:39:05 -08:00
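Note on dda7b65189: the batching rule described in the message can be sketched as follows. This is a minimal illustration in Python, not the FileJournal code. The function name `plan_aio_submissions`, the default `iov_max` of 1024 (a common Linux value), and the use of 0 to mean "no seq" are all assumptions made for the example.

```python
def plan_aio_submissions(segments, seq, iov_max=1024):
    """Split segments into submissions of at most iov_max - 1 entries.

    Returns a list of (batch, marker) pairs. Only the final batch carries
    the real seq; earlier batches carry 0 (a hypothetical "no seq" marker),
    so completion is signalled once, when the last submission finishes.
    """
    limit = iov_max - 1  # stay one below the limit, "to be safe"
    if limit < 1:
        raise ValueError("iov_max must be at least 2")
    batches = [segments[i:i + limit] for i in range(0, len(segments), limit)]
    last = len(batches) - 1
    return [(batch, seq if idx == last else 0)
            for idx, batch in enumerate(batches)]
```

For example, 2500 segments with `iov_max=1024` would be split into three submissions, and only the third would be tagged with the seq.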
Josh Durgin
e0858fa899 Revert "librbd: ensure header is up to date after initial read"
Using assert version for linger ops doesn't work with retries,
since the version will change after the first send.
This reverts commit e177680903.

Conflicts:

	qa/workunits/rbd/watch_correct_version.sh
2013-01-02 12:32:33 -08:00
John Wilkins
82297706da doc: Minor edits.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-01-02 11:24:39 -08:00
John Wilkins
d3b9803eab doc: Fixed typo, clarified usage.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-01-02 11:15:16 -08:00
Sage Weil
eb02eaede5 Merge remote-tracking branch 'gh/wip-bobtail-docs' 2013-01-01 10:36:57 -08:00
Gary Lowell
f1196c7e93 Merge branch 'master' of https://github.com/ceph/ceph 2012-12-31 21:35:03 -08:00
Gary Lowell
5dd6b19918 Merge branch 'next' 2012-12-31 21:31:17 -08:00
Sage Weil
8f77ec7d81 Merge branch 'next' 2012-12-31 18:37:12 -08:00
Sage Weil
94a5dd6b76 Merge remote-tracking branch 'gh/wip-3675'
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-31 18:36:39 -08:00
Gary Lowell
1a32f0a0b4 v0.56 2012-12-31 17:10:11 -08:00
Sage Weil
49ebe1ee3a client: fix _create created ino condition
We get 8 bytes back for the created ino.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-31 15:28:25 -08:00
Sage Weil
a10054bc52 libcephfs: choose more unique nonce
We were using a per-process counter combined with the pid.  A short
running process can easily loop through and reuse the same pid later.
Instead, go for 48 bits of randomness and the pid.  This way if we get
a dup pid we'll only get a dup nonce once out of 2^48 tries.

Avoids #3630 when running a libcephfs test in a loop (so that the pid
is eventually reused).  This is a better fix than the broken
8b59908370.  The real solution on the MDS
side involves cleaning up the msgr/MDS interaction with session
shutdown.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-31 15:26:54 -08:00
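Note on a10054bc52: the idea of pairing 48 random bits with the pid can be sketched as below. The message does not say how the two values are combined, so the bit layout here (random bits in the high 48 bits, the low 16 bits of the pid underneath) and the name `make_nonce` are assumptions for illustration only, not the libcephfs implementation.

```python
import os
import secrets


def make_nonce(pid=None):
    """Build a 64-bit nonce from 48 random bits plus the process id."""
    if pid is None:
        pid = os.getpid()
    random_part = secrets.randbits(48)  # 48 bits of randomness
    # Hypothetical layout: random bits on top, truncated pid below.
    return (random_part << 16) | (pid & 0xFFFF)
```

With this layout, two processes that happen to reuse the same pid would still collide only if their 48 random bits also match.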
Sage Weil
e2fef38dfd client: fix _create
make_request() clears out req->reply and frees req; we can't inspect
it here.

Instead, just assume that extra_bl is the create flag/ino if it is
present.  Old code does not include an extra_bl on CREATE, and new code
will have the same first bytes for compatibility.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-31 15:26:53 -08:00
Sage Weil
b4d3bd06d4 Merge remote-tracking branch 'gh/wip-3625' 2012-12-31 10:16:31 -08:00
Sage Weil
ec5288a312 Merge remote-tracking branch 'gh/wip-rbd-unprotect' into next
Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-30 15:29:37 -08:00
Joao Eduardo Luis
82cec48e9f doc: add-or-rm-mons.rst: Add 'Changing Monitor's IPs' section
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-12-30 19:18:09 +00:00
Joao Eduardo Luis
379f07923c doc: add-or-rm-mons.rst: Clarify what the monitor name/id is.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-12-30 19:17:03 +00:00
Josh Durgin
8bbb4a364d doc: fix rbd permissions for unprotect
Unprotect examines all pools, so use blanket x before 0.54. After
that, use class-read restricted by object_prefix to rbd_children.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00
Josh Durgin
d0a14d110d librbd: fix race between unprotect and clone
Clone needs to actually re-read the header to make sure the image is
still protected before returning. Additionally, it needs to consider
the image protected *only* if the protection status is protected -
unprotecting does not count. I thought I'd already fixed this, but
can't find the commit.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00
Josh Durgin
958addc0c9 rbd: open (source) image as read-only
This allows users without write access to copy, export and list
information about an image.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00
Josh Durgin
47bf519584 librbd: open parent as read-only during clone
We never write to the parent, and don't need to watch it during this process.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00
Josh Durgin
c67c789de6 librbd: add {rbd_}open_read_only()
Since 58890cfad5, regular {rbd_}open()
would fail with -EPERM if the user did not have write access to the
pool, since a watch on the header was requested.

For many uses of read-only access, establishing a watch is not
necessary, since changes to the header do not matter. For example,
getting metadata about an image via 'rbd info' does not care if a new
snapshot is created while it is in progress.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00
Josh Durgin
91e941aef9 OSD: remove RD flag from CALL ops
20496b8d2b forgot to do this. Without
this change, all class methods required regular read permission in
addition to class-read or class-write.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00
Josh Durgin
85e9d4f000 cls_rbd: get_children does not need write permission
This prevented a read-only user from being able to unprotect a
snapshot without write permission on all pools. This was masked before
by the CLS_METHOD_PUBLIC flag.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00
Sage Weil
4aa6af76e1 doc/release-notes: link to upgrade doc
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-29 21:00:07 -08:00
Sage Weil
7b0dbeb0fa doc/install/upgrading: edits to upgrade document
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-29 21:00:07 -08:00
Sage Weil
6711a4c403 Revert "mds: replace closed sessions on connect"
This reverts commit 8b59908370.

This fix is not correct.  See #3696.
2012-12-29 08:38:52 -08:00
Sage Weil
82f8bcddb5 msg/Pipe: use state_closed atomic_t for _lookup_pipe
We shouldn't look at Pipe::state in SimpleMessenger::_lookup_pipe() without
holding pipe_lock.  Instead, use an atomic that we set to non-zero only
when transitioning to the terminal STATE_CLOSED state.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-28 17:21:01 -08:00
Sage Weil
a5d692a7b9 msgr: inject delays at inconvenient times
Exercise some rare races by injecting delays before taking locks
via the 'ms inject internal delays' option.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-28 17:21:01 -08:00
Sage Weil
e99b4a307b msgr: fix race on Pipe removal from hash
When a pipe is faulting and shutting down, we have to drop pipe_lock to
take msgr lock and then remove the entry.  The Pipe in this case will
have STATE_CLOSED.  Handle this case in all places we do a lookup on
the rank_pipe hash so that we effectively ignore entries that are
CLOSED.

This fixes a race introduced by the previous commit where we won't use
the CLOSED pipe and try to register a new one, but the old one is still
registered.

See bug #3675.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-28 17:21:00 -08:00
Sage Weil
6339c5d439 msgr: don't queue message on closed pipe
If we have a con that refs a pipe but it is closed, don't use it.  If
the ref is still there, it is only because we are racing with fault()
and it is about to (or just was) be detached.  Either way,

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-28 17:21:00 -08:00
Sage Weil
7bf0b0854d msgr: atomically queue first message with connect_rank
Atomically queue the first message on the new pipe, without dropping
and retaking pipe_lock.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-28 17:21:00 -08:00
Sage Weil
83c8025d12 Merge remote-tracking branch 'gh/next' 2012-12-28 17:19:46 -08:00
Joao Eduardo Luis
c2a75253e5 test: mon: workloadgen: debug when message fsid != monmap fsid
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-12-28 17:19:38 -08:00
Joao Eduardo Luis
b30ab51792 test: mon: workloadgen: assert if monmap's fsid is zero after authenticate
Fixes: #3629

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-12-28 17:19:35 -08:00
Noah Watkins
3583684776 doc: update Hadoop documentation
Updates configuration option names, and adds object.size,
localize.reads, and root.dir control options.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-12-28 17:19:31 -08:00
Sage Weil
942c71454b init-ceph: ok, 8K files
16K might be a bit many.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-28 17:12:06 -08:00
Sage Weil
0a5d6d8759 msg/Pipe: remove broken cephx signing requirement check
Remove the special-case check, which does not inform the peer what
protocol features are missing.  It also enforces this requirement even
when we negotiate auth none.

Reported as part of bug #3657.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-28 17:10:28 -08:00
Sage Weil
65b787ea2a msg/Pipe: include remote socket addr in debug output
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-28 16:00:47 -08:00
John Wilkins
9e5e08f84f doc: Added a new upgrade document.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-12-28 15:55:46 -08:00
John Wilkins
1553267ef1 doc: Minor edit.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-12-28 15:55:22 -08:00
John Wilkins
02b8bcd0be doc: Added upgrade link to index.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-12-28 15:54:57 -08:00
Sage Weil
076b418c7f os/FileJournal: logger is optional
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-28 15:44:51 -08:00
Sage Weil
3debf0cf3d client: fix fh leak in non-create case
We may take the O_CREAT path and get an fh from _create, but created can
still be false.  In that case, skip the final _open call.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-28 15:14:25 -08:00
Sam Lang
67bc849c68 mds: Return created inode in mds reply to create
If multiple clients race to create a file, multiple clients will send a
create request and get back a valid dentry+inode, but only one client
will actually win the race to create the file.  All other clients should
treat the reply as an open of an existing file and check permissions.
This patch adds the created inode number to the mds create reply if that
request actually created the inode/file (and the feature is supported),
so the client can properly check permissions if the inode number isn't
returned.  Fixes #3625.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-12-28 15:10:02 -08:00
Sam Lang
7f35e5dda6 client: Make ll_create use _create
This is a fix for bug #3625, where multiple clients race to create a
file, and the loser returns EEXIST instead of a valid file handle.
The patch modifies ll_create in the Client class to use _create(),
which sends the request to the MDS (where an atomic create/open is
performed).

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2012-12-28 15:10:02 -08:00
Sage Weil
813787af3d log: broadcast cond signals
We were using a single cond, and only signalling one waiter.  That means
that if the flusher and several logging threads are waiting, and we hit
a limit, the logger could signal another logger instead of the flusher,
and we could deadlock.

Similarly, if the flusher empties the queue, it might signal only a single
logger, and that logger could re-signal the flusher, and the other logger
could wait forever.

Instead, break the single cond into two: one for loggers, and one for the
flusher.  Always signal the (one) flusher, and always broadcast to all
loggers.

Backport: bobtail, argonaut
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2012-12-28 15:08:29 -08:00
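Note on 813787af3d: the two-condition scheme from the message can be sketched with Python's `threading` module. This is an illustrative model under stated assumptions, not the Ceph logging code. The class `BoundedLog`, its fields, and the `max_pending` limit are hypothetical names chosen for the example.

```python
import threading
from collections import deque


class BoundedLog:
    """Bounded queue with one cond for loggers and one for the flusher."""

    def __init__(self, max_pending):
        self.lock = threading.Lock()
        # Two conditions share one lock, so a wakeup meant for the
        # flusher can never be consumed by a logger, and vice versa.
        self.cond_loggers = threading.Condition(self.lock)
        self.cond_flusher = threading.Condition(self.lock)
        self.pending = deque()
        self.flushed = []
        self.max_pending = max_pending
        self.stopping = False

    def submit(self, entry):
        with self.lock:
            while len(self.pending) >= self.max_pending:
                self.cond_flusher.notify()   # always signal the one flusher
                self.cond_loggers.wait()
            self.pending.append(entry)
            self.cond_flusher.notify()

    def flush_loop(self):
        with self.lock:
            while True:
                while not self.pending and not self.stopping:
                    self.cond_flusher.wait()
                if not self.pending and self.stopping:
                    return
                self.flushed.extend(self.pending)
                self.pending.clear()
                self.cond_loggers.notify_all()  # broadcast to all loggers

    def stop(self):
        with self.lock:
            self.stopping = True
            self.cond_flusher.notify()
```

The design point is the one the message makes: with a single shared cond and a single-waiter signal, a wakeup can land on the wrong kind of thread; separating the conds and broadcasting to loggers removes that possibility.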
Sage Weil
ca34fc4d3c osd: allow RecoveryDone self-transition in RepNotRecovering
In a mixed cluster where some OSDs support the recovery reservations and
some don't, the replica may be new code in RepNotRecovering and will
complete a backfill.  In that case, we want to just stay in
RepNotRecovering.

It may also be possible to make it infer what the primary is doing even
though it is not sending recovery reservation messages, but this is much
more complicated and doesn't accomplish much.

Fixes: #3689
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-28 15:03:10 -08:00