Commit Graph

23142 Commits

Author SHA1 Message Date
Joao Eduardo Luis
3610e72e4f mon: OSDMonitor: only share osdmap with up OSDs
Try to share the map with a randomly picked OSD.  If the picked OSD is
not 'up', find the nearest 'up' OSD in the map with a backward and a
forward linear search.  This is O(n) in the worst case: we do a single
pass starting at the picked position, decrementing one iterator and
incrementing another until we find an appropriate OSD or exhaust the map.
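A minimal sketch of the nearest-'up' search, assuming a hypothetical `std::vector<bool>` of up flags in place of the real OSDMap and a hypothetical function name `find_nearest_up`. This is not the OSDMonitor code:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for the OSDMap: up[i] is true if osd.i is 'up'.
// Starting at the randomly picked index, step one cursor backward and one
// forward in lock-step and return the first 'up' OSD found. Each slot is
// visited at most once, so the worst case is O(n).
std::optional<std::size_t> find_nearest_up(const std::vector<bool>& up,
                                           std::size_t picked) {
  if (picked >= up.size())
    return std::nullopt;
  if (up[picked])
    return picked;
  std::size_t back = picked;  // next candidate below is back - 1
  std::size_t fwd = picked;   // next candidate above is fwd + 1
  while (back > 0 || fwd + 1 < up.size()) {
    if (back > 0 && up[--back])
      return back;
    if (fwd + 1 < up.size() && up[++fwd])
      return fwd;
  }
  return std::nullopt;  // the map is exhausted: no OSD is up
}
```

Because the backward step runs first, ties at equal distance go to the lower-numbered OSD in this sketch.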

Fixes: #3629
Backport: bobtail

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-12 01:09:01 +00:00
Alex Elder
aeb02061de qa/run_xfstests.sh: use cloned xfstests repository
Use our own copy of the xfstests repository rather than hitting
the upstream one repeatedly.

Signed-off-by: Alex Elder <elder@inktank.com>
2013-01-11 12:49:36 -06:00
Samuel Just
988f359778 rados: add truncate support
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-01-10 13:54:53 -08:00
Samuel Just
0f42c37359 ReplicatedPG: fix snapdir trimming
The previous logic was both complicated and incorrect.  As a result,
we have sometimes been dropping snapcollection links, which has left
some clones incorrectly untrimmed.  This patch replaces the logic with
something less efficient but hopefully a bit clearer.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-10 10:34:40 -08:00
Yehuda Sadeh
035caac551 Revert "rgw: fix handler leak in handle_request"
This reverts commit eba314a811.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-01-10 10:14:11 -08:00
Sylvain Munaut
e1da85f286 rgw: Fix crash when FastCGI frontend doesn't set SCRIPT_URI
Fixes: #3735
Signed-off-by: caleb miles <caleb.miles@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-08 18:29:27 -08:00
caleb miles
eba314a811 rgw: fix handler leak in handle_request
Fixes: #3682
Signed-off-by: caleb miles <caleb.miles@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-08 18:28:31 -08:00
Dan Mick
4483285c9f librbd: Allow get_lock_info to fail
If the lock class isn't present, EOPNOTSUPP is returned for lock calls
on newer OSDs, but sadly EIO on older; we need to treat both as
acceptable failures for RBD images.  rados lock list will still fail.
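The tolerance rule can be sketched as a small predicate plus a caller. Both `lock_info_failure_is_acceptable` and `get_lock_info_tolerant` are hypothetical names used for illustration, not librbd functions:

```cpp
#include <cassert>
#include <cerrno>

// A missing lock class shows up as -EOPNOTSUPP on newer OSDs and as -EIO
// on older ones; both are tolerated for RBD images.
bool lock_info_failure_is_acceptable(int r) {
  return r == -EOPNOTSUPP || r == -EIO;
}

// Hypothetical caller: an acceptable failure means "no lock info", not an
// error. Any other negative value is passed through unchanged.
int get_lock_info_tolerant(int r_from_osd, bool* have_info) {
  if (r_from_osd == 0) {
    *have_info = true;
    return 0;
  }
  if (lock_info_failure_is_acceptable(r_from_osd)) {
    *have_info = false;
    return 0;
  }
  return r_from_osd;
}
```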

Fixes #3744.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-08 18:25:46 -08:00
Samuel Just
f83fcf63a9 PG: set DEGRADED in Active AdvMap handler based on pool size
Otherwise, if the acting set does not change, the pg might
not show up as degraded if the pool size now exceeds the
acting set size.
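The point is that DEGRADED has to be recomputed on every map advance, not only when the acting set changes. A minimal sketch, assuming a hypothetical `PGStateSketch` struct in place of the real PG state machine:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical PG state: the acting set and a DEGRADED flag.
struct PGStateSketch {
  std::vector<int> acting;
  bool degraded = false;

  // Called on every map advance while Active, whether or not the acting
  // set changed, so a pool-size increase alone is enough to mark DEGRADED.
  void on_adv_map(std::size_t pool_size) {
    degraded = acting.size() < pool_size;
  }
};
```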

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-07 15:16:09 -08:00
Sage Weil
1b39b31678 Merge branch 'wip-3678-b' into next
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-01-07 13:04:13 -08:00
Sage Weil
d16ad9263d msg/Pipe: prepare Message data for wire under pipe_lock
We cannot trust the Message bufferlists or other structures to be
stable without pipe_lock, as another Pipe may claim and modify the sent
list items while we are writing to the socket.

Related to #3678.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-07 13:02:58 -08:00
Sage Weil
40706afc66 msgr: update Message envelope in encode, not write_message
Fill out the Message header and footer, and calculate CRCs, during
encoding rather than in write_message().  This removes most modifications
from Pipe::write_message().

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-07 13:02:58 -08:00
Sage Weil
62586884af osdc/Objecter: fix linger_ops iterator invalidation on pool deletion
The call to check_linger_pool_dne() may unregister the linger request,
invalidating the iterator.  To avoid this, increment the iterator at
the top of the loop.

This mirrors the fix in 4bf9078286 for
regular non-linger ops.
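The "increment at the top of the loop" pattern can be shown with a `std::map`, where `erase` invalidates only the iterator to the erased element. The `LingerTableSketch` type and its members are hypothetical stand-ins for the Objecter's linger_ops and check_linger_pool_dne():

```cpp
#include <cassert>
#include <map>

// Hypothetical linger-op table: linger id -> pool id.
struct LingerTableSketch {
  std::map<int, int> linger_ops;
  int deleted_pool = -1;

  // May unregister (erase) the entry it is handed, invalidating `it`.
  void check_pool_dne(std::map<int, int>::iterator it) {
    if (it->second == deleted_pool)
      linger_ops.erase(it);
  }

  void scan() {
    for (auto p = linger_ops.begin(); p != linger_ops.end();) {
      auto cur = p++;       // advance first, at the top of the loop
      check_pool_dne(cur);  // may erase *cur; p still points past it
    }
  }
};
```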

Fixes: #3734
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-01-07 12:58:39 -08:00
Sage Weil
4cfc4903c6 msg/Pipe: encode message inside pipe_lock
This modifies bufferlists in the Message struct, and it is possible
for multiple instances of the Pipe to get references on the Message;
make sure they don't modify those bufferlists concurrently.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-06 20:38:28 -08:00
Sage Weil
a058f16113 msg/Pipe: associate sending msgs to con inside lock
Associate a sending message with the connection inside the pipe_lock.
This way, if a racing thread tries to steal these messages, it is
guaranteed to reset the con pointer *after* we set it, so that the con
pointer is valid in encode_payload() (and later).

This may be part of #3678.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-06 20:38:25 -08:00
Sage Weil
2a1eb466d3 msg/Pipe: fix msg leak in requeue_sent()
The sent list owns a reference to each message.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-06 20:38:22 -08:00
Sage Weil
ce49968938 os/FileJournal: include limits.h
Needed for IOV_MAX.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-05 20:54:09 -08:00
Sage Weil
988a521735 osd: special case CALL op to not have RD bit effects
In commit 20496b8d2b we treat a CALL as
different from a normal "read", but we did not adjust the behavior
determined by the RD bit in the op.  We tried to fix that in
91e941aef9, but changing the op code breaks
compatibility, so that was reverted.

Instead, special-case CALL in the helper, the only point in the code that
actually checks the RD bit.  (And fix one lingering user to use that
helper appropriately.)
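The special case lives in the single helper that interprets the RD bit, so the op code itself (and wire compatibility) is unchanged. A sketch with hypothetical op-code values and a hypothetical helper name `op_has_read_effects`; the real constants in Ceph's headers are not reproduced here:

```cpp
#include <cassert>

// Hypothetical op codes and mode bit, for illustration only.
enum : int {
  MODE_RD = 0x1000,
  OP_READ = MODE_RD | 1,
  OP_CALL = MODE_RD | 2,  // keeps RD in its code for compatibility
  OP_WRITE = 0x2000 | 1,
};

// The one place that turns the RD bit into behavior: CALL still carries
// RD on the wire but does not get plain-read effects.
bool op_has_read_effects(int op) {
  if (op == OP_CALL)
    return false;
  return (op & MODE_RD) != 0;
}
```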

Fixes: #3731
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2013-01-04 20:46:56 -08:00
Sage Weil
d3abd0fe0b Revert "OSD: remove RD flag from CALL ops"
This reverts commit 91e941aef9.

We cannot change this op code without breaking compatibility
with old code (client and server).  We'll have to special case
this op code instead.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2013-01-04 20:46:48 -08:00
Samuel Just
e89b6ade63 ReplicatedPG: remove old-head optimization from push_to_replica
This optimization allowed the primary to push a clone as a single push in the
case that the head object on the replica is old and happens to be at the same
version as the clone.  In general, using head in clone_subsets is tricky since
we might be writing to head during the push.  calc_clone_subsets does not
consider head (probably for this reason).  Handling the clone-from-head case
properly would require blocking writes on head in the interim, which is probably
a bad trade-off anyway.

Because the old-head optimization only comes into play if the replica's state
happens to fall on the last write to head prior to the snap that caused the
clone in question, it's not worth the complexity.

Fixes: #3698
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-04 13:44:18 -08:00
Sage Weil
28d59d374b os/FileStore: fix non-btrfs op_seq commit order
The op_seq file is the starting point for journal replay.  For stable btrfs
commit mode, which uses a snapshot as a reference, we should write this
file before we take the snap.  We normally ignore current/ contents anyway.

On non-btrfs file systems, however, we should only write this file *after*
we do a full sync, and we should then fsync(2) it before we continue
(and potentially trim anything from the journal).
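The non-btrfs ordering (full sync, then write op_seq, then fsync op_seq, and only then allow journal trimming) can be sketched with POSIX calls. `commit_op_seq` and its `full_sync`/`trim_journal` callbacks are hypothetical, not FileStore's API:

```cpp
#include <cassert>
#include <cstdint>
#include <fcntl.h>
#include <functional>
#include <string>
#include <unistd.h>

// Returns true if op_seq was durably recorded (and trimming was allowed).
bool commit_op_seq(const std::string& path, uint64_t seq,
                   const std::function<void()>& full_sync,
                   const std::function<void(uint64_t)>& trim_journal) {
  full_sync();  // 1. everything up to seq must be durable first
  int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return false;
  std::string s = std::to_string(seq) + "\n";
  bool ok = ::write(fd, s.data(), s.size()) ==
                static_cast<ssize_t>(s.size()) &&  // 2. write op_seq
            ::fsync(fd) == 0;                     // 3. make it durable
  ::close(fd);
  if (ok)
    trim_journal(seq);  // 4. only now is it safe to drop journal entries
  return ok;
}
```

Writing op_seq before the sync would let replay start past writes that never reached disk, which is the power-loss window this commit closes.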

This fixes a serious bug that could cause data loss and corruption after
a power loss event.  For a 'kill -9' or crash, however, there was little
risk, since the writes were still captured by the host's cache.

Fixes: #3721
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-01-03 17:15:07 -08:00
Samuel Just
4ae4dce5c5 OSD: for old osds, dispatch peering messages immediately
Normally, we batch up peering messages until the end of
process_peering_events to allow us to combine many notifies, etc.,
to the same osd into the same message.  However, old osds assume
that the activation message (log or info) will be dispatched
before the first sub_op_modify of the interval.  Thus, for those
peers, we need to send the peering messages before we drop the
pg lock, lest we issue a client repop from another thread before
the activation message is sent.
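A minimal model of the batching rule, assuming a hypothetical `PeeringSenderSketch` that records sent messages in a vector instead of a messenger. New peers get one combined message per flush; old peers get theirs immediately:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct PeeringSenderSketch {
  std::map<int, std::vector<std::string>> pending;  // osd -> queued msgs
  std::vector<std::string> wire;                     // what was "sent"

  void queue(int osd, bool peer_is_old, const std::string& msg) {
    if (peer_is_old)
      wire.push_back(msg);          // send now, while the pg lock is held
    else
      pending[osd].push_back(msg);  // combine with others to the same osd
  }

  // End of process_peering_events(): one combined send per peer.
  void flush() {
    for (const auto& p : pending) {
      std::string combined;
      for (const auto& m : p.second)
        combined += m + ";";
      wire.push_back(combined);
    }
    pending.clear();
  }
};
```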

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-03 14:18:00 -08:00
Sage Weil
7e94f6f1a7 Merge remote-tracking branch 'gh/wip-3714-b' into next
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-01-03 12:53:07 -08:00
Sage Weil
a32d6c5dca osd: move common active vs booting code into consume_map
Push osdmaps to PGs in separate method from activate_map() (whose name
is becoming less and less accurate).

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-02 22:39:10 -08:00
Sage Weil
0bfad8ef20 osd: let pgs process map advances before booting
The OSD deliberately consumes and processes most of the OSDMaps published
while it was down before it marks itself up, as this can be slow.  The new
threading code does this asynchronously in peering_wq, though, and
does not let it drain before booting the OSD.  The OSD can get into
a situation where it marks itself up but is not responsive or useful
because of the backlog, and only makes the situation worse by
generating more osdmaps as a result.

Fix this by calling activate_map() even when booting and, when booting,
draining the peering_wq on each call.  This is harmless since we are
not yet processing actual ops; we only need to be async when active.
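The "drain when booting, stay async when active" rule can be sketched with a single-worker queue. `WorkQueueSketch` and `activate_map_sketch` are hypothetical; Ceph's real peering_wq is a different, thread-pool-based class:

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class WorkQueueSketch {
 public:
  WorkQueueSketch() : worker_([this] { run(); }) {}
  ~WorkQueueSketch() {
    {
      std::lock_guard<std::mutex> l(m_);
      stop_ = true;
    }
    cv_.notify_all();
    worker_.join();
  }
  void queue(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> l(m_);
      jobs_.push_back(std::move(job));
    }
    cv_.notify_all();
  }
  // Block until no job is queued or running.
  void drain() {
    std::unique_lock<std::mutex> l(m_);
    cv_.wait(l, [this] { return jobs_.empty() && !busy_; });
  }

 private:
  void run() {
    std::unique_lock<std::mutex> l(m_);
    for (;;) {
      cv_.wait(l, [this] { return stop_ || !jobs_.empty(); });
      if (jobs_.empty())
        return;  // stop requested and nothing left to do
      auto job = std::move(jobs_.front());
      jobs_.pop_front();
      busy_ = true;
      l.unlock();
      job();
      l.lock();
      busy_ = false;
      cv_.notify_all();  // wake drain() waiters
    }
  }

  std::mutex m_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> jobs_;
  bool busy_ = false;
  bool stop_ = false;
  std::thread worker_;  // declared last: starts after the members above
};

// While booting, wait for the PGs to catch up before marking ourselves up.
void activate_map_sketch(WorkQueueSketch& peering_wq, bool booting,
                         std::function<void()> advance_pgs) {
  peering_wq.queue(std::move(advance_pgs));
  if (booting)
    peering_wq.drain();
}
```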

Fixes: #3714
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-02 22:20:06 -08:00
Sage Weil
5fc94e89a9 osd: drop oldest_last_clean from activate_map
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-02 22:04:34 -08:00
Sage Weil
67f7ee6799 osd: drop unused variables from activate_map
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-02 22:04:08 -08:00
Sage Weil
a14a36ed78 OSDMap: fix modifed -> modified typo
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-02 21:09:07 -08:00
Sage Weil
43cba617aa log: fix locking typo/stupid for dump_recent()
We weren't locking m_flush_mutex properly, which in turn was leading to
racing threads calling dump_recent() and garbling the crash dump output.
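The fix amounts to holding m_flush_mutex for the entire dump, so that concurrent dumps are serialized instead of interleaved. A sketch with a hypothetical `LogSketch` class; the real Log uses separate queue and flush mutexes, which this simplification folds into one:

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

class LogSketch {
 public:
  void add(const std::string& entry) {
    std::lock_guard<std::mutex> l(m_flush_mutex);
    m_recent.push_back(entry);
  }
  // The lock covers the whole dump, so two threads calling this at once
  // produce two complete, non-garbled dumps, one after the other.
  void dump_recent(std::ostream& out) {
    std::lock_guard<std::mutex> l(m_flush_mutex);
    out << "--- begin dump ---\n";
    for (const auto& e : m_recent)
      out << e << '\n';
    out << "--- end dump ---\n";
  }

 private:
  std::mutex m_flush_mutex;
  std::vector<std::string> m_recent;
};
```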

Backport: bobtail, argonaut
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2013-01-02 17:01:32 -08:00
Sam Lang
d8940d15c3 fuse: Fix cleanup code path on init failure
With the changes from 856f32ab, the cfuse.init call returns
a _positive_ errno, which was being ignored.  Also, if an
error occurs during cfuse.init(), we need to tear down the client
mount.
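A sketch of the corrected error path, assuming a hypothetical `FuseSketch` type. Because init reports failure as a positive errno, the check is `r != 0` rather than `r < 0`, and the client mount is torn down before returning:

```cpp
#include <cassert>
#include <cerrno>

struct FuseSketch {
  int init_result = 0;  // what init() will return in this model
  bool mounted = false;

  int init() { return init_result; }
  void unmount() { mounted = false; }

  int start() {
    mounted = true;  // the client mount precedes fuse init
    int r = init();
    if (r != 0) {    // a `r < 0` test would miss a positive errno
      unmount();     // tear down the client mount
      return r > 0 ? -r : r;  // hypothetical: report a negative errno
    }
    return 0;
  }
};
```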

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-01-02 16:38:28 -06:00
Sage Weil
9a1cf51888 Merge branch 'wip-journal-aio' into next
Reviewed-by: Samuel Just <sam.just@inktank.com>
Backport: bobtail
2013-01-02 13:42:22 -08:00
Sage Weil
483c6f76ad test_filejournal: optionally specify journal filename as an argument
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-02 13:39:05 -08:00
Sage Weil
c461e7fc1e test_filejournal: test journaling bl with >IOV_MAX segments
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-02 13:39:05 -08:00
Sage Weil
dda7b65189 os/FileJournal: limit size of aio submission
Limit size of each aio submission to IOV_MAX-1 (to be safe).  Take care to
only mark the last aio with the seq to signal completion.
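The splitting rule can be sketched as a pure function over iovec counts. `AioBatch` and `split_aio` are hypothetical names; in the journal, `max_iov` would be `IOV_MAX - 1` (with IOV_MAX from `<limits.h>`):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One aio submission: a range of iovec indices and the seq to report on
// completion (0 means "no seq").
struct AioBatch {
  std::size_t first_iov;
  std::size_t num_iov;
  uint64_t seq;
};

// Split n_iov segments into submissions of at most max_iov each. Only the
// final submission carries seq, so completion is signaled once.
std::vector<AioBatch> split_aio(std::size_t n_iov, std::size_t max_iov,
                                uint64_t seq) {
  assert(max_iov > 0);
  std::vector<AioBatch> out;
  for (std::size_t off = 0; off < n_iov; off += max_iov) {
    std::size_t len = n_iov - off < max_iov ? n_iov - off : max_iov;
    bool last = off + len == n_iov;
    out.push_back({off, len, last ? seq : 0});
  }
  return out;
}
```

For example, 10 segments with `max_iov` of 4 yield submissions of 4, 4 and 2 iovecs, and only the last one carries the seq.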

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-02 13:39:05 -08:00
Gary Lowell
f1196c7e93 Merge branch 'master' of https://github.com/ceph/ceph
2012-12-31 21:35:03 -08:00
Gary Lowell
5dd6b19918 Merge branch 'next'
2012-12-31 21:31:17 -08:00
Sage Weil
8f77ec7d81 Merge branch 'next'
2012-12-31 18:37:12 -08:00
Sage Weil
94a5dd6b76 Merge remote-tracking branch 'gh/wip-3675'
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-31 18:36:39 -08:00
Gary Lowell
1a32f0a0b4 v0.56
2012-12-31 17:10:11 -08:00
Sage Weil
49ebe1ee3a client: fix _create created ino condition
We get 8 bytes back for the created ino.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-31 15:28:25 -08:00
Sage Weil
a10054bc52 libcephfs: choose more unique nonce
We were using a per-process counter combined with the pid.  A short-running
process can easily loop through and reuse the same pid later.
Instead, go for 48 bits of randomness and the pid.  This way if we get
a dup pid we'll only get a dup nonce once out of 2^48 tries.
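A minimal sketch of such a nonce, assuming a hypothetical bit layout (48 random bits in the high part, the low 16 bits of the pid in the low part) and a hypothetical `make_nonce` name; the actual libcephfs layout may differ. With this layout, two processes sharing a pid collide only if their random bits match:

```cpp
#include <cassert>
#include <cstdint>
#include <random>

uint64_t make_nonce(uint32_t pid) {
  std::random_device rd;  // assumed to be a non-deterministic source
  uint64_t r = (static_cast<uint64_t>(rd()) << 32) | rd();
  uint64_t random48 = r & ((uint64_t(1) << 48) - 1);  // keep 48 bits
  return (random48 << 16) | (pid & 0xffffu);
}
```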

Avoids #3630 when running a libcephfs test in a loop (so that the pid
is eventually reused).  This is a better fix than the broken
8b59908370.  The real solution on the MDS
side involves cleaning up the msgr/MDS interaction with session
shutdown.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-31 15:26:54 -08:00
Sage Weil
e2fef38dfd client: fix _create
make_request() clears out req->reply and frees req; we can't inspect
it here.

Instead, just assume that extra_bl is the create flag/ino if it is
present.  Old code does not include an extra_bl on CREATE, and new code
will have the same first bytes for compatibility.
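The "assume extra_bl is the ino if present" rule can be sketched with the reply modeled as a byte string. The name `created_ino` is hypothetical. The sketch reads the 8-byte ino (see commit 49ebe1ee3a) as little-endian, which is how Ceph encodes integers:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Old servers send no extra_bl on CREATE; newer ones put the created ino
// in the first 8 bytes, possibly followed by more data.
std::optional<uint64_t> created_ino(const std::string& extra_bl) {
  if (extra_bl.size() < 8)
    return std::nullopt;  // nothing (or too little) to decode
  uint64_t ino = 0;
  for (int i = 7; i >= 0; --i)  // little-endian: byte 0 is least significant
    ino = (ino << 8) | static_cast<unsigned char>(extra_bl[i]);
  return ino;
}
```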

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-31 15:26:53 -08:00
Sage Weil
b4d3bd06d4 Merge remote-tracking branch 'gh/wip-3625'
2012-12-31 10:16:31 -08:00
Sage Weil
ec5288a312 Merge remote-tracking branch 'gh/wip-rbd-unprotect' into next
Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-30 15:29:37 -08:00
Joao Eduardo Luis
82cec48e9f doc: add-or-rm-mons.rst: Add 'Changing Monitor's IPs' section
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-12-30 19:18:09 +00:00
Joao Eduardo Luis
379f07923c doc: add-or-rm-mons.rst: Clarify what the monitor name/id is.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-12-30 19:17:03 +00:00
Josh Durgin
8bbb4a364d doc: fix rbd permissions for unprotect
Unprotect examines all pools, so use blanket x before 0.54. After
that, use class-read restricted by object_prefix to rbd_children.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00
Josh Durgin
d0a14d110d librbd: fix race between unprotect and clone
Clone needs to actually re-read the header to make sure the image is
still protected before returning. Additionally, it needs to consider
the image protected *only* if the protection status is protected -
unprotecting does not count. I thought I'd already fixed this, but
can't find the commit.
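The corrected check re-reads the protection status at the end of the clone and accepts only a fully protected snapshot. In this sketch, `ProtectionStatus`, `finish_clone_check` and the `read_status` callback (standing in for re-reading the header) are hypothetical. The choice of -EINVAL is likewise an assumption:

```cpp
#include <cassert>
#include <cerrno>

enum class ProtectionStatus { Unprotected, Unprotecting, Protected };

// read_status re-reads the header; a cached value could be stale if an
// unprotect raced with the clone.
template <typename ReadStatus>
int finish_clone_check(ReadStatus read_status) {
  ProtectionStatus s = read_status();
  if (s != ProtectionStatus::Protected)  // Unprotecting does not count
    return -EINVAL;
  return 0;
}
```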

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00
Josh Durgin
958addc0c9 rbd: open (source) image as read-only
This allows users without write access to copy, export and list
information about an image.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00
Josh Durgin
47bf519584 librbd: open parent as read-only during clone
We never write to the parent, and don't need to watch it during this process.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00