Commit Graph

20871 Commits

Author SHA1 Message Date
Josh Durgin
bf2e489248 cls_lock_client: change modified reference parameters to pointers
This makes it clear which parameters are modified,
as our style guide states.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-09-18 15:39:43 -07:00
Josh Durgin
2dca3a8616 cls_lock_client: clean up reference parameters
These should all be const. The remaining reference parameters
will be converted to pointers in another commit.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-09-18 15:36:11 -07:00
Josh Durgin
e71fdc75be cls_lock: fix some spacing
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-09-18 15:36:08 -07:00
Yehuda Sadeh
b69a9599d8 cls_lock: specify librados namespace explicitly
librados namespace was not specified, hence required including
source files to add using namespace. This fixes it.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-09-18 15:35:30 -07:00
Josh Durgin
3372f1471c rbd: only open the destination pool for import
Otherwise importing into another pool when the default pool, rbd,
doesn't exist results in an error trying to open the rbd pool.

Reported-by: Sébastien Han <han.sebastien@gmail.com>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-09-18 15:21:49 -07:00
Josh Durgin
ad2ba8e606 qa: test args for rbd import
Make sure that --pool/--dest-pool and --image/--dest all work
interchangeably.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-09-18 15:21:43 -07:00
Josh Durgin
d14a31d387 rbd: make --pool/--image args easier to understand for import
There's no need to set the default pool in set_pool_image_name - this
is done later, in a way that doesn't ignore --pool if --dest-pool
is not specified.

This means --pool and --image can be used with import, just like
the rest of the commands. Without this change, --dest and --dest-pool
had to be used, and --pool would be silently ignored for rbd import.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-09-18 15:21:39 -07:00
Josh Durgin
a583a605aa librbd, cls_rbd: close snapshot creation race with old format
If two clients created a snapshot at the same time, the one with the
higher snapshot id might be created first, so the lower snapshot id
would be added to the snapshot context and the snapshot seq would be
set to the lower one.

Instead of allowing this to happen, return -ESTALE if the snapshot id
is lower than the currently stored snapshot sequence number. On the
client side, get a new id and retry if this error is encountered.

Backport: argonaut
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-09-18 15:21:21 -07:00
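
The -ESTALE retry described in a583a605aa can be sketched roughly as follows. This is an illustration only, not the librbd or cls_rbd code: `SnapStore`, `snapshot_add`, `create_snapshot`, and the `max_retries` limit are hypothetical names and choices introduced for the example.

```python
import errno


class SnapStore:
    """Hypothetical stand-in for the server-side snapshot state."""

    def __init__(self):
        self.snap_seq = 0   # highest snapshot id stored so far
        self.snaps = {}     # snapshot id -> snapshot name

    def snapshot_add(self, snap_id, name):
        # Reject an id lower than the stored sequence number,
        # as the commit message describes.
        if snap_id < self.snap_seq:
            return -errno.ESTALE
        self.snaps[snap_id] = name
        self.snap_seq = snap_id
        return 0


def create_snapshot(store, alloc_id, name, max_retries=10):
    """Client side: fetch a new id and retry whenever -ESTALE comes back.

    alloc_id is any callable returning a fresh snapshot id (assumed here).
    """
    for _ in range(max_retries):
        snap_id = alloc_id()
        ret = store.snapshot_add(snap_id, name)
        if ret != -errno.ESTALE:
            return ret, snap_id
    return -errno.ESTALE, None
```

With this shape, a client holding a stale id loses the race once, then succeeds on the next attempt with a newer id instead of rewinding the stored sequence number.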
Sage Weil
a4833bb293 librbd: fix delete[]
CID 716902: Non-array delete for scalars (DELETE_ARRAY)
At (15): Deleting array variable "buf" with non-array delete in "delete buf".

Signed-off-by: Sage Weil <sage@inktank.com>
2012-09-18 15:19:47 -07:00
Josh Durgin
3401f004cd doc: clarify rbd man page (esp. layering)
* a clone's size can't be overridden
* note which commands require format 2
* clarify details of copy
* add examples for cloning
* add pool to map example for consistency
* fix a couple warnings and re-sync man page with rst

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-09-18 15:19:17 -07:00
Josh Durgin
582001eb49 rbd: add --format option
This chooses whether to use the original (supported by krbd)
or the new (supports layering) format.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-09-18 15:19:07 -07:00
Josh Durgin
a1124193c2 librbd: prevent racing clone and snap unprotect
If the following sequence of events occurred,
a clone could be created of an unprotected snapshot:

1. A: begin clone - check that snap foo is protected
2. B: rbd unprotect snap foo
3. B: check that all pools have no clones of foo
4. B: unprotect snap foo
5. A: finish creating clone of foo, add it as a child

To stop this from happening, check at the beginning and end of
cloning that the parent snapshot is protected. If it is not,
or checking protection status fails (possibly because the parent
snapshot was removed), remove the clone and return an error.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-09-18 15:18:59 -07:00
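
The double check described in a1124193c2 might look something like the sketch below. It is a simplified model, not the librbd implementation: the `Parent` class, the `clone` signature, the `racing_op` hook (used only to simulate client B acting between the two checks), and the specific error codes are all assumptions made for the example.

```python
import errno


class Parent:
    """Hypothetical parent image: snapshot protection flags plus children."""

    def __init__(self, snaps):
        self.snaps = dict(snaps)   # snapshot name -> protected?
        self.children = {}         # snapshot name -> list of clone names

    def is_protected(self, snap):
        # Raises KeyError (a LookupError) if the snapshot was removed.
        return self.snaps[snap]

    def unprotect(self, snap):
        if self.children.get(snap):
            return -errno.EBUSY    # assumed error code: clones still exist
        self.snaps[snap] = False
        return 0


def _protected(parent, snap):
    # A failed status check is treated the same as "not protected".
    try:
        return parent.is_protected(snap)
    except LookupError:
        return False


def clone(parent, snap, clone_name, images, racing_op=None):
    """Create a clone, checking protection at the beginning and at the end."""
    if not _protected(parent, snap):
        return -errno.EINVAL       # assumed error code
    if racing_op is not None:
        racing_op()                # simulates another client running here
    images[clone_name] = snap
    parent.children.setdefault(snap, []).append(clone_name)
    if not _protected(parent, snap):
        # Lost the race: undo the clone and report an error.
        parent.children[snap].remove(clone_name)
        del images[clone_name]
        return -errno.EINVAL
    return 0
```

Passing `racing_op=lambda: parent.unprotect("foo")` reproduces steps 2 to 4 of the sequence above; the second check then removes the half-created clone.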
Dan Mick
e85a238303 rbd: add "children" command, update cli test files
Fixes: #2720
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-09-18 15:18:50 -07:00
Dan Mick
bd9405844b librbd: add {rbd_}list_children() methods
These iterate over all pools and check for children of a
particular snapshot.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2012-09-18 15:18:27 -07:00
Samuel Just
4e5283d476 ReplicatedPG: do not start_recovery_op if we are already pushing
Should fix bug #2761.

If we are already pushing soid, recovery_ops will only be decremented once for
all current pushes, so only increment recovery_ops if we are not currently
pushing it.

This bug causes us to leak a recovery op and get stuck in backfill.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-09-11 13:37:03 -07:00
Sage Weil
656ab158ce osd: fill in user log entry last after snapdir tran
Reorder the snapdir logic and ctx->at_version adjustments prior to filling
in the object_info_t and user_versions and all that stuff.  Adjust
at_version after appending the log entry (so that it points to the next
position/version we will write at.. culminating in the actual user
event).

The user log entry contains the request id, which will be used
by replay ops to put themselves in the correct place in the
waiting_for_commit/ack maps.  Thus, the repop needs to be tagged
with the same version as the log entry with the request id.
Thus, the request id bearing log entry should be the last in
the log entry vector.

This should fix #3072, wherein a replay which should wait on
the repop tagged as version '36 will instead wait on '35.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-09-11 13:37:03 -07:00
Sage Weil
5f36b8d784 osd: fix waiting_for_disk assertion
If requeue is false, we won't have cleared out waiting_for_ondisk; adjust
assert placement as appropriate.  Also, make sure we handle the requeue
and !op case properly (although I'm not sure offhand if/when it would
come up).

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-08-28 15:14:41 -07:00
Mike Ryan
745a3c9ba0 rados_bench: wait for completion callbacks before returning
If we don't wait for the callback, the finisher may cleanup the callback
context before the callback is actually invoked, causing a
use-after-free error.

This fixes #3048.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-28 14:21:13 -07:00
Sage Weil
15995ea1c4 Merge branch 'wip-objecter' into next
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-08-27 17:26:13 -07:00
Sage Weil
2a3b7961c0 objecter: fix skipped map handling
If we skip a map, we want to translate NO_ACTION to NEED_RESEND, but leave
POOL_DNE alone.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-27 17:25:57 -07:00
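
The rule stated in 2a3b7961c0 is small enough to write down directly. The sketch below is illustrative only: the `RecalcResult` enum and the `adjust_for_skipped_map` helper are hypothetical names, with the enum members taken from the result names used in the commit message.

```python
from enum import Enum, auto


class RecalcResult(Enum):
    """Hypothetical result codes, named after those in the commit message."""
    NO_ACTION = auto()
    NEED_RESEND = auto()
    POOL_DNE = auto()


def adjust_for_skipped_map(result, skipped_map):
    """If a map was skipped, upgrade NO_ACTION to NEED_RESEND.

    Every other result, including POOL_DNE, is passed through unchanged.
    """
    if skipped_map and result is RecalcResult.NO_ACTION:
        return RecalcResult.NEED_RESEND
    return result
```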
Sage Weil
8d1efd1b82 objecter: send queued requests when we get first osdmap
If we get our first osdmap and already have requests queued, send them.

Fixes: #3050
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-27 17:25:57 -07:00
Sage Weil
e59b9daefc objecter: fix is_latest_map() retry on mon session restart
If the mon session drops, we get an EAGAIN callback, which we already
correctly ignored.  (Clean this up and comment so it's clearer what is
going on.)

Fix ms_handle_connect() to resubmit those requests.

Noticed while fixing #3049.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-27 17:25:57 -07:00
Sage Weil
7d40cba241 monclient: pass EAGAIN to is_latest_map() callers
If our map get_version check needs to be retried, tell the
is_latest_map() callers instead of returning 0 ("no").

Fixes: #3049
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-27 17:25:56 -07:00
Sage Weil
0adc2289d6 monclient: document get_version(), and fix return value
Return -EAGAIN instead of -1, since that's more meaningful, and
document it.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-27 17:25:54 -07:00
Sage Weil
17ceec0d10 osd: requeue dup ops inline with in-progress ops
We should requeue the dups along with the originals.  This avoids
situations where, after requeue, the dups are reordered with respect to
each other.  For example:

 - client sends A, B, C
 - osd receives A
 - connection drops
 - client sends A', B', C'
 - osd puts A' in waiting_for_ondisk, starts B' and C'
 - on_change() requeues everything

Final queue order (before this patch) is
    A, B', C', A'

After this patch, the resulting queue order is
    A, A', B', C'

Or somewhat more generally, it might be:

    A, A', B, B', B'', C', C'', D'', ....

Fixes (another source of): #2947
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-08-27 16:47:36 -07:00
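
The queue orders in 17ceec0d10 can be reproduced with a toy model. This is a sketch under simplifying assumptions, not the OSD code: ops are plain labels, `in_progress` is a list of `(request_id, label)` pairs, `waiting_dups` maps a request id to its waiting duplicates, and both function names are hypothetical.

```python
def requeue_dups_last(in_progress, waiting_dups):
    """Old behaviour (modelled): originals first, waiting dups appended after."""
    queue = [label for _, label in in_progress]
    for dups in waiting_dups.values():
        queue.extend(dups)
    return queue


def requeue_dups_inline(in_progress, waiting_dups):
    """New behaviour (modelled): each op is followed directly by its dups."""
    queue = []
    for request_id, label in in_progress:
        queue.append(label)
        queue.extend(waiting_dups.get(request_id, []))
    return queue


# The example from the commit message: the OSD received A, then after the
# reconnect put A' in waiting_for_ondisk and started B' and C'.
in_progress = [("A", "A"), ("B", "B'"), ("C", "C'")]
waiting_dups = {"A": ["A'"]}

print(requeue_dups_last(in_progress, waiting_dups))    # ['A', "B'", "C'", "A'"]
print(requeue_dups_inline(in_progress, waiting_dups))  # ['A', "A'", "B'", "C'"]
```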
Sage Weil
c7d11cd7b8 osd: turn off lockdep during shutdown signal handler
We don't shut down all threads, and the surviving ones fight with
exit()'s teardown.  Kludge until we have a clean shutdown process.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-26 08:42:06 -07:00
Sage Weil
0e091d81a1 v0.51
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQOVjIAAoJEH6/3V0X7TFtdFIQAKyU+6kldJE2YZO5GOO7jPb2
 vGAhsYpvuS/Vx87yrSa7Xavz/C/frKz5m+5SsmxbZl+ditRLCGAD/BlQIuj0UWAW
 MxURFK9hjwJK23fuuuXUXEbMmABmRP8XlzG9IGl5yRM07+IUl8aMGy7+i4yGzGFX
 QVMHC1qMD70SAQ+q2/JVXlVxkVPzqzf9iT+xuFk28V8A0ZLlSAfTuSHD9YLJiWaV
 SjR/vVLpajaTR3ytkSxrG1fwuqENf9OThLXxHuyplZvTUIuAxbxBlWSMJmuLQ3JF
 JNX/N0/z9Omw+ipJAvM/nS6TbT0X2KhMYjObINOVUiDkwC9jBznCl8A1b/hy7wJX
 haTUat6OW3taGysP3AkOddwkyDHLJxz/UoUtPbEgT/mDOB8CwWEdpgkL8wsvNUgK
 n2yEJNpjhQ2QG2LC/n0x67jVlt0B4IRMbijFAoySyklfnJjU8J+Uyjl4bentDvM7
 cQrIIBobQMbc9urcSWzxMd6+fCvxEvtXY027LVP7K3hS3thS2tPRT3WT6vAZ7vih
 foOyc2a9SQxwDWa3bf7d5yoL7nLB9KfRXIbHu31EKgM5pw8Lgy1vRtaqEKOh9Lup
 l8pk5/2ABmy2pYaeLGyTnZN+8BsR5ZYyqJ2nUL/VbSmReto1BIRrI4zhEAAlNWWN
 nKrNwX4xOZjDX9ghsMUv
 =t5SY
 -----END PGP SIGNATURE-----

Merge tag 'v0.51'

v0.51
2012-08-26 08:18:45 -07:00
Sage Weil
c03ca95d23 v0.51
2012-08-25 15:58:39 -07:00
Sage Weil
aa91cf81af mon: require --id
Fixes: #2997
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-25 15:29:56 -07:00
Sage Weil
3996076722 interval_set: predeclare const_iterator
This makes the coverity build happier.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-08-24 14:55:41 -07:00
Sage Weil
ef4ab901b3 Makefile: update coverity rules
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-08-24 14:55:40 -07:00
Gary Lowell
6b1f23cb48 librbd-dev.install: package new rbd/features.h header file.
2012-08-24 15:16:05 -07:00
Sage Weil
e7b8f7ba07 Merge branch 'next'
2012-08-24 14:38:58 -07:00
Sage Weil
d9bd61304b mon: describe how pgs are stuck in 'health detail'
Showing the current state and saying it is stuck doesn't tell you how it
is stuck (e.g. stuck unclean, stuck inactive, etc.).  Also include the
stuck duration.

Fixes: #2876
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-24 14:43:56 -07:00
Sage Weil
bcd4b09ba9 osd: fix use-after-free in handle_notify_timeout
Valgrind turned this up.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-24 13:38:05 -07:00
Gary Lowell
e97f1c575e ceph.spec.in: package new rados library.
2012-08-23 21:35:21 -07:00
Sage Weil
02c6544b35 Merge remote-tracking branch 'gh/wip-mon-report'
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-08-23 16:11:58 -07:00
Sage Weil
ce0fa2d10a Merge remote-tracking branch 'gh/wip_rados_bench_really_final'
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-08-23 16:07:32 -07:00
Mike Ryan
551628e2ae obj_bencher: use async remove during slow remove-by-prefix
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:52:40 -07:00
Mike Ryan
4bef576543 obj_bencher: remove all benchmark files matching a prefix
This is a fallback for when a user wishes to delete ALL benchmark files
matching a particular prefix. In the fast case, a metadata file tells us
enough to quickly delete the files in parallel. This is the slow case,
where each file's name must be checked against the prefix.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:52:31 -07:00
Mike Ryan
048c7dc4c8 obj_bencher: cleanup files in parallel using aio
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:52:27 -07:00
Mike Ryan
9e58d1b79b obj_bencher: remove benchmark objects by prefix
This intelligently removes objects from a rados or rest benchmark run by
using parameters from the metadata file.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:52:16 -07:00
Mike Ryan
fab73c3edc obj_bencher: store per-benchmark metadata
Store metadata for each benchmark run so that the objects can be
efficiently removed at a later point.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:52:04 -07:00
Mike Ryan
fb7238eacc obj_bencher: clean up objects after a write benchmark
Per #2477, objects created during rados or rest write benchmark are
automatically cleaned up after the test. They can optionally be left in
place.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:51:39 -07:00
Mike Ryan
4f1b04ca2d obj_bencher: announce prefix during write benchmark
Per #2477 this can be used during a post-benchmark cleanup in rest and
rados bench.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:51:11 -07:00
Gary Lowell
e43ba81fc6 Don't package crush header files.
2012-08-23 15:43:38 -07:00
Gary Lowell
1cd89d1cdd ceph.spec.in: package new rbd header and rados library.
2012-08-23 13:40:18 -07:00
Sage Weil
d47c9af6b2 Merge branch 'wip-msgr'
2012-08-23 13:29:10 -07:00
Sage Weil
e229f8451d msg/Pipe: conditionally detect session reset
Lossless peers (osd<->osd, mds<->mds, mon<->mon) never reset sessions
to each other.  In the osd and mds cases, there is no need to check for
session resets.  More significantly, these checks can trigger with an
unfortunate sequence of socket failures.  In particular,

 - A sends connect request to B
 - B accepts, increments connect_seq, then has a socket failure
   before telling A
 - A reconnects, still with connect_seq == 0
 - B sees connect_seq == 0 and thinks there was a reset

This warrants a closer look in the fs client <-> mds case, but for now,
in the cluster-internal communications, it is moot, since reset
detection is unnecessary.

In the monitor case: we do need to check for resets because the peers
reuse the same entity_addr_t's (nonce==0), which means that a daemon
restart is effectively a reset.  In that case, use a different policy
that continues to check for resets.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2012-08-23 13:28:57 -07:00
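
The conditional check described in e229f8451d can be modelled as a per-policy flag. The sketch below is an assumption-laden illustration rather than the msg/Pipe code: the `Policy` dataclass, its `resetcheck` field, the two policy constants, and `detects_reset` are names introduced here for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical connection policy; resetcheck enables reset detection."""
    name: str
    resetcheck: bool


# osd<->osd and mds<->mds peers never reset sessions, so skip the check.
LOSSLESS_PEER = Policy("lossless_peer", resetcheck=False)
# Monitors reuse the same address (nonce == 0), so a restart looks like a reset.
LOSSLESS_PEER_REUSE = Policy("lossless_peer_reuse", resetcheck=True)


def detects_reset(policy, incoming_connect_seq, existing_connect_seq):
    """Accepting side: does this connect request look like a session reset?

    A reset is suspected when the peer says connect_seq == 0 although an
    existing session already advanced past 0, but only if the policy asks
    for the check at all.
    """
    if not policy.resetcheck:
        return False
    return incoming_connect_seq == 0 and existing_connect_seq > 0
```

In the failure sequence listed in the commit message, B has `existing_connect_seq == 1` and A reconnects with 0; under the first policy that is no longer treated as a reset, while the monitor-style policy still detects it.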
Sage Weil
1c3111f25b osd: prefer acting osds in calc_acting()
We currently prefer up osds, and then pull sequentially from peer_info
(strays we know about at the time).  This adds an additional preference
for the current acting, which means we can avoid changes to acting when
they are largely useless.

In particular, I observed that we chose [5,3] and later (when recovery
completed) chose [5,1] because we had since heard about an eligible stray
on 1.  That switch was basically a waste...

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-08-23 13:27:26 -07:00