Commit Graph

21068 Commits

Author SHA1 Message Date
Sage Weil
7d40cba241 monclient: pass EAGAIN to is_latest_map() callers
If our map get_version check needs to be retried, tell the
is_latest_map() callers instead of returning 0 ("no").

Fixes: #3049
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-27 17:25:56 -07:00
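The retry signalling described in the commit above can be illustrated with a minimal Python sketch. This is not Ceph code: the names `get_version` and `is_latest_map` only echo the commit subjects, and their signatures, the `newest` parameter, and the tuple return value are assumptions made for the example.

```python
import errno

def get_version(newest):
    """Hypothetical stand-in for a version query.

    Returns (0, version) on success, or (-EAGAIN, None) when the
    query could not complete and should be retried.
    """
    if newest is None:
        return -errno.EAGAIN, None
    return 0, newest

def is_latest_map(have, newest):
    """Return 1 if `have` is current, 0 if it is stale,
    or a negative errno if the check itself must be retried."""
    r, version = get_version(newest)
    if r < 0:
        return r  # pass the error up instead of reporting 0 ("no")
    return 1 if have >= version else 0
```

A caller can then tell "the map is stale" (0) apart from "ask again" (-EAGAIN), which is the distinction the commit message is about.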
Sage Weil
0adc2289d6 monclient: document get_version(), and fix return value
Return -EAGAIN instead of -1, since that's more meaningful, and
document it.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-27 17:25:54 -07:00
caleb miles
0a1f4a97da Implement multi-object delete.
An implementation of multi-object delete described in
the latest Amazon S3 API provided at

http://docs.amazonwebservices.com/AmazonS3/latest/API

This commit is in response to tracker issue 2797

http://tracker.newdream.net/issues/2797

Signed-off-by: caleb miles <caleb.miles@inktank.com>
2012-08-27 17:08:44 -07:00
Sage Weil
17ceec0d10 osd: requeue dup ops inline with in-progress ops
We should requeue the dups along with the originals.  This avoids
situations where, after requeue, the dups are reordered with respect to
each other.  For example:

 - client sends A, B, C
 - osd receives A
 - connection drops
 - client sends A', B', C'
 - osd puts A' in waiting_for_ondisk, starts B' and C'
 - on_change() requeues everything

Final queue order (before this patch) is
    A, B', C', A'

After this patch, the resulting queue order is
    A, A', B', C'

Or somewhat more generally, it might be:

    A, A', B, B', B'', C', C'', D'', ....

Fixes (another source of): #2947
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-08-27 16:47:36 -07:00
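The ordering difference in the commit above can be reproduced with a small Python sketch. It is a simplified model, not the OSD implementation: the list of in-progress ops, the `dups` mapping from an original op to its duplicates, and both function names are assumptions made for the example.

```python
def requeue_appended(in_progress, dups):
    """Pre-patch ordering as described: all in-progress ops first,
    then every dup afterwards."""
    queue = list(in_progress)
    for op in in_progress:
        queue.extend(dups.get(op, []))
    return queue

def requeue_inline(in_progress, dups):
    """Post-patch ordering: each in-progress op is immediately
    followed by the dups that were waiting on it."""
    queue = []
    for op in in_progress:
        queue.append(op)
        queue.extend(dups.get(op, []))
    return queue

# Scenario from the commit message: A is in progress, A' is its dup,
# and B' and C' were started as new ops.
in_progress = ["A", "B'", "C'"]
dups = {"A": ["A'"]}
```

With these inputs, `requeue_appended` yields A, B', C', A' and `requeue_inline` yields A, A', B', C', matching the two orders listed in the message.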
Sage Weil
c7054933e0 Merge remote-tracking branch 'gh/wip-mon-intparsing'
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-08-27 15:10:35 -07:00
Sage Weil
d5cacaca50 osd: include notif pointer in notify debug output
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-27 14:57:39 -07:00
Sage Weil
0a2ec988f1 config: add 'fatal signal handlers' option
This will let us disable the sighandlers for SEGV, etc.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-27 14:57:35 -07:00
Eleanor Cawthon
bc90c9aaed test/: renamed omap_bench.hpp to .h, fixed histogram formatting
Signed-off-by: Eleanor Cawthon <eleanor.cawthon@inktank.com>
2012-08-27 14:55:45 -07:00
Samuel Just
aaeb55194a librados,ReplicatedPG: add omap_cmp
Allows atomic checking of omap values.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-08-27 14:55:45 -07:00
Sage Weil
7a631f9476 cls_rgw_client: fix #include path
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-27 14:41:09 -07:00
Yehuda Sadeh
fa74e0476c Merge remote-tracking branch 'origin/master' into wip-gc2
2012-08-27 12:43:27 -07:00
Yehuda Sadeh
6f68ff5cca cls_rgw: add cls_rgw unittest, test gc api
Test various cls_rgw gc related functionality.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-08-27 12:42:43 -07:00
Yehuda Sadeh
a30f7140f0 rgw-admin: get rid of lazy remove option, other fixes
was mishandling parsing of binary flag arguments.
also, fix argument parsing and update radosgw-admin
cli test reference.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-08-27 12:41:27 -07:00
Yehuda Sadeh
721a6bef9e rgw: implement garbage collector
Add a garbage collector thread that is responsible for cleaning
up clutter. When removing an object, store info about the
leftovers in a special gc map (via rgw objclass). Add new
radosgw-admin commands to list objects in gc, and to run the
gc process manually. Also, gc processors can run in parallel;
however, each will handle a single gc shard (synchronized
using lock objclass).

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-08-27 12:41:19 -07:00
Sage Weil
bd534bf328 mon: make parse_pos_long() error message more helpful
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-27 08:36:41 -07:00
Sage Weil
c7d11cd7b8 osd: turn off lockdep during shutdown signal handler
We don't shut down all threads, and the surviving ones fight with
exit()'s teardown.  Kludge until we have a clean shutdown process.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-26 08:42:06 -07:00
Sage Weil
0e091d81a1 v0.51
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQOVjIAAoJEH6/3V0X7TFtdFIQAKyU+6kldJE2YZO5GOO7jPb2
 vGAhsYpvuS/Vx87yrSa7Xavz/C/frKz5m+5SsmxbZl+ditRLCGAD/BlQIuj0UWAW
 MxURFK9hjwJK23fuuuXUXEbMmABmRP8XlzG9IGl5yRM07+IUl8aMGy7+i4yGzGFX
 QVMHC1qMD70SAQ+q2/JVXlVxkVPzqzf9iT+xuFk28V8A0ZLlSAfTuSHD9YLJiWaV
 SjR/vVLpajaTR3ytkSxrG1fwuqENf9OThLXxHuyplZvTUIuAxbxBlWSMJmuLQ3JF
 JNX/N0/z9Omw+ipJAvM/nS6TbT0X2KhMYjObINOVUiDkwC9jBznCl8A1b/hy7wJX
 haTUat6OW3taGysP3AkOddwkyDHLJxz/UoUtPbEgT/mDOB8CwWEdpgkL8wsvNUgK
 n2yEJNpjhQ2QG2LC/n0x67jVlt0B4IRMbijFAoySyklfnJjU8J+Uyjl4bentDvM7
 cQrIIBobQMbc9urcSWzxMd6+fCvxEvtXY027LVP7K3hS3thS2tPRT3WT6vAZ7vih
 foOyc2a9SQxwDWa3bf7d5yoL7nLB9KfRXIbHu31EKgM5pw8Lgy1vRtaqEKOh9Lup
 l8pk5/2ABmy2pYaeLGyTnZN+8BsR5ZYyqJ2nUL/VbSmReto1BIRrI4zhEAAlNWWN
 nKrNwX4xOZjDX9ghsMUv
 =t5SY
 -----END PGP SIGNATURE-----

Merge tag 'v0.51'

v0.51
2012-08-26 08:18:45 -07:00
Sage Weil
c03ca95d23 v0.51
2012-08-25 15:58:39 -07:00
Sage Weil
aa91cf81af mon: require --id
Fixes: #2997
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-25 15:29:56 -07:00
Sage Weil
5fd2f10266 mon: fix int parsing in monmon
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-24 16:05:07 -07:00
Sage Weil
31c8ccb849 mon: check for int parsing errors in mdsmon
Fixes: #3014
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-24 16:03:02 -07:00
Sage Weil
304c08efbe mon: check for int parsing errors in osdmon
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-24 16:02:02 -07:00
Sage Weil
3996076722 interval_set: predeclare const_iterator
This makes the coverity build happier.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-08-24 14:55:41 -07:00
Sage Weil
ef4ab901b3 Makefile: update coverity rules
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-08-24 14:55:40 -07:00
Gary Lowell
6b1f23cb48 librbd-dev.install: package new rbd/features.h header file.
2012-08-24 15:16:05 -07:00
Sage Weil
d9bd61304b mon: describe how pgs are stuck in 'health detail'
Showing the current state and saying it is stuck doesn't tell you how it
is stuck (e.g. stuck unclean, stuck inactive, etc.).  Also include the
stuck duration.

Fixes: #2876
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-24 14:43:56 -07:00
Sage Weil
e7b8f7ba07 Merge branch 'next'
2012-08-24 14:38:58 -07:00
Sage Weil
bcd4b09ba9 osd: fix use-after-free in handle_notify_timeout
Valgrind turned this up.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-24 13:38:05 -07:00
Gary Lowell
e97f1c575e ceph.spec.in: package new rados library.
2012-08-23 21:35:21 -07:00
Sage Weil
02c6544b35 Merge remote-tracking branch 'gh/wip-mon-report'
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-08-23 16:11:58 -07:00
Sage Weil
ce0fa2d10a Merge remote-tracking branch 'gh/wip_rados_bench_really_final'
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-08-23 16:07:32 -07:00
Mike Ryan
551628e2ae obj_bencher: use async remove during slow remove-by-prefix
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:52:40 -07:00
Mike Ryan
4bef576543 obj_bencher: remove all benchmark files matching a prefix
This is a fallback for when a user wishes to delete ALL benchmark files
matching a particular prefix. In the fast case, a metadata file tells us
enough to quickly delete the files in parallel. This is the slow case,
where each file's name must be checked against the prefix.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:52:31 -07:00
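The slow path described in the commit above, where every name is checked against the prefix, might look roughly like the following Python sketch. The function name, the `object_names` iterable, and the `remove` callback are hypothetical; the actual obj_bencher code is not shown in this log.

```python
def remove_by_prefix(object_names, prefix, remove):
    """Slow-path cleanup: scan every object name and remove the ones
    that start with `prefix`. Returns the number of objects removed."""
    removed = 0
    for name in object_names:
        if name.startswith(prefix):
            remove(name)  # hypothetical callback that deletes one object
            removed += 1
    return removed
```

For example, calling it with `deleted.append` as the callback collects the names that would be deleted, which is a convenient way to dry-run the scan.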
Mike Ryan
048c7dc4c8 obj_bencher: cleanup files in parallel using aio
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:52:27 -07:00
Mike Ryan
9e58d1b79b obj_bencher: remove benchmark objects by prefix
This intelligently removes objects from a rados or rest benchmark run by
using parameters from the metadata file.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:52:16 -07:00
Mike Ryan
fab73c3edc obj_bencher: store per-benchmark metadata
Store metadata for each benchmark run so that the objects can be
efficiently removed at a later point.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:52:04 -07:00
Mike Ryan
fb7238eacc obj_bencher: clean up objects after a write benchmark
Per #2477, objects created during rados or rest write benchmark are
automatically cleaned up after the test. They can optionally be left in
place.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:51:39 -07:00
Mike Ryan
4f1b04ca2d obj_bencher: announce prefix during write benchmark
Per #2477 this can be used during a post-benchmark cleanup in rest and
rados bench.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:51:11 -07:00
Gary Lowell
e43ba81fc6 Don't package crush header files.
2012-08-23 15:43:38 -07:00
Gary Lowell
1cd89d1cdd ceph.spec.in: package new rbd header and rados library.
2012-08-23 13:40:18 -07:00
Sage Weil
d47c9af6b2 Merge branch 'wip-msgr'
2012-08-23 13:29:10 -07:00
Sage Weil
e229f8451d msg/Pipe: conditionally detect session reset
Lossless peers (osd<->osd, mds<->mds, mon<->mon) never reset sessions
to each other.  In the osd and mds cases, there is no need to check for
session resets.  More significantly, these checks can trigger with an
unfortunate sequence of socket failures.  In particular,

 - A sends connect request to B
 - B accepts, increments connect_seq, then has a socket failure
   before telling A
 - A reconnects, still with connect_seq == 0
 - B sees connect_seq == 0 and thinks there was a reset

This warrants a closer look in the fs client <-> mds case, but for now,
in the cluster-internal communications, it is moot, since reset
detection is unnecessary.

In the monitor case: we do need to check for resets because the peers
reuse the same entity_addr_t's (nonce==0), which means that a daemon
restart is effectively a reset.  In that case, use a different policy
that continues to check for resets.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2012-08-23 13:28:57 -07:00
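The false positive described in the commit above can be modelled in a few lines of Python. This is a sketch under assumptions, not the msg/Pipe logic: the function name, the `check_resets` policy flag, and the exact comparison are made up for the example, and only `connect_seq` comes from the message.

```python
def looks_like_reset(peer_connect_seq, local_connect_seq, check_resets):
    """Decide whether an incoming connect should be treated as a
    session reset by the accepting side.

    check_resets is a hypothetical per-peer policy flag: False for
    peers that never reset sessions with each other, True where
    reset detection is still needed.
    """
    if not check_resets:
        return False
    # The peer claims a fresh session while we remember an older one.
    return peer_connect_seq == 0 and local_connect_seq > 0
```

In the scenario from the message, B has already incremented its connect_seq to 1 while A retries with 0. With detection enabled this looks like a reset even though A never restarted; with the flag off, the reconnect is simply accepted.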
Sage Weil
1c3111f25b osd: prefer acting osds in calc_acting()
We currently prefer up osds, and then pull sequentially from peer_info
(strays we know about at the time).  This adds an additional preference
for the current acting, which means we can avoid changes to acting when
they are largely useless.

In particular, I observed that we chose [5,3] and later (when recovery
completed) chose [5,1] because we had since heard about an eligible stray
on 1.  That switch was basically a waste...

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-08-23 13:27:26 -07:00
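The preference order in the commit above (up OSDs first, then the current acting set, then known strays) can be sketched as follows. This is a deliberately simplified model, not calc_acting() itself: the function name, the flat `eligible` set, and the `size` parameter are assumptions, and the real selection considers more state than shown here.

```python
def choose_acting(up, acting, strays, eligible, size):
    """Pick up to `size` OSDs, preferring up OSDs, then members of the
    current acting set, then strays, skipping ineligible ones."""
    chosen = []
    for osd in list(up) + list(acting) + list(strays):
        if len(chosen) == size:
            break
        if osd in eligible and osd not in chosen:
            chosen.append(osd)
    return chosen
```

Using the OSD ids from the message, with up = [5, 0] (0 not yet eligible), strays = [1, 3], and two slots: passing acting = [5, 3] keeps [5, 3], while passing an empty acting list (the older behavior, in this model) switches to [5, 1] as soon as stray 1 becomes known.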
Mike Ryan
af15ba69c5 librados: implement aio_remove
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 13:11:28 -07:00
Dan Mick
fed8aea662 rbd: force all exiting paths through main()/return
This properly destroys objects.  In the process, remove usage_exit();
also kill error-handling in set_conf_param (never relevant for rbd.cc,
and if you call it with both pointers NULL, well...)
Also switch to EXIT_FAILURE for consistency.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Fixes: #2948
2012-08-23 13:03:00 -07:00
Sage Weil
9f9dfd9c18 Merge branch 'wip-mon-mkfs'
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-08-23 12:59:28 -07:00
Sage Weil
f0e746ab1a mon: name cluster uuid file 'cluster_uuid'
Begin the transition.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-23 12:58:52 -07:00
Sage Weil
cada8a6f02 objecter: use ordered map<> for tracking tids to preserve order on resend
We are using a hash_map<> to map tids to Op*'s.  In handle_osd_map(),
we will recalc_op_target() on each Op in a random (hash) order.  These
will get put in a temp map<tid,Op*> to ensure they are resent in the
correct order, but their order on the session->ops list will be random.

Then later, if we reset an OSD connection, we will resend everything for
that session in ops order, which is incorrect.

Fix this by explicitly reordering the requests to resend in
kick_requests(), much like we do in handle_osd_map().  This lets us
continue to use a hash_map<>, which is faster for reasonable numbers of
requests.  A simpler but slower fix would be to just use map<> instead.

This is one of many bugs contributing to #2947.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-08-23 11:53:59 -07:00
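The fix described in the commit above amounts to sorting by tid before resending, regardless of the order in which the ops are stored. A minimal Python sketch, assuming ops are available as (tid, op) pairs; the function name and the data layout are hypothetical and do not reflect the Objecter's actual types:

```python
def ops_to_resend(session_ops):
    """Given (tid, op) pairs in arbitrary order, return the ops sorted
    by ascending tid so they are resent in submission order."""
    return [op for tid, op in sorted(session_ops, key=lambda item: item[0])]
```

This keeps the fast unordered container for lookups and pays the sorting cost only on the resend path, which is the trade-off the message describes.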
Gary Lowell
91d5c1958a Don't package crush header files.
2012-08-23 11:48:50 -07:00
Sage Weil
4905c06ff2 mon: create cluster_fsid on startup if not present
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-23 10:06:33 -07:00