Commit Graph

21044 Commits

Author SHA1 Message Date
Gary Lowell
6b1f23cb48 librbd-dev.install: package new rbd/features.h header file. 2012-08-24 15:16:05 -07:00
Sage Weil
d9bd61304b mon: describe how pgs are stuck in 'health detail'
Showing the current state and saying it is stuck doesn't tell you how it
is stuck (e.g. stuck unclean, stuck inactive, etc.).  Also include the
stuck duration.

Fixes: #2876
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-24 14:43:56 -07:00
Sage Weil
e7b8f7ba07 Merge branch 'next' 2012-08-24 14:38:58 -07:00
Sage Weil
bcd4b09ba9 osd: fix use-after-free in handle_notify_timeout
Valgrind turned this up.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-24 13:38:05 -07:00
Gary Lowell
e97f1c575e ceph.spec.in: package new rados library. 2012-08-23 21:35:21 -07:00
Sage Weil
02c6544b35 Merge remote-tracking branch 'gh/wip-mon-report'
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-08-23 16:11:58 -07:00
Sage Weil
ce0fa2d10a Merge remote-tracking branch 'gh/wip_rados_bench_really_final'
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-08-23 16:07:32 -07:00
Mike Ryan
551628e2ae obj_bencher: use async remove during slow remove-by-prefix
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:52:40 -07:00
Mike Ryan
4bef576543 obj_bencher: remove all benchmark files matching a prefix
This is a fallback for when a user wishes to delete ALL benchmark files
matching a particular prefix. In the fast case, a metadata file tells us
enough to quickly delete the files in parallel. This is the slow case,
where each file's name must be checked against the prefix.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:52:31 -07:00
Mike Ryan
048c7dc4c8 obj_bencher: cleanup files in parallel using aio
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:52:27 -07:00
Mike Ryan
9e58d1b79b obj_bencher: remove benchmark objects by prefix
This intelligently removes objects from a rados or rest benchmark run by
using parameters from the metadata file.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:52:16 -07:00
Mike Ryan
fab73c3edc obj_bencher: store per-benchmark metadata
Store metadata for each benchmark run so that the objects can be
efficiently removed at a later point.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:52:04 -07:00
Mike Ryan
fb7238eacc obj_bencher: clean up objects after a write benchmark
Per #2477, objects created during rados or rest write benchmark are
automatically cleaned up after the test. They can optionally be left in
place.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:51:39 -07:00
Mike Ryan
4f1b04ca2d obj_bencher: announce prefix during write benchmark
Per #2477 this can be used during a post-benchmark cleanup in rest and
rados bench.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 15:51:11 -07:00
Gary Lowell
e43ba81fc6 Don't package crush header files. 2012-08-23 15:43:38 -07:00
Gary Lowell
1cd89d1cdd ceph.spec.in: package new rbd header and rados library. 2012-08-23 13:40:18 -07:00
Sage Weil
d47c9af6b2 Merge branch 'wip-msgr' 2012-08-23 13:29:10 -07:00
Sage Weil
e229f8451d msg/Pipe: conditionally detect session reset
Lossless peers (osd<->osd, mds<->mds, mon<->mon) never reset sessions
to each other.  In the osd and mds cases, there is no need to check for
session resets.  More significantly, these checks can trigger with an
unfortunate sequence of socket failures.  In particular,

 - A sends connect request to B
 - B accepts, increments connect_seq, then has a socket failure
   before telling A
 - A reconnects, still with connect_seq == 0
 - B sees connect_seq == 0 and thinks there was a reset

This warrants a closer look in the fs client <-> mds case, but for now,
in the cluster-internal communications, it is moot, since reset
detection is unnecessary.

In the monitor case, we do need to check for resets because the peers
reuse the same entity_addr_t's (nonce==0), which means that a daemon
restart is effectively a reset.  In that case, use a different policy
that continues to check for resets.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2012-08-23 13:28:57 -07:00
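The failure sequence described in e229f8451d can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the msg/Pipe code: `Policy`, `resetcheck`, and `looks_like_reset` are names made up for the example, and the check is reduced to the single connect_seq comparison mentioned in the message.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    # Hypothetical flag: whether this kind of peer should check for session resets.
    resetcheck: bool


def looks_like_reset(policy: Policy, local_connect_seq: int, peer_connect_seq: int) -> bool:
    """Return True if an incoming connect would be treated as a session reset."""
    if not policy.resetcheck:
        return False
    # The peer claims a fresh session while we remember an established one.
    return peer_connect_seq == 0 and local_connect_seq > 0


# B accepted A's first connect and bumped its connect_seq, then the socket
# failed before A heard back, so A retries with connect_seq still at 0.
b_connect_seq = 1
a_connect_seq = 0

checking_policy = Policy(resetcheck=True)       # e.g. the monitor case
non_checking_policy = Policy(resetcheck=False)  # e.g. osd<->osd, mds<->mds

with_check = looks_like_reset(checking_policy, b_connect_seq, a_connect_seq)
without_check = looks_like_reset(non_checking_policy, b_connect_seq, a_connect_seq)
```

With the check enabled, the retry is reported as a reset even though neither side restarted; with the check disabled for cluster-internal peers, the same retry is not flagged.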
Sage Weil
1c3111f25b osd: prefer acting osds in calc_acting()
We currently prefer up osds, and then pull sequentially from peer_info
(strays we know about at the time).  This adds an additional preference
for the current acting, which means we can avoid changes to acting when
they are largely useless.

In particular, I observed that we chose [5,3] and later (when recovery
completed) chose [5,1] because we had since heard about an eligible stray
on 1.  That switch was basically a waste...

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-08-23 13:27:26 -07:00
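The preference order described in 1c3111f25b might be pictured like this. It is a simplified Python sketch, not calc_acting() itself: the function name, the `prefer_acting` switch, and the input lists are assumptions chosen to mirror the [5,3] versus [5,1] observation, and the sketch ignores everything except candidate ordering.

```python
def choose_acting(up, acting, strays, size, prefer_acting=True):
    """Pick up to `size` distinct OSD ids, trying candidates in preference order."""
    candidates = list(up)
    if prefer_acting:
        # Added preference: keep current acting members before looking at strays.
        candidates += list(acting)
    candidates += list(strays)

    chosen = []
    for osd in candidates:
        if osd not in chosen:
            chosen.append(osd)
        if len(chosen) == size:
            break
    return chosen


# Hypothetical state: osd 5 is up, the current acting set is [5, 3],
# and a stray on osd 1 has become known since acting was last chosen.
before = choose_acting(up=[5], acting=[5, 3], strays=[1, 3], size=2, prefer_acting=False)
after = choose_acting(up=[5], acting=[5, 3], strays=[1, 3], size=2)
```

Without the extra preference the sketch switches to [5, 1]; with it, the existing [5, 3] is kept and the change to acting is avoided.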
Mike Ryan
af15ba69c5 librados: implement aio_remove
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-08-23 13:11:28 -07:00
Dan Mick
fed8aea662 rbd: force all exiting paths through main()/return
This properly destroys objects.  In the process, remove usage_exit();
also kill error-handling in set_conf_param (never relevant for rbd.cc,
and if you call it with both pointers NULL, well...)
Also switch to EXIT_FAILURE for consistency.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Fixes: #2948
2012-08-23 13:03:00 -07:00
Sage Weil
9f9dfd9c18 Merge branch 'wip-mon-mkfs'
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-08-23 12:59:28 -07:00
Sage Weil
f0e746ab1a mon: name cluster uuid file 'cluster_uuid'
Begin the transition.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-23 12:58:52 -07:00
Sage Weil
cada8a6f02 objecter: use ordered map<> for tracking tids to preserve order on resend
We are using a hash_map<> to map tids to Op*'s.  In handle_osd_map(),
we will recalc_op_target() on each Op in a random (hash) order.  These
will get put in a temp map<tid,Op*> to ensure they are resent in the
correct order, but their order on the session->ops list will be random.

Then later, if we reset an OSD connection, we will resend everything for
that session in ops order, which is incorrect.

Fix this by explicitly reordering the requests to resend in
kick_requests(), much like we do in handle_osd_map().  This lets us
continue to use a hash_map<>, which is faster for reasonable numbers of
requests.  A simpler but slower fix would be to just use map<> instead.

This is one of many bugs contributing to #2947.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-08-23 11:53:59 -07:00
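The reordering idea in cada8a6f02 can be illustrated with a short Python sketch. The `kick_requests` name is borrowed from the message, but the body, the dict layout, and the sample ops are assumptions for illustration only; a plain dict stands in for the hash_map<> and a sorted pass stands in for the temporary map<tid,Op*>.

```python
def kick_requests(ops_by_tid, session):
    """Return the ops queued for `session`, ordered by ascending tid."""
    resend = {}
    for tid, op in ops_by_tid.items():  # iteration order is not tid order
        if op["session"] == session:
            resend[tid] = op
    # Sorting by tid restores the order in which the requests were issued.
    return [resend[tid] for tid in sorted(resend)]


# Hypothetical in-flight ops, stored in an order unrelated to their tids.
ops_by_tid = {
    7: {"session": "osd.1", "name": "write-c"},
    3: {"session": "osd.1", "name": "write-a"},
    9: {"session": "osd.2", "name": "read-x"},
    5: {"session": "osd.1", "name": "write-b"},
}

resent = [op["name"] for op in kick_requests(ops_by_tid, "osd.1")]
```

The lookup table stays unordered for fast access, and ordering is imposed only at resend time, which matches the trade-off the message describes.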
Gary Lowell
91d5c1958a Don't package crush header files. 2012-08-23 11:48:50 -07:00
Sage Weil
4905c06ff2 mon: create cluster_fsid on startup if not present
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-23 10:06:33 -07:00
Sage Weil
7fde8e90ae mon: create, verify cluster_fsid file in mon_data dir on mkfs
Having this present is convenient for external tools.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-23 10:06:33 -07:00
Sage Weil
5aa9521ad2 Merge remote-tracking branch 'gh/next' 2012-08-22 20:23:02 -07:00
Sage Weil
b207b15f17 cephfs: add 'map' command to dump file mapping onto objects, osds
Closes: #3010
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-22 17:23:10 -07:00
Sage Weil
0f9f63ab4b perf-watch: initial version
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-22 17:22:58 -07:00
Sage Weil
1113a6c567 objecter: use ordered map<> for tracking tids to preserve order on resend
We are using a hash_map<> to map tids to Op*'s.  In handle_osd_map(),
we will recalc_op_target() on each Op in a random (hash) order.  These
will get put in a temp map<tid,Op*> to ensure they are resent in the
correct order, but their order on the session->ops list will be random.

Then later, if we reset an OSD connection, we will resend everything for
that session in ops order, which is incorrect.

Fix this by explicitly reordering the requests to resend in
kick_requests(), much like we do in handle_osd_map().  This lets us
continue to use a hash_map<>, which is faster for reasonable numbers of
requests.  A simpler but slower fix would be to just use map<> instead.

This is one of many bugs contributing to #2947.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-08-22 10:51:57 -07:00
Tommi Virtanen
a5901c6d6c doc: Either use a backslash and a newline, or neither.
Signed-off-by: Tommi Virtanen <tv@inktank.com>
2012-08-22 10:50:22 -07:00
Sage Weil
59dbf5998c Merge remote-tracking branch 'gh/wip-crypto' 2012-08-21 15:47:57 -07:00
Yehuda Sadeh
ec90d3f5d9 cls_rgw: add gc commands handling
add the various functionality required for the gc: set entry,
defer entry, list

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-08-21 15:39:12 -07:00
Yehuda Sadeh
e4a78d2aae config_opts: add gc configurables
rgw_gc_max_objs: number of objects to use for gc shards
rgw_gc_obj_min_wait: min time for an object to become visible to gc
rgw_gc_processor_max_time: max time for a single gc processor cycle
rgw_gc_processor_period: period between gc processor starts

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-08-21 15:33:36 -07:00
Yehuda Sadeh
7dd5d06d0d cls_lock: specify librados namespace explicitly
librados namespace was not specified, hence required including
source files to add using namespace. This fixes it.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-08-21 15:33:36 -07:00
Yehuda Sadeh
eda5a76f94 cls_rgw: cleanups
move stuff to cls/rgw, create needed helpers.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-08-21 15:33:32 -07:00
Sage Weil
e7c492b182 mon: implement 'ceph report <tag ...>' command
Generate a simple "signed" report of the current cluster status.  Include
a simple crc so that the report is vaguely verifiable.

This is part of #2829.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-21 14:22:20 -07:00
Sage Weil
8f95c1fa8f config: remove dead osd options
The read balancing/shedding stuff is old.  Same goes for class timeouts and
the raid options.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-21 13:24:55 -07:00
Dan Mick
bfb24a7059 Fix compilation warnings on squeeze; can't printf() snapid_t directly 2012-08-21 11:57:38 -07:00
Sage Weil
bb1e65eb45 rgw: use sizeof() for snprintf
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-21 11:01:11 -07:00
Sage Weil
9883da6960 Merge branch 'next' 2012-08-21 10:51:54 -07:00
Sage Weil
4a0704e64a osd: fix requeue order for waiting_for_ondisk
We are calling requeue_ops() on each individual op, which means we need
to requeue in reverse order (newest first, oldest last).

Fixes: #2947
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-08-21 10:48:48 -07:00
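Why per-op requeueing calls for reverse iteration, as 4a0704e64a states, can be seen in a small Python sketch. It assumes that requeueing a single op places it at the head of the work queue; `requeue_front` and `requeue_all` are hypothetical helpers, not the OSD functions.

```python
from collections import deque


def requeue_front(queue, op):
    """Hypothetical stand-in for requeueing one op at the head of the queue."""
    queue.appendleft(op)


def requeue_all(queue, waiting):
    """Requeue `waiting` (listed oldest to newest) one op at a time, newest first."""
    for op in reversed(waiting):
        requeue_front(queue, op)


queue = deque(["op4"])                     # work already queued
requeue_all(queue, ["op1", "op2", "op3"])  # ops that were waiting, oldest first
```

Because each call pushes to the front, handling the newest op first leaves the oldest op at the head, so the original order is preserved. Iterating oldest first would leave the waiting ops reversed.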
Yehuda Sadeh
1a09423e27 rgw: dump content_range using 64 bit formatters
Fixes: #2961
Also make sure that size is 64 bit.

backport: argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-08-21 10:48:48 -07:00
Sage Weil
ddbef4750c Revert "rgw: dump content_range using 64 bit formatters"
This reverts commit cc435e9980.

Wrong fix; fcgi doesn't do %lld
2012-08-21 10:48:48 -07:00
Sage Weil
2e8689a499 mon: fix monitor cluster contraction race
If we contract to 1 monitor, we win_standalone_election() without bumping
the election epoch.  Racing paxos updates can then reach us without being
ignored and trigger an assert:

mon/Paxos.cc: In function 'void Paxos::handle_accept(MMonPaxos*)' thread 7f85eae05700 time 2012-08-20 16:01:00.843937
mon/Paxos.cc: 468: FAILED assert(state == STATE_UPDATING)

Fixes: #3003
Reported-by: John Wilkins <john.wilkins@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-21 09:03:33 -07:00
Dan Mick
81694c39d0 Add manpage sections for flatten, snap {un}protect
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: John Wilkins <john.wilkins@inktank.com>
2012-08-20 18:14:29 -07:00
Tommi Virtanen
6a9bcc09a3 mkcephfs, init-ceph: Warn if hostname "localhost" is seen in ceph.conf.
Given a ceph.conf that looks like

  [osd.42]
  host = localhost

mkcephfs used to exit with an obscure error message:

  cat: /tmp/mkcephfs.MCBIHvn4Ru/key.*: No such file or directory

"localhost" was never intended to be a valid hostname to use there.
Warn if we see it, and skip the entry. You should use the proper short
hostname of the box.

As init-ceph and mkcephfs share this library, this change affects the
sysvinit scripts too. The behavior *shouldn't* change there (localhost
entries were ignored earlier, too), but you may see this extra
warning. Which is good.

Closes: #3001
Signed-off-by: Tommi Virtanen <tv@inktank.com>
2012-08-20 17:16:41 -07:00
tamil
5ad013b575 "Removed 274 from xfstests"
Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
2012-08-20 16:53:18 -07:00
Dan Mick
5642a5ee2b test_rbd.py: remove clone before image it depends on
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2012-08-20 16:27:49 -07:00