Commit Graph

27427 Commits

Author SHA1 Message Date
Sage Weil
aa7448cd17 Merge pull request #415 from ceph/rgw-next 2013-07-09 15:34:05 -07:00
Sage Weil
00ae543b3e mon: do not scrub if scrub is in progress
This prevents an assert from unexpected scrub results from the previous
scrub on the leader.

Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-09 14:12:15 -07:00
Sage Weil
8638fb64b1 unittest_pglog: fix unittest
This was broken by the pg_stat_t::reported cleanup.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-09 14:12:07 -07:00
Sage Weil
ae866426ca Merge branch 'wip-mon-osdmap-trim'
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-07-09 13:43:25 -07:00
Sage Weil
8e075146f9 osd: change pg_stat_t::reported from eversion_t to a pair of fields
This rarely represents an actual eversion_t as the epoch and seq values are
bumped semi-independently to ensure it is always unique.  Break it into
two separate fields to avoid confusion.

Drop now-unused and slightly curious inc() method.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-09 13:42:34 -07:00
Sage Weil
cc0006deee mon: be smarter about calculating last_epoch_clean lower bound
We need to take PGs whose mapping has not changed in a long time into
account.  For them, the pg state will indicate it was clean at the time of
the report, in which case we can use that as a lower-bound on their actual
latest epoch clean.  If they are not currently clean (at report time), use
the last_epoch_clean value.

Fixes: #5519
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-09 13:42:34 -07:00
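
The lower-bound rule described in cc0006deee can be sketched roughly as follows. This is an illustration only, not the Monitor's code: the `PGReport` type, its field names, and the step of taking the minimum across PGs are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PGReport:
    """Hypothetical per-PG stat report; field names are illustrative."""
    reported_epoch: int    # epoch at which the PG sent this report
    last_epoch_clean: int  # last epoch the PG itself recorded as clean
    is_clean: bool         # PG state at report time


def epoch_clean_lower_bound(report: PGReport) -> int:
    # A PG that was clean when it reported was clean at least as recently
    # as the report epoch, even if its stored last_epoch_clean is old.
    if report.is_clean:
        return report.reported_epoch
    # Otherwise fall back to the stored value.
    return report.last_epoch_clean


def min_last_epoch_clean(reports: Iterable[PGReport]) -> int:
    # Assumption: the cluster-wide bound is the minimum over all PGs.
    return min(epoch_clean_lower_bound(r) for r in reports)


reports = [
    PGReport(reported_epoch=900, last_epoch_clean=120, is_clean=True),
    PGReport(reported_epoch=880, last_epoch_clean=700, is_clean=False),
]
bound = min_last_epoch_clean(reports)
```

In this sketch the long-unchanged but clean PG no longer pins the bound at its stale stored value of 120; the bound comes from the PG that is not clean.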
Sage Weil
da81228cc7 osd: report pg stats to mon at least every N (=500) epochs
The mon needs a moderately accurate last_epoch_clean value in order to trim
old osdmaps.  To prevent a PG that hasn't peered or received IO in forever
from preventing this, send pg stats at some minimum frequency.  This will
increase the pg stat report workload for the mon over an idle pool, but
should be no worse than a cluster that is getting actual IO and sees these
updates from normal stat updates.

This makes the reported update a bit more aggressive/useful in that the epoch
is the last map epoch processed by this PG and not just one that is >= the
current interval.  Note that the semantics of this field are pretty useless
at this point.

See #5519

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-09 13:42:34 -07:00
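
The minimum-frequency reporting in da81228cc7 might look something like the sketch below. The function name, its parameters, and the `stats_dirty` flag are hypothetical; only the interval of 500 epochs comes from the commit title.

```python
def should_send_pg_stats(current_epoch: int,
                         last_reported_epoch: int,
                         stats_dirty: bool,
                         min_epoch_interval: int = 500) -> bool:
    """Decide whether a PG should send stats to the mon (illustrative)."""
    # Normal path: stats changed because of IO or peering.
    if stats_dirty:
        return True
    # Idle path: report anyway once enough map epochs have gone by, so the
    # mon keeps a reasonably fresh last_epoch_clean for this PG.
    return current_epoch - last_reported_epoch >= min_epoch_interval
```

An idle PG that last reported at epoch 600 would stay quiet at epoch 1000 and report again at epoch 1100.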
Sage Weil
449283f802 mon/OSDMonitor: allow osdmap trimming to be forced via a config option
In certain cases the admin may know that it is safe to trim old osdmaps but
a bug or other issue is preventing the Monitor from deciding on its own.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-09 13:42:34 -07:00
Sage Weil
18a624fd8b mon/OSDMonitor: make 'osd crush rm ...' slightly more idempotent
This particular failure is easily triggered by the crush_ops.sh
workunit.  Make it a bit less likely to fail.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-07-09 12:24:19 -07:00
Sage Weil
c5157dde9f doc/release-notes: v0.66
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-09 11:45:34 -07:00
Sage Weil
8bdd86a12d Merge branch 'wip-mon-trim'
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-07-09 11:10:30 -07:00
Sage Weil
8799872d0e mon/PaxosService: update docs a bit
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-09 11:09:46 -07:00
Sage Weil
44db2ac548 mon/PaxosService: inline trim()
This is now trivial; pull it into the caller.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-09 11:09:46 -07:00
Sage Weil
cab8eeecd1 mon/PaxosService: move paxos_service_trim_max into caller, clean up
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-09 11:09:46 -07:00
Sage Weil
d97f31424e mon/PaxosService: simplify paxos_service_trim_min check
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-09 11:09:46 -07:00
Sage Weil
da248a9e1d mon: make service trim_to stateless
Call get_trim_to() when we need to know how much to trim (if any), and
calculate it then.  No need to keep this in a hidden trim_version
variable and remember to update it.  This drops several helpers and
accessors and makes get_trim_to() a single method that services need to
override.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-09 11:09:46 -07:00
Sage Weil
886b637b64 mon/PaxosService: pass trim target into encode_trim()
This will help us in a few patches...

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-09 11:09:44 -07:00
Yehuda Sadeh
5faa4ac1e5 rgw: warn on the lack of curl_multi_wait()
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-07-09 09:19:16 -07:00
Yehuda Sadeh
76e79266a0 rgw: fix args parsing
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-07-09 00:35:00 -07:00
Yehuda Sadeh
395262cd82 rgw: call appropriate curl calls for waiting on sockets
If libcurl supports curl_multi_wait() then use it, otherwise
use select() and force a timeout, even if it has been disabled.
Otherwise we may wait forever for events that we can't wait for
as select() only uses fds < 1024.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-07-09 00:04:01 -07:00
Yehuda Sadeh
73c2a3dcd3 configure.ac: detect whether libcurl supports curl_multi_wait()
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-07-08 23:43:00 -07:00
Gary Lowell
d4ca36b317 Merge branch 'next' 2013-07-08 23:17:50 -07:00
Sage Weil
8ff62ae42e Merge remote-tracking branch 'gh/next' 2013-07-08 22:17:51 -07:00
Sage Weil
d08b6d6df7 mon/PaxosService: prevent reads until initial service commit is done
Do not process reads (or, by PaxosService::dispatch() implication, writes)
until we have committed the initial service state.  This avoids things like
EPERM due to missing keys when we race with mon creation, triggered by
teuthology tests doing their health check after startup.

Fixes: #5515
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-07-08 22:17:41 -07:00
Sage Weil
63fe8635ae mon/PaxosService: unwind should_trim()
Inline the single-caller helper.  This will help us in a moment...

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-08 21:44:05 -07:00
Sage Weil
d600dc9321 mon/PaxosService: unwind service_should_trim() helper
Nobody overloads it; put it inline in should_trim().

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-08 21:41:55 -07:00
Sage Weil
6aa023048a mon/MDSMonitor: remove unnecessary service_should_trim()
We never set_trim_to(), so this is unnecessary.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-08 21:41:34 -07:00
Sage Weil
b71a00966c mon/OSDMonitor: remove dup service_should_trim() implementation
This matches the parent.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-08 21:40:36 -07:00
Sage Weil
39b71c5826 mon/PaxosService: trim periodically instead of via propose_pending
We want to trim old states even if there is no update activity.  For
example, if a long-running rebalance finishes, all osdmap updates will
stop and we won't trim out old maps to free space.

Instead, trim at the same time as tick().  Remove the trim during
propose_pending() to force all trims through this path and avoid
introducing a new and rarely-exercised behavior.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-07-08 21:38:11 -07:00
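
A rough sketch of tick-driven trimming, as described in 39b71c5826, combined with a trim target that is computed on demand. The class, the `keep` policy, and the way `trim_min` and `trim_max` are applied are assumptions for illustration, loosely modeled on the option names paxos_service_trim_min and paxos_service_trim_max; this is not the PaxosService implementation.

```python
class VersionedService:
    """Toy service holding versions first_committed..last_committed."""

    def __init__(self, keep: int, trim_min: int, trim_max: int):
        self.first_committed = 1
        self.last_committed = 1
        self.keep = keep          # hypothetical policy: versions to retain
        self.trim_min = trim_min  # skip trims smaller than this
        self.trim_max = trim_max  # cap on versions removed per trim

    def commit(self, count: int = 1) -> None:
        self.last_committed += count

    def get_trim_to(self) -> int:
        # Computed fresh on every call; no cached trim target to maintain.
        return max(self.first_committed, self.last_committed - self.keep)

    def maybe_trim(self) -> int:
        to_remove = self.get_trim_to() - self.first_committed
        if to_remove <= 0 or to_remove < self.trim_min:
            return 0
        to_remove = min(to_remove, self.trim_max)
        self.first_committed += to_remove
        return to_remove

    def tick(self) -> int:
        # Trimming is driven by the periodic tick, not by new proposals.
        return self.maybe_trim()


svc = VersionedService(keep=10, trim_min=5, trim_max=20)
svc.commit(62)                            # burst of updates, then idle
trimmed = [svc.tick() for _ in range(4)]  # ticks with no new commits
```

Here the service keeps trimming in bounded steps across successive ticks even though no further updates arrive.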
Sage Weil
2f8ff2de17 mon/PaxosService: reorder definitions
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-08 21:33:37 -07:00
Sage Weil
50ffe324e3 mon/PaxosService: uninline should_trim()
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-08 21:33:22 -07:00
John Wilkins
f1b4398dd2 Merge branch 'master' of https://github.com/ceph/ceph 2013-07-08 18:11:57 -07:00
John Wilkins
5edc1ff7ea doc: Added Ceph Object Storage installation instructions for CentOS/RHEL 6.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-07-08 18:11:25 -07:00
Sage Weil
43fa7aabf1 mon/OSDMonitor: fix base case for loading full osdmap
Right after cluster creation, first_committed is 1 and latest stashed is 0,
but we don't have the initial full map yet.  Thereafter, we do (because we
write it with trim).  Fixes afd6c7d824.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-07-08 17:47:11 -07:00
Samuel Just
0e93dd93e5 Merge branch 'wip-small-object-recovery'
Conflicts:
	src/include/ceph_features.h

Reviewed-by: Sage Weil <sage@inktank.com>
Fixes: #5278
2013-07-08 16:53:17 -07:00
Samuel Just
ad65de40ff ReplicatedPG: send compound messages to enlightened peers
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:32 -07:00
Samuel Just
ae1b2e97f5 ReplicatedPG: add handlers for MOSDPG(Push|Pull|PushReply)
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:32 -07:00
Samuel Just
c0bd831ace OSD: add handlers for MOSDPG(Push|PushReply|Pull)
MOSDPG(Push|PushReply|Pull|SubOp|SubOpReply) need the
same thing checked prior to queueing the op, so they
share a templated handler.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:31 -07:00
Samuel Just
264dbf3f9e messages/,osd_types: add messages for Push, PushReply, Pull
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:31 -07:00
Samuel Just
c56f16d4dc ReplicatedPG: split handle_pull out of sub_op_pull
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:31 -07:00
Samuel Just
175c0777ed ReplicatedPG: split handle_push_reply out of sub_op_push_reply
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:31 -07:00
Samuel Just
54e5f6423a ReplicatedPG: send pulls en masse in recover_primary
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:31 -07:00
Samuel Just
c41d4dc4bb ReplicatedPG: send pushes en masse in recover_replicas, recover_backfill
This way, the pushes might be later merged into a smaller number of
messages.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:31 -07:00
Samuel Just
eec86b8d3c OSD: convert handle_push to use PushOp
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:31 -07:00
Samuel Just
a4984328be ReplicatedPG: pass a PushOp into handle_pull_response
This is the first step toward packaging multiple
pushes/pulls into a single message.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:30 -07:00
Samuel Just
82cb922e89 ReplicatedPG: split send_push into build_push_op and send_push_op
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:30 -07:00
Samuel Just
31e19a64b0 ReplicatedPG: _committed_pushed_object don't pass op
Add a separate callback to handle marking the event and
the stats.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:30 -07:00
Samuel Just
0f51b60cba ReplicatedPG: submit_push_data must take recovery_info as non-const
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:30 -07:00
Gary Lowell
b6b48dbefa v0.66 2013-07-08 15:45:00 -07:00
Sage Weil
a9906641a1 mon: implement simple 'scrub' command
Compare all keys within the sync'ed prefixes across members of the quorum
and compare the key counts and CRC for inconsistencies.

Currently this is a one-shot inefficient hammer.  We'll want to make this
work in chunks before it is usable in production environments.

Protect with a feature bit to avoid sending MMonScrub to mons who can't
decode it.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-07-08 15:34:32 -07:00
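
The comparison described in a9906641a1 could be sketched along these lines. The dict-based stores, the function names, and the use of `zlib.crc32` as a stand-in checksum are all assumptions for the example; the actual monitor store and message format are not shown here.

```python
import zlib
from typing import Dict, List, Tuple

ScrubResult = Dict[str, Tuple[int, int]]  # prefix -> (key count, crc)


def scrub_store(store: Dict[str, bytes], prefixes: List[str]) -> ScrubResult:
    """Summarize the keys under each synced prefix as (count, crc)."""
    result: ScrubResult = {}
    for prefix in prefixes:
        count, crc = 0, 0
        # Sort keys so every member folds them into the CRC in one order.
        for key in sorted(k for k in store if k.startswith(prefix)):
            crc = zlib.crc32(key.encode(), crc)
            crc = zlib.crc32(store[key], crc)
            count += 1
        result[prefix] = (count, crc)
    return result


def find_inconsistent(results: Dict[str, ScrubResult], leader: str) -> List[str]:
    """Return quorum members whose summary differs from the leader's."""
    reference = results[leader]
    return sorted(m for m, r in results.items() if r != reference)


prefixes = ["osdmap/", "auth/"]
mon_a = {"osdmap/1": b"x", "osdmap/2": b"y", "auth/1": b"k"}
mon_b = dict(mon_a, **{"tmp/1": b"ignored"})  # extra key outside prefixes
mon_c = dict(mon_a, **{"osdmap/2": b"z"})     # diverged value
results = {
    "mon.a": scrub_store(mon_a, prefixes),
    "mon.b": scrub_store(mon_b, prefixes),
    "mon.c": scrub_store(mon_c, prefixes),
}
bad = find_inconsistent(results, leader="mon.a")
```

As the commit message notes, a one-shot pass like this walks every key at once; a chunked variant would be needed before production use.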