Commit Graph

27103 Commits

Author SHA1 Message Date
Sage Weil
d97f31424e mon/PaxosService: simplify paxos_service_trim_min check
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-09 11:09:46 -07:00
Sage Weil
da248a9e1d mon: make service trim_to stateless
Call get_trim_to() when we need to know how much to trim (if any), and
calculate it then.  No need to keep this in a hidden trim_version
variable and remember to update it.  This drops several helpers and
accessors and makes get_trim_to() a single method that services need to
override.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-09 11:09:46 -07:00
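The stateless approach described in the commit above can be sketched roughly as follows. This is a Python illustration, not Ceph's C++ code: the class names, `maybe_trim()`, and the `keep` parameter are hypothetical, and only `get_trim_to()` is a name taken from the commit message.

```python
class PaxosServiceSketch:
    """Hypothetical stand-in for a Paxos-backed service (not Ceph's class)."""

    def __init__(self, first_committed, last_committed):
        self.first_committed = first_committed
        self.last_committed = last_committed

    def get_trim_to(self):
        # Default: nothing to trim. Services override this single method.
        return 0

    def maybe_trim(self):
        # Compute the trim target at the moment it is needed; there is no
        # cached trim_version that has to be kept up to date.
        trim_to = self.get_trim_to()
        if trim_to <= self.first_committed:
            return 0
        trimmed = trim_to - self.first_committed
        self.first_committed = trim_to
        return trimmed


class KeepLastN(PaxosServiceSketch):
    """Hypothetical service that retains only the newest `keep` versions."""

    def __init__(self, first_committed, last_committed, keep):
        super().__init__(first_committed, last_committed)
        self.keep = keep

    def get_trim_to(self):
        return max(0, self.last_committed - self.keep)
```

Because the target is derived from current state on every call, a change to `last_committed` is picked up automatically the next time `maybe_trim()` runs.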
Sage Weil
886b637b64 mon/PaxosService: pass trim target into encode_trim()
This will help us in a few patches...

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-09 11:09:44 -07:00
Sage Weil
63fe8635ae mon/PaxosService: unwind should_trim()
Inline the single-caller helper.  This will help us in a moment...

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-08 21:44:05 -07:00
Sage Weil
d600dc9321 mon/PaxosService: unwind service_should_trim() helper
Nobody overloads it; put it inline in should_trim().

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-08 21:41:55 -07:00
Sage Weil
6aa023048a mon/MDSMonitor: remove unnecessary service_should_trim()
We never set_trim_to(), so this is unnecessary.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-08 21:41:34 -07:00
Sage Weil
b71a00966c mon/OSDMonitor: remove dup service_should_trim() implementation
This matches the parent.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-08 21:40:36 -07:00
Sage Weil
39b71c5826 mon/PaxosService: trim periodically instead of via propose_pending
We want to trim old states even if there is no update activity.  For
example, if a long-running rebalance finishes, all osdmap updates will
stop and we won't trim out old maps to free space.

Instead, trim at the same time as tick().  Remove the trim during
propose_pending() to force all trims through this path and avoid
introducing a new and rarely-exercised behavior.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-07-08 21:38:11 -07:00
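A minimal Python sketch of the behavior change described above, assuming a toy service whose retained versions are a plain list. The class and the `oldest_needed` attribute are hypothetical; only `tick()` and `propose_pending()` are names from the commit message.

```python
class TickTrimService:
    """Toy service: trimming happens in tick(), never in propose_pending()."""

    def __init__(self):
        self.versions = []      # retained version numbers, oldest first
        self.oldest_needed = 1  # hypothetical: oldest version still required

    def propose_pending(self):
        # Commit a new version; deliberately does no trimming.
        new = self.versions[-1] + 1 if self.versions else 1
        self.versions.append(new)

    def tick(self):
        # Periodic timer: drop versions nobody needs, even if no new
        # proposals are arriving. Returns how many versions were trimmed.
        before = len(self.versions)
        self.versions = [v for v in self.versions if v >= self.oldest_needed]
        return before - len(self.versions)
```

In this sketch, once updates stop and `oldest_needed` advances, the next `tick()` still frees the old versions. With trimming tied to `propose_pending()` they would stay until another update arrived.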
Sage Weil
2f8ff2de17 mon/PaxosService: reorder definitions
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-08 21:33:37 -07:00
Sage Weil
50ffe324e3 mon/PaxosService: uninline should_trim()
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-08 21:33:22 -07:00
John Wilkins
f1b4398dd2 Merge branch 'master' of https://github.com/ceph/ceph 2013-07-08 18:11:57 -07:00
John Wilkins
5edc1ff7ea doc: Added Ceph Object Storage installation instructions for CentOS/RHEL 6.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-07-08 18:11:25 -07:00
Sage Weil
43fa7aabf1 mon/OSDMonitor: fix base case for loading full osdmap
Right after cluster creation, first_committed is 1 and latest stashed is 0,
but we don't have the initial full map yet.  Thereafter, we do (because we
write it with trim).  Fixes afd6c7d824.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-07-08 17:47:11 -07:00
Samuel Just
0e93dd93e5 Merge branch 'wip-small-object-recovery'
Conflicts:
	src/include/ceph_features.h

Reviewed-by: Sage Weil <sage@inktank.com>
Fixes: #5278
2013-07-08 16:53:17 -07:00
Samuel Just
ad65de40ff ReplicatedPG: send compound messages to enlightened peers
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:32 -07:00
Samuel Just
ae1b2e97f5 ReplicatedPG: add handlers for MOSDPG(Push|Pull|PushReply)
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:32 -07:00
Samuel Just
c0bd831ace OSD: add handlers for MOSDPG(Push|PushReply|Pull)
MOSDPG(Push|PushReply|Pull|SubOp|SubOpReply) need the
same thing checked prior to queueing the op, so they
share a templated handler.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:31 -07:00
Samuel Just
264dbf3f9e messages/,osd_types: add messages for Push, PushReply, Pull
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:31 -07:00
Samuel Just
c56f16d4dc ReplicatedPG: split handle_pull out of sub_op_pull
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:31 -07:00
Samuel Just
175c0777ed ReplicatedPG: split handle_push_reply out of sub_op_push_reply
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:31 -07:00
Samuel Just
54e5f6423a ReplicatedPG: send pulls en masse in recover_primary
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:31 -07:00
Samuel Just
c41d4dc4bb ReplicatedPG: send pushes en masse in recover_replicas, recover_backfill
This way, the pushes might be later merged into a smaller number of
messages.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:31 -07:00
Samuel Just
eec86b8d3c OSD: convert handle_push to use PushOp
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:31 -07:00
Samuel Just
a4984328be ReplicatedPG: pass a PushOp into handle_pull_response
This is the first step toward packaging multiple
pushes/pulls into a single message.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:30 -07:00
Samuel Just
82cb922e89 ReplicatedPG: split send_push into build_push_op and send_push_op
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:30 -07:00
Samuel Just
31e19a64b0 ReplicatedPG: _committed_pushed_object don't pass op
Add a separate callback to handle marking the event and
the stats.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:30 -07:00
Samuel Just
0f51b60cba ReplicatedPG: submit_push_data must take recovery_info as non-const
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-07-08 16:43:30 -07:00
Sage Weil
a9906641a1 mon: implement simple 'scrub' command
Compare all keys within the sync'ed prefixes across members of the quorum
and compare the key counts and CRC for inconsistencies.

Currently this is a one-shot inefficient hammer.  We'll want to make this
work in chunks before it is usable in production environments.

Protect with a feature bit to avoid sending MMonScrub to mons who can't
decode it.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-07-08 15:34:32 -07:00
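The one-shot comparison described above might look roughly like this in Python. It is a sketch under several assumptions: the store is modeled as a dict mapping `(prefix, key)` to bytes, the prefix names are hypothetical, and `zlib.crc32` (plain CRC-32) stands in for whatever checksum the monitor actually uses.

```python
import zlib

SYNC_PREFIXES = ("osdmap", "paxos")  # hypothetical prefix names


def scrub_summary(store, prefixes=SYNC_PREFIXES):
    """Return {prefix: (key_count, crc)} over the keys under each prefix."""
    summary = {}
    for prefix in prefixes:
        count, crc = 0, 0
        # Sort so every member folds keys into the CRC in the same order.
        for (p, key), value in sorted(store.items()):
            if p != prefix:
                continue
            count += 1
            crc = zlib.crc32(key.encode(), crc)
            crc = zlib.crc32(value, crc)
        summary[prefix] = (count, crc)
    return summary


def find_inconsistencies(summaries):
    """Given {mon_name: summary}, list mons that differ from the first one."""
    names = sorted(summaries)
    reference = summaries[names[0]]
    return [name for name in names[1:] if summaries[name] != reference]
```

Like the command in the commit, this walks every key in one pass, which is why a chunked version would be needed for large stores.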
Sage Weil
afd6c7d824 mon: fix osdmap stash, trim to retain complete history of full maps
The current interaction between sync and stashing full osdmaps only on
active mons means that a sync can result in an incomplete osdmap_full
history:

 - mon.c starts a full sync
 - during sync, active osdmap service should_stash_full() is true and
   includes a full in the txn
 - mon.c sync finishes
 - mon.c update_from_paxos gets "latest" stashed that it got from the
   paxos txn
 - mon.c does *not* walk to previous inc maps to complete its collection
   of full maps.

To fix this, we disable the periodic/random stash of full maps by the
osdmap service.

This introduces a new problem: we must have at least one full map (the first
one) in order for a mon that just synced to build its full collection.
Extend the encode_trim() process to allow the osdmap service to include
the oldest full map with the trim txn.  This is more complex than just
writing the full maps in the txn, but cheaper--we only write the full
map at trim time.

This *might* be related to previous bugs where the full osdmap was
missing, or cases where leveldb keys seemed to 'disappear'.

Fixes: #5512
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-07-08 15:04:59 -07:00
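To illustrate why a freshly synced mon needs the oldest full map, here is a Python sketch that rebuilds every later full map from one starting full map plus incrementals. Maps are modeled as plain dicts and the function name is hypothetical; this is not how Ceph encodes osdmaps.

```python
def rebuild_full_maps(oldest_epoch, oldest_full, incrementals):
    """Build {epoch: full_map} from the oldest full map and incrementals.

    oldest_full is the full map at oldest_epoch; incrementals maps each
    later epoch to a dict of changes relative to the previous epoch.
    """
    fulls = {oldest_epoch: dict(oldest_full)}
    epoch = oldest_epoch
    # Walk forward while the chain of incrementals is contiguous.
    while epoch + 1 in incrementals:
        nxt = dict(fulls[epoch])
        nxt.update(incrementals[epoch + 1])
        epoch += 1
        fulls[epoch] = nxt
    return fulls
```

Without a full map at the oldest retained epoch there is nothing to apply the first incremental to, which is the gap the trim-time write of the oldest full map is meant to close.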
Yehuda Sadeh
9f8bfb4b22 Merge pull request #397 from kri5/wip-5478
rgw: Add explicit messages in radosgw init script

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2013-07-08 12:23:36 -07:00
Sage Weil
e9d19b38c8 common/crc32c: skip cpu detection incantation on not x86_64
On i386 this fails to build with

common/crc32c-intel.c: In function 'ceph_have_crc32c_intel':
error: common/crc32c-intel.c:79:9: PIC register clobbered by 'ebx' in 'asm'

ARM had more to complain about.

Not sure where this test came from, but it is clearly not meant for
anything other than x86_64.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-07-08 10:54:53 -07:00
athanatos
0471b719c2 Merge pull request #407 from dachary/wip-5487
unit tests for ObjectContext read/write locks

Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-07-08 10:44:43 -07:00
Sage Weil
956fafc7f2 qa/workunits/rbd/simple_big.sh: don't ENOSPC every time
Set the count on the initial dd so we don't always ENOSPC.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-08 10:14:08 -07:00
Sage Weil
d423cf8c4f qa/workunits/rbd/kernel.sh: move modprobe up
Needs to happen before cleanup.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-08 09:58:16 -07:00
Sage Weil
672f51be3a qa/workunits/fs/test_o_trunc.sh: fix .sh to match new bin location
To match 83f308962c.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-08 09:56:29 -07:00
Loic Dachary
7b7f752c69 unit tests for ObjectContext read/write locks
unit tests for the ObjectContext methods ondisk_write_lock,
ondisk_write_unlock, ondisk_read_lock and ondisk_read_unlock.

A class derived from ::testing::Test is created with two sub-classes (
Thread_read_lock & Thread_write_lock ) to provide a separate thread
that can block with cond.Wait(). usleep(3) is used in the main thread
to wait for the expected side effect with increasing delays ( up to
MAX_DELAY ).

http://tracker.ceph.com/issues/5487 refs #5487

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-07-08 16:45:12 +02:00
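The testing pattern described above (a helper thread that blocks on a condition, and a main thread that polls with increasing delays) can be sketched in Python as follows. This is an analogue only: the actual tests are C++ using ::testing::Test, and `wait_for`, `BlockedWaiter`, and the `MAX_DELAY` value here are hypothetical.

```python
import threading
import time

MAX_DELAY = 1.0  # seconds; hypothetical cap on a single sleep


def wait_for(predicate, max_delay=MAX_DELAY):
    """Poll predicate with doubling sleeps until it holds or delays run out."""
    delay = 0.001
    while delay <= max_delay:
        if predicate():
            return True
        time.sleep(delay)
        delay *= 2
    return predicate()


class BlockedWaiter(threading.Thread):
    """Thread that blocks on a condition variable until release() is called."""

    def __init__(self):
        super().__init__(daemon=True)
        self.cond = threading.Condition()
        self.released = False
        self.waiting = False
        self.done = False

    def run(self):
        with self.cond:
            self.waiting = True
            while not self.released:
                self.cond.wait()
        self.done = True

    def release(self):
        with self.cond:
            self.released = True
            self.cond.notify_all()
```

Polling with growing delays keeps the test fast when the side effect appears quickly while still tolerating a slow scheduler.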
Sage Weil
8bc50626c5 Merge branch 'next' 2013-07-07 21:20:34 -07:00
Sage Weil
85a1d6cc5d mon: remove bad assert about monmap version
It is possible to start a sync when our newest monmap is 0.  Usually we see
e0 from probe, but that isn't always published as part of the very first
paxos transaction due to the way PaxosService::_active generates its
first initial commit.

In any case, having e0 here is harmless.

Fixes: #5509
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-07-07 21:19:41 -07:00
Sage Weil
3f5a96236b qa: write a somewhat <1tb image
1TB is enough to fill up 6 plana osds.  And it takes forever.  Write less.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-05 11:24:06 -07:00
Sage Weil
54aa797acd qa/workunits/rbd/kernel.sh: modprobe rbd
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-05 11:20:43 -07:00
Sage Weil
83f308962c qa: move test_o_trunc.sh into fs dir
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-05 11:17:29 -07:00
Sage Weil
507a4ec87b qa: move fs test binary into workunits dir so teuthology can build it
Teuthology does a make in the workunits dir, so move this in there.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-05 11:16:08 -07:00
Sage Weil
a84e6d1824 mds/MDSTable: gracefully suicide on EBLACKLIST
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-07-05 11:04:37 -07:00
Christophe Courtaut
8b4cb8f372 rgw: Add explicit messages in radosgw init script
http://tracker.ceph.com/issues/5478 fixes #5478

Signed-off-by: Christophe Courtaut <christophe.courtaut@gmail.com>
2013-07-05 14:41:04 +02:00
Sage Weil
22227cd1c1 qa: add O_TRUNC test
From: Yan, Zheng <yan.zheng@intel.com>

Simple reproducer for #5453, modified to run for a finite number of
iterations.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-03 21:53:48 -07:00
Sage Weil
71ebfe7e1a mon/Paxos: make 'paxos trim disabled max versions' much much larger
108000 is about 3 hours if paxos is going full-bore (1 proposal/second).
That ought to be pretty safe.  Otherwise, we start trimming too soon and a
slow sync will just have to restart when it finishes.

Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-07-03 16:56:06 -07:00
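A quick check of the window size mentioned above, as a Python sketch with a hypothetical helper name. At exactly 1 proposal/second, 108000 versions take 30 hours to accumulate; the "about 3 hours" figure corresponds to roughly 10 proposals/second. The source does not say which rate was intended.

```python
def trim_window_hours(max_versions, proposals_per_second):
    """Hours needed to accumulate max_versions at a steady proposal rate."""
    return max_versions / proposals_per_second / 3600.0


# 108000 versions at two assumed steady rates
window_at_1_per_sec = trim_window_hours(108000, 1)
window_at_10_per_sec = trim_window_hours(108000, 10)
```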
Sage Weil
ab93696e30 mon: be less chatty about discarding messages
Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-03 16:23:56 -07:00
Sage Weil
e8b42a6998 osd/OSDMap: handle case where some new osds have hb_front and others don't
Do not assume that because at least one OSD has an hb_front addr that they
all do, or else we will end up assigning garbage here and later thinking
it is an addr (or, more precisely, != entity_addr_t()).

Fixes: #5460
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
2013-07-03 15:37:16 -07:00
Sage Weil
81343f1df4 osd: clear hb_front if it was previously non-NULL and is now NULL
If we have a real addr for hb_front for a given osd and then a new map
has the osd coming up without an hb_front, we need to clear the addr
field.

Also, improve the debug output in add_heartbeat_peer() so we can tell if
we have no connection or a connection to a blank addr.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
2013-07-03 15:37:05 -07:00
John Wilkins
e960e1bb6a Merge branch 'master' of https://github.com/ceph/ceph 2013-07-03 15:27:26 -07:00