Commit Graph

15086 Commits

Author SHA1 Message Date
Sage Weil
bc4e78ddfa mds: use new tmap_get pbl argument
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-14 13:41:29 -08:00
Sage Weil
dd32285816 librados: need prval for tmap_get
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-14 13:39:46 -08:00
Samuel Just
7842bf1246 librados: add aio_operate for reads and tmap_get for ObjectWriteOp
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-14 13:37:08 -08:00
Sage Weil
704509637f osd: remove unused need_size
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-14 13:35:04 -08:00
Samuel Just
34145d5dd2 Merge branch 'wip_push_refactor'
Reviewed-by: Sage Weil <sage@newdream.net>
2012-02-14 13:03:38 -08:00
Samuel Just
a53a01740f ReplicatedPG: pull() should return PULL_NONE, not false
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-14 12:56:32 -08:00
Samuel Just
5a3ef17c39 ReplicatedPG: clean up push/pull
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-14 12:55:43 -08:00
Samuel Just
f9b7529fd6 osd_types.h: Add constructors for ObjectRecovery*
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-14 12:52:59 -08:00
Sage Weil
7b1c144f21 test_filestore_idempotent: fix test to create initial object
Filestore now properly fails to clone a non-existent object, which means
we should create one.

Fixes: #2062
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-14 11:53:05 -08:00
Sage Weil
6b30cd3ba3 libcephfs: define CEPH_SETATTR_*
These are also defined internally in ceph_fs.h, so use a guard.  Annoying,
but gives us consistent naming (ceph_*/CEPH_*, not LIBCEPHFS_SETATTR_*).

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-14 09:06:32 -08:00
Sage Weil
b54bac3061 test/encoding/readable.sh: drop bashisms
=, not ==!

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-13 14:43:18 -08:00
Sage Weil
ffa1de32c5 filejournal: drop unused variable
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-13 14:35:01 -08:00
Sage Weil
ccf8867f15 filejournal: aio off by default
For now, until we have a better handle on the ext4 bug, and demonstrate
that it is a clear performance win with the full stack.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-13 14:32:07 -08:00
Sage Weil
12035cd4e3 Merge remote-tracking branch 'gh/wip-journal-aio-rebased'
2012-02-13 14:31:17 -08:00
Sage Weil
3d3237fef4 Merge remote-tracking branch 'gh/wip-osd'
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-13 14:09:04 -08:00
Sage Weil
9fded38f53 test/encoding/readable.sh: skip old version with known incompatibilities
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-13 14:08:25 -08:00
Sage Weil
3e1cc0b951 ceph-dencoder: add osd_peer_stat_t
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-13 12:41:18 -08:00
Yehuda Sadeh
9065dbd36d rgw: remove extra useless info in bucket entry encoding
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-02-13 12:08:19 -08:00
Samuel Just
1bf037bf76 ReplicatedPG: refactor push and pull
Now, push progress is represented by ObjectRecoveryProgress.  In
particular, rather than tracking data_subset_*ing, we track the furthest
offset before which the data will be consistent once cloning is complete.
sub_op_push now separates the pull response implementation from the
replica push implementation.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-13 12:07:39 -08:00
Sage Weil
fbbbd01bfe add CEPH_FEATURE_OSDENC
Require it for osd <-> osd and osd <-> mon communication.

This covers all the new encoding changes, except hobject_t, which is used
between the rados command line tool and the OSD for an object listing
position marker.  We can't distinguish between specific types of clients,
though, and we don't want to introduce any incompatibility with other
clients, so we'll just have to make do here.  :(

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-13 11:27:11 -08:00
Samuel Just
af38ce1f7c ReplicatedPG: consider backfill_pos to be degraded
A write may trigger via make_writeable the creation of a clone which
sorts before the object being written.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-13 11:26:52 -08:00
Samuel Just
d0ccf28086 ReplicatedPG: add debugging for in flight backfill ops
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-13 11:26:52 -08:00
Samuel Just
94a198c87c ReplicatedPG: is_degraded may return true for backfill
If is_degraded returns true for backfill, the object may not be
in any replica's missing set.  Only call start_recovery_op if
we actually started an op.  This bug could cause a stuck
in backfill error.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-13 11:26:52 -08:00
Samuel Just
2476dd7127 MOSDSubOp: Add new object recovery state
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-13 11:24:18 -08:00
Samuel Just
f80e0c715b ReplicatedPG: consider backfill_pos to be degraded
A write may trigger via make_writeable the creation of a clone which
sorts before the object being written.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-13 11:18:02 -08:00
Samuel Just
4785ae39db ReplicatedPG: add debugging for in flight backfill ops
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-13 11:18:02 -08:00
Samuel Just
d43d5d9ff0 ReplicatedPG: is_degraded may return true for backfill
If is_degraded returns true for backfill, the object may not be
in any replica's missing set.  Only call start_recovery_op if
we actually started an op.  This bug could cause a stuck
in backfill error.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-13 11:18:02 -08:00
Sage Weil
389653e63d osd: remove peer_stat from MOSDOp entirely
We haven't used this feature for years and years, and don't plan to.  It
was there to facilitate "read shedding", where the primary OSD would
forward a read request to a replica.  However, replicas can't reply back
to the client in that case because OSDs don't initiate connections (they
used to).

Rip this out for now, especially since osd_peer_stat_t just changed.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-13 11:06:34 -08:00
Sage Weil
b1162a37d8 Merge remote-tracking branch 'gh/wip-mon-lag'
Reviewed-by: Sage Weil <sage@newdream.net>
2012-02-13 10:01:32 -08:00
Sage Weil
4dfa4dc27d osd: new osd_peer_stat_t shell type
We weren't using this, and it had broken (raw) encoding.  The constructor
also didn't initialize fields properly.

Clear out the struct and use the new encoding scheme, so we can cleanly
add fields moving forward.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-13 09:42:37 -08:00
Yehuda Sadeh
508be8e3b3 rgw: don't use SCRIPT_NAME and QUERY_STRING vars
REQUEST_URI holds everything we need, and it's encoded correctly.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-02-11 22:43:35 -08:00
Sage Weil
3796c4ab8f osd: flush pg on activate _after_ we queue our transaction
We recently added a flush on activate, but we are still building the
transaction (the caller queues it), so calling osr.flush() here is totally
useless.

Instead, set a flag 'need_flush', and do the flush the next time we receive
some work.

This has the added benefit of doing the flush in the worker thread, outside
of osd_lock.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-11 21:47:42 -08:00
Sage Weil
4d8e9a5ebb osd: do OpRequest dispatch into PG::do_request
This simplifies the external PG interface, and gives us a single path into
the PG...

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-11 21:46:17 -08:00
Sage Weil
eba609be65 filestore: make flush() block forever if blackholed
If we are blackholing the disk, we need to make flush() wait forever, or
else the flush() logic will return (the IO wasn't queued!) and higher
layers will continue and (eventually) misbehave.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-11 21:24:54 -08:00
Yehuda Sadeh
610da665d2 Revert "rgw: don't treat plus as a space in url decode"
This reverts commit a6d7629c17.
2012-02-11 21:16:50 -08:00
Sage Weil
053dc33c52 osd: emit useful scrub error on missing clone
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-11 21:15:11 -08:00
Sage Weil
43828dffe7 filestore: return error from CLONE
Aie!

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-11 21:14:53 -08:00
Sage Weil
7c6dff4871 osd: filter trimming|purged snaps out of op SnapContext
We can receive an op with an old SnapContext that includes snaps that we've
already trimmed or are in the process of trimming.  Filter them out!
Otherwise we will recreate and add links into collections we've already
marked as removed, and we'll get things like ENOTEMPTY when we try to
remove them.  Or just leave them lying around.

Fixes: #1949
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-11 15:09:02 -08:00
Sage Weil
02bda42ff1 mon: add {mon,quorum}_status admin socket commands
These dump some json with the current monitor/quorum status.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-11 14:32:46 -08:00
Sage Weil
e4258ce04b mon: move quorum_status into helper
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-11 14:30:11 -08:00
Sage Weil
60067f842b mon: move mon_status into a helper
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-11 14:10:53 -08:00
Sage Weil
a414fd51c7 init-ceph, mkcephfs: try 'btrfs device scan' before 'btrfsctl -a'
Fixes: #2023
Reported-by: Wido den Hollander <wido@widodh.nl>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-11 13:44:19 -08:00
Sage Weil
a391b0d177 osd: fix MOSDPGCreate version setting
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-11 11:57:01 -08:00
Sage Weil
6e0e33e72d Merge remote branch 'gh/wip-osd-encoding'
2012-02-11 11:48:29 -08:00
Sage Weil
e09c90fd4b osd: queue pg removal under pg's epoch
The PG may be doing work relative to a different epoch than what the osd
has.  Make sure the PG removal message is queued under that epoch to avoid
confusing/crashing the recipient like so:

2012-02-10 23:26:35.691793 7f387281f700 osd.3 514 queue_pg_for_deletion: 0.0
osd/OSD.cc: In function 'void OSD::handle_pg_remove(OpRequest*)' thread 7f387281f700 time 2012-02-10 23:26:35.691820
osd/OSD.cc: 4860: FAILED assert(pg->get_primary() == m->get_source().num())

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-11 11:48:27 -08:00
Sage Weil
4834c4c746 osd: check for valid snapc _before_ doing op work
Check this early to avoid wasting effort, or causing side-effects from
do_osd_op_effects().

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-11 11:48:27 -08:00
Sage Weil
a0caa851ed osd: some cleanup
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-11 11:48:27 -08:00
Sage Weil
7eff37be49 mon: validate osdmap input
And clean up some error return paths while we're here.

Fixes: #1493
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-11 09:49:42 -08:00
Yehuda Sadeh
7e32a3d4bc rgw: objects can contain '%'
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-02-10 17:05:01 -08:00
Sage Weil
bd1a956757 mon: fix MMonElection encoding version
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-10 15:17:28 -08:00