18146 Commits

Author SHA1 Message Date
Yehuda Sadeh
9065dbd36d rgw: remove extra useless info in bucket entry encoding
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-02-13 12:08:19 -08:00
Sage Weil
fbbbd01bfe add CEPH_FEATURE_OSDENC
Require it for osd <-> osd and osd <-> mon communication.

This covers all the new encoding changes, except hobject_t, which is used
between the rados command line tool and the OSD for an object listing
position marker.  We can't distinguish between specific types of clients,
though, and we don't want to introduce any incompatibility with other
clients, so we'll just have to make do here.  :(

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-13 11:27:11 -08:00
Samuel Just
af38ce1f7c ReplicatedPG: consider backfill_pos to be degraded
A write may trigger via make_writeable the creation of a clone which
sorts before the object being written.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-13 11:26:52 -08:00
Samuel Just
d0ccf28086 ReplicatedPG: add debugging for in flight backfill ops
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-13 11:26:52 -08:00
Samuel Just
94a198c87c ReplicatedPG: is_degraded may return true for backfill
If is_degraded returns true for backfill, the object may not be
in any replica's missing set.  Only call start_recovery_op if
we actually started an op.  This bug could cause a stuck-in-backfill
error.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-13 11:26:52 -08:00
Sage Weil
389653e63d osd: remove peer_stat from MOSDOp entirely
We haven't used this feature for years and years, and don't plan to.  It
was there to facilitate "read shedding", where the primary OSD would
forward a read request to a replica.  However, replicas can't reply back
to the client in that case because OSDs don't initiate connections (they
used to).

Rip this out for now, especially since osd_peer_stat_t just changed.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-13 11:06:34 -08:00
Sage Weil
b1162a37d8 Merge remote-tracking branch 'gh/wip-mon-lag'
Reviewed-by: Sage Weil <sage@newdream.net>
2012-02-13 10:01:32 -08:00
Sage Weil
4dfa4dc27d osd: new osd_peer_stat_t shell type
We weren't using this, and it had broken (raw) encoding.  The constructor
also didn't initialize fields properly.

Clear out the struct and use the new encoding scheme, so we can cleanly
add fields moving forward.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-13 09:42:37 -08:00
Sage Weil
1f351cdbe9 qa/btrfs/.gitignore: ignore targets
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-13 09:42:05 -08:00
Yehuda Sadeh
508be8e3b3 rgw: don't use SCRIPT_NAME and QUERY_STRING vars
REQUEST_URI holds everything we need, and it's encoded correctly.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-02-11 22:43:35 -08:00
Yehuda Sadeh
610da665d2 Revert "rgw: don't treat plus as a space in url decode"
This reverts commit a6d7629c17.
2012-02-11 21:16:50 -08:00
Sage Weil
7c6dff4871 osd: filter trimming|purged snaps out of op SnapContext
We can receive an op with an old SnapContext that includes snaps that we've
already trimmed or are in the process of trimming.  Filter them out!
Otherwise we will recreate and add links into collections we've already
marked as removed, and we'll get things like ENOTEMPTY when we try to
remove them.  Or just leave them lying around.

Fixes: #1949
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-11 15:09:02 -08:00
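
The filtering described in the commit above can be sketched in C++ as follows. This is only an illustration under stated assumptions: `snap_id`, `filter_snaps`, and the two sets are hypothetical names, not the OSD's actual types or functions.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a snapshot id; not the Ceph type.
using snap_id = std::uint64_t;

// Return the snaps from an op's snapshot context that are still live,
// dropping any id that was already purged or is currently being trimmed.
// The relative order of the surviving ids is preserved.
std::vector<snap_id> filter_snaps(const std::vector<snap_id>& op_snaps,
                                  const std::set<snap_id>& purged,
                                  const std::set<snap_id>& trimming) {
  std::vector<snap_id> live;
  live.reserve(op_snaps.size());
  for (snap_id s : op_snaps) {
    if (purged.count(s) == 0 && trimming.count(s) == 0) {
      live.push_back(s);
    }
  }
  return live;
}
```

Doing the filtering before any op work means a stale snap id never reaches the code that creates links into collections, which is the failure the commit message describes.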
Sage Weil
02bda42ff1 mon: add {mon,quorum}_status admin socket commands
These dump some json with the current monitor/quorum status.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-11 14:32:46 -08:00
Sage Weil
e4258ce04b mon: move quorum_status into helper
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-11 14:30:11 -08:00
Sage Weil
60067f842b mon: move mon_status into a helper
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-11 14:10:53 -08:00
Sage Weil
a414fd51c7 init-ceph, mkcephfs: try 'btrfs device scan' before 'btrfsctl -a'
Fixes: #2023
Reported-by: Wido den Hollander <wido@widodh.nl>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-11 13:44:19 -08:00
Sage Weil
a391b0d177 osd: fix MOSDPGCreate version setting
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-11 11:57:01 -08:00
Sage Weil
6e0e33e72d Merge remote branch 'gh/wip-osd-encoding'
2012-02-11 11:48:29 -08:00
Sage Weil
e09c90fd4b osd: queue pg removal under pg's epoch
The PG may be doing work relative to a different epoch than what the osd
has.  Make sure the PG removal message is queued under that epoch to avoid
confusing/crashing the recipient like so:

2012-02-10 23:26:35.691793 7f387281f700 osd.3 514 queue_pg_for_deletion: 0.0
osd/OSD.cc: In function 'void OSD::handle_pg_remove(OpRequest*)' thread 7f387281f700 time 2012-02-10 23:26:35.691820
osd/OSD.cc: 4860: FAILED assert(pg->get_primary() == m->get_source().num())

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-11 11:48:27 -08:00
Sage Weil
4834c4c746 osd: check for valid snapc _before_ doing op work
Check this early to avoid wasting effort, or causing side-effects from
do_osd_op_effects().

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-11 11:48:27 -08:00
Sage Weil
a0caa851ed osd: some cleanup
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-11 11:48:27 -08:00
Sage Weil
7eff37be49 mon: validate osdmap input
And clean up some error return paths while we're here.

Fixes: #1493
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-11 09:49:42 -08:00
Yehuda Sadeh
7e32a3d4bc rgw: objects can contain '%'
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-02-10 17:05:01 -08:00
Sage Weil
bd1a956757 mon: fix MMonElection encoding version
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-10 15:17:28 -08:00
Greg Farnum
22eca41005 mon: remove the last_consumed setting in Paxos
This was only ever used while initializing the Paxos machine, and it
doesn't need to be. Its existence is just an invitation to have races
between updating the stashed data and the stashed version.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-02-10 15:07:10 -08:00
Yehuda Sadeh
6e6c34f9d6 objecter: LingerOp is refcounted
This should fix bug #2050, where a linger op was used after being freed.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-02-10 15:06:29 -08:00
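
To show why reference counting prevents this kind of use-after-free, here is a minimal C++ sketch. `RefCounted` and `LingerOpSketch` are hypothetical names for illustration; the actual Objecter classes and their interfaces are not shown in this log and may differ.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical minimal intrusive reference count. The object starts with
// one reference held by its creator and deletes itself on the last put().
class RefCounted {
 public:
  RefCounted() : nref_(1) {}
  RefCounted(const RefCounted&) = delete;
  RefCounted& operator=(const RefCounted&) = delete;

  // Take an additional reference.
  RefCounted* get() {
    nref_.fetch_add(1);
    return this;
  }
  // Drop one reference; free the object when none remain.
  void put() {
    if (nref_.fetch_sub(1) == 1) {
      delete this;
    }
  }
  int refs() const { return nref_.load(); }

 protected:
  // Protected so callers must use put() instead of delete.
  virtual ~RefCounted() = default;

 private:
  std::atomic<int> nref_;
};

// Hypothetical op type; the flag only lets a caller observe destruction.
class LingerOpSketch : public RefCounted {
 public:
  explicit LingerOpSketch(bool* destroyed) : destroyed_(destroyed) {}

 protected:
  ~LingerOpSketch() override { *destroyed_ = true; }

 private:
  bool* destroyed_;
};
```

Each party that may touch the op later (for example a pending callback) takes its own reference with `get()`, so the op stays alive until the last holder calls `put()`.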
Greg Farnum
aecf4e0224 mon: handle inconsistent disk states on startup.
This lets us recover from an interrupted slurp while still noticing
other corruption issues. Rather than running init() and then
update_from_paxos() on each instance, we run init() and check
consistency. If it is consistent, we update_from_paxos as before. If
it is not, we do nothing and detect the slurping state in
handle_probe_reply(). (This assumes the disk was in a slurping state.
If not, the daemon crashes because something else went horribly wrong.)

While we're at it, remove unnecessary sets of first_committed. These
are done in the call to pax->trim_to().

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-02-10 15:02:03 -08:00
Sage Weil
631650b1d6 Merge branch 'wip-encoding'
Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-02-10 14:39:44 -08:00
Sage Weil
3c5dcf8958 qa/btrfs/create_async_snap
Stupid tool to call the async snap ioctl.  Until the btrfs tool does it.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-10 14:39:12 -08:00
Sage Weil
7b5689acfc messages: populate header.version in constructor
Define a HEAD_VERSION and COMPAT_VERSION for any versioned message.  Pass
to Message constructor so that it is always initialized, even from the
default constructor.  That's needed because we use that to check
decoding compatibility when receiving/decoding messages.

If we are conditionally encoding an old version, explicitly set
header.version in encode_payload().

We also set compat_version to demonstrate what will happen for future
revisions.  In this case, it's moot, because no old code understands
compat_version yet: nobody with old decode code will see these values
anyway.  But use this opportunity to demonstrate how it would be used in
the future.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-10 14:38:13 -08:00
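
The pattern from the commit above might look roughly like the following C++ sketch. `Header`, `MExample`, and the version numbers are hypothetical and simplified; only the names `HEAD_VERSION`, `COMPAT_VERSION`, `header.version`, `compat_version`, and `encode_payload()` come from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified header; a real message header carries more fields.
struct Header {
  std::uint16_t version = 0;
  std::uint16_t compat_version = 0;
};

// Base class: the versions are passed to the constructor so the header is
// always initialized, even for a default-constructed message.
class Message {
 public:
  Message(std::uint16_t version, std::uint16_t compat_version) {
    header.version = version;
    header.compat_version = compat_version;
  }
  virtual ~Message() = default;

  Header header;
};

// Hypothetical versioned message type.
class MExample : public Message {
 public:
  static constexpr std::uint16_t HEAD_VERSION = 2;
  static constexpr std::uint16_t COMPAT_VERSION = 1;

  MExample() : Message(HEAD_VERSION, COMPAT_VERSION) {}

  // When conditionally encoding the old layout, say so in the header.
  // The boolean parameter is an assumption standing in for a feature check.
  void encode_payload(bool peer_has_new_encoding) {
    if (!peer_has_new_encoding) {
      header.version = 1;
      // ... encode the old layout here ...
    }
    // ... otherwise encode the current layout ...
  }
};
```

Because the receiver default-constructs a message before decoding, initializing the header in the constructor gives it a reliable local version to compare against.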
Greg Farnum
0bd545f5d6 mon: add a slurping flag to the Paxos state
Set it before we start slurping, and clear it when we end slurping.
This allows us to differentiate between deliberately inconsistent
disk states, and broken disk states. Run simple checks in a new
is_consistent() call.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-02-10 13:37:22 -08:00
Samuel Just
e369ec1589 ReplicatedPG: don't put the op on -EAGAIN
EAGAIN indicates that the op is waiting_for_missing or
waiting_for_degraded.

Reviewed-by: Greg Farnum <greg.farnum@dreamhost.com>
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-02-10 09:57:01 -08:00
Greg Farnum
3a7bb9999b mon: initialize paxos state in constructor
These should all be initialized in init() anyway
(except accepted_pn_from, which is set in collect and handle_collect),
but initializing them to safe defaults in the constructor provides
a safety net.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-02-10 09:16:58 -08:00
Sage Weil
8d90856a1f msg: check compat_version before decoding
If the newly constructed message's version is older than the
compat_version, don't even try to decode; just fail.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-09 22:09:05 -08:00
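
The check in the commit above reduces to a single comparison. The sketch below is an assumption-laden illustration: `can_decode` and its parameter names are hypothetical, and the real check sits inside the message decoding path rather than in a free function.

```cpp
#include <cassert>
#include <cstdint>

// Decide whether decoding should be attempted at all.
//   local_version:           header.version of the message the receiver
//                            just constructed, i.e. the newest encoding
//                            this code understands.
//   incoming_compat_version: compat_version from the received header,
//                            i.e. the oldest code able to decode it.
// A compat_version of 0 means nothing is known about compatibility; it
// passes this comparison trivially, so decoding is attempted as before.
bool can_decode(std::uint16_t local_version,
                std::uint16_t incoming_compat_version) {
  return local_version >= incoming_compat_version;
}
```

Failing early here avoids running a decoder over a layout it cannot understand.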
Sage Weil
989d6786bf msg: populate compat_version for encoded messages
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-09 22:09:05 -08:00
Sage Weil
811e6298ba msg: include compat_version in version header
header.version is the version we encoded.
header.compat_version is the oldest version of code that can decode it.

If the value is 0, we don't know anything about backward compatibility.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-09 22:09:05 -08:00
Sage Weil
5b8d0c734c new encoding for Log{Entry,Summary}
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-09 22:06:38 -08:00
Sage Weil
cb15eb8826 os: new encoding for hobject_t
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-09 22:06:38 -08:00
Sage Weil
7d85c48129 osd: new encoding for pg_create_t
There was no version encoding previously, so this is an incompatible
change.  Fortunately this type is only used in one place, MOSDPGCreate,
so we'll rev that encoding and compensate there.  All is well!

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-09 22:05:57 -08:00
Sage Weil
f9d67f1ae9 osd: new encoding for osd_stat_t
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-09 21:58:08 -08:00
Sage Weil
757e3b0587 osd: new encoding for object_locator_t
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-09 21:58:08 -08:00
Sage Weil
4c3a41f786 osd: new encoding for osd_reqid_t
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-09 21:58:08 -08:00
Sage Weil
7a68fd9fd6 osd: new ScrubMap::object encoding
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-09 21:58:08 -08:00
Sage Weil
92a058aa52 mon: set last_changed when creating new pgs
This will help us identify PGs that are stuck in creating state.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-09 21:58:08 -08:00
Sage Weil
a65586cab7 mon: set last_unstale when marking PGs stale
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-09 21:58:08 -08:00
Sage Weil
88f1fbc1ba osd: include state timestamps, mapping_epoch in pg_stat_t
Track the time when the pg state last changed (or was refreshed) in
interesting ways.

Also track the epoch when the mapping last changed (same_interval_since).

Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-09 21:58:08 -08:00
Sage Weil
00997f93c7 osd: new encoding for PG::Interval
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-09 21:58:08 -08:00
Sage Weil
9f3f119798 osd: new encoding for PG::OndiskLog
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-09 21:58:08 -08:00
Sage Weil
14d6ed4980 objectstore: new encoding for Transaction
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-09 21:58:08 -08:00
Sage Weil
fa779dba09 osd: new encoding for ScrubMap
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-02-09 21:58:08 -08:00