Commit Graph

28627 Commits

Author SHA1 Message Date
Greg Farnum
efb7ab2ae4 qa/workunits/cephtool/test.sh: test osd tier CLI
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 14:06:33 -07:00
Greg Farnum
e3fb912131 Objecter: respect read_tier & write_tier for initial op submission
We overwrite target_oloc.pool with the appropriate [read|write]_tier.
write_tier wins if it matches both.
We don't handle any sort of redirect yet.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-30 14:06:33 -07:00
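
The tier selection described in e3fb912131 can be sketched roughly as below. This is a simplified illustration, not the Objecter code: `TierInfo`, `pick_target_pool`, and the use of -1 for "no tier configured" are assumptions made for the example. Only the names read_tier and write_tier come from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-pool tier settings.
struct TierInfo {
    int64_t read_tier = -1;   // assumed convention: -1 means "no read tier"
    int64_t write_tier = -1;  // assumed convention: -1 means "no write tier"
};

// Choose the pool an op is initially submitted to.
// The base pool is kept unless a matching tier is configured.
int64_t pick_target_pool(int64_t base_pool, const TierInfo& ti,
                         bool is_read, bool is_write) {
    assert(base_pool >= 0);
    int64_t target = base_pool;
    if (is_read && ti.read_tier >= 0)
        target = ti.read_tier;
    // Checked last so that write_tier wins for ops that both read and write.
    if (is_write && ti.write_tier >= 0)
        target = ti.write_tier;
    return target;
}
```

Checking the write case last is what makes write_tier win when an op matches both; redirects are not modeled, matching the commit's note that they are not handled yet.
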
Sage Weil
4e439857a6 mon/OSDMonitor: 'osd tier cache-mode <pool> <mode>'
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-30 14:06:33 -07:00
Greg Farnum
b76953c626 Objecter: be careful about precalculated pgids
The only current user of the precalc_pgid field is list_objects. That's
fine, but we don't want new users to inadvertently appear and somehow
break the caching/tiering stuff by forcing us to go to the base pool
when we should be talking to somebody else. Add an assert to catch
these cases.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-30 14:06:33 -07:00
Greg Farnum
665acc11ac Objecter: add an Op::target_oloc and use it instead of base_oloc in send_op()
For now we simply set target_oloc = base_oloc in recalc_op_target(), but
we will shortly be doing more interesting things with it there.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-30 14:06:33 -07:00
Greg Farnum
e2fcad09d9 Objecter: rename Op::oloc -> Op::base_oloc
We want to be able to target other pools for caching and tiering, so
we need to take an oloc from the client and translate it into an
actual target. Rename oloc to base_oloc to make clear which one it is.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-30 14:06:33 -07:00
João Eduardo Luís
12c8850a7c Merge pull request #530 from ceph/wip-monc-leak
mon/MonClient: release pending outgoing messages on shutdown

Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-08-30 10:36:07 -07:00
Joao Eduardo Luis
64774e5792 os: LevelDBStore: ignore ENOENT files when estimating store size
While iterating over the store files we race against leveldb, which may
be shuffling data around and thus removing some files.

By ignoring files that go missing at stat time, we will not account for
them, but that's okay -- this is just an estimate.

Fixes: #6178

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2013-08-30 18:05:33 +01:00
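
The ENOENT handling described in 64774e5792 might look something like the sketch below. It is a standalone illustration using POSIX `stat()`, not the LevelDBStore code; the function name `estimate_size` and its return convention (negative errno on unexpected errors) are assumptions.

```cpp
#include <cerrno>
#include <cstdint>
#include <string>
#include <vector>
#include <sys/stat.h>

// Sum the sizes of the given files.
// Files that disappear between listing and stat() are skipped, since the
// result is only an estimate. Any other stat() failure is returned as -errno.
int64_t estimate_size(const std::vector<std::string>& paths) {
    int64_t total = 0;
    for (const auto& path : paths) {
        struct stat st;
        if (::stat(path.c_str(), &st) < 0) {
            if (errno == ENOENT)
                continue;  // raced with a removal; leave it out of the total
            return -errno;
        }
        total += static_cast<int64_t>(st.st_size);
    }
    return total;
}
```
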
Sage Weil
e60d4e09e9 ceph-post-file: use mktemp instead of tempfile
tempfile is a debian thing, apparently; mktemp is present everywhere.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 09:41:29 -07:00
Sage Weil
56ff4101a1 Merge pull request #559 from ceph/wip-osd-rollback
fixes a few osd dout bugs; make rados model behave with rollback

Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-08-29 16:34:42 -07:00
Sage Weil
96aaa5e3a3 ceph_test_rados: rollback bumps user_version
Sigh.  This doesn't make much intuitive sense to me, but this is how it
currently works.

Switch to using the async api while we are at it.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-29 16:08:44 -07:00
Samuel Just
42d65b0a70 PGLog: initialize writeout_from in PGLog constructor
Fixes: 6151
Backport: dumpling
Signed-off-by: Samuel Just <sam.just@inktank.com>
Introduced: f808c205c503f7d32518c91619f249466f84c4cf
Reviewed-by: Sage Weil <sage@inktank.com>
2013-08-29 15:12:44 -07:00
Sage Weil
af0a0cd74a mon/OSDMonitor: 'osd pool tier <add|remove> <pool> <tierpool>'
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-29 15:00:17 -07:00
Sage Weil
5e2c86adb0 osd/OSDMonitor: avoid polluting pending_inc on error for 'osd pool set ...'
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-29 15:00:17 -07:00
Sage Weil
ed62c457b5 osd_types: add pg_pool_t cache-related fields
We add fields sufficient to specify
* many pools have a tiering relationship with pool foo
* pool foo is a tier pool for pool bar
* the tiering relationship between foo and bar is specified
  by cache_mode
* client reads and writes for pool foo should be directed to
  pools bar and baz, respectively (where probably, but not
  necessarily, baz == bar or baz == foo).

This lets us specify very sophisticated caching policies on
the server side. All clients going forward can handle them
simply by directing messages as the read_tier and write_tier
flags, and the (not-yet-implemented) redirect replies
from OSDs, specify.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-29 15:00:17 -07:00
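
One way to picture the relationships listed in ed62c457b5 is the sketch below. It is not the pg_pool_t definition: the struct name `PoolTierInfo`, the field names `tiers` and `tier_of`, the `CacheMode` values, the -1 sentinel, and the helper `add_cache_tier` are all assumptions for illustration. Only read_tier, write_tier, and cache_mode are named in the commit message.

```cpp
#include <cstdint>
#include <set>

// Hypothetical cache modes; the commit only says a cache_mode exists.
enum class CacheMode { None, Writeback };

// Simplified stand-in for the cache-related fields of a pool.
struct PoolTierInfo {
    std::set<int64_t> tiers;   // pools that act as tiers for this pool (assumed name)
    int64_t tier_of = -1;      // pool this one is a tier for, -1 if none (assumed name)
    int64_t read_tier = -1;    // pool client reads should be sent to, -1 if none
    int64_t write_tier = -1;   // pool client writes should be sent to, -1 if none
    CacheMode cache_mode = CacheMode::None;

    bool has_tiers() const { return !tiers.empty(); }
    bool is_tier() const { return tier_of >= 0; }
};

// Hypothetical helper: place pool `cache_id` in front of pool `base_id`
// for both reads and writes.
void add_cache_tier(PoolTierInfo& base, int64_t base_id,
                    PoolTierInfo& cache, int64_t cache_id, CacheMode mode) {
    base.tiers.insert(cache_id);
    base.read_tier = cache_id;
    base.write_tier = cache_id;
    cache.tier_of = base_id;
    cache.cache_mode = mode;
}
```

Keeping read_tier and write_tier as separate fields is what allows reads and writes to be directed to different pools, as the last bullet of the commit message describes.
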
Sage Weil
4f7fce5240 osd/ReplicatedPG: drop dout from object_context_destructor_callback
We don't hold the pg lock; cannot call dout here.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-29 14:28:11 -07:00
Sage Weil
00b6a94c2d osd/ReplicatedPG: remove debug lines from snapset_context get/put
The dout() prefix does get_osdmap(), which requires (and asserts) that we
hold the pg lock, but in some cases we do not, notably
ReplicatedPG::object_context_destructor_callback.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-29 14:27:46 -07:00
Sage Weil
9cc40a52f8 Merge pull request #556 from ceph/wip-user-version
make ceph_test_rados / RadosModel validate the versions exposed by librados

Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-08-29 11:39:33 -07:00
Sylvain Munaut
7a7361d7e7 rgw: Fix S3 auth when using response-* query string params
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Sylvain Munaut <s.munaut@whatever-company.com>
2013-08-29 10:56:23 -07:00
Gary Lowell
91616ce4ef ceph.spec.in: remove trailing paren in previous commit
Signed-off-by: Gary Lowell  <gary.lowell@inktank.com>
2013-08-29 09:12:49 -07:00
Gary Lowell
b03f24173b ceph.spec.in: Don't invoke debug_package macro on centos.
If the redhat-rpm-config package is installed, the debuginfo rpms will
be built by default. The build will fail when the package is installed
and the specfile also invokes the macro.

Signed-off-by: Gary Lowell  <gary.lowell@inktank.com>
2013-08-29 09:12:26 -07:00
Yehuda Sadeh
02659cd522 Merge pull request #361 from atwardowski/patch-1
Update adminops.rst add capabilities

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2013-08-28 17:54:26 -07:00
Sage Weil
e20d1f8e9b ceph_test_rados: validate user_version
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-28 17:05:10 -07:00
Sage Weil
c8dcd2ea71 osd/ReplicatedPG: set version, user_version correctly on reads
Set the user version to the *current* object version, not the version
we would use if we were to modify it.  We move the assignments inside
the reply (read or error) block to make it more obvious which paths
are possible.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-28 17:05:10 -07:00
Sage Weil
9374dc8bf3 messages/MOSDOpReply: fix user_version in reply (add missing braces)
Presumably a mismerge somewhere back around
de20997445803dca4225ed0dac1bad6a8a1e6512.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-28 17:05:10 -07:00
Sage Weil
985a1405db librados: add get_version64()
The C++ AioCompletion::get_version() method only returns 32-bits.  Sigh.

Add a get_version64() method that returns all 64-bits. Do not touch the
32-bit version to avoid breaking the ABI.

Backport: dumpling, cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-28 17:05:00 -07:00
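
The approach in 985a1405db, adding a wider accessor next to the narrow one instead of changing it, can be illustrated with the toy class below. `Completion` is a hypothetical stand-in and not the librados AioCompletion class; the 32-bit return type is written as `uint32_t` here so that the truncation is well defined.

```cpp
#include <cstdint>

// Hypothetical stand-in for a completion object that carries a 64-bit version.
class Completion {
    uint64_t version_;

public:
    explicit Completion(uint64_t v) : version_(v) {}

    // Existing accessor: its signature is left alone so that already
    // compiled callers keep working. It only exposes the low 32 bits.
    uint32_t get_version() const { return static_cast<uint32_t>(version_); }

    // New accessor: returns the full 64-bit value.
    uint64_t get_version64() const { return version_; }
};
```

A caller holding a version above 2^32 would see a truncated value from `get_version()` and the full value from `get_version64()`.
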
Joao Eduardo Luis
7e722245a7 rbd.cc: propagate some errors to user-space when they're available
There were a number of situations in which we had a proper error to
propagate to user-space but would always return '1' (EXIT_FAILURE).

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2013-08-29 00:48:34 +01:00
Joao Eduardo Luis
b2b0f202ea qa: workunits: mon: test snaps ops using rbd.
Regression test for #6047

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2013-08-29 00:48:34 +01:00
Joao Eduardo Luis
0e85074402 mon: OSDMonitor: return earlier on no-ops over currently committed state
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2013-08-29 00:48:31 +01:00
Joao Eduardo Luis
274b4b96cf mon: OSDMonitor: don't propose on prepare_pool_op()
Except in very special cases, we should let PaxosService take its course
and trigger the proposals itself.  In this case, we were proposing right
before returning to PaxosService, and we were returning false on top of it
(most likely to guarantee that PaxosService wouldn't try to propose).

This doesn't make much sense, so let's do it like all the other cool kids
are doing and let PaxosService decide what's best for us.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2013-08-29 00:48:20 +01:00
Joao Eduardo Luis
fab79543c5 mon: OSDMonitor: check if pool is on unmanaged snaps mode on mk/rmsnap
Backport: dumpling
Fixes: #6047

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2013-08-29 00:48:17 +01:00
athanatos
3e63c1a4af Merge pull request #550 from ceph/wip-6040
Wip 6040

Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Loic Dachary <loic@dachary.com>
2013-08-28 14:10:37 -07:00
Samuel Just
f808c205c5 PGLog: maintain writeout_from and trimmed
This way, we can avoid omap_rmkeyrange in the common append
and trim cases.

Fixes: #6040
Backport: Dumpling
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-08-28 13:18:11 -07:00
Sage Weil
fd3fd59698 doc/release-notes: v0.56.6 and .7 bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-28 10:39:11 -07:00
Sage Weil
cb2abad901 Merge pull request #539 from dachary/master
doc : erasure code developer notes updates
2013-08-28 10:29:17 -07:00
João Eduardo Luís
f271a73ca5 Merge pull request #552 from ceph/wip-4924-master
mon: discover mon addrs, names during election state too

Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-08-28 10:08:31 -07:00
Sage Weil
c240285700 mon: discover mon addrs, names during election state too
Currently we only detect new mon addrs and names during the probing phase.
For non-trivial clusters, this means we can get into a sticky spot when
we discover enough peers to form a quorum, but not all of them, and the
undiscovered ones are enough to break the mon ranks and prevent an
election.

One way to work around this is to continue addr and name discovery during
the election.  We should also consider making the ranks less sensitive to
the undefined addrs; that is a separate change.

Fixes: #4924
Backport: dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
Tested-by: Bernhard Glomm <bernhard.glomm@ecologic.eu>
2013-08-28 09:50:11 -07:00
Sage Weil
61b40f481b doc/dev/cache-pool: document cache pool management interface
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-28 09:34:03 -07:00
Sage Weil
b91c1c52c7 add CEPH_FEATURE_OSD_CACHEPOOL
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-28 09:33:59 -07:00
Gregory Farnum
be9a39b766 Merge pull request #549 from ceph/wip-6029
Make user_version a first-class citizen
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Just <sam.just@inktank.com>
2013-08-28 09:15:36 -07:00
Samuel Just
1c0d75db10 PGLog: don't maintain log_keys_debug if the config is disabled
Fixes: #6040
Backport: Dumpling
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-08-27 18:45:02 -07:00
Samuel Just
fe68b15a3d PGLog: move the log size check after the early return
There really are stl implementations (like the one on my ubuntu 12.04
machine) which have a list::size() which is linear in the size of the
list.  That assert, therefore, is quite expensive!

Fixes: #6040
Backport: Dumpling
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-08-27 18:44:45 -07:00
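
The reordering described in fe68b15a3d can be sketched as follows. This is not the PGLog code: `trim_front` and the separately tracked `cached_count` are hypothetical. The point is only that a consistency check calling `std::list::size()` sits after the cheap early return. C++11 requires `size()` to be constant time, but older standard library implementations, such as the one the commit mentions, walk the whole list.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Remove up to n entries from the front of the log and return the
// updated entry count. cached_count is a counter the caller maintains.
std::size_t trim_front(std::list<int>& log, std::size_t cached_count,
                       std::size_t n) {
    // Early return first: the common "nothing to do" case pays nothing.
    // empty() is constant time even where size() is not.
    if (n == 0 || log.empty())
        return cached_count;

    // The potentially linear-time check only runs when there is work to do.
    assert(log.size() == cached_count);

    while (n > 0 && !log.empty()) {
        log.pop_front();
        --cached_count;
        --n;
    }
    return cached_count;
}
```
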
Greg Farnum
9101433a88 Merge remote-tracking branch 'origin/master' into wip-6029
Conflicts:
	src/librados/AioCompletionImpl.h
2013-08-27 17:26:36 -07:00
Greg Farnum
6c432f1932 doc: update to describe new OSD version support as it actually exists
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:51 -07:00
Greg Farnum
c119afa075 ReplicatedPG: add OpContext::user_at_version
Set this up with the existing at_version member, but only increase
it for user_modify ops. Use this when logging the PG's user_version. In
order to maintain compatibility with old clients on classic pools, we
force user_version to follow at_version whenever it's updated.

Now that we have and are maintaining this PG user version, use it
for the user version on ops that get ENOENT back, when short-circuiting
replies as part of reply_op_error()[1], or when replying to repops
in eval_repop; further use it for the cls_current_version() function. This
is a small semantic change for that function, as previously it would
generally return the same value as the user would get sent back via
MOSDOpReply -- but I don't think it was something you could count on.
We now define it as being the user version of the PG at the start of the
op, and as a bonus it is defined even for read ops (the at_version is
only filled in on write operations).

[1]: We tweak PGLog to make it easier to retrieve both user and PG versions.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:50 -07:00
Greg Farnum
7db71fc270 MOSDOpReply: stop filling in replay_version from the MOSDOp to begin with
It's just asking for trouble.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:50 -07:00
Greg Farnum
2c05b4fea2 MOSDOpReply: switch to comprehensive instead of individual version setters
There's little point to updating versions individually when we can
do so en masse and avoid mistakes in duplication.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:50 -07:00
Greg Farnum
de20997445 MOSDOpReply: add enough fields to be backwards compatible.
The system we've been building up works out very nicely for new clients,
but they could not have interoperated with old clients that were only
referring to our replay_version. In order to deal with this, we add
a bad_replay_version to MOSDOpReply which is encoded where we used
to encode replay_version. bad_replay_version will follow the same semantics
as reassert_version used to (except that it is filled in on reads), but
is not accessible to new clients, who can see only our properly-controlled
replay_version and user_version. This will let old and new clients
interoperate correctly when communicating about watches, etc.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:50 -07:00
Greg Farnum
dc9d3fc357 osd: actually fill in user_version in pg_log_entry_t
We now require it when creating a pg_log_entry_t. The user_version
is the version which info.last_user_version should be set to
after the transaction is applied, which for everything except for
a user-modify op is going to be the version it was already at.
For now we are filling in the user-modify op's changing user_version
to be ctx->at_version.version

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:50 -07:00
Greg Farnum
cc1c4a752f osd: add last_user_version to pg_info_t
We add a corresponding user_version to pg_log_entry_t, and the logic
to assign from one to the other and to recover last_user_version from
a master's log. We aren't yet setting it to anything, though.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:50 -07:00