Commit Graph

28506 Commits

Author SHA1 Message Date
Sage Weil
00b6a94c2d osd/ReplicatedPG: remove debug lines from snapset_context get/put
The dout() prefix does get_osdmap(), which requires (and asserts) that we
hold the pg lock, but in some cases we do not, notably
ReplicatedPG::object_context_destructor_callback.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-29 14:27:46 -07:00
Sage Weil
9cc40a52f8 Merge pull request #556 from ceph/wip-user-version
make ceph_test_rados / RadosModel validate the versions exposed by librados

Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-08-29 11:39:33 -07:00
Sylvain Munaut
7a7361d7e7 rgw: Fix S3 auth when using response-* query string params
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Sylvain Munaut <s.munaut@whatever-company.com>
2013-08-29 10:56:23 -07:00
Gary Lowell
91616ce4ef ceph.spec.in: remove trailing paren in previous commit
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
2013-08-29 09:12:49 -07:00
Gary Lowell
b03f24173b ceph.spec.in: Don't invoke debug_package macro on centos.
If the redhat-rpm-config package is installed, the debuginfo rpms will
be built by default. The build will fail when the package is installed
and the specfile also invokes the macro.

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
2013-08-29 09:12:26 -07:00
Yehuda Sadeh
02659cd522 Merge pull request #361 from atwardowski/patch-1
Update adminops.rst add capabilities

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2013-08-28 17:54:26 -07:00
Sage Weil
e20d1f8e9b ceph_test_rados: validate user_version
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-28 17:05:10 -07:00
Sage Weil
c8dcd2ea71 osd/ReplicatedPG: set version, user_version correctly on reads
Set the user version to the *current* object version, not the version
we would use if we were to modify it.  We move the assignments inside
the reply (read or error) block to make it more obvious which paths
are possible.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-28 17:05:10 -07:00
Sage Weil
9374dc8bf3 messages/MOSDOpReply: fix user_version in reply (add missing braces)
Presumably a mismerge somewhere back around
de20997445.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-28 17:05:10 -07:00
Sage Weil
985a1405db librados: add get_version64()
The C++ AioCompletion::get_version() method only returns 32 bits.  Sigh.

Add a get_version64() method that returns all 64 bits. Do not touch the
32-bit version to avoid breaking the ABI.

Backport: dumpling, cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-28 17:05:00 -07:00
athanatos
3e63c1a4af Merge pull request #550 from ceph/wip-6040
Wip 6040

Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Loic Dachary <loic@dachary.com>
2013-08-28 14:10:37 -07:00
Samuel Just
f808c205c5 PGLog: maintain writeout_from and trimmed
This way, we can avoid omap_rmkeyrange in the common append
and trim cases.

Fixes: #6040
Backport: Dumpling
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-08-28 13:18:11 -07:00
Sage Weil
fd3fd59698 doc/release-notes: v0.56.6 and .7 bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-28 10:39:11 -07:00
Sage Weil
cb2abad901 Merge pull request #539 from dachary/master
doc : erasure code developer notes updates
2013-08-28 10:29:17 -07:00
João Eduardo Luís
f271a73ca5 Merge pull request #552 from ceph/wip-4924-master
mon: discover mon addrs, names during election state too

Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-08-28 10:08:31 -07:00
Sage Weil
c240285700 mon: discover mon addrs, names during election state too
Currently we only detect new mon addrs and names during the probing phase.
For non-trivial clusters, this means we can get into a sticky spot when
we discover enough peers to form a quorum, but not all of them, and the
undiscovered ones are enough to break the mon ranks and prevent an
election.

One way to work around this is to continue addr and name discovery during
the election.  We should also consider making the ranks less sensitive to
the undefined addrs; that is a separate change.

Fixes: #4924
Backport: dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
Tested-by: Bernhard Glomm <bernhard.glomm@ecologic.eu>
2013-08-28 09:50:11 -07:00
Sage Weil
61b40f481b doc/dev/cache-pool: document cache pool management interface
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-28 09:34:03 -07:00
Sage Weil
b91c1c52c7 add CEPH_FEATURE_OSD_CACHEPOOL
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-28 09:33:59 -07:00
Gregory Farnum
be9a39b766 Merge pull request #549 from ceph/wip-6029
Make user_version a first-class citizen
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Just <sam.just@inktank.com>
2013-08-28 09:15:36 -07:00
Samuel Just
1c0d75db10 PGLog: don't maintain log_keys_debug if the config is disabled
Fixes: #6040
Backport: Dumpling
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-08-27 18:45:02 -07:00
Samuel Just
fe68b15a3d PGLog: move the log size check after the early return
There really are STL implementations (like the one on my Ubuntu 12.04
machine) which have a list::size() which is linear in the size of the
list.  That assert, therefore, is quite expensive!

Fixes: #6040
Backport: Dumpling
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-08-27 18:44:45 -07:00
Greg Farnum
9101433a88 Merge remote-tracking branch 'origin/master' into wip-6029
Conflicts:
	src/librados/AioCompletionImpl.h
2013-08-27 17:26:36 -07:00
Greg Farnum
6c432f1932 doc: update to describe new OSD version support as it actually exists
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:51 -07:00
Greg Farnum
c119afa075 ReplicatedPG: add OpContext::user_at_version
Set this up with the existing at_version member, but only increase
it for user_modify ops. Use this when logging the PG's user_version. In
order to maintain compatibility with old clients on classic pools, we
force user_version to follow at_version whenever it's updated.

Now that we have and are maintaining this PG user version, use it
for the user version on ops that get ENOENT back, when short-circuiting
replies as part of reply_op_error()[1], or when replying to repops
in eval_repop; further use it for the cls_current_version() function. This
is a small semantic change for that function, as previously it would
generally return the same value as the user would get sent back via
MOSDOpReply -- but I don't think it was something you could count on.
We now define it as being the user version of the PG at the start of the
op, and as a bonus it is defined even for read ops (the at_version is
only filled in on write operations).

[1]: We tweak PGLog to make it easier to retrieve both user and PG versions.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:50 -07:00
Greg Farnum
7db71fc270 MOSDOpReply: stop filling in replay_version from the MOSDOp to begin with
It's just asking for trouble.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:50 -07:00
Greg Farnum
2c05b4fea2 MOSDOpReply: switch to comprehensive instead of individual version setters
There's little point to updating versions individually when we can
do so en masse and avoid mistakes in duplication.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:50 -07:00
Greg Farnum
de20997445 MOSDOpReply: add enough fields to be backwards compatible.
The system we've been building up works out very nicely for new clients,
but they could not have interoperated with old clients that were only
referring to our replay_version. In order to deal with this, we add
a bad_replay_version to MOSDOpReply which is encoded where we used
to encode replay_version. bad_replay_version will follow the same semantics
as reassert_version used to (except that it is filled in on reads), but
is not accessible to new clients, who can see only our properly-controlled
replay_version and user_version. This will let old and new clients
interoperate correctly when communicating about watches, etc.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:50 -07:00
Greg Farnum
dc9d3fc357 osd: actually fill in user_version in pg_log_entry_t
We now require it when creating a pg_log_entry_t. The user_version
is the version which info.last_user_version should be set to
after the transaction is applied, which for everything except for
a user-modify op is going to be the version it was already at.
For now we are filling in the user-modify op's changing user_version
to be ctx->at_version.version.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:50 -07:00
Greg Farnum
cc1c4a752f osd: add last_user_version to pg_info_t
We add a corresponding user_version to pg_log_entry_t, and the logic
to assign from one to the other and to recover last_user_version from
a master's log. We aren't yet setting it to anything, though.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:50 -07:00
Greg Farnum
69280e2aeb ReplicatedPG: remove OpContext::reply_user_version
ctx->new_obs.oi.user_version is initialized to ctx->obs.oi.user_version,
and for read ops it won't be changed. That means
reply_user_version == ctx->new_obs.oi.user_version in all cases, which
means we don't want it.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:50 -07:00
Greg Farnum
2e764a8183 osd: switch object_info_t::user_version to be a version_t
We never expose the full eversion_t data to users, and do not want to.
However, we pull some tricks in the encode/decode functions to avoid
having to change the object_info_t disk format for this change.
When we can break compatibility, we should simplify this.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:50 -07:00
Greg Farnum
57e346b169 ReplicatedPG: Fill in the MOSDOpReply's user_version
As part of this, rename OpContext::reply_version->reply_user_version.
The semantics that necessitate the reply_version are only for user versions,
so rename it for clarity. Then use the reply_user_version in
set_user_version() (if the op succeeded).
For now we use the PG version for ENOENT (preserving the previous
semantics), but that will get changed to the pg's user_version soon
as well.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:50 -07:00
Greg Farnum
9b998a960a ReplicatedPG: set the replay version based on the at_version
The replay version is not for users to consume, so we don't want
to use the user_version for it.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:49 -07:00
Greg Farnum
e42ef0c079 Objecter: expose MOSDOp's new user_version instead of the replay_version
We don't want users to ever see the replay_version, which is about
to become private RADOS data.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:49 -07:00
Greg Farnum
ff1a573025 Objecter: librados: mass switch from eversion_t to version_t
There are a lot of pointers throughout our request infrastructure used solely
for exporting the version to users. The interfaces we actually expose only
provide a uint64_t (leaving off eversion_t's epoch), and that's all we're
going to maintain in our new user_version scheme, so don't pretend we'll
have more in our internal interfaces.

I audited this pretty carefully; in particular:
- Op::objver is only used for passing data back to users via the calling
  functions IoCtxImpl::last_objver, etc.
- IoCtxImpl::last_objver is used only for the set_sync_op_version() call, which
  provides data only for the uint64_t get_last_version() and
  rados_get_last_version() calls.
- AioCompletionImpl::objver is used only for the uint64_t get_version() call.
- LingerOp::pobjver is used only for referencing things that are now version_t.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:49 -07:00
Greg Farnum
931bf7e8a8 Objecter: rename Op::version to Op::replay_version
This is used for replay, so let's be more precise!

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:49 -07:00
Greg Farnum
17e32f9506 MOSDOpReply: add user_version field
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:24:49 -07:00
Greg Farnum
295a84b9d9 doc: include plan for new user_version support
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:00:44 -07:00
Greg Farnum
1f608bd967 ReplicatedPG: do not do a redundant set of ctx->new_obs.oi.version
We set this in the if below for writes, and for reads it doesn't need to
be updated (and isn't). Remove the confusing double-set so future code
inspectors don't get concerned there's a bug like I did.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:00:44 -07:00
Greg Farnum
37bba41be6 ReplicatedPG: remove long-dead branch
This was confusing the heck out of me when trying to figure out
why I was hitting an assert. So replace the if-else block with
a more appropriate assert and don't include any misleading calls
to prepare_transaction() from sub_op_modify().

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:00:44 -07:00
Greg Farnum
f400816471 MOSDOpReply: rename *_version() -> *_replay_version()
We have been returning the object's "user version" and using that
for replay, but that is in fact incorrect. In preparation for fixing
up the user version semantics, rename get_version to get_replay_version
and set_version to set_replay_version.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 17:00:38 -07:00
Greg Farnum
7a7ae60851 MOSDOpReply: rename reassert_version -> replay_version
Because that's what it's for. reassert_version is a bit ambiguous.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 16:56:40 -07:00
Greg Farnum
b5ea74cec4 docs: document how the current OSD PG/object versions work
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-27 15:08:28 -07:00
Sage Weil
ec297ec660 Merge pull request #548 from dmick/next
ceph.in: add to $PATH if needed regardless of LD_LIBRARY_PATH state

Reviewed-by: Sage Weil <sage@inktank.com>
2013-08-27 14:02:26 -07:00
Dan Mick
37850e1be6 ceph.in: add to $PATH if needed regardless of LD_LIBRARY_PATH state
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-08-27 13:40:23 -07:00
athanatos
7cc2eb246d Merge pull request #545 from dachary/wip-6117
SharedPtrRegistry: get_next must not delete while holding the lock

Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-08-27 10:56:49 -07:00
John Wilkins
3266862491 doc: Updated to accurately reflect that upstart applies to a single node.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-08-27 10:25:50 -07:00
Gary Lowell
8df504c157 ceph.spec.in: radosgw package doesn't require mod_fcgi
Fixes: #5702

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
2013-08-27 09:53:12 -07:00
Sage Weil
a10ca4b5e0 librbd: fix debug print in aio_write
Reported-by: James Harper <james.harper@bendigoit.com.au>
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-27 08:30:50 -07:00
Roald J. van Loon
228510ff17 cleanup: removed last references to g_conf from auth
Trivial cleanup. There were still 3 references to g_conf in CephxKeyServer.
Replaced them in favor of cct->_conf.

Signed-off-by: Roald J. van Loon <roaldvanloon@gmail.com>
2013-08-27 08:17:19 -07:00