28506 Commits

Author SHA1 Message Date
Sage Weil
99793d970b osd/ReplicatedPG: do not log a user_version on the snapdir object
Signed-off-by: Sage Weil <sage@inktank.com>
2013-09-01 08:42:57 -07:00
Sage Weil
72c6c30255 osd/ReplicatedPG: log previous user_version on clone
Nothing relies on this, but it makes sense to me.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-09-01 08:42:57 -07:00
Sage Weil
cc8e901138 osd/ReplicatedPG: do not log user_version on deletion events
Or snap trim events where we are adjusting the head's snapdir attr.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-09-01 08:42:57 -07:00
Sage Weil
7d72e559b1 osd/PG: only raise PG's last_user_version if entry is >
We may have pg entries that do not increase the user_version at all (i.e.,
they may be 0).  Do not update the last_user_version in that case as we
need it to remain an upper bound.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-09-01 08:42:56 -07:00
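
The rule described in 7d72e559b1 can be sketched as a guarded update. This is only an illustration: `LogEntry`, `PGInfo`, and `note_entry` are hypothetical stand-ins, not the actual OSD types or functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a pg log entry.
struct LogEntry {
  uint64_t user_version = 0;  // 0: the entry did not bump the user version
};

// Hypothetical stand-in for the per-PG bookkeeping.
struct PGInfo {
  uint64_t last_user_version = 0;

  // Raise the bound only when the entry carries a strictly larger value,
  // so entries with user_version == 0 never pull it backwards.
  void note_entry(const LogEntry& e) {
    if (e.user_version > last_user_version)
      last_user_version = e.user_version;
    assert(last_user_version >= e.user_version);  // still an upper bound
  }
};
```

With this guard, feeding an entry whose `user_version` is 0 after one whose `user_version` is 5 leaves `last_user_version` at 5.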
Sage Weil
1610768d4a osd: debug user_versions a bit
Signed-off-by: Sage Weil <sage@inktank.com>
2013-09-01 08:42:56 -07:00
Sage Weil
1c5e58a85e osdc/Objecter: fix dereference of NULL pg_pool_t
Make sure we don't dereference a NULL pointer.  Note that we check a
bit further down if the target pool does not exist and return the proper
error.

Bug was reliably reproduced by

 ./ceph_test_rados_api_watch_notify --gtest_filter=LibRadosWatchNotify.WatchNotifyTimeoutTestPP

Signed-off-by: Sage Weil <sage@inktank.com>
2013-09-01 08:42:41 -07:00
Roald J. van Loon
a200e184b1 Validate S3 tokens against Keystone
- Added config option to allow S3 to use Keystone auth
- Implemented JSONDecoder for KeystoneToken
- RGW_Auth_S3::authorize now uses rgw_store_user_info on keystone auth
- Minor fix in get_canon_resource; dout is now after the assignment

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Roald J. van Loon <roaldvanloon@gmail.com>
2013-08-31 17:43:26 -07:00
Sage Weil
9636722a67 Merge pull request #561 from ceph/wip-6178
os: LevelDBStore: ignore ENOENT files when estimating store size

Reviewed-by: Sage Weil <sage@inktank.com>
2013-08-31 16:46:52 -07:00
Sage Weil
c7f2def8aa Merge branch 'next'
2013-08-31 10:31:31 -07:00
Roald J. van Loon
e48d6cb402 mon: fix uninitialized Op field
- Uninitialized field in MonitorLevelDB::Op causes random build errors.

Signed-off-by: Roald J. van Loon <roaldvanloon@gmail.com>
2013-08-31 10:30:30 -07:00
Roald J. van Loon
a5d815d233 automake cleanup: uninitialized version_t
This sometimes gives a completely random uint64_t value, because it is
potentially used uninitialized.

Signed-off-by: Roald J. van Loon <roaldvanloon@gmail.com>
2013-08-31 10:28:19 -07:00
Sage Weil
46279327cb Merge pull request #541 from ceph/wip-6036
osd objecter; copy-get

Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-08-30 17:02:49 -07:00
Sage Weil
4f6c6b2d74 osd/ReplicatedPG: do not requeue if not primary
This saves us a bit of work, since we will discard the op anyway if
we aren't primary (or even if we become primary again before we get to
it).

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 16:57:25 -07:00
Sage Weil
b0a30a55eb osd: COPY_GET operation
Add new rados operation to copy all user-visible content for an object
in a simple, safe way.  Use a new object_copy_cursor_t to keep track of
our position.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 16:57:25 -07:00
Sage Weil
8d74f417ea osd/ReplicatedPG: factor {execute,reply}_ctx() out of do_op()
Separate the processing of an OpContext from the preamble and
allocation, so that we can delay the execution for some ops (like the
COPYFROM operation we're about to add).

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 16:57:25 -07:00
Sage Weil
bc99437ef6 osd: feed OSDMaps to the Objecter
Feed every map message we see (that isn't discarded for some other
reason) to the Objecter.  It has the same continuity requirements that
the OSD has, so it should be satisfied with what we get.  It can also
request maps via our MonClient.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 16:54:32 -07:00
Sage Weil
34709447e1 osd: add an Objecter instance
It gets its own lock, timer, and osdmap.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 16:54:32 -07:00
Sage Weil
a6b04c5d8b osd: discriminate based on connection messenger, not peer type
Replace ->get_source().is_osd() checks and instead see if it is the
cluster_messenger so that we do not confuse ourselves when we get
legit requests from other OSDs on our public interface.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 16:54:32 -07:00
Sage Weil
20b25c6f65 ceph-osd: rename msgr vars
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 16:54:17 -07:00
Sage Weil
a1dd98d7ca osd: add a separate messenger for the Objecter
We will give the OSD's Objecter its own messenger so that it does not
interfere with the OSD when it marks things up or down.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 16:54:16 -07:00
Sage Weil
ea61abad91 osd/ReplicatedPG: add whitespace
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 16:54:16 -07:00
Sage Weil
0712d958eb osd: less whitespace
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 16:54:16 -07:00
Sage Weil
36d6e6fa40 osdc/Objecter: allow ops to be canceled
This is useful in general, and specifically will be useful for the
rados COPY operation.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 16:54:16 -07:00
Sage Weil
42b3d55ddb osdc/Objecter: only request map on startup if epoch == 0
Normal clients have no map and need one to get started.  If we are the
OSD, we will already have one and will get fed maps as they come in.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 16:54:16 -07:00
Sage Weil
c6d0b10ed7 osd, objecter: clean up assert_ver()
Create a separate union in the args and clean up the code a bit so that
this doesn't reuse the (unrelated) watch helpers.  No change in
protocol.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 16:54:16 -07:00
Sage Weil
8ba50c0e95 osd/ReplicatedPG: drop src_obc.clear() calls
These are all about to go out of scope; no need to clear them
explicitly.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 16:54:16 -07:00
Sage Weil
6473060e69 os/ObjectStore: add bufferlist variant of setattrs
And hopefully we can kill the bufferptr ones someday!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 16:54:15 -07:00
David Zafman
7ec0b4fb78 unittest_lfnindex testing older HASH_INDEX_TAG
Switch to work with new HOBJECT_WITH_POOL

fixes: #6196

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-08-30 16:18:45 -07:00
Sage Weil
8a65ae8e17 doc/rados/operations/pools: remove experimental note about pg splitting
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 15:41:02 -07:00
Sage Weil
b882aa2ace Merge pull request #560 from ceph/wip-6032-cache-objecter
Wip 6032 cache objecter

Reviewed-by: Sage Weil <sage@inktank.com>
2013-08-30 15:24:41 -07:00
Gregory Farnum
b30a1b2889 Merge pull request #554 from ceph/wip-tier-interface
Specify a user and pg_pool_t interface for tiering/caching specifications

Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-08-30 14:13:25 -07:00
Greg Farnum
13aac48f25 workunits: add a test for caching redirects
This may need to change since it exploits some of the loose
consistency we currently have with caching pools, but for now
it checks that the Objecter does what we want.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-30 14:06:33 -07:00
Greg Farnum
3516996bb3 mon/OSDMonitor: 'osd tier {set,remove}-overlay <pool> [tierpool]'
Also prevent 'osd tier remove ...' if the tierpool is the current overlay.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 14:06:33 -07:00
Greg Farnum
dae9a34b4f osd_types: note that write_tier wins if read_tier is different
For pg_pool_t.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-30 14:06:33 -07:00
Greg Farnum
efb7ab2ae4 qa/workunits/cephtool/test.sh: test osd tier CLI
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 14:06:33 -07:00
Greg Farnum
e3fb912131 Objecter: respect read_tier & write_tier for initial op submission
We overwrite target_oloc.pool with the appropriate [read|write]_tier.
write_tier wins if it matches both.
We don't handle any sort of redirect yet.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-30 14:06:33 -07:00
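
A minimal sketch of the target selection described in e3fb912131. It reads "matches both" as an op that is both a read and a write, which is an assumption. `PoolTiers` and `pick_target_pool` are hypothetical names, and using -1 for "no tier configured" is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of the per-pool tiering fields.
struct PoolTiers {
  int64_t read_tier = -1;   // -1: no read tier configured
  int64_t write_tier = -1;  // -1: no write tier configured
};

// Returns the pool an op should be sent to, given the pool the client named.
int64_t pick_target_pool(int64_t base_pool, const PoolTiers& p,
                         bool is_read, bool is_write) {
  assert(is_read || is_write);
  int64_t target = base_pool;
  if (is_read && p.read_tier >= 0)
    target = p.read_tier;
  if (is_write && p.write_tier >= 0)
    target = p.write_tier;  // applied last, so it wins for read+write ops
  return target;
}
```

Applying the write check last is what gives write_tier precedence. No redirect handling is modeled here, matching the commit's note that redirects are not handled yet.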
Sage Weil
4e439857a6 mon/OSDMonitor: 'osd tier cache-mode <pool> <mode>'
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-30 14:06:33 -07:00
Greg Farnum
b76953c626 Objecter: be careful about precalculated pgids
The only current user of the precalc_pgid field is list_objects. That's
fine, but we don't want new users to inadvertently appear and somehow
break the caching/tiering stuff by forcing us to go to the base pool
when we should be talking to somebody else. Add an assert to catch
these cases.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-30 14:06:33 -07:00
Greg Farnum
665acc11ac Objecter: add an Op::target_oloc and use it instead of base_oloc in send_op()
For now we simply set target_oloc = base_oloc in recalc_op_target(), but
we will shortly be doing more interesting things with it there.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-30 14:06:33 -07:00
Greg Farnum
e2fcad09d9 Objecter: rename Op::oloc -> Op::base_oloc
We want to be able to target other pools for caching and tiering, so
we need to take an oloc from the client and translate it into an
actual target. Rename oloc to base_oloc to make clear which one it is.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-30 14:06:33 -07:00
João Eduardo Luís
12c8850a7c Merge pull request #530 from ceph/wip-monc-leak
mon/MonClient: release pending outgoing messages on shutdown

Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-08-30 10:36:07 -07:00
Joao Eduardo Luis
64774e5792 os: LevelDBStore: ignore ENOENT files when estimating store size
While iterating over the store files we race against leveldb, which may
be shuffling data around and thus removing some files.

By ignoring files that are missing at stat time, we will not account for
them, but that's okay -- this is just an estimate.

Fixes: #6178

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2013-08-30 18:05:33 +01:00
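
The approach in 64774e5792 can be sketched with plain POSIX `stat()`. This is not the LevelDBStore code: `estimate_size` is a hypothetical helper, and returning `-errno` for other failures is an assumption of this sketch.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <string>
#include <vector>
#include <sys/stat.h>

// Sums the sizes of the given files. A file that vanished between listing
// and stat() (ENOENT) is skipped; any other error is returned as -errno.
int64_t estimate_size(const std::vector<std::string>& paths) {
  int64_t total = 0;
  for (const std::string& path : paths) {
    struct stat st;
    if (::stat(path.c_str(), &st) < 0) {
      if (errno == ENOENT)
        continue;  // removed underneath us; acceptable for an estimate
      return -errno;
    }
    total += st.st_size;
  }
  return total;
}
```

Skipping only ENOENT keeps genuine failures, such as permission errors, visible to the caller, while a file that disappeared mid-scan just lowers the estimate slightly.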
Sage Weil
e60d4e09e9 ceph-post-file: use mktemp instead of tempfile
tempfile is a Debian thing, apparently; mktemp is present everywhere.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-30 09:41:29 -07:00
Sage Weil
56ff4101a1 Merge pull request #559 from ceph/wip-osd-rollback
fixes a few osd dout bugs; make rados model behave with rollback

Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-08-29 16:34:42 -07:00
Sage Weil
96aaa5e3a3 ceph_test_rados: rollback bumps user_version
Sigh.  This doesn't make much intuitive sense to me, but this is how it
currently works.

Switch to using the async api while we are at it.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-29 16:08:44 -07:00
Samuel Just
42d65b0a70 PGLog: initialize writeout_from in PGLog constructor
Fixes: 6151
Backport: dumpling
Signed-off-by: Samuel Just <sam.just@inktank.com>
Introduced: f808c205c5
Reviewed-by: Sage Weil <sage@inktank.com>
2013-08-29 15:12:44 -07:00
Sage Weil
af0a0cd74a mon/OSDMonitor: 'osd pool tier <add|remove> <pool> <tierpool>'
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-29 15:00:17 -07:00
Sage Weil
5e2c86adb0 osd/OSDMonitor: avoid polluting pending_inc on error for 'osd pool set ...'
Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-29 15:00:17 -07:00
Sage Weil
ed62c457b5 osd_types: add pg_pool_t cache-related fields
We add fields sufficient to specify
* many pools have a tiering relationship with pool foo
* pool foo is a tier pool for pool bar
* the tiering relationship between foo and bar is specified
  by cache_mode
* client reads and writes for pool foo should be directed to
  pools bar and baz, respectively (where probably, but not
  necessarily, baz == bar or baz == foo).

This lets us specify very sophisticated caching policies on
the server side that all clients going forward can handle
simply by directing the messages as the read_tier and write_tier
flags, and the (not-yet-implemented) redirect replies
from OSDs, specify.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-08-29 15:00:17 -07:00
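
A sketch of how the relationships listed in ed62c457b5 might be held in a pool record. Only `read_tier`, `write_tier`, and `cache_mode` are named in the commit message; `PoolTierFields`, `tiers`, `tier_of`, `add_tier`, the -1 sentinel, and the string-valued `cache_mode` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical, simplified subset of a pool record.
struct PoolTierFields {
  std::set<uint64_t> tiers;         // pools that are tiers of this pool
  int64_t tier_of = -1;             // pool this one is a tier for, or -1
  int64_t read_tier = -1;           // where client reads are directed, or -1
  int64_t write_tier = -1;          // where client writes are directed, or -1
  std::string cache_mode = "none";  // how a tier relates to its base pool

  bool is_tier() const { return tier_of >= 0; }
  bool has_tiers() const { return !tiers.empty(); }
};

// Records on both sides that pool `tier_id` is a tier of pool `base_id`.
void add_tier(uint64_t base_id, PoolTierFields& base,
              uint64_t tier_id, PoolTierFields& tier) {
  assert(!tier.is_tier());  // in this sketch a pool has at most one base
  base.tiers.insert(tier_id);
  tier.tier_of = static_cast<int64_t>(base_id);
}
```

Keeping the relationship on both records lets either side answer "am I a tier?" or "do I have tiers?" without scanning every pool.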
Sage Weil
4f7fce5240 osd/ReplicatedPG: drop dout from object_context_destructor_callback
We don't hold the pg lock; cannot call dout here.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-08-29 14:28:11 -07:00