Commit Graph

29598 Commits

Author SHA1 Message Date
Sage Weil
ca86656e74 osd/ReplicatedPG: use finish_ctx for finish_promote
Use the common code here to avoid duplicating this logic.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:55 -08:00
Sage Weil
66263bb6ff osd/ReplicatedPG: use get_next_version() in finish_promote
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:55 -08:00
Sage Weil
56ad14ec1f osd/ReplicatedPG: split off finish_ctx from execute_ctx
The second part of execute_ctx() is doing some somewhat generic work to
make the prepared updates in the ctx apply, updating the obc's cached
values.  Factor it out.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:55 -08:00
Sage Weil
3ef731068c osd/ReplicatedPG: add SKIPRWLOCKS flag
Flush puts us in a conundrum:

 - the flush eventually writes, behaving like a write
 - writes take the write lock at the start
 - to flush, we send copy-from to the base pool, which does a copy-get on
   our object
 - the copy-get is a read that blocks on the write.

This flag will allow an op to skip the initial locking step.  It will need
to take it later, of course.

Signed-off-by: Sage Weil <sage@inktank.com>

Conflicts:

	src/osd/ReplicatedPG.cc
2013-12-13 16:35:55 -08:00
Sage Weil
5e547f8772 osd/ReplicatedPG: be consistent about ctx->obs vs ctx->obc->obs
Just for consistency (ctx->obs == &ctx->obc->obs).

Signed-off-by: Sage Weil <sage@inktank.com>

Conflicts:

	src/osd/ReplicatedPG.cc
2013-12-13 16:35:55 -08:00
Sage Weil
36bbcf8e55 osd/ReplicatedPG: drop unnecessary temp vars in execute_ctx()
Both of these are pulled out of ctx->obs, which is not updated until the
very end; use that instead!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:55 -08:00
Sage Weil
10c9be3401 osd/ReplicatedPG: allow osds to issue writes to osds
We asserted that the client was not an OSD years ago when we separated out
the client and cluster networks.  Now, we are about to allow an OSD to
trigger a copy_from on another pool (for cache flush) and the assert can
go away.  We've long since verified that the messages are going out on
the correct interfaces.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:55 -08:00
Sage Weil
20d149e198 osd/ReplicatedPG: maybe_handle_cache style
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:54 -08:00
Sage Weil
0b81ff68c0 osd/ReplicatedPG: skip promote for DELETE
If an op starts with DELETE there is no need to promote the old content
from the base tier.  Note that this only works if the FAILOK flag is
set.  Otherwise, we need to know whether the object existed or not to
return either 0 or -ENOENT.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:54 -08:00
Sage Weil
4c014eddbe osd/ReplicatedPG: implement cache_evict
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:54 -08:00
Sage Weil
8b9b7136ba librados: add an aio_operate that takes a write and flags
Until now you could only pass flags to read operations.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:54 -08:00
Greg Farnum
85282319ee osd/osd_types: introduce helper for osd op flags -> string conversion
Signed-off-by: Sage Weil <sage@inktank.com>

Conflicts:

	src/osd/osd_types.h
2013-12-13 16:35:54 -08:00
Sage Weil
181cb8e83c librados, osd: add IGNORE_OVERLAY flag
Add a flag that will make the OSD bypass the cache overlay logic.  This is
needed in order to handle operations like CACHE_EVICT and CACHE_FLUSH.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:54 -08:00
Sage Weil
387e224aa2 librados: add cache_flush(), cache_try_flush(), cache_evict() methods
Not yet implemented by the OSD.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:54 -08:00
Sage Weil
78df1c37df osd/ReplicatedPG: set object_info and snapset xattrs on promote
For the normal write path, prepare_transaction() handles this for us.  In
this case, we need to do it explicitly.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:54 -08:00
Sage Weil
dd079e2a5f osd/ReplicatedPG: handle is_whiteout in do_osd_ops()
Most of the time we handle whiteouts by returning ENOENT before we even
get this far. However, for a mixed read/write transaction (e.g., a guard)
or certain ops (like create exclusive) we need to deal with the
exists == true and whiteout flag set case explicitly.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:53 -08:00
Sage Weil
fd8f7d295a osd/ReplicatedPG: clear whiteout when writing into cache tier
If we have a whiteout object and then write over it, clear the whiteout
flag.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:53 -08:00
Sage Weil
fabc6ba161 osd/ReplicatedPG: set whiteout in cache pool on delete
If we delete an object in the cache pool, set the whiteout flag instead of
removing the on-disk object.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:53 -08:00
Sage Weil
2aea631c4c ceph_test_rados_api_tier: verify delete creates whiteouts
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:53 -08:00
Sage Weil
e0a49698ec osd/ReplicatedPG: ENOENT when deleting a whiteout
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:53 -08:00
Sage Weil
0b085b174a osd/ReplicatedPG: create whiteout on promote ENOENT
If we try to fetch an object from the base tier and it is not present, we
can create a whiteout object.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:53 -08:00
Sage Weil
0b7b16d7fa ceph_test_rados_api_tier: add simple promote-on-read test
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:53 -08:00
Sage Weil
be29f47ac0 ceph_test_rados_api_tier: rename tests
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:53 -08:00
Sage Weil
66f2e7489d osd/ReplicatedPG: use simple_repop_{create,submit} for finish_promote
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:52 -08:00
Sage Weil
654d8c3334 osd/ReplicatedPG: UNDIRTY is not a user_modify
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:52 -08:00
Sage Weil
4a29b22e2b osd/ReplicatedPG: move r<0 handling into finish_promote()
Less logic in the header, and this will let us handle ENOENT with a whiteout.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:52 -08:00
Greg Farnum
920c0bff5b workunits: break down cache pool tests to be more precise; expand some
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:52 -08:00
Greg Farnum
0caa02c5af workunits: check errors propagate on cache pools in caching_redirects.sh
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:52 -08:00
Greg Farnum
5fa08fb8de ReplicatedPG: promote: handle failed promotes
If we get an error back, reply to the client directly and remove
the op which triggered promotion from our blocked op queue.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:52 -08:00
Greg Farnum
d15aedbd25 ReplicatedPG: promote: add the OpRequest to the Callback
This way we can do stuff to it, and we're about to.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:52 -08:00
Greg Farnum
b371dd8b71 ReplicatedPG: promote: first draft pass at doing object promotion
This is not yet at all complete -- among other things, it will
retry forever on any object which doesn't exist in the underlying
pool. But it demonstrates the approach reasonably clearly.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:52 -08:00
Greg Farnum
0699fc5c36 ReplicatedPG: copy: don't return from finish_copyfrom
The return value is meaningless; nothing in this function can fail.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:52 -08:00
Greg Farnum
325aae3652 ReplicatedPG: copy: switch out the CopyCallback interface
The tuple was already unwieldy with 4 members; I didn't want to add
more. Instead, create a new CopyResults struct which contains all the
object info and completion data, and pass the retval and a CopyResults*
in the CopyCallbackResults tuple.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-13 16:35:51 -08:00
Sage Weil
29cc722998 test_ipaddr: add another unit test
Was checking something for kbader.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:02:22 -08:00
Sage Weil
026b724b0d osd/ReplicatedPG: drop unused hit_set_start_stats
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:02:02 -08:00
Sage Weil
3d768d2301 osd/ReplicatedPG: maintain stats for the hit_set_* objects
We also make hit_set.current_info reflect only the on-disk 'current', not
anything that is not persisted.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 16:01:48 -08:00
Sage Weil
9814b93aa2 osd/ReplicatedPG: set object_info_t, SnapSet on hit_set objects
These are first-class user-visible rados objects and need these attrs.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 14:54:16 -08:00
Sage Weil
dabd5d6e34 vstart.sh: --hitset <pool> <type>
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 14:50:34 -08:00
Sage Weil
a865fece58 osd/ReplicatedPG: debug: improve hit_set func banners
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-12 18:14:12 -08:00
Sage Weil
b6871cf8bf osd/ReplicatedPG: do not update current_last_update on activate
Don't update this when we apply the log to our in-memory hitset!  We should
only update this when we persist something to disk.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-12 18:13:58 -08:00
Sage Weil
990b2b5df8 ceph_test_rados_api_tier: make HitSetWrite handle pg splits
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-09 20:53:07 -08:00
Sage Weil
a6d66f9c7f common/bloom_filter: fix copy ctor
We should not delete[] an uninitialized pointer.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:31 -08:00
Sage Weil
638b27447a ceph_test_rados_api_tier: add HitSetRead
Verify that the HitSet reflects a read (and never written) object.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:30 -08:00
Sage Weil
01cbbfaae6 ceph_test_rados_api_tier: HitSetRead -> HitSetWrite
This way it will pass despite thrashing.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:30 -08:00
Sage Weil
456daf2a61 ceph_test_rados_api_tier: add HitSet trim test
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:30 -08:00
Sage Weil
3ea9230a74 osd/HitSet: fix sealed initialization in Params ctor
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:30 -08:00
Sage Weil
f0cfd22975 ceph_test_rados_api_tier: make HitSetRead test less noisy
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:30 -08:00
Sage Weil
bf96a7eae0 osd/HitSet: fix copy ctor
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:30 -08:00
Sage Weil
01f3ff72d9 osd/HitSet: fix dump() of fpp
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:30 -08:00
Sage Weil
c941e82902 test/encoding/check-generated: test copy ctor, operator=
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:29 -08:00