Commit Graph

29547 Commits

Author SHA1 Message Date
Sage Weil
c0eb95b888 mds/Capability: no copying
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:29 -08:00
Greg Farnum
1d0af14a5e test: add a HitSet unit test
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:29 -08:00
Sage Weil
c365cca4f3 osd/HitSet: track BloomHitSet::Params fpp in micros, not as a double
...and store it as a 32-bit value, so that it actually works!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:29 -08:00
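The idea in the commit above, keeping the false-positive probability as an integer number of millionths in a 32-bit field rather than as a double, can be sketched roughly as follows. This is an illustration only, not the Ceph code: the helper names (`fpp_to_micros`, `encode_fpp`, and so on) are hypothetical, and the little-endian unsigned layout is an assumption.

```python
import struct

MICROS = 1_000_000  # resolution: one millionth


def fpp_to_micros(fpp: float) -> int:
    """Convert a probability in [0, 1] to an integer count of millionths."""
    if not 0.0 <= fpp <= 1.0:
        raise ValueError("fpp must be between 0 and 1")
    return round(fpp * MICROS)


def micros_to_fpp(micros: int) -> float:
    """Convert a count of millionths back to a probability."""
    return micros / MICROS


def encode_fpp(fpp: float) -> bytes:
    """Pack the value as an unsigned 32-bit integer (assumed little-endian)."""
    return struct.pack("<I", fpp_to_micros(fpp))


def decode_fpp(buf: bytes) -> float:
    """Unpack a value written by encode_fpp."""
    (micros,) = struct.unpack("<I", buf)
    return micros_to_fpp(micros)
```

An integer field encodes to the same four bytes on every platform, which avoids depending on how a floating-point value is represented on the wire.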
Sage Weil
146e6aa777 osd/ReplicatedPG: archive hit_set if it is old and not full
This matches the condition under which we call _persist().

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:29 -08:00
Sage Weil
737533f270 osd: prevent zero BloomHitSet fpp
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:29 -08:00
Sage Weil
a72094d504 osd/HitSet: take Params as const ref to avoid confusion about ownership
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:29 -08:00
Sage Weil
68c44cbbdc mon/OSDMonitor: non-zero default bloom fpp
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:28 -08:00
Sage Weil
41e0f97005 osd/HitSet: make pg_pool_t and Params operator<< less parenthetical
pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 8 owner 0 crash_replay_interval 45 hit_set bloom{false_positive_probability: 0, target size: 0, seed: 0} 10s x8

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:28 -08:00
Sage Weil
5da128581a osd/ReplicatedPG: apply log to new HitSet to capture writes after peering
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:28 -08:00
Greg Farnum
fa76d5e4a6 ReplicatedPG: do not seal() HitSets until we're done with them
We don't want to seal HitSets just because we're writing a
snapshot to disk; it potentially shrinks the in-memory one
we want to keep adding stuff to!

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:28 -08:00
Greg Farnum
3c2d2d7616 pg_hit_set_info_t: remove unused size, target_size members
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:28 -08:00
Sage Weil
1e94e27fd9 ceph_test_rados: hit hit_set_{list,get} rados operations
This will do a list, and then get a random HitSet.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:28 -08:00
Sage Weil
4a743fb63e osd/ReplicatedPG: trim old hit_set objects on persist
Any time we persist a hit_set object, take the opportunity to remove any
old ones that we don't want any more.

Note that this means if the admin decreases the number of objects to track,
we won't remove them until the next time we persist something.  We also
don't clean up if the HitSet tracking is disabled entirely.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:28 -08:00
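The trim-on-persist behaviour described above can be sketched as a toy model. The class and attribute names here (`HitSetArchive`, `max_count`) are hypothetical and do not come from the Ceph source; the sketch only shows that trimming happens as a side effect of persisting, so lowering the limit has no effect until the next persist.

```python
from collections import deque


class HitSetArchive:
    """Toy model of archived hit_set objects, oldest first."""

    def __init__(self, max_count: int):
        self.max_count = max_count  # how many archived objects to keep
        self.archived = deque()

    def persist(self, name: str) -> list:
        """Archive a new object, then trim; returns the names removed."""
        self.archived.append(name)
        return self._trim()

    def _trim(self) -> list:
        """Drop the oldest objects until at most max_count remain."""
        removed = []
        while len(self.archived) > self.max_count:
            removed.append(self.archived.popleft())
        return removed
```

In this model, setting `max_count` to a smaller value leaves `archived` untouched; the surplus objects are only removed by the next call to `persist()`.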
Sage Weil
916313c344 osd/ReplicatedPG: put hit_set objects in a configurable namespace
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:28 -08:00
Sage Weil
a0cfbfd742 librados: create new ceph_test_rados_api_tier target
Move the dirty/undirty test to it, and add one for HitSets.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:27 -08:00
Greg Farnum
0c43b778e2 librados, osd: list and get HitSets via librados
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:27 -08:00
Sage Weil
904859e929 osd/ReplicatedPG: use vectorized osd_op outdata for pg ops
This lets us put PGLS in a compound operation.  Nothing does that yet, but
this would allow it.
Despite appearances, this is not a protocol change and does not require
a feature bit for clients: using the osd_ops vector mechanism stores all
the data in the same places as before; it just fills in some of the
already-decoded-but-empty data structures in the MOSDOpReply header.
<Greg note:> We may need a feature bit to let clients know they can send
compound PG ops to OSDs, though? Or maybe we can let it be covered
by supporting hitset ops.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:27 -08:00
Sage Weil
a97129f197 osd/ReplicatedPG: add basic HitSet tracking
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:27 -08:00
Sage Weil
b92f431100 mon/OSDMonitor: set hit_set fields
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:27 -08:00
Sage Weil
db3fd1152a osd: add hit_set_* parameters to pg_pool_t
Add pool properties to control what type of HitSet we want to use, along with
some (mostly generic) parameters.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:27 -08:00
Sage Weil
e8ef72490b osd/osd_types: include pg_hit_set_history_t in pg_info_t
Track metadata about the currently accumulating HitSet as well as
previously archived ones in the pg_info_t.  This will not scale well for
extremely long histories, but does let us avoid explicitly sharing this
metadata during recovery or other normal update activity.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:27 -08:00
Sage Weil
a430525ca7 osd/osd_types: add pg_hit_set_{info,history}_t
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:26 -08:00
Sage Weil
b5ea47008b common/bloom_filter: fix operator=
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:26 -08:00
Sage Weil
c01b183da0 osd_types: add generic HitSet type with bloom and explicit implementations
Track a set of hash values, either explicitly or using a bloom_filter. Hide
the implementation and allow us to transparently encode and decode.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:26 -08:00
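One way to picture a generic HitSet that hides an explicit or bloom-filter implementation is the sketch below. It is not the Ceph implementation: all class names, the bit-array size, the number of hash functions, and the use of SHA-256 to derive bit positions are assumptions made for illustration, and encoding/decoding is left out.

```python
import hashlib


class ExplicitHitSet:
    """Stores every hash value exactly."""

    def __init__(self):
        self._hashes = set()

    def insert(self, h: int) -> None:
        self._hashes.add(h)

    def contains(self, h: int) -> bool:
        return h in self._hashes


class BloomHitSet:
    """Approximate set: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self._num_bits = num_bits
        self._num_hashes = num_hashes
        self._bits = 0  # Python int used as a bit array

    def _positions(self, h: int):
        # Derive num_hashes bit positions from the value (illustrative scheme).
        for i in range(self._num_hashes):
            digest = hashlib.sha256(f"{i}:{h}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self._num_bits

    def insert(self, h: int) -> None:
        for pos in self._positions(h):
            self._bits |= 1 << pos

    def contains(self, h: int) -> bool:
        return all(self._bits >> pos & 1 for pos in self._positions(h))


class HitSet:
    """Wrapper that hides which implementation is in use."""

    def __init__(self, impl):
        self._impl = impl

    def insert(self, h: int) -> None:
        self._impl.insert(h)

    def contains(self, h: int) -> bool:
        return self._impl.contains(h)
```

Callers only see `HitSet.insert()` and `HitSet.contains()`, so the choice between exact tracking and a bloom filter can be made per instance.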
Sage Weil
0b9874cd02 osd/ReplicatedPG: factor out simple_repop_{create,submit} helpers
This makes it easier to create repops correctly, and should help
prevent bugs like the one we remove here in process_copy_op (we were
serializing on the wrong object!)

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:26 -08:00
Greg Farnum
9776e97af2 osd/PG: factor out get_next_version()
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:26 -08:00
Greg Farnum
0b0d1e8e42 librados: add wait_for_latest_osdmap()
There are times when users may need to make sure the client has the
latest osdmap, for example after sending a mon command modifying
pool properties.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>

squash "librados: add wait_for_latest_osdmap()"
2013-12-06 14:37:26 -08:00
Sage Weil
828590688f librados: expose methods for calculating object hash position
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:26 -08:00
Sage Weil
4b5ab3f106 osdc/Objecter: expose methods for getting object hash position and pg
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:26 -08:00
Sage Weil
92879f7787 osd: capture hashing of objects to hash positions/pgs in pg_pool_t
The hashing is dependent on pool properties; capture (more of) it in a
method instead of having it in OSDMap.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:25 -08:00
Sage Weil
76e0b88f56 osd/OSDMap: use new object_locator_t::hash to place object in a pg
The hash value, if provided, becomes the ps (placement seed) portion of the
pg_t, skipping any hashing of the object name (or locator key).

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:25 -08:00
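The placement rule described above (an explicit hash, when provided, is used directly as the placement seed instead of hashing the object name or locator key) can be sketched as follows. This is a simplified illustration: the `ObjectLocator` dataclass, the `-1` sentinel for "no explicit hash", and the use of `zlib.crc32` as a stand-in hash function are all assumptions, not the actual Ceph types or hash.

```python
import zlib
from dataclasses import dataclass


@dataclass
class ObjectLocator:
    pool: int
    key: str = ""   # optional locator key
    hash: int = -1  # assumed sentinel: -1 means no explicit hash


def placement_seed(name: str, loc: ObjectLocator) -> int:
    """Return the placement seed (ps) for an object."""
    if loc.hash >= 0:
        # Explicit hash provided: skip hashing of the name or key.
        return loc.hash
    # Otherwise hash the locator key if set, else the object name.
    source = loc.key if loc.key else name
    return zlib.crc32(source.encode("utf-8"))  # stand-in hash function
```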
Greg Farnum
d692da34ab osd/osd_types: add explicit hash to object_locator_t
Instead of hashing the object name or key, we allow the hash position to be
provided explicitly.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:25 -08:00
Greg Farnum
0d4ea9f746 encoding: allow users to specify a different compatv after encoding
This way we can set the compatv preferentially depending on whether
we've actually encoded new information or not.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:25 -08:00
Sage Weil
d2963c0a3d librados: add mon_command to C++ API
This way librados users can execute monitor commands.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:25 -08:00
Sage Weil
468fffa529 librados: document aio_flush()
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:25 -08:00
Sage Weil
bc7ace2eef librados: constify inbl command args
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:25 -08:00
Sage Weil
a29d4fc3fd osdc/Objecter: constify inbl command args
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:24 -08:00
Sage Weil
fb49065fe7 mon/MonClient: constify inbl command args
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:24 -08:00
Sage Weil
ef0f255a4a osdc/Objecter: reimplement list_objects
Return to caller at the end of each PG.  This allows the caller to look at
the [pg_]hash_position and get something meaningful.

If there are no objects in the PG, we skip it so that every callback has
*some* data (unless the pool is totally empty!).  So the real difference
here is that we don't move on to the next PG just to reach max_entries.

This gives the client some data sooner, but may mean more callbacks into
client code.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:36:52 -08:00
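The listing behaviour described above can be modelled with a small generator. This is a sketch under simplifying assumptions, not the Objecter code: a pool is represented as a plain list of per-PG object lists, and the function name and signature are hypothetical.

```python
def list_objects(pgs, max_entries):
    """Yield (pg_index, batch) pairs, never mixing objects from two PGs.

    pgs is a list where each element is the list of object names in
    that PG. Empty PGs are skipped, so every batch holds some data.
    """
    if max_entries <= 0:
        raise ValueError("max_entries must be positive")
    for pg_index, objects in enumerate(pgs):
        # A batch ends at max_entries or at the end of the PG, whichever
        # comes first; we do not continue into the next PG to fill it.
        for start in range(0, len(objects), max_entries):
            yield pg_index, objects[start:start + max_entries]
```

Because each batch belongs to exactly one PG, the caller can associate a meaningful PG position with every callback, at the cost of possibly receiving more, smaller batches.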
Sage Weil
d2e6cc635f librados: add get_pg_hash_position to determine pg while listing objects
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:36:49 -08:00
Sage Weil
eff932c60a osdc/Objecter: stick bl inside ListContext
This is simpler and less error-prone.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:36:45 -08:00
Sage Weil
8e5803abf7 osdc/Objecter: factor pg_read out of list_objects code
This will get used later for other ops against PGs (instead of objects).

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:36:41 -08:00
Sage Weil
dd8c939841 osdc/Objecter: separate explicit pg target from current target
The pgid field is used to store the pg the op mapped to.  We were just
setting it directly for PGLS.  Instead, fill in a new base_pgid, and copy that
to pgid in recalc_op_target(), the same way we do when we map an object
name to a PG.

In particular, we take this opportunity to map a raw pgid to an actual
pgid.  This means the base_pg could come from a raw hash value (although
it doesn't, yet).

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:36:37 -08:00
Sage Weil
9381b69378 osdc/Objecter: drop redundant condition
We are inside an if (response_size) block.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:36:34 -08:00
Sage Weil
bffcca6a0a osd/osd_types: make pref optional in pg_t constructor
We don't use preferred placements any more, so this will
make it easier to start dropping references to it in new code.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:36:31 -08:00
Gary Lowell
5832e2603c v0.72
2013-11-07 20:27:35 +00:00
Yehuda Sadeh
84fb1bf3ee rgw: deny writes to a secondary zone by non-system users
Fixes: #6678
We don't want to allow regular users to write to secondary zones,
otherwise we'd end up with data inconsistencies.

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-11-07 09:17:22 -08:00
Sage Weil
d8f05024e7 doc/release-notes: note crush update timeout on startup change
Signed-off-by: Sage Weil <sage@inktank.com>
2013-11-06 20:02:09 -08:00
Sage Weil
1ee112fa2e osdmaptool: fix cli tests
From c22c84a88c.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-11-06 19:59:56 -08:00
Li Wang
082e7c9eed Ceph: Fix memory leak in chain_flistxattr()
Free allocated memory before return.

Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-11-06 19:00:52 -08:00