Commit Graph

30244 Commits

Author SHA1 Message Date
Sage Weil
3b3cbf52fb crush/CrushCompiler: make current set of tunables 'safe'
We can reenable this error the next time we add new tunables.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 16:24:16 -08:00
Sage Weil
8535ceda03 crushtool: remove scary tunables messages
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 16:24:15 -08:00
Sage Weil
4eb8891d8d crush/CrushCompiler: start with legacy tunables when compiling
Ensure that a crush file is always compiled deterministically, even though
the default values for *new* maps have changed.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 16:24:15 -08:00
Sage Weil
e8fdef217f crush: add indep data set to cli tests
This will help us catch things if we break the mapping.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 16:22:59 -08:00
Sage Weil
564de6ea05 osdmaptool: fix cli tests for 3x
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 16:22:26 -08:00
Sage Weil
6704be68d4 osd: default to 3x replication
3x is the recommendation; it should be the default too.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 16:21:37 -08:00
Sage Weil
308e4f9def Merge pull request #913 from dachary/wip-crush-unittest
CrushWrapper::move_bucket unittest and minor fixes

Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-06 16:10:00 -08:00
Josh Durgin
8d0180b1b7 objecter: don't take extra throttle budget for resent ops
These ops have already taken their budget in the original op_submit().
It will be returned via put_op_budget() when they complete.
If there were many localized reads of missing objects from replicas,
or cache pool redirects, this would cause the objecter to use up all
of its op throttle budget and hang.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-06 16:03:20 -08:00
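The budget accounting described in the objecter commit above can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyObjecter`, `resend()`, `complete()` and the integer budget are hypothetical names invented here, not the real Objecter interface; only `op_submit()` and the idea of returning budget on completion come from the commit message.

```cpp
#include <cassert>
#include <set>

// Toy model only: ToyObjecter and its members are hypothetical names,
// not the real Objecter interface.
class ToyObjecter {
 public:
  explicit ToyObjecter(int max_ops) : budget_left_(max_ops) {}

  // First submission: take one unit of budget, or refuse if none is left.
  bool op_submit(int op_id) {
    if (budget_left_ == 0) return false;
    --budget_left_;
    in_flight_.insert(op_id);
    return true;
  }

  // Resend of an op that is already in flight: it still holds the budget
  // taken in op_submit(), so nothing more is taken here.
  void resend(int op_id) {
    assert(in_flight_.count(op_id) == 1);
    ++resends_;
  }

  // Completion returns the budget taken by the original submission.
  void complete(int op_id) {
    if (in_flight_.erase(op_id) == 1) ++budget_left_;
  }

  int budget_left() const { return budget_left_; }
  int resends() const { return resends_; }

 private:
  int budget_left_;
  int resends_ = 0;
  std::set<int> in_flight_;
};
```

In this model an op can be resent any number of times without shrinking the remaining budget, which is the property the fix restores. If `resend()` took budget as well, repeated resends would eventually exhaust it and new submissions would stall.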
Sage Weil
38647f7627 Revert "osd: default to 3x replication"
This reverts commit cb26fbde52.

Fix unit tests and do integration tests first; this may have unexpected
consequences.
2013-12-06 15:48:39 -08:00
Loic Dachary
cbeb1f4510 crush: detach_bucket must test item >= 0 not > 0
Since detach_bucket is a private helper solely used by move_bucket, which
contains another (correct) safeguard, the code cannot be reached and
the problem can never happen. If another function uses detach_bucket,
it may happen.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-07 00:31:54 +01:00
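The off-by-one in the detach_bucket commit above is easiest to see at item 0. The sketch below assumes the usual CRUSH convention that devices have ids >= 0 and buckets have negative ids; the function names are hypothetical stand-ins for the guard, not the CrushWrapper code itself.

```cpp
#include <cassert>
#include <cerrno>

// Assumed CRUSH convention: devices are >= 0, buckets are < 0.
inline bool is_bucket_id(int item) { return item < 0; }

// Hypothetical model of the old guard: item 0 (a device) slips through.
inline int detach_check_buggy(int item) {
  if (item > 0) return -EINVAL;
  return 0;
}

// Hypothetical model of the fixed guard: every non-bucket id is rejected.
inline int detach_check_fixed(int item) {
  if (item >= 0) return -EINVAL;
  assert(is_bucket_id(item));  // only bucket ids get past the guard
  return 0;
}
```

With the old test, `detach_check_buggy(0)` returns 0 and the caller would go on to treat device 0 as a bucket; the fixed test returns `-EINVAL` for it.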
Loic Dachary
2cd73f9d3e crush: remove obsolete comments from link_bucket
Probably copy/pasted from move_bucket.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-07 00:27:09 +01:00
Loic Dachary
e00324b2bc crush: remove redundant code from move_bucket
The following was introduced in 2012 by a2d0cff1b0:

  // un-set the device name so we can use add_item later
  build_rmap(name_map, name_rmap);
  name_map.erase(id);
  name_rmap.erase(id_name);

when insert_item refused to move a bucket for which a name already
exists. It was changed in 2013 by 4e2557a038 and now supports it. The
TestCrushWrapper unittest for move_bucket passes.

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-07 00:21:16 +01:00
Loic Dachary
8ef80a4c67 crush: unittest CrushWrapper::move_bucket
Signed-off-by: Loic Dachary <loic@dachary.org>
2013-12-07 00:20:31 +01:00
Sage Weil
865880b5b1 Merge pull request #888 from ceph/wip-crush-tunables
default to bobtail-era crush tunables.

Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-12-06 14:45:57 -08:00
Sage Weil
650f896c4d Merge pull request #903 from ceph/wip-memstore
memstore: reference ObjectStore backend

Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-12-06 14:38:15 -08:00
Sage Weil
a6d66f9c7f common/bloom_filter: fix copy ctor
We should not delete[] an uninitialized pointer.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:31 -08:00
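The class of bug named in the bloom_filter commit above can be illustrated with a small stand-in class. `ToyFilter` is hypothetical and much simpler than the real bloom_filter; it only shows why a copy constructor that reuses `operator=` has to initialize the raw pointer first, since the assignment path calls `delete[]` on the old pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a filter that owns a raw table.
class ToyFilter {
 public:
  explicit ToyFilter(std::size_t n)
      : table_(new unsigned char[n]()), size_(n) {
    assert(n > 0);  // this sketch assumes a non-empty table
  }

  // Initialize table_ before delegating: operator= will delete[] it, and
  // delete[] on an uninitialized pointer is undefined behavior.
  ToyFilter(const ToyFilter& other) : table_(nullptr), size_(0) {
    *this = other;
  }

  ToyFilter& operator=(const ToyFilter& other) {
    if (this != &other) {
      unsigned char* fresh = new unsigned char[other.size_];
      std::copy(other.table_, other.table_ + other.size_, fresh);
      delete[] table_;  // safe: either nullptr or a table we own
      table_ = fresh;
      size_ = other.size_;
    }
    return *this;
  }

  ~ToyFilter() { delete[] table_; }

  void set(std::size_t i) { table_[i % size_] = 1; }
  bool test(std::size_t i) const { return table_[i % size_] != 0; }

 private:
  unsigned char* table_;
  std::size_t size_;
};
```

A copy made this way owns its own table, so later changes to the copy do not show up in the original.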
Sage Weil
638b27447a ceph_test_rados_api_tier: add HitSetRead
Verify that the HitSet reflects a read (and never written) object.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:30 -08:00
Sage Weil
01cbbfaae6 ceph_test_rados_api_tier: HitSetRead -> HitSetWrite
This way it will pass despite thrashing.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:30 -08:00
Sage Weil
456daf2a61 ceph_test_rados_api_tier: add HitSet trim test
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:30 -08:00
Sage Weil
3ea9230a74 osd/HitSet: fix sealed initialization in Params ctor
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:30 -08:00
Sage Weil
f0cfd22975 ceph_test_rados_api_tier: make HitSetRead test less noisy
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:30 -08:00
Sage Weil
bf96a7eae0 osd/HitSet: fix copy ctor
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:30 -08:00
Sage Weil
01f3ff72d9 osd/HitSet: fix dump() of fpp
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:30 -08:00
Sage Weil
c941e82902 test/encoding/check-generated: test copy ctor, operator=
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:29 -08:00
Sage Weil
1c107d3cb0 ceph-dencoder: add 'copy' command to test operator=
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:29 -08:00
Sage Weil
c0eb95b888 mds/Capability: no copying
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:29 -08:00
Greg Farnum
1d0af14a5e test: add a HitSet unit test
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:29 -08:00
Sage Weil
c365cca4f3 osd/HitSet: track BloomHitSet::Params fpp in micros, not as a double
...and store it as a 32-bit value, so that it actually works!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:29 -08:00
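Storing the false positive probability in micros, as the commit above describes, amounts to a fixed-point encoding in a 32-bit integer. The sketch below is an assumption about how such a conversion could look; `ToyBloomParams`, `set_fpp()` and `get_fpp()` are illustrative names, and the rounding choice is not taken from the Ceph source.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical parameter block; not the real BloomHitSet::Params.
struct ToyBloomParams {
  std::uint32_t fpp_micro = 0;  // false positive probability, in millionths

  // Assumes 0.0 <= fpp <= 1.0; rounds to the nearest millionth.
  void set_fpp(double fpp) {
    assert(fpp >= 0.0 && fpp <= 1.0);
    fpp_micro = static_cast<std::uint32_t>(std::llround(fpp * 1000000.0));
  }

  double get_fpp() const { return fpp_micro / 1000000.0; }
};
```

One consequence of this encoding is that probabilities smaller than half a millionth round to zero, so a caller that must avoid a zero fpp has to check for it separately.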
Sage Weil
146e6aa777 osd/ReplicatedPG: archive hit_set if it is old and not full
This matches the condition under which we call _persist().

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:29 -08:00
Sage Weil
737533f270 osd: prevent zero BloomHitSet fpp
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:29 -08:00
Sage Weil
a72094d504 osd/HitSet: take Params as const ref to avoid confusion about ownership
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:29 -08:00
Sage Weil
68c44cbbdc mon/OSDMonitor: non-zero default bloom fpp
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:28 -08:00
Sage Weil
41e0f97005 osd/HitSet: make pg_pool_t and Params operator<< less parenthetical
pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 8 owner 0 crash_replay_interval 45 hit_set bloom{false_positive_probability: 0, target size: 0, seed: 0} 10s x8

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:28 -08:00
Sage Weil
5da128581a osd/ReplicatedPG: apply log to new HitSet to capture writes after peering
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:28 -08:00
Greg Farnum
fa76d5e4a6 ReplicatedPG: do not seal() HitSets until we're done with them
We don't want to seal HitSets just because we're writing a
snapshot to disk; it potentially shrinks the in-memory one
we want to keep adding stuff to!

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:28 -08:00
Greg Farnum
3c2d2d7616 pg_hit_set_info_t: remove unused size, target_size members
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:28 -08:00
Sage Weil
1e94e27fd9 ceph_test_rados: hit hit_set_{list,get} rados operations
This will do a list, and then get a random HitSet.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:28 -08:00
Sage Weil
4a743fb63e osd/ReplicatedPG: trim old hit_set objects on persist
Any time we persist a hit_set object, take the opportunity to remove any
old ones that we don't want any more.

Note that this means if the admin decreases the number of objects to track,
we won't remove them until the next time we persist something.  We also
don't clean up if the HitSet tracking is disabled entirely.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:28 -08:00
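The trim-on-persist behavior described in the commit above can be sketched as follows. The function name, the use of object names as strings, and the `max_keep` parameter are all hypothetical; the sketch only models "remove the oldest archived objects whenever a new one is persisted".

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical helper: archive new_object, then drop the oldest entries
// until at most max_keep remain. Returns the names that were removed.
std::vector<std::string> persist_and_trim(std::deque<std::string>& archived,
                                          const std::string& new_object,
                                          std::size_t max_keep) {
  assert(max_keep >= 1);  // this sketch assumes tracking is enabled
  archived.push_back(new_object);
  std::vector<std::string> removed;
  while (archived.size() > max_keep) {
    removed.push_back(archived.front());
    archived.pop_front();
  }
  return removed;
}
```

Because trimming only runs inside the persist path, lowering `max_keep` has no effect until the next call, which matches the caveat in the commit message.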
Sage Weil
916313c344 osd/ReplicatedPG: put hit_set objects in a configurable namespace
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:28 -08:00
Sage Weil
a0cfbfd742 librados: create new ceph_test_rados_api_tier target
Move the dirty/undirty test to it, and add one for HitSets.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:27 -08:00
Greg Farnum
0c43b778e2 librados, osd: list and get HitSets via librados
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:27 -08:00
Sage Weil
904859e929 osd/ReplicatedPG: use vectorized osd_op outdata for pg ops
This lets us put PGLS in a compound operation.  Nothing does that yet, but
this would allow it.
Despite appearances, this is not a protocol change and does not require
a feature bit for clients: using the osd_ops vector mechanism stores all
the data in the same places as before; it just fills in some of the
already-decoded-but-empty data structures in the MOSDOpReply header.
<Greg note:> We may need a feature bit to let clients know they can send
compound PG ops to OSDs, though? Or maybe we can let it be covered
by supporting hitset ops.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:27 -08:00
Sage Weil
a97129f197 osd/ReplicatedPG: add basic HitSet tracking
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:27 -08:00
Sage Weil
b92f431100 mon/OSDMonitor: set hit_set fields
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:27 -08:00
Sage Weil
db3fd1152a osd: add hit_set_* parameters to pg_pool_t
Add pool properties to control what type of HitSet we want to use, along with
some (mostly generic) parameters.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:27 -08:00
Sage Weil
e8ef72490b osd/osd_types: include pg_hit_set_history_t in pg_info_t
Track metadata about the currently accumulating HitSet as well as
previously archived ones in the pg_info_t.  This will not scale well for
extremely long histories, but does let us avoid explicitly sharing this
metadata during recovery or other normal update activity.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:27 -08:00
Sage Weil
a430525ca7 osd/osd_types: add pg_hit_set_{info,history}_t
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:26 -08:00
Sage Weil
b5ea47008b common/bloom_filter: fix operator=
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-06 14:37:26 -08:00
Sage Weil
c01b183da0 osd_types: add generic HitSet type with bloom and explicit implementations
Track a set of hash values, either explicitly or using a bloom_filter. Hide
the implementation and allow us to transparently encode and decode.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:26 -08:00
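The design in the commit above, one HitSet type that hides whether hashes are tracked explicitly or through a bloom filter, can be sketched with a small wrapper around an implementation interface. Everything here is hypothetical (`ToyHitSet`, `ExplicitImpl`, `BloomImpl`, the two-probe 1024-bit filter, the factory functions) and it omits the encode/decode side entirely.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_set>
#include <utility>

// Hypothetical wrapper: callers see insert/contains, not the backend.
class ToyHitSet {
 public:
  struct Impl {
    virtual ~Impl() = default;
    virtual void insert(std::uint32_t hash) = 0;
    virtual bool contains(std::uint32_t hash) const = 0;
  };

  explicit ToyHitSet(std::unique_ptr<Impl> impl) : impl_(std::move(impl)) {
    assert(impl_ != nullptr);
  }
  void insert(std::uint32_t hash) { impl_->insert(hash); }
  bool contains(std::uint32_t hash) const { return impl_->contains(hash); }

 private:
  std::unique_ptr<Impl> impl_;
};

// Explicit backend: exact membership, memory grows with the set.
struct ExplicitImpl : ToyHitSet::Impl {
  std::unordered_set<std::uint32_t> hits;
  void insert(std::uint32_t hash) override { hits.insert(hash); }
  bool contains(std::uint32_t hash) const override {
    return hits.count(hash) != 0;
  }
};

// Toy bloom backend: fixed memory, may report false positives,
// never reports false negatives.
struct BloomImpl : ToyHitSet::Impl {
  std::bitset<1024> bits;
  static std::size_t slot(std::uint32_t hash, std::uint32_t salt) {
    return ((hash ^ salt) * 2654435761u) % 1024u;
  }
  void insert(std::uint32_t hash) override {
    bits.set(slot(hash, 0u));
    bits.set(slot(hash, 0x9e3779b9u));
  }
  bool contains(std::uint32_t hash) const override {
    return bits.test(slot(hash, 0u)) && bits.test(slot(hash, 0x9e3779b9u));
  }
};

inline ToyHitSet make_explicit_hit_set() {
  return ToyHitSet(std::unique_ptr<ToyHitSet::Impl>(new ExplicitImpl));
}
inline ToyHitSet make_bloom_hit_set() {
  return ToyHitSet(std::unique_ptr<ToyHitSet::Impl>(new BloomImpl));
}
```

Code that records hits only calls `insert()` and `contains()`, so switching a pool between the two backends would not change the calling code in this model.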
Sage Weil
0b9874cd02 osd/ReplicatedPG: factor out simple_repop_{create,submit} helpers
This makes it easier to create repops correctly, and should help
prevent bugs like the one we remove here in process_copy_op (we were
serializing on the wrong object!)

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-12-06 14:37:26 -08:00