Commit Graph

31717 Commits

Author SHA1 Message Date
Sage Weil
c9daf8e5ea osd/ReplicatedPG: add slop to agent mode selection
We want to avoid a situation where the agent clicks on and off when the
system hovers around a utilization threshold.  Particularly for trim,
the system can expend a lot of energy doing a minimal amount of work when
the effort level is low.  To avoid this, enable when we are some amount
above the threshold, and do not turn off until we are the same amount below
the target.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:09:39 -08:00
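The slop described in c9daf8e5ea amounts to hysteresis around the utilization target. The following is a minimal sketch of that idea only; `AgentMode`, `next_mode`, and the parts-per-million units are assumptions made for illustration and are not taken from the ReplicatedPG code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; this is an illustration, not the ReplicatedPG code.
enum class AgentMode { Idle, Active };

// Pick the next mode from the current mode and the utilization.
// All quantities are in parts per million of capacity (an assumption).
// The agent turns on only above target + slop and turns off only below
// target - slop; inside that band it keeps whatever mode it already has.
inline AgentMode next_mode(AgentMode current, uint64_t used_micro,
                           uint64_t target_micro, uint64_t slop_micro) {
  assert(slop_micro <= target_micro);
  const uint64_t on_at = target_micro + slop_micro;
  const uint64_t off_at = target_micro - slop_micro;
  if (current == AgentMode::Idle)
    return used_micro > on_at ? AgentMode::Active : AgentMode::Idle;
  return used_micro < off_at ? AgentMode::Idle : AgentMode::Active;
}
```

With a target of 800000 and a slop of 50000, an idle agent stays idle at 820000 and an active agent stays active at 780000, so small oscillations around the target do not toggle it.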
Sage Weil
3bc31272db osd/ReplicatedPG: initialize agent at random hash position inside pg
When the agent starts, start at a random offset to ensure we get a more
uniform distribution of attention to all objects in the PG.  Otherwise, we
will disproportionately examine objects at the "beginning" of the PG if we
are interrupted by peering or restarts or some other activity.

Note that if the agent_state is preserved, we do not forget our position,
which is also nice.

We *could* persist this position in the pg_info_t somewhere, but I am not
sure it is worth the effort.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:09:38 -08:00
Sage Weil
f6930452bc osd: add pg_pool_t::get_random_pg_position()
Return a hash position somewhere inside a given pg.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:09:38 -08:00
Sage Weil
7bb0aa5a07 osd: only enable tier agent when osd is in active state
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:09:38 -08:00
Sage Weil
cb4aa3a716 osd: observe 'notieragent' osdmap flag
Pause/unpause the agent thread accordingly.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:09:38 -08:00
Sage Weil
dcf20b9bf3 osd: add 'notieragent' flag to OSDMap
This will pause tiering agent work.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:09:38 -08:00
Loic Dachary
a5dabb58fa histogram: fix histogram::get_position_micro overflow
Convert the return values to uint64_t

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-02-15 22:09:38 -08:00
Loic Dachary
199bdb1ba8 mon: test dirty stats in ceph df detail
Signed-off-by: Loic Dachary <loic@dachary.org>
2014-02-15 22:09:38 -08:00
Sage Weil
18bc151bec osd/ReplicatedPG: decay tier agent histograms over time
Make decisions based on recent observations of object age distributions,
not all time history.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:09:38 -08:00
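One way to favor recent observations, as 18bc151bec describes, is to shrink every bin count periodically so that old samples fade geometrically. The sketch below assumes decay by right shift; the struct name and the signatures of `set_bin` and `decay` are hypothetical, not the histogram class from the tree.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical histogram holding one counter per bin.
struct DecayingHistogram {
  std::vector<uint64_t> bins;

  // Store a count in a bin, growing the bin vector if needed.
  void set_bin(unsigned bin, uint64_t count) {
    if (bins.size() <= bin) bins.resize(bin + 1, 0);
    bins[bin] = count;
  }

  // Divide every count by 2^bits. Calling this at a fixed interval
  // gives older samples exponentially less weight than newer ones.
  void decay(unsigned bits = 1) {
    assert(bits < 64);
    for (auto &count : bins) count >>= bits;
  }

  // Sum of all bin counts.
  uint64_t total() const {
    uint64_t sum = 0;
    for (auto count : bins) sum += count;
    return sum;
  }
};
```

Integer shifts round down, so a bin with a single sample drops to zero after one decay step; whether that is acceptable depends on how often decay runs relative to sampling.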
Sage Weil
da9ed08ea3 osd/ReplicatedPG: basic flush and evict agent functionality
This is very basic flush and evict functionality for the tiering agent.

The flush policy is very simple: if we are above the threshold and the
object is dirty, and not super young, flush it.  This is not too braindead
of a policy (although we could clearly do something smarter).

The evict policy is pretty simple: evict the object if it is clean and
we are over our full threshold.  If we are in the middle mode, try to
estimate how cold the object is based on an accumulated histogram of
objects we have examined so far, and decide to evict based on our
position in that histogram relative to our "effort" level.

Caveats:
 * the histograms are not refreshed
 * we aren't taking temperature into consideration yet, although some of
   the infrastructure is there.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:09:38 -08:00
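The flush and evict rules in da9ed08ea3 can be read as two small predicates. This is a sketch under stated assumptions: the mode enums, `ObjectView`, `coldness_micro` (the object's position in the age histogram, in parts per million), and the comparison against `1000000 - effort_micro` are all invented for illustration and may differ from what the commit actually does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types; not the ReplicatedPG definitions.
enum class FlushMode { Idle, Active };
enum class EvictMode { Idle, Some, Full };

struct ObjectView {
  bool dirty;               // has changes not yet written to the base tier
  uint64_t age_sec;         // seconds since the last modification
  uint64_t coldness_micro;  // position in the age histogram, in ppm;
                            // 1000000 means colder than everything seen
};

// Flush when we are above the flush threshold (mode is Active) and the
// object is dirty and not too young.
inline bool should_flush(FlushMode mode, const ObjectView &o,
                         uint64_t min_flush_age_sec) {
  return mode == FlushMode::Active && o.dirty &&
         o.age_sec >= min_flush_age_sec;
}

// Evict clean objects unconditionally in Full mode. In the middle mode
// (Some), evict only objects in the coldest `effort_micro` share.
inline bool should_evict(EvictMode mode, const ObjectView &o,
                         uint64_t effort_micro) {
  assert(effort_micro <= 1000000);
  if (o.dirty || mode == EvictMode::Idle) return false;
  if (mode == EvictMode::Full) return true;
  return o.coldness_micro >= 1000000 - effort_micro;
}
```

Under these assumptions an effort of 200000 evicts roughly the coldest fifth of the clean objects examined so far.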
Sage Weil
a54f81982d osd: agent worker thread
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:09:38 -08:00
Sage Weil
9ac03ef579 osd/ReplicatedPG: fix finish_flush
Make sure we reallocate a pgbackend transaction at the time when we are
initiating new work.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:09:38 -08:00
Sage Weil
34fcf42c69 osd/HitSet: add HitSetRef
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:09:38 -08:00
Sage Weil
6950212315 osd/ReplicatedPG: factor clone check out of evict op code
Move the check for clones into a helper so that we will be able to use it in
other places where we need to evict.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:09:38 -08:00
Sage Weil
fc28a99f55 osd/ReplicatedPG: add on_finish to OpContext
Add a callback hook for whenever an OpContext completes or cancels.  We
are pretty sloppy here about the return values because our initial user
will not care, and it is unclear if future users will.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:09:37 -08:00
Sage Weil
a57052cb7b mon: include dirty stats in 'ceph df detail'
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:09:37 -08:00
Sage Weil
bc945248ec osd: rename test/test_osd_types.cc -> test/osd/types.cc
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:09:37 -08:00
Sage Weil
e65c280b0e osd: add pg_pool_t::get_pg_num_divisor
A PG is not always an equally sized fraction of the total pool size due to
the use of ceph_stable_mod.  Add a helper to return the fraction
(denominator) of a given pg based on the current pg_num value.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:08:12 -08:00
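To see why PGs differ in size (e65c280b0e), consider a stable-mod mapping in which hash values that land past pg_num fold back onto a smaller mask. The sketch below is modeled on that idea; the free functions `stable_mod` and `pg_num_divisor` and their signatures are assumptions for illustration, not the pg_pool_t members.

```cpp
#include <cassert>
#include <cstdint>

// Map hash value x onto [0, b). bmask is the smallest 2^n - 1 that is
// >= b - 1. Values whose masked bits fall at or past b are folded back
// using the next smaller mask.
inline uint32_t stable_mod(uint32_t x, uint32_t b, uint32_t bmask) {
  return (x & bmask) < b ? (x & bmask) : (x & (bmask >> 1));
}

// Denominator of the share of hash space owned by PG `ps`: the PG
// receives 1/divisor of all hash values.
inline uint32_t pg_num_divisor(uint32_t ps, uint32_t pg_num,
                               uint32_t pg_num_mask) {
  assert(ps < pg_num);
  if (pg_num == pg_num_mask + 1)
    return pg_num;  // power of two: every PG is the same size
  const uint32_t half_mask = pg_num_mask >> 1;
  // PGs whose low bits are below (pg_num & half_mask) have a sibling
  // that shares their parent's range, so they own the smaller share.
  // The rest still receive the folded values and own twice as much.
  return (ps & half_mask) < (pg_num & half_mask) ? pg_num_mask + 1
                                                 : (pg_num_mask + 1) >> 1;
}
```

For pg_num = 12 and mask 15, PGs 4 through 7 each own 1/8 of the hash space while the other eight PGs each own 1/16, which sums to 1.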
Sage Weil
95f25ce092 mon/OSDMonitor: allow new pool policy fields to be set
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:08:12 -08:00
Sage Weil
0988c8438b osd/osd_types: add cache policy fields to pg_pool_t
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:08:12 -08:00
Sage Weil
297d54eb95 histogram: add decay
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:07:04 -08:00
Sage Weil
fb4152aeab histogram: move to common, add unit tests
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:07:04 -08:00
Sage Weil
85a82722cc histogram: rename set -> set_bin
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:07:04 -08:00
Sage Weil
8b68ad037f histogram: calculate bin position of a value in the histogram
Generate a lower and upper bound.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 22:07:04 -08:00
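The lower and upper bound mentioned in 8b68ad037f can be sketched as the cumulative share of samples below a value's bin and through the end of that bin. The power-of-two binning, the name `Pow2Histogram`, and the parts-per-million scale are assumptions for illustration; 64-bit intermediates are used so that multiplying a count by 1000000 does not overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical histogram: bin 0 holds the value 0, bin i holds values
// whose highest set bit is bit i-1 (1 -> bin 1, 2..3 -> bin 2, ...).
struct Pow2Histogram {
  std::vector<uint64_t> bins;

  static unsigned bin_of(uint32_t v) {
    unsigned b = 0;
    while (v) { ++b; v >>= 1; }
    return b;
  }

  void add(uint32_t v) {
    const unsigned b = bin_of(v);
    if (bins.size() <= b) bins.resize(b + 1, 0);
    ++bins[b];
  }

  // Report where v sits in the distribution, in parts per million.
  // *lower is the share of samples in bins below v's bin; *upper also
  // includes v's own bin. The true rank lies somewhere in between.
  void position_micro(uint32_t v, uint64_t *lower, uint64_t *upper) const {
    uint64_t total = 0;
    for (auto count : bins) total += count;
    if (total == 0) { *lower = *upper = 0; return; }
    const unsigned bin = bin_of(v);
    uint64_t below = 0;
    for (unsigned i = 0; i < bin && i < bins.size(); ++i) below += bins[i];
    const uint64_t in_bin = bin < bins.size() ? bins[bin] : 0;
    *lower = below * 1000000 / total;
    *upper = (below + in_bin) * 1000000 / total;
  }
};
```

The gap between the two bounds reflects the bin width: a value in a crowded bin gets a wide interval, and a value beyond the last bin gets both bounds at 1000000.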
Sage Weil
af848d4a4a Merge pull request #1176 from ceph/wip-primary-affinity
osd: primary affinity

Added primary-affinity thrashing to thrashosd.py.

Reviewed-by: Loic Dachary <loic@dachary.org>
2014-02-15 16:59:35 -08:00
Sage Weil
f0f1cf4f96 Merge pull request #1249 from dachary/wip-qa-erasure-test
qa: do not create erasure pools yet
2014-02-15 16:48:36 -08:00
Loic Dachary
d921d9b383 qa: do not create erasure pools yet
Comment out erasure pool related tests when an OSD is involved because
it does not work yet. See http://tracker.ceph.com/issues/7360.

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-02-16 00:53:13 +01:00
Sage Weil
c673f4084d osd/OSDMap: include primary affinity in OSDMap::print
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 10:50:09 -08:00
Sage Weil
87be7c1574 osd/OSDMap: remove bad assert
You can have an erasure pool with all CRUSH_ITEM_NONE and primary == -1.
acting is not empty.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 10:50:09 -08:00
Sage Weil
ba3eef86d8 mon/OSDMonitor: add 'mon osd allow primary affinity' bool option
By default, disallow adjustment of primary affinity unless the user has
opted in by adjusting their monitor config.  This will avoid some user
pain because inadvertently setting the affinity will prevent older clients
from connecting to and using the cluster.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 10:50:09 -08:00
Sage Weil
c360c604aa ceph_psim: some futzing to test primary_affinity
- map to acting
- count first position, primary

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 10:50:09 -08:00
Sage Weil
f825624ff0 osd/OSDMap: add primary_affinity feature bit
Indicate that we support it.  Indicate when an OSDMap requires it.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 10:50:09 -08:00
Sage Weil
8ecec02fc1 osd/OSDMap: apply primary_affinity to mapping
The behavior is a bit different for replicated and indep/erasure mode.
In the first case, we are rearranging the result.  In the second case,
we can just set the primary argument to the right value.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 10:50:08 -08:00
Sage Weil
a91d0cbc1b Merge pull request #1245 from ceph/wip-brag
ceph-brag

Sebastien Han confirms that this is under the default (LGPL2) license, thus:

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 10:37:26 -08:00
Sage Weil
871a5f04f0 ceph.spec: add ceph-brag
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 10:15:16 -08:00
Sage Weil
4ea0a25aa6 debian: add ceph-brag
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 10:15:15 -08:00
Sage Weil
57d7018371 ceph-brag: add Makefile
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 10:15:15 -08:00
Sage Weil
7e9f03b18e Merge pull request #1181 from dachary/wip-7277
DNM: mon: s/ENOSYS/ENOTSUP/

Reviewed-by: Christophe Courtaut <christophe.courtaut@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-02-15 10:08:45 -08:00
Sage Weil
e485c95f74 Merge branch 'master' of https://github.com/enovance/ceph-brag into wip-brag
2014-02-15 09:17:22 -08:00
Sage Weil
cf4f7027e7 mon/Elector: bootstrap on timeout
Currently if an election times out we call a new
election.  If we have never joined a quorum, bootstrap
instead. This is heavier weight, but captures the case
where, during bootstrap:

 - a and b have learned each other's addresses
 - everybody calls an election
 - a and b form a quorum
 - c loops trying to call an election, but is ignored
   because a and b don't see its address in the monmap

See logs:
  ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-02-14_13:50:04-ceph-deploy-wip-7212-sage-b-testing-basic-plana/83194

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-15 08:59:51 -08:00
Sage Weil
4595c44ba1 mon: tell MonmapMonitor first about winning an election
It is important in the bootstrap case that the very first paxos round
also codify the contents of the monmap itself in order to avoid any manner
of confusing scenarios where subsequent elections are called and people
try to recover and modify paxos without agreeing on who the quorum
participants are.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-14 20:27:45 -08:00
Sage Weil
7bd2104acf mon: only learn peer addresses when monmap == 0
It is only safe to dynamically update the address for a peer mon in our
monmap if we are in the midst of the initial quorum formation (i.e.,
monmap.epoch == 0).  If it is a later epoch, we have formed our initial
quorum and any and all monmap changes need to be agreed upon by the quorum
and committed via paxos.

Fixes: #7212
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-14 20:27:44 -08:00
Greg Farnum
3c76b81f2f OSD: use the osdmap_subscribe helper
Signed-off-by: Greg Farnum <greg@inktank.com>
2014-02-14 16:54:43 -08:00
Greg Farnum
6db3ae851d OSD: create a helper for handling OSDMap subscriptions, and clean them up
We've had some trouble with not clearing out subscription requests and
overloading the monitors (though only because of other bugs). Write a
helper for handling subscription requests that we can use to centralize
safety logic. Clear out the subscription whenever we get a map that covers
it; if there are more maps available than we received, we will issue another
subscription request based on "m->newest_map" at the end of handle_osd_map().

Notice that the helper will no longer request old maps which we already have,
and that unless forced it will not dispatch multiple subscribe requests
to a single monitor.
Skipping old maps is safe:
1) we only trim old maps when the monitor tells us to,
2) we do not send messages to our peers until we have updated our maps
from the monitor.
That means only old and broken OSDs will send us messages based on maps
in our past, and we can (and should) ignore any directives from them anyway.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-02-14 16:54:43 -08:00
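The behavior described in 6db3ae851d (skip maps we already have, avoid duplicate requests unless forced, clear a subscription once a map covers it) can be sketched as follows. `SubscriptionTable`, its members, and the signature of `osdmap_subscribe` are hypothetical stand-ins, not the OSD or monitor client code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the monitor client's subscription state.
struct SubscriptionTable {
  std::map<std::string, uint64_t> want;  // topic -> first version wanted
  unsigned requests_sent = 0;

  // Raise the start version for `what`, never lower it.
  // Returns true only if the stored subscription changed.
  bool want_increment(const std::string &what, uint64_t start) {
    auto it = want.find(what);
    if (it != want.end() && it->second >= start) return false;
    want[what] = start;
    return true;
  }

  // Stand-in for sending the pending subscriptions to the monitor.
  void send() { ++requests_sent; }

  // A received map that covers the request clears it.
  void got(const std::string &what, uint64_t version) {
    auto it = want.find(what);
    if (it != want.end() && it->second <= version) want.erase(it);
  }
};

// Ask for OSDMaps starting at `epoch`, given that we hold `have_epoch`.
inline void osdmap_subscribe(SubscriptionTable &subs, uint64_t have_epoch,
                             uint64_t epoch, bool force) {
  if (epoch <= have_epoch) return;  // we already hold this map
  if (subs.want_increment("osdmap", epoch) || force) subs.send();
}
```

Routing every request through one helper keeps the "do not ask twice" rule in a single place, which is the centralization the commit message argues for.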
Greg Farnum
5b9c187caf monc: new sub_want_increment() function to make handling subscriptions easier
Provide a subscription-modifying function which will not decrement
the start version.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-02-14 16:53:51 -08:00
Sage Weil
7d398c2ae2 doc/release-notes: v0.67.6
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-14 14:20:51 -08:00
Sage Weil
f47062d8a6 Merge pull request #1237 from dachary/wip-hashpspool
mon: ceph hashpspool false clears the flag

Reviewed-by: Christophe Courtaut <christophe.courtaut@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-02-14 09:27:33 -08:00
Loic Dachary
d8964b2f33 Merge pull request #1235 from ceph/wip-osdmaptool-pool-fix
wip-osdmaptool-pool-fix

Reviewed-by: Loic Dachary <loic@dachary.org>
2014-02-14 13:33:26 +01:00
Ilya Dryomov
0ed6a81b4b osdmaptool: add tests for --pool option
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-02-14 12:24:33 +02:00
Ilya Dryomov
f98435a45f osdmaptool: add --pool option for --test-map-pgs mode to usage()
--test-map-pgs mode can map all pgs from either all pools or just
one pool.  Mention it in usage output.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-02-14 12:24:32 +02:00