mirror of https://github.com/ceph/ceph
synced 2025-02-21 01:47:25 +00:00

Merge pull request #19331 from jecluis/wip-mon-osdmap-prune

mon: osdmap prune

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>

commit 940dd941ef

doc/dev/mon-osdmap-prune.rst (new file, 415 lines)
@@ -0,0 +1,415 @@
===========================
FULL OSDMAP VERSION PRUNING
===========================

For each incremental osdmap epoch, the monitor will keep a full osdmap
epoch in the store.

While this is great when serving osdmap requests from clients, allowing
us to fulfill their request without having to recompute the full osdmap
from a myriad of incrementals, it can also become a burden once we start
keeping an unbounded number of osdmaps.

The monitors will attempt to keep a bounded number of osdmaps in the store.
This number is defined (and configurable) via ``mon_min_osdmap_epochs``, and
defaults to 500 epochs. Generally speaking, we will remove older osdmap
epochs once we go over this limit.

However, there are a few constraints to removing osdmaps. These are all
defined in ``OSDMonitor::get_trim_to()``.

In the event one of these conditions is not met, we may go over the bounds
defined by ``mon_min_osdmap_epochs``. And if the cluster does not meet the
trim criteria for some time (e.g., unclean pgs), the monitor may start
keeping a lot of osdmaps. This can start putting pressure on the underlying
key/value store, as well as on the available disk space.

One way to mitigate this problem would be to stop keeping full osdmap
epochs on disk. We would have to rebuild osdmaps on-demand, or grab them
from cache if they had been recently served. We would still have to keep
at least one osdmap, and apply all incrementals on top of either this
oldest map epoch kept in the store or a more recent map grabbed from cache.
While this would be feasible, it seems like a lot of cpu (and potentially
IO) would be going into rebuilding osdmaps.

Additionally, this would prevent the aforementioned problem going forward,
but would do nothing for stores currently in a state that would truly
benefit from not keeping osdmaps.

This brings us to full osdmap pruning.

Instead of not keeping full osdmap epochs, we are going to prune some of
them when we have too many.

Deciding whether we have too many is dictated by a configurable option,
``mon_osdmap_full_prune_min`` (default: 10000). The pruning algorithm will be
engaged once we go over this threshold.

We will not remove all ``mon_osdmap_full_prune_min`` full osdmap epochs,
though. Instead, we are going to poke some holes in the sequence of full
maps. By default, we will keep one full osdmap per 10 maps since the last
map kept; i.e., if we keep epoch 1, we will also keep epoch 10 and remove
full map epochs 2 to 9. The size of this interval is configurable with
``mon_osdmap_full_prune_interval``.

Essentially, we are proposing to keep ~10% of the full maps, but we will
always honour the minimum number of osdmap epochs, as defined by
``mon_min_osdmap_epochs``, and these won't be used for the count of the
minimum versions to prune. For instance, if we have on-disk versions
[1..50000], we would allow the pruning algorithm to operate only over
osdmap epochs [1..49500); but if we have on-disk versions [1..10200], we
won't be pruning, because the algorithm would only operate on versions
[1..9700), and this interval contains fewer versions than the minimum
required by ``mon_osdmap_full_prune_min``.
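
The worked numbers above can be checked with a short sketch (plain Python;
the function name is ours, not the monitor's):

```python
def should_prune(first, last,
                 mon_min_osdmap_epochs=500,
                 mon_osdmap_full_prune_min=10000):
    """Decide whether pruning would engage for on-disk versions
    [first..last], using the defaults described in the text."""
    # the newest mon_min_osdmap_epochs maps are never candidates
    prune_to = last - mon_min_osdmap_epochs
    # the prunable range [first..prune_to) must hold at least
    # mon_osdmap_full_prune_min versions for pruning to engage
    return prune_to - first >= mon_osdmap_full_prune_min
```

With versions [1..50000] the prunable range [1..49500) holds 49499 versions,
so pruning engages; with [1..10200] the range [1..9700) holds only 9699,
below the 10000 threshold.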


ALGORITHM
=========

Say we have 50,000 osdmap epochs in the store, and we're using the
defaults for all configurable options.

::

  -----------------------------------------------------------
  |1|2|..|10|11|..|100|..|1000|..|10000|10001|..|49999|50000|
  -----------------------------------------------------------
   ^ first                                            last ^

We will prune when all the following constraints are met:

1. the number of versions is greater than ``mon_min_osdmap_epochs``;

2. the number of versions between ``first`` and ``prune_to`` is greater
   than (or equal to) ``mon_osdmap_full_prune_min``, with ``prune_to``
   being equal to ``last`` minus ``mon_min_osdmap_epochs``.

If any of these conditions fails, we will *not* prune any maps.

Furthermore, if it is known that we have been pruning, but since then we
are no longer satisfying at least one of the above constraints, we will
not continue to prune. In essence, we only prune full osdmaps if the
number of epochs in the store warrants it.

As pruning will create gaps in the sequence of full maps, we need to keep
track of the intervals of missing maps. We do this by keeping a manifest of
pinned maps -- i.e., a list of maps that, by being pinned, are not to be
pruned.

While pinned maps are not removed from the store, maps between two consecutive
pinned maps will be; the number of maps to be removed is dictated by the
configurable option ``mon_osdmap_full_prune_interval``. The algorithm makes an
effort to keep pinned maps apart by as many maps as defined by this option,
but in corner cases it may allow smaller intervals. Additionally, as this
option is read every time a prune iteration occurs, the interval may change
if the user changes this config option.

Pinning maps is performed lazily: we pin maps as we remove maps. This grants
us more flexibility to change the prune interval while pruning is happening,
but also considerably simplifies the algorithm, as well as the information we
need to keep in the manifest. Below we show a simplified version of the
algorithm::

  manifest.pin(first)
  last_to_prune = last - mon_min_osdmap_epochs

  while manifest.get_last_pinned() + prune_interval < last_to_prune AND
        last_to_prune - first > mon_min_osdmap_epochs AND
        last_to_prune - first > mon_osdmap_full_prune_min AND
        num_pruned < mon_osdmap_full_prune_txsize:

      last_pinned = manifest.get_last_pinned()
      new_pinned = last_pinned + prune_interval
      manifest.pin(new_pinned)
      for e in (last_pinned .. new_pinned):
          store.erase(e)
          ++num_pruned
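
A directly runnable transliteration of the pseudocode above, using a plain
set for the store and a list for the manifest (our own stand-ins, not the
monitor's types):

```python
def prune(store, pinned, first, last,
          mon_min_osdmap_epochs=500,
          mon_osdmap_full_prune_min=10000,
          prune_interval=10,
          mon_osdmap_full_prune_txsize=100):
    """One prune iteration: pin a map every prune_interval epochs and
    erase the full maps in between, stopping around txsize removals."""
    if not pinned:
        pinned.append(first)      # the first version is always pinned
    last_to_prune = last - mon_min_osdmap_epochs
    num_pruned = 0

    while (pinned[-1] + prune_interval < last_to_prune
           and last_to_prune - first > mon_min_osdmap_epochs
           and last_to_prune - first > mon_osdmap_full_prune_min
           and num_pruned < mon_osdmap_full_prune_txsize):
        last_pinned = pinned[-1]
        new_pinned = last_pinned + prune_interval
        pinned.append(new_pinned)
        for e in range(last_pinned + 1, new_pinned):
            store.discard(e)      # erase the full map for epoch e
            num_pruned += 1
    return num_pruned
```

Each call removes at most roughly ``txsize`` maps; iterations are repeated
until the loop conditions no longer hold.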

In essence, the algorithm ensures that the first version in the store is
*always* pinned. After all, we need a starting point when rebuilding maps, and
we can't simply remove the earliest map we have; otherwise we would be unable
to rebuild maps for the very first pruned interval.

Once we have at least one pinned map, each iteration of the algorithm can
simply base itself on the manifest's last pinned map (which we can obtain by
reading the element at the tail of the manifest's pinned maps list).

We next need to determine the interval of maps to be removed: all the maps
from ``last_pinned`` up to ``new_pinned``, which in turn is nothing more than
``last_pinned`` plus ``mon_osdmap_full_prune_interval``. We know that all maps
between these two values, ``last_pinned`` and ``new_pinned``, can be removed,
considering ``new_pinned`` has been pinned.

The algorithm ceases to execute as soon as one of the two initial
preconditions is not met, or if we do not meet two additional conditions that
have no weight on the algorithm's correctness:

1. We will stop if we are not able to create a new pruning interval properly
   aligned with ``mon_osdmap_full_prune_interval`` that is lower than
   ``last_pruned``. There is no particular technical reason why we enforce
   this requirement, besides allowing us to keep the intervals with an
   expected size, and preventing small, irregular intervals that would be
   bound to happen eventually (e.g., pruning continues over the course of
   several iterations, removing one or two or three maps each time).

2. We will stop once we know that we have pruned more than a certain number of
   maps. This value is defined by ``mon_osdmap_full_prune_txsize``, and
   ensures we don't spend an unbounded number of cycles pruning maps. We don't
   enforce this value religiously (deletes do not cost much), but we make an
   effort to honor it.

We could do the removal in one go, but we have no idea how long that would
take. Therefore, we will perform several iterations, removing at most
``mon_osdmap_full_prune_txsize`` osdmaps per iteration.

In the end, our on-disk map sequence will look similar to::

  ------------------------------------------
  |1|10|20|30|..|49500|49501|..|49999|50000|
  ------------------------------------------
   ^ first                          last ^


Because we are not pruning all versions in one go, we need to keep state
about how far along in our pruning we are. With that in mind, we have
created a data structure, ``osdmap_manifest_t``, that holds the set of pinned
maps::

  struct osdmap_manifest_t:
      set<version_t> pinned;

Given we are only pinning maps while we are pruning, we don't need to keep
track of additional state about the last pruned version. We know as a matter
of fact that we have pruned all the intermediate maps between any two
consecutive pinned maps.
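
The same idea in Python, with a couple of the accessors the surrounding text
refers to (the C++ type only stores the set; the helper methods here are
illustrative):

```python
class OSDMapManifest:
    """Stand-in for osdmap_manifest_t: just the set of pinned epochs."""

    def __init__(self):
        self.pinned = set()

    def pin(self, v):
        self.pinned.add(v)

    def get_first_pinned(self):
        return min(self.pinned) if self.pinned else 0

    def get_last_pinned(self):
        return max(self.pinned) if self.pinned else 0

    def is_pinned(self, v):
        return v in self.pinned

    def covers(self, v):
        # an epoch is covered iff it lies inside the pruned region,
        # i.e., within [first_pinned, last_pinned]
        return (bool(self.pinned)
                and self.get_first_pinned() <= v <= self.get_last_pinned())
```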

The question one could ask, though, is how we can be sure we pruned all the
intermediate maps if, for instance, the monitor crashes. To ensure we are
protected against such an event, we always write the osdmap manifest to disk
in the same transaction that is deleting the maps. This way we have the
guarantee that, if the monitor crashes, we will read the latest version of the
manifest: either it contains the newly pinned maps, meaning we also pruned the
in-between maps; or we will find the previous version of the osdmap manifest,
which will not contain the maps we were pinning at the time we crashed, given
the transaction in which we would be writing the updated osdmap manifest was
not applied (alongside the maps removal).
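
The guarantee hinges on the deletions and the manifest update sharing one
atomic transaction. A toy model of that property (our own minimal transaction
class, not the monitor's real store interface):

```python
class Txn:
    """Buffer mutations and apply them in one step; a crash before
    commit() leaves the store exactly as it was."""

    def __init__(self, store):
        self.store = store
        self.ops = []

    def erase(self, key):
        self.ops.append(("erase", key, None))

    def put(self, key, value):
        self.ops.append(("put", key, value))

    def commit(self):
        for op, key, value in self.ops:
            if op == "erase":
                self.store.pop(key, None)
            else:
                self.store[key] = value


def prune_in_txn(store, pinned, last_pinned, new_pinned):
    """Pin new_pinned, erase the full maps in between, and persist the
    updated manifest -- all staged in the same transaction."""
    txn = Txn(store)
    for e in range(last_pinned + 1, new_pinned):
        txn.erase(("full", e))
    txn.put("manifest", sorted(set(pinned) | {new_pinned}))
    txn.commit()
    return store["manifest"]
```

If the process dies before ``commit()``, neither the deletions nor the new
manifest land, which is exactly the either/or outcome the text describes.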

The osdmap manifest will be written to the store each time we prune, with an
updated list of pinned maps. It is written in the transaction effectively
pruning the maps, so we guarantee the manifest is always up to date. As a
consequence of this criterion, the first time we write the osdmap manifest
is the first time we prune. If an osdmap manifest does not exist, we can be
certain we do not hold pruned map intervals.

We rely on the manifest to ascertain whether we have pruned map intervals.
In theory, this will always be the on-disk osdmap manifest, but we make sure
to read the on-disk osdmap manifest each time we update from paxos; this way
we always ensure having an up-to-date in-memory osdmap manifest.

Once we finish pruning maps, we will keep the manifest in the store, to
allow us to easily find which maps have been pinned (instead of checking
the store until we find a map). This has the added benefit of allowing us to
quickly figure out the next interval we need to prune (i.e., last pinned
plus the prune interval). This doesn't however mean we will forever keep the
osdmap manifest: the osdmap manifest will no longer be required once the
monitor trims osdmaps and the earliest available epoch in the store is
greater than the last map we pruned.

The same conditions from ``OSDMonitor::get_trim_to()`` that force the monitor
to keep a lot of osdmaps, thus requiring us to prune, may eventually change
and allow the monitor to remove some of its oldest maps.

MAP TRIMMING
------------

If the monitor trims maps, we must then adjust the osdmap manifest to
reflect our pruning status, or remove the manifest entirely if it no longer
makes sense to keep it. For instance, take the map sequence from before, but
let us assume we did not finish pruning all the maps::

  -------------------------------------------------------------
  |1|10|20|30|..|490|500|501|502|..|49500|49501|..|49999|50000|
  -------------------------------------------------------------
   ^ first              ^ pinned.last()                 last ^

  pinned = {1, 10, 20, ..., 490, 500}

Now let us assume that the monitor will trim up to epoch 501. This means
removing all maps prior to epoch 501, and updating the ``first_committed``
pointer to ``501``. Given removing all those maps would invalidate our
existing pruning efforts, we can consider our pruning has finished and drop
our osdmap manifest. Doing so also simplifies starting a new prune, if all
the starting conditions are met once we have refreshed our state from the
store.

We would then have the following map sequence::

  ---------------------------------------
  |501|502|..|49500|49501|..|49999|50000|
  ---------------------------------------
   ^ first                       last ^

However, imagine a slightly more convoluted scenario: the monitor will trim
up to epoch 491. In this case, epoch 491 has been previously pruned from the
store.

Given we will always need to have the oldest known map in the store, before
we trim we will have to check whether that map is in the prune interval
(i.e., if said map epoch belongs to ``[ pinned.first()..pinned.last() )``).
If so, we need to check whether this is a pinned map, in which case we don't
have much to be concerned with aside from removing lower epochs from the
manifest's pinned list. On the other hand, if the map being trimmed to is not
a pinned map, we will need to rebuild said map and pin it, and only then will
we remove the pinned maps prior to the map's epoch.

In this case, we would end up with the following sequence::

  -----------------------------------------------
  |491|500|501|502|..|49500|49501|..|49999|50000|
  -----------------------------------------------
   ^   ^- pinned.last()                  last ^
   `- first

There is still an edge case that we should mention. Consider that we are
going to trim up to epoch 499, which is the very last pruned epoch.

Much like the scenario above, we would end up writing osdmap epoch 499 to
the store; but what should we do about pinned maps and pruning?

The simplest solution is to drop the osdmap manifest. After all, given we
are trimming to the last pruned map, and we are rebuilding this map, we can
guarantee that all maps greater than epoch 499 are sequential (because we
have not pruned any of them). Dropping the osdmap manifest in this case is
essentially the same as trimming past the last pruned epoch: we can prune
again later if we meet the required conditions.
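
The three trimming scenarios condense into one helper (a sketch over a sorted
pinned list; the actual map rebuild from a pinned epoch plus incrementals is
elided here):

```python
def trim_manifest(pinned, trim_to):
    """Adjust the pinned-map list when the monitor trims up to trim_to.
    Returns the new list, or [] when the manifest should be dropped."""
    # trimming at or past the last pruned epoch: every surviving map
    # is sequential, so the manifest no longer serves a purpose
    if not pinned or trim_to >= pinned[-1] - 1:
        return []
    if trim_to not in pinned:
        # trim_to falls inside a pruned gap: the monitor rebuilds that
        # epoch (from the previous pinned map plus incrementals) and
        # pins it, so the oldest map in the store always exists
        pinned = sorted(set(pinned) | {trim_to})
    # discard pins below the new first_committed
    return [v for v in pinned if v >= trim_to]
```

With ``pinned = {1, 10, ..., 490, 500}``: trimming to 501 (or to the last
pruned epoch 499) drops the manifest; trimming to 491 pins the rebuilt epoch
491, yielding ``{491, 500}``; trimming to the pinned epoch 490 yields
``{490, 500}``.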

And with this, we have fully covered full osdmap pruning. Later in this
document one can find detailed `REQUIREMENTS, CONDITIONS & INVARIANTS` for the
whole algorithm, from pruning to trimming. Additionally, the next section
details several additional checks to guarantee the sanity of our configuration
options. Enjoy.


CONFIGURATION OPTIONS SANITY CHECKS
-----------------------------------

We perform additional checks before pruning to ensure all configuration
options involved are sane:

1. If ``mon_osdmap_full_prune_interval`` is zero we will not prune; we
   require an actual positive number, greater than one, to be able to prune
   maps. If the interval is one, we would not actually be pruning any maps,
   as the interval between pinned maps would essentially be a single epoch.
   This means we would have zero maps in-between pinned maps, hence no maps
   would ever be pruned.

2. If ``mon_osdmap_full_prune_min`` is zero we will not prune; we require a
   positive, greater-than-zero value so we know the threshold over which we
   should prune. We don't want to guess.

3. If ``mon_osdmap_full_prune_interval`` is greater than
   ``mon_osdmap_full_prune_min`` we will not prune, as it is impossible to
   ascertain a proper prune interval.

4. If ``mon_osdmap_full_prune_txsize`` is lower than
   ``mon_osdmap_full_prune_interval`` we will not prune; we require a
   ``txsize`` with a value at least equal to ``interval``, and (depending on
   the value of the latter) ideally higher.
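
These four rules condense into a small predicate (a sketch; the real checks
live in the monitor's pruning code):

```python
def prune_config_sane(prune_interval, prune_min, txsize):
    """Return True iff the pruning options pass the sanity checks."""
    # 1. an interval of 0 or 1 leaves no maps between pins to prune
    if prune_interval <= 1:
        return False
    # 2. without a positive minimum there is no threshold to act on
    if prune_min == 0:
        return False
    # 3. the interval cannot exceed the engage threshold
    if prune_interval > prune_min:
        return False
    # 4. each transaction must fit at least one full interval
    if txsize < prune_interval:
        return False
    return True
```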


REQUIREMENTS, CONDITIONS & INVARIANTS
-------------------------------------

REQUIREMENTS
~~~~~~~~~~~~

* All monitors in the quorum need to support pruning.

* Once pruning has been enabled, monitors not supporting pruning will not be
  allowed in the quorum, nor will they be allowed to synchronize.

* Removing the osdmap manifest results in disabling the pruning feature
  quorum requirement. This means that monitors not supporting pruning will
  be allowed to synchronize and join the quorum, provided they support any
  other features required.


CONDITIONS & INVARIANTS
~~~~~~~~~~~~~~~~~~~~~~~

* Pruning has never happened, or we have trimmed past its previous
  intervals::

    invariant: first_committed > 1

    condition: pinned.empty() AND !store.exists(manifest)

* Pruning has happened at least once::

    invariant: first_committed > 0
    invariant: !pinned.empty()
    invariant: pinned.first() == first_committed
    invariant: pinned.last() < last_committed

    precond:  pinned.last() < prune_to AND
              pinned.last() + prune_interval < prune_to

    postcond: pinned.size() > old_pinned.size() AND
              (for each v in [pinned.first()..pinned.last()]:
                if pinned.count(v) > 0: store.exists_full(v)
                else: !store.exists_full(v)
              )


* Pruning has finished::

    invariant: first_committed > 0
    invariant: !pinned.empty()
    invariant: pinned.first() == first_committed
    invariant: pinned.last() < last_committed

    condition: pinned.last() == prune_to OR
               pinned.last() + prune_interval < prune_to


* Pruning intervals can be trimmed::

    precond:   OSDMonitor::get_trim_to() > 0

    condition: !pinned.empty()

    invariant: pinned.first() == first_committed
    invariant: pinned.last() < last_committed
    invariant: pinned.first() <= OSDMonitor::get_trim_to()
    invariant: pinned.last() >= OSDMonitor::get_trim_to()

* Trim pruned intervals::

    invariant: !pinned.empty()
    invariant: pinned.first() == first_committed
    invariant: pinned.last() < last_committed
    invariant: pinned.first() <= OSDMonitor::get_trim_to()
    invariant: pinned.last() >= OSDMonitor::get_trim_to()

    postcond: pinned.empty() OR
              (pinned.first() == OSDMonitor::get_trim_to() AND
               pinned.last() > pinned.first() AND
               (for each v in [0..pinned.first()]:
                 !store.exists(v) AND
                 !store.exists_full(v)
               ) AND
               (for each m in [pinned.first()..pinned.last()]:
                 if pinned.count(m) > 0: store.exists_full(m)
                 else: !store.exists_full(m) AND store.exists(m)
               )
              )
    postcond: !pinned.empty() OR
              (!store.exists(manifest) AND
               (for each v in [pinned.first()..pinned.last()]:
                 !store.exists(v) AND
                 !store.exists_full(v)
               )
              )
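
The "pruning has happened at least once" block can be turned into an
executable check against a model store (a sketch with plain Python
containers, not the monitor's own verification code):

```python
def check_pruned_invariants(full_maps, pinned,
                            first_committed, last_committed):
    """Assert the invariants that hold once pruning has happened at
    least once; full_maps is the set of epochs with a full osdmap."""
    assert first_committed > 0
    assert pinned, "at least one map must be pinned"
    assert pinned[0] == first_committed
    assert pinned[-1] < last_committed
    # between the first and last pinned epochs, a full map exists
    # if and only if that epoch is pinned
    for v in range(pinned[0], pinned[-1] + 1):
        if v in pinned:
            assert v in full_maps
        else:
            assert v not in full_maps
    return True
```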

@@ -44,10 +44,14 @@ else
  COREPATTERN="core.%e.%p.%t"
fi

function cleanup() {
  if [ -n "$precore" ]; then
    sudo sysctl -w ${KERNCORE}=${precore}
  fi
}

function finish() {
  cleanup
  exit 0
}

@@ -55,6 +59,10 @@ trap finish TERM HUP INT

PATH=$(pwd)/bin:$PATH

# add /sbin and /usr/sbin to PATH to find sysctl in those cases where the
# user's PATH does not get these directories by default (e.g., tumbleweed)
PATH=$PATH:/sbin:/usr/sbin

# TODO: Use getops
dryrun=false
if [[ "$1" = "--dry-run" ]]; then

@@ -75,6 +83,11 @@ count=0
errors=0
userargs=""
precore="$(sysctl -n $KERNCORE)"

if [[ "${precore:0:1}" = "|" ]]; then
  precore="${precore:1}"
fi

# If corepattern already set, avoid having to use sudo
if [ "$precore" = "$COREPATTERN" ]; then
  precore=""

@@ -130,9 +143,7 @@ do
    fi
  fi
done
cleanup

if [ "$errors" != "0" ]; then
  echo "$errors TESTS FAILED, $count TOTAL TESTS"

qa/standalone/mon/mon-osdmap-prune.sh (new executable file, 62 lines)
@@ -0,0 +1,62 @@

#!/bin/bash

source $CEPH_ROOT/qa/standalone/ceph-helpers.sh

base_test=$CEPH_ROOT/qa/workunits/mon/test_mon_osdmap_prune.sh

# We are going to open and close a lot of files, and generate a lot of maps
# that the osds will need to process. If we don't increase the fd ulimit, we
# risk having the osds asserting when handling filestore transactions.
ulimit -n 4096

function run() {

  local dir=$1
  shift

  export CEPH_MON="127.0.0.1:7115"
  export CEPH_ARGS
  CEPH_ARGS+="--fsid=$(uuidgen) --auth-supported=none --mon-host=$CEPH_MON "

  local funcs=${@:-$(set | sed -n -e 's/^\(TEST_[0-9a-z_]*\) .*/\1/p')}
  for func in $funcs; do
    setup $dir || return 1
    $func $dir || return 1
    teardown $dir || return 1
  done
}

function TEST_osdmap_prune() {

  local dir=$1

  run_mon $dir a || return 1
  run_mgr $dir x || return 1
  run_osd $dir 0 || return 1
  run_osd $dir 1 || return 1
  run_osd $dir 2 || return 1

  sleep 5

  # we are getting OSD_OUT_OF_ORDER_FULL health errors, and it's not clear
  # why. so, to make the health checks happy, mask those errors.
  ceph osd set-full-ratio 0.97
  ceph osd set-backfillfull-ratio 0.97

  ceph config set osd osd_beacon_report_interval 10 || return 1
  ceph config set mon mon_debug_extra_checks true || return 1

  ceph config set mon mon_min_osdmap_epochs 100 || return 1
  ceph config set mon mon_osdmap_full_prune_enabled true || return 1
  ceph config set mon mon_osdmap_full_prune_min 200 || return 1
  ceph config set mon mon_osdmap_full_prune_interval 10 || return 1
  ceph config set mon mon_osdmap_full_prune_txsize 100 || return 1

  bash -x $base_test || return 1

  return 0
}

main mon-osdmap-prune "$@"

@@ -1,3 +1,13 @@
overrides:
  ceph:
    conf:
      mon:
        mon min osdmap epochs: 50
        paxos service trim min: 10
        # prune full osdmaps regularly
        mon osdmap full prune min: 15
        mon osdmap full prune interval: 2
        mon osdmap full prune txsize: 2
tasks:
- install:
- ceph:

@@ -4,6 +4,10 @@ overrides:
      mon:
        mon min osdmap epochs: 25
        paxos service trim min: 5
        # prune full osdmaps regularly
        mon osdmap full prune min: 15
        mon osdmap full prune interval: 2
        mon osdmap full prune txsize: 2
    # thrashing monitors may make mgr have trouble w/ its keepalive
    log-whitelist:
      - daemon x is unresponsive

@@ -0,0 +1,22 @@
overrides:
  ceph:
    conf:
      mon:
        mon debug extra checks: true
        mon min osdmap epochs: 100
        mon osdmap full prune enabled: true
        mon osdmap full prune min: 200
        mon osdmap full prune interval: 10
        mon osdmap full prune txsize: 100
      osd:
        osd beacon report interval: 10
    log-whitelist:
      # setting/unsetting noup will trigger health warns,
      # causing tests to fail due to health warns, even if
      # the tests themselves are successful.
      - \(OSDMAP_FLAGS\)
tasks:
- workunit:
    clients:
      client.0:
        - mon/test_mon_osdmap_prune.sh

@@ -10,6 +10,13 @@ overrides:
        osd scrub max interval: 120
        osd max backfills: 3
        osd snap trim sleep: 2
      mon:
        mon min osdmap epochs: 50
        paxos service trim min: 10
        # prune full osdmaps regularly
        mon osdmap full prune min: 15
        mon osdmap full prune interval: 2
        mon osdmap full prune txsize: 2
tasks:
- thrashosds:
    timeout: 1200

@@ -6,7 +6,12 @@ overrides:
      - osd_map_cache_size
    conf:
      mon:
        mon min osdmap epochs: 50
        paxos service trim min: 10
        # prune full osdmaps regularly
        mon osdmap full prune min: 15
        mon osdmap full prune interval: 2
        mon osdmap full prune txsize: 2
      osd:
        osd map cache size: 1
        osd scrub min interval: 60

@@ -10,6 +10,13 @@ overrides:
        filestore odsync write: true
        osd max backfills: 2
        osd snap trim sleep: .5
      mon:
        mon min osdmap epochs: 50
        paxos service trim min: 10
        # prune full osdmaps regularly
        mon osdmap full prune min: 15
        mon osdmap full prune interval: 2
        mon osdmap full prune txsize: 2
tasks:
- thrashosds:
    timeout: 1200

@@ -1,3 +1,13 @@
overrides:
  ceph:
    conf:
      mon:
        mon min osdmap epochs: 50
        paxos service trim min: 10
        # prune full osdmaps regularly
        mon osdmap full prune min: 15
        mon osdmap full prune interval: 2
        mon osdmap full prune txsize: 2
tasks:
- install:
- ceph:

qa/workunits/mon/test_mon_osdmap_prune.sh (new executable file, 205 lines)
@@ -0,0 +1,205 @@

#!/bin/bash

. $(dirname $0)/../../standalone/ceph-helpers.sh

set -x

function wait_for_osdmap_manifest() {

  local what=${1:-"true"}

  local -a delays=($(get_timeout_delays $TIMEOUT .1))
  local -i loop=0

  for ((i=0; i < ${#delays[*]}; ++i)); do
    has_manifest=$(ceph report | jq 'has("osdmap_manifest")')
    if [[ "$has_manifest" == "$what" ]]; then
      return 0
    fi

    sleep ${delays[$i]}
  done

  echo "osdmap_manifest never appeared in the report"
  ceph report
  return 1
}

function wait_for_trim() {

  local -i epoch=$1
  local -a delays=($(get_timeout_delays $TIMEOUT .1))
  local -i loop=0

  for ((i=0; i < ${#delays[*]}; ++i)); do
    fc=$(ceph report | jq '.osdmap_first_committed')
    if [[ $fc -eq $epoch ]]; then
      return 0
    fi
    sleep ${delays[$i]}
  done

  echo "never trimmed up to epoch $epoch"
  ceph report
  return 1
}

function test_osdmap() {

  local epoch=$1
  local ret=0

  tmp_map=$(mktemp)
  ceph osd getmap $epoch -o $tmp_map || return 1
  if ! osdmaptool --print $tmp_map | grep "epoch $epoch" ; then
    echo "ERROR: failed processing osdmap epoch $epoch"
    ret=1
  fi
  rm $tmp_map
  return $ret
}

function generate_osdmaps() {

  local -i num=$1

  cmds=( set unset )
  for ((i=0; i < num; ++i)); do
    ceph osd ${cmds[$((i%2))]} noup || return 1
  done
  return 0
}

function test_mon_osdmap_prune() {

  create_pool foo 32
  wait_for_clean || return 1

  ceph config set mon mon_debug_block_osdmap_trim true || return 1

  generate_osdmaps 500 || return 1

  report="$(ceph report)"
  fc=$(jq '.osdmap_first_committed' <<< $report)
  lc=$(jq '.osdmap_last_committed' <<< $report)

  [[ $((lc-fc)) -ge 500 ]] || return 1

  wait_for_osdmap_manifest || return 1

  manifest="$(ceph report | jq '.osdmap_manifest')"

  first_pinned=$(jq '.first_pinned' <<< $manifest)
  last_pinned=$(jq '.last_pinned' <<< $manifest)
  pinned_maps=( $(jq '.pinned_maps[]' <<< $manifest) )

  # validate pinned maps list
  [[ $first_pinned -eq ${pinned_maps[0]} ]] || return 1
  [[ $last_pinned -eq ${pinned_maps[-1]} ]] || return 1

  # validate pinned maps range
  [[ $first_pinned -lt $last_pinned ]] || return 1
  [[ $last_pinned -lt $lc ]] || return 1
  [[ $first_pinned -eq $fc ]] || return 1

  # ensure all the maps are available, and work as expected
  # this can take a while...

  for ((i=$first_pinned; i <= $last_pinned; ++i)); do
    test_osdmap $i || return 1
  done

  # update pinned maps state:
  # the monitor may have pruned & pinned additional maps since we last
  # assessed state, given it's an iterative process.
  #
  manifest="$(ceph report | jq '.osdmap_manifest')"
  first_pinned=$(jq '.first_pinned' <<< $manifest)
  last_pinned=$(jq '.last_pinned' <<< $manifest)
  pinned_maps=( $(jq '.pinned_maps[]' <<< $manifest) )

  # test trimming maps
  #
  # we're going to perform the following tests:
  #
  # 1. force trim to a pinned map
  # 2. force trim to a pinned map's previous epoch
  # 3. trim all maps except the last 200 or so.
  #

  # 1. force trim to a pinned map
  #
  [[ ${#pinned_maps[@]} -gt 10 ]] || return 1

  trim_to=${pinned_maps[1]}
  ceph config set mon mon_osd_force_trim_to $trim_to
  ceph config set mon mon_min_osdmap_epochs 100
  ceph config set mon paxos_service_trim_min 1
  ceph config set mon mon_debug_block_osdmap_trim false

  # generate an epoch so we get to trim maps
  ceph osd set noup
  ceph osd unset noup

  wait_for_trim $trim_to || return 1

  report="$(ceph report)"
  fc=$(jq '.osdmap_first_committed' <<< $report)
  [[ $fc -eq $trim_to ]] || return 1

  old_first_pinned=$first_pinned
  old_last_pinned=$last_pinned
  first_pinned=$(jq '.osdmap_manifest.first_pinned' <<< $report)
  last_pinned=$(jq '.osdmap_manifest.last_pinned' <<< $report)
  [[ $first_pinned -eq $trim_to ]] || return 1
  [[ $first_pinned -gt $old_first_pinned ]] || return 1
  [[ $last_pinned -gt $old_first_pinned ]] || return 1

  test_osdmap $trim_to || return 1
  test_osdmap $(( trim_to+1 )) || return 1

  pinned_maps=( $(jq '.osdmap_manifest.pinned_maps[]' <<< $report) )

  # 2. force trim to a pinned map's previous epoch
  #
  [[ ${#pinned_maps[@]} -gt 2 ]] || return 1
  trim_to=$(( ${pinned_maps[1]} - 1))
  ceph config set mon mon_osd_force_trim_to $trim_to

  # generate an epoch so we get to trim maps
  ceph osd set noup
  ceph osd unset noup

  wait_for_trim $trim_to || return 1

  report="$(ceph report)"
  fc=$(jq '.osdmap_first_committed' <<< $report)
  [[ $fc -eq $trim_to ]] || return 1

  old_first_pinned=$first_pinned
  old_last_pinned=$last_pinned
  first_pinned=$(jq '.osdmap_manifest.first_pinned' <<< $report)
  last_pinned=$(jq '.osdmap_manifest.last_pinned' <<< $report)
  pinned_maps=( $(jq '.osdmap_manifest.pinned_maps[]' <<< $report) )
  [[ $first_pinned -eq $trim_to ]] || return 1
  [[ ${pinned_maps[1]} -eq $(( trim_to+1)) ]] || return 1

  test_osdmap $first_pinned || return 1
  test_osdmap $(( first_pinned + 1 )) || return 1

  # 3. trim everything
  #
  ceph config set mon mon_osd_force_trim_to 0

  # generate an epoch so we get to trim maps
  ceph osd set noup
  ceph osd unset noup

  wait_for_osdmap_manifest "false" || return 1

  return 0
}

test_mon_osdmap_prune || exit 1

echo "OK"
|
@@ -1152,6 +1152,36 @@ std::vector<Option> get_global_options() {
    .set_default(true)
    .set_description(""),

    /* -- mon: osdmap prune (begin) -- */
    Option("mon_osdmap_full_prune_enabled", Option::TYPE_BOOL, Option::LEVEL_ADVANCED)
    .set_default(true)
    .set_description("Enables pruning full osdmap versions when we go over a given number of maps")
    .add_see_also("mon_osdmap_full_prune_min")
    .add_see_also("mon_osdmap_full_prune_interval")
    .add_see_also("mon_osdmap_full_prune_txsize"),

    Option("mon_osdmap_full_prune_min", Option::TYPE_UINT, Option::LEVEL_ADVANCED)
    .set_default(10000)
    .set_description("Minimum number of versions in the store to trigger full map pruning")
    .add_see_also("mon_osdmap_full_prune_enabled")
    .add_see_also("mon_osdmap_full_prune_interval")
    .add_see_also("mon_osdmap_full_prune_txsize"),

    Option("mon_osdmap_full_prune_interval", Option::TYPE_UINT, Option::LEVEL_ADVANCED)
    .set_default(10)
    .set_description("Interval between maps that will not be pruned; maps in the middle will be pruned.")
    .add_see_also("mon_osdmap_full_prune_enabled")
    .add_see_also("mon_osdmap_full_prune_min")
    .add_see_also("mon_osdmap_full_prune_txsize"),

    Option("mon_osdmap_full_prune_txsize", Option::TYPE_UINT, Option::LEVEL_ADVANCED)
    .set_default(100)
    .set_description("Number of maps we will prune per iteration")
    .add_see_also("mon_osdmap_full_prune_enabled")
    .add_see_also("mon_osdmap_full_prune_min")
    .add_see_also("mon_osdmap_full_prune_interval"),
    /* -- mon: osdmap prune (end) -- */

    Option("mon_osd_cache_size", Option::TYPE_INT, Option::LEVEL_ADVANCED)
    .set_default(10)
    .set_description(""),

@@ -1606,6 +1636,22 @@ std::vector<Option> get_global_options() {
    .set_default(false)
    .set_description(""),

    Option("mon_debug_extra_checks", Option::TYPE_BOOL, Option::LEVEL_DEV)
    .set_default(false)
    .set_description("Enable some additional monitor checks")
    .set_long_description(
        "Enable some additional monitor checks that would be too expensive "
        "to run on production systems, or would only be relevant while "
        "testing or debugging."),

    Option("mon_debug_block_osdmap_trim", Option::TYPE_BOOL, Option::LEVEL_DEV)
    .set_default(false)
    .set_description("Block OSDMap trimming while the option is enabled.")
    .set_long_description(
        "Blocking OSDMap trimming may be quite helpful to easily reproduce "
        "states in which the monitor keeps (hundreds of) thousands of "
        "osdmaps."),

    Option("mon_debug_deprecated_as_obsolete", Option::TYPE_BOOL, Option::LEVEL_DEV)
    .set_default(false)
    .set_description(""),
@@ -485,6 +485,14 @@ const char** Monitor::get_tracked_conf_keys() const
    // scrub interval
    "mon_scrub_interval",
    "mon_allow_pool_delete",
    // osdmap pruning - observed, not handled.
    "mon_osdmap_full_prune_enabled",
    "mon_osdmap_full_prune_min",
    "mon_osdmap_full_prune_interval",
    "mon_osdmap_full_prune_txsize",
    // debug options - observed, not handled
    "mon_debug_extra_checks",
    "mon_debug_block_osdmap_trim",
    NULL
  };
  return KEYS;
@@ -188,6 +188,7 @@ OSDMonitor::OSDMonitor(
   cct(cct),
   inc_osd_cache(g_conf->mon_osd_cache_size),
   full_osd_cache(g_conf->mon_osd_cache_size),
   has_osdmap_manifest(false),
   last_attempted_minwait_time(utime_t()),
   mapper(mn->cct, &mn->cpu_tp)
{}
@@ -276,6 +277,11 @@ void OSDMonitor::get_store_prefixes(std::set<string>& s) const

void OSDMonitor::update_from_paxos(bool *need_bootstrap)
{
  // we really don't care if the version has been updated, because we may
  // have trimmed without having increased the last committed; yet, we may
  // need to update the in-memory manifest.
  load_osdmap_manifest();

  version_t version = get_last_committed();
  if (version == osdmap.epoch)
    return;
@@ -903,6 +909,11 @@ void OSDMonitor::encode_pending(MonitorDBStore::TransactionRef t)
  dout(10) << "encode_pending e " << pending_inc.epoch
           << dendl;

  if (do_prune(t)) {
    dout(1) << __func__ << " osdmap full prune encoded e"
            << pending_inc.epoch << dendl;
  }

  // finalize up pending_inc
  pending_inc.modified = ceph_clock_now();
@@ -1499,6 +1510,15 @@ version_t OSDMonitor::get_trim_to() const
      return 0;
    }
  }

  if (g_conf->get_val<bool>("mon_debug_block_osdmap_trim")) {
    dout(0) << __func__
            << " blocking osdmap trim"
               " ('mon_debug_block_osdmap_trim' set to 'true')"
            << dendl;
    return 0;
  }

  {
    epoch_t floor = get_min_last_epoch_clean();
    dout(10) << " min_last_epoch_clean " << floor << dendl;
@@ -1540,8 +1560,368 @@ void OSDMonitor::encode_trim_extra(MonitorDBStore::TransactionRef tx,
  bufferlist bl;
  get_version_full(first, bl);
  put_version_full(tx, first, bl);

  if (has_osdmap_manifest &&
      first > osdmap_manifest.get_first_pinned()) {
    _prune_update_trimmed(tx, first);
  }
}


/* full osdmap prune
 *
 * for more information, please refer to doc/dev/mon-osdmap-prune.rst
 */

void OSDMonitor::load_osdmap_manifest()
{
  bool store_has_manifest =
    mon->store->exists(get_service_name(), "osdmap_manifest");

  if (!store_has_manifest) {
    if (!has_osdmap_manifest) {
      return;
    }

    dout(20) << __func__
             << " dropping osdmap manifest from memory." << dendl;
    osdmap_manifest = osdmap_manifest_t();
    has_osdmap_manifest = false;
    return;
  }

  dout(20) << __func__
           << " osdmap manifest detected in store; reload." << dendl;

  bufferlist manifest_bl;
  int r = get_value("osdmap_manifest", manifest_bl);
  if (r < 0) {
    derr << __func__ << " unable to read osdmap version manifest" << dendl;
    ceph_assert(0 == "error reading manifest");
  }
  osdmap_manifest.decode(manifest_bl);
  has_osdmap_manifest = true;

  dout(10) << __func__ << " store osdmap manifest pinned ("
           << osdmap_manifest.get_first_pinned()
           << " .. "
           << osdmap_manifest.get_last_pinned()
           << ")"
           << dendl;
}
bool OSDMonitor::should_prune() const
{
  version_t first = get_first_committed();
  version_t last = get_last_committed();
  version_t min_osdmap_epochs =
    g_conf->get_val<int64_t>("mon_min_osdmap_epochs");
  version_t prune_min =
    g_conf->get_val<uint64_t>("mon_osdmap_full_prune_min");
  version_t prune_interval =
    g_conf->get_val<uint64_t>("mon_osdmap_full_prune_interval");
  version_t last_pinned = osdmap_manifest.get_last_pinned();
  version_t last_to_pin = last - min_osdmap_epochs;

  // Make-or-break constraints.
  //
  // If any of these conditions fails, we will not prune, regardless of
  // whether we have an on-disk manifest with an ongoing pruning state.
  //
  if ((last - first) <= min_osdmap_epochs) {
    // between the first and last committed epochs, we don't have
    // enough epochs to trim, much less to prune.
    dout(10) << __func__
             << " currently holding only " << (last - first)
             << " epochs (min osdmap epochs: " << min_osdmap_epochs
             << "); do not prune."
             << dendl;
    return false;

  } else if ((last_to_pin - first) < prune_min) {
    // between the first committed epoch and the last epoch we would prune,
    // we simply don't have enough versions over the minimum to prune maps.
    dout(10) << __func__
             << " could only prune " << (last_to_pin - first)
             << " epochs (" << first << ".." << last_to_pin << "), which"
                " is less than the required minimum (" << prune_min << ")"
             << dendl;
    return false;

  } else if (has_osdmap_manifest && last_pinned >= last_to_pin) {
    dout(10) << __func__
             << " we have pruned as far as we can; do not prune."
             << dendl;
    return false;

  } else if (last_pinned + prune_interval > last_to_pin) {
    dout(10) << __func__
             << " not enough epochs to form an interval (last pinned: "
             << last_pinned << ", last to pin: "
             << last_to_pin << ", interval: " << prune_interval << ")"
             << dendl;
    return false;
  }

  dout(15) << __func__
           << " should prune (" << last_pinned << ".." << last_to_pin << ")"
           << " lc (" << first << ".." << last << ")"
           << dendl;
  return true;
}
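The four constraints in ``should_prune()`` reduce to simple arithmetic over epoch numbers. The following is an illustrative standalone model, not monitor code; the parameter names mirror the variables above, and the concrete numbers used in testing it are hypothetical (e.g., the default ``mon_min_osdmap_epochs`` of 500 and ``mon_osdmap_full_prune_min`` of 10000):

```cpp
#include <cassert>
#include <cstdint>

// Standalone model of the should_prune() constraints above. `last_pinned`
// is 0 when the manifest is empty, matching get_last_pinned().
bool should_prune_model(uint64_t first, uint64_t last,
                        uint64_t last_pinned,
                        uint64_t min_osdmap_epochs,
                        uint64_t prune_min,
                        uint64_t prune_interval,
                        bool has_manifest)
{
  const uint64_t last_to_pin = last - min_osdmap_epochs;
  if ((last - first) <= min_osdmap_epochs)
    return false;  // not even enough epochs to trim, much less prune
  if ((last_to_pin - first) < prune_min)
    return false;  // not enough prunable versions over the minimum
  if (has_manifest && last_pinned >= last_to_pin)
    return false;  // already pruned as far as we can
  if (last_pinned + prune_interval > last_to_pin)
    return false;  // cannot fit another full prune interval
  return true;
}
```

With 20000 committed epochs and default-like settings the model prunes; with only 600 epochs it refuses, since fewer than ``prune_min`` versions would be prunable.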
void OSDMonitor::_prune_update_trimmed(
    MonitorDBStore::TransactionRef tx,
    version_t first)
{
  dout(10) << __func__
           << " first " << first
           << " last_pinned " << osdmap_manifest.get_last_pinned()
           << dendl;

  if (!osdmap_manifest.is_pinned(first)) {
    osdmap_manifest.pin(first);
  }

  set<version_t>::iterator p_end = osdmap_manifest.pinned.find(first);
  set<version_t>::iterator p = osdmap_manifest.pinned.begin();
  osdmap_manifest.pinned.erase(p, p_end);
  ceph_assert(osdmap_manifest.get_first_pinned() == first);

  if (osdmap_manifest.get_last_pinned() == first+1 ||
      osdmap_manifest.pinned.size() == 1) {
    // we reached the end of the line, as pinned maps go; clean up our
    // manifest, and let `should_prune()` decide whether we should prune
    // again.
    tx->erase(get_service_name(), "osdmap_manifest");
    return;
  }

  bufferlist bl;
  osdmap_manifest.encode(bl);
  tx->put(get_service_name(), "osdmap_manifest", bl);
}
void OSDMonitor::prune_init()
{
  dout(1) << __func__ << dendl;

  version_t pin_first;

  if (!has_osdmap_manifest) {
    // we must have never pruned, OR if we pruned the state must no longer
    // be relevant (i.e., the state must have been removed alongside the
    // trim that *must* have removed past the last pinned map in a
    // previous prune).
    ceph_assert(osdmap_manifest.pinned.empty());
    ceph_assert(!mon->store->exists(get_service_name(), "osdmap_manifest"));
    pin_first = get_first_committed();

  } else {
    // we must have pruned in the past AND its state is still relevant
    // (i.e., even if we trimmed, we still hold pinned maps in the manifest,
    // and thus we still hold a manifest in the store).
    ceph_assert(!osdmap_manifest.pinned.empty());
    ceph_assert(osdmap_manifest.get_first_pinned() == get_first_committed());
    ceph_assert(osdmap_manifest.get_last_pinned() < get_last_committed());

    dout(10) << __func__
             << " first_pinned " << osdmap_manifest.get_first_pinned()
             << " last_pinned " << osdmap_manifest.get_last_pinned()
             << dendl;

    pin_first = osdmap_manifest.get_last_pinned();
  }

  osdmap_manifest.pin(pin_first);
}
bool OSDMonitor::_prune_sanitize_options() const
{
  uint64_t prune_interval =
    g_conf->get_val<uint64_t>("mon_osdmap_full_prune_interval");
  uint64_t prune_min =
    g_conf->get_val<uint64_t>("mon_osdmap_full_prune_min");
  uint64_t txsize =
    g_conf->get_val<uint64_t>("mon_osdmap_full_prune_txsize");

  bool r = true;

  if (prune_interval == 0) {
    derr << __func__
         << " prune is enabled BUT prune interval is zero; abort."
         << dendl;
    r = false;
  } else if (prune_interval == 1) {
    derr << __func__
         << " prune interval is equal to one, which essentially means"
            " no pruning; abort."
         << dendl;
    r = false;
  }
  if (prune_min == 0) {
    derr << __func__
         << " prune is enabled BUT prune min is zero; abort."
         << dendl;
    r = false;
  }
  if (prune_interval > prune_min) {
    derr << __func__
         << " impossible to ascertain proper prune interval because"
         << " it is greater than the minimum prune epochs"
         << " (min: " << prune_min << ", interval: " << prune_interval << ")"
         << dendl;
    r = false;
  }

  if (txsize <= prune_interval) {
    derr << __func__
         << " 'mon_osdmap_full_prune_txsize' (" << txsize
         << ") <= 'mon_osdmap_full_prune_interval' (" << prune_interval
         << "); abort." << dendl;
    r = false;
  }
  return r;
}
bool OSDMonitor::is_prune_enabled() const {
  return g_conf->get_val<bool>("mon_osdmap_full_prune_enabled");
}

bool OSDMonitor::is_prune_supported() const {
  return mon->get_required_mon_features().contains_any(
      ceph::features::mon::FEATURE_OSDMAP_PRUNE);
}
/** do_prune
 *
 * @returns true if it has side effects; false otherwise.
 */
bool OSDMonitor::do_prune(MonitorDBStore::TransactionRef tx)
{
  bool enabled = is_prune_enabled();

  dout(1) << __func__ << " osdmap full prune "
          << ( enabled ? "enabled" : "disabled")
          << dendl;

  if (!enabled || !_prune_sanitize_options() || !should_prune()) {
    return false;
  }

  // we are beyond the minimum prune versions, so we need to remove maps;
  // otherwise the store will grow unbounded and we may end up having issues
  // with available disk space or store hangs.

  // we will not pin all versions; we will leave a buffer number of versions.
  // this allows the monitor to trim maps without caring too much about
  // pinned maps, and then lets us use another ceph-mon without these
  // capabilities, without having to repair the store.

  version_t first = get_first_committed();
  version_t last = get_last_committed();

  version_t last_to_pin = last - g_conf->mon_min_osdmap_epochs;
  version_t last_pinned = osdmap_manifest.get_last_pinned();
  uint64_t prune_interval =
    g_conf->get_val<uint64_t>("mon_osdmap_full_prune_interval");
  uint64_t txsize =
    g_conf->get_val<uint64_t>("mon_osdmap_full_prune_txsize");

  prune_init();

  // we need to get rid of some osdmaps

  dout(5) << __func__
          << " lc (" << first << " .. " << last << ")"
          << " last_pinned " << last_pinned
          << " interval " << prune_interval
          << " last_to_pin " << last_to_pin
          << dendl;

  // We will be erasing maps as we go.
  //
  // We will erase all maps between `last_pinned` and `next_to_pin`.
  //
  // If `next_to_pin` happens to be greater than `last_to_pin`, then
  // we stop pruning. We could prune the maps between `next_to_pin` and
  // `last_to_pin`, but by not doing so we end up with neater pruned
  // intervals, aligned with `prune_interval`. Besides, this should not be a
  // problem as long as `prune_interval` is set to a sane value, instead of
  // hundreds or thousands of maps.

  auto map_exists = [this](version_t v) {
    string k = mon->store->combine_strings("full", v);
    return mon->store->exists(get_service_name(), k);
  };

  // 'interval' represents the number of maps from the last pinned;
  // i.e., if we pinned version 1 and have an interval of 10, we're pinning
  // version 11 next; all intermediate versions will be removed.
  //
  // 'txsize' represents the maximum number of versions we'll be removing in
  // this iteration. If 'txsize' is large enough to perform multiple passes
  // pinning and removing maps, we will do so; if not, we'll do at least one
  // pass. We are quite relaxed about honouring 'txsize', but we'll always
  // ensure that we never go *over* the maximum.

  // e.g., if we pin 1 and 11, we're removing versions [2..10]; i.e., 9 maps.
  uint64_t removal_interval = prune_interval - 1;

  if (txsize < removal_interval) {
    dout(5) << __func__
            << " setting txsize to removal interval size ("
            << removal_interval << " versions)"
            << dendl;
    txsize = removal_interval;
  }
  ceph_assert(removal_interval > 0);
  uint64_t num_pruned = 0;
  while (num_pruned + removal_interval <= txsize) {
    last_pinned = osdmap_manifest.get_last_pinned();

    if (last_pinned + prune_interval > last_to_pin) {
      break;
    }
    ceph_assert(last_pinned < last_to_pin);

    version_t next_pinned = last_pinned + prune_interval;
    ceph_assert(next_pinned <= last_to_pin);
    osdmap_manifest.pin(next_pinned);

    dout(20) << __func__
             << " last_pinned " << last_pinned
             << " next_pinned " << next_pinned
             << " num_pruned " << num_pruned
             << " removal interval (" << (last_pinned+1)
             << ".." << (next_pinned-1) << ")"
             << " txsize " << txsize << dendl;

    ceph_assert(map_exists(last_pinned));
    ceph_assert(map_exists(next_pinned));

    for (version_t v = last_pinned+1; v < next_pinned; ++v) {
      ceph_assert(!osdmap_manifest.is_pinned(v));

      dout(20) << __func__ << " pruning full osdmap e" << v << dendl;
      string full_key = mon->store->combine_strings("full", v);
      tx->erase(get_service_name(), full_key);
      ++num_pruned;
    }
  }

  ceph_assert(num_pruned > 0);

  bufferlist bl;
  osdmap_manifest.encode(bl);
  tx->put(get_service_name(), "osdmap_manifest", bl);

  return true;
}
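The pinning loop in ``do_prune()`` can be illustrated with a worked example. The sketch below is a hypothetical standalone model, not the monitor code: plain integers stand in for epochs, and the manifest is a bare ``std::set``. With the first committed epoch at 1 (which ``prune_init()`` pins), ``last_to_pin`` at 100, an interval of 10 and a ``txsize`` of 30, one pass pins 11, 21 and 31 and would erase the 27 maps in between:

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Model of one do_prune() pass: pin every `interval`-th epoch up to
// `last_to_pin`, stopping once erasing another full removal interval
// would exceed `txsize` versions in this transaction.
std::set<uint64_t> prune_pass(uint64_t first, uint64_t last_to_pin,
                              uint64_t interval, uint64_t txsize)
{
  std::set<uint64_t> pinned{first};  // prune_init() pins the first epoch
  const uint64_t removal_interval = interval - 1;
  if (txsize < removal_interval) {
    txsize = removal_interval;  // always allow at least one pass
  }
  uint64_t num_pruned = 0;
  while (num_pruned + removal_interval <= txsize) {
    const uint64_t last_pinned = *pinned.rbegin();
    if (last_pinned + interval > last_to_pin) {
      break;  // cannot fit another full interval before last_to_pin
    }
    pinned.insert(last_pinned + interval);
    // versions in (last_pinned, last_pinned + interval) would be erased
    num_pruned += removal_interval;
  }
  return pinned;
}
```

A ``txsize`` smaller than one removal interval is bumped up, so a single pass always happens, matching the relaxed honouring of ``txsize`` described above.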
// -------------

bool OSDMonitor::preprocess_query(MonOpRequestRef op)
@@ -3125,16 +3505,138 @@ int OSDMonitor::get_version(version_t ver, bufferlist& bl)
  return ret;
}

int OSDMonitor::get_inc(version_t ver, OSDMap::Incremental& inc)
{
  bufferlist inc_bl;
  int err = get_version(ver, inc_bl);
  ceph_assert(err == 0);
  ceph_assert(inc_bl.length());

  bufferlist::iterator p = inc_bl.begin();
  inc.decode(p);
  dout(10) << __func__
           << " epoch " << inc.epoch
           << " inc_crc " << inc.inc_crc
           << " full_crc " << inc.full_crc
           << " encode_features " << inc.encode_features << dendl;
  return 0;
}
int OSDMonitor::get_full_from_pinned_map(version_t ver, bufferlist& bl)
{
  dout(10) << __func__ << " ver " << ver << dendl;

  version_t closest_pinned = osdmap_manifest.get_lower_closest_pinned(ver);
  if (closest_pinned == 0) {
    return -ENOENT;
  }
  if (closest_pinned > ver) {
    dout(0) << __func__ << " pinned: " << osdmap_manifest.pinned << dendl;
  }
  ceph_assert(closest_pinned <= ver);

  dout(10) << __func__ << " closest pinned ver " << closest_pinned << dendl;

  // get osdmap incremental maps and apply on top of this one.
  bufferlist osdm_bl;
  bool has_cached_osdmap = false;
  for (version_t v = ver-1; v >= closest_pinned; --v) {
    if (full_osd_cache.lookup(v, &osdm_bl)) {
      dout(10) << __func__ << " found map in cache ver " << v << dendl;
      closest_pinned = v;
      has_cached_osdmap = true;
      break;
    }
  }

  if (!has_cached_osdmap) {
    int err = PaxosService::get_version_full(closest_pinned, osdm_bl);
    if (err != 0) {
      derr << __func__ << " closest pinned map ver " << closest_pinned
           << " not available! error: " << cpp_strerror(err) << dendl;
    }
    ceph_assert(err == 0);
  }

  ceph_assert(osdm_bl.length());

  OSDMap osdm;
  osdm.decode(osdm_bl);

  dout(10) << __func__ << " loaded osdmap epoch " << closest_pinned
           << " e" << osdm.epoch
           << " crc " << osdm.get_crc()
           << " -- applying incremental maps." << dendl;

  uint64_t encode_features = 0;
  for (version_t v = closest_pinned + 1; v <= ver; ++v) {
    dout(20) << __func__ << " applying inc epoch " << v << dendl;

    OSDMap::Incremental inc;
    int err = get_inc(v, inc);
    ceph_assert(err == 0);

    encode_features = inc.encode_features;

    err = osdm.apply_incremental(inc);
    ceph_assert(err == 0);

    // this block performs paranoid checks on map retrieval
    if (g_conf->get_val<bool>("mon_debug_extra_checks") &&
        inc.full_crc != 0) {

      uint64_t f = encode_features;
      if (!f) {
        f = (mon->quorum_con_features ? mon->quorum_con_features : -1);
      }

      // encode osdmap to force calculating crcs
      bufferlist tbl;
      osdm.encode(tbl, f | CEPH_FEATURE_RESERVED);
      // decode osdmap to compare crcs with what's expected by incremental
      OSDMap tosdm;
      tosdm.decode(tbl);

      if (tosdm.get_crc() != inc.full_crc) {
        derr << __func__
             << " osdmap crc mismatch! (osdmap crc " << tosdm.get_crc()
             << ", expected " << inc.full_crc << ")" << dendl;
        ceph_assert(0 == "osdmap crc mismatch");
      }
    }

    // note: we cannot add the recently computed map to the cache, as is,
    // because we have not encoded the map into a bl.
  }

  if (!encode_features) {
    dout(10) << __func__
             << " last incremental map didn't have features;"
             << " defaulting to quorum's or all" << dendl;
    encode_features =
      (mon->quorum_con_features ? mon->quorum_con_features : -1);
  }
  osdm.encode(bl, encode_features | CEPH_FEATURE_RESERVED);

  return 0;
}
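The reconstruction strategy above, stripped of Ceph types, is: find the closest pinned full map at or below the requested version, then fold each incremental on top until the target epoch is reached. The toy model below uses plain integers in place of ``OSDMap`` and ``OSDMap::Incremental`` (an illustrative sketch only; the map names ``pinned_full`` and ``inc`` are invented for the example):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>

// Toy model of get_full_from_pinned_map(): full values survive only for
// pinned versions, while every version keeps an incremental (here, a
// delta to add). To serve a pruned version, start from the closest
// pinned full value <= ver and apply each incremental up to ver.
int64_t rebuild_full(const std::map<uint64_t, int64_t>& pinned_full,
                     const std::map<uint64_t, int64_t>& inc,
                     uint64_t ver)
{
  auto p = pinned_full.upper_bound(ver);
  if (p == pinned_full.begin()) {
    throw std::out_of_range("no pinned map at or below ver");
  }
  --p;  // closest pinned version <= ver
  int64_t full = p->second;
  for (uint64_t v = p->first + 1; v <= ver; ++v) {
    full += inc.at(v);  // apply the incremental for epoch v
  }
  return full;
}
```

The real code additionally walks the LRU cache backwards from ``ver-1`` first, so a cached full map closer to the target shortens the incremental replay.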
int OSDMonitor::get_version_full(version_t ver, bufferlist& bl)
{
  if (full_osd_cache.lookup(ver, &bl)) {
    return 0;
  }
  int ret = PaxosService::get_version_full(ver, bl);
  if (ret == -ENOENT) {
    // the map may have been pruned; attempt to rebuild it from the
    // closest pinned map plus its incrementals.
    ret = get_full_from_pinned_map(ver, bl);
  }
  if (ret != 0) {
    return ret;
  }

  full_osd_cache.add(ver, bl);
  return 0;
}

epoch_t OSDMonitor::blacklist(const entity_addr_t& a, utime_t until)
@@ -3380,6 +3882,9 @@ void OSDMonitor::tick()

  dout(10) << osdmap << dendl;

  // always update osdmap manifest, regardless of being the leader.
  load_osdmap_manifest();

  if (!mon->is_leader()) return;

  bool do_propose = false;

@@ -3390,8 +3895,16 @@ void OSDMonitor::tick()
  }

  // mark osds down?
  if (check_failures(now)) {
    do_propose = true;
  }

  // Force a proposal if we need to prune; pruning is performed in
  // ``encode_pending()``, hence why we need to regularly trigger a proposal
  // even if there's nothing going on.
  if (is_prune_enabled() && should_prune()) {
    do_propose = true;
  }

  // mark down osds out?
@@ -3565,6 +4078,12 @@ void OSDMonitor::dump_info(Formatter *f)
  f->open_object_section("crushmap");
  osdmap.crush->dump(f);
  f->close_section();

  if (has_osdmap_manifest) {
    f->open_object_section("osdmap_manifest");
    osdmap_manifest.dump(f);
    f->close_section();
  }
}

namespace {
@@ -25,6 +25,7 @@
#include <set>

#include "include/types.h"
#include "include/encoding.h"
#include "common/simple_cache.hpp"
#include "msg/Messenger.h"
@@ -124,6 +125,85 @@ public:
};


struct osdmap_manifest_t {
  // all the maps we have pinned -- i.e., won't be removed unless
  // they are inside a trim interval.
  set<version_t> pinned;

  osdmap_manifest_t() {}

  version_t get_last_pinned() const
  {
    set<version_t>::const_reverse_iterator it = pinned.crbegin();
    if (it == pinned.crend()) {
      return 0;
    }
    return *it;
  }

  version_t get_first_pinned() const
  {
    set<version_t>::const_iterator it = pinned.cbegin();
    if (it == pinned.cend()) {
      return 0;
    }
    return *it;
  }

  bool is_pinned(version_t v) const
  {
    return pinned.find(v) != pinned.end();
  }

  void pin(version_t v)
  {
    pinned.insert(v);
  }

  version_t get_lower_closest_pinned(version_t v) const {
    set<version_t>::const_iterator p = pinned.lower_bound(v);
    if (p == pinned.cend()) {
      return 0;
    } else if (*p > v) {
      if (p == pinned.cbegin()) {
        return 0;
      }
      --p;
    }
    return *p;
  }

  void encode(bufferlist& bl) const
  {
    ENCODE_START(1, 1, bl);
    encode(pinned, bl);
    ENCODE_FINISH(bl);
  }

  void decode(bufferlist::iterator& bl)
  {
    DECODE_START(1, bl);
    decode(pinned, bl);
    DECODE_FINISH(bl);
  }

  void decode(bufferlist& bl) {
    bufferlist::iterator p = bl.begin();
    decode(p);
  }

  void dump(Formatter *f) {
    f->dump_unsigned("first_pinned", get_first_pinned());
    f->dump_unsigned("last_pinned", get_last_pinned());
    f->open_array_section("pinned_maps");
    for (auto& i : pinned) {
      f->dump_unsigned("epoch", i);
    }
    f->close_section();
  }
};
WRITE_CLASS_ENCODER(osdmap_manifest_t);
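The ``lower_bound`` dance in ``get_lower_closest_pinned()`` is easy to check in isolation. Below is a standalone rendition (a plain ``std::set<uint64_t>`` replacing the manifest) that returns the greatest pinned version at or below ``v``, or 0 when there is none:

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Standalone rendition of osdmap_manifest_t::get_lower_closest_pinned().
// Note it also returns 0 when v is past the last pinned version (the
// lower_bound hits end()); callers only ask about prunable ranges.
uint64_t lower_closest_pinned(const std::set<uint64_t>& pinned, uint64_t v)
{
  auto p = pinned.lower_bound(v);  // first element >= v
  if (p == pinned.cend()) {
    return 0;
  } else if (*p > v) {
    if (p == pinned.cbegin()) {
      return 0;  // everything pinned is above v
    }
    --p;  // step back to the closest pinned version below v
  }
  return *p;
}
```

This is the lookup ``get_full_from_pinned_map()`` relies on to choose its reconstruction base.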
class OSDMonitor : public PaxosService {
  CephContext *cct;
@@ -142,6 +222,9 @@ public:
  SimpleLRU<version_t, bufferlist> inc_osd_cache;
  SimpleLRU<version_t, bufferlist> full_osd_cache;

  bool has_osdmap_manifest;
  osdmap_manifest_t osdmap_manifest;

  bool check_failures(utime_t now);
  bool check_failure(utime_t now, int target_osd, failure_info_t& fi);
  void force_failure(int target_osd, int by);

@@ -160,7 +243,7 @@ public:
  };

  // svc
public:
  void create_initial() override;
  void get_store_prefixes(std::set<string>& s) const override;
@@ -171,6 +254,19 @@ private:
  void on_active() override;
  void on_restart() override;
  void on_shutdown() override;

  /* osdmap full map prune */
  void load_osdmap_manifest();
  bool should_prune() const;
  void _prune_update_trimmed(
      MonitorDBStore::TransactionRef tx,
      version_t first);
  void prune_init();
  bool _prune_sanitize_options() const;
  bool is_prune_enabled() const;
  bool is_prune_supported() const;
  bool do_prune(MonitorDBStore::TransactionRef tx);

  /**
   * we haven't delegated full version stashing to paxosservice for some time
   * now, making this function useless in current context.

@@ -542,6 +638,8 @@ public:

  int get_version(version_t ver, bufferlist& bl) override;
  int get_version_full(version_t ver, bufferlist& bl) override;
  int get_inc(version_t ver, OSDMap::Incremental& inc);
  int get_full_from_pinned_map(version_t ver, bufferlist& bl);

  epoch_t blacklist(const entity_addr_t& a, utime_t until);
@@ -434,7 +434,6 @@ public:
  }
  void load_health();

 private:
  /**
   * @defgroup PaxosService_h_store_keys Set of keys that are usually used on
   * all the services implementing this

@@ -451,6 +450,7 @@ public:
   * @}
   */

 private:
  /**
   * @defgroup PaxosService_h_version_cache Variables holding cached values
   * for the most used versions (first
@@ -493,6 +493,7 @@ namespace ceph {
    constexpr mon_feature_t FEATURE_KRAKEN(       (1ULL << 0));
    constexpr mon_feature_t FEATURE_LUMINOUS(     (1ULL << 1));
    constexpr mon_feature_t FEATURE_MIMIC(        (1ULL << 2));
    constexpr mon_feature_t FEATURE_OSDMAP_PRUNE( (1ULL << 3));

    constexpr mon_feature_t FEATURE_RESERVED(     (1ULL << 63));
    constexpr mon_feature_t FEATURE_NONE(         (0ULL));

@@ -507,6 +508,7 @@ namespace ceph {
        FEATURE_KRAKEN |
        FEATURE_LUMINOUS |
        FEATURE_MIMIC |
        FEATURE_OSDMAP_PRUNE |
        FEATURE_NONE
        );
    }

@@ -525,10 +527,18 @@ namespace ceph {
        FEATURE_KRAKEN |
        FEATURE_LUMINOUS |
        FEATURE_MIMIC |
        FEATURE_OSDMAP_PRUNE |
        FEATURE_NONE
        );
    }

    constexpr mon_feature_t get_optional() {
      return (
        FEATURE_OSDMAP_PRUNE |
        FEATURE_NONE
        );
    }

    static inline mon_feature_t get_feature_by_name(std::string n);
  }
}

@@ -543,6 +553,8 @@ static inline const char *ceph::features::mon::get_feature_name(uint64_t b) {
    return "luminous";
  } else if (f == FEATURE_MIMIC) {
    return "mimic";
  } else if (f == FEATURE_OSDMAP_PRUNE) {
    return "osdmap-prune";
  } else if (f == FEATURE_RESERVED) {
    return "reserved";
  }

@@ -557,6 +569,8 @@ inline mon_feature_t ceph::features::mon::get_feature_by_name(std::string n) {
    return FEATURE_LUMINOUS;
  } else if (n == "mimic") {
    return FEATURE_MIMIC;
  } else if (n == "osdmap-prune") {
    return FEATURE_OSDMAP_PRUNE;
  } else if (n == "reserved") {
    return FEATURE_RESERVED;
  }
@@ -11,21 +11,21 @@
   required: [none]
 
   AVAILABLE FEATURES:
-    supported: [kraken(1),luminous(2),mimic(4)]
-    persistent: [kraken(1),luminous(2),mimic(4)]
+    supported: [kraken(1),luminous(2),mimic(4),osdmap-prune(8)]
+    persistent: [kraken(1),luminous(2),mimic(4),osdmap-prune(8)]
   MONMAP FEATURES:
     persistent: [none]
     optional: [none]
     required: [none]
 
   AVAILABLE FEATURES:
-    supported: [kraken(1),luminous(2),mimic(4)]
-    persistent: [kraken(1),luminous(2),mimic(4)]
+    supported: [kraken(1),luminous(2),mimic(4),osdmap-prune(8)]
+    persistent: [kraken(1),luminous(2),mimic(4),osdmap-prune(8)]
   monmap:persistent:[none]
   monmap:optional:[none]
   monmap:required:[none]
-  available:supported:[kraken(1),luminous(2),mimic(4)]
-  available:persistent:[kraken(1),luminous(2),mimic(4)]
+  available:supported:[kraken(1),luminous(2),mimic(4),osdmap-prune(8)]
+  available:persistent:[kraken(1),luminous(2),mimic(4),osdmap-prune(8)]
 
   $ monmaptool --feature-set foo /tmp/test.monmap.1234
   unknown features name 'foo' or unable to parse value: Expected option value to be integer, got 'foo'
@@ -49,8 +49,8 @@
   required: [kraken(1),unknown(16),unknown(32)]
 
   AVAILABLE FEATURES:
-    supported: [kraken(1),luminous(2),mimic(4)]
-    persistent: [kraken(1),luminous(2),mimic(4)]
+    supported: [kraken(1),luminous(2),mimic(4),osdmap-prune(8)]
+    persistent: [kraken(1),luminous(2),mimic(4),osdmap-prune(8)]
 
   $ monmaptool --feature-unset 32 --optional --feature-list /tmp/test.monmap.1234
   monmaptool: monmap file /tmp/test.monmap.1234
@@ -60,8 +60,8 @@
   required: [kraken(1),unknown(16),unknown(32)]
 
   AVAILABLE FEATURES:
-    supported: [kraken(1),luminous(2),mimic(4)]
-    persistent: [kraken(1),luminous(2),mimic(4)]
+    supported: [kraken(1),luminous(2),mimic(4),osdmap-prune(8)]
+    persistent: [kraken(1),luminous(2),mimic(4),osdmap-prune(8)]
   monmaptool: writing epoch 0 to /tmp/test.monmap.1234 (1 monitors)
 
   $ monmaptool --feature-unset 32 --persistent --feature-unset 16 --optional --feature-list /tmp/test.monmap.1234
@@ -72,8 +72,8 @@
   required: [kraken(1)]
 
   AVAILABLE FEATURES:
-    supported: [kraken(1),luminous(2),mimic(4)]
-    persistent: [kraken(1),luminous(2),mimic(4)]
+    supported: [kraken(1),luminous(2),mimic(4),osdmap-prune(8)]
+    persistent: [kraken(1),luminous(2),mimic(4),osdmap-prune(8)]
   monmaptool: writing epoch 0 to /tmp/test.monmap.1234 (1 monitors)
 
   $ monmaptool --feature-unset kraken --feature-list /tmp/test.monmap.1234
@@ -84,8 +84,8 @@
   required: [none]
 
   AVAILABLE FEATURES:
-    supported: [kraken(1),luminous(2),mimic(4)]
-    persistent: [kraken(1),luminous(2),mimic(4)]
+    supported: [kraken(1),luminous(2),mimic(4),osdmap-prune(8)]
+    persistent: [kraken(1),luminous(2),mimic(4),osdmap-prune(8)]
   monmaptool: writing epoch 0 to /tmp/test.monmap.1234 (1 monitors)
 
   $ rm /tmp/test.monmap.1234