Previously, errors stuck indelibly to the inode, which
meant that a close() call would see an error even if the
user had already dutifully called fsync() and handled it.
We should emit each error only once per file handle.
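
A minimal sketch of the once-per-handle idea, written as a small Python
model rather than the actual client code. Inode, FileHandle, err_seq and
take_error are hypothetical names, and ignoring errors recorded before
the handle was opened is an assumption of this sketch:

```python
import errno


class Inode:
    """Hypothetical inode that records asynchronous write errors."""

    def __init__(self):
        self.err_seq = 0   # bumped every time a new error is recorded
        self.last_err = 0  # most recent error code, 0 means none

    def record_error(self, code):
        self.err_seq += 1
        self.last_err = code


class FileHandle:
    """Hypothetical handle that remembers which errors it already reported."""

    def __init__(self, inode):
        self.inode = inode
        # Assumption: errors older than the open are not reported here.
        self.seen_seq = inode.err_seq

    def take_error(self):
        """Return a pending error once, then 0 until a new one is recorded."""
        if self.seen_seq == self.inode.err_seq:
            return 0
        self.seen_seq = self.inode.err_seq
        return self.inode.last_err

    def fsync(self):
        return self.take_error()

    def close(self):
        return self.take_error()
```

With two handles open on the same inode, an error consumed by fsync() on
one handle is not repeated by its close(), while the other handle still
gets to see it once.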
Signed-off-by: John Spray <john.spray@redhat.com>
Sometimes I get output like:
HEALTH_ERR 2 pgs stuck unclean; Full ratio(s) out of order
This goes away over time, so it is a transient issue.
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
Adding, removing or moving items / buckets via the CrushWrapper API when
choose_args is not empty is unlikely to produce the desired outcome. The
caller should instead add, remove or move items / buckets in a
decompiled crushmap, update the associated choose_arg and upload the new
crushmap.
Signed-off-by: Loic Dachary <loic@dachary.org>
A map of crush_choose_arg_map is added to the crushmap text syntax. The
key is an integer matching a pool number.
Signed-off-by: Loic Dachary <loic@dachary.org>
If there is no crush_choose_arg_map for a given pool (the default), a
NULL pointer is given instead and crush_do_rule behavior remains
unchanged.
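
The fallback can be modelled roughly as follows. This is a Python sketch
under assumptions, not the C code: the dict keyed by pool number stands
in for the map, None stands in for the NULL pointer, and weights_for is
a hypothetical helper that expects one row of weights per position:

```python
def weights_for(item_weights, choose_args, pool_id, position):
    """Pick the weights a bucket should use for one pool and position.

    choose_args maps a pool number to a list of per-position weight rows.
    A missing entry plays the role of the NULL pointer.
    """
    args = choose_args.get(pool_id)
    if args is None:
        # No entry for this pool: behave exactly as before.
        return item_weights
    return args[position]
```

A pool without an entry keeps using item_weights, so maps that never
define choose_args behave as they did before.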
Signed-off-by: Loic Dachary <loic@dachary.org>
bucket_straw2_choose needs to use weights that may be different from
item_weights, for instance to compensate for an uneven distribution
caused by a low number of values, or to fix the probability bias
introduced by conditional probabilities (see
http://tracker.ceph.com/issues/15653 for more information).
We introduce a weight_set for each straw2 bucket to set the desired
weight for a given item at a given position. The weight of a given item
when picking the first replica (first position) may be different from
its weight when picking the second replica (second position). For
instance the weight matrix for a given bucket containing items 3, 7 and
13 could be as follows:
          position 0   position 1
item 3    0x10000      0x100000
item 7    0x40000      0x10000
item 13   0x40000      0x10000
When crush_do_rule picks the first of two replicas (position 0), items 7
and 13 are each four times more likely to be chosen by
bucket_straw2_choose than item 3. When choosing the second replica
(position 1), item 3 is sixteen times more likely to be chosen than
item 7 or item 13.
By default the weight_set of each bucket exactly matches the content of
item_weights for each position to ensure backward compatibility.
bucket_straw2_choose compares items by using their id. The same ids are
also used to index buckets and they must be unique. For the items in a
bucket, an array of replacement ids can be provided for placement
purposes; they are used instead of the item ids. If no replacement ids
are provided, the legacy behavior is preserved.
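
To make the selection rule concrete, here is a simplified Python model of
a straw2 draw with an optional weight_set and optional replacement ids.
It is a sketch under assumptions: _uniform is a SHA-256 stand-in for the
CRUSH hash, floating point math.log replaces the fixed-point logarithm,
and the straw2_choose signature is hypothetical:

```python
import hashlib
import math


def _uniform(x, item_id, r):
    """Deterministic stand-in hash mapping (x, item_id, r) into (0, 1]."""
    digest = hashlib.sha256(f"{x}:{item_id}:{r}".encode()).digest()
    value = int.from_bytes(digest[:8], "big")
    return (value + 1) / 2**64


def straw2_choose(items, item_weights, x, r,
                  position=0, weight_set=None, ids=None):
    """Return the item with the largest draw ln(u) / weight."""
    # Without a weight_set, every position uses item_weights.
    weights = item_weights if weight_set is None else weight_set[position]
    # Without replacement ids, the item ids themselves feed the hash.
    hash_ids = items if ids is None else ids
    best_item, best_draw = None, -math.inf
    for item, hash_id, weight in zip(items, hash_ids, weights):
        if weight <= 0:
            continue  # zero-weight items can never win
        draw = math.log(_uniform(x, hash_id, r)) / weight
        if draw > best_draw:
            best_item, best_draw = item, draw
    return best_item
```

Because ln(u) is never positive, dividing by a larger weight moves the
draw closer to zero and makes that item more likely to win. Passing a
weight_set whose rows all equal item_weights, or ids equal to the item
ids, gives the same choice as passing neither.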
Signed-off-by: Loic Dachary <loic@dachary.org>
At the tail of the journal there can be a partially written entry. Before
appending new entries to the journal, we need to drop any partially
written entry and adjust write_pos. For the mds log, a partially written
entry is detected and dropped when replaying the journal.
For the PurgeQueue journal, we don't replay the whole journal when the
MDS starts, so before appending a new entry to the journal we need to
drop any partially written entry and adjust write_pos.
The previous patch aligns the journal header write_pos to the boundary
of a fully flushed entry, so we can start looking for a partially
written entry from the journal header write_pos. This should be fast
even when the purge queue is very large.
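
The scan from write_pos can be pictured with the Python sketch below. The
framing is an assumption made for illustration (a 4-byte little-endian
length followed by the payload), not the real journal entry format, and
find_append_pos and drop_partial_tail are hypothetical names:

```python
import struct

# Hypothetical framing: 4-byte little-endian payload length, then payload.
HEADER = struct.Struct("<I")


def find_append_pos(journal, write_pos):
    """Return the offset just past the last complete entry.

    Scanning starts at write_pos, which is assumed to sit on an entry
    boundary, so only entries written after it are examined.
    """
    pos = write_pos
    end = len(journal)
    while pos + HEADER.size <= end:
        (length,) = HEADER.unpack_from(journal, pos)
        next_pos = pos + HEADER.size + length
        if next_pos > end:
            break  # payload is cut short: a partial entry starts at pos
        pos = next_pos
    return pos


def drop_partial_tail(journal, write_pos):
    """Truncate the journal at the first partial entry after write_pos."""
    pos = find_append_pos(journal, write_pos)
    return journal[:pos], pos
```

Only the bytes after write_pos are walked, which is why the cost does not
grow with the total size of the queue.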
Fixes: http://tracker.ceph.com/issues/19450
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Make some minor changes:
1 Restrict the total DPDK memory used by an osd instance and
change the option name from bluestore_spdk_socket_mem to
bluestore_spdk_mem.
2 Use spdk_env_init instead of rte_eal_init. The reason is that
the SPDK lib invokes rte_eal_init itself, which reduces the
initialization parameter conversion and checking. Also, spdk 17.03
invokes spdk_vtophys_register_dpdk_mem() (which is an internal
function) in spdk_env_init, and this function must be called.
Signed-off-by: optimistyzy <optimistyzy@gmail.com>
Once a scrub has started, we now queue its work at higher priority than
scheduled scrubs.
Fixes: http://tracker.ceph.com/issues/15789
Signed-off-by: David Zafman <dzafman@redhat.com>
osd,mon: misc full fixes and cleanups
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: John Spray <john.spray@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Fix a bug where available space was adjusted based on the initial
default value of mon_osd_full_ratio rather than on the currently
configured full ratio.
Signed-off-by: David Zafman <dzafman@redhat.com>
We actually compute kb_used as kb - kb_avail, so we don't have the
statfs() system call issue of non-privileged f_bavail vs f_bfree. It
was assumed that used was really like (blocks - f_bfree). It is not.
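
A small arithmetic sketch of the difference, in Python with made-up
numbers. The function names are hypothetical; only the two formulas come
from the text above:

```python
def osd_kb_used(kb, kb_avail):
    """Used space derived from total and available space."""
    return kb - kb_avail


def statfs_used_candidates(f_blocks, f_bfree, f_bavail):
    """Two different 'used' figures that statfs()-style counters can yield."""
    return f_blocks - f_bfree, f_blocks - f_bavail
```

With 1000 blocks, 300 free and 250 available to non-privileged users, the
statfs()-style figures are 700 and 750, whereas kb - kb_avail gives a
single value and kb_used + kb_avail always adds up to kb.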
Signed-off-by: David Zafman <dzafman@redhat.com>