'remap' is too non-specific a name. In particular, it
sounds like it is related to the 'remapped' PG state,
but in reality it is not.
'upmap' or 'pg-upmap' is more specific: it maps a pgid
to the 'up' set value (or item)
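A minimal sketch of the idea, assuming simplified types (the real
OSDMap fields and types may differ): an upmap entry overrides the
CRUSH-computed 'up' set for an explicitly listed pgid.

    // Hypothetical sketch, not the actual OSDMap code.
    #include <cstdint>
    #include <map>
    #include <utility>
    #include <vector>

    using pg_id_t = uint64_t;  // illustrative pgid type
    // pg-upmap: map a pgid to an explicit 'up' set
    std::map<pg_id_t, std::vector<int32_t>> pg_upmap;
    // pg-upmap items: map a pgid to (from_osd, to_osd) replacements
    // applied to individual items of the 'up' set
    std::map<pg_id_t, std::vector<std::pair<int32_t, int32_t>>> pg_upmap_items;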
Signed-off-by: Sage Weil <sage@redhat.com>
Commit d1f2c557 incorrectly changed the order of variables within
the payload. This resulted in breaking the resize RPC message
with older versions of Ceph.
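Illustrative only (the field names below are hypothetical, not the
actual resize payload): with a hand-rolled wire format, older peers
decode fields in a fixed order, so the encode order cannot be
rearranged without breaking them.

    // Sketch: why reordering payload fields breaks older decoders.
    #include <cstdint>
    #include <vector>

    using Payload = std::vector<uint8_t>;

    template <typename T>
    void put(Payload& p, const T& v) {  // append the raw bytes of v
        const auto* b = reinterpret_cast<const uint8_t*>(&v);
        p.insert(p.end(), b, b + sizeof(T));
    }

    // Older decoders expect [u64 size][u8 allow_shrink] in that order
    // (illustrative fields); swapping the two put() calls corrupts
    // both values for them even though new encoders/decoders agree.
    Payload encode_resize(uint64_t size, bool allow_shrink) {
        Payload p;
        put(p, size);
        put(p, static_cast<uint8_t>(allow_shrink));
        return p;
    }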
Fixes: http://tracker.ceph.com/issues/19636
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
New serialized Unix attrs need to reflect the change being made,
and should be reverted if the change fails.
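Schematically (the names here are illustrative, not the rgw code):
keep the previous serialization, install the updated one, and restore
the old blob if the underlying change fails.

    // Hypothetical revert-on-failure pattern for a serialized attr blob.
    #include <string>
    #include <utility>

    struct Obj { std::string unix_attrs; };  // serialized Unix attrs

    template <typename Op>
    int apply_with_attrs(Obj& o, std::string new_attrs, Op&& op) {
        std::string saved = o.unix_attrs;     // remember the old attrs
        o.unix_attrs = std::move(new_attrs);  // reflect the change being made
        int r = op(o);
        if (r < 0)
            o.unix_attrs = std::move(saved);  // change failed: revert
        return r;
    }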
Fixes: http://tracker.ceph.com/issues/19653
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Previously, errors stuck indelibly to the inode, which
meant that a close call would see an error even if the
user already dutifully fsync()'d and handled it.
We should emit each error only once per file handle.
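A small sketch of the intended semantics, with hypothetical names
(not the actual client code): the pending error is latched per file
handle and consumed the first time it is reported, so an fsync() that
returned it means a later close() will not report it again.

    // Hypothetical per-file-handle error latch, reported at most once.
    struct FileHandle {
        int pending_err = 0;      // set when async write-back fails
        int take_error() {        // consume-and-clear
            int e = pending_err;
            pending_err = 0;
            return e;
        }
    };

    int handle_fsync(FileHandle& fh) { /* flush... */ return fh.take_error(); }
    int handle_close(FileHandle& fh) { /* ... */ return fh.take_error(); }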
Signed-off-by: John Spray <john.spray@redhat.com>
Sometimes I get output like:
HEALTH_ERR 2 pgs stuck unclean; Full ratio(s) out of order
This goes away over time, so it is a transient issue.
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
- On FreeBSD these are the service commands that start a service
  even if the service is not activated in /etc/rc.conf, which
  allows ceph-disk and ceph-deploy to start services without
  setting /etc/rc.conf.
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
Adding, removing or moving items / buckets via the CrushWrapper API when
choose_args is not empty is unlikely to produce the desired outcome. The
caller should instead add, remove or move items / buckets in a
decompiled crushmap, update the associated choose_arg and upload the new
crushmap.
Signed-off-by: Loic Dachary <loic@dachary.org>
A map of crush_choose_arg_map is added to the crushmap text syntax. The
key is an integer matching a pool number.
Signed-off-by: Loic Dachary <loic@dachary.org>
If there is no crush_choose_arg_map for a given pool (the default) a
NULL pointer is given instead and crush_do_rule behavior remains
unchanged.
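Roughly, as a sketch with simplified types (not the exact
CrushWrapper or crush_do_rule signatures): the caller looks the pool
number up in the choose_args map and passes NULL when there is no
entry, so the legacy weights are used.

    // Hypothetical per-pool lookup; NULL means 'no choose_args'.
    #include <cstdint>
    #include <map>

    struct choose_arg_map_like { /* per-bucket weight_set / ids */ };

    const choose_arg_map_like*
    choose_args_for_pool(const std::map<int64_t, choose_arg_map_like>& m,
                         int64_t pool) {
        auto it = m.find(pool);
        // absent (the default): crush_do_rule behavior is unchanged
        return it == m.end() ? nullptr : &it->second;
    }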
Signed-off-by: Loic Dachary <loic@dachary.org>
bucket_straw2_choose needs to use weights that may be different from
item_weights, for instance to compensate for an uneven distribution
caused by a low number of values, or to fix the probability bias
introduced by conditional probabilities (see
http://tracker.ceph.com/issues/15653 for more information).
We introduce a weight_set for each straw2 bucket to set the desired
weight for a given item at a given position. The weight of a given
item when picking the first replica (first position) may be different
from its weight when picking the second replica (second position).
For instance the weight matrix for a given bucket containing items 3,
7 and 13 could be as follows:
           position 0   position 1
 item 3    0x10000      0x100000
 item 7    0x40000      0x10000
 item 13   0x40000      0x10000
When crush_do_rule picks the first of two replicas (position 0),
items 7 and 13 are four times more likely to be chosen by
bucket_straw2_choose than item 3. When choosing the second replica
(position 1), item 3 is sixteen times more likely to be chosen than
items 7 and 13.
By default the weight_set of each bucket exactly matches the content of
item_weights for each position to ensure backward compatibility.
bucket_straw2_choose compares items using their ids. The same ids
are also used to index buckets, so they must be unique. For each item
in a bucket, an array of alternative ids can be provided for
placement purposes; they are then used instead of the original ids.
If no replacement ids are provided, the legacy behavior is preserved.
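A simplified sketch of the selection rule described above (not the
actual bucket_straw2_choose code; the clamping policy is an
assumption): a draw at a given replica position uses the weight_set
entry for that position when one exists, otherwise it falls back to
item_weights, and the alternative ids, when provided, replace the
item ids.

    // Hypothetical per-position weight/id lookup for a straw2-like bucket.
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct BucketArgs {
        std::vector<std::vector<uint32_t>> weight_set;  // [position][item]
        std::vector<int32_t> ids;                       // replacement ids
    };

    uint32_t weight_for(const std::vector<uint32_t>& item_weights,
                        const BucketArgs* args, size_t position, size_t i) {
        if (args && !args->weight_set.empty()) {
            // assumed policy: reuse the last defined position beyond the end
            size_t p = std::min(position, args->weight_set.size() - 1);
            return args->weight_set[p][i];
        }
        return item_weights[i];  // default: legacy behavior
    }

    int32_t id_for(const std::vector<int32_t>& item_ids,
                   const BucketArgs* args, size_t i) {
        return (args && !args->ids.empty()) ? args->ids[i] : item_ids[i];
    }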
Signed-off-by: Loic Dachary <loic@dachary.org>
buffer::list::c_str() will rebuild the list if it isn't contiguous,
and append(char *) will copy the data from the pointer.
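In other words, a sketch of the pattern (assuming the usual
bufferlist semantics): appending another bufferlist shares its
underlying buffer segments, while going through c_str() first forces
a rebuild into one contiguous buffer and then copies the bytes again.

    #include "include/buffer.h"

    void append_payload(ceph::buffer::list& dst, ceph::buffer::list& src) {
        // src.c_str() rebuilds src into one contiguous buffer if needed,
        // and append(const char*, len) then copies those bytes:
        //   dst.append(src.c_str(), src.length());
        // appending the bufferlist itself avoids the rebuild and the copy:
        dst.append(src);
    }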
Signed-off-by: Yunchuan Wen <yunchuan.wen@kylin-cloud.com>
The current cephfs client uses a string to indicate the start
position of readdir. The string is the last entry of the previous
readdir reply. This approach does not work for seeky readdir because
we cannot easily convert the new position to a string. For seeky
readdir, the mds needs to return dentries from the beginning, and the
client keeps retrying if the reply does not contain the dentry it
wants.
In the current version of ceph, the mds sorts CDentry in its cache in
hash order. The client also uses the dentry hash to compose the dir
position. For seeky readdir, if the client passes the hash part of
the dir position to the mds, the mds can avoid replying with useless
dentries.
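As a rough sketch with made-up helpers (the real client offset layout
differs): the point is that the dentry hash can be packed into the
directory position, so a seeky readdir can hand the hash to the mds
and resume near the right place instead of rereading from the start.

    // Hypothetical dir position layout:
    // [32-bit dentry hash | 32-bit offset within that hash].
    #include <cstdint>

    inline uint64_t make_dir_pos(uint32_t dentry_hash, uint32_t off) {
        return (static_cast<uint64_t>(dentry_hash) << 32) | off;
    }
    inline uint32_t dir_pos_hash(uint64_t pos) {  // part sent to the mds
        return static_cast<uint32_t>(pos >> 32);
    }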
Fixes: http://tracker.ceph.com/issues/19306
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
At the tail of the journal, there can be a partially written entry.
Before appending new entries to the journal, we need to drop any
partially written entry and adjust write_pos. For the mds log, a
partially written entry is detected and dropped when replaying the
journal.
For the PurgeQueue journal, we don't replay the whole journal when
the MDS starts. Before appending a new entry to the journal, we need
to drop any partially written entry and adjust write_pos.
The previous patch makes the journal header write_pos align to the
boundary of a fully flushed entry. We can start looking for a
partially written entry from the journal header write_pos. This
should be fast even when the purge queue is very large.
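Schematically, with hypothetical helpers (not the Journaler code):
start from the header write_pos, which now sits on a fully flushed
entry boundary, skip complete entries, and stop at the first
incomplete one; that becomes the new write_pos for future appends.

    // Hypothetical tail scan that drops a partially written entry.
    #include <cstdint>

    struct Entry { uint64_t len; bool complete; };
    bool read_entry_at(uint64_t pos, Entry* out);  // assumed reader;
                                                   // false past the tail

    uint64_t recover_write_pos(uint64_t header_write_pos) {
        uint64_t pos = header_write_pos;  // fully flushed entry boundary
        Entry e;
        while (read_entry_at(pos, &e) && e.complete)
            pos += e.len;                 // skip complete entries
        return pos;                       // drop anything partial after this
    }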
Fixes: http://tracker.ceph.com/issues/19450
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Do some minor changes:
1. Restrict the total DPDK memory used by an osd instance, and
   rename the option from bluestore_spdk_socket_mem to
   bluestore_spdk_mem.
2. Use spdk_env_init instead of rte_eal_init. The SPDK lib invokes
   rte_eal_init itself, which reduces the initialization parameter
   conversion and checking; also, spdk 17.03 invokes
   spdk_vtophys_register_dpdk_mem() (an internal function) in
   spdk_env_init, and this function must be called.
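For reference, a hedged sketch of the intended call (the option
struct and field names are recalled from SPDK 17.03 and should be
treated as assumptions; check spdk/env.h): spdk_env_init() takes the
memory limit in its options and invokes rte_eal_init() internally.

    // Sketch only; verify field names against the SPDK release in use.
    #include <spdk/env.h>

    void init_spdk_env(int mem_size_mb, const char* core_mask) {
        struct spdk_env_opts opts;
        spdk_env_opts_init(&opts);
        opts.name = "bluestore";      // illustrative process name
        opts.core_mask = core_mask;
        opts.mem_size = mem_size_mb;  // e.g. from bluestore_spdk_mem
        spdk_env_init(&opts);         // calls rte_eal_init() internally
    }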
Signed-off-by: optimistyzy <optimistyzy@gmail.com>
Once started, we now queue scrub work at a higher priority than
scheduled scrubs.
Fixes: http://tracker.ceph.com/issues/15789
Signed-off-by: David Zafman <dzafman@redhat.com>