The MDS expects to be able to perform writes to OSDs even
if the full ratio has been reached, in order to journal
file deletions to free space.
Fixes: #7780
Signed-off-by: John Spray <john.spray@inktank.com>
OSDs that for some reason get behind on processing their op queue break
expect_alloc_hint_eq(), as it pokes the FS and not the journal. Fix it
by flushing the journal before proceeding with anything else.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Add a flush_journal admin socket command that flushes the journal to
the permanent store on online OSDs. (For offline OSDs we already have
ceph-osd --flush-journal.)
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Only rbd and mount_ceph need secret.c, and only secret.c needs libkeyutils;
remove it from LIBCOMMON_DEPS so it's not a dependency for everything,
remove secret.c from libcommon.a, and add it to mount.ceph/rbd's sources;
add LIBKEYID_LIB to mount.ceph/rbd's LDADD.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Since all we really need on a snapdir is the context, we really only
need it to be !missing. However, it might become !missing before it
becomes !unreadable. That allows ops to end up in the
waiting_for_degraded queue before one in waiting_for_unreadable is
woken, which allows the ops to be reordered. Rather than reintroduce an
extra waiting_for_missing queue, simply require !unreadable for snapdir
(which implies !missing).
Fixes: #7777
Signed-off-by: Samuel Just <sam.just@inktank.com>
wip-mon-docs: Better explain required number of monitors & how to troubleshoot a monitor
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Loic Dachary <loic@dachary.org>
Otherwise, we might get into a situation where the primary
forgets about a stray pg. This is simpler and does not
increase the number of notifies by much.
Fixes: #7733
Signed-off-by: Samuel Just <sam.just@inktank.com>
The previous logic should have kept the current best info if it found a
replica which best could log-recover, but p couldn't. However, the
continue in that loop advanced the inner loop instead of the outer loop,
allowing the primary case to take over in cases where best had a longer
tail. Instead, we will prefer the longer tail regardless of the other
infos to simplify the logic.
Fixes: #7755
Signed-off-by: Samuel Just <sam.just@inktank.com>
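The loop bug described above is easy to reproduce in isolation. The
sketch below is a generic Python illustration with made-up data, not the
peering code: a continue inside a nested loop only moves on to the next
iteration of the innermost loop, so skipping an outer item needs a flag
(or a helper function). The names buggy, fixed, candidates and blockers
are hypothetical.

```python
def buggy(candidates, blockers):
    """`continue` only advances the inner loop, so nothing is skipped."""
    kept = []
    for cand in candidates:
        for b in blockers:
            if cand == b:
                continue  # intended: skip cand; actual: try next blocker
        kept.append(cand)
    return kept


def fixed(candidates, blockers):
    """Record the match, leave the inner loop, then skip in the outer loop."""
    kept = []
    for cand in candidates:
        blocked = False
        for b in blockers:
            if cand == b:
                blocked = True
                break
        if blocked:
            continue  # this one advances the outer loop
        kept.append(cand)
    return kept
```

With candidates [1, 2, 3] and blockers [2], buggy() keeps all three
items while fixed() drops 2.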
Not handling the error return from cmd_getval() may leave uninitialized
values, which can cause issues, especially with non-string values.
Fixes: #6806
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
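The failure mode can be illustrated outside Ceph. The sketch below is
Python rather than the C++ the fix touches, and get_value is a
hypothetical stand-in for a getter that reports success separately from
the value; it does not reproduce cmd_getval()'s signature.

```python
from typing import Any, Dict, Optional, Tuple


def get_value(cmdmap: Dict[str, Any], key: str, kind: type) -> Tuple[bool, Optional[Any]]:
    """Hypothetical getter: returns (found, value); value is None on failure."""
    value = cmdmap.get(key)
    if not isinstance(value, kind):
        # Missing key or wrong type: report failure to the caller.
        return False, None
    return True, value


def parse_size(cmdmap: Dict[str, Any]) -> int:
    """Check the getter's result instead of using the value blindly."""
    ok, size = get_value(cmdmap, "size", int)
    if not ok:
        # Fail early rather than continue with an unset value.
        raise ValueError("missing or invalid 'size' argument")
    return size
```

The point of the pattern is that the caller never reads the value
unless the getter reported success.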
The qa and functional tests are adapted to the new command prototype,
which requires a profile instead of a list of properties. When possible,
implicit ruleset creation is used to simplify the test setup.
Signed-off-by: Loic Dachary <loic@dachary.org>
Clean up redundant leftovers in the TEST_crush_rule_all function.
Test crush rule rm explicitly instead of implicitly.
Signed-off-by: Loic Dachary <loic@dachary.org>
The "default" erasure_code_profile is set by OSDMap::build_simple using
the osd_pool_default_erasure_code_profile default configuration option.
Signed-off-by: Loic Dachary <loic@dachary.org>
The poolstr is removed from the prepare_pool_crush_ruleset prototype
because it is no longer used to decide the default ruleset when the
caller of osd pool create omits one.
If no profile
    profile = default
If no ruleset and profile is default
    ruleset = erasure-code
If no ruleset and profile is not default
    ruleset = the name of the pool
Signed-off-by: Loic Dachary <loic@dachary.org>
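The defaulting rules listed above can be sketched as a small function.
This is a minimal illustration, not the monitor's code:
resolve_pool_defaults and its parameters are hypothetical names, and
only the rules themselves come from the commit message.

```python
from typing import Optional, Tuple


def resolve_pool_defaults(
    pool_name: str,
    profile: Optional[str] = None,
    ruleset: Optional[str] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: apply the defaulting rules listed above."""
    # If no profile is given, fall back to the "default" profile.
    if not profile:
        profile = "default"
    # If no ruleset is given, the choice depends on the profile.
    if not ruleset:
        if profile == "default":
            ruleset = "erasure-code"
        else:
            ruleset = pool_name
    return profile, ruleset
```

For a pool named "ecpool" with nothing else specified, this yields the
"default" profile and the "erasure-code" ruleset; with a non-default
profile and no ruleset, the ruleset takes the pool's name.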
The ruleset to be used for the new erasure coded pool was expected in
the properties, under the name crush_ruleset. It does not belong to the
erasure code profile and needs to be added to the prototype explicitly.
The crush ruleset name is added to the prototype of the prepare_new_pool
and prepare_pool_crush_ruleset methods.
Signed-off-by: Loic Dachary <loic@dachary.org>