Notify the kernel to invalidate top-level directory entries. As a side
effect, the kernel inode cache gets shrunk.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
After a split the pg stats are only approximate, not precisely correct. Any
inaccuracy can be problematic for the agent because it determines the
level of effort and potentially full/blocking behavior based on those stats.
We could conceivably do some estimation here that is "safe" in that we
don't commit to too much effort (or back off later if it isn't paying off)
and never block, but that is error-prone.
Instead, just disable the agent until a scrub makes the stats reliable
again.
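A minimal sketch of the guard, with an invented flag name (the real check
lives in the tiering agent's work loop; stats_invalid_after_split is
illustrative, not the actual field):
    // illustration only: refuse agent work while the stats are untrustworthy
    bool agent_may_work(bool stats_invalid_after_split) {
      if (stats_invalid_after_split)
        return false;  // stats are approximate until the next scrub; don't
                       // size effort or go full/blocking based on them
      return true;
    }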
We should document that a scrub after split is recommended (in any case)
and especially important on cache tiers, but there are currently *no*
user docs about PG splitting.
Fixes: #7975
Signed-off-by: Sage Weil <sage@inktank.com>
This ensures that they get new maps before an op which requires them (that
they would then request from the monitor).
Signed-off-by: Greg Farnum <greg@inktank.com>
The hit_set transactions may include both an update of the new hit_set and
the deletion of an old one, spanning the backfill boundary, and we may end up
sending a backfill target a blank transaction that does not correctly
remove the old object. The target will later notice the stray object and
hit an assertion.
Fix this by skipping hit_set_persist() if any of the backfill targets are
still working on the very first hash value in the PG (which is where all
of the hit_set objects live). This is coarse but simple.
Another solution would be to send separate ops for the trim/deletion and
the new hit_set update, but that is a bit more complex and adds a bit more
runtime overhead (twice the messages).
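Roughly, the check amounts to something like the sketch below (hobject_t and
the backfill bookkeeping are reduced to bare structs; this is an
illustration, not the actual OSD code):
    #include <vector>
    // stand-in for hobject_t: only the hash matters here
    struct HashPos { unsigned hash; };
    // skip persisting the hit_set while any backfill target is still working
    // on the first hash value of the PG, where all hit_set objects live
    bool may_persist_hit_set(unsigned first_hash_value,
                             const std::vector<HashPos>& backfill_progress) {
      for (const HashPos& pos : backfill_progress) {
        if (pos.hash <= first_hash_value)
          return false;   // a target is still on that first hash value
      }
      return true;
    }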
Fixes: #7983
Signed-off-by: Sage Weil <sage@inktank.com>
This reintroduces the same semantics that were in place in dumpling prior
to the refactoring of the cap/command matching code.
We haven't added this requirement to auth read-write operations, as that
would have the potential to break a lot of well-configured keyrings once
users upgraded, without any significant gain -- we assume that if they
have set 'rw' caps on a given entity, they are indeed expecting said
entity to be somewhat privileged with regard to monitor access.
Fixes: #7919
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
While we're here, remove the non-const get_xlock_by() (because we don't
need it). Also note that we return a full MutationRef (instead of a
reference to the stored one); this is necessary in case we don't have a
set-up more() object.
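To make the return-by-value point concrete, a rough sketch (the class layout
is invented for illustration; only get_xlock_by() and MutationRef come from
the code):
    #include <memory>
    struct MutationImpl {};
    using MutationRef = std::shared_ptr<MutationImpl>;
    struct MoreState { MutationRef xlock_by; };
    class LockSketch {
      std::unique_ptr<MoreState> _more;   // allocated lazily, may be null
    public:
      // const-only accessor returning a full ref, so callers get a valid
      // (possibly empty) MutationRef even when more() was never set up
      MutationRef get_xlock_by() const {
        return _more ? _more->xlock_by : MutationRef();
      }
    };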
Signed-off-by: Greg Farnum <greg@inktank.com>
We keep an MDRequestImpl::set_self_ref(MDRequestRef&) function so
that we don't need to do the pointer conversion elsewhere.
Signed-off-by: Greg Farnum <greg@inktank.com>
We're switching the MDRequest to be used as a shared pointer. This is the
first step on the path to inserting an OpTracker into the MDS.
Give the MDRequestImpl a weak_ptr self_ref so that we can keep
using the elist for now.
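The pattern, as a simplified sketch (std::shared_ptr/weak_ptr stand in for
whatever ref-counting the tree actually uses; only MDRequestImpl,
MDRequestRef and set_self_ref are taken from the change):
    #include <memory>
    struct MDRequestImpl;
    using MDRequestRef = std::shared_ptr<MDRequestImpl>;
    struct MDRequestImpl {
      std::weak_ptr<MDRequestImpl> self_ref;   // elist entries recover a strong ref from this
      void set_self_ref(MDRequestRef& r) { self_ref = r; }
      MDRequestRef get_self_ref() const { return self_ref.lock(); }
    };
    // typical setup:
    //   MDRequestRef mdr = std::make_shared<MDRequestImpl>();
    //   mdr->set_self_ref(mdr);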
Signed-off-by: Greg Farnum <greg@inktank.com>
When the MDS receives a getattr request, the corresponding inode's filelock
can be in an unstable state, waiting for the client's Fr cap.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Properly pass 'retain' to Client::send_cap(), because it is used to
adjust cap->issued.
Also make Client::encode_inode_release() not release used/dirty caps.
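The gist of the second change, as a hedged sketch (the mask arithmetic below
is illustrative; the actual cap constants and bookkeeping differ):
    #include <cstdint>
    // never let a release drop caps that are currently in use or dirty
    uint32_t caps_safe_to_release(uint32_t issued, uint32_t drop,
                                  uint32_t used, uint32_t dirty) {
      uint32_t keep = used | dirty;   // caps that must be retained
      return issued & drop & ~keep;
    }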
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
The following
    ./ceph osd pool create data-cache 8 8
    ./ceph osd tier add data data-cache
    ./ceph osd tier cache-mode data-cache writeback
    ./ceph osd tier set-overlay data data-cache
    ./rados -p data create foo
    ./rados -p data stat foo
results in
    error stat-ing data/foo: No such file or directory
even though foo exists in the data-cache pool, as it should. STAT
checks for (exists && !is_whiteout()), but the whiteout flag isn't
cleared on CREATE as it is on WRITE and WRITEFULL. The problem is
that, for newly created 0-sized cache pool objects, the CREATE handler in
do_osd_ops() doesn't get a chance to queue OP_TOUCH, and so the logic
in prepare_transaction() considers CREATE to be a read and therefore
doesn't clear the whiteout. Fix it by allowing the CREATE handler to queue
OP_TOUCH at all times, mimicking WRITE and WRITEFULL behaviour.
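A boiled-down sketch of the resulting logic (enum values and helpers are
invented; only the CREATE-queues-a-touch idea reflects the fix):
    #include <vector>
    enum class TxOp { TOUCH, WRITE };
    struct ObjState { bool exists = false; bool whiteout = false; };
    void handle_create(ObjState& obs, std::vector<TxOp>& tx) {
      tx.push_back(TxOp::TOUCH);   // queued unconditionally, like WRITE/WRITEFULL
      obs.exists = true;
    }
    void prepare_transaction(ObjState& obs, const std::vector<TxOp>& tx) {
      if (!tx.empty())             // the op performed a write...
        obs.whiteout = false;      // ...so the cache-pool whiteout is cleared
    }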
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
When getting a REJECT from a backfill target, tell the already-GRANTed
targets to go back to the RepNotRecovering state by sending them a REJECT
as well.
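Sketch of the rollback (peer bookkeeping and the message send are simplified
placeholders, not the actual peering code):
    #include <set>
    using OsdId = int;
    void send_reservation_reject(OsdId /*peer*/) { /* stand-in for the real message */ }
    void handle_backfill_reject(std::set<OsdId>& granted, OsdId rejecting_peer) {
      granted.erase(rejecting_peer);
      for (OsdId peer : granted)
        send_reservation_reject(peer);   // roll back peers that already granted
      granted.clear();
    }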
Fixes: #7922
Signed-off-by: David Zafman <david.zafman@inktank.com>
Fixes: #7978
We tried to move to the next placement rule, but we were already at the
last one, so we ended up looping forever.
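The shape of the fix, sketched with an invented iterator (the real code walks
RGW placement rules):
    #include <cstddef>
    #include <string>
    #include <vector>
    bool next_placement_rule(const std::vector<std::string>& rules, std::size_t& i) {
      if (i + 1 >= rules.size())
        return false;   // already at the last rule: stop instead of wrapping
      ++i;
      return true;
    }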
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
find_object_context provides some niceties which we don't need since we know
the oid of the clones. Problematically, it also returns ENOENT if the snap
requested happens to have been removed. Even in such a case, the clone may
well still exist for other snaps. Rather than modify find_object_context to
avoid this situation for this caller, we'll simply do it inline in do_op.
Fixes: #7858
Signed-off-by: Samuel Just <sam.just@inktank.com>
Head eviction implies that no clones are present. Also, add
an exists flag to SnapSetContext so that an ssc left behind by a recent
eviction does not prevent a snap read from activating the promotion
machinery.
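In miniature (SnapSetContext is the real name; the fields and helper below
are illustrative only):
    // an ssc left behind by a recent head eviction carries exists=false,
    // so a snap read that finds it still falls through to promotion
    struct SnapSetContextSketch {
      bool exists = false;
    };
    bool must_promote(const SnapSetContextSketch* ssc) {
      return ssc == nullptr || !ssc->exists;   // stale or missing ssc -> promote
    }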
Fixes: #7858
Signed-off-by: Samuel Just <sam.just@inktank.com>
This will make the OSD randomly reject backfill reservation requests. This
exercises the failure code paths but does not break overall behavior
because the primary will back off and retry later.
This should help us reproduce #7922.
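A hedged sketch of the injection point (the function name and random source
are illustrative; the real change wires a debug config option into the
reservation handler):
    #include <random>
    // reject a reservation request with the given probability; the primary
    // backs off and retries, so overall behavior stays correct
    bool inject_backfill_reject(double reject_probability) {
      static thread_local std::mt19937 gen{std::random_device{}()};
      return std::uniform_real_distribution<double>(0.0, 1.0)(gen) < reject_probability;
    }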
Signed-off-by: Sage Weil <sage@inktank.com>