User testing has shown that smaller values yield better results; see #4917.
Jim's testing has had good results with even more aggressive trimming, but I
would like to do more validation before changing the defaults further.
Signed-off-by: Sage Weil <sage@inktank.com>
We regularly have been observing a stall where the MDS is blocked waiting
for a cap revocation (Ls, in our case) and never gets a reply. We finally
tracked down the sequence:
- mds issues cap seq 1 to client
- mds does revocation (seq 2)
- client replies
- much time goes by
- client trims inode from cache, sends release with seq == 2
- mds ignores release because its issue_seq is 1
- mds later tries to revoke other caps
- client discards message because it doesn't have the inode in cache
The problem is simply that we are using seq instead of issue_seq in the
cap release message. Note that the other release call site in
encode_inode_release() is correct. That one is much more commonly
triggered by short tests, as compared to this case where the inode needs to
get pushed out of the client cache.
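In toy form (simplified structs, not the actual Client.cc code), the
one-field fix looks like this:

    #include <cstdint>

    // Toy shapes, not the real Ceph structs.
    struct Cap {
      uint64_t cap_id;
      uint32_t seq;        // advances on every issue *and* revocation
      uint32_t issue_seq;  // advances only when the cap is issued
    };

    struct ReleaseItem {
      uint64_t cap_id;
      uint32_t seq;        // the MDS matches this against its issue_seq
    };

    ReleaseItem make_release(const Cap &cap) {
      // was: {cap.cap_id, cap.seq} -- after the revocation seq is 2, but
      // the MDS still has issue_seq 1, so it silently drops the release.
      return {cap.cap_id, cap.issue_seq};
    }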
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
set_priv() expects to be given a reference to own; take one. This fixes
various crashes after we see a hb connection reset.
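A toy model of the ownership rule (loosely modeled on Ceph's
RefCountedObject; not the messenger code):

    #include <atomic>

    struct Ref {
      std::atomic<int> nref{1};
      Ref *get() { ++nref; return this; }            // take a reference
      void put() { if (--nref == 0) delete this; }   // drop a reference
    };

    struct Conn {
      Ref *priv = nullptr;
      void set_priv(Ref *o) {   // keeps o and will put() it later
        if (priv) priv->put();
        priv = o;
      }
    };

    int main() {
      Conn con;
      Ref *s = new Ref;        // one ref, owned by us
      con.set_priv(s->get());  // the fix: hand the connection its *own* ref
      s->put();                // drop ours; the connection's ref keeps s alive
      con.set_priv(nullptr);   // reset: the connection's ref is put, s freed
    }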
Signed-off-by: Sage Weil <sage@inktank.com>
If argparse gets its hands on it, it's not available for parse_argv()
and is therefore ignored.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The tests cover 100% of the LOC of rewind_divergent_log. There are
three situations:
* throw an assert because the data is inconsistent
* special case when the entire log is divergent
* regular workflow where all divergent entries are run through
merge_old_entry (see the toy sketch below)
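A toy model of those three situations (simplified types, not the real
PGLog code):

    #include <cassert>
    #include <iterator>
    #include <list>

    struct eversion_t {
      int epoch = 0, version = 0;
      bool operator>(const eversion_t &o) const {
        return epoch > o.epoch || (epoch == o.epoch && version > o.version);
      }
    };
    struct Entry { eversion_t v; };

    // Rewind `log` to `to`; entries newer than `to` become divergent.
    std::list<Entry> rewind(std::list<Entry> &log, eversion_t tail,
                            eversion_t to) {
      assert(!(tail > to));  // 1. inconsistent data: rewinding past the
                             //    tail throws the assert
      std::list<Entry> divergent;
      while (!log.empty() && log.back().v > to)
        divergent.splice(divergent.begin(), log, std::prev(log.end()));
      // 2. if `to` predates every entry, the entire log is divergent;
      // 3. otherwise only the newest entries are. Each divergent entry
      //    is then handed to merge_old_entry in the real code.
      return divergent;
    }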
http://tracker.ceph.com/issues/5213 refs #5213
Signed-off-by: Loic Dachary <loic@dachary.org>
The tests cover 100% of the LOC of merge_log. It is broken down
into 7 cases enumerating all the situations it must address. Each case
is isolated in an independent code block where the conditions are
reproduced. Where possible and sensible to read, a code block covers
as many lines as possible. For instance:
The log entry (1,3) deletes the object x9 but the olog entry (2,3)
modifies it and is authoritative: the log entry (1,3) is divergent.
is the only test case covering a dozen "if" statements and half a
dozen "while/for" loops. It covers all the lines, but it would be
useful to create other scenarios in the future.
Each test is made of a comment describing the test case, the
definition of the data structures to create the desired conditions, a
sequence of EXPECT_* checking that they are met, a single call to
merge_log, and another sequence of EXPECT_* (ordered to be easy to
compare with the first sequence) checking all the desired side
effects.
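In toy form, the shape of one such case (simplified types; the real
cases live in TestPGLog.cc):

    #include <gtest/gtest.h>
    #include <list>

    // Toy stand-ins, not the Ceph types.
    struct Entry { int epoch; int version; bool is_delete; };

    static void merge_log(std::list<Entry> &log,
                          const std::list<Entry> &olog) {
      if (!olog.empty())
        log = olog;  // toy: the authoritative olog wins outright
    }

    TEST(PGLogToy, divergent_delete) {
      // comment: log (1,3) deletes x9, olog (2,3) modifies it and is
      // authoritative: the log entry (1,3) is divergent
      std::list<Entry> log{Entry{1, 3, true}};
      std::list<Entry> olog{Entry{2, 3, false}};
      // EXPECT_*: the desired conditions are met
      EXPECT_TRUE(log.front().is_delete);
      // the single call to merge_log
      merge_log(log, olog);
      // EXPECT_*, ordered like the first sequence: the side effects
      EXPECT_FALSE(log.front().is_delete);
    }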
The TestPGLog.cc file was untabified to improve the display of ascii
art when it is output as part of a diff.
http://tracker.ceph.com/issues/5213 refs #5213
Signed-off-by: Loic Dachary <loic@dachary.org>
Stat a bunch of (non-existent) random objects in the pool to ensure the
pg exists on the OSD before we assert that we get a 0 from querying it.
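For illustration, a librados sketch of the idea (helper and object
names are made up):

    #include <rados/librados.hpp>
    #include <cerrno>
    #include <ctime>
    #include <string>

    // Each stat is expected to return -ENOENT, but the op must reach
    // the OSD, which guarantees the pg backing that object exists there.
    int probe_pgs(librados::IoCtx &ioctx, int n) {
      for (int i = 0; i < n; i++) {
        uint64_t size;
        time_t mtime;
        int r = ioctx.stat("probe-" + std::to_string(i), &size, &mtime);
        if (r < 0 && r != -ENOENT)
          return r;  // a real error, not just "no such object"
      }
      return 0;
    }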
Although it is somewhat tempting to make the pg commands block until the
pg exists, that defeats much of the value of the command as a diagnostic
tool as it could block indefinitely instead of informing the admin/dev
that "the pg isn't there yet".
In any case, this fixes the api test failure.
Signed-off-by: Sage Weil <sage@inktank.com>
When talking to old daemons, if a command succeeds, there may be
output on outs, outbuf, or both; combine them if there's no error,
and clear outs so it's not treated as stderr fodder.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
For osd tell or pg <pgid> commands, the CLI sends the command directly
to the OSD; if the OSDs are still old, the command needs to be sent
in 'plain' (non-JSON) form. Also, the 'ceph status' from -w needs to
handle failure/fallback-to-old-command.
Refactor the guts of json_command() into send_command(), and call it
from json_command() and where needed for old-style commands.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
It was hard-coded at 5. Make it range from 5 to 15 by default, for now.
We should still keep this smallish since this range is locked for the
duration of the scrub on this chunk.
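For reference, a ceph.conf sketch; the option names below are my
assumption for how the new min/max tunables are exposed:

    [osd]
    osd scrub chunk min = 5
    osd scrub chunk max = 15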
Signed-off-by: Sage Weil <sage@inktank.com>
In case of deep uri resources (ones created beyond a single level
of hierarchy, e.g. auth/v1.0) we want to create a new empty
handler for the path if no handler exists. E.g., for
auth/v1.0 we need to have a handler for 'auth'; otherwise
the default S3 handler will be used, which we don't want.
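A simplified sketch of the idea (not the actual rgw handler manager
types):

    #include <map>
    #include <string>

    struct Handler {};

    std::map<std::string, Handler *> handlers;

    // Registering a multi-level resource also creates an empty handler
    // for its first path segment, so lookup never falls through to the
    // default handler.
    void register_resource(const std::string &uri, Handler *h) {
      std::string::size_type slash = uri.find('/');
      if (slash != std::string::npos) {
        std::string first = uri.substr(0, slash);  // e.g. "auth"
        if (!handlers.count(first))
          handlers[first] = new Handler;           // empty placeholder
      }
      handlers[uri] = h;                           // e.g. "auth/v1.0"
    }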
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Fixes: #5262
The original test was not comparing the correct string, which ended up
having the effect of merely checking that a substring of the uri matched
the resource.
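Illustrative sketch only (names made up): the buggy shape is a
length-limited compare, the fix compares the whole token:

    #include <string>

    // A length-limited compare effectively tests a prefix, so a token
    // like "corset" would match the resource "cors"; comparing the
    // complete token closes that hole.
    bool matches_resource(const std::string &token,
                          const std::string &resource) {
      // buggy shape: token.compare(0, resource.size(), resource) == 0
      return token == resource;
    }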
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Fixes: #5261
Backport: cuttlefish
Add 'cors' to the list of sub-resources, otherwise auth signing
is wrong.
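An abridged sketch of the idea; the exact list in rgw is longer:

    #include <set>
    #include <string>

    // Sub-resources folded into the canonicalized resource string when
    // computing the S3 signature. Leaving one out ("cors" here) makes
    // the computed signature disagree with what clients send.
    static const std::set<std::string> signed_subresources = {
      "acl", "cors" /* newly added */, "delete", "location",
      "logging", "torrent", "uploads", "versioning",
    };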
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
This is a potentially huge object/file, usually prefixed by a zeroed region
on disk, that is not used by scrub at all. It dates back to
f51348dc8b (2008) and the original version of
scrub.
This *might* fix #4179. It is not a leak per se, but I observed 1GB
scrub messages going over the wire. Maybe the allocations are causing
fragmentation, or the sub_op queues are growing.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
This fixes a specific case of double-queuing seen in #4832:
- client goes stale, inode marked NEEDSRECOVER
- eval does sync, queued, -> RECOVERING
- client resumes
- client goes stale (again), inode marked NEEDSRECOVER
- eval_gather queues *again*
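In toy form, the guard the fix adds (simplified states, not the CInode
state machine):

    #include <queue>

    enum State { STABLE, NEEDSRECOVER, RECOVERING };

    struct Inode { State s = STABLE; };

    std::queue<Inode *> recovery_q;

    void mark_stale(Inode *in) {
      if (in->s == RECOVERING)
        return;             // already queued: flag nothing, queue nothing
      in->s = NEEDSRECOVER;
    }

    void eval(Inode *in) {
      if (in->s == NEEDSRECOVER) {
        in->s = RECOVERING;
        recovery_q.push(in);  // queued exactly once
      }
    }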
Note that a cursory look at the recovery code makes me think this needs
a much more serious overhaul. In particular, I don't think we should
be triggering recovery when transitioning *from* a stable state, but
explicitly when we are flagged, or when gathering. We should probably
also hold a wrlock over the recovery period and remove the force_wrlock
kludge from the final size check. Opened ticket #5268.
Signed-off-by: Sage Weil <sage@inktank.com>