The new parsing code had been trying to allow flexibility for the
'old form' commands (where id could be different from N in osd.N),
but also accept 'new form' commands. The new rule is that where
there's an OSD specified in the osd crush command, it is of type
CephOsdName, which can be an id *or* 'osd.<id>', but not both.
Pass CephOsdName as int64_t 'id' for convenience in mon code
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Generalize the map check machinery that the pool dne check uses to also
get the latest map for OSD down/dne checks. This is better semantics, but
more important fixes the more immediate bug of returning the error code
to the caller from the osd_command -> _submit_command (that is ignored by
pretty much any caller) and then never triggering the callback.
Fixes: #5331
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
A mere
import readline
line is dumping this to stdout on CentOS 6.3:
00000000 1b 5b 3f 31 30 33 34 68 .[?1034h
That confuses non-terminals that read from stdout, so only import when we
are in the interactive mode.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
The get_version(string, string) is the wrong method; it combines the two
args into a key that is nested inside prefix (so it's prefix/a/b), but we
want perfix/format_version. Add a method to grab an int for this
particular combo and use that.
This fixes an infinite loop when we actually trigger this code.
Bug introduced by f43c974571.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
If <path-to-ceph> contains pybind and .libs:
- prepend <path-to-ceph>/pybind to PYTHONPATH
- append <path-to-ceph>/.libs to LD_LIBRARY_PATH if not already there
and exec self so it takes effect
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
'ceph-conf ...' doesn't give you final/default values, only what is in the
conf file. Use -w output to test this instead.
Fixes: #5327
Signed-off-by: Sage Weil <sage@inktank.com>
Trying to track down this failure:
2013-06-12T06:11:13.430 INFO:teuthology.task.workunit.client.0.err:+ rsync -auv --exclude local/ /usr/ usr.2
2013-06-12T06:11:13.430 INFO:teuthology.task.workunit.client.0.err:+ tee a
2013-06-12T06:11:13.527 INFO:teuthology.task.workunit.client.0.out:sending incremental file list
2013-06-12T06:11:46.206 INFO:teuthology.task.workunit.client.0.out:
2013-06-12T06:11:46.208 INFO:teuthology.task.workunit.client.0.out:sent 1689627 bytes received 8302 bytes 50684.45 bytes/sec
2013-06-12T06:11:46.208 INFO:teuthology.task.workunit.client.0.out:total size is 3274130495 speedup is 1928.31
2013-06-12T06:11:46.209 INFO:teuthology.task.workunit.client.0.err:+ wc -l a
2013-06-12T06:11:46.209 INFO:teuthology.task.workunit.client.0.err:+ grep 4
2013-06-12T06:11:46.211 INFO:teuthology.task.workunit:Stopping misc on client.0...
...and am perplexed!
Signed-off-by: Sage Weil <sage@inktank.com>
User testing has shown that smaller values yield better results; see #4917.
Jim's testing has had good results with even more aggressive trimming, but I
would like to do more validation yet before changing defaults.
Signed-off-by: Sage Weil <sage@inktank.com>
We regularly have been observing a stall where the MDS is blocked waiting
for a cap revocation (Ls, in our case) and never gets a reply. We finally
tracked down the sequence:
- mds issues cap seq 1 to client
- mds does revocation (seq 2)
- client replies
- much time goes by
- client trims inode from cache, sends release with seq == 2
- mds ignores release because its issue_seq is 1
- mds later tries to revoke other caps
- client discards message because it doesn't have the inode in cache
The problem is simply that we are using seq instead of issue_seq in the
cap release message. Note that the other release call site in
encode_inode_release() is correct. That one is much more commonly
triggered by short tests, as compared to this case where the inode needs to
get pushed out of the client cache.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
set_priv() expects to be given a reference to own; take one. This fixes
various crashes after we see a hb connection reset.
Signed-off-by: Sage Weil <sage@inktank.com>
If argparse gets its hands on it, it's not available for parse_argv()
and is therefore ignored.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The tests covers 100% of the LOC of rewind_divergent_log. There are
three situations :
* throw an assert because the data is inconsistent
* special case when the entire logs is divergent
* regular workflow where all divergent entries are run to
merge_old_entry
http://tracker.ceph.com/issues/5213 refs #5213
Signed-off-by: Loic Dachary <loic@dachary.org>
The tests covers 100% of the LOC of merge_log. It is broken down
in 7 cases to enumerate all the situations it must address. Each case
is isolated in a independant code block where the conditions are
reproduced. Where possible and sensible to read, a code block covers
as much lines as possible. For instance:
The log entry (1,3) deletes the object x9 but the olog entry (2,3)
modifies it and is authoritative : the log entry (1,3) is divergent.
is the only test case covering a dozen "if" statements and half a
dozen "while/for" loops. It covers all the lines but it would be
useful to create others scenarii in the future.
Each test is made of a comment describing the test case, the
definition of the data structures to create the desired conditons, a
sequence of EXPECT_* checking that they are met, a single call to
merge_log and another sequence of EXPECT_* ( ordered to be easy to
compare with the first sequence ) checking all the desired side
effects.
The TestPGLog.cc file was untabified to improve the display of ascii
art when it is output as part of a diff.
http://tracker.ceph.com/issues/5213 refs #5213
Signed-off-by: Loic Dachary <loic@dachary.org>
Stat a bunch of (non-existent) random objects in the pool so ensure the
pg exists on the OSD before we assert that we get a 0 from querying it.
Although it is somewhat tempting to make the pg commands block until the
pg exists, that defeats much of the value of the command as a diagnostic
tool as it could block indefinitely instead of informing the admin/dev
that "the pg isn't there yet".
In any case, this fixes the api test failure.
Signed-off-by: Sage Weil <sage@inktank.com>
When talking to old daemons, if a command succeeds, there may be
output on outs, outbuf, or both; combine them if there's no error,
and clear outs so it's not treated as stderr fodder.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
For osd tell or pg <pgid> commands, the CLI sends the command directly
to the OSD; if the OSDs are still old, the command needs to be sent
in 'plain' (non-JSON) form. Also, the 'ceph status' from -w needs to
handle failure/fallback-to-old-command.
Refactor the guts of json_command() into send_command(), and call it
from json_command() and where needed for old-style commands.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>