This uses the old stand-alone qemu-iotests repo so it works with the
version of qemu in Ubuntu 12.04. The tests depend tightly on the qemu
version, so to use later tests we'd need to install corresponding
versions of qemu.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Added init-radosgw.sys file for rpm based systems, added it to
the tarball list in the makefile, and updated the specfile to
install it. Also added a dependency on ceph since it uses
utility routines from that package (on Debian systems these are
packaged in ceph-common). Incorporated review comments from
Alex. (Bug #4571)
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Reviewed-by: Alexandre Marangone <alexandre.marangone@inktank.com>
The reconnect caps sent by the client on reconnect may not have
inodes found in the inode cache until after clientreplay (when
the client creates a new file, for example). Currently, we send an
export for that cap to the client if we don't see an inode in the cache
and path_is_mine() returns false (for example, if the client didn't
send a path because the file was already unlinked).
Instead, we want to delay handling of the reconnect cap until
clientreplay completes.
This patch modifies handle_client_reconnect() so that we don't assume
the cap isn't ours if we don't have an inode for it, but instead delay
recovery for later. An export cap message is only sent if the inode exists
and the cap isn't ours (non-auth) during reconnect. If any recovered
caps remain in the list once the mds goes active, we send export
messages for them at that point.
Also, after removing the path_is_mine check,
MDCache::parallel_fetch_traverse_dir() needs to skip non-auth dirfrags.
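A minimal standalone sketch of the decision order described above (all
type and variable names here are hypothetical simplifications, not the
actual MDS code):

    // Hypothetical simplification of the reconnect-cap handling described
    // above; not the real MDS types or functions.
    #include <cstdint>
    #include <map>
    #include <vector>

    struct ReconnectCap { uint64_t ino; };

    std::map<uint64_t, bool> inode_cache;        // ino -> is_auth on this mds
    std::vector<ReconnectCap> delayed_recovery;  // processed after clientreplay
    std::vector<uint64_t>     exports_to_send;   // told to client: "not mine"

    void handle_reconnect_cap(const ReconnectCap& cap) {
        auto it = inode_cache.find(cap.ino);
        if (it == inode_cache.end()) {
            // Inode not in cache yet: it may only appear during clientreplay,
            // so do NOT export the cap; delay recovery until replay completes.
            delayed_recovery.push_back(cap);
        } else if (!it->second) {
            // Inode exists but we are not auth for it: export the cap now.
            exports_to_send.push_back(cap.ino);
        }
        // else: inode exists and is ours; process the cap normally.
    }

    void on_mds_active() {
        // Anything still unresolved when the mds goes active gets exported.
        for (const auto& cap : delayed_recovery)
            exports_to_send.push_back(cap.ino);
        delayed_recovery.clear();
    }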
Fixes #4451.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
If mds failure causes client reconnect while the
client is unmounting, the client will send a session
close request to the mds even if there are outstanding
inodes in the cache waiting to receive flush_acks. This
causes the mds to send back a session close message and
the client closes the connection, so that when the mds tries
to send flush acks back to the client, they get dropped, resulting
in the client hanging on unmount. The pattern for this bug is:
1. mds restart
2. client sends session open request
3. client unmount sets unmounting flag and waits for flush_acks
4. mds sends session open reply
5. client sends session close request (because it's unmounting)
6. mds sends session close, client closes connection
7. mds tries to send flush_acks, but drops them because the connection
is gone
This patch unifies the session close handling so that the client
only sends a session close in unmount once all flush acks have been
received. If the mds restarts during session close, the reconnect
logic will kick the session close waiter so that session close requests
are re-sent for session close replies not yet received.
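A minimal sketch of that gating logic, under the assumption of a single
pending-ack counter (the names below are illustrative, not the actual
client code):

    // Hypothetical sketch of the unified close path: the close request is
    // only sent once every expected flush_ack has arrived.
    #include <cstdint>
    #include <iostream>

    struct Client {
        uint64_t flush_acks_pending = 0;
        bool unmounting = false;
        bool close_sent = false;

        void maybe_send_session_close() {
            // Gate the close on both conditions; reconnect after an mds
            // restart re-invokes this so an unacked close is re-sent.
            if (unmounting && flush_acks_pending == 0 && !close_sent) {
                std::cout << "sending session close\n";
                close_sent = true;
            }
        }

        void start_unmount() { unmounting = true; maybe_send_session_close(); }

        void handle_flush_ack() {
            if (--flush_acks_pending == 0)
                maybe_send_session_close();
        }

        void handle_mds_reconnect() {
            close_sent = false;            // close reply never arrived
            maybe_send_session_close();    // kick the session close waiter
        }
    };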
Signed-off-by: Sam Lang <sam.lang@inktank.com>
This adds a new test script for validating that data read from a mapped
rbd image is what it's expected to be.
See the content of the file for a bit more explanation.
Signed-off-by: Alex Elder <elder@inktank.com>
This is also the same as journaled_seq + 1 for writeahead
journaling, but not for parallel journaling.
Signed-off-by: Samuel Just <sam.just@inktank.com>
At one point, a commit had to drain the FileStore op
queue. This is no longer the case. Consequently, the
journal may have to wait more than one commit for the
filestore to create a stable commit point at a particular
sequence. Handling this requires two changes:
1) We cannot transition to FULL_WAIT until we receive
a commit_start on a seq >= journaled_seq.
2) We cannot remove the journal completion plug until we get
a committed_thru on a seq >= header.start_seq, i.e. at least as
new as the oldest committed item in the journal. If, on
replay, the journal does not include fs_op_seq, we ignore
it, which is fine since we won't have reported those
entries committed!
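A minimal sketch of the two gating conditions, using simplified names
(journaled_seq, header_start_seq and friends stand in for the real
FileJournal state, which is more involved):

    #include <cstdint>

    struct JournalState {
        uint64_t journaled_seq    = 0;  // last seq written to the journal
        uint64_t header_start_seq = 0;  // seq of oldest entry still in journal
        bool     plugged          = true;
    };

    // (1) Only move to FULL_WAIT once the filestore has started a commit
    //     covering everything we have journaled.
    bool can_enter_full_wait(const JournalState& j, uint64_t commit_start_seq) {
        return commit_start_seq >= j.journaled_seq;
    }

    // (2) Only remove the completion plug once a commit point is at least
    //     as new as the oldest entry left in the journal.
    void on_committed_thru(JournalState& j, uint64_t committed_seq) {
        if (committed_seq >= j.header_start_seq)
            j.plugged = false;
    }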
Signed-off-by: Samuel Just <sam.just@inktank.com>
commit 0bcf2ac081 changes session_info_t's format, but there is
a typo in the code that decodes the old format. We also need to
handle struct_v == 1, which had the same encoding but without
the size guards (which is all handled by DECODE_START_LEGACY_COMPAT).
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
There's no reason to check the duration of a watch. The notify will
time out after 30s on the OSD, but there's no guarantee the client will
see that in any bounded time. This test is really meant as a stress
test of the OSDs anyway, not of the clients, so just remove asserts
about operation duration.
Fixes: #4591
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sam Just <sam.just@inktank.com>
Second guessing the first sequence number from the FileStore
was silly and broke tests which had the temerity to start at
1 instead of 2...
Fixes: #4687
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
This reverts commit a309177466. This commit
includes calls that involve Mutexes, Lockers, and lockdep -- which isn't
yet set up, so things break horribly. A more subtle approach is required.
Signed-off-by: Greg Farnum <greg@inktank.com>
Allow argparse functions to fail if no argument is given, by using
special versions that avoid the default CLI behavior of "cerr/exit"
Fixes: #4678
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
mon needs to call argparse for a couple of -- options, and the
argparse_witharg routines were attempting to cerr/exit on missing
arguments. This is appropriate for the CLI usage, but not the daemon
usage. Add a 'cli' flag that can be set false for the daemon usage
(and cause the parsing routine to return false instead of exiting).
The daemon's parsing code is due for a rewrite soon.
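A rough illustration of the flag's effect, assuming a simplified
signature (the real ceph_argparse_witharg routines differ):

    #include <iostream>
    #include <string>
    #include <cstdlib>

    // Returns true and fills 'val' if the argument is present. On a missing
    // argument: CLI callers keep the old cerr/exit behaviour, daemon callers
    // just get 'false' back and keep running.
    bool parse_witharg(bool have_arg, const std::string& next,
                       std::string* val, bool cli) {
        if (!have_arg) {
            if (cli) {
                std::cerr << "missing argument" << std::endl;
                exit(1);
            }
            return false;
        }
        *val = next;
        return true;
    }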
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
The MDRequest is destroyed once the client reply is sent, but
we need the reference to the LogSegment for updating the backtrace, so
store a temporary ref to the LogSegment for later.
Fixes #4660.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
The _prefetch() function interprets temp_fetch_len as the amount of
data we need starting from read_pos, which is the beginning of
read_buf. So by setting it to the amount *more* we needed, we were
getting stuck forever if we actually hit this condition. Fix it by
setting temp_fetch_len based on the amount of data we need in aggregate.
Furthermore, we were previously rounding *down* the requested amount in
order to read only full log segments. Round up instead!
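A minimal sketch of the corrected calculation, with hypothetical
parameter names standing in for the Journaler state:

    #include <cstdint>

    uint64_t compute_temp_fetch_len(uint64_t read_pos,      // start of read_buf
                                    uint64_t requested_pos, // how far we must read
                                    uint64_t period) {      // segment size
        // Total bytes needed measured from read_pos (not "extra" bytes),
        // since _prefetch() interprets the value relative to read_pos.
        uint64_t need = requested_pos - read_pos;
        // Round up so a partial final segment is still fetched.
        return ((need + period - 1) / period) * period;
    }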
Fixes #4618
Signed-off-by: Greg Farnum <greg@inktank.com>
Currently we don't start logging on daemon startup unless the log_file
parameter was adjusted by ceph.conf. Instead, we should call all config
observers so that the logging subsystem is fully configured and we log
even prior to the daemonize and common_init_finish (when we call observers
again). This fixes logging for the initial period before we daemonize.
For some of the daemons (osd, mon), this includes significant work. It
also fixes the problem where users don't see the 'ceph version ...' banner
on daemon start.
Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Ensure that we push log data out before we restart logging. This may not
be strictly necessary, but it avoids a whole class of possible pitfalls.
Signed-off-by: Sage Weil <sage@inktank.com>
- fix seed
- the array indices are points in time; no need to subtract one from i!
- pick a random seed and print it to stdout
I ran this with several different seeds without failure, so I am confident
we are in good shape. And if we ever get a future failure, we'll have the
seed to reproduce.
Signed-off-by: Sage Weil <sage@inktank.com>
This lets us put a cap on outstanding client IOs. This is particularly
important for clients issuing lots of small IOs.
Fixes: #4579
Signed-off-by: Sage Weil <sage@inktank.com>
We already have a throttler that lets us limit the amount of memory
consumed by messages from a given source. Currently this is based only
on the size of the message payload. Add a second throttler that limits
the number of messages so that we can effectively throttle small requests
as well.
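A minimal sketch of the idea, throttling on bytes and message count
independently (SimpleThrottle and the limits below are hypothetical;
the real Ceph Throttle and policy types are more involved):

    #include <cstdint>
    #include <mutex>
    #include <condition_variable>

    class SimpleThrottle {
        uint64_t max_, cur_ = 0;
        std::mutex m_;
        std::condition_variable cv_;
    public:
        explicit SimpleThrottle(uint64_t max) : max_(max) {}
        void get(uint64_t c) {                 // block until room for c units
            std::unique_lock<std::mutex> l(m_);
            cv_.wait(l, [&]{ return cur_ + c <= max_; });
            cur_ += c;
        }
        void put(uint64_t c) {
            std::lock_guard<std::mutex> l(m_);
            cur_ -= c;
            cv_.notify_all();
        }
    };

    // One throttle for payload bytes, one for message count: many small
    // requests now hit the count limit even though they use little memory.
    SimpleThrottle byte_throttle(100 << 20);   // e.g. 100 MB of payload
    SimpleThrottle msg_throttle(100);          // e.g. 100 messages in flight

    void on_message_received(uint64_t payload_bytes) {
        msg_throttle.get(1);
        byte_throttle.get(payload_bytes);
        // ... dispatch; release both once the message is processed.
    }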
Signed-off-by: Sage Weil <sage@inktank.com>
If we write to an interval that didn't previously exist and then discard
it so that it again doesn't exist, all during the same interval, then we
should not include it in the 'written' set (or exists set, obviously).
Similarly, when we go to look at a merged diff, we can ignore extents
that were written (and possibly zeroed) if they neither existed before nor
after.
Bump up the iteration count to get more confidence that this is actually
correct.
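A small sketch of the rule, with hypothetical per-extent flags standing
in for the test's bookkeeping:

    struct ExtentState {
        bool existed_before = false;  // existed at the start of the interval
        bool written        = false;  // written during the interval
        bool exists_after   = false;  // still exists at the end
    };

    bool expect_in_written_set(const ExtentState& e) {
        // Written and then discarded within one interval, existing neither
        // before nor after: exclude it from the expected 'written' set.
        if (e.written && !e.existed_before && !e.exists_after)
            return false;
        return e.written;
    }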
Signed-off-by: Sage Weil <sage@inktank.com>
Fixes: #4600
The object marker should be treated as an object, so that the name is formatted
correctly when getting the raw oid.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
FileStore::header_t::start_seq now encodes the op seq which may be
written at FileStore::header_t::start. This way, FileStore::open()
can pass a valid sequence number to read_entry for validation.
Otherwise, read_entry has no way of knowing whether a failure of a
read at header.start indicates that the journal was empty, or that
the entry is corrupt. With start_seq, read_entry can assume
corruption if start_seq <= committed_up_to.
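A minimal sketch of the distinction start_seq makes possible (the names
mirror the description above, not the real FileJournal code):

    #include <cstdint>

    enum class ReadFailure { JournalEmpty, Corrupt };

    ReadFailure classify_read_failure(uint64_t start_seq,
                                      uint64_t committed_up_to) {
        // An entry with seq <= committed_up_to was already reported committed,
        // so it must be readable; a failed read there means corruption.
        if (start_seq <= committed_up_to)
            return ReadFailure::Corrupt;
        // Otherwise the entry at header.start was never completed: the
        // journal is simply empty past the committed point.
        return ReadFailure::JournalEmpty;
    }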
Fixes: #4527
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Move 50-rbd.rules into the ceph base package since the related
ceph-rbdnamer binary is part of this package. Use correct install
pattern.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>