We don't want to add to the throttler if we aren't going to queue the
write, or else we'll never take it off again.
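
A minimal sketch of the ordering described above, assuming a toy byte
throttle and writer. ToyThrottle, ToyWriter, and the 'writeable' flag are
hypothetical names for illustration, not the actual Ceph classes.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for a byte throttler (not Ceph's Throttle class).
struct ToyThrottle {
  uint64_t current = 0;
  void take(uint64_t n) { current += n; }
  void put(uint64_t n) { assert(current >= n); current -= n; }
};

struct ToyWriter {
  ToyThrottle throttle;
  std::deque<uint64_t> queue;  // sizes of the queued writes
  bool writeable = true;       // hypothetical "will we queue?" condition

  // Returns true if the write was queued.
  bool submit(uint64_t bytes) {
    if (!writeable)
      return false;            // not queued, so nothing is charged
    throttle.take(bytes);      // charge only once we know we will queue
    queue.push_back(bytes);
    return true;
  }

  // Completion path: the only place a charge is released.
  void complete_one() {
    assert(!queue.empty());
    throttle.put(queue.front());
    queue.pop_front();
  }
};
```

Because the charge is released only on completion, a write that is
rejected before queuing must never have been charged in the first place.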
Signed-off-by: Sage Weil <sage@newdream.net>
This should only occur with the root inode, but caused a segfault for
anybody running more than one MDS who restarted.
Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
Accomplish this by making a list of cap releases in the (permanent)
MetaRequest, and then copying that into the (potentially-temporary)
MClientRequest.
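
A rough sketch of the copy step, using toy types. ToyMetaRequest,
ToyMessage, ToyCapRelease, and build_message() are hypothetical names; the
real MetaRequest and MClientRequest carry far more state.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical cap release record.
struct ToyCapRelease {
  uint64_t ino;
  uint32_t caps;
};

// Stand-in for the (potentially temporary) wire message.
struct ToyMessage {
  std::vector<ToyCapRelease> releases;
};

// Stand-in for the (permanent) request that outlives any one message.
struct ToyMetaRequest {
  std::vector<ToyCapRelease> cap_releases;

  // Each message gets its own copy, so a resend still carries the releases.
  ToyMessage build_message() const {
    ToyMessage m;
    m.releases = cap_releases;
    return m;
  }
};
```

Keeping the list on the long-lived request means a message can be dropped
and rebuilt without losing the releases.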
We need to keep the halt_delivery plug set on failure/shutdown in order to
prevent a racing reader from queuing new messages. The only time we clear
it is when we discard messages due to a session reset.
Signed-off-by: Sage Weil <sage@newdream.net>
We can't modify 'sd' or (more importantly) close sd while any other thread
might be using it, or else we might race with an open and they might end
up using someone else's fd.
Take care to _only_ close(sd) in connect(), when the reader thread is
stopped, or when reaping the connection.
Signed-off-by: Sage Weil <sage@newdream.net>
We want to make sure the pipe's queue item doesn't go away.
Also, make queue_received() require pipe_lock to be held. This avoids some
useless unlocking/locking, since (in the case where the pipe is already
queued) we then don't need to drop the pipe_lock at all.
Signed-off-by: Sage Weil <sage@newdream.net>
Put the readdir results (list of snapshots) in the right place in the
hierarchy; we were putting them in the parent dir (as if they were real
directories).
This bug manifested itself as a snaptest-2.sh failure.
Signed-off-by: Sage Weil <sage@newdream.net>
The client is allowed to not send a snapflush if there is no dirty metadata
to write for a given snap. However, the mds can only look up inodes by
the last snapid in the interval. So, when doing a null_snapflush (filling
in for snapflushes the client didn't send), we have to walk forward through
intervening snaps until we find the right inode.
Note that this means we will call _do_snap_update multiple times on the
same inode, but with different snapids.
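
The forward walk can be sketched as follows, assuming a toy index keyed by
each inode's last snapid. ToyInode, InodeIndex, and find_for_snap() are
hypothetical names, not the MDS code.

```cpp
#include <cstdint>
#include <map>
#include <set>

typedef uint64_t snapid_t;

struct ToyInode {
  snapid_t first;  // first snapid covered by this inode
  snapid_t last;   // last snapid covered; the only lookup key
};

// Inodes can be found only by the last snapid of their interval.
typedef std::map<snapid_t, ToyInode> InodeIndex;

// Walk forward through the existing snaps, starting at 'snapid', until an
// exact lookup by last snapid hits an inode whose interval covers 'snapid'.
// Returns nullptr if no such inode exists.
const ToyInode* find_for_snap(const InodeIndex& by_last,
                              const std::set<snapid_t>& snaps,
                              snapid_t snapid) {
  for (std::set<snapid_t>::const_iterator s = snaps.lower_bound(snapid);
       s != snaps.end(); ++s) {
    InodeIndex::const_iterator in = by_last.find(*s);
    if (in == by_last.end())
      continue;                  // no inode ends at this snap; keep walking
    if (in->second.first <= snapid)
      return &in->second;        // [first, last] covers snapid
    return nullptr;              // the next inode starts after snapid
  }
  return nullptr;
}
```

With snaps {2, 4} and one inode covering [1, 4], lookups for snap 2 and
snap 4 both resolve to that inode, which is why the update can run more
than once on the same inode with different snapids.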
Add unit test to check this.
Signed-off-by: Sage Weil <sage@newdream.net>
The take_op_budget() may drop our lock if we are in keep_balanced_budget
mode, so we need to do that _before_ we take references to internal state
that may change out from under us during that time.
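
A single-threaded sketch of why the order matters, assuming a toy objecter
in which a callback simulates other threads running while the lock is
dropped. ToyObjecter, osd_addr, while_unlocked, and submit() are
hypothetical names.

```cpp
#include <functional>
#include <map>
#include <string>

struct ToyObjecter {
  std::map<int, std::string> osd_addr;   // hypothetical internal state
  std::function<void()> while_unlocked;  // simulates other threads' work

  // May "drop the lock": anything can change while this runs.
  void take_budget() {
    if (while_unlocked)
      while_unlocked();
  }

  // Returns the target address, or "" if the target is gone.
  std::string submit(int osd) {
    take_budget();                       // first: anything that may drop the lock
    std::map<int, std::string>::const_iterator it = osd_addr.find(osd);
    if (it == osd_addr.end())            // then: look at internal state
      return "";
    return it->second;
  }
};
```

Had submit() looked up the target before calling take_budget(), it would
be holding an iterator into a map that the callback may have modified.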
This fixes a crash like:
./osd/OSDMap.h: In function 'entity_inst_t OSDMap::get_inst(int)':
./osd/OSDMap.h:460: FAILED assert(exists(osd) && is_up(osd))
ceph version 0.22.1 (commit:c6f403a6f441184956e00659ce713eaee7014279)
1: (Objecter::op_submit(Objecter::Op*)+0x6c2) [0x38658854c2]
2: /usr/lib64/librados.so.1() [0x3865855dc9]
3: (RadosClient::aio_write(RadosClient::PoolCtx&, object_t, long,
ceph::buffer::list const&, unsigned long,
RadosClient::AioCompletion*)+0x24b) [0x386585724b]
4: (rados_aio_write()+0x9a) [0x386585741a]
5: /usr/bin/qemu-kvm() [0x45a305]
6: /usr/bin/qemu-kvm() [0x45a430]
7: /usr/bin/qemu-kvm() [0x43bb73]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (ABRT) ***
ceph version 0.22.1 (commit:c6f403a6f441184956e00659ce713eaee7014279)
1: (sigabrt_handler(int)+0x91) [0x3865922b91]
2: /lib64/libc.so.6() [0x3c0c032a30]
3: (gsignal()+0x35) [0x3c0c0329b5]
4: (abort()+0x175) [0x3c0c034195]
5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3c110beaad]
Signed-off-by: Sage Weil <sage@newdream.net>
dpkg-buildpackage will autodetect the dependency. Except on lenny, where
it doesn't exist and we don't use it!
Signed-off-by: Sage Weil <sage@newdream.net>
This replaces the old --shadow option, which didn't work.
It starts up the MDS daemon, then replays the journal for
another MDS, and then shuts down.
Also minimally modifies the MDSMonitor to enable this
behavior, since it requires shared state.
This realigns the code with the kernel version, fixing a number of
problems when you have multiple MDSs returning info on the same inode.
Signed-off-by: Sage Weil <sage@newdream.net>
When we're renaming across nodes, we need to freeze the inode. This
requires that we allow for the auth_pins that _we_ hold, which include
one because of the linklock xlock, and one by the MDRequest.
Signed-off-by: Sage Weil <sage@newdream.net>
Yay, we don't need it!
If we can't update the frag on scatter, fine. The staleness of the frag
is implicit in the frag's scatter stat version not matching the inode's.
If/when we do want to update it, the frag will clearly be writable, and
we can bring it back in sync then.
Signed-off-by: Sage Weil <sage@newdream.net>
The rdlock_path_xlock_dentry helper works for _auth_ dentries that we
create locally in an auth dirfrag. For the srcdn, we need to discover an
_existing_ dentry that is not necessarily auth.
Call path_traverse ourselves, but be careful to take the appropriate locks
on the resulting dn, dir, and ancestors.
Signed-off-by: Sage Weil <sage@newdream.net>
scatter_writebehind() takes a wrlock, but that may still allow the lock
to complete a gather to LOCK and even move to, say, MIX before the data is
committed. Bad news!
Signed-off-by: Sage Weil <sage@newdream.net>
is_stale() => next MIX is MIX_STALE. Stale flag is then cleared. Then we
special case the import to preserve stale-ness.
TODO: add_replica_inode likely has this same problem.
Signed-off-by: Sage Weil <sage@newdream.net>
Our new invariant is that MIX_STALE always implies is_stale(). And on
import, if is_stale(), MIX becomes MIX_STALE. This ensures that a replica
that we put into MIX_STALE doesn't turn back into MIX if we import it
and take the auth's state in CInode::decode_import().
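
A minimal sketch of the invariant and the import rule, using a toy lock.
ToyState, ToyScatterLock, and import_state() are hypothetical names, and the
state set is reduced to what the example needs.

```cpp
#include <cassert>

enum class ToyState { Sync, Lock, Mix, MixStale };

struct ToyScatterLock {
  ToyState state = ToyState::Sync;
  bool stale = false;

  bool is_stale() const { return stale; }

  // Invariant: MixStale always implies is_stale().
  void check_invariant() const {
    assert(state != ToyState::MixStale || is_stale());
  }

  // Take the auth's state on import, but keep a stale lock stale.
  void import_state(ToyState auth_state) {
    state = auth_state;
    if (is_stale() && state == ToyState::Mix)
      state = ToyState::MixStale;
    check_invariant();
  }
};
```

A replica already in the stale state therefore stays there even when the
imported auth state says plain MIX.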
Signed-off-by: Sage Weil <sage@newdream.net>