Yay, we don't need it!
If we can't update the frag on scatter, fine. The staleness of the frag
is implicit in the frag's scatter stat version not matching the inode's.
If/when we do want to update it, the frag will clearly be writable, and
we can bring it back in sync then.
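Here is a minimal C++ sketch of the version check. InodeScatterStat, FragScatterStat, frag_is_stale() and resync_frag() are hypothetical stand-ins that model only the versions. They are not the real CInode/CDir code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins: only the version bookkeeping of an
// inode's scatter stat and one dirfrag's copy of it is modeled.
struct InodeScatterStat {
  uint64_t version = 0;  // bumped whenever the inode's scatter stat changes
};

struct FragScatterStat {
  uint64_t version = 0;  // inode version this frag was last synced to
};

// No explicit stale flag: a frag is stale exactly when its version
// does not match the inode's.
bool frag_is_stale(const FragScatterStat& frag, const InodeScatterStat& inode) {
  return frag.version != inode.version;
}

// Called once the frag is writable again; brings it back in sync.
void resync_frag(FragScatterStat& frag, const InodeScatterStat& inode) {
  frag.version = inode.version;
}
```

Bumping the inode's version leaves the frag stale until resync_frag() runs. Nothing has to be written to the frag at scatter time.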
Signed-off-by: Sage Weil <sage@newdream.net>
The rdlock_path_xlock_dentry helper works for _auth_ dentries that we
create locally in an auth dirfrag. For the srcdn, we need to discover an
_existing_ dentry that is not necessarily auth.
Call path_traverse ourselves, but be careful to take the appropriate locks
on the resulting dn, dir, and ancestors.
Signed-off-by: Sage Weil <sage@newdream.net>
scatter_writebehind() takes a wrlock, but that may still allow the lock
to complete a gather to LOCK and even move on to, say, MIX before the data
is committed. Bad news!
Signed-off-by: Sage Weil <sage@newdream.net>
If is_stale(), the next MIX becomes MIX_STALE, and the stale flag is then
cleared. Then we special-case the import to preserve staleness.
TODO: add_replica_inode likely has this same problem.
Signed-off-by: Sage Weil <sage@newdream.net>
Our new invariant is that MIX_STALE always implies is_stale(). And on
import, if is_stale(), MIX becomes MIX_STALE. This ensures that a replica
that we put into MIX_STALE doesn't turn back into MIX if we import it
and take the auth's state in CInode::decode_import().
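Below is a minimal C++ sketch of that invariant. It assumes a simplified replica lock that holds only a state and a stale flag. ReplicaLock, LockState and set_mix_stale() are hypothetical names. Only decode_import() mirrors a name from the text, and its body here is an illustration, not the real CInode::decode_import().

```cpp
#include <cassert>

enum class LockState { LOCK, MIX, MIX_STALE };

// Hypothetical, simplified scatterlock replica; not Ceph's ScatterLock.
struct ReplicaLock {
  LockState state = LockState::LOCK;
  bool stale = false;

  bool is_stale() const { return stale; }

  // Putting a replica into MIX_STALE also sets the flag, so that
  // MIX_STALE always implies is_stale().
  void set_mix_stale() {
    state = LockState::MIX_STALE;
    stale = true;
  }

  // On import we take the auth's state, but a stale lock must not
  // turn back into plain MIX.
  void decode_import(LockState auth_state) {
    state = auth_state;
    if (state == LockState::MIX && is_stale())
      state = LockState::MIX_STALE;
  }
};
```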
Signed-off-by: Sage Weil <sage@newdream.net>
Previously I changed the std::multimap decoder to minimize the number of
constructor invocations. However, it could be much more expensive to
copy an initialized (decoded) val_t than to copy an empty one; this is the
case, for example, when decoding a std::multimap<int, std::set<int> >. So
change the code to insert a non-decoded val_t again.
However, this still saves two constructor invocations over the original.
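Here is a toy C++ sketch of the insertion pattern. It uses a hypothetical Cursor type over a flat vector of ints in place of Ceph's real buffer iterator. It inserts an empty value first, then decodes into the element the multimap already holds.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Toy stand-in for a buffer iterator: a flat sequence of ints.
// Hypothetical; only meant to show the insertion pattern.
struct Cursor {
  const std::vector<int>& buf;
  std::size_t pos = 0;
  int next() { return buf.at(pos++); }
};

void decode(int& v, Cursor& c) { v = c.next(); }

void decode(std::set<int>& s, Cursor& c) {
  s.clear();
  int n = c.next();  // element count, then the elements
  for (int i = 0; i < n; ++i)
    s.insert(c.next());
}

template <typename K, typename V>
void decode(std::multimap<K, V>& m, Cursor& c) {
  m.clear();
  int n = c.next();  // entry count
  for (int i = 0; i < n; ++i) {
    K key;
    decode(key, c);
    // Insert an empty V (cheap to copy), then decode straight into the
    // element stored in the map. Decoding into a temporary V first would
    // mean copying a fully populated value, e.g. a whole std::set<int>.
    auto it = m.insert(std::make_pair(key, V()));
    decode(it->second, c);
  }
}
```

Take the input {2, 1, 2, 10, 20, 1, 1, 30}, which holds two entries that both have key 1. The map ends up with {10, 20} and {30} under key 1, and only empty sets are ever copied.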
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
In autogen.sh, check that the pkg-config program is installed, to avoid
confusing errors later in the configure process.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Separate behavior into two dimensions: whether or not we are updating
the dirfrag, and whether or not the dirfrag is stale.
Change the various helpers to NOT implicitly update accounted_*, as the
caller doesn't always want that, notably when we are non-stale but frozen.
Signed-off-by: Sage Weil <sage@newdream.net>
If we're in the MIX state, we clearly can't touch this without screwing up
the delicate scatter/gather behavior. If we're in, say, LOCK, there is
still no reason to update it. If we are in this code, at least one frag is
local and auth, but there may be other frags on other nodes; updating it
would just make them appear stale when they are not.
Signed-off-by: Sage Weil <sage@newdream.net>
When the lock scatters, if we don't have a frozen auth frag, we go into
the MIX state. Later, we may import a stale dirfrag. We need to move to
MIX_STALE at that point, and/or mark the lock stale so that any subsequent
transition into MIX goes to MIX_STALE instead.
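The following C++ sketch shows the import-time handling. ScatterLock here is a hypothetical two-field struct, and on_import_dirfrag() is an illustrative hook, not an actual MDS function. It does both halves of the "and/or".

```cpp
#include <cassert>

enum class LockState { LOCK, MIX, MIX_STALE };

// Hypothetical, simplified scatterlock with just a state and a stale flag.
struct ScatterLock {
  LockState state = LockState::MIX;
  bool stale = false;
};

// Illustrative hook, called when a dirfrag is imported.
void on_import_dirfrag(ScatterLock& lock, bool frag_is_stale) {
  if (!frag_is_stale)
    return;
  // Remember staleness so a later transition (handled elsewhere) picks it up.
  lock.stale = true;
  // Already in MIX: move to MIX_STALE right away.
  if (lock.state == LockState::MIX)
    lock.state = LockState::MIX_STALE;
}
```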
Signed-off-by: Sage Weil <sage@newdream.net>
We only do the assimilate_dirty_rstat_inodes step if we do an update AND
the frag rstat was non-stale, but the bottom half (_finish) can't tell
whether we did it, because by then the top half has updated the fragstat
version. Use a flag to indicate we've updated the dirfrag so the bottom
half only runs when needed.
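Here is a minimal C++ sketch of passing the top half's decision to the bottom half in a flag. ScatterUpdate, top_half() and bottom_half() are hypothetical names. The real code paths are the ones named above.

```cpp
#include <cassert>

// Hypothetical, simplified state handed from the top half of a scatter
// update to its bottom half (_finish).
struct ScatterUpdate {
  bool updated_dirfrag = false;  // true only if the top half updated it
};

// Top half: update the dirfrag only if asked to AND its rstat was not
// stale, and record the decision. The bottom half cannot re-derive it,
// because by then the fragstat version has already been bumped.
void top_half(ScatterUpdate& u, bool want_update, bool rstat_stale) {
  u.updated_dirfrag = want_update && !rstat_stale;
}

// Bottom half: returns true if it ran the follow-up assimilate work.
bool bottom_half(const ScatterUpdate& u) {
  if (!u.updated_dirfrag)
    return false;  // the top half skipped the update; nothing to finish
  // ... finish the assimilate step here ...
  return true;
}
```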
Signed-off-by: Sage Weil <sage@newdream.net>
We need to pass the inode's rstat version into finish_scatter_update, not
the shadowed local variable. Otherwise we don't update the dirfrag when
we should.
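This reduced C++ example shows the same kind of bug, and every name in it is hypothetical. A local variable shadows the member that holds the inode's rstat version, so the version comparison gets the wrong value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the version check: the dirfrag needs updating
// when its version differs from the inode's rstat version.
bool needs_dirfrag_update(uint64_t frag_version, uint64_t inode_rstat_version) {
  return frag_version != inode_rstat_version;
}

struct InodeSketch {
  uint64_t rstat_version = 0;

  // BUG: the local variable shadows the member, so the frag's own version
  // is compared with itself and the update is never done.
  bool check_buggy(uint64_t frag_version) const {
    uint64_t rstat_version = frag_version;
    return needs_dirfrag_update(frag_version, rstat_version);
  }

  // Fixed: pass the inode's rstat version explicitly.
  bool check_fixed(uint64_t frag_version) const {
    return needs_dirfrag_update(frag_version, this->rstat_version);
  }
};
```

GCC's -Wshadow warns when a local variable shadows a class member, so it can catch this pattern at compile time.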
Signed-off-by: Sage Weil <sage@newdream.net>
CInode::make_path_string: don't coerce the inode number to 32 bits.
Everyone else is treating it as 64 bits; this function should too.
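This small C++ example shows what the 32-bit coercion loses. The helper names and the hex formatting are assumptions made for the sketch. They are not necessarily what make_path_string actually prints.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helpers; the hex format is an assumption of this sketch.
std::string ino_str_truncated(uint64_t ino) {
  char buf[32];
  // BUG: coerces the inode number to 32 bits, dropping the high bits.
  std::snprintf(buf, sizeof(buf), "%" PRIx32, static_cast<uint32_t>(ino));
  return buf;
}

std::string ino_str(uint64_t ino) {
  char buf[32];
  // Keep all 64 bits.
  std::snprintf(buf, sizeof(buf), "%" PRIx64, ino);
  return buf;
}
```

Take an inode number such as 0x10000000001. The truncated form prints just "1", while the 64-bit form prints "10000000001".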
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
If we encounter a bad event in the journal, dump it to the log.
Optionally skip it, if 'mds log skip corrupt events = true'.
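Below is a minimal C++ sketch of the "dump, then optionally skip" policy. RawEvent and replay() are hypothetical, and the skip_corrupt parameter models the 'mds log skip corrupt events' option. Stopping with an exception when skipping is disabled is also an assumption.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical journal entry: raw payload plus whether it decoded cleanly.
struct RawEvent {
  std::string payload;
  bool decodes;
};

// Replays the journal and returns how many events were applied. A corrupt
// event is always dumped to the log; it is skipped only when skip_corrupt
// is set. Otherwise replay stops with an error (an assumption here).
int replay(const std::vector<RawEvent>& journal, bool skip_corrupt,
           std::ostream& log) {
  int applied = 0;
  for (const RawEvent& ev : journal) {
    if (!ev.decodes) {
      log << "bad journal event: " << ev.payload << '\n';
      if (!skip_corrupt)
        throw std::runtime_error("corrupt journal event");
      continue;  // skip it and keep replaying
    }
    ++applied;  // a real replay would apply the event here
  }
  return applied;
}
```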
Signed-off-by: Sage Weil <sage@newdream.net>
This is extremely important, and it forces the MDS to get the osdmap that
includes the blacklist entry for its predecessor. This in turn means that
any OSD we contact trying to read the journal will be forced to get that
osdmap (or newer) before handling our read request, which means that
anything we read cannot be overwritten by a racing request from our
predecessor. This prevents two MDSs writing to the journal at the same
time.
This change fixes potential (and observed!) journal corruption.
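This toy C++ model illustrates the ordering argument, and every name in it is hypothetical. An OSD serves a request only once its osdmap epoch has caught up to the epoch the request carries. The newer map blacklists the predecessor.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

using epoch_t = uint32_t;

// Hypothetical, simplified OSD that only tracks its osdmap epoch and the
// clients blacklisted as of that map.
struct OsdSketch {
  epoch_t map_epoch = 0;
  std::set<std::string> blacklist;

  // A request tagged with a newer epoch than ours must wait until we have
  // caught up to that map; false here means "not yet".
  bool can_handle(epoch_t request_epoch) const {
    return map_epoch >= request_epoch;
  }

  // Receive a newer osdmap that blacklists `client`.
  void apply_map(epoch_t epoch, const std::string& client) {
    map_epoch = epoch;
    blacklist.insert(client);
  }

  // Writes from a blacklisted client are rejected.
  bool accepts_write_from(const std::string& client) const {
    return blacklist.count(client) == 0;
  }
};
```

Suppose the new MDS tags its journal reads with the epoch that contains the blacklist entry (10 in this model). The OSD cannot serve those reads until it also rejects the predecessor's writes. So nothing the MDS reads can later be overwritten by that predecessor.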
Signed-off-by: Sage Weil <sage@newdream.net>