coll_t is now a string. META_COLL and TEMP_COLL are just constants now.
Now there is a constructor that takes pgid_t and snapid_t, rather than
factory methods. It's clear what that constructor does, so wrapping it
in factory methods is unnecessary.
Bump coll_t serialization version to 3. Implement decoding for the old
versions.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
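A minimal sketch of the coll_t change, assuming simplified integer stand-ins for pgid_t and snapid_t and an assumed "<pgid>_<snap>" hex naming scheme. The real name format and the "meta"/"temp" strings are illustrative only. Decoding of the older serialization versions is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-ins; the real pgid_t and snapid_t are richer types.
using pgid_t = std::uint64_t;
using snapid_t = std::uint64_t;

class coll_t {
public:
  explicit coll_t(std::string s = std::string()) : str(std::move(s)) {}

  // Name a PG collection directly from its pgid and snapid, with no
  // factory method. The "<pgid>_<snap>" format is an assumption.
  coll_t(pgid_t pgid, snapid_t snap) {
    std::ostringstream oss;
    oss << std::hex << pgid << '_' << snap;  // both fields printed in hex
    str = oss.str();
  }

  const std::string& to_str() const { return str; }
  bool operator==(const coll_t& o) const { return str == o.str; }

private:
  std::string str;  // the collection is now just a string
};

// Special collections are plain constants (names assumed).
const coll_t META_COLL("meta");
const coll_t TEMP_COLL("temp");
```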
This change makes interval_set::m and interval_set::_size private data
members of interval_set instead of public ones. It also adds a
non-const iterator that lets users modify the length of an interval.
Now all users can go through the iterators rather than touching the
class internals directly.
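A minimal sketch of the idea, assuming intervals are stored as a private start -> length map. The names get_start(), get_len() and set_len() are hypothetical, not necessarily the real interval_set API. The point is that the non-const iterator can change an interval's length while keeping the private _size total consistent.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

template <typename T>
class interval_set {
public:
  class iterator {
  public:
    iterator(typename std::map<T, T>::iterator i, interval_set* s)
      : it(i), owner(s) {}
    T get_start() const { return it->first; }
    T get_len() const { return it->second; }
    // Change the current interval's length and keep _size in sync.
    // The caller must keep intervals disjoint.
    void set_len(T len) {
      owner->_size -= it->second;
      it->second = len;
      owner->_size += len;
    }
    iterator& operator++() { ++it; return *this; }
    bool operator!=(const iterator& o) const { return it != o.it; }
  private:
    typename std::map<T, T>::iterator it;
    interval_set* owner;  // needed to update the private _size
  };

  iterator begin() { return iterator(m.begin(), this); }
  iterator end() { return iterator(m.end(), this); }

  // Insert a disjoint interval [start, start+len); merging is omitted.
  void insert(T start, T len) {
    assert(m.count(start) == 0);
    m[start] = len;
    _size += len;
  }
  T size() const { return _size; }

private:
  std::map<T, T> m;  // start -> length, no longer public
  T _size = 0;       // total length of all intervals
};
```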
We were properly falling out of the while loop when we reached end(), but
not checking for it in the following if-else. Now we do!
Reported-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
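The end() fix follows a common pattern, sketched here with a hypothetical lookup() over a std::map rather than the actual code that was patched: the while loop may stop because it hit end(), so the if-else after it must test for end() before dereferencing.

```cpp
#include <cassert>
#include <map>
#include <string>

// Find the first entry with key >= `key` (hypothetical helper).
std::string lookup(const std::map<int, std::string>& m, int key) {
  auto p = m.begin();
  while (p != m.end() && p->first < key)
    ++p;
  // The loop may have exited because p reached end(); check for that first.
  if (p == m.end())
    return "none";
  else if (p->first == key)
    return "exact: " + p->second;
  else
    return "next: " + p->second;
}
```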
This avoids stalling out peering, because the peer just responds with
another 'empty' PG::Info (which we already have).
Signed-off-by: Sage Weil <sage@newdream.net>
If during recovery we are unable to pull from a replica due to reaching
EOF (e.g., zeroed out object), pull from the next available replica (if
any).
Eventually this should be extended to do the same when a checksum fails.
Signed-off-by: Sage Weil <sage@newdream.net>
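A minimal sketch of the replica fallback, assuming a hypothetical replica list where each entry records what a pull from it would return. The pull_result and choose_pull_source() names are illustrative. Replicas that hit EOF are skipped and the next one is tried. If none can supply the object, the caller gets nothing.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical outcome of pulling an object from one replica.
enum class pull_result { ok, eof };

struct replica {
  int osd;
  pull_result result;  // what a pull from this replica returns
};

// Try replicas in order, skipping any that reach EOF (e.g. a zeroed object).
std::optional<int> choose_pull_source(const std::vector<replica>& replicas) {
  for (const auto& r : replicas) {
    if (r.result == pull_result::ok)
      return r.osd;
    // EOF: fall through to the next available replica, if any.
  }
  return std::nullopt;  // no replica could supply the object
}
```

A checksum failure could later be treated like pull_result::eof here.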
The setup-chroot.sh script is very handy for building the server in a
chroot environment. I thought I would share it here in case anyone else
finds it useful.
This really shouldn't happen (!), but if it does, at least avoid getting
the primary state out of sync with the replicas.
Signed-off-by: Sage Weil <sage@newdream.net>
If we already auth_pinned, we're past the gates; don't stop on freezable.
This screws up xlock: the lock moves to PREXLOCK state, but the request
that would normally xlock it gets deferred because of a racing freezing
of the tree. Then the PREXLOCK gather kicks in and badness happens.
Signed-off-by: Sage Weil <sage@newdream.net>
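A minimal sketch of the rule, assuming a simplified model where "freezable" is reduced to a single freezing flag on the directory. The struct and function names are hypothetical. A request that already holds an auth_pin is past the gates and proceeds, so it can finish the xlock that PREXLOCK was prepared for. New requests still yield to the freeze.

```cpp
#include <cassert>

// Hypothetical, simplified state.
struct dir_state { bool freezing; };
struct request { bool auth_pinned; };

// Returns true if the request should wait for the freeze to complete.
bool must_wait_for_freeze(const request& req, const dir_state& dir) {
  if (req.auth_pinned)
    return false;       // already past the gates: do not stop on freezing
  return dir.freezing;  // requests without an auth_pin still defer
}
```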
This makes the interface a bit more adaptable for a situation where it has
a simple string representation instead of the strict structure it has now.
Eventually this function can simply attempt a pg_t parse.
Signed-off-by: Sage Weil <sage@newdream.net>
We can't error out if we don't get everything we want in one go now that
we support pushing objects in pieces. Remove this check entirely, since
we don't have a sensible way to handle that error anyway.
We need to preserve the order of processing of cap release and writeback
messages across handle_client_caps() and process_request_cap_release().
Use a helper with the appropriate condition, and defer the release
processing as needed.
Signed-off-by: Sage Weil <sage@newdream.net>
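A minimal sketch of the ordering idea, assuming a hypothetical per-inode model in which a cap writeback may be blocked (e.g. waiting on a lock). A helper holds the ordering condition. A release arriving while a writeback for the same inode is pending is deferred and processed only after that writeback. The class and method names are illustrative, not the MDS code.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

class cap_processor {
public:
  // A writeback that cannot complete yet.
  void start_writeback(int ino) { pending_writeback.insert(ino); }

  void finish_writeback(int ino) {
    events.push_back("writeback " + std::to_string(ino));
    pending_writeback.erase(ino);
    // Process any releases that were deferred behind this writeback.
    for (int n = deferred[ino]; n > 0; --n)
      process_release(ino);
    deferred.erase(ino);
  }

  void handle_release(int ino) {
    if (should_defer_release(ino))
      ++deferred[ino];
    else
      process_release(ino);
  }

  std::vector<std::string> events;  // processing order, for inspection

private:
  // Helper with the ordering condition (assumed).
  bool should_defer_release(int ino) const {
    return pending_writeback.count(ino) > 0;
  }
  void process_release(int ino) {
    events.push_back("release " + std::to_string(ino));
  }

  std::set<int> pending_writeback;
  std::map<int, int> deferred;  // ino -> number of deferred releases
};
```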
Note that this will let the parent nestlock 'dirty' state get out of
sync with the lock state, as the whole point of the dirty rstat lists is
that it can happen any time. It does, however, queue us up.
Move the part of predirty_journal_parents() that calls
project_rstat_inode_to_frag() into a common helper and use that.
Signed-off-by: Sage Weil <sage@newdream.net>
The importer also needs to scatter pin. This avoids scatterlock gather
races like so:
A: start exporting to B
A: freeze, scatter pin tree
C: initiate gather
A: delay reply to gather
B: reply to gather, do not include (non-auth) dirfrag
A,B: finish migration
A: reply to gather, do not include (now non-auth) dirfrag
C: gets no info about the dirfrag!
By pinning on the importer, we ensure that at least one MDS will respond
to the gather with auth dirfrag info.
Signed-off-by: Sage Weil <sage@newdream.net>
The accounted_rstat must always remain consistent with the parent dirfrag,
which in turn means it is governed by the parent's nestlock.
The rstat is protected by _this_ inode's nestlock, and is updated by
scatter_writebehind() or predirty_journal_parents().
Signed-off-by: Sage Weil <sage@newdream.net>
Be careful about when we update bounding dirfrag info during an import. If
the lock is in a MIX state, we do NOT want to update, since the inode
auth doesn't know jack (unless they are also dirfrag auth, in which case
we'll find out when we unscatter anyway).
Fixes 9d81f9d6.
This can cause the inode rstat (etc.) to get out of sync with the dirfrag
accounted_rstat when the scatterlock is not in a gathered state: the
local values get updated but those on other nodes do not, so the
inode drifts out of sync with the dirfrags.
Other callers to scatter_writebehind() are all in contexts where we have
_just_ gathered dirfrag state, or there is no remote dirfrag state to
gather.
Signed-off-by: Sage Weil <sage@newdream.net>
This is simpler (for the migrator), and wrlocks allow scatter_writebehind,
which is a no-no for a frozen tree. By pinning the frozen dir's parent
inode, we prevent any scatter or unscatter operations from implicitly
updating metadata within the frozen root dirfrag.