When the inode sucks in the dir's updated frag info, we need to journal
the dirfrag update as well. This ensures that any subsequent changes in
the dirfrag will be reflected by the difference between rstat and
accounted_rstat from the perspective of the journal contents (in case we
fail/recover during that process).
We can backlist either a specific instance (1.2.3.4:1234/5678) or an
entire IP, in which case the table has something like "1.2.3.4:0/0" (a port
and nonce of 0).
We would send an incremental for anything >1, or the latest map, but not
osdmap e1 itself. Fix the condition, and make send_incremental() smart
about starting with the full map at 1 as needed.
If the client reconnects, the journal 'close' replay doesn't remove the
session, which leaves the session state intact. It needs to reset it in
that case, or else we get problems if the session is reopened and the
state doesn't match up.
Reported-by: Nat N <phenisha@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Reimplement core readdir (readdir_r_cb), using kclient as a template.
Reimplement all other readdir variants in terms of readdir_r_cb.
Main change is support for frag chunking, and hopefully lots of subtle bugs
that have been fixed in the kclient code we're based on.
Should be mix->sync(2): the same as a replica who already go the first
SYNC message and is waiting in mix->sync(2) for the final SYNC to indicate
the gather is completed.
This fixes a ABBA deadlock between
acquire_locks(): auth_pins items, then locks in order
export_dir: locks paths, then freezes.
Instead, we check for lockability (but don't lock), do the freeze, and then
try to take the locks after. If we can't do so atomically, we currently
just fail. In theory this could wait for the distributed locks, but it's
probably not worth the complexity at this stage; export_dir is currently
still opportunistic and can bail out for a variety of reasons.
The can_* fields need to be ANY or AUTH.. not REQ or XCL (at least not
without trickier checks). Otherwise we progress to the next state too
early and violate the locking rules.
If peering screws up and the primary mistakenly tries to pull an object
from us we don't have, log an error instead of crashing. This will still
throw off recovery (it will hang), but that's better than crashing
outright.