Be smarter about when we write back caps on fsync and when we
wait. Also, wait only for the caps we are flushing to be written
back, not for all caps on the inode to be clean; otherwise a steady
stream of newly dirtied caps could starve the fsync.
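A minimal sketch of the waiting rule above (the names InodeCaps, start_flush, and fsync_done are illustrative, not the real client's fields): fsync records the tid of the flush it started and waits only for that tid to be acked, rather than for the inode to have no dirty caps at all.

```python
class InodeCaps:
    """Hypothetical per-inode flush state; names are made up for this sketch."""
    def __init__(self):
        self.last_flush_tid = 0   # tid assigned to the most recent writeback
        self.last_acked_tid = 0   # highest tid the MDS has acked

    def start_flush(self):
        self.last_flush_tid += 1
        return self.last_flush_tid

    def ack(self, tid):
        self.last_acked_tid = max(self.last_acked_tid, tid)

    def fsync_done(self, want_tid):
        # wait only for the flush we started, not for all caps to be clean
        return self.last_acked_tid >= want_tid

caps = InodeCaps()
t1 = caps.start_flush()        # fsync kicks off a writeback
t2 = caps.start_flush()        # someone dirties and flushes again behind us
caps.ack(t1)
assert caps.fsync_done(t1)     # our fsync can complete now
assert not caps.fsync_done(t2) # without waiting for the later flush
```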
Silly us, we were using the same object names for multiple mds's.
That doesn't work so well.
Renamed like so:
  mdstable_snaptable   -> mds_snaptable
  mdstable_inotable    -> mds0_inotable
  sessionmap           -> mds0_sessionmap
  mdstable_anchortable -> mds_anchortable
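The scheme above can be sketched as a tiny naming helper (an assumption for illustration; the real object names are built inside the MDS code): tables shared by all ranks get a plain mds_ prefix, while per-mds tables carry the rank so two MDSs can never collide on an object name.

```python
GLOBAL_TABLES = {"snaptable", "anchortable"}   # one instance shared by all ranks
PER_MDS_TABLES = {"inotable", "sessionmap"}    # one instance per mds rank

def table_object_name(table, rank):
    """Hypothetical helper mirroring the rename above."""
    if table in GLOBAL_TABLES:
        return "mds_" + table              # e.g. mds_snaptable
    return "mds%d_%s" % (rank, table)      # e.g. mds0_inotable

assert table_object_name("snaptable", 0) == "mds_snaptable"
assert table_object_name("inotable", 0) == "mds0_inotable"
# the point of the fix: different ranks now get different object names
assert table_object_name("inotable", 0) != table_object_name("inotable", 1)
```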
To do that, we add ourselves to the rejoin_ack_gather. Otherwise
we end up in up:active before we've even finished our
parallel_fetch or finished processing our caps!
This is needed only because we identify_files_to_recover() before
sending the rejoin acks, and that may twiddle the lock state, so
we need to be in a compatible state.
Anything that's not ours should be unknown, including the root dir frag,
which normally starts out as mds0.
If we leave it as 0, then when mds0 claims a subset of /, its bounds are
left as 0 as well instead of being set to unknown, which leads to
incorrect results in the resolve stage.
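A toy sketch of the initialization rule above (UNKNOWN and initial_auth are stand-ins for the real constants and code): anything we aren't authoritative for starts as unknown, never as rank 0, so mds0's later claims can be told apart from stale defaults.

```python
UNKNOWN = -1  # stand-in for the real "auth unknown" constant

def initial_auth(frag_is_ours, whoami):
    """Hypothetical: subtree auth at resolve start."""
    return whoami if frag_is_ours else UNKNOWN

# root dir frag cached on mds1: must start unknown, not 0,
# or mds0's claim over / leaves bounds looking already-owned by 0
assert initial_auth(False, 1) == UNKNOWN
assert initial_auth(True, 1) == 1
```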
Since we require caps for all inodes in our cache, there is no need to
consider parents when identifying where to send a request. Just look at
the fragtree (for fragmented dirs) or the caps.
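The targeting rule above can be sketched like so (choose_target_mds, cap_issuer, and the dirfrag auth argument are all made-up names for illustration): with caps guaranteed for every cached inode, we never walk to a parent; either the dirfrag's auth or the inode's cap issuer already names the mds.

```python
def choose_target_mds(inode, dirfrag_auth=None):
    """Hypothetical request-targeting helper."""
    if dirfrag_auth is not None:      # fragmented dir: fragtree names the auth
        return dirfrag_auth
    return inode["cap_issuer"]        # otherwise: the mds that issued our caps

assert choose_target_mds({"cap_issuer": 2}) == 2
assert choose_target_mds({"cap_issuer": 2}, dirfrag_auth=0) == 0
```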
We want to allow pipelined cap updates, like:

  client->mds   writeback Fw    1
  dirty FwAx
  client->mds   writeback FwAx  2
  dirty Ax
  client->mds   writeback Ax    3
  mds->client   ack 1
  mds->client   ack 2
  mds->client   ack 3
We need to make sure that the Fw bit is only cleaned after ack 2,
and Ax after ack 3. A single tid for the inode isn't sufficient,
since that would e.g. ignore ack 2... we need a tid per cap bit so
we can pipeline writeback of different caps.
Note that we can't simply write back dirty | flushing caps every
time, since the write may also be releasing the cap. And it would
gum up the MDS locking.
Move the last_tid to the inode, and only pay attention to 16 bits
per cap bit; that's 17*2 bytes vs the old 16. Could be worse.
An 8-bit tid is probably also sufficient (that's 256 pipelined
writes) if we're concerned about inode size down the road.
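The per-cap-bit tid scheme can be sketched as follows (CapFlusher, writeback, and the field names are assumptions for illustration, not the real client code): each writeback stamps the cap bits it carries with a fresh tid, and an ack cleans only the bits whose recorded tid is covered by that ack, so ack 1 cannot clean an Fw that was re-flushed in writeback 2.

```python
FW, AX = 1, 2  # two example cap bits

class CapFlusher:
    def __init__(self):
        self.last_tid = 0     # per-inode tid counter
        self.flush_tid = {}   # cap bit -> tid of the latest flush carrying it

    def writeback(self, caps):
        self.last_tid += 1
        for bit in (FW, AX):
            if caps & bit:
                self.flush_tid[bit] = self.last_tid
        return self.last_tid

    def ack(self, tid):
        # clean only bits whose latest flush is covered by this ack
        for bit, t in list(self.flush_tid.items()):
            if t <= tid:
                del self.flush_tid[bit]

    def flushing(self):
        caps = 0
        for bit in self.flush_tid:
            caps |= bit
        return caps

c = CapFlusher()
t1 = c.writeback(FW)        # writeback Fw    1
t2 = c.writeback(FW | AX)   # writeback FwAx  2
t3 = c.writeback(AX)        # writeback Ax    3
c.ack(t1)                   # ack 1: Fw was re-flushed in 2, so it stays
assert c.flushing() == FW | AX
c.ack(t2)                   # ack 2: Fw clean now; Ax re-flushed in 3
assert c.flushing() == AX
c.ack(t3)                   # ack 3: all clean
assert c.flushing() == 0
```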
Call on the correct ci. Skip past inodes that are being dropped
without dropping our spinlock. Hold a ref on the prior inode until we
traverse to the next one. (We can't iput while holding our spinlock.)
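A userspace analogue of the traversal pattern above (an illustrative sketch, not the kernel code; Inode, iget, and iput are stand-ins): pin the next inode while the lock is held, release the lock, and only then drop the reference on the previous inode, since the final put can sleep and must not happen under a spinlock.

```python
import threading

class Inode:
    def __init__(self, name):
        self.name, self.refs = name, 1
    def iget(self):
        self.refs += 1
        return self
    def iput(self):
        self.refs -= 1   # in real kernel code this may free and can sleep

lock = threading.Lock()  # stands in for the spinlock
inodes = [Inode("a"), Inode("b"), Inode("c")]

prev = None
i = 0
while True:
    with lock:                      # lock held only while walking the list
        cur = inodes[i].iget() if i < len(inodes) else None
        i += 1                      # pin the next inode before unlocking
    if prev:
        prev.iput()                 # safe: lock is not held here
    if cur is None:
        break
    # ... process cur outside the lock ...
    prev = cur

assert all(ino.refs == 1 for ino in inodes)  # every pin was dropped
```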