Verify that slave requests received are not stale.
Verify that slave replies match the currently processing request.
Clean up the code a bit.
Signed-off-by: Sage Weil <sage@newdream.net>
We need to distinguish between different attempts to process a request, or
else we can get annoying races in the slave request handling code. E.g.,
- request sent to mds A
- A authpins items on B, B registers slave_request
- A forwards request to C, sends slave finish to B
- C receives request, sends authpin slave request to B
- B receives C's authpin request, discards (*)
- B receives A's finish, closes slave request
First we just add tracking of the attempt number.
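
A minimal sketch of how an attempt number could later be used to resolve
the race above. The SlaveRequest and SlaveTable types are hypothetical
stand-ins for illustration, not the actual MDS structures:

```cpp
#include <cstdint>
#include <map>

// Hypothetical message: identifies the request and which attempt sent it.
struct SlaveRequest {
  uint64_t reqid;    // identifies the client request
  uint32_t attempt;  // bumped each time the request is forwarded or retried
};

// Hypothetical per-MDS table of slave requests in progress.
class SlaveTable {
  std::map<uint64_t, uint32_t> active;  // reqid -> attempt being processed
public:
  enum class Result { Accepted, Stale, Replaced };

  // Accept an incoming slave request unless it is from an older attempt.
  Result handle(const SlaveRequest& m) {
    auto it = active.find(m.reqid);
    if (it == active.end()) {
      active[m.reqid] = m.attempt;
      return Result::Accepted;
    }
    if (m.attempt < it->second)
      return Result::Stale;          // older attempt: discard
    if (m.attempt > it->second) {
      it->second = m.attempt;        // newer attempt supersedes old state
      return Result::Replaced;
    }
    return Result::Accepted;         // same attempt: continue processing
  }

  // A finish closes the slave request only if it matches the current attempt.
  bool finish(const SlaveRequest& m) {
    auto it = active.find(m.reqid);
    if (it == active.end() || it->second != m.attempt)
      return false;
    active.erase(it);
    return true;
  }
};
```

In the sequence above, C's authpin request carries a higher attempt than
the state B registered for A, so B would no longer have to discard it, and
A's late finish would not close the newer attempt's state.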
Signed-off-by: Sage Weil <sage@newdream.net>
trim() would iterate over segments. It would take the *p segment, ++p,
then call try_expire(). But the _expired() function would also clean up
and (if possible) retire subsequent segments on the list if they were on
the expired list, invalidating the p iterator.
Untangle the mess by making expired segment trimming (i.e. removing from
segment list) a separate operation performed only by trim() (probably a
good idea anyway). This keeps the iterator safe/stable.
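
A rough sketch of the separation, using simplified stand-in types (Segment,
Journal, and the expirable flag are assumptions for illustration, not the
real MDLog code). try_expire() only records that a segment has expired;
only trim_expired(), called from trim(), removes entries from the list:

```cpp
#include <cstddef>
#include <list>
#include <set>

// Hypothetical segment: `expirable` stands in for "expiry can complete now".
struct Segment { int id; bool expirable; };

struct Journal {
  std::list<Segment> segments;  // oldest first
  std::set<int> expired;        // ids that have finished expiring

  // Records expiry only; never erases from `segments`, so iterators held
  // by the caller stay valid.
  void try_expire(const Segment& s) {
    if (s.expirable)
      expired.insert(s.id);
  }

  // The only place segments leave the list: pop expired ones off the front.
  std::size_t trim_expired() {
    std::size_t n = 0;
    while (!segments.empty() && expired.count(segments.front().id)) {
      expired.erase(segments.front().id);
      segments.pop_front();
      ++n;
    }
    return n;
  }

  std::size_t trim() {
    for (auto p = segments.begin(); p != segments.end(); ++p)
      try_expire(*p);           // safe: the list is not modified here
    return trim_expired();      // removal happens after iteration ends
  }
};
```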
Signed-off-by: Sage Weil <sage@newdream.net>
Give these a different type so they are not interpreted as subtree
boundaries during replay. Otherwise we break the truncate_finish code,
which references the truncate_start logsegment by offset. Probably other
stuff too.
Signed-off-by: Sage Weil <sage@newdream.net>
If we have an open session with an mds, we need to have an open session
with each of its export targets. The problem arises in a sequence like:
- client has old mdsmap
- mds A adds B as target in mdsmap
- send request to mds A
- A exports to B
- we get the EXPORT, but B isn't listed as a target for A in client map
- client gets updated map
At the time we receive the map we need to open the session to B. We can't
really do it when we get the EXPORT because we don't know the target MDS.
We can either track which exports are pending to do it, or just blindly
open sessions with targets for any MDSs we have caps with, which is
basically every session we have open. That's simplest for now.
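
The "blindly open sessions" option could look roughly like the sketch
below. MdsMap, Client, and handle_mds_map() here are simplified,
hypothetical stand-ins, not the real client classes:

```cpp
#include <map>
#include <set>

using mds_rank = int;

// Hypothetical map: for each MDS rank, the ranks it may export to.
struct MdsMap {
  std::map<mds_rank, std::set<mds_rank>> export_targets;
};

struct Client {
  std::set<mds_rank> sessions;  // ranks we have an open session with

  // On a new map, open a session with every export target of every MDS we
  // already talk to. Returns the ranks for which a session was opened.
  std::set<mds_rank> handle_mds_map(const MdsMap& m) {
    std::set<mds_rank> opened;
    const std::set<mds_rank> current = sessions;  // snapshot before mutating
    for (mds_rank r : current) {
      auto it = m.export_targets.find(r);
      if (it == m.export_targets.end())
        continue;
      for (mds_rank t : it->second)
        if (sessions.insert(t).second)
          opened.insert(t);
    }
    return opened;
  }
};
```

Taking a snapshot means targets of newly opened sessions are picked up on
the next map update rather than transitively in the same pass.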
Signed-off-by: Sage Weil <sage@newdream.net>
commit:dac1dc83ee5598ca97c29cd5d0b12150685cd05b added handling
for scatter_wanted, but we need to handle unscatter_wanted here too.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
We use this field to indicate we want a scatter or an unscatter. Make
that distinction explicit.
Also, clear the unscatter_wanted in simple_lock when we start a gather!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
This will generate missed deadline noise in the log that may otherwise be
missed by an infrequent heartbeat_interval. We generally want to know if
deadlines are missed, but we don't necessarily need to touch the heartbeat
file every second. This gets us both.
Signed-off-by: Sage Weil <sage@newdream.net>
Register and unregister worker threads. Periodically touch heartbeat
when idle. Set heartbeat timeout before processing a queue item.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Each thread registers and gets a private structure it can write a timeout
value to. The timeout is time_t and always fits in a single word, so no
locking is used to update it.
Anyone can call is_healthy() to find out if any timeouts have expired.
Eventually some background thread will do this.
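
A minimal sketch of the idea, with hypothetical names (HeartbeatHandle,
HeartbeatMap, add_worker, and friends are illustrative, not necessarily
the real interface). Note one deliberate difference: this sketch stores
the timeout in a std::atomic with relaxed ordering instead of relying on
word-sized writes being atomic; the mutex protects only list membership:

```cpp
#include <atomic>
#include <ctime>
#include <list>
#include <mutex>

// Per-thread private structure; 0 means "no deadline set".
struct HeartbeatHandle {
  std::atomic<std::time_t> timeout{0};
};

class HeartbeatMap {
  std::mutex lock;                     // guards registration only
  std::list<HeartbeatHandle> workers;  // list keeps handle addresses stable
public:
  HeartbeatHandle* add_worker() {
    std::lock_guard<std::mutex> l(lock);
    workers.emplace_back();
    return &workers.back();
  }

  void remove_worker(HeartbeatHandle* h) {
    std::lock_guard<std::mutex> l(lock);
    workers.remove_if([h](const HeartbeatHandle& w) { return &w == h; });
  }

  // Called by the owning thread before it starts a unit of work.
  static void reset_timeout(HeartbeatHandle* h, std::time_t now,
                            std::time_t grace) {
    h->timeout.store(now + grace, std::memory_order_relaxed);
  }

  // Called by the owning thread when it goes idle.
  static void clear_timeout(HeartbeatHandle* h) {
    h->timeout.store(0, std::memory_order_relaxed);
  }

  // True if no registered worker has passed its deadline.
  bool is_healthy(std::time_t now) {
    std::lock_guard<std::mutex> l(lock);
    for (const auto& w : workers) {
      std::time_t t = w.timeout.load(std::memory_order_relaxed);
      if (t != 0 && t < now)
        return false;
    }
    return true;
  }
};
```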
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
During resolve we may journal EImportFinish(true/false) as we resolve our
imports/exports. And as a side-effect we may journal an ESubtreeMap. We
need to properly mark ambiguous subtrees in that entry based on the
my_ambiguous_imports (resolve state), not just the migrator state (for the
active mds).
Note that the other Migrator::is_ambiguous_import() user
(send_resolve_now()) already does this correctly.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
For active MDS, pin when we add to the list, unpin when we finish
truncating.
For replay, pin when we replay a truncate start, unpin when we replay a
truncate finish. Use a nice helper for both.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
When we get the first prep, we may respond to the master with an expanded
list of witnesses for the rename before making any change (or rollback
plan). If the master fails before sending the second prep attempt, we
may end up in the abort path of _commit_slave_rename() with an empty
rollback_bl. That's fine; don't crash. We still need to unfreeze the
srci, but can skip the do_rename_rollback since we didn't actually journal
a change.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
- mds A authpins item on mds B
- mds B starts to freeze tree containing item
- mds A tries wrlock_start on A, sends REQSCATTER to B
- mds B lock is unstable, sets scatter_wanted
- mds B lock stabilizes, calls try_eval, defers because freezing.
-> deadlock
In general, we want to avoid the eval while freezing to prevent starvation.
However, in this case with the multi-mds locking, we need to honor
the scatter_wanted even so.
Insert this check in try_eval(). This will catch it on the first try_eval
call after the lock stabilizes. The ambiguous auth will never catch us
while freezing, and the master holds an auth_pin to prevent a freeze, so
we will never defer the eval; no need to do the same logic in the other
eval method (eval(MDSCacheObject*, ...)) used for retry.
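
The ordering of the checks can be sketched as a pure decision function.
The LockState and ObjState structs and the EvalAction enum are
hypothetical simplifications for illustration; the real try_eval() works
on lock and cache object classes and registers waiters instead of
returning a value:

```cpp
// Hypothetical, simplified view of the lock being evaluated.
struct LockState {
  bool scatter_wanted;  // a remote mds asked us to scatter (REQSCATTER)
};

// Hypothetical, simplified view of the object the lock belongs to.
struct ObjState {
  bool ambiguous_auth;
  bool freezing;
};

enum class EvalAction { Defer, Scatter, Eval };

// Decide what a try_eval-style call should do once the lock is stable.
EvalAction try_eval_action(const LockState& lock, const ObjState& obj) {
  if (obj.ambiguous_auth)
    return EvalAction::Defer;    // wait until auth is resolved
  if (obj.freezing) {
    if (lock.scatter_wanted)
      return EvalAction::Scatter;  // honor the request even while freezing
    return EvalAction::Defer;      // otherwise avoid starving the freeze
  }
  return EvalAction::Eval;       // normal path
}
```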
Signed-off-by: Sage Weil <sage@newdream.net>