When we take the clone branch, we update the missing map. That invalidates
our current iterator, which invites undefined behavior. Instead, increment
the iterator near the top of the loop, before any mutation, so we don't
have to worry about it.
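A minimal sketch of the pattern, with stand-in names (should_clone,
take_clone_branch) for the real MDCache logic:

    #include <map>
    #include <string>

    // Illustrative stand-ins; only the increment-before-mutate
    // pattern is the point here.
    static bool should_clone(int key) { return key % 2 == 0; }
    static void take_clone_branch(std::map<int, std::string>& m, int key) {
      m.erase(key);  // placeholder for the update that invalidates iterators
    }

    void scan_missing(std::map<int, std::string>& missing) {
      for (auto it = missing.begin(); it != missing.end(); ) {
        int key = it->first;
        ++it;  // advance near the top of the loop, before any mutation,
               // so updating the map cannot leave 'it' dangling
        if (should_clone(key))
          take_clone_branch(missing, key);  // may modify 'missing'
      }
    }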
Signed-off-by: Sage Weil <sage@newdream.net>
The dir commit/fetch and LogSegment::try_to_expire() rely on any new or
dirty items in the directory getting new versions that correspond to a
bump in the dirfrag version. This must include dentries/inodes that are
created by the cow process, or else we have problems during dir
commit/fetch or segment expire.
Change the dirty list in the Mutation to include the pv so that we can
properly mark them dirty later.
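A sketch of the shape of the change, with simplified types standing in
for CDentry and the real Mutation:

    #include <cstdint>
    #include <list>
    #include <utility>

    using version_t = uint64_t;
    struct Dentry;  // stand-in for CDentry

    // Record the projected version (pv) next to each cowed dentry so
    // it can later be marked dirty with the version that matches the
    // dirfrag bump.
    struct MutationSketch {
      std::list<std::pair<Dentry*, version_t>> dirty_cow_dentries;

      void add_cow_dentry(Dentry* dn, version_t pv) {
        dirty_cow_dentries.emplace_back(dn, pv);
      }
    };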
Leave the inode list alone. We could theoretically do the same for dirty
inodes, but this way we avoid projecting them and copying stuff around.
Any dirty cowed inode will also have a dirty dentry, so it will still
get saved regardless.
Signed-off-by: Sage Weil <sage@newdream.net>
We should only return the pdnvec for a full traverse. i.e., either a
success, or a failure in which we instantiate a null dn for the trailing
entry. This makes pdnvec well defined, and allows callers like
rdlock_path_pin_ref() to reply with a null lease when appropriate.
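A sketch of the rule under simplified types; finish_traverse and its
flags are illustrative, not the real path_traverse() interface:

    #include <vector>

    struct Dentry;  // stand-in for CDentry

    // pdnvec is only left populated for a full traverse -- success, or
    // a failure where a null dentry was instantiated for the trailing
    // entry. Anything else clears it, so callers such as
    // rdlock_path_pin_ref() can rely on its contents.
    int finish_traverse(std::vector<Dentry*>* pdnvec,
                        bool success, bool trailing_null_dn) {
      if (!success && !trailing_null_dn && pdnvec)
        pdnvec->clear();  // partial traverse: pdnvec is not well defined
      return success ? 0 : -1;
    }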
Signed-off-by: Sage Weil <sage@newdream.net>
The dentry needs a [first,last] range, and we don't know what first is
when we miss a lookup. Besides, part of the point of instantiating null
dentries is to issue leases against them, which we don't do here. The
client will cache the null result regardless.
This lets us issue the most leases/caps possible. It also ensures we can
issue caps in the snapped namespace when we are still on the head inode
(previously, releasing the rdlock twiddled the state, the client never
got, say, Frc, and hung indefinitely).
We can get into a loop during a path traverse if we miss on a large
directory and then trim the result we need before handling the original
request. To avoid this, we simply put the wanted dentry at the top of
the LRU instead of at the midpoint.
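A toy LRU illustrating the idea; the method names are hypothetical, and
only the top-vs-midpoint insertion contrast matters:

    #include <iterator>
    #include <list>

    struct Dentry;  // stand-in for CDentry

    // front = hottest, back = next to be trimmed
    struct ToyLRU {
      std::list<Dentry*> items;

      void touch_top(Dentry* dn) {   // the fix: the wanted dentry goes
        items.remove(dn);            // to the very top of the LRU
        items.push_front(dn);
      }

      void touch_mid(Dentry* dn) {   // old behavior: a midpoint insert let
        items.remove(dn);            // a large readdir push the wanted
        auto mid = items.begin();    // dentry out before the retry
        std::advance(mid, items.size() / 2);
        items.insert(mid, dn);
      }
    };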
Signed-off-by: Sage Weil <sage@newdream.net>
The previous if block didn't work because inode->size was usually
changed well before handle_cap_trunc was ever invoked, so it never did
the truncation in the objectcacher! This was okay if you just truncated
a file and then closed it, but if you wrote a file, truncated part of it
away, and then wrote past the (new) end, reads would return the
previously-truncated data out of what should have been a hole.

Now we do the actual objectcacher truncation in update_inode_file_bits,
because all truncation paths move through there, which maintains proper
ordering.
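The shape of the fix, sketched with stand-in types; update_inode_file_bits
is the function named above, the rest is illustrative:

    #include <cstdint>

    struct InodeSketch { uint64_t size = 0; };  // stand-in for Inode

    // Placeholder for the objectcacher truncate call; the real code
    // also passes the object set and layout.
    static void objectcacher_truncate(InodeSketch*, uint64_t) {}

    // Every truncation path funnels through here, so comparing old
    // vs. new size at this point catches the truncate before
    // in->size is overwritten.
    void update_inode_file_bits(InodeSketch* in, uint64_t new_size) {
      if (new_size < in->size)
        objectcacher_truncate(in, new_size);  // drop data past new_size
      in->size = new_size;
    }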
These fields are logically object attributes that should be preserved
across the clone COW process. (Not copying truncate_seq in particular
corrupts snapshot file data, depending on the order of arrival of racing
trimtrunc and writes.)
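A sketch of the attribute copy meant here, assuming a simplified object
state (the field names follow the truncate metadata mentioned above):

    #include <cstdint>

    // Simplified stand-in for the object's truncate metadata.
    struct ObjectStateSketch {
      uint64_t truncate_seq = 0;
      uint64_t truncate_size = 0;
    };

    // On clone COW, the clone must inherit the head's truncate
    // metadata; otherwise a racing trimtrunc vs. write can be applied
    // against stale state and corrupt snapshot file data.
    void copy_clone_attrs(const ObjectStateSketch& head,
                          ObjectStateSketch* clone) {
      clone->truncate_seq = head.truncate_seq;
      clone->truncate_size = head.truncate_size;
    }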
The data pool in particular has seq 0 and (initially) no removed snaps.
We must not return true for that case, or else the OSD will use an empty
pool snap context instead of the user/MDS-provided one.
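The corrected predicate, sketched with approximate field names:

    #include <cstdint>
    #include <set>

    using snapid_t = uint64_t;

    struct PoolSnapSketch {
      snapid_t seq = 0;
      std::set<snapid_t> removed_snaps;

      // A fresh data pool has seq == 0 and no removed snaps; returning
      // false there makes the OSD keep the user/MDS-provided snap
      // context instead of substituting an empty pool one.
      bool has_pool_snaps() const {
        return seq > 0 || !removed_snaps.empty();
      }
    };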
Signed-off-by: Sage Weil <sage@newdream.net>
We can blacklist either a specific instance (1.2.3.4:1234/5678) or an
entire IP, in which case the table has something like "1.2.3.4:0/0" (a
port and nonce of 0).
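A sketch of the matching logic, with a plain struct standing in for
entity_addr_t:

    #include <cstdint>
    #include <string>

    struct AddrSketch {
      std::string ip;
      uint16_t port = 0;
      uint32_t nonce = 0;
    };

    // An entry with port and nonce 0 (e.g. "1.2.3.4:0/0") blacklists
    // the whole IP; otherwise the exact instance must match.
    bool is_blacklisted(const AddrSketch& entry, const AddrSketch& who) {
      if (entry.ip != who.ip)
        return false;
      if (entry.port == 0 && entry.nonce == 0)
        return true;  // IP-wide entry
      return entry.port == who.port && entry.nonce == who.nonce;
    }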
We would send an incremental for anything >1, or the latest full map,
but never osdmap e1 itself. Fix the condition, and make
send_incremental() smart about starting with the full map at epoch 1
when needed.
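A sketch of the corrected logic; send_incremental() is the function
named above, and the map-sending helpers are placeholders:

    #include <cstdint>

    using epoch_t = uint32_t;

    static void send_full_map(epoch_t) {}  // placeholder
    static void send_inc_map(epoch_t) {}   // placeholder

    // There is no incremental that leads to e1, so when the client
    // needs epoch 1 we start with the full map and continue with
    // incrementals from there.
    void send_incremental(epoch_t first, epoch_t latest) {
      if (first == 1) {
        send_full_map(1);
        first = 2;
      }
      for (epoch_t e = first; e <= latest; ++e)
        send_inc_map(e);
    }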
If the client reconnects, the journal 'close' replay doesn't remove the
session, which leaves the session state intact. The replay needs to
reset that state in that case, or else we run into problems when the
session is reopened and the state doesn't match up.
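A loose sketch of the replay-side handling; the states and fields are
illustrative, not the real Session interface:

    // Illustrative states; the real code works on Session objects in
    // the MDS session map.
    enum class State { Open, Closed };

    struct SessionSketch {
      State state = State::Open;
      bool client_reconnected = false;

      void reset() { state = State::Closed; client_reconnected = false; }
    };

    // Journal 'close' replay: if the client reconnected, the session
    // was kept rather than removed, so its state must be reset
    // explicitly; otherwise a later reopen finds state that no longer
    // matches.
    void replay_close(SessionSketch* s) {
      if (s->client_reconnected)
        s->reset();
    }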
Reported-by: Nat N <phenisha@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>