We aren't actually projecting the inode unless destdn->is_auth(),
so check for that before projecting the snaprealm (which requires
a projected inode)!
Then on rename_apply, open the snaprealm on non-auth MDSes.
This was causing a mismatch in the projection code, since
assimilate_...finish() calls pop_and_dirty_projected_inode(), but
the first half is only called on CEPH_LOCK_INEST locks. So make them match!
The file_excl() trigger asserts mds_caps_wanted is empty, so the caller
shouldn't invoke it when that's not the case. If mds_caps_wanted is
non-empty, just go to LOCK instead.
All we're doing is picking a state to move to that will allow us to
update max_size.
Signed-off-by: Sage Weil <sage@newdream.net>
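As an illustrative sketch of the state choice described above (the enum and map types here are simplified stand-ins, not the actual Ceph lock machinery):

```cpp
#include <map>

// Move to EXCL via file_excl() only when no other MDS wants caps;
// otherwise fall back to LOCK, which also allows a max_size update.
enum LockState { LOCK, EXCL };

// mds_caps_wanted maps MDS rank -> wanted caps bitmask (names are
// illustrative, not the real Ceph types).
LockState pick_state(const std::map<int, int>& mds_caps_wanted) {
  return mds_caps_wanted.empty() ? EXCL : LOCK;
}
```

Either target state permits updating max_size; the point is only to avoid tripping the file_excl() assert.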
If we do a slave request xlock, the state is LOCK, not XLOCK. Weaken
the SimpleLock::get_xlock() assert accordingly.
Signed-off-by: Sage Weil <sage@newdream.net>
When we propagate the rstat to inode in predirty_journal_parents (because
we hold the nestlock), bump the rstat version as well. This avoids
confusing any replicas, who expect the rstat to have a new version or to
remain unchanged when the lock scatters.
Signed-off-by: Sage Weil <sage@newdream.net>
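A sketch of the replica-side expectation this bump satisfies (fields are illustrative stand-ins, not the real nest_info_t): when the lock scatters, a replica accepts an rstat only if it carries a newer version, or is unchanged at the same version.

```cpp
#include <cstdint>

// Simplified rstat: just a version and one aggregate field.
struct RStat { uint64_t version; int64_t rbytes; };

// Replica-side check: accept a newly bumped version, or an
// unchanged rstat at the same version; anything else is confusing.
bool replica_accepts(const RStat& local, const RStat& incoming) {
  if (incoming.version > local.version) return true;  // version was bumped
  return incoming.version == local.version &&
         incoming.rbytes == local.rbytes;             // unchanged
}
```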
We were starting a commit if we had started a new op, but that left a
window in which the op might still be journaled while nothing new had
been applied to disk. With this fix we only commit if committing/committed
will increase. Now the check matches the
committing_seq = applied_seq;
a few lines down, and all is well.
The actual crash this fixes was:
2010-10-07 16:20:36.245301 7f07e66d3710 filestore(/mnt/osd3) taking snap 'snap_23230'
2010-10-07 16:20:36.245428 7f07e66d3710 filestore(/mnt/osd3) snap create 'snap_23230' got -1 File exists
os/FileStore.cc: In function 'void FileStore::sync_entry()':
os/FileStore.cc:1738: FAILED assert(r == 0)
ceph version 0.22~rc (1d77c14bc310aed31d6cfeb2c87e87187d3527ea)
1: (FileStore::sync_entry()+0x6ee) [0x793148]
2: (FileStore::SyncThread::entry()+0x19) [0x761d43]
3: (Thread::_entry_func(void*)+0x20) [0x667822]
4: (()+0x68ba) [0x7f07eac248ba]
5: (clone()+0x6d) [0x7f07e9bd802d]
Signed-off-by: Sage Weil <sage@newdream.net>
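A sketch of the fixed condition (field names are illustrative, not the actual FileStore members): commit only when the `committing_seq = applied_seq` assignment would actually move committing_seq forward.

```cpp
#include <cstdint>

// Simplified stand-ins for the FileStore sequence counters.
struct SeqState {
  uint64_t applied_seq;     // last op applied to disk
  uint64_t committing_seq;  // seq of the commit in progress
};

// Only commit if "committing_seq = applied_seq" would advance it,
// i.e. something new has actually been applied since the last commit.
bool should_commit(const SeqState& s) {
  return s.applied_seq > s.committing_seq;
}
```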
Look at the eversion.version field (not the whole eversion) when deciding
what is divergent. That way, if we have
our log: 100'10 (0'0) m 10000004d3a.00000000/head by client4225.1:18529
new log: 122'10 (0'0) m 10000004d3a.00000000/head by client4225.1:18529
then the 100'10 is divergent, the 122'10 wins, and we don't get a dup
reqid in the log.
Signed-off-by: Sage Weil <sage@newdream.net>
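The comparison can be sketched like this (a simplified stand-in for Ceph's eversion_t, not the real type): two entries occupy the same slot when their version fields match, regardless of epoch, and the entry from the newer epoch then wins.

```cpp
#include <cstdint>

// Simplified eversion: epoch'version, e.g. 100'10.
struct eversion {
  uint32_t epoch;
  uint64_t version;
};

// An existing entry is divergent iff the incoming entry occupies the
// same version slot, even when the epochs differ.
bool same_slot(const eversion& ours, const eversion& theirs) {
  return ours.version == theirs.version;
}
```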
The problem is that merge_log adds new items to the log before it unindexes
divergent items, and that behavior is needed by the current implementation
of merge_old_entry(). Since the divergent items may be the same requests
(and frequently are), these asserts need to be loosened up.
Now, the most recent addition "wins," and we only remove the entry in
unindex() if it points to us.
Signed-off-by: Sage Weil <sage@newdream.net>
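A sketch of the "most recent addition wins" rule (types here are illustrative, not the actual PG log index): indexing always overwrites, and unindexing removes the map entry only if it still points at the entry being removed.

```cpp
#include <map>
#include <string>

// Simplified log entry and reqid -> entry index.
struct Entry { std::string reqid; int seq; };
using Index = std::map<std::string, const Entry*>;

void index_entry(Index& idx, const Entry& e) {
  idx[e.reqid] = &e;  // newest addition wins
}

void unindex_entry(Index& idx, const Entry& e) {
  auto it = idx.find(e.reqid);
  if (it != idx.end() && it->second == &e)  // only if it points to us
    idx.erase(it);
}
```

Unindexing a divergent duplicate then leaves the newer entry's index intact.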
Saw an OSD that was up in the map, but whose address didn't match. This
caused all kinds of strange behavior. I'm not sure what I had in mind when
the original test required both being down AND a matching address before
moving to boot state, since having the wrong address is clearly bad news.
Signed-off-by: Sage Weil <sage@newdream.net>
We were truncating if we were in log_per_instance mode. But normally those
logs don't exist. And if they do, we probably don't want to truncate
them. This is particularly true if we respawn ourselves (e.g. after being
marked down) and restart with the same pid.
Signed-off-by: Sage Weil <sage@newdream.net>
This was causing a lot of slowdowns.
Additionally, pin the inode when exporting caps -- otherwise it could
disappear out from under a cap ack. This was probably just exposed
by fixing the lock check.
If peering screws up and the primary mistakenly tries to pull an object
from us we don't have, log an error instead of crashing. This will still
throw off recovery (it will hang), but that's better than crashing
outright.
We can't error out if we don't get everything we want in one go now that
we support pushing objects in pieces. Remove this check entirely, since
we don't have a good error handling case anyway.
Since we cancel deletion on pg change, we will only receive these from
old primaries, so we can safely ignore them.
Signed-off-by: Sage Weil <sage@newdream.net>
If the primary changes, cancel deletion so that the new primary has the
benefit of considering whether they need anything we have. Before we were
only canceling if our role changed, but that makes little sense.
Signed-off-by: Sage Weil <sage@newdream.net>
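The changed condition can be sketched as follows (names are illustrative, not the actual PG interface): cancel pending deletion whenever the primary changes, not merely when our own role changes.

```cpp
// Simplified view of what changed between two maps.
struct PgChange {
  int old_primary, new_primary;
  int old_role, new_role;
};

// Before the fix this tested old_role != new_role; now any primary
// change cancels deletion so the new primary can consider our data.
bool should_cancel_deletion(const PgChange& c) {
  return c.old_primary != c.new_primary;
}
```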
If we can't create the mon0/magic file, show an error message rather
than calling assert(). These cases are probably cluster configuration
problems.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>