The helper also updates SnapRealm::open_past_parents, which is needed
for the have_past_parents_open() check.
That is used when, among other things, we import caps; not updating it
prevented the cap import from sending the client cap message, which left
the mds<->client cap relationship out of sync.
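Roughly, the check behaves like this (a minimal sketch with assumed types;
the real SnapRealm code differs in detail):
    #include <cstdint>
    #include <set>
    // Hypothetical shape only: open_past_parents records which past parent
    // realms have been opened; the check succeeds once every past parent of
    // the realm is present, which is what lets the cap import proceed.
    bool have_past_parents_open(const std::set<uint64_t> &past_parents,
                                const std::set<uint64_t> &open_past_parents) {
      for (uint64_t p : past_parents)
        if (!open_past_parents.count(p))
          return false;   // a past parent was never opened
      return true;
    }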
Signed-off-by: Sage Weil <sage@newdream.net>
We were already checking that we _can_ wrlock before doing the rstat
projection (if we can't, we mark_dirty_rstat() on the inode), but we
weren't actually taking the wrlock to prevent lock state changes while
that happened.
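The intent, as a sketch with hypothetical helpers (not the actual
Locker/CInode calls):
    #include <functional>
    // Hypothetical helper illustrating the fix: the projection only runs while
    // the wrlock is actually held, not merely when it *could* be taken.
    void project_rstat_safely(bool can_wrlock,
                              const std::function<void()> &take_wrlock,
                              const std::function<void()> &do_projection,
                              const std::function<void()> &mark_dirty_rstat) {
      if (can_wrlock) {
        take_wrlock();        // the step that was missing
        do_projection();      // lock state can no longer change underneath us
      } else {
        mark_dirty_rstat();   // fallback path, unchanged
      }
    }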
This bug eventually manifested itself as a failed assertion at the
now familiar
mds/CInode.cc: In function 'virtual void CInode::decode_lock_state(int, ceph::bufferlist&)':
mds/CInode.cc:1364: FAILED assert(pf->rstat == rstat)
Signed-off-by: Sage Weil <sage@newdream.net>
This will return read errors on a pipe if it gets no data
for the given period of time (default 15 minutes). In a stateful
session the Connection will hang around and the next write will
initiate a standard reconnect, so things keep working but we don't
rack up hundreds of useless threads!
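In effect it is a bounded poll before each read; a standalone sketch of the
idea (not the messenger code itself):
    #include <errno.h>
    #include <poll.h>
    #include <unistd.h>
    // Sketch: wait up to timeout_ms for data; report a timeout as a read error
    // so the caller tears the connection down instead of blocking forever.
    ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms) {
      struct pollfd pfd;
      pfd.fd = fd;
      pfd.events = POLLIN;
      pfd.revents = 0;
      int r = poll(&pfd, 1, timeout_ms);   // e.g. 15 * 60 * 1000 for the default
      if (r == 0) {                        // nothing arrived within the window
        errno = EAGAIN;
        return -1;                         // surface it as a read error
      }
      if (r < 0)
        return -1;                         // poll itself failed
      return read(fd, buf, len);
    }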
Those are the only states where the replica can effectively prevent the
lock from cycling in a way that would force a frozen dirfrag beneath
the scatterpinned inode to update/journal something
(accounted_fragstat/rstat).
Signed-off-by: Sage Weil <sage@newdream.net>
If the user has turned on journalling, but left osd_journal_size at 0,
normally we would use the existing size of the journal without
modifications. If the journal doesn't exist (i.e., we are running
mkjournal()), we have to check for this condition and return an error.
We can't create a journal if we don't know what size that journal needs
to be.
This fixes a bug where an extremely small journal file was being
created, leading to an infinite loop in FileJournal::wrap_read_bl().
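A sketch of the added guard, with names paraphrased from this message rather
than taken from FileJournal:
    #include <cerrno>
    #include <cstdint>
    // Sketch: when creating a brand new journal we cannot fall back to "use
    // the existing size", so a configured size of 0 has to be an error.
    int check_journal_size(uint64_t configured_bytes, bool creating) {
      if (creating && configured_bytes == 0)
        return -EINVAL;   // cannot create a journal of unknown size
      return 0;
    }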
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
We aren't actually projecting the inode unless destdn->is_auth(),
so check for that before projecting the snaprealm (which requires
a projected inode)!
Then on rename_apply, open the snaprealm on non-auth MDSes.
This was causing a mismatch in the projection code, since
assimilate_...finish() calls pop_and_dirty_projected_inode(), but
the first half is only called on CEPH_LOCK_INEST locks. So make them match!
The file_excl() trigger asserts that mds_caps_wanted is empty, so the caller
shouldn't invoke it when that isn't the case. If mds_caps_wanted is non-empty,
just go to LOCK instead.
All we're doing is picking a state to move to that will allow us to
update max_size.
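Illustrative decision only (assumed types and names, not the real Locker code):
    #include <map>
    // Illustrative only: pick a lock state that lets us update max_size.
    // EXCL requires that no other MDS wants caps; otherwise settle for LOCK.
    enum TargetState { TO_EXCL, TO_LOCK };
    TargetState pick_max_size_state(const std::map<int, int> &mds_caps_wanted) {
      return mds_caps_wanted.empty() ? TO_EXCL : TO_LOCK;
    }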
Signed-off-by: Sage Weil <sage@newdream.net>
If we do a slave request xlock, the state is LOCK, not XLOCK. Weaken
the SimpleLock::get_xlock() assert accordingly.
Signed-off-by: Sage Weil <sage@newdream.net>
When we propagate the rstat to inode in predirty_journal_parents (because
we hold the nestlock), bump the rstat version as well. This avoids
confusing any replicas, who expect the rstat to have a new version or to
remain unchanged when the lock scatters.
Signed-off-by: Sage Weil <sage@newdream.net>
We were starting a commit if we had started a new op, but that left a
window in which the op was still being journaled and nothing new had been
applied to disk. With this fix we only commit if committing/committed
will increase. Now the check matches the
    committing_seq = applied_seq;
a few lines down, and all is well.
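Simplified, the new condition amounts to (a sketch, not the actual FileStore
members):
    #include <cstdint>
    // Simplified sketch: only start a commit when it will actually advance
    // committed_seq, i.e. something has been applied since the last commit.
    bool should_commit(uint64_t applied_seq, uint64_t committed_seq) {
      return applied_seq > committed_seq;
    }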
The actual crash this fixes was:
2010-10-07 16:20:36.245301 7f07e66d3710 filestore(/mnt/osd3) taking snap 'snap_23230'
2010-10-07 16:20:36.245428 7f07e66d3710 filestore(/mnt/osd3) snap create 'snap_23230' got -1 File exists
os/FileStore.cc: In function 'void FileStore::sync_entry()':
os/FileStore.cc:1738: FAILED assert(r == 0)
ceph version 0.22~rc (1d77c14bc310aed31d6cfeb2c87e87187d3527ea)
1: (FileStore::sync_entry()+0x6ee) [0x793148]
2: (FileStore::SyncThread::entry()+0x19) [0x761d43]
3: (Thread::_entry_func(void*)+0x20) [0x667822]
4: (()+0x68ba) [0x7f07eac248ba]
5: (clone()+0x6d) [0x7f07e9bd802d]
Signed-off-by: Sage Weil <sage@newdream.net>
Look at the eversion.version field (not the whole eversion) when deciding
what is divergent. That way if we have
our log: 100'10 (0'0) m 10000004d3a.00000000/head by client4225.1:18529
new log: 122'10 (0'0) m 10000004d3a.00000000/head by client4225.1:18529
The 100'10 is divergent, the 122'10 wins, and we don't get a dup
reqid in the log.
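In spirit (a sketch with assumed field names):
    #include <cstdint>
    // Sketch: an eversion pairs an epoch with a per-PG version. Entries with
    // the same version (100'10 vs 122'10) describe the same request, so only
    // .version is compared when deciding divergence; the newer log's copy wins.
    struct eversion { uint64_t epoch, version; };
    bool same_request(const eversion &a, const eversion &b) {
      return a.version == b.version;   // epoch intentionally ignored
    }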
Signed-off-by: Sage Weil <sage@newdream.net>
The problem is that merge_log adds new items to the log before it unindexes
divergent items, and that behavior is needed by the current implementation
of merge_old_entry(). Since the divergent items may be the same requests
(and frequently are), these asserts need to be loosened up.
Now, the most recent addition "wins," and we only remove the entry in
unindex() if it points to us.
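Conceptually (a sketch with assumed container types, not the PG log code):
    #include <map>
    #include <string>
    // Sketch with assumed types: the index maps a request id to a pointer at
    // the indexed log entry. The most recent add wins; unindex() only erases
    // the slot if it still points at the entry being removed.
    struct Entry { std::string reqid; };
    void unindex(std::map<std::string, Entry*> &index, Entry &e) {
      auto it = index.find(e.reqid);
      if (it != index.end() && it->second == &e)
        index.erase(it);
    }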
Signed-off-by: Sage Weil <sage@newdream.net>
Saw an OSD that was up in the map, but the address didn't match. Caused
all kinds of strange behavior. I'm not sure what I had in mind when the
original test only checked for down AND same address before moving to boot
state, since having the wrong address is clearly bad news.
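The new test amounts to something like this (illustrative parameters only):
    #include <string>
    // Sketch: re-enter the boot state if the map says we're down *or* it has
    // us up at a different address than the one we're actually using.
    bool should_boot(bool map_says_up, const std::string &map_addr,
                     const std::string &my_addr) {
      return !map_says_up || map_addr != my_addr;
    }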
Signed-off-by: Sage Weil <sage@newdream.net>
We were truncating if we were in log_per_instance mode. But normally those
logs don't exist. And if they do, we probably don't want to truncate
them. This is particularly true if we respawn ourselves (e.g. after being
marked down) and restart with the same pid.
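In open(2) terms the change is roughly this (a sketch, not the actual logging
code):
    #include <fcntl.h>
    // Sketch of the effect: per-instance logs are opened without O_TRUNC, so
    // a respawn that reuses the same pid appends to the existing file instead
    // of wiping it.
    int open_instance_log(const char *path) {
      return open(path, O_CREAT | O_WRONLY | O_APPEND, 0644);
    }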
Signed-off-by: Sage Weil <sage@newdream.net>