If we read an event that's later than our expected entry, we set read_pos
to -1 and discard the journal. If that happens we also need to reset
last_committed_seq to avoid a crash like this:
2010-12-08 17:04:39.246950 7f269d138910 journal commit_finish thru 16904
2010-12-08 17:04:39.246961 7f269d138910 journal committed_thru 16904 < last_committed_seq 37778589
os/FileJournal.cc: In function 'virtual void FileJournal::committed_thru(uint64_t)':
os/FileJournal.cc:854: FAILED assert(seq >= last_committed_seq)
ceph version 0.24~rc (commit:fe10300317383ec29948d7dbe3cb31b3aa277e3c)
1: (FileJournal::committed_thru(unsigned long)+0xad) [0x588e7d]
2: (JournalingObjectStore::commit_finish()+0x8c) [0x57f2ec]
3: (FileStore::sync_entry()+0xcff) [0x5764cf]
4: (FileStore::SyncThread::entry()+0xd) [0x506d9d]
5: (Thread::_entry_func(void*)+0xa) [0x4790ba]
6: /lib/libpthread.so.0 [0x7f26a2f8373a]
7: (clone()+0x6d) [0x7f26a1c2569d]
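
The reset described above can be sketched roughly as follows. This is a
simplified illustration, not the actual FileJournal code: JournalSketch and
check_entry are hypothetical names, and resetting last_committed_seq to 0 is
an assumption about what "reset" means here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the journal state (not the real FileJournal).
struct JournalSketch {
  int64_t read_pos = 0;             // -1 means the journal has been discarded
  uint64_t last_committed_seq = 0;  // highest seq the journal believes is committed

  // Called during replay with the seq just read and the seq we expected.
  // Returns false if the journal had to be discarded.
  bool check_entry(uint64_t seq, uint64_t expected_seq) {
    if (seq > expected_seq) {
      read_pos = -1;           // discard the journal
      last_committed_seq = 0;  // assumed reset value, so a smaller commit seq is accepted later
      return false;
    }
    return true;
  }

  // Mirrors the invariant behind the failed assert in the trace above.
  void committed_thru(uint64_t seq) {
    assert(seq >= last_committed_seq);
    last_committed_seq = seq;
  }
};
```

Without the reset in check_entry, a stale last_committed_seq of 37778589
followed by committed_thru(16904) would trip the assert, as in the log.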
Fixes #631
Signed-off-by: Sage Weil <sage@newdream.net>
When auth first moves to sync->mix,
- auth sends AC_MIX to replicas
- replicas go to sync->mix
- replicas finish gather, send AC_SYNCACK, move to sync->mix(2)
- auth gets all acks, sends AC_MIX again
- replica moves to MIX
So any new replica should just get sync->mix(2), so that it is not confused
by the second AC_MIX.
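
The sequence above can be sketched as a small replica-side state machine.
The enum and function names are hypothetical simplifications for
illustration; the real lock states and message handling carry more detail.

```cpp
#include <cassert>

// Hypothetical, simplified lock states: sync, sync->mix, sync->mix(2), mix.
enum class LockState { Sync, SyncMix, SyncMix2, Mix };

// State a replica moves to when it receives AC_MIX.
inline LockState replica_on_ac_mix(LockState s) {
  switch (s) {
    case LockState::Sync:     return LockState::SyncMix;  // first AC_MIX: start gathering
    case LockState::SyncMix2: return LockState::Mix;      // second AC_MIX: transition done
    default:                  return s;                   // not modeled in this sketch
  }
}

// State a replica moves to once its gather finishes and it has sent AC_SYNCACK.
inline LockState replica_on_gather_finished(LockState s) {
  return s == LockState::SyncMix ? LockState::SyncMix2 : s;
}

// State handed to a replica that is created while auth is in auth_state.
inline LockState initial_replica_state(LockState auth_state) {
  // A new replica never saw the first AC_MIX, so it starts in sync->mix(2)
  // and the second AC_MIX takes it straight to MIX.
  return auth_state == LockState::SyncMix ? LockState::SyncMix2 : auth_state;
}
```

With this choice the only AC_MIX a new replica can receive is the second
one, and replica_on_ac_mix handles it the same way as for existing replicas.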
Signed-off-by: Sage Weil <sage@newdream.net>
Any invented dirfrags have a version of 0. This will cause problems later
if we pre_dirty() anything in that dir because the dir version won't be
in sync (it'll be way too small). Also, we can do that at any point,
e.g. when flushing dirty caps, and aren't allowed to delay, so we need to
load those dirfrags now.
In theory we could read only the fnode and not all the dentries, but we
may as well. We should be more careful about memory than this patch is,
though.
Fixes #15.
Signed-off-by: Sage Weil <sage@newdream.net>
This ensures that if the replica thinks it is flushing something, the
auth will always do a scatter_writebehind.
Signed-off-by: Sage Weil <sage@newdream.net>
Since f741766a we have triggered start_flush and finish_flush on replicas.
The problem is that the finish_flush didn't always happen for the mix->lock
case: we would start_flush when we sent the AC_LOCKACK, but could only
finish_flush if/when we got another SYNC or MIX. If the primary stayed in
the LOCK state, we would keep our flushing flag. That in turn causes
problems later when we try to eval_gather() (esp if we are auth at that
point?).
Fix this by sending an explicit AC_LOCKFLUSHED message to replicas after
we do a scatter_writebehind. The replica will only set flushing if it
flushed dirty data, which forces scatter_writebehind, so we will always
get the LOCKFLUSHED to match. Replicas that didn't flush will also get
it, but oh well. We'd need to keep track of which ones sent dirty data to
do that properly, though.
TODO: still need to verify that this is correct for rejoin.
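
A minimal sketch of the replica-side pairing this sets up. ReplicaLockSketch
and its methods are hypothetical names used only for illustration; they are
not the actual Locker or ScatterLock interfaces.

```cpp
#include <cassert>

// Hypothetical replica-side view of a scatterlock's flushing flag.
struct ReplicaLockSketch {
  bool flushing = false;

  // Replica answers a lock request with AC_LOCKACK.
  // Returns true if dirty data was sent along with the ack.
  bool send_lockack(bool have_dirty_data) {
    if (have_dirty_data)
      flushing = true;  // start_flush only when something was actually flushed
    return have_dirty_data;
  }

  // Auth sends AC_LOCKFLUSHED after it has done scatter_writebehind.
  void handle_lockflushed() {
    flushing = false;   // finish_flush; harmless for replicas that never flushed
  }
};
```

Because every flush now has a matching LOCKFLUSHED, the flag no longer
depends on a later SYNC or MIX arriving.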
Signed-off-by: Sage Weil <sage@newdream.net>
We need to reverse the effects of encode_export_inode_caps(), which is just
the pin and state bit.
The original problem can be reproduced with
- ceph tell mds 0 injectargs '--mds-kill-import-at 5'
- restart mds
- recovery completes successfully
- wait for the subtree to be reexported
- fail with bad EXPORTINGCAPS get in encode_export_inode_caps
Signed-off-by: Sage Weil <sage@newdream.net>
Write to STDOUT_FILENO directly rather than using a buffer, which we
would then have to flush. Fix a bug in the buffering of priorities.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
When building the debian packages, use --sysconfdir=/etc.
Also, don't fudge sysconfdir in the init-ceph script.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
derr was really just an alias for STDERR. Unfortunately, after we call
daemonize, STDERR is connected to /dev/null. So just replace calls to
derr with dout so that our important messages don't get lost.
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>