RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2025-01-28 05:53:37 +00:00

Author	SHA1	Message	Date
Sage Weil	4efa300601	filestore: assert on out of order journal pipeline submissions Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-03 13:14:49 -08:00
Sage Weil	259c509a89	filestore: fix wake condition when journal submission blocks We only want to wake up if we are at the front of the line, in order to preserve journal submission pipeline ordering. This fixes, among other things, messages in the log like 2010-12-21 10:38:42.515974 7f0861486700 journal op_submit_finish 5364 expected 5370, OUT OF ORDER and bug #666. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-03 13:14:13 -08:00
Sage Weil	15dcc65199	mds: fix purge_stray for directories, zeroed layouts - We don't want to purge file content on directories - Don't fall over if a file has a zero period Reported-by: Paul Komkoff <i@stingr.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-03 11:50:53 -08:00
Colin Patrick McCabe	6cdfa30455	osd: PG::Info::History: init last_epoch_clean It seems that we have not been zeroing PG::Info::History::last_epoch_clean when the History structure is created. This led to some very interesting log output (and bugs!) Signed-off-by: Colin McCabe <colinm@hq.newdream.net>	2011-01-03 10:30:56 -08:00
Samuel Just	9ad05cf7ff	SimpleMessenger.cc: Fixes a dispatch_throttler leak in queue_received when the pipe has been halted. Signed-off-by: Samuel Just <samuelj@hq.newdream.net>	2011-01-03 10:14:52 -08:00
Sage Weil	180a417603	v0.24	2010-12-20 15:58:09 -08:00
Sage Weil	69940e2717	osd: compensate for replicas with tail > last_complete Normally we shouldn't ever have a last_complete < log.tail (&& !backlog). But maybe we do (old bugs, whatever; see #590). In that case, the primary can compensate by sending more log info to the replica. Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-20 13:22:49 -08:00
Sage Weil	b04b6f4823	mds: make nested scatterlock state change check more robust The predirty_journal_parents() calls wrlock_start() with nowait=true because it has a journal entry open and we don't want to trigger a nested scatterlock change that needs to journal something again (either via scatter_writebehind or scatter_start). (MDLog can only handle a single log entry open at once because building multiple at once would require very very very careful ordering of predirty() calls and versions.) We were already checking for the simple_lock() case (which may call writebehind); fix up the check to also cover the scatter_mix() (which may call scatter_start) case. Fixes this crash: mds/MDLog.h: In function 'void MDLog::start_entry(LogEvent)': mds/MDLog.h:191: FAILED assert(cur_event == __null) ceph version 0.24~rc (commit:fe10300317383ec29948d7dbe3cb31b3aa277e3c) 1: (CInode::finish_scatter_update(ScatterLock, CDir, unsigned long, unsigned long)+0x804) [0x606e14] 2: (CInode::start_scatter(ScatterLock)+0xaa) [0x60dc1a] 3: (Locker::scatter_mix(ScatterLock, bool)+0x1ca) [0x589a9a] 4: (Locker::wrlock_start(SimpleLock, MDRequest, bool)+0x165) [0x597d65] 5: (MDCache::predirty_journal_parents(Mutation, EMetaBlob, CInode, CDir, int, int, snapid_t)+0x153e) [0x55a70e] 6: (Locker::scatter_writebehind(ScatterLock)+0x42d) [0x58553d] 7: (Locker::simple_lock(SimpleLock, bool)+0x7ab) [0x58beeb] 8: (Locker::scatter_nudge(ScatterLock, Context, bool)+0x3ad) [0x58c49d] 9: (Locker::scatter_tick()+0x28a) [0x58c98a] 10: (MDS::tick()+0x4e4) [0x4b26a4] 11: (SafeTimer::timer_thread()+0x22c) [0x6d164c] 12: (SafeTimerThread::entry()+0xd) [0x6d34bd] 13: (Thread::_entry_func(void)+0xa) [0x4943da] 14: /lib/libpthread.so.0 [0x7fc87810b73a] 15: (clone()+0x6d) [0x7fc876dad69d] Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-17 21:02:58 -08:00
Sage Weil	3a235b0f21	filestore: make OpSequencer::flush() work for writeahead journaling items It was only waiting for items in the op_queue to complete. The goal is to wait for anything we've called queue_transactions(&osr,...) on. If we do writeahead journaling, though, there might be new ops that are still journaling but not yet submitted to the fs that are missed. This adds a journal queue to the OpSequencer, and uses it in the writeahead case only. Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-17 15:30:39 -08:00
Colin Patrick McCabe	285f351b72	mon: build_initial_monmap: fix mismatched alloc Signed-off-by: Colin McCabe <colinm@hq.newdream.net>	2010-12-17 15:31:41 -08:00
Colin Patrick McCabe	caa4609387	common: cleanups common_init: avoid (mismatched) heap allocation ConfFile::_parse: avoid memory leak on error path ConfFile: NULL filename if not set, rather than leaving it undefined Signed-off-by: Colin McCabe <colinm@hq.newdream.net>	2010-12-17 15:26:37 -08:00
Colin Patrick McCabe	28bcf0bc98	osd: PG::choose_acting: fix major iterator mistake Signed-off-by: Colin McCabe <colinm@hq.newdream.net>	2010-12-17 15:14:53 -08:00
Colin Patrick McCabe	f7dc1a9239	rgw: fix fd leak on error path Signed-off-by: Colin McCabe <colinm@hq.newdream.net>	2010-12-17 15:14:53 -08:00
Colin Patrick McCabe	795811d66a	hadoop: fix a bunch of mismatched allocations Using array new means you need array delete. Signed-off-by: Colin McCabe <colinm@hq.newdream.net>	2010-12-17 15:14:53 -08:00
Colin Patrick McCabe	2f916086a6	auth: avoid mismatched allocation Can't pair strdup and free. Signed-off-by: Colin McCabe <colinm@hq.newdream.net>	2010-12-17 15:14:53 -08:00
Sage Weil	3c7d30f1ac	osd: flush pg writes to disk before starting scrub scan This avoids two races: - we just completed recovery by pushing objects to the replica, and the replica starts scanning before those writes reach the fs. - we just trimmed to something after last_update_applied. Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-17 14:15:35 -08:00
Sage Weil	5184db4424	filestore: add per-sequencer flush operation Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-17 14:15:35 -08:00
Sage Weil	2fb60daf68	osd: debug scan_list and scrub a bit better Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-17 12:51:03 -08:00
Sage Weil	1cfad2ea77	osd: clear INCONSISTENT if scrub detects no errors Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-17 10:59:45 -08:00
Sage Weil	b190875548	osd: add assert that we're replica ar Fred saw a crash where we got into merge_log as a stray, which really shouldn't ever happen! See #590. Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-17 10:36:34 -08:00
Laszlo Boszormenyi	1e291fc9ef	debian: don't strip rados classes Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu> Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-17 08:31:00 -08:00
Laszlo Boszormenyi	9c173bb400	debian: rename ceph.lintian -> ceph.lintian-overrides Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu> Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-17 08:30:43 -08:00
Samuel Just	73669d87e6	PG.cc: sub_op_scrub must set finalizing_scrub on the replica before waiting for last_update_applied to catch up to info.last_update. Signed-off-by: Samuel Just <samuelj@hq.newdream.net>	2010-12-16 13:06:43 -08:00
Samuel Just	29480f42be	ReplicatedPG.cc: _scrub must set head when it encounters a head snap curclone counts down, not up Signed-off-by: Samuel Just <samuelj@hq.newdream.net>	2010-12-15 17:23:59 -08:00
Sage Weil	914f6ddebd	filestore: detect final version of async ioctl SNAP_CREATE_V2 Li's revised interface for the async snap ioctl is more flexible. Update the ioctl call sites and detection code accordingly. Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-15 13:39:57 -08:00
Greg Farnum	06a2d7a269	mds: Save straydn in mdr so it's consistent across retry attempts. Otherwise, we could choose new stray dirs and fail to get all the locks we needed (while leaving old strays locked forever!). Signed-off-by: Greg Farnum <gregf@hq.newdream.net>	2010-12-15 13:07:25 -08:00
Sage Weil	89d5c91e7d	mon: trim pgmap less aggressively This will make observer crashes due to missed states (#648) much harder to hit. Eventually the pgmap state trim problem will go away when the monitor/paxos code is restructured (#647). Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-14 11:50:03 -08:00
Yehuda Sadeh	b989087ddf	crypto: catch cryptopp decrypt/encrypt exceptions	2010-12-14 10:51:46 -08:00
Colin Patrick McCabe	3932f084f7	osd: PG::prior_set_affected: const cleanup Signed-off-by: Colin McCabe <colinm@hq.newdream.net>	2010-12-14 01:53:37 -08:00
Sage Weil	9add26be76	mds: fix replay/resent vs completed request check If it is a _replayed_ request, we should always send a simple ack if it is completed, because the client does not care about any additional caps. If it is a _resent_ request, then we want to return useful caps on open or create requests, even if any modification side-effects have already been committed. The additional checks for completed already exist in the create and open handlers. Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-12 14:40:05 -08:00
Colin Patrick McCabe	346a2aac42	rpm: update changelog Signed-off-by: Colin McCabe <colinm@hq.newdream.net>	2010-12-09 14:38:08 -08:00
Colin Patrick McCabe	e23d620068	rpm: fix ceph.spec to work with gcephtool Don't try to package gui_resources unless we are building the GUI. Get GUI dependencies correct. Signed-off-by: Colin McCabe <colinm@hq.newdream.net>	2010-12-09 14:35:48 -08:00
Vangelis Koukis	83612ef736	Fix overflow in FileJournal::_open_file() Running the unstable branch, mkcephfs fails when trying to create a 3GB journal file on the OSDs. Relevant messages from the osd logfile: 2010-12-09 19:03:54.419737 7fdde4d51720 journal _open_file: unable to extend journal to 18446744072560312320 bytes 2010-12-09 19:03:54.419789 7fdde4d51720 filestore(/osd) mkjournal error creating journal on /osd/journal The problem is that the calculation of the journal size in bytes overflows, in FileJournal::_open_file(). Signed-off-by: Vangelis Koukis <vkoukis@cslab.ece.ntua.gr> Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-09 13:45:30 -08:00
Samuel Just	d0fbc30a0a	ReplicatedPG.cc: Fixes a bug in snap_trimmer where a pointer to a stack Cond is left in the mode.waiting_cond list. Signed-off-by: Samuel Just <samuelj@hq.newdream.net>	2010-12-09 13:09:20 -08:00
Samuel Just	329ae1bc3b	ReplicatedPG: snap_trimmer now acquires a read lock on the osd map before calling share_pg_info. Signed-off-by: Samuel Just <samuelj@hq.newdream.net>	2010-12-09 13:09:20 -08:00
Colin Patrick McCabe	f68e6e7d38	rpm: don't try to package radosacl radosacl is just a test binary, so unless we build with --with-debug, we won't get it. Signed-off-by: Colin McCabe <colinm@hq.newdream.net>	2010-12-09 11:18:33 -08:00
Colin Patrick McCabe	6722b0c85d	rpm: add pkgconfig to BuildRequires You can't build without pkgconfig. Signed-off-by: Colin McCabe <colinm@hq.newdream.net>	2010-12-09 11:18:32 -08:00
Colin Patrick McCabe	9df18d1984	rpm: set files-attr for radosgw Signed-off-by: Colin McCabe <colinm@hq.newdream.net>	2010-12-09 10:28:39 -08:00
Sage Weil	b4264fbbdc	filejournal: reset last_committed_seq if we find journal to be invalid If we read an event that's later than our expected entry, we set read_pos to -1 and discard the journal. If that happens we also need to reset last_committed_seq to avoid a crash like 2010-12-08 17:04:39.246950 7f269d138910 journal commit_finish thru 16904 2010-12-08 17:04:39.246961 7f269d138910 journal committed_thru 16904 < last_committed_seq 37778589 os/FileJournal.cc: In function 'virtual void FileJournal::committed_thru(uint64_t)': os/FileJournal.cc:854: FAILED assert(seq >= last_committed_seq) ceph version 0.24~rc (commit:fe10300317383ec29948d7dbe3cb31b3aa277e3c) 1: (FileJournal::committed_thru(unsigned long)+0xad) [0x588e7d] 2: (JournalingObjectStore::commit_finish()+0x8c) [0x57f2ec] 3: (FileStore::sync_entry()+0xcff) [0x5764cf] 4: (FileStore::SyncThread::entry()+0xd) [0x506d9d] 5: (Thread::_entry_func(void*)+0xa) [0x4790ba] 6: /lib/libpthread.so.0 [0x7f26a2f8373a] 7: (clone()+0x6d) [0x7f26a1c2569d] Fixes #631 Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-08 18:10:49 -08:00
Sage Weil	a9c098df47	mon: use helper for clock drift check; log relative instead of absolute time Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-08 11:12:51 -08:00
Sage Weil	fe10300317	mds: sync->mix replica state is sync->mix(2) When auth first moves to sync->mix, - auth sends AC_MIX to replicas - replicas go to sync->mix - replicas finish gather, send AC_SYNCACK, move to sync->mix(2) - auth gets all acks, sends AC_MIX again - replica moves to MIX So any new replica should just get sync->mix(2), so that it is not confused by the second AC_MIX. Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-07 16:44:19 -08:00
Sage Weil	2000f69e99	mds: do not choose lock state on replicas The lock state has already been set during rejoin. Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-07 16:44:19 -08:00
Sage Weil	3825c4b87b	mds: small rejoin cleanup Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-07 16:44:18 -08:00
Sage Weil	9b9b86935e	mds: rev mds cluster internal protocol The lock encoding changed with the dirty bit on scatterlocks. Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-07 16:44:18 -08:00
Sage Weil	2ea9b2d7db	mds: fix replay of already-journaled requests Check for already-completed tids for both retried and replayed requests. Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-07 16:44:18 -08:00
Sage Weil	b5fd2e4d4e	mds: open undef dirfrags during rejoin Any invented dirfrags have a version of 0. This will cause problems later if we pre_dirty() anything in that dir because the dir version won't be in sync (it'll be way too small). Also, we can do that at any point, e.g. when flushing dirty caps, and aren't allowed to delay, so we need to load those dirfrags now. In theory we could read only the fnode and not all the dentries, but we may as well. We should be more careful about memory than this patch is, though. Fixes #15. Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-07 16:44:18 -08:00
Sage Weil	39c5933db0	mds: add missing try_clear_more() to scatterlock Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-07 16:44:18 -08:00
Sage Weil	c681ed752f	mds: explicitly pass scatterlock dirty flag to auth on gather This ensures that if the replica thinks it is flushing something the auth will always do a scatter_writebehind. Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-07 16:44:18 -08:00
Sage Weil	9bbb33b436	mds: send LOCKFLUSHED to trigger finish_flush on replicas Since `f741766a` we have triggered start_flush and finish_flush on replicas. The problem is that the finish_flush didn't always happen for the mix->lock case: we would start_flush when we sent the AC_LOCKACK, but could only finish_flush if/when we got another SYNC or MIX. If the primary stayed in the LOCK state, we would keep our flushing flag. That in turn causes problems later when we try to eval_gather() (esp if we are auth at that point?). Fix this by sending an explicit AC_LOCKFLUSHED message to replicas after we do a scatter_writebehind. The replica will only set flushing if it flushed dirty data, which forces scatter_writebehind, so we will always get the LOCKFLUSHED to match. Replicas that didn't flush will also get it, but oh well. We'd need to keep track which ones sent dirty data to do that properly, though. TODO: still need to verify that this is correct for rejoin. Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-07 16:44:18 -08:00
Sage Weil	681b010fdb	mds: clear EXPORTINGCAPS on export_reverse We need to reverse the effects of encode_export_inode_caps(), which is just the pin and state bit. The original problem can be reproduced with - ceph tell mds 0 injectargs '--mds-kill-import-at 5' - restart mds - recovery completes successfully - wait for the subtree to be reexported - fail with bad EXPORTINGCAPS get in encode_export_inode_caps Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-07 16:44:18 -08:00
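
Note on 83612ef736 (Fix overflow in FileJournal::_open_file()): the byte count in the quoted log line is consistent with a journal size of 3000 MB being shifted into bytes in 32-bit signed arithmetic and then widened to an unsigned 64-bit value. The sketch below only reproduces that arithmetic. The 3000 MB figure and the shift-by-20 expression are assumptions inferred from the logged number, not taken from the Ceph source, and the helper names are hypothetical.

```python
# Emulate a 32-bit signed overflow followed by widening to uint64.
# ASSUMPTION: the journal size was configured as 3000 (MB) and converted
# to bytes with a 32-bit "size << 20"; the real expression is not shown
# in the commit message.

def wrap_int32(value: int) -> int:
    """Reduce an integer to the range of a two's-complement int32."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value

def widen_to_uint64(value: int) -> int:
    """Reinterpret a (possibly negative) integer as an unsigned 64-bit value."""
    return value & 0xFFFFFFFFFFFFFFFF

size_mb = 3000                       # hypothetical configured journal size
wrapped = wrap_int32(size_mb << 20)  # 3145728000 does not fit in int32
as_u64 = widen_to_uint64(wrapped)    # what a uint64 size field would hold

print(wrapped)  # -1149239296
print(as_u64)   # 18446744072560312320, the value in the osd log
```

Performing the multiplication in a 64-bit type avoids the wraparound: 3000 << 20 is 3145728000, which fits comfortably in 64 bits.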


12180 Commits