RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2024-12-16 00:15:35 +00:00

Author	SHA1	Message	Date
Sage Weil	28d59d374b	os/FileStore: fix non-btrfs op_seq commit order The op_seq file is the starting point for journal replay. For stable btrfs commit mode, which is using a snapshot as a reference, we should write this file before we take the snap. We normally ignore current/ contents anyway. On non-btrfs file systems, however, we should only write this file after we do a full sync, and we should then fsync(2) it before we continue (and potentially trim anything from the journal). This fixes a serious bug that could cause data loss and corruption after a power loss event. For a 'kill -9' or crash, however, there was little risk, since the writes were still captured by the host's cache. Fixes: #3721 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>	2013-01-03 17:15:07 -08:00
John Wilkins	f1e0305f0d	doc: Removed the --without-tcmalloc flag until further advised. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2013-01-03 16:13:13 -08:00
Sage Weil	19df20867d	Merge pull request #30 from rca/master Minor clarification in docs.	2013-01-03 16:07:59 -08:00
John Wilkins	88af7d182a	doc: Added defaults for PGs, links to recommended settings, and updated note on splitting. Fixes: #3555 Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2013-01-03 14:51:33 -08:00
Samuel Just	4ae4dce5c5	OSD: for old osds, dispatch peering messages immediately Normally, we batch up peering messages until the end of process_peering_events to allow us to combine many notifies, etc. to the same osd into the same message. However, old osds assume that the activation message (log or info) will be dispatched before the first sub_op_modify of the interval. Thus, for those peers, we need to send the peering messages before we drop the pg lock, lest we issue a client repop from another thread before the activation message is sent. Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2013-01-03 14:18:00 -08:00
John Wilkins	73bc8ffc90	doc: Added comments on --without-tcmalloc option when building Ceph. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2013-01-03 13:30:14 -08:00
rca	37b57cdf0f	Update doc/rados/configuration/filesystem-recommendations.rst Clarified when it's necessary to use the setting: filestore xattr use omap = true	2013-01-03 13:30:01 -08:00
John Wilkins	43ef6772eb	doc: Added some packages to the copyable line. Fixes: #3686 Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2013-01-03 13:29:20 -08:00
John Wilkins	333ae82c61	doc: Fixed syntax error. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2013-01-03 13:28:06 -08:00
Sage Weil	7e94f6f1a7	Merge remote-tracking branch 'gh/wip-3714-b' into next Signed-off-by: Samuel Just <sam.just@inktank.com>	2013-01-03 12:53:07 -08:00
David Zafman	224a33bb3b	qa/workunit: Add dbench-short.sh for nfs suite A multi-client dbench run doesn't work over NFS, see bug #3718. Make single client dbench available. Signed-off-by: David Zafman <david.zafman@inktank.com>	2013-01-03 12:44:19 -08:00
Sage Weil	a32d6c5dca	osd: move common active vs booting code into consume_map Push osdmaps to PGs in separate method from activate_map() (whose name is becoming less and less accurate). Signed-off-by: Sage Weil <sage@inktank.com>	2013-01-02 22:39:10 -08:00
Sage Weil	0bfad8ef20	osd: let pgs process map advances before booting The OSD deliberately consumes and processes most OSDMaps from the time it was down before it marks itself up, as this can be slow. The new threading code does this asynchronously in peering_wq, though, and does not let it drain before booting the OSD. The OSD can get into a situation where it marks itself up but is not responsive or useful because of the backlog, and only makes the situation worse by generating more osdmaps as a result. Fix this by calling activate_map() even when booting, and, when booting, draining the peering_wq on each call. This is harmless since we are not yet processing actual ops; we only need to be async when active. Fixes: #3714 Signed-off-by: Sage Weil <sage@inktank.com>	2013-01-02 22:20:06 -08:00
Sage Weil	5fc94e89a9	osd: drop oldest_last_clean from activate_map Signed-off-by: Sage Weil <sage@inktank.com>	2013-01-02 22:04:34 -08:00
Sage Weil	67f7ee6799	osd: drop unused variables from activate_map Signed-off-by: Sage Weil <sage@inktank.com>	2013-01-02 22:04:08 -08:00
Sage Weil	a14a36ed78	OSDMap: fix modifed -> modified typo Signed-off-by: Sage Weil <sage@inktank.com>	2013-01-02 21:09:07 -08:00
Sage Weil	6b5a89d237	Merge remote-tracking branch 'gh/next'	2013-01-02 18:13:25 -08:00
Sage Weil	43cba617aa	log: fix locking typo/stupid for dump_recent() We weren't locking m_flush_mutex properly, which in turn was leading to racing threads calling dump_recent() and garbling the crash dump output. Backport: bobtail, argonaut Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>	2013-01-02 17:01:32 -08:00
John Wilkins	29ff87a573	Merge branch 'master' of https://github.com/ceph/ceph	2013-01-02 15:59:59 -08:00
John Wilkins	64d2760a49	doc: Added a memory profiling section. Ported from the wiki. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2013-01-02 15:58:03 -08:00
John Wilkins	5066abf189	doc: Added memory profiling to the index. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2013-01-02 15:57:22 -08:00
Sam Lang	0e9a0cd7b8	qa/workunit: Update pjd script to use new tarball The pjd script now uses the latest version of pjd with an additional test for opening a non-existent file. Signed-off-by: Sam Lang <sam.lang@inktank.com>	2013-01-02 17:08:37 -06:00
Sam Lang	d8940d15c3	fuse: Fix cleanup code path on init failure With the changes from `856f32ab`, the cfuse.init call returns a _positive_ errno, which was getting ignored. Also, if an error occurs during cfuse.init(), we need to teardown the client mount. Signed-off-by: Sam Lang <sam.lang@inktank.com>	2013-01-02 16:38:28 -06:00
Josh Durgin	c4370ff03f	librbd: establish watch before reading header This eliminates a window in which a race could occur when we have an image open but no watch established. The previous fix (using assert_version) did not work well with resend operations. Signed-off-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-02 14:15:34 -08:00
Sage Weil	9a1cf51888	Merge branch 'wip-journal-aio' into next Reviewed-by: Samuel Just <sam.just@inktank.com> Backport: bobtail	2013-01-02 13:42:22 -08:00
Sage Weil	483c6f76ad	test_filejournal: optionally specify journal filename as an argument Signed-off-by: Sage Weil <sage@inktank.com>	2013-01-02 13:39:05 -08:00
Sage Weil	c461e7fc1e	test_filejournal: test journaling bl with >IOV_MAX segments Signed-off-by: Sage Weil <sage@inktank.com>	2013-01-02 13:39:05 -08:00
Sage Weil	dda7b65189	os/FileJournal: limit size of aio submission Limit size of each aio submission to IOV_MAX-1 (to be safe). Take care to only mark the last aio with the seq to signal completion. Signed-off-by: Sage Weil <sage@inktank.com>	2013-01-02 13:39:05 -08:00
Josh Durgin	e0858fa899	Revert "librbd: ensure header is up to date after initial read" Using assert version for linger ops doesn't work with retries, since the version will change after the first send. This reverts commit `e177680903`. Conflicts: qa/workunits/rbd/watch_correct_version.sh	2013-01-02 12:32:33 -08:00
John Wilkins	82297706da	doc: Minor edits. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2013-01-02 11:24:39 -08:00
John Wilkins	d3b9803eab	doc: Fixed typo, clarified usage. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2013-01-02 11:15:16 -08:00
Yan, Zheng	8422474320	mds: fix rename inode exporter check Using "srcdn->is_auth() && destdnl->is_primary()" to check whether the MDS is the inode exporter of a rename operation is not reliable, because the OP_FINISH slave request may race with subtree import. The fix is to use a variable in MDRequest to indicate whether the MDS is the inode exporter. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 19:25:40 +08:00
Yan, Zheng	5e8642a82e	mds: call maybe_eval_stray after removing a replica dentry MDCache::handle_cache_expire() processes dentries after inodes, so the MDCache::maybe_eval_stray() in MDCache::inode_remove_replica() always fails to remove stray inode because MDCache::eval_stray() checks if the stray inode's dentry is replicated. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 19:25:40 +08:00
Yan, Zheng	f5ea5c36a4	mds: don't defer processing caps if inode is auth pinned We should not defer processing caps if the inode is auth pinned by MDRequest, because the MDRequest may change lock state of the inode later and wait for the deferred caps. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 19:25:40 +08:00
Yan, Zheng	fe5936b158	mds: remove unnecessary is_xlocked check Locker::foo_eval() is always called for stable locks, so no need to check if the lock is xlocked. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 19:25:40 +08:00
Yan, Zheng	b2d5005aa0	mds: fix lock state transition check Locker::simple_excl() and Locker::scatter_mix() miss is_rdlocked check; Locker::file_excl() miss is_rdlocked check and is_wrlocked check. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 19:25:40 +08:00
Yan, Zheng	b3796f46a4	mds: introduce DROPLOCKS slave request In some rare cases, Locker::acquire_locks() drops all acquired locks in order to auth pin new objects. But Locker::drop_locks only drops explicitly acquired remote locks; it does not drop objects' version locks that were implicitly acquired on the remote MDS. These leftover locks break locking order when re-acquiring locks and may cause deadlock. The fix is to introduce a DROPLOCKS slave request which drops all acquired locks on the remote MDS. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 19:25:40 +08:00
Yan, Zheng	7e04504d3e	mds: fix on-going two-phase commits tracking The slaves for a two-phase commit should be mdr->more()->witnessed instead of mdr->more()->slaves. mdr->more()->slaves includes MDS for remote auth pin and lock. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 19:25:40 +08:00
Yan, Zheng	2f96b472ef	mds: fix anchor table commit race Anchor table updates for a given inode are fully serialized on the client side. But due to network latency, two commit requests from different clients can arrive at the anchor server out of order. The anchor table gets corrupted if updates are committed in the wrong order. The fix is to track on-going anchor updates for each individual inode and delay processing commit requests that arrive out of order. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 19:25:40 +08:00
Yan, Zheng	a79493da34	mds: skip frozen inode when assimilating dirty inodes' rstat CDir::assimilate_dirty_rstat_inodes() may encounter frozen inodes that are being renamed. Skip these frozen inodes because assimilating inode's rstat require auth pinning the inode. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 19:25:40 +08:00
Yan, Zheng	61da9b1845	mds: mark rename inode as ambiguous auth on all involved MDS When handling a cross-authority rename, the master first sends OP_RENAMEPREP slave requests to the witness MDS, then sends an OP_RENAMEPREP slave request to the rename inode's auth MDS after getting all witness MDS' acknowledgments. Before receiving the OP_RENAMEPREP slave request, the rename inode's auth MDS may change the lock state of the rename inode and send lock messages to the witness MDS. But the witness MDS may have already received the OP_RENAMEPREP slave request and changed the source inode's authority. So the witness MDS sends the lock acknowledgment message to the wrong MDS and triggers an assertion. The fix is as follows: first the master marks the rename inode as ambiguous and sends a message asking the rename inode's auth MDS to mark the inode as ambiguous, then it sends OP_RENAMEPREP slave requests to the witness MDS, and finally it sends an OP_RENAMEPREP slave request to the rename inode's auth MDS after getting all witness MDS' acknowledgments. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 19:25:39 +08:00
Yan, Zheng	3b13d3dcbc	mds: only export directory fragments in stray to their auth MDS Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 19:25:39 +08:00
Yan, Zheng	d9d7147339	mds: don't trim ambiguous imports in MDCache::trim_non_auth_subtree Trimming ambiguous imports in MDCache::trim_non_auth_subtree() confuses MDCache::disambiguate_imports() and causes infinite loop. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 19:25:39 +08:00
Yan, Zheng	fcb9f98887	mds: use null dentry to find old parent of renamed directory When replaying a directory rename operation, the MDS needs to find the old parent of the renamed directory to adjust the auth subtree. The current code searches the cache to find the old parent; this does not work if the renamed directory inode is not in the cache. The EMetaBlob for a directory rename contains at most one null dentry, so the MDS can use the null dentry to find the old parent of the renamed directory. If there is no null dentry in the EMetaBlob, the MDS was a witness of the rename operation and there is no auth subtree underneath the renamed directory. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 19:25:39 +08:00
Yan, Zheng	7a52016864	mds: don't journal null dentry for overwritten remote linkage Server::_rename_prepare() adds a null dest dentry to the EMetaBlob if the rename operation overwrites a remote linkage. This is incorrect because null dentries are processed after primary and remote dentries during journal replay. The erroneous null dentry makes the dentry of the rename destination disappear. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 19:25:39 +08:00
Yan, Zheng	5ae715be5c	mds: xlock stray dentry when handling rename or unlink This prevents MDS from reintegrating stray before rename/unlink finishes Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 19:25:39 +08:00
Yan, Zheng	262795744b	mds: don't trigger assertion when discover races with rename Discover reply that adds replica dentry and inode can race with rename if slave request for rename sends discover and waits, but waked up by reply for different discover. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 19:25:39 +08:00
Yan, Zheng	e10267b531	mds: fix Locker::simple_eval() Locker::simple_eval() checks if the loner wants CEPH_CAP_GEXCL to decide if it should change the lock to EXCL state, but it checks if CEPH_CAP_GEXCL is issued to the loner to decide if it should change the lock to SYNC state. So if the loner wants CEPH_CAP_GEXCL, but doesn't have CEPH_CAP_GEXCL, Locker::simple_eval() will keep switching the lock state. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 13:55:43 +08:00
Yan, Zheng	7e23321b72	mds: don't renew revoking lease The MDS may receive a lease renew request while the lease is being revoked; just ignore the renew request. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-01-02 13:54:51 +08:00
Sage Weil	eb02eaede5	Merge remote-tracking branch 'gh/wip-bobtail-docs'	2013-01-01 10:36:57 -08:00
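
The batching rule described in dda7b65189 (os/FileJournal: limit size of aio submission) can be illustrated with a small sketch. This is not the FileJournal code, which is C++. The function name `plan_aio_submissions`, the default `iov_max=1024`, and the use of `None` for "no seq" are all assumptions made for illustration. It shows only the two points the commit message states: each submission holds at most IOV_MAX-1 segments, and only the last submission carries the seq that signals completion.

```python
def plan_aio_submissions(segments, seq, iov_max=1024):
    """Split buffer segments into submissions of at most iov_max - 1 entries.

    Returns a list of (batch, seq_or_None) pairs. Only the final batch is
    tagged with seq, so completion is signalled once, after the last write.
    iov_max=1024 is an assumed default, not a value taken from the commit.
    """
    limit = iov_max - 1  # stay one below the limit, "to be safe"
    if limit < 1:
        raise ValueError("iov_max must be at least 2")

    # Cut the segment list into consecutive chunks of at most `limit` items.
    batches = [segments[i:i + limit] for i in range(0, len(segments), limit)]

    # Tag only the last batch with the sequence number.
    last = len(batches) - 1
    return [(batch, seq if idx == last else None)
            for idx, batch in enumerate(batches)]
```

With `iov_max=5`, ten segments would be split into batches of 4, 4, and 2, and only the third batch would carry the seq.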


23375 Commits