RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2024-12-18 01:16:55 +00:00

Author	SHA1	Message	Date
Yan, Zheng	a4ed7ea8b8	mds: send lock action message when auth MDS is in proper state. For rejoining object, don't send lock ACK message because lock states are still uncertain. The lock ACK may confuse object's auth MDS and trigger assertion. If object's auth MDS is not active, just skip sending NUDGE, REQRDLOCK and REQSCATTER messages. MDCache::handle_mds_recovery() will take care of them. Also defer caps release message until clientreplay or active Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:26:23 -07:00
Yan, Zheng	7ad7c347d4	mds: issue caps when lock state in replica become SYNC because client can request READ caps from non-auth MDS. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:26:23 -07:00
Yan, Zheng	10b1a5663f	mds: share inode max size after MDS recovers The MDS may crash after journaling the new max size, but before sending the new max size to the client. Later when the MDS recovers, the client re-requests the new max size, but the MDS finds max size unchanged. So the client waits for the new max size forever. This issue can be avoided by checking client cap's last_sent, share inode max size if it is zero. (reconnected cap's last_sent is zero) Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:26:23 -07:00
Yan, Zheng	b2342a9c31	mds: take object's versionlock when rejoining xlock Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:26:23 -07:00
Yan, Zheng	6862fe7a14	mds: reqid for rejoining authpin/wrlock need to be list Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-04-01 09:26:00 -07:00
Yan, Zheng	d1a257498c	mds: handle linkage mismatch during cache rejoin For MDS cluster, not all file system namespace operations that impact multiple MDS use two phase commit. Some operations use dentry link/unlink message to update replica dentry's linkage after they are committed by the master MDS. It's possible the master MDS crashes after journaling an operation, but before sending the dentry link/unlink messages. Later when the MDS recovers and receives cache rejoin messages from the surviving MDS, it will find linkage mismatch. The original cache rejoin code does not properly handle the case that dentry unlink messages were missing. Unlinked inodes were linked to stray dentries. So the cache rejoin ack message need push replicas of these stray dentries to the surviving MDS. This patch also adds code that handles cache expiration in the middle of cache rejoining. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:25:59 -07:00
Yan, Zheng	ce0b74e55e	mds: encode dirfrag base in cache rejoin ack Cache rejoin ack message already encodes inode base, make it also encode dirfrag base. This allows the message to replicate stray dentries like MDentryUnlink message. The function will be used by later patch. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:24:41 -07:00
Yan, Zheng	9f66d0454f	mds: include replica nonce in MMDSCacheRejoin::inode_strong So the recovering MDS can properly handle cache expire messages. Also increase the nonce value when sending the cache rejoin acks. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com> Also update the MMDSCacheRejoin encoding to the new format. Signed-off-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:22:38 -07:00
Yan, Zheng	01fd55a64c	mds: remove MDCache::rejoin_fetch_dirfrags() In commit `77946dcdae` (mds: fetch missing inodes from disk), I introduced MDCache::rejoin_fetch_dirfrags(). But it basically duplicates the function of MDCache::open_undef_dirfrags(), so just remove rejoin_fetch_dirfrags() and make open_undef_dirfrags() also handle undefined inodes. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:17:19 -07:00
Yan, Zheng	e62e48bb32	mds: fix MDS recovery involving cross authority rename For mds cluster, rename operation may involve multiple MDS. If the rename source's auth MDS crashes after some witness MDS have prepared the rename but before the rename is committing. Later when the MDS recovers, its subtree map and linkages are different from the prepared MDS'. This causes problems for both subtree resolve and cache rejoin. The solution is, if the rename source's auth MDS fails, the prepared witness MDS query the master MDS if the operation is committing. If it's not, rollback the rename, then send resolve message to the recovering MDS. Another similar case is a prepared witness MDS crashes when the rename source's auth MDS has prepared or is preparing the operation. when the witness recovers, the master just delay sending the resolve ack message until the it commits the operation. This patch also updates Server::handle_client_rename(). Make preparing the rename source's auth MDS be the final step before committing the rename. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:17:19 -07:00
Yan, Zheng	3ab86637b3	mds: send resolve acks after master updates are safely logged Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:17:19 -07:00
Yan, Zheng	75346d8f3d	mds: send cache rejoin messages after gathering all resolves Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:17:19 -07:00
Yan, Zheng	97bc0d26e6	mds: don't send MDentry{Link,Unlink} before receiving cache rejoin The active MDS calls MDCache::rejoin_scour_survivor_replicas() when it receives the cache rejoin message. The function will remove the objects replicated by MDentry{Link,Unlink} from replica map. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:17:19 -07:00
Yan, Zheng	e381bb3930	mds: set resolve/rejoin gather MDS set in advance For active MDS, it may receive resolve/rejoin message before receiving the mdsmap message that claims the MDS cluster is in resolving/rejoining state. So instead of set the gather MDS set when receiving the mdsmap. set them in advance when detecting MDS' failure. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:17:19 -07:00
Yan, Zheng	ed85dd61a5	mds: don't send resolve message between active MDS When MDS cluster is resolving, current behavior is sending subtree resolve message to all other MDS and waiting for all other MDS' resolve message. The problem is that active MDS can have different subtree map due to rename. Besides gathering active MDS's resolve messages are also racy. The only function for these messages is disambiguate other MDS' import. We can replace it by import finish notification. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:17:19 -07:00
Yan, Zheng	30dbb1d4e5	mds: compose and send resolve messages in batch Resolve messages for all MDS are the same, so we can compose and send them in batch. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:17:19 -07:00
Yan, Zheng	a6d9eb8c58	mds: don't delay processing replica buffer in slave request Replicated objects need to be added into the cache immediately Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:17:19 -07:00
Yan, Zheng	131271655f	mds: unify slave request waiting When requesting remote xlock or remote wrlock, the master request is put into lock object's REMOTEXLOCK waiting queue. The problem is that remote wrlock's target can be different from lock's auth MDS. When the lock's auth MDS recovers, MDCache::handle_mds_recovery() may wake incorrect request. So just unify slave request waiting, dispatch the master request when receiving slave request reply. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>	2013-04-01 09:17:19 -07:00
Yan, Zheng	ef9a4f6605	mds: defer eval gather locks when removing replica Locks' states should not change between composing the cache rejoin ack messages and sending the message. If Locker::eval_gather() is called in MDCache::{inode,dentry}_remove_replica(), it may wake requests and change locks' states. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:17:09 -07:00
Yan, Zheng	12e7c3d171	mds: avoid sending duplicated table prepare/commit This patch makes table client defer sending table prepare/commit messages until receiving table server's 'ready' message. This avoid duplicated table prepare/commit messages. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:16:59 -07:00
Yan, Zheng	a5dce808b5	mds: make sure table request id unique When a MDS becomes active, the table server re-sends 'agree' messages for old prepared request. If the recovered MDS starts a new table request at the same time, The new request's ID can happen to be the same as old prepared request's ID, because current table client code assigns request ID from zero after MDS restarts. This patch make table server send 'ready' messages when table clients become active or itself becomes active. The 'ready' message updates table client's last_reqid to avoid request ID collision. The message also replaces the roles of finish_recovery() and handle_mds_recovery() callbacks for table client. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:16:48 -07:00
Yan, Zheng	bb83a5d63c	mds: consider MDS as recovered when it reaches clientreplay state. MDS in clientreplay state already starts serving requests. It also make MDS::handle_mds_recovery() and MDS::recovery_done() match. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-04-01 09:16:36 -07:00
Yan, Zheng	4ad35b2a83	mds: mark connection down when MDS fails So if the MDS restarts and uses the same address, it does not get old messages. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-03-31 16:57:14 +08:00
Yan, Zheng	fbcc64dffd	mds: fix MDCache::adjust_bounded_subtree_auth() There are cases that need both create new bound and swallow intervening subtree. For example: A MDS exports subtree A with bound B and imports subtree B with bound C at the same time. The MDS crashes, exporting subtree A fails, but importing subtree B succeed. During recovery, the MDS may create new bound C and swallow subtree B. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-03-31 16:57:14 +08:00
Yan, Zheng	573a4ae1a2	mds: process finished contexts in batch If there are several unstable locks in an inode, current Locker::eval(CInode*,) processes each lock's finished contexts separately. This may cause very deep call stack if finished contexts also call Locker::eval() on the same inode. An extreme example is: Locker::eval() wakes an open request(). Server::handle_client_open() starts a log entry, then call Locker::issue_new_caps(). Locker::issue_new_caps() calls Locker::eval() and wakes another request. The later request also tries starting a log entry. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-03-31 16:57:14 +08:00
Yan, Zheng	5cbaae6648	mds: preserve subtree bounds until slave commit When replaying an operation that rename a directory inode to non-auth subtree, if the inode has subtree bounds, we should prevent them from being trimmed until slave commit. This patch also fixes a bug in ESlaveUpdate::replay(). EMetaBlob::replay() should be called before MDCache::finish_uncommitted_slave_update(). Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2013-03-31 16:57:14 +08:00
Sage Weil	ce8793ce3b	Merge pull request #175 from dachary/wip-4594 fix null character in object name triggering segfault Reviewed-by: Sage Weil <sage@inktank.com>	2013-03-30 18:22:01 -07:00
Loic Dachary	c344ff170d	fix null character in object name triggering segfault Parsing \n in lfn_parse_object_name is implemented with out->append('\0'); which segfaults when using libstdc++ and g++ version 4.6.3 on Debian GNU/Linux. It is replaced with (*out) += '\0'; to avoid the buggy implicit conversion. There is no append(charT) method in C++98 or C++11, which means it relies on an implicit conversion that is buggy. It would be better to rely on the basic_string& operator+=(charT c); method as defined in ISO 14882-1998 (page 385) thru ISO 14882-2012 (page 640) A set of tests is added to generate and parse object names. They need access to the private function lfn_parse_object_name because there is no convenient protected method to exercise it. The tests contain a LFNIndex derived class, TestWrapLFNIndex which is made a friend of LFNIndex to gain access to the private methods. http://tracker.ceph.com/issues/4594 refs #4594 Signed-off-by: Loic Dachary <loic@dachary.org>	2013-03-30 14:28:34 +01:00
Sage Weil	2b8eb31b85	Merge branch 'wip-4490'	2013-03-29 18:02:15 -07:00
Sage Weil	e611937f3e	mon: OSDMonitor: add 'osd pool set-quota' command Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-03-29 17:59:35 -07:00
John Wilkins	95328089b8	doc: Added entries for Pool, PG, & CRUSH. Moved heartbeat link. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2013-03-29 17:38:48 -07:00
John Wilkins	bcc5c65305	doc: Added heartbeat configuration settings. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2013-03-29 17:38:02 -07:00
John Wilkins	6157d68369	doc: Moved PG info to separate page. Moved heartbeat to mon-osd doc. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2013-03-29 17:36:23 -07:00
John Wilkins	ca77aabbf1	doc: Rewrote monitor configuration section. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2013-03-29 17:34:45 -07:00
John Wilkins	ea3c833d0f	doc: Moved to separate section for parallelism. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2013-03-29 17:32:47 -07:00
John Wilkins	ba73b8301a	doc: Cleanup. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2013-03-29 17:32:00 -07:00
Sage Weil	e9b3f2e6e9	ceph-disk list: say 'unknown cluster $UUID' when cluster is unknown This makes it clearer that an old osd is in fact old. Signed-off-by: Sage Weil <sage@inktank.com>	2013-03-29 17:30:28 -07:00
Greg Farnum	9e7ddf677f	config_opts: fix rgw_port comments to be plaintext Signed-off-by: Greg Farnum <greg@inktank.com>	2013-03-29 17:05:41 -07:00
Samuel Just	3da3129e07	ReplicatedPG: check for full if delta_stats.num_bytes > 0 Signed-off-by: Samuel Just <sam.just@inktank.com>	2013-03-29 16:47:29 -07:00
Joao Eduardo Luis	9b09073259	mon: Monitor: check if 'pss' arg is !NULL on parse_pos_long() We already do it all throughout the function, but this one place didn't. Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-03-29 16:47:29 -07:00
Joao Eduardo Luis	e2a936d2ae	common: util: add 'unit_to_bytesize()' function Converts from a numerical value that may or may not contain an unit modifier ('1024', '1K', '2M', ..., '1E') and returns the parsed size in bytes. Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-03-29 16:47:28 -07:00
Joao Eduardo Luis	23c2fa7fc2	osd: osd_types: add pool quota related fields	2013-03-29 16:03:21 -07:00
Sage Weil	562e1716bd	ceph-disk: handle missing journal_uuid field gracefully Only lower if we know it's not None. Signed-off-by: Sage Weil <sage@inktank.com>	2013-03-29 13:59:04 -07:00
Josh Durgin	b504e444fc	Merge remote branch 'origin/next'	2013-03-29 12:58:01 -07:00
Josh Durgin	95c4a81be1	Merge pull request #170 from ceph/wip-rbd-aio-flush Reviewed-by: Sage Weil <sage.weil@inktank.com>	2013-03-29 13:20:32 -07:00
Josh Durgin	4c4d5591bd	librados: move snapc creation to caller for aio_operate The common case already has a snapshot context, so avoid duplicating it (copying a potentially large vector) in IoCtxImpl::aio_operate(). Signed-off-by: Josh Durgin <josh.durgin@inktank.com>	2013-03-29 12:47:17 -07:00
Sage Weil	43e451f6ee	Merge pull request #166 from ceph/wip-disk-list Wip disk list Reviewed-by: Dan Mick <dan.mick@inktank.com>	2013-03-29 12:24:47 -07:00
Yan, Zheng	3cbd0366b7	client: update cap->implemented when handling revoke Fixes #4578 Tested-by: Noah Watkins <noahwatkins@gmail.com>	2013-03-29 11:26:01 -07:00
athanatos	f9c3bba374	Merge pull request #161 from dachary/wip-4560 unit tests for LFNIndex	2013-03-29 10:50:55 -07:00
Greg Farnum	4f8ba0e775	msgr: allow users to mark_down a NULL Connection* Signed-off-by: Greg Farnum <greg@inktank.com> Reviewed-by: Sam Just <sam.just@inktank.com>	2013-03-29 10:42:04 -07:00
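
Commit c344ff170d above replaces out->append('\0') with (*out) += '\0' because std::string has no append(charT) overload. The following is a minimal sketch of two well-defined ways to append a single NUL byte. The helper names append_nul and append_nul_counted are hypothetical and are not taken from Ceph's LFNIndex code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not Ceph's lfn_parse_object_name): appends one NUL
// byte using basic_string& operator+=(charT), the form the commit adopts.
inline void append_nul(std::string *out) {
  assert(out != nullptr);
  (*out) += '\0';
}

// Same effect through the append(size_type count, charT c) overload,
// which takes an explicit repeat count and so needs no conversion.
inline void append_nul_counted(std::string *out) {
  assert(out != nullptr);
  out->append(1, '\0');
}
```

Both helpers grow the string by exactly one character. Because std::string tracks its length explicitly, the embedded NUL is kept as ordinary data.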

25150 Commits
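
Commit e2a936d2ae above describes a unit_to_bytesize() function that parses values such as '1024', '1K', '2M' and '1E' into a byte count. The sketch below shows one way such a parser could look. The name parse_byte_size, its signature, and the choice of binary multiples (K = 2^10 up to E = 2^60) are assumptions made for illustration; the commit message does not state them, and this is not Ceph's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sketch, not Ceph's unit_to_bytesize().
// On success, stores the byte count in *bytes and returns true.
// Assumes binary multiples: K = 2^10, M = 2^20, ..., E = 2^60.
inline bool parse_byte_size(const std::string &s, uint64_t *bytes) {
  if (s.empty() || bytes == nullptr) return false;

  std::size_t i = 0;
  uint64_t value = 0;
  // Accumulate leading decimal digits, rejecting 64-bit overflow.
  while (i < s.size() && s[i] >= '0' && s[i] <= '9') {
    const uint64_t digit = static_cast<uint64_t>(s[i] - '0');
    if (value > (UINT64_MAX - digit) / 10) return false;
    value = value * 10 + digit;
    ++i;
  }
  if (i == 0) return false;  // no digits at all

  // Optional single-letter unit modifier.
  int shift = 0;
  if (i < s.size()) {
    switch (s[i]) {
      case 'K': shift = 10; break;
      case 'M': shift = 20; break;
      case 'G': shift = 30; break;
      case 'T': shift = 40; break;
      case 'P': shift = 50; break;
      case 'E': shift = 60; break;
      default: return false;  // unknown modifier
    }
    ++i;
  }
  if (i != s.size()) return false;  // trailing characters after the unit

  // Reject values that would not fit once scaled.
  if (shift > 0 && value > (UINT64_MAX >> shift)) return false;
  *bytes = value << shift;
  return true;
}
```

Returning a bool with an out-parameter keeps the sketch free of exceptions, and checking for overflow before the shift means an input such as '16E' is rejected rather than silently wrapped.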