RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2024-12-26 21:43:10 +00:00

Author	SHA1	Message	Date
Sage Weil	00f436c144	Merge pull request #904 from ceph/wip-mds-cluster2 Wip mds cluster2 Reviewed-by: Sage Weil <sage@inktank.com>	2013-12-16 17:03:27 -08:00
Sage Weil	c5bccfef88	ceph_test_rados_api_tier: fix HitSetRead test race with split Recalculate the hash on each iteration in case we are racing with split. Fixes: #7013 Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-16 16:52:35 -08:00
Sage Weil	94da54ff95	Merge pull request #954 from ceph/wip-7009 mon: move supported_commands fields, methods into Monitor, and fix leak Reviewed-by: Greg Farnum <greg@inktank.com>	2013-12-16 16:31:39 -08:00
Sage Weil	7e618c937b	mon: move supported_commands fields, methods into Monitor, and fix leak We were leaking the static leader_supported_mon_commands. Move this into the class so that we can clean up in the destructor. Rename get_command_descriptions -> format_command_descriptions. Fixes: #7009 Signed-off-by: Sage Weil <sage@inktank.com>	2013-12-16 16:09:44 -08:00
Sage Weil	1597d4e9f5	Merge pull request #951 from ceph/wip-linux-version common: introduce get_linux_version() Reviewed-by: Sage Weil <sage@inktank.com>	2013-12-16 09:27:43 -08:00
Ilya Dryomov	824b3d8e84	FileJournal: use pclose() to close a popen() stream In FileJournal::_check_disk_write_cache(), use pclose() instead of fclose() to close a stream, created by popen(). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2013-12-16 18:57:22 +02:00
Ilya Dryomov	6696ab6479	FileJournal: switch to get_linux_version() For the purposes of FileJournal::_check_disk_write_cache(), use get_linux_version(), which is based on uname(2), instead of parsing the contents of /proc/version. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2013-12-16 18:57:22 +02:00
Ilya Dryomov	fcf6e9878b	common: introduce get_linux_version() get_linux_version() returns a version of the currently running kernel, encoded as an int, and is contained in common/linux_version.[ch]. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2013-12-16 18:57:21 +02:00
Ilya Dryomov	a2babe27e8	configure: break up AC_CHECK_HEADERS into one header-file per line Break up AC_CHECK_HEADERS macro into one header-file per line so it's easier to read and make changes. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2013-12-16 18:57:21 +02:00
Yan, Zheng	4526d13a9d	mds: fix stale session handling for multiple mds Don't add new caps to stale session when importing inodes. Don't touch session when importing caps because it confuses the stale session detection. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 14:24:52 +08:00
Yan, Zheng	43f7268f5d	mds: properly set dirty flag when journalling import Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 14:24:52 +08:00
Yan, Zheng	802df76f68	mds: properly update mdsdir's authority during recovery dirfrag of mdsdir doesn't inherit its parent inode's authority. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 14:24:52 +08:00
Yan, Zheng	b6d1d8f186	mds: finish opening sessions even if import aborted Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 14:24:52 +08:00
Yan, Zheng	80005f1ece	mds: fix discover path race When C_MDC_RetryDiscoverPath is executed, we may have already become the auth mds of base. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 14:24:50 +08:00
Sage Weil	58d68995c4	Merge pull request #947 from dachary/wip-6824 mon: set ceph osd (down|out|in|rm) error code on failure Reviewed-by: Sage Weil <sage@inktank.com>	2013-12-15 21:16:48 -08:00
Yan, Zheng	5fdcc568c6	mds: fix bug in MDCache::open_ino_finish It's wrong to erase open_ino_info_t after finishing contexts, because MDCache::open_ino() can be called again when finishing contexts. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:25 +08:00
Yan, Zheng	71d1eb374a	mds: add CEPH_FEATURE_EXPORT_PEER and bump the protocol version Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:25 +08:00
Yan, Zheng	d0b744a1d6	client: handle session flush message Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:25 +08:00
Yan, Zheng	05b192faab	mds: simplify how to export non-auth caps Introduce a new flag in cap import message. If client finds the flag is set, it releases exporter's caps (send release to the exporter). This saves the cap export message and a "mds to mds" message. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:25 +08:00
Yan, Zheng	9dc52ff04b	mds: send cap import messages to clients after importing subtree succeeds When importing subtree, the importer sends cap import messages to clients before the import subtree operation is considered as success. If the exporter crashes before EExport event is journalled, the importer needs to re-export client caps. This confuses clients, and makes them lose track of auth caps. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:25 +08:00
Yan, Zheng	6a565881f6	mds: re-send cap exports in resolve message. For a rename operation that changes an inode's authority, if the master mds of the operation crashes, the inode's original auth mds sends export messages to clients when it receives the master mds' resolve ack message. The client can't rely on the export message to add caps for the master mds and then reconnect the cap when the master mds enters the reconnect stage, because the client may receive the export message after receiving the mdsmap that claims the master mds is in the reconnect stage. The fix is to include cap exports in the resolve message, so the master mds can send import messages to clients when it enters the rejoin stage. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:25 +08:00
Yan, Zheng	4fdeb00df2	mds: include counterpart's information in cap import/export messages When exporting inodes with client caps, the importer sends cap import messages to clients and the exporter sends cap export messages to clients. A client can receive these two messages in any order. If a client first receives the cap import message, it adds the imported caps, but the caps from the exporter are still considered valid. This can compromise consistency. If the MDS crashes while importing caps, clients receive only cap export messages and no cap import messages. These clients don't know which MDS is the cap importer, so they can't send cap reconnect when the MDS recovers. We can handle the above issues by including the counterpart's information in cap import/export messages. If a client first receives the cap import message, it adds the imported caps, then removes the exporter's caps. If a client first receives the cap export message, it removes the exported caps, then adds caps for the importer. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:25 +08:00
Yan, Zheng	ef902ee0b9	mds: send info of imported caps back to the exporter (rename) use MMDSSlaveRequest::OP_FINISH slave request to send information of rename imported caps back to the exporter. This is preparation for including counterpart's information in cap import/export message. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:25 +08:00
Yan, Zheng	85171fd6c2	mds: send info of imported caps back to the exporter (cache rejoin) Use cache rejoin ack message to send information of rejoin imported caps back to the exporter. Also move the code that exports reconnect caps to MDCache::handle_cache_rejoin_ack() This is preparation for including counterpart's information in cap import/export message. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:24 +08:00
Yan, Zheng	ff8b9ac358	mds: send info of imported caps back to the exporter (export dir) Introduce a new class Capability::Import and use it to send information of imported caps back to the exporter. This is preparation for including counterpart's information in cap import/export message. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:24 +08:00
Yan, Zheng	d00ec7915c	mds: flush session messages before exporting caps The following sequence of events can happen when exporting inodes: - client sends open file request to mds.0 - mds.0 handles the request and sends inode stat back to the client - mds.0 exports the inode to mds.1 - mds.1 sends cap import message to the client - mds.0 sends cap export message to the client - client receives the cap import message from mds.1, but the client still doesn't have the corresponding inode in the cache, so the client releases the imported caps - client receives the open file reply from mds.0 - client receives the cap export message from mds.0. After these events, the client doesn't have any cap for the opened file. To fix the message ordering issue, this patch introduces a new session operation FLUSHMSG. Before exporting caps, we send a FLUSHMSG session message to the client and wait for the acknowledgment. When receiving the FLUSHMSG_ACK message from the client, we are sure that the client has received all messages sent previously. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:24 +08:00
Yan, Zheng	77515b7a3c	mds: increase cap sequence when sharing max size For case: - client voluntarily releases some caps through cap update message - mds shares the new max by sending cap grant message - mds receives the cap update message. If the mds doesn't increase the cap sequence when sharing the max size, it can't determine whether the cap update message was sent before or after the client received the cap grant message that updates max size. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:24 +08:00
Yan, Zheng	65259796ae	mds: include inode version in auth mds' lock messages Encode the inode version in the auth mds' lock messages, so that the version of replica inodes gets updated. This is important because the client uses the inode version in the mds reply to check if the cached inode is already up-to-date. It skips updating the inode if it thinks the inode is already up-to-date. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:24 +08:00
Yan, Zheng	f134c77267	mds: avoid allocating MDRequest::More when cleanup request Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:24 +08:00
Yan, Zheng	e6c4d32e64	mds: waiting for slave request to finish The MDS may receive a client request and find there is an existing slave request. It's possible that another MDS forwarded the request to us, but the MMDSSlaveRequest::OP_FINISH message arrives after the client request. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:24 +08:00
Yan, Zheng	1536e814da	mds: check lock state before eval_gather Locker::eval_gather() can dispatch requests, which may change other locks' states. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:24 +08:00
Yan, Zheng	e1818692d1	mds: don't request CEPH_CAP_PIN from auth mds avoid triggering assert(in->get_loner() >= 0 && in->mds_caps_wanted.empty()) in Locker::file_xsyn() Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:24 +08:00
Yan, Zheng	87ca260488	mds: fix sending resolve message need to send resolve message when mds is in reconnect state Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:24 +08:00
Yan, Zheng	b7d78918de	mds: keep dentry lock in sync state Unlike locks of other types, a dentry lock in an unreadable state can block path traversal, so it should be in sync state as much as possible. This patch makes Locker::try_eval() change the dentry lock's state to sync even when the dentry is freezing. It also makes the migrator check imported dentries' lock states and change the locks' states to sync if necessary. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:23 +08:00
Yan, Zheng	d8440c4cae	mds: avoid leaving bare-bone dirfrags in the cache Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:23 +08:00
Yan, Zheng	b2a137007f	mds: re-issue caps after importing inode After importing inode, the issued caps can be less than the caps client wants. So always re-issue caps after importing inode. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:23 +08:00
Yan, Zheng	3ac08860d4	mds: avoid issuing caps when inode is frozen Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:23 +08:00
Yan, Zheng	31f5b0275e	mds: fix rename notify commit `1d86f77edf` (mds: fix cross-authorty rename race) introduced rename notify, but it puts the code in wrong bracket. This patch also fixes a rename notify related bug in MDCache::handle_mds_failure() Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:23 +08:00
Yan, Zheng	bd561772ba	mds: re-send discover if want_xlocked becomes true If want_xlocked becomes true, we can not rely on previously sent discover because it's likely the previous discover is blocked on the xlocked dentry. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:23 +08:00
Yan, Zheng	913f7fd8db	mds: fix empty directory check Since commit 310032ee81 (fix mds scatter_writebehind starvation), rdlocking a scatter lock does not always propagate dirty fragstats to the corresponding inode. So Server::_dir_is_nonempty() needs to check each dirfrag's stat instead of checking the inode's dirstat. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:23 +08:00
Yan, Zheng	2fea08b59c	mds: merge delayed cache expire Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:23 +08:00
Yan, Zheng	498d5c4998	mds: process delayed expire if exporting dir cancelled in warning state We may add delayed expire when exporting dir is in warning state. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:23 +08:00
Yan, Zheng	0aed0d48c7	mds: handle cache rejoin corner case A recovering MDS may receive a strong cache rejoin from a survivor; if the survivor then restarts, the recovering MDS receives a weak cache rejoin from the same MDS. Before processing the weak cache rejoin, we should scour replicas added by the obsoleted strong cache rejoin. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:23 +08:00
Yan, Zheng	5a902a0e5d	mds: unify nonce type MDSCacheObject::replica_nonce is defined as __s16, but the nonce type in MDSCacheObject::replica_map is int. This mismatch may confuse MDCache::handle_cache_expire(). This patch unifies the nonce type as uint32. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:23 +08:00
Yan, Zheng	0344d9af74	mds: rework stale import/export message detection Current code uses import state to detect obsolete import/export messages. It does not work for this case: cancel a subtree export, export the same subtree again, then the messages for the first export get dispatched. This patch introduces a "transaction ID" for subtree exports. Each subtree export has a unique TID, and the ID is recorded in all import/export related messages. By comparing the TID, we can reliably detect stale messages. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:22 +08:00
Yan, Zheng	9471fdc613	mds: put import/export related states together Current code uses several STL maps to record import/export related states. A map lookup is required for each state access, this is not efficient. It's better to put import/export related states together. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:22 +08:00
Yan, Zheng	ab93aa59bf	mds: freeze tree deadlock detection. There are two situations that result in freeze tree deadlock. First: - mds.0 authpins an item in subtree A - mds.0 sends request to mds.1 to authpin an item in subtree B - mds.0 freezes subtree A - mds.1 authpins an item in subtree B - mds.1 sends request to mds.0 to authpin an item in subtree A - mds.1 freezes subtree B - mds.1 receives the remote authpin request from mds.0 (wait because subtree B is freezing) - mds.0 receives the remote authpin request from mds.1 (wait because subtree A is freezing) Second: - client request authpins items in subtree B - freeze subtree B - import subtree A which is parent of subtree B (authpins parent inode of subtree B, see CDir::set_dir_auth()) - freeze subtree A - client request tries authpinning items in subtree A (wait because subtree A is freezing) Enforcing an authpinning order can avoid the deadlock, but it's very expensive. The deadlock is rare, so I think deadlock detection is more suitable for the case. This patch introduces freeze tree deadlock detection. We record the start time of freezing the tree. If we fail to freeze the tree within a given duration, we cancel the process of freezing the tree. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2013-12-16 12:15:22 +08:00
Sage Weil	edc4224de4	Merge remote-tracking branch 'gh/wip-hitset' Reviewed-by: Greg Farnum <greg@inktank.com> Conflicts: src/common/config_opts.h src/osd/ReplicatedPG.cc src/osdc/Objecter.cc src/vstart.sh	2013-12-15 16:57:23 -08:00
Sage Weil	f192a600c5	Revert "common/Formatter: add newline to flushed output if m_pretty" This reverts commit `d6146b0d91`. As Yehuda points out, this does not properly handle cases where we flush the same output stream multiple times.	2013-12-15 16:23:09 -08:00
Sage Weil	c7b44d6675	Revert "common: fix perf_counters unittests for trailing newline in m_pretty" This reverts commit `ba5572397c`.	2013-12-15 16:22:59 -08:00
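
Commit 824b3d8e84 switches a popen() stream from fclose() to pclose(). The pairing it relies on can be illustrated with a minimal standalone sketch. The helper name `run_and_capture` is hypothetical and is not taken from FileJournal; only popen(), pclose(), and the wait-status macros are standard POSIX.

```cpp
#include <cassert>     // assert
#include <cstdio>      // popen, pclose, fgets (popen/pclose are POSIX)
#include <string>
#include <sys/wait.h>  // WIFEXITED, WEXITSTATUS

// Hypothetical helper: runs `cmd` through the shell and returns its stdout.
// *exit_status receives the command's exit code, or -1 if it could not be
// started or did not exit normally.
std::string run_and_capture(const char *cmd, int *exit_status) {
  assert(cmd != nullptr && exit_status != nullptr);
  std::string out;
  *exit_status = -1;

  FILE *fp = popen(cmd, "r");
  if (!fp)
    return out;

  char buf[256];
  while (fgets(buf, sizeof(buf), fp))
    out += buf;

  // A stream opened with popen() is closed with pclose(), which waits for
  // the child process and returns its termination status.
  int status = pclose(fp);
  if (status != -1 && WIFEXITED(status))
    *exit_status = WEXITSTATUS(status);
  return out;
}
```

Calling `run_and_capture("echo hello", &st)` would be expected to return the command's output and set `st` to its exit code.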
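
Commit fcf6e9878b says get_linux_version() returns the running kernel's version encoded as an int, and commit 6696ab6479 notes it is based on uname(2). The sketch below shows one way such a function could work. The names `encode_version`, `parse_release`, and `linux_version_sketch` are hypothetical, and the bit layout (major << 16, minor << 8, patch) is an assumption borrowed from the Linux kernel's KERNEL_VERSION convention, not something the commit message states.

```cpp
#include <cassert>
#include <cstdio>         // sscanf
#include <sys/utsname.h>  // uname

// Assumed encoding: one int that compares in version order.
// The patch level is clamped so it cannot spill into the minor field.
constexpr int encode_version(int major, int minor, int patch) {
  return (major << 16) + (minor << 8) + (patch > 255 ? 255 : patch);
}

// Parses a release string such as "3.11.0-12-generic".
// Returns 0 if no "major.minor" prefix can be read.
int parse_release(const char *release) {
  assert(release != nullptr);
  int a = 0, b = 0, c = 0;
  if (std::sscanf(release, "%d.%d.%d", &a, &b, &c) < 2)
    return 0;
  return encode_version(a, b, c);
}

// Returns the encoded version of the running kernel, or 0 on failure.
int linux_version_sketch() {
  struct utsname u;
  if (uname(&u) != 0)
    return 0;
  return parse_release(u.release);
}
```

With an encoding like this, a caller can gate behavior with a plain integer comparison, for example `linux_version_sketch() >= encode_version(2, 6, 33)`, instead of parsing /proc/version.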
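
Commit 0344d9af74 describes tagging each subtree export with a unique transaction ID so that messages from a cancelled export can be recognized as stale. A minimal sketch of that idea follows. The class `ExportTracker` and its methods are hypothetical and do not reflect the actual Migrator code; subtrees are identified by a plain string purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical tracker: each export of a subtree gets a fresh TID, and a
// message is stale unless its TID matches the export currently in progress.
class ExportTracker {
  uint64_t last_tid = 0;
  std::map<std::string, uint64_t> active;  // subtree -> TID of current export

public:
  // Starts (or restarts) an export and returns its unique TID.
  uint64_t start_export(const std::string &subtree) {
    uint64_t tid = ++last_tid;
    active[subtree] = tid;
    return tid;
  }

  // Cancels the export in progress for this subtree, if any.
  void cancel_export(const std::string &subtree) { active.erase(subtree); }

  // True if the message belongs to no export or to an earlier one.
  bool is_stale(const std::string &subtree, uint64_t msg_tid) const {
    auto it = active.find(subtree);
    return it == active.end() || it->second != msg_tid;
  }
};
```

This covers the case named in the commit message: after cancelling an export and exporting the same subtree again, a late message carrying the first TID no longer matches and can be dropped.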
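
Commit ab93aa59bf detects freeze tree deadlocks by recording when freezing started and cancelling the freeze if it has not completed within a given duration. The check itself can be sketched in a few lines. The struct `FreezeAttempt`, the function `should_cancel_freeze`, and the use of std::chrono are illustrative assumptions, not the MDS implementation.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical record of one attempt to freeze a subtree.
struct FreezeAttempt {
  Clock::time_point started;  // when freezing began
  bool frozen = false;        // set once the subtree is fully frozen
};

// True if the attempt is still pending after `timeout` has elapsed, which
// this sketch treats as a suspected deadlock that should be cancelled.
bool should_cancel_freeze(const FreezeAttempt &f, Clock::time_point now,
                          std::chrono::seconds timeout) {
  return !f.frozen && (now - f.started) > timeout;
}
```

As the commit message argues, a timeout of this kind gives up on the rare deadlocked freeze rather than paying for a strict authpin ordering on every request.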


30209 Commits