Make handle_connect_msg follow the lock rule: release any held lock before
acquiring the messenger's lock; otherwise a deadlock can occur.
Also strengthen the state check after re-locking, because the connection's
state may change while it is unlocked and locked again.
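A minimal sketch of that ordering, in illustrative Python (the actual code
is C++; the names here are hypothetical):

```python
import threading

class Conn:
    def __init__(self):
        self.lock = threading.Lock()   # per-connection lock
        self.state = "open"

def handle_connect(conn, messenger_lock):
    # Lock rule: never take the messenger's lock while holding the
    # connection lock, or lock-order inversion can deadlock.
    # Caller holds conn.lock on entry.
    conn.lock.release()                # drop our own lock first
    with messenger_lock:
        pass                           # work that needs the messenger lock
    conn.lock.acquire()                # re-acquire, then re-check the
    return conn.state == "open"        # state: it may have changed meanwhile
```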
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
Now mark_down/mark_down_all dispatch a reset event. If we then call
Messenger::shutdown/wait, the reset event may be called after the
Messenger has been deallocated.
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
In order to avoid a deadlock like:
1. mark_down_all is called while holding the lock
2. ms_dispatch_reset is dispatched
3. get_connection wants to take the lock
4. deadlock
we signal a workerpool barrier and wait for all in-queue events to complete.
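The barrier idea can be sketched like this (illustrative Python; the real
code uses the WorkerPool's barrier, and these names are hypothetical):

```python
import queue
import threading

def drain_with_barrier(event_queue):
    # Submit a barrier event and wait until the worker has consumed
    # every event queued before it; only then is it safe to dispatch
    # the reset event without risking the deadlock above.
    done = threading.Event()
    sentinel = done.set
    event_queue.put(sentinel)

    def worker():
        while True:
            fn = event_queue.get()
            fn()
            if fn is sentinel:
                return

    threading.Thread(target=worker).start()
    done.wait()      # all earlier in-queue events have now run
```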
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
Previously, if the caller wanted to mark_down a connection and the caller
was itself an event-thread callback, it would block waiting for a wakeup.
Meanwhile, the event thread expected to signal the blocked thread might
itself want to mark_down a connection owned by the already-blocked thread,
so a deadlock occurred.
As a tradeoff, introduce a lock on file_events that serializes
create/delete of file_event callbacks, so we no longer need to wait for
the callback.
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
Learn from commit 2d4dca757e for SimpleMessenger:
If binding to an IP address fails, delay and retry.
This happens mainly on IPv6 deployments: due to DAD (Duplicate Address
Detection) or SLAAC, IPv6 may not yet be available when the daemons start.
Monitor daemons try to bind to a static IPv6 address that may not be
available yet, which causes the monitor not to start.
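A sketch of the retry loop (illustrative Python; the retry count and delay
parameters here are hypothetical, not actual Ceph option names):

```python
import errno
import time

def bind_with_retry(bind_fn, retries=3, delay=0.01):
    # Retry a failed bind: with IPv6, DAD/SLAAC can leave a static
    # address briefly unavailable at daemon start, so EADDRNOTAVAIL is
    # retried after a short delay instead of aborting immediately.
    for attempt in range(retries):
        try:
            return bind_fn()
        except OSError as e:
            if e.errno != errno.EADDRNOTAVAIL or attempt == retries - 1:
                raise
            time.sleep(delay)
```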
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
Completely avoid the extra thread in AsyncMessenger now. The bind socket
is treated as a normal socket, and a random Worker thread is dispatched
to handle its accept events.
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
Now 2-4 async op threads can fully meet an OSD's network demand with an
SSD backend, so we can bind this limited number of threads to specific
cores. This improves async event loop performance, because most
structures and methods are then processed within a single thread.
For example,
ms_async_op_threads = 2
ms_async_affinity_cores = 0,3
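The mapping implied by that example can be sketched as follows
(illustrative Python; the real pinning happens in C++ via CPU affinity
calls, and this helper name is hypothetical):

```python
def assign_cores(op_threads, affinity_cores):
    # Parse a "0,3"-style core list and assign one core per op thread;
    # threads beyond the list are left unpinned (None).
    cores = [int(c) for c in affinity_cores.split(",") if c.strip()]
    return [cores[i] if i < len(cores) else None
            for i in range(op_threads)]
```

With ms_async_op_threads = 2 and ms_async_affinity_cores = 0,3, thread 0
is pinned to core 0 and thread 1 to core 3.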
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
undersized not valid: undersized not in inactive|unclean|stale
undersized not valid: undersized doesn't represent an int
Invalid command: unused arguments: ['undersized']
pg dump_stuck {inactive|unclean|stale [inactive|unclean|stale...]} {<int>} : show information about stuck pgs
Signed-off-by: xinxin shu <xinxin.shu@intel.com>
We no longer convert stores on upgrade. Users coming from bobtail or
before should go through an interim version such as cuttlefish, dumpling,
firefly or giant.
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
People upgrading from bobtail or previous clusters should first go
through an interim version (quite a few to pick from: cuttlefish,
dumpling, firefly, giant).
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
3600 will mean every hour, on the hour; 60 will mean every minute, on
the minute. This will allow the monitors to emit the info at regular
intervals, regardless of the time at which they formed quorum or which
monitor is currently the leader.
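The "on the hour / on the minute" behaviour is just aligning ticks to
multiples of the interval; a one-line illustrative Python sketch:

```python
def next_tick(now, interval):
    # With interval=3600, ticks land exactly on the hour; with
    # interval=60, on the minute -- independent of when quorum formed
    # or which monitor currently leads.
    return ((now // interval) + 1) * interval
```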
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
By caching the summary string we can avoid writing dups on clog.
We will still write dups every 'mon_health_to_clog_interval', to make
sure that we still output health status every now and then, but we
increased the interval from 120 seconds to 3600 seconds -- once every
hour unless the health status changes.
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
Instead of writing the health status only when a user action calls
get_health(), have the monitor write it every X seconds.
Adds a new config option 'mon_health_to_clog_tick_interval' (default:
60 [seconds]), and changes the default value of
'mon_health_to_clog_interval' from 60 (seconds) to 120 (seconds).
If 'mon_health_to_clog' is 'true' and 'mon_health_to_clog_tick_interval'
is greater than 0.0, the monitor will now start a tick event when it
wins an election (meaning, only the leader will write this info to
clog).
This tick will, by default, run every 60 seconds. It will call
Monitor::get_health() to obtain the current health summary and overall
status. If the overall status is the same as the cached status, it will
be ignored, unless the last write to clog happened more than
'mon_health_to_clog_interval' seconds ago (default: 120).
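The tick's decision rule described above amounts to the following
(illustrative Python sketch, not the actual C++):

```python
def should_write_to_clog(status, cached_status, last_write, now,
                         interval=120):
    # Write when the overall status changed, or when at least
    # mon_health_to_clog_interval seconds have passed since the last
    # write, so duplicates still show up periodically.
    return status != cached_status or (now - last_write) >= interval
```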
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
Output the health summary to clog on Monitor::get_health() (called
during, e.g., 'ceph -s', 'ceph health' and the like) if
'mon_health_to_clog' is true (default: false) and the last update is at
least 'mon_health_to_clog_interval' seconds old (default: 60.0).
This patch is far from optimal for several reasons though:
1. health summary is still generated on-the-fly by the monitor each time
Monitor::get_health() is called.
2. health summary will only be outputted to clog IF and WHEN
Monitor::get_health() is called.
3. the patch does not account for duplicate summaries. We may have the
same string output every time Monitor::get_health() is called (as long
as enough time has passed since we last wrote to clog).
4. each monitor will output to clog independently of the other monitors.
This means that running 'ceph -s' 3 times in a row, on a cluster with at
least 3 monitors, may result in writing the same string 3 times.
5. we reduce the number of writes to clog by caching the last overall
health status. We only write to clog if the overall status differs from
the cached value OR enough time has passed since we last wrote to clog.
This may mean ignoring new contributing factors to overall cluster
health that by themselves do not change the overall status; and even
though we will pick up on them once enough time has passed, we may end
up losing intermediate states (which may be fine if they are transient,
but less so if they reflect some kind of instability).
Fixes: #9440 (even if in a poor manner)
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
Was returning ENOENT, should succeed for 'fail' on
a non-existent name, as the fail operation makes
it cease to exist.
Signed-off-by: John Spray <john.spray@redhat.com>
The json-pretty format was modified for readability and now includes
additional newlines / spaces. Either switch to json to avoid dealing
with space changes or modify the expected output to include them.
http://tracker.ceph.com/issues/10547
Fixes: #10547
Signed-off-by: Loic Dachary <ldachary@redhat.com>
When Formatter::create replaced new_formatter, the handling of an
invalid format was also incorrectly changed. When an invalid format (for
instance "plain") was specified, new_formatter returned a NULL pointer,
which was sometimes handled by creating a json-pretty formatter and
sometimes handled differently.
A new Formatter::create prototype with a fallback argument is added; the
fallback is used when it is not the empty string and the requested
format is not known. This prototype is used where a NULL return from
new_formatter used to be replaced by a json-pretty formatter.
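The fallback behaviour can be sketched as follows (illustrative Python;
the real prototype is the C++ Formatter::create, and the format list
here is only an example):

```python
KNOWN_FORMATS = ("json", "json-pretty", "xml", "xml-pretty")

def create_formatter(fmt, fallback=""):
    # If fmt is unknown (e.g. "plain") and a non-empty fallback was
    # given, create the fallback formatter instead of returning None.
    if fmt in KNOWN_FORMATS:
        return fmt
    return fallback if fallback else None
```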
http://tracker.ceph.com/issues/10547
Fixes: #10547
Signed-off-by: Loic Dachary <ldachary@redhat.com>
These can happen with split or with state changes due to reordering
results within the hash range requested. It's easy enough to filter
them out at this stage.
Backport: giant, firefly
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Because fs reset opens a brief window for the previously
failed MDSs to spring back into life.
Fixes: #10539
Signed-off-by: John Spray <john.spray@redhat.com>
Replace ceph-helpers.sh check for ms_nocrc with the new formula
for this. Fixes make check for default build.
Additionally, fix linkage of several unittests when building with
--enable-xio.
xio: add missing noinst headers
The common/address_helper.h file was not mentioned; neither was
msg/xio/XioSubmit.h.
Fix for Message.cc compilation error when Xio disabled.
Mention simple_dispatcher.h and xio_dispatcher.h in noinst_HEADERS.
xio: require boost-regex.
Make address_helper conditional on Xio.
This carries over to simple_client/simple_server,
for convenience.
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
Enable Accelio debug (mostly on connections) at level 2
and sync with XioConnection debug events.
Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Matt Benjamin <matt@cohortfs.com>
Get the right Accelio errno code from xio_send_msg in
order to correctly requeue or fail the xmsg.
Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Matt Benjamin <matt@cohortfs.com>