RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2024-12-25 04:43:17 +00:00

Author	SHA1	Message	Date
John Spray	0f0964263c	mds: tidy up backtrace pool handling Include the pool ID in the log message when we get a write error (and don't say "dir ino" because this is also the path for files) Move the duplicated logic for picking the pool out into get_backtrace_pool() In get_backtrace_pool(), assert that files do indeed have a pool set. Signed-off-by: John Spray <john.spray@redhat.com>	2016-04-06 13:05:17 +01:00
John Spray	4ddcf415ef	mds: health metric for being read only This is the state we get after an OSD write error, so it's definitely something we want to tell the user about in ceph status. Signed-off-by: John Spray <john.spray@redhat.com>	2016-04-06 13:05:16 +01:00
Jason Dillaman	7c70281e00	Merge pull request #8459 from jdurgin/wip-rbd-op-threads librbd: disallow unsafe rbd_op_threads values Reviewed-by: Jason Dillaman <dillaman@redhat.com>	2016-04-06 07:46:53 -04:00
John Spray	1238bd8a27	Merge pull request #8455 from liewegas/wip-legacy-layout-zero mds: fix file_layout_t legacy encoding snafu Reviewed-by: John Spray <john.spray@redhat.com>	2016-04-06 11:41:24 +01:00
xie xingguo	79b19a6ca5	osd: cancel scrub if noscrub is set for pool or all The sched_scrub() method can be called from various code paths, such as OSD::tick(), or triggered by a scrub_reserve_reply message. sched_scrub() checks whether noscrub is set globally or for a specific pool before it actually schedules a scrub job. However, if the noscrub flag is set for one pool, scrubs of other pools are still legal and should be granted. The problem is that we may stop a pg's scrub at an intermediate stage, because the corresponding pool's noscrub flag was set, without releasing the reservation. As a result, pgs of a different pool are prevented from scrubbing because we have already hit the reservation limit. Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>	2016-04-06 13:36:58 +08:00
xie xingguo	4d3aef75ec	osd: reset tp handle when search for boundary of chunky-scrub One of the tests in our local testbed shows that if the number of snapshots becomes extremely large, chunky-scrub() may encounter a heartbeat failure. This is because it takes a really long time to traverse and determine the boundary for a single run of chunk scrub in this case. This PR tries to solve the problem by resetting the tp handle passed in once in a while (after a certain number of loops, 64 by default), since the search can become very time-consuming. Furthermore, the later BUILD_MAP stage would encounter the same problem, but it has already been fixed in the same way. Therefore, although the test case is rare, this change is defensive, makes our code more robust, and should be considered worthwhile. Fixes: tracker.ceph.com/issues/12892 Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>	2016-04-06 11:07:38 +08:00
Sage Weil	25d80078e2	os/bluestore: use short, relative paths with bluefs If we're using bluefs, only pass in the short relative path (db, db.wal, db.slow). The leading components are ignored and only lead to errors if the configuration provides relative paths that do not match (e.g., if one if using ceph-objectstore-tool). Fixes: http://tracker.ceph.com/issues/15376 Signed-off-by: Sage Weil <sage@redhat.com>	2016-04-05 21:26:06 -04:00
Jenkins Build Slave User	ce50389b77	10.1.1	2016-04-06 00:45:19 +00:00
Josh Durgin	6c0ab75bce	librbd: disallow unsafe rbd_op_threads values Don't use this config option in librbd until http://tracker.ceph.com/issues/15034 is avoided. The option itself is still useful for mirroring threads, where ordering is unimportant. Signed-off-by: Josh Durgin <jdurgin@redhat.com>	2016-04-05 15:32:42 -07:00
Matt Benjamin	bb4c2cacb2	librgw/rgw_file: correctly handle object permissions Implement the full object permission model for librgw (aka, NFS and similar) operations. Fixes DIRS1 unit tests. Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>	2016-04-05 18:30:32 -04:00
Matt Benjamin	6851822afe	rgw_file: print DIRS1 read parameters at verbose Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>	2016-04-05 18:30:26 -04:00
Matt Benjamin	d84f55f3fe	rgw_file: fix attributes for "special" test cases If a caller does an atomic create using rgw_lookup() and RGW_LOOKUP_FLAG_CREATE, it needs to fix up the attributes using create_stat(). For use outside of test cases, it probably needs an interlock also, but for now, do just enough to satisfy existing attribute checks. Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>	2016-04-05 18:30:20 -04:00
Matt Benjamin	1bd1ffda8d	rgw_file unit tests: validate Unix owners in DIRS1 Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>	2016-04-05 18:30:14 -04:00
Robin H. Johnson	8e2c804a3a	authtool: update --help and manpage to match code. Signed-off-by: Robin H. Johnson <robin.johnson@dreamhost.com>	2016-04-05 22:10:13 +00:00
Robin H. Johnson	dffd867285	build: Respect TMPDIR for virtualenv. Gentoo's normal build process uses a sandbox to catch writes outside the build environment; this includes providing a value other than /tmp for TMPDIR. Use TMPDIR by default for CEPH_BUILD_VIRTUALENV. Signed-off-by: Robin H. Johnson <robin.johnson@dreamhost.com>	2016-04-05 21:28:41 +00:00
Sage Weil	048251b66c	common/fs_types: dump pool_id signed Signed-off-by: Sage Weil <sage@redhat.com>	2016-04-05 16:37:27 -04:00
Sage Weil	cd41ca2968	mds: fix legacy layout decode with pool 0 If you data pool was pool 0, this was transforming that to -1 unconditionally, which broke upgrades. We only want do that for a fully zeroed ceph_file_layout, so that it still maps to a file_layout_t. If any fields are set, though, we trust the fl_pgpool to be a valid pool. Signed-off-by: Sage Weil <sage@redhat.com>	2016-04-05 16:37:25 -04:00
Orit Wasserman	0b09e477a5	Merge pull request #8447 from cbodley/wip-cmake-mrun mrun: update path to cmake binaries	2016-04-05 21:42:48 +02:00
Casey Bodley	b8e045844f	rgw: retry read_log_info() while master is down Signed-off-by: Casey Bodley <cbodley@redhat.com>	2016-04-05 14:07:22 -04:00
Kefu Chai	03bf796075	Merge pull request #8430 from wjin/fix crush: fix error log Reviewed-by: Kefu Chai <kchai@redhat.com>	2016-04-06 00:37:41 +08:00
James Page	05cafcf19f	Drop any systemd imposed process/thread limits If systemd has task accounting enabled, a default of 512 tasks will be applied to all systemd units. For ceph, this is way too low even for a modest cluster, so stop this restriction from being applied and allow administrators to apply limits using sysctl. Signed-off-by: James Page <james.page@ubuntu.com>	2016-04-05 17:33:57 +01:00
Casey Bodley	02ab8a2203	mrun: update path to cmake binaries Signed-off-by: Casey Bodley <cbodley@redhat.com>	2016-04-05 11:20:15 -04:00
Sage Weil	67f8f1fdc0	os/bluestore/BlueFS: add some perfcounters Most utilization-related. Signed-off-by: Sage Weil <sage@redhat.com>	2016-04-05 11:19:04 -04:00
Sage Weil	75ddd73c31	os/bluestore/BlueFS: revamp bdev ids You cannot tell from the old bdev vector which device was which. - use a fixed id for each type/slot - go from fast(small) to slow(big) - normalize the allocation fallback to try any slower device. - clean up the BlueStore instantiation/setup accordingly Signed-off-by: Sage Weil <sage@redhat.com>	2016-04-05 11:19:04 -04:00
Sage Weil	a5564a664c	os/ObjectStore: make device uuid probe output something friendly Otherwise, all you see is errors about the probes that failed (e.g., a failure to decode a non-bluestore superblock as bluestore). Signed-off-by: Sage Weil <sage@redhat.com>	2016-04-05 11:10:54 -04:00
Javier M. Mellid	4f6523dc72	rgw: aws4 uri encoding bugfix Fixes: http://tracker.ceph.com/issues/15358 Signed-off-by: Javier M. Mellid <jmunhoz@igalia.com>	2016-04-05 16:00:10 +02:00
Sage Weil	bc9607b800	mon/OSDMonitor: fix off-by-one for osd_map_message_max For most messages we were sending osd_map_message_max + 1 maps. Signed-off-by: Sage Weil <sage@redhat.com>	2016-04-05 09:58:02 -04:00
Loic Dachary	24b924b30b	Merge pull request #8131 from ErwanAliasr1/evelu-fast-check tests: Improving 'make check' execution time Reviewed-by: Loic Dachary <ldachary@redhat.com> Reviewed-by: Kefu Chai <kchai@redhat.com>	2016-04-05 15:52:52 +02:00
Sage Weil	81cc288931	osd: improve full map requests If we don't get all the full maps we want, request more immediately. Signed-off-by: Sage Weil <sage@redhat.com>	2016-04-05 09:45:33 -04:00
Sage Weil	2e22f54c5f	osd: create rerequest_full_maps() helper Signed-off-by: Sage Weil <sage@redhat.com>	2016-04-05 09:45:04 -04:00
Yan, Zheng	961a46f57e	client: fix pool permission check handle pool namespace Signed-off-by: Yan, Zheng <zyan@redhat.com>	2016-04-05 21:21:44 +08:00
Kefu Chai	d2481280ee	config: fix setuser_match_path typo Signed-off-by: Kefu Chai <kchai@redhat.com>	2016-04-05 08:53:25 -04:00
Sage Weil	c851e0b954	Merge pull request #8433 from ceph/wip-setuser-match-path global/global_init: expand metavariables in setuser_match_path Reviewed-by: Josh Durgin <jdurgin@redhat.com> Reviewed-by: Kefu Chai <kchai@redhat.com>	2016-04-05 08:53:04 -04:00
Orit Wasserman	51b6007520	Merge pull request #8438 from tchaikov/wip-fix-cmake cmake: fix the build of test_rados_api_list	2016-04-05 11:40:44 +02:00
Erwan Velu	d5ec33fc18	tests: Removing one ceph-dencoder call in check-generated.sh The first ceph-dencoder call is very unlikely to fail and represents a bottleneck, as the parallel computations are only done once this test is completed. The idea of this patch is to immediately run the 4 dencoder processes in parallel and check the resulting error codes. If one fails, we report the failure. As failure is very unlikely, this saves time and makes the code simpler too. Signed-off-by: Erwan Velu <erwan@redhat.com>	2016-04-05 09:36:26 +02:00
Erwan Velu	4af1aa6a9d	tests: Fixing python statement in ceph_objectstore_tool.py As reported by Kefu, "if ++try == 150" doesn't do what we are expecting. This is C-style coding which is invalid in Python. So this patch is splitting the increment and the test. Signed-off-by: Erwan Velu <erwan@redhat.com>	2016-04-05 09:36:25 +02:00
Erwan Velu	c5fa83f586	tests: Avoiding a fixed 10sec sleep in test_mon_cephdf_commands() The current code waited 10s for the file to be put. If the file was put in less than 10s, the test waited for nothing, slowing down that test. This patch simply checks every second, for up to 10 seconds, whether the file is actually available, so that it can exit early. This patch saves exactly 10 sec on a local system, surely a little less on an infra, but it still saves time. Signed-off-by: Erwan Velu <erwan@redhat.com>	2016-04-05 09:36:25 +02:00
Erwan Velu	62bdde2cd2	tests: Optimizing sleep sequence in cephtool/test.sh The current code doubles the wait time between two calls, leading to a possible 511s of waiting time, which sounds a little excessive. This patch reduces the global wait time to 300s and tests the rados status more often to exit the loop earlier. In a local test, that saves 6 secs per run. Signed-off-by: Erwan Velu <erwan@redhat.com>	2016-04-05 09:36:25 +02:00
Erwan Velu	8a49a86901	tests: Moving sleep call after action in ceph_watch_wait() ceph_watch_wait() is doing a sleep _before_ doing the test which could stop this loop. It's better doing the action first as it could exit immediately and avoid a useless sleep. That's a minor optimization but everything count when trying to get something smooth. Signed-off-by: Erwan Velu <erwan@redhat.com>	2016-04-05 09:36:25 +02:00
Erwan Velu	1b7991e92e	tests: Reducing sleep loops in ceph_objectstore_tool This python script is making excessive sleep calls while running some ceph commands. Waiting up to 5 seconds to get the proper health status can be shorten to avoid the worst case of waiting almost 5 seconds for nothing. This patch removes also two sleeps calls after a wait_for_health call which is already supposed to provides a clean state. Waiting respectively 20 & 15 seconds after that call is just loosing time which is precious at make check time. Signed-off-by: Erwan Velu <erwan@redhat.com>	2016-04-05 09:36:25 +02:00
Erwan Velu	0d254d8916	tests: Reducing sleep time for osd to be up OSDs take some time to come up, but waiting 10 secs between two loops seems excessive here. In the worst case, we can end up waiting 10 secs for nothing because the osd came up just a few microsecs earlier. This patch simply reduces the sleep from 10 seconds to 1 second. Signed-off-by: Erwan Velu <erwan@redhat.com>	2016-04-05 09:36:25 +02:00
Erwan Velu	0eea2436d9	tests: Optimizing kill_daemons() sleep time It may sound like nothing, but the current sleep ramp-up is counterproductive. The code does: kill <proc>; sleep 0; kill <proc>; sleep 0; kill <proc>; sleep 1; and then it grows up to 120 seconds by a smooth ramp-up. But there is almost no chance the process dies that fast, meaning that by default we fall through to the sleep 1. Moving from sleep 0 to sleep 1 doesn't seem like a big win, but as kill_daemons() is called very often we can save a lot of time in the end. This patch first sleeps 1/10th of a second instead of 0 and then 1/20th of a second instead of 0. The sleep call is also moved after the kill call, as there is no need to wait before executing the command. This patch makes the running time of a test like osd-scrub-repair.sh drop from 7m30 to 7m7. Saving another ~30 seconds is an interesting win at the make check level. Signed-off-by: Erwan Velu <erwan@redhat.com>	2016-04-05 09:36:25 +02:00
Erwan Velu	0dccb6c164	tests: Making "objectstore" calls parallel in osd-scrub-repair.sh osd-scrub-repair makes several similar objectstore calls sequentially, while they could easily be parallelized. Each single objectstore call can take up to a dozen seconds, so making the calls parallel saves a lot of time while keeping the code pretty simple. This particular patch saves approx. 2 minutes on the current code on a recent laptop. The global running time of osd-scrub-repair drops from 9m33 to 7m37! Signed-off-by: Erwan Velu <erwan@redhat.com>	2016-04-05 09:36:25 +02:00
Erwan Velu	84197f1641	tests: Optimizing wait_for_clean() wait_for_clean() is a very common call when running the make check. It does wait the cluster to be stable before continuing. This script was doing the same calls twice and could be optimized by making the useful calls only once. is_clean() function was checking num_pgs & get_num_active_clean() The main loop itself was also calling get_num_active_clean() This patch is inlining the is_clean() inside this loop to benefit from a single get_num_active_clean() call. This avoid a useless call of (ceph + xmlstarlet). This patch does move all the 'timer reset' conditions into an else avoiding spawning other ceph+xmlstarlet call while we already know we should reset the timer. The last modification is to reduce the sleeping time as the state of the cluster is changing very fast. This whole patch could looks like almost not a big win but for a test like test/osd/osd-scrub-repair.sh, we drop from 9m56 to 9m30 while reducing the number system calls. At the scale of make check, that's a lot of saving. Signed-off-by: Erwan Velu <erwan@redhat.com>	2016-04-05 09:36:25 +02:00
Erwan Velu	b3f7392d9d	tests: Reducing commands in get_num_active_clean() get_num_active_clean() is called very often but spawn 1 useless process. The current "grep -v \| wc -l" can be easily replaced by "grep -cv" which do the same while spawning one process less. Signed-off-by: Erwan Velu <erwan@redhat.com>	2016-04-05 09:36:25 +02:00
Erwan Velu	d8f07c3ff6	tests: Killing daemons in parallel The current code of kill_daemons() was killing daemons one after the other and wait it to actually die before switching to the next one. This patch makes the kill_daemons() loop being run in parallel to avoid this bottleneck. Signed-off-by: Erwan Velu <erwan@redhat.com>	2016-04-05 09:36:25 +02:00
Erwan Velu	0ac3ac71ec	tests: Adding parallelism to check-generated.sh This script had the following performance issues: - 4 ceph-dencoders spawned sequentially - the same dencoder command run twice This patch adds parallelism around the 4 sequential calls and also avoids testing the deterministic feature twice. On a recent laptop, this patch drops the running time from 7mn to 3m46 while keeping the loadavg < 2. Signed-off-by: Erwan Velu <erwan@redhat.com>	2016-04-05 09:36:25 +02:00
Erwan Velu	d66c852b46	tests: Adding parallelism for sequential ceph-dencoder calls The current code was running sequentially two ceph-dencoder calls. This process is executed pretty fast but adding sequentiality and by the number of loops to execute, it have a cost. This patch is just making this two calls being run in parallel. As a result, the test/encoding/readable.sh test is running in 4m50 instead of 6. The associate loadavg isn't impacted as it stays at 6 while being run with nproc=8. This patch save 1/6th of building time without impact the loadavg. Signed-off-by: Erwan Velu <erwan@redhat.com>	2016-04-05 09:36:25 +02:00
Erwan Velu	8b6be11a36	tests: Adding parallelism to encoding/readable.sh When running make -j x check, we face a weird situation where the makefile targets are spawn in parallel up to "x" but one of those target is very very long and sequential. The "readable.sh" test is trying to run ~7.9K tests where 5.3K are actually executed. The current code is taking 23mn on a recent laptop (Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz, 32GB of RAM & SSD). This patch implements parallelism to speed up this process which is not really CPU and neither IO bound. By default, readable.sh is now using the number of logical processors to determine the level of parallelism (by using nproc). If needed, defining the MAX_PARALLEL_JOBS variable will override this default value. On the same system, where nproc=8, the resulting execution time is 5m55 seconds : 4x faster than the original code. The global 'make check' is therefore getting faster too and dropped from 30 to 16 minutes : 2x faster than the original code. Signed-off-by: Erwan Velu <erwan@redhat.com>	2016-04-05 09:36:25 +02:00
Erwan Velu	db31cc6cbc	tests: Adding parallelism helpers in ceph-helpers.sh This commit introduces two new functions in ceph-helpers.sh to ease parallelism in tests: run_in_background() & wait_background(). The first one allows you to spawn processes or functions in the background and saves the associated pid in a variable passed as first argument. The second one waits for those pids to complete and reports their exit status. If one or more failed, then wait_background() reports a failure. A typical usage looks like: pids1="" run_in_background pids1 bash -c 'sleep 5; exit 0' run_in_background pids1 bash -c 'sleep 1; exit 1' run_in_background pids1 my_bash_function wait_background pids1 The variable that contains the pids is local, making it possible to nest calls of those two new functions. Signed-off-by: Erwan Velu <erwan@redhat.com>	2016-04-05 09:36:25 +02:00
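
Commit db31cc6cbc describes a pair of helpers that collect background pids in a caller-named variable and then wait on all of them. A minimal sketch of that pattern follows. It is an illustrative re-implementation written for this page, not the code from ceph-helpers.sh, and the internal variable names (`_rib_var`, `_wb_pids`, and so on) are assumptions.

```shell
# Illustrative sketch only; NOT the actual ceph-helpers.sh implementation.

# run_in_background VAR cmd [args...]
# Start cmd in the background and append its pid to the variable named VAR.
run_in_background() {
    _rib_var=$1
    shift
    "$@" &
    # Expands to e.g.: pids="${pids} 1234"
    eval "$_rib_var=\"\${$_rib_var} $!\""
}

# wait_background VAR
# Wait for every pid stored in VAR; return 1 if at least one of them failed.
wait_background() {
    _wb_var=$1
    eval "_wb_pids=\${$_wb_var}"
    _wb_status=0
    for _wb_pid in $_wb_pids; do
        wait "$_wb_pid" || _wb_status=1
    done
    # Empty the list so the same variable can be reused.
    eval "$_wb_var=''"
    return $_wb_status
}

# Usage: one job succeeds, one fails, so the overall result is a failure.
pids=""
run_in_background pids sh -c 'exit 0'
run_in_background pids sh -c 'exit 1'
if wait_background pids; then
    echo "all jobs succeeded"
else
    echo "at least one job failed"
fi
```

Passing the variable name rather than its value lets each caller keep its own pid list, which is what makes nested use possible as the commit message notes.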
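
Several of the commits above (c5fa83f586, 0d254d8916, 8a49a86901) replace a fixed sleep with a loop that checks first and sleeps only if the condition is not yet met. The sketch below shows that idea in isolation. The helper name `wait_until` and its arguments are hypothetical and do not come from the Ceph test scripts.

```shell
# Hypothetical helper for illustration; not taken from the Ceph tree.

# wait_until TIMEOUT cmd [args...]
# Run cmd immediately; if it fails, retry once per second until it succeeds
# or TIMEOUT seconds of sleeping have elapsed. Returns 0 on success, 1 on timeout.
wait_until() {
    _wu_timeout=$1
    shift
    _wu_elapsed=0
    # Check first, sleep afterwards: a condition that already holds costs no sleep.
    while ! "$@"; do
        if [ "$_wu_elapsed" -ge "$_wu_timeout" ]; then
            return 1
        fi
        sleep 1
        _wu_elapsed=$((_wu_elapsed + 1))
    done
    return 0
}

# Usage: the condition is already true, so this returns without sleeping.
if wait_until 10 test -d /; then
    echo "condition met"
fi
```

With a 1-second interval, the worst-case wasted wait is about one second instead of the full fixed delay.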
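
Commit b3f7392d9d replaces a `grep -v ... | wc -l` pipeline with `grep -cv`. The short comparison below uses made-up input lines (the pg states are placeholders, not real `ceph` output) to show that both forms count the same lines while the second spawns one process fewer.

```shell
# Hypothetical input: one state per line.
states='active+clean
active+degraded
active+clean
peering'

# Two processes: grep drops matching lines, wc counts what is left.
count_pipe=$(printf '%s\n' "$states" | grep -v 'active+clean' | wc -l)

# One process: -v inverts the match and -c prints the count of selected lines.
count_single=$(printf '%s\n' "$states" | grep -cv 'active+clean')

# $((...)) strips any padding that some wc implementations add.
echo "pipe: $((count_pipe)) single: $count_single"   # prints "pipe: 2 single: 2"
```

One difference to keep in mind: `grep -c` exits with status 1 when the count is 0, whereas the exit status of the pipeline is that of `wc`, so scripts running under `set -e` may need to handle the zero case.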


52238 Commits