51966 Commits

Author SHA1 Message Date
Sage Weil
82e9f94523 Merge pull request #8428 from liewegas/wip-rest-mds
ceph-rest-api: fix fs/flag/set
2016-04-06 16:10:53 -04:00
Sage Weil
ed31ad64a4 Merge pull request #7981 from liewegas/wip-14364
osdc/Objecter: fix narrow race with tid assignment

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-04-06 15:52:10 -04:00
Sage Weil
b3d27f8ca7 Merge pull request #8403 from dx9/wip-ceph-dencoder-esessions-fix
mds: Add cmapv to ESessions default constructor initializer list

Reviewed-by: Sage Weil <sage@redhat.com>
2016-04-06 15:51:40 -04:00
Sage Weil
683a46cd5a Merge pull request #8419 from adamemerson/wip-32bit-time
common: fix time_t cast in decode

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2016-04-06 15:49:22 -04:00
Sage Weil
78d4fada4b Merge pull request #8431 from liewegas/wip-bluestore
os/bluestore: revamp BlueFS bdev management and add perfcounters
2016-04-06 15:48:45 -04:00
Orit Wasserman
f64a9e3b42 Merge pull request #8445 from jmunhoz/fix-aws4-uri-encoding
rgw: aws4 uri encoding bugfix
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
2016-04-06 20:37:58 +02:00
Sage Weil
7b1ed5dd14 Merge pull request #8450 from javacruft/tasksmax-infinity
systemd: drop any systemd imposed process/thread limits

Reviewed-by: Sage Weil <sage@redhat.com>
2016-04-06 09:36:15 -04:00
Sage Weil
9414befb89 debian/rules: include ceph-mds-*.conf upstart files in ceph-mds
These were lost by a typo in 0cbe3dea69

Fixes: http://tracker.ceph.com/issues/15395
Signed-off-by: Sage Weil <sage@redhat.com>
2016-04-06 08:55:49 -04:00
Jason Dillaman
7c70281e00 Merge pull request #8459 from jdurgin/wip-rbd-op-threads
librbd: disallow unsafe rbd_op_threads values

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2016-04-06 07:46:53 -04:00
John Spray
1238bd8a27 Merge pull request #8455 from liewegas/wip-legacy-layout-zero
mds: fix file_layout_t legacy encoding snafu

Reviewed-by: John Spray <john.spray@redhat.com>
2016-04-06 11:41:24 +01:00
Sage Weil
25d80078e2 os/bluestore: use short, relative paths with bluefs
If we're using bluefs, only pass in the short relative
path (db, db.wal, db.slow).  The leading components
are ignored and only lead to errors if the configuration
provides relative paths that do not match (e.g., if one
is using ceph-objectstore-tool).

Fixes: http://tracker.ceph.com/issues/15376
Signed-off-by: Sage Weil <sage@redhat.com>
2016-04-05 21:26:06 -04:00
Josh Durgin
6c0ab75bce librbd: disallow unsafe rbd_op_threads values
Don't use this config option in librbd until
http://tracker.ceph.com/issues/15034 is avoided.

The option itself is still useful for mirroring threads, where
ordering is unimportant.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2016-04-05 15:32:42 -07:00
Sage Weil
048251b66c common/fs_types: dump pool_id signed
Signed-off-by: Sage Weil <sage@redhat.com>
2016-04-05 16:37:27 -04:00
Sage Weil
cd41ca2968 mds: fix legacy layout decode with pool 0
If your data pool was pool 0, this was transforming
that to -1 unconditionally, which broke upgrades.  We
only want to do that for a fully zeroed ceph_file_layout,
so that it still maps to a file_layout_t.  If any fields
are set, though, we trust the fl_pgpool to be a valid
pool.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-04-05 16:37:25 -04:00
Orit Wasserman
0b09e477a5 Merge pull request #8447 from cbodley/wip-cmake-mrun
mrun: update path to cmake binaries
2016-04-05 21:42:48 +02:00
Kefu Chai
03bf796075 Merge pull request #8430 from wjin/fix
crush: fix error log

Reviewed-by: Kefu Chai <kchai@redhat.com>
2016-04-06 00:37:41 +08:00
James Page
05cafcf19f Drop any systemd imposed process/thread limits
If systemd has task accounting enabled, a default of 512 tasks
will be applied to all systemd units.

For ceph, this is way too low even for a modest cluster, so stop
this restriction being applied and allow administrators to apply
limits using sysctl.

Signed-off-by: James Page <james.page@ubuntu.com>
2016-04-05 17:33:57 +01:00
Casey Bodley
02ab8a2203 mrun: update path to cmake binaries
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2016-04-05 11:20:15 -04:00
Sage Weil
67f8f1fdc0 os/bluestore/BlueFS: add some perfcounters
Mostly utilization-related.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-04-05 11:19:04 -04:00
Sage Weil
75ddd73c31 os/bluestore/BlueFS: revamp bdev ids
You cannot tell from the old bdev vector which device
was which.

- use a fixed id for each type/slot
- go from fast(small) to slow(big)
- normalize the allocation fallback to try any slower
  device.
- clean up the BlueStore instantiation/setup accordingly

Signed-off-by: Sage Weil <sage@redhat.com>
2016-04-05 11:19:04 -04:00
Javier M. Mellid
4f6523dc72 rgw: aws4 uri encoding bugfix
Fixes: http://tracker.ceph.com/issues/15358

Signed-off-by: Javier M. Mellid <jmunhoz@igalia.com>
2016-04-05 16:00:10 +02:00
Loic Dachary
24b924b30b Merge pull request #8131 from ErwanAliasr1/evelu-fast-check
tests: Improving 'make check' execution time

Reviewed-by: Loic Dachary <ldachary@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2016-04-05 15:52:52 +02:00
Kefu Chai
d2481280ee config: fix setuser_match_path typo
Signed-off-by: Kefu Chai <kchai@redhat.com>
2016-04-05 08:53:25 -04:00
Sage Weil
c851e0b954 Merge pull request #8433 from ceph/wip-setuser-match-path
global/global_init: expand metavariables in setuser_match_path

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2016-04-05 08:53:04 -04:00
Orit Wasserman
51b6007520 Merge pull request #8438 from tchaikov/wip-fix-cmake
cmake: fix the build of test_rados_api_list
2016-04-05 11:40:44 +02:00
Erwan Velu
d5ec33fc18 tests: Removing one ceph-dencoder call in check-generated.sh
The first ceph-dencoder call is very unlikely to fail, yet it is a
bottleneck because the parallel computations only start once this test
has completed.

This patch runs the 4 dencoder processes in parallel immediately and
checks the resulting error codes. If one fails, we report the
failure.

As the failure is very unlikely, that saves time and makes the code
simpler too.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:26 +02:00
Erwan Velu
4af1aa6a9d tests: Fixing python statement in ceph_objectstore_tool.py
As reported by Kefu, "if ++try == 150" doesn't do what we are
expecting. This is C-style code; in Python, ++ is just two unary plus
operators and does not increment the variable.

So this patch splits the increment and the test.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:25 +02:00
Erwan Velu
c5fa83f586 tests: Avoiding a fixed 10sec sleep in test_mon_cephdf_commands()
The current code waited a fixed 10s for the file to be put.
If the file was put in less than 10s, the test kept waiting for
nothing, which slowed it down.

This patch checks every second, for up to 10 seconds, whether the file
is actually available, and stops waiting as soon as it is.

This patch saves exactly 10 sec on a local system, and surely a little
less on the test infrastructure, but it still saves time.
Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:25 +02:00
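
A minimal shell sketch of the polling idea described in c5fa83f586, not the actual change made to the test script. The wait_until helper and its arguments are hypothetical names introduced for illustration: it runs a check command once per second and returns as soon as the check succeeds, instead of sleeping for the full timeout up front.

```bash
#!/usr/bin/env bash

# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: polls COMMAND once per second and returns 0 as soon
# as it succeeds. Returns non-zero if it still fails after the timeout.
wait_until() {
    local timeout=$1
    shift
    local elapsed
    for ((elapsed = 0; elapsed < timeout; elapsed++)); do
        # Check first, sleep afterwards: a fast success costs no sleep at all.
        "$@" && return 0
        sleep 1
    done
    # One last check after the final sleep.
    "$@"
}
```

A call such as `wait_until 10 test -e "$some_file"` (with `some_file` being whatever the test expects to appear) would then wait at most about 10 seconds, and usually far less.
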
Erwan Velu
62bdde2cd2 tests: Optimizing sleep sequence in cephtool/test.sh
The current code doubles the wait time between two calls, leading to a
possible 511s of waiting time, which sounds a little bit excessive.

This patch reduces the global wait time to 300s and tests the rados
status more often to exit the loop earlier. In a local test, that
saves 6 secs per run.
Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:25 +02:00
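
The 511s figure in 62bdde2cd2 is consistent with a delay that starts at 1 second and doubles nine times (1 + 2 + 4 + ... + 256). The starting delay and the number of steps are assumptions made here for illustration, since the commit message only states the total. A small shell sketch of that arithmetic, using a hypothetical total_wait function:

```bash
#!/usr/bin/env bash

# total_wait STEPS
# Prints the sum of STEPS delays that start at 1 second and double each
# time (1 + 2 + 4 + ...). Starting delay and step count are assumptions.
total_wait() {
    local steps=$1 delay=1 total=0 i
    for ((i = 0; i < steps; i++)); do
        total=$((total + delay))
        delay=$((delay * 2))
    done
    echo "$total"
}

total_wait 9   # prints 511
```
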
Erwan Velu
8a49a86901 tests: Moving sleep call after action in ceph_watch_wait()
ceph_watch_wait() sleeps _before_ running the test that could
stop this loop.

It's better to do the action first, as it could exit immediately and
avoid a useless sleep.

That's a minor optimization, but everything counts when trying to get
something smooth.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:25 +02:00
Erwan Velu
1b7991e92e tests: Reducing sleep loops in ceph_objectstore_tool
This python script makes excessive sleep calls while running some
ceph commands.

The wait of up to 5 seconds for the proper health status can be shortened
to avoid the worst case of waiting almost 5 seconds for nothing.

This patch also removes two sleep calls after a wait_for_health call,
which is already supposed to provide a clean state. Waiting
respectively 20 & 15 seconds after that call is just losing time, which
is precious at make check time.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:25 +02:00
Erwan Velu
0d254d8916 tests: Reducing sleep time for osd to be up
OSDs take some time to come up, but waiting 10 secs between two loops
seems excessive here. In the worst case, the osd comes up just a few
microsecs after a check and we wait 10 secs for nothing.

This patch simply reduces the sleep from 10 seconds to 1 second.
Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:25 +02:00
Erwan Velu
0eea2436d9 tests: Optimizing kill_daemons() sleep time
It may sound like nothing, but the actual sleep ramp-up is
counterproductive.

The code does: kill <proc>; sleep 0; kill <proc>; sleep 0; kill <proc>;
sleep 1; and then it grows up to 120 seconds by a smooth ramp-up.

But there is almost no chance that the process dies that fast, meaning
that by default we fall through to the sleep 1.

Avoiding the jump from sleep 0 to sleep 1 doesn't seem a big win, but as
kill_daemons() is called very often, we can save a lot of time in the
end.

This patch first sleeps 1/10th of a second instead of 0, and then
1/20th of a second instead of 0.

The sleep call is also moved after the kill call, as there is no need
to wait before executing the command.

This patch makes the running time of a test like osd-scrub-repair.sh
drop from 7m30 to 7m7.

Saving another ~30 seconds is an interesting win at make check level.
Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:25 +02:00
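
A rough shell sketch of the "kill first, then sleep with a short initial delay" pattern described in 0eea2436d9. This is not the kill_daemons() code from the Ceph tree: the kill_with_rampup name and the delay values are hypothetical, and fractional arguments to sleep are assumed to be supported (they are with GNU and BSD sleep, but POSIX does not require it).

```bash
#!/usr/bin/env bash

# kill_with_rampup PID
# Hypothetical sketch: signal the process, then sleep, using short delays
# first. The delay values are illustrative, not those used by kill_daemons().
kill_with_rampup() {
    local pid=$1 delay
    for delay in 0.1 0.2 1 2 5; do
        # kill fails when the process is gone (or cannot be signalled):
        # nothing left to wait for.
        kill "$pid" 2>/dev/null || return 0
        # Sleep only after the kill, so the first attempt is never delayed.
        sleep "$delay"
    done
    # Succeed only if the process has disappeared after the last delay.
    ! kill -0 "$pid" 2>/dev/null
}
```

With sub-second delays at the start, a process that exits promptly costs roughly a tenth of a second per call instead of a full second.
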
Erwan Velu
0dccb6c164 tests: Making "objectstore" calls parallel in osd-scrub-repair.sh
osd-scrub-repair makes several similar objectstore calls
sequentially, although they can easily be parallelized.

Each objectstore call can take up to a dozen seconds, so making
the calls parallel saves a lot of time while keeping the code pretty
simple.

This particular patch saves approx. 2 minutes with the actual code on a recent
laptop. The global running time of osd-scrub-repair drops from 9m33 to
7m37!
Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:25 +02:00
Erwan Velu
84197f1641 tests: Optimizing wait_for_clean()
wait_for_clean() is a very common call when running make check.
It waits for the cluster to be stable before continuing.

This script was making the same calls twice and could be optimized by
making the useful calls only once.

The is_clean() function was checking num_pgs & get_num_active_clean().
The main loop itself was also calling get_num_active_clean().

This patch inlines is_clean() inside this loop to benefit from a
single get_num_active_clean() call. This avoids a useless call of (ceph +
xmlstarlet).

This patch also moves all the 'timer reset' conditions into an else
branch, which avoids spawning another ceph+xmlstarlet call when we
already know we should reset the timer.

The last modification is to reduce the sleeping time, as the state of the
cluster changes very fast.

This whole patch may not look like a big win, but for a test
like test/osd/osd-scrub-repair.sh, we drop from 9m56 to 9m30 while
reducing the number of system calls.

At the scale of make check, that's a lot of saving.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:25 +02:00
Erwan Velu
b3f7392d9d tests: Reducing commands in get_num_active_clean()
get_num_active_clean() is called very often but spawns 1 useless process.
The current "grep -v | wc -l" can easily be replaced by "grep -cv", which
does the same while spawning one process less.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:25 +02:00
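
To illustrate the replacement described in b3f7392d9d, here is a small shell comparison of the two forms. The function names and the sample input are made up for this example and are not taken from get_num_active_clean().

```bash
#!/usr/bin/env bash

# Both functions count the lines on stdin that do NOT match PATTERN.

# Two processes: grep filters, wc counts.
count_with_pipeline() {
    grep -v "$1" | wc -l
}

# One process: grep filters and counts in a single step.
count_with_grep_c() {
    grep -cv "$1"
}

printf 'up\ndown\nup\n' | count_with_grep_c up   # prints 1
```

One difference worth keeping in mind: when the count is 0, `grep -cv` still prints 0 but exits with status 1, whereas the pipeline reports the exit status of `wc`.
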
Erwan Velu
d8f07c3ff6 tests: Killing daemons in parallel
The current code of kill_daemons() was killing daemons one after the
other and waiting for each to actually die before switching to the next one.

This patch makes the kill_daemons() loop run in parallel to avoid
this bottleneck.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:25 +02:00
Erwan Velu
0ac3ac71ec tests: Adding parallelism to check-generated.sh
This script had the following performance issues:
- 4 ceph-dencoders spawned sequentially
- the same dencoder command run twice

This patch adds parallelism around the 4 sequential calls and also
avoids testing the deterministic feature twice.

On a recent laptop, this patch drops the running time from 7mn to 3m46
while keeping the loadavg < 2.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:25 +02:00
Erwan Velu
d66c852b46 tests: Adding parallelism for sequential ceph-dencoder calls
The current code was running two ceph-dencoder calls sequentially.
Each call executes pretty fast, but running them one after the other,
multiplied by the number of loops to execute, has a cost.

This patch just makes these two calls run in parallel.

As a result, the test/encoding/readable.sh test runs in 4m50 instead of 6m.
The associated loadavg isn't impacted, as it stays at 6 while being run with
nproc=8.

This patch saves 1/6th of the build time without impacting the loadavg.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:25 +02:00
Erwan Velu
8b6be11a36 tests: Adding parallelism to encoding/readable.sh
When running make -j x check, we face a weird situation where the makefile
targets are spawned in parallel up to "x", but one of those targets is very, very
long and sequential.

The "readable.sh" test tries to run ~7.9K tests, of which 5.3K are actually
executed.

The current code takes 23mn on a recent laptop (Intel(R) Core(TM)
i7-4810MQ CPU @ 2.80GHz, 32GB of RAM & SSD).

This patch implements parallelism to speed up this process, which is neither
really CPU-bound nor IO-bound.

By default, readable.sh now uses the number of logical processors to determine
the level of parallelism (by using nproc). If needed, defining the MAX_PARALLEL_JOBS
variable will override this default value.

On the same system, where nproc=8, the resulting execution time is 5m55:
4x faster than the original code.

The global 'make check' therefore gets faster too, dropping from 30 to
16 minutes: 2x faster than the original code.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:25 +02:00
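
A simplified shell sketch of the job limit described in 8b6be11a36: the limit defaults to the output of nproc and can be overridden through MAX_PARALLEL_JOBS. Only those two names come from the commit message. The parallel_limit and run_batched functions are hypothetical, and the batching strategy (start a full batch, wait for all of it, start the next) is an assumption chosen for brevity, not necessarily how readable.sh schedules its jobs.

```bash
#!/usr/bin/env bash

# Prints the maximum number of concurrent jobs: MAX_PARALLEL_JOBS if set,
# otherwise the number of logical processors reported by nproc.
parallel_limit() {
    echo "${MAX_PARALLEL_JOBS:-$(nproc)}"
}

# run_batched COMMAND ITEM...
# Runs COMMAND once per ITEM, at most parallel_limit jobs at a time.
# Returns non-zero if any job failed.
run_batched() {
    local cmd=$1
    shift
    local limit failed=0 pids=() pid item
    limit=$(parallel_limit)
    for item in "$@"; do
        "$cmd" "$item" &
        pids+=("$!")
        # Batch is full: wait for every job in it before starting more.
        if (( ${#pids[@]} >= limit )); then
            for pid in "${pids[@]}"; do
                wait "$pid" || failed=1
            done
            pids=()
        fi
    done
    # Wait for the last, possibly partial, batch.
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=1
    done
    return "$failed"
}
```

Waiting for a whole batch leaves processors idle while the slowest job of the batch finishes, so a real implementation may prefer to start a new job as soon as any slot frees up.
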
Erwan Velu
db31cc6cbc tests: Adding parallelism helpers in ceph-helpers.sh
This commit introduces two new functions in ceph-helpers.sh to ease
parallelism in tests.

It's based on two functions: run_in_background() & wait_background()

The first one allows you to spawn processes or functions in the background and saves
the associated pid in a variable passed as first argument.

The second one waits for those pids to complete and reports their exit status.
If one or more failed, then wait_background() reports a failure.

A typical usage looks like:

 pids1=""
 run_in_background pids1 bash -c 'sleep 5; exit 0'
 run_in_background pids1 bash -c 'sleep 1; exit 1'
 run_in_background pids1 my_bash_function
 wait_background pids1

The variable that contains the pids is local, making it possible to do nested calls of
those two new functions.

Signed-off-by: Erwan Velu <erwan@redhat.com>
2016-04-05 09:36:25 +02:00
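
For readers who want to see how such helpers can work, here is a minimal re-implementation sketch based only on the description in db31cc6cbc. It is not the code from ceph-helpers.sh: the function names and calling convention follow the commit message, while the bodies are an assumption about one way to obtain that behaviour in bash.

```bash
#!/usr/bin/env bash

# run_in_background PID_VARIABLE COMMAND [ARGS...]
# Starts COMMAND in the background and appends its pid to the variable
# whose name is given as first argument. (Sketch, not the Ceph code.)
run_in_background() {
    local pid_variable=$1
    shift
    "$@" &
    # ${!pid_variable} reads the caller's variable through its name.
    printf -v "$pid_variable" '%s %s' "${!pid_variable}" "$!"
}

# wait_background PID_VARIABLE
# Waits for every pid stored in the named variable, then empties it.
# Returns 1 if at least one of the background jobs failed.
wait_background() {
    local pid_variable=$1
    local pid status=0
    for pid in ${!pid_variable}; do
        wait "$pid" || status=1
    done
    printf -v "$pid_variable" '%s' ''
    return "$status"
}
```

Both functions must be called from the shell that will do the waiting, since `wait` only works on children of the current shell. The caller's variable should also not be named `pid_variable`, `pid` or `status`, as it would then be shadowed by the locals in this sketch.
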
Kefu Chai
93ace63ff8 cmake: fix the build of test_rados_api_list
the libglobal linkage was added in 769c0af, so add it to cmake
accordingly.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2016-04-05 14:22:03 +08:00
Josh Durgin
cf5d2777b8 Merge pull request #8435 from dillaman/wip-15370
test: TestMirroringWatcher test cases were not closing images

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-04-04 18:11:48 -07:00
Jason Dillaman
b7a5f8bba7 test: TestMirroringWatcher test cases were not closing images
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2016-04-04 18:03:59 -04:00
Sage Weil
8231208828 global/global_init: expand metavariables in setuser_match_path
Back in 8290536d7d we moved the
apply_changes (and, indirectly, config var expansion) to happen
after we drop privileges, but we need the metavar
expansion for setuser_match_path (which docs suggest setting to
/var/lib/ceph/$type/$cluster-$id).

Fixes: http://tracker.ceph.com/issues/15365
Signed-off-by: Sage Weil <sage@redhat.com>
2016-04-04 17:14:33 -04:00
Sage Weil
0f81ac5d87 Merge pull request #8378 from liewegas/wip-pgls-pgid
osdc/Objecter: use full pgid hash in PGNLS ops

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-04-04 16:53:51 -04:00
Samuel Just
72f18a26de Merge pull request #8069 from somnathr/wip-dyn-throttle-doc
Adding documentation on how to use new dynamic throttle scheme

Reviewed-by: Samuel Just <sjust@redhat.com>
2016-04-04 12:54:32 -07:00
Sage Weil
ec8318df70 Merge pull request #8429 from ErwanAliasr1/evelu-broken-cephtool-test-mon
tests: Fixing broken test/cephtool-test-mon.sh test

Reviewed-by: Loic Dachary <ldachary@redhat.com>
2016-04-04 15:48:10 -04:00
Orit Wasserman
9eca65f328 Merge pull request #8411 from theanalyst/rgw/unused-var
rgw_admin: remove unused parent_period arg
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
2016-04-04 20:22:43 +02:00
John Coyle
1a6c686125 mds: Add cmapv to ESessions default constructor initializer list
Fixes uninitialized values in cmapv which cause ceph-dencoder tests to fail.

Signed-off-by: John Coyle <dx9err@gmail.com>
2016-04-04 13:59:51 -04:00