Performance evaluations of medium- to large-sized Ceph clusters have
demonstrated negligible performance impact from unnecessarily deep
directory hierarchies but significant performance impact from filestore
split and merge activity. Disable merges by default.
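For reference (a hedged example, not necessarily the exact value chosen by this patch): filestore disables merging when the merge threshold is negative, so the new default corresponds to a ceph.conf setting along these lines:

    [osd]
    # a negative value disables filestore directory merging
    filestore merge threshold = -10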
Fixes: http://tracker.ceph.com/issues/24686
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
* refs/pull/22725/head:
qa/workunits/suites/blogbench.sh: use correct dir name
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
This utilizes the recent feature in teuthology [1] to skip hidden files in
suites when building the job matrix.
The idea of this change is to make it possible to refer to the top-level qa
directory in a position-independent way, so that copying a suite to another
location does not break any symlinks.
[1] https://github.com/ceph/teuthology/pull/1185
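For illustration (the entries below are examples, assuming the hidden ".qa" link convention this enables), each suite directory can carry a hidden relative symlink chaining back to the top-level qa directory, and all other symlinks resolve through it:

    qa/suites/.qa -> ..
    qa/suites/rados/.qa -> ../.qa
    qa/suites/rados/basic/.qa -> ../.qa
    qa/suites/rados/basic/clusters/.qa -> ../.qa
    qa/suites/rados/basic/clusters/fixed-2.yaml -> .qa/clusters/fixed-2.yaml

Because the job-matrix builder now skips hidden files, the .qa links never show up as facets, while the regular symlinks keep resolving when a suite subtree is copied elsewhere in the tree.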
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mgr/dashboard: Add support for URI encode
Reviewed-by: Ricardo Dias <rdias@suse.com>
Reviewed-by: Ricardo Marques <rimarques@suse.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
Created a decorator and pipe to help encode special URI components in the
frontend.
Modified the backend request handler to decode all the string args.
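The backend side of that can be pictured with the following sketch (hypothetical helper name, not the actual dashboard code), which URI-decodes every string argument before the handler sees it:

    from functools import wraps
    from urllib.parse import unquote

    def decode_string_args(func):
        """Decode URI-encoded str arguments before calling the wrapped handler."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            args = tuple(unquote(a) if isinstance(a, str) else a for a in args)
            kwargs = {k: (unquote(v) if isinstance(v, str) else v)
                      for k, v in kwargs.items()}
            return func(*args, **kwargs)
        return wrapper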
Fixes: http://tracker.ceph.com/issues/24621
Signed-off-by: Tiago Melo <tmelo@suse.com>
If the open-files ulimit is set to a low value such as 1024, ceph-osd will
segfault with the following error:
filestore(td/smoke/0) error (24) Too many open files not handled on operation 0x55565d1fd004 (2182.1.0, or op 0, counting from 0)
This patch ensures that a valid ulimit value is set up before the ceph daemons are started in tests.
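A minimal sketch of such a check, assuming Python and a hypothetical threshold (the actual test helpers are shell scripts and pick their own limit):

    import resource

    MIN_NOFILE = 4096  # hypothetical minimum open-files limit for the tests

    def ensure_nofile_limit(minimum=MIN_NOFILE):
        """Raise the soft RLIMIT_NOFILE towards the hard limit if it is too low."""
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        if soft < minimum:
            resource.setrlimit(resource.RLIMIT_NOFILE, (min(minimum, hard), hard))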
Signed-off-by: Erwan Velu <erwan@redhat.com>
get_timeout_delays() is a generic function that computes a series of delays
for waiting over a long period of time without saturating the CPU in busy loops.
It works well when the delay is short: requesting a 20-second timeout produces
the series "0.1 0.2 0.4 0.8 1.6 3.2 6.4 7.3", where the maximum delay between
two loop iterations is 7.3 seconds, which is perfectly fine.
When the timeout reaches 300 seconds, the same code produces the series
"0.1 0.2 0.4 0.8 1.6 3.2 6.4 12.8 25.6 51.2 102.4 95.3", which contains delays
of nearly two minutes!
That is inefficient because the awaited event could arrive just after such a
long sleep starts, wasting a minute or more for nothing. On a local system
that may be acceptable, but on a CI where many jobs behave this way the
overall effect is quite inefficient, generating useless waits.
This patch adds a maximum acceptable delay between two loop iterations while
keeping the same ramp-up behavior.
On the same 300-second example, with MAX_TIMEOUT set to 10, we
now have the following series: "0.1 0.2 0.4 0.8 1.6 3.2 6.4 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 7.3"
The long 12.8/25.6/51.2/102.4/95.3 delays vanish, replaced by a series of
10-second waits. It is up to each test to choose a cap that matches how soon
the awaited event is expected to complete.
MAX_TIMEOUT is set to 15 seconds.
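The helper itself is a shell function in the qa helpers; the Python sketch below (illustrative only, the defaults shown are assumptions) shows the capped ramp-up being described:

    def get_timeout_delays(timeout, first_step=0.1, max_timeout=15):
        """Return sleep intervals that double on each iteration, are capped
        at max_timeout, and sum up to at most the requested timeout."""
        delays, total, step = [], 0.0, first_step
        while total + step <= timeout:
            delays.append(step)
            total += step
            step *= 2
            if max_timeout > 0:
                step = min(step, max_timeout)
        remainder = timeout - total
        if remainder > 0:
            delays.append(round(remainder, 1))  # final partial step, e.g. 7.3
        return delays

    # get_timeout_delays(20)           -> [0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 7.3]
    # get_timeout_delays(300, 0.1, 10) -> 0.1 .. 6.4, then 10-second steps, then 7.3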
Signed-off-by: Erwan Velu <erwan@redhat.com>
Fixes: http://tracker.ceph.com/issues/24436
To fully support the role-based authentication/authorization system, it is necessary to replace the RGW proxy controller with separate controllers for RGW user and bucket.
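Schematically (hypothetical names, not the dashboard's real controller API), per-resource controllers allow each endpoint to carry its own authorization scope instead of funnelling everything through a single proxy:

    # Hypothetical sketch of the idea, not actual mgr/dashboard code.
    class RgwUserController:
        scope = 'rgw-user'      # role-based access is checked against this scope

        def list_users(self):
            ...

    class RgwBucketController:
        scope = 'rgw-bucket'

        def list_buckets(self):
            ...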
Signed-off-by: Volker Theile <vtheile@suse.com>
Avoid sporadic failures in combination with msgr-failures/many.yaml,
where assert_locked() might take over 10 seconds.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* refs/pull/22596/head:
os/bluestore: use vector instead of set for zombies
os/bluestore: reuse zombie OpSequencers by collection id
qa/suites/rados/objecstore/backends/objectstore: capture coredumps
os/bluestore: more debug output
os/bluestore: print cnode from _open_collections
os/bluestore: print cnode on fsck
qa/suites/rados/objecstore: preserve data dir for ceph_test_objecstore
Reviewed-by: Igor Fedotov <ifedotov@suse.com>