* since the eio tests crash some of the OSD nodes, the tests used to
undo the crash before moving on, so that it would not interfere with
the following tests. a more robust and cleaner way to do this is to
isolate individual tests in a sandbox, so each eio test has its own:
setup + inject + verify crash + teardown
cycle. this change removes the cleanup/undo steps from the
individual tests.
* update the disabled tests accordingly.
* use a minimum set of OSDs and R-S(2,1) for the testing to speed
up the test.
* add the new testsuite to check_SCRIPTS
Fixes: #11693
Signed-off-by: Kefu Chai <kchai@redhat.com>
* the "daemon" parameter was not respected.
* update the test_get_config() to check the overridden option instead of
the default one.
* add set_config()
Signed-off-by: Kefu Chai <kchai@redhat.com>
man pages have to be preprocessed now, and can't be installed directly.
skip installing them until we add the cmake-fu to copy what man/Makefile.am
is doing
Signed-off-by: Casey Bodley <casey@cohortfs.com>
If a file has been deleted with a loopback device attached, then the
`losetup --all` output will carry:
/dev/loopX: [0032]:344213 (/.../src/test-ceph-disk/vdf.disk (deleted))
This causes the losetup parsing in reset_leftover_dev() to throw an
error, e.g.:
rreset_leftover_dev: 430: test
'(/home/ddiss/ceph/src/test-ceph-disk/vdf.disk' '(deleted))' =
'(/home/ddiss/ceph/src/test-ceph-disk/vdf.disk)'
test/ceph-disk.sh: line 430: test: too many arguments
Fix this by quoting the path variable for the string comparison.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Run cephtool-test-{mon,osd,mds}.sh with CEPH_CLI_TEST_DUP_COMMAND=1 to
detect idempotency related problems during make check. This is how
ceph-qa-suite/tasks/workunit.py will run
suites/rados/singleton/all/cephtool.yaml and it's easier to fix when
make check fails rather than later on when a fully populated rados suite
has one failed job.
http://tracker.ceph.com/issues/11618 Refs: #11618
Signed-off-by: Loic Dachary <ldachary@redhat.com>
When CEPH_CLI_TEST_DUP_COMMAND=1 is set, ceph osd create will consume
two osd ids and return the latter. Fix the test to account for that and
not assume that the osd id allocated by osd create is always the
next available osd id.
The other osd create tests do not suffer from the same variation because
they provide a UUID argument that guarantees the same osd id is going to
be returned every time.
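
A toy model of the behaviour described above, assuming that a duplicated
command simply runs twice and the caller sees the second result.
FakeCluster and osd_create_cli are hypothetical stand-ins, not Ceph
code:

```python
class FakeCluster:
    """Toy stand-in for the OSD id allocator (hypothetical)."""

    def __init__(self, used):
        self.used = set(used)

    def osd_create(self):
        # Hand out the lowest id that is not in use yet.
        new_id = 0
        while new_id in self.used:
            new_id += 1
        self.used.add(new_id)
        return new_id


def osd_create_cli(cluster, dup_command):
    """Run `osd create` once, or twice when commands are duplicated."""
    result = cluster.osd_create()
    if dup_command:
        result = cluster.osd_create()  # the caller only sees the second id
    return result


cluster = FakeCluster(used={0, 1, 2})
ids_before = set(cluster.used)
new_id = osd_create_cli(cluster, dup_command=True)

# Fragile check: new_id == 3 (fails here, because id 3 was consumed too).
# Robust check: the returned id was not allocated before the call.
print(new_id not in ids_before)
```

The robust check holds whether or not the command is duplicated, which
is the property the fixed test relies on.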
http://tracker.ceph.com/issues/11618 Fixes: #11618
Signed-off-by: Loic Dachary <ldachary@redhat.com>
When copying objects, if the source bucket is the same as the
destination bucket, use the attrs from the source bucket.
Fixes: #11639
Signed-off-by: Javier M. Mellid <jmunhoz@igalia.com>
Prior to this commit, the Network Configuration Reference guide and
Troubleshooting guide recommended opening a number of ports that
depended on the number of daemons being run.
This doesn't really cover all use cases. Users can easily restart
daemons in ways that cause the daemons to bind to higher ports. This
leads to OSDs or MDSs binding to ports that are firewalled.
Update the Network Configuration Reference guide and Troubleshooting
guides to simply recommend that users open all the ports between 6800
and 7300 on their OSDs and MDSs.
http://tracker.ceph.com/issues/11688 Refs: #11688
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
The upper limit for OSD/MDS ports changed from 7100 to 7300 in commit
f9ec5a7945. Update the Quick Start Preflight documentation to reflect
this change.
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
The xio_msg pointers to be freed in XioPortal::release_xio_rsp() are no
longer valid after a call to xio_connection_destroy(). We were already
avoiding the call to xio_release_msg() in this case, but were still
dereferencing the xio_msg for its user_context pointer. Moved the check
for is_connected() outside of the loop to avoid any access to msg.
Suggested-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Casey Bodley <casey@cohortfs.com>
accelio uses rdtsc to generate xio_msg.timestamp, which can't be
reliably converted to a timeval. use ceph_clock_now() instead to assign
Message::recv_stamp and recv_complete_stamp.
Signed-off-by: Casey Bodley <casey@cohortfs.com>
A missing nonce in the osd addrs was preventing the monitor from
detecting osd restarts. XioMessenger::bind() now sets the nonce in the
same way that SimpleMessenger and AsyncMessenger do
Signed-off-by: Casey Bodley <casey@cohortfs.com>
Signed-off-by: Vu Pham <vu@mellanox.com>
Use a better way to assign connections to a specific lane of a portal,
avoiding lane competition/hogging.
This change resolves the slow ramp-up and spiky behavior seen while
clients start and run I/O.
Signed-off-by: Vu Pham <vu@mellanox.com>
Prior to this commit, if a user installed the "ceph-common" Debian
package without installing "ceph", then /usr/bin/ceph would crash
because it was missing the ceph_argparse library.
Ship the ceph_argparse library in "ceph-common" instead of "ceph". (This
was the intention of the original commit that moved argparse to "ceph",
2a23eac549)
http://tracker.ceph.com/issues/11388 Refs: #11388
Reported-by: Jens Rosenboom <j.rosenboom@x-ion.de>
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
When doing seq and rand read benchmarks using rados bench, quite a large
portion of cpu time is consumed by object verification. This patch
adds an option to disable this verification when it's not needed, in turn
giving better cluster utilization.
Results of rados -p storage bench 600 rand without --no-verify:
Total time run: 600.228901
Total reads made: 144982
Read size: 4194304
Bandwidth (MB/sec): 966
Average IOPS: 241
Stddev IOPS: 38
Max IOPS: 909522486
Min IOPS: 0
Average Latency: 0.0662
Max latency: 1.51
Min latency: 0.004
real 10m1.173s
user 5m41.162s
sys 11m42.961s
Same command, but with --no-verify:
Total time run: 600.161379
Total reads made: 174142
Read size: 4194304
Bandwidth (MB/sec): 1.16e+03
Average IOPS: 290
Stddev IOPS: 20
Max IOPS: 909522486
Min IOPS: 0
Average Latency: 0.0551
Max latency: 1.12
Min latency: 0.00343
real 10m1.172s
user 4m13.792s
sys 13m38.556s
Note the decreased latencies, increased bandwidth and more reads performed.
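
As a sanity check, the reported bandwidth and average IOPS can be
recomputed from the totals above. This is a small sketch, assuming that
"MB" in the output means 2^20 bytes (the figures only match under that
assumption); the helper name summarize is hypothetical:

```python
MIB = 1024 * 1024  # assumption: the tool reports MB as 2**20 bytes


def summarize(total_reads, read_size_bytes, seconds):
    """Return (bandwidth in MB/sec, average IOPS) for one bench run."""
    bandwidth = total_reads * read_size_bytes / MIB / seconds
    iops = total_reads / seconds
    return bandwidth, iops


# Totals copied from the two runs above.
with_verify = summarize(144982, 4194304, 600.228901)
no_verify = summarize(174142, 4194304, 600.161379)
speedup = no_verify[0] / with_verify[0]

print("with verification:    %.0f MB/sec, %d IOPS" % (with_verify[0], int(with_verify[1])))
print("without verification: %.0f MB/sec, %d IOPS" % (no_verify[0], int(no_verify[1])))
print("bandwidth ratio: %.2f" % speedup)
```

The ratio comes out at roughly 1.2, i.e. about 20% more bandwidth in
this particular run.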
Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com>
- to avoid a scrub wave when osd_scrub_max_interval is reached on a
high-load OSD, the scrub time is randomized.
- extract scrub_load_below_threshold() out of scrub_should_schedule()
- schedule an automatic scrub job at a time which is uniformly distributed
over [now+osd_scrub_min_interval,
now+osd_scrub_min_interval*(1+osd_scrub_time_limit)]. before
this change, this sort of scrub was performed once the hard interval
had ended or the system load was below the threshold. with this change,
the jobs are performed as long as the load is low or the interval of
the scheduled scrubs is longer than conf.osd_scrub_max_interval. all
automatic jobs should be performed in the configured time period, otherwise
they are postponed.
- a requested scrub job is now scheduled right away. before this change
it was queued with the timestamp of `now` and postponed until after
osd_scrub_min_interval.
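
The scheduling rule can be sketched as follows. This is a Python
illustration of the description above, not the OSD code: the function
names and example values are hypothetical, and treating "the interval"
as the time since the last scrub is an assumption.

```python
import random


def pick_scrub_time(now, min_interval, time_limit, rng=random):
    """Draw a scrub time uniformly from
    [now + min_interval, now + min_interval * (1 + time_limit)]."""
    earliest = now + min_interval
    latest = now + min_interval * (1 + time_limit)
    return rng.uniform(earliest, latest)


def should_scrub(now, sched_time, last_scrub, max_interval, load_is_low):
    """Once the scheduled time has arrived, scrub if the load is low or
    more than max_interval has passed since the last scrub (assumption)."""
    if now < sched_time:
        return False
    overdue = (now - last_scrub) > max_interval
    return load_is_low or overdue


# Example values are hypothetical, in seconds.
DAY = 24 * 60 * 60
rng = random.Random(0)
sched = pick_scrub_time(now=0, min_interval=DAY, time_limit=0.5, rng=rng)

print(DAY <= sched <= 1.5 * DAY)  # the draw always lands inside the window
```

Because each job draws its own time from the window, jobs that would
otherwise all become due at the same moment are spread out.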
Fixes: #10973
Signed-off-by: Kefu Chai <kchai@redhat.com>