Previously even directories were getting added to the
list of inodes to check in rejoin. This was a (small)
waste of time, with these dir inodes getting passed
all the way down into check_inode_max_size for no purpose.
Signed-off-by: John Spray <john.spray@redhat.com>
Include the pool ID in the log message when
we get a write error (and don't say "dir ino",
since this path is also used for files).
Move the duplicated logic for picking the pool out
into get_backtrace_pool()
In get_backtrace_pool(), assert that files do indeed
have a pool set.
Signed-off-by: John Spray <john.spray@redhat.com>
This is the state we get after an OSD write
error, so it's definitely something we want
to tell the user about in ceph status.
Signed-off-by: John Spray <john.spray@redhat.com>
The sched_scrub() method can be called from various code paths, such as
OSD::tick(), or be triggered by a scrub_reserve_reply message.
sched_scrub() checks whether the noscrub flag is set globally or
for a specific pool before actually scheduling a scrub job.
However, if we set the noscrub flag for a specific pool, scrubs are
still legal for other pools and should be granted.
The problem here is that we may stop a PG's scrub at an intermediate
stage when the corresponding pool's noscrub flag is set, without releasing
the reservation. As a result, this prevents PGs of other
pools from scrubbing because we have already hit the reservation limit.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
One of our tests in our local testbed shows that if the number of snapshots
becomes extremely large, chunky-scrub() may run into
heartbeat failures. This is because it takes a really long time for the
procedure to traverse and determine the boundary for a single run of
chunk scrub in this case.
This PR solves the above problem by resetting the tp handle
passed in once in a while (after a certain number of loops, 64 by default),
since the search can become very time-consuming. The
BUILD_MAP stage later on is prone to the same problem and has already
been fixed in the same way. So although the test case is rare,
this change is defensive and makes the code more robust, which
makes it worthwhile.
Fixes: http://tracker.ceph.com/issues/12892
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
If we're using bluefs, only pass in the short relative
path (db, db.wal, db.slow). The leading components
are ignored and only lead to errors if the configuration
provides relative paths that do not match (e.g., if one
is using ceph-objectstore-tool).
Fixes: http://tracker.ceph.com/issues/15376
Signed-off-by: Sage Weil <sage@redhat.com>
Don't use this config option in librbd until
http://tracker.ceph.com/issues/15034 is resolved.
The option itself is still useful for mirroring threads, where
ordering is unimportant.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Implement the full object permission model for librgw (aka, NFS
and similar) operations.
Fixes DIRS1 unit tests.
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
If a caller does an atomic create using rgw_lookup() and
RGW_LOOKUP_FLAG_CREATE, it needs to fix up the attributes using
create_stat().
For use outside of test cases, it probably needs an interlock also,
but for now, do just enough to satisfy existing attribute checks.
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Gentoo's normal build process uses a sandbox to catch writes outside the
build environment; this includes providing a value other than /tmp for
TMPDIR. Use TMPDIR by default for CEPH_BUILD_VIRTUALENV.
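For illustration, a minimal sketch of the defaulting (the exact wiring in
the build script is assumed, not quoted):

    # honor the sandbox's TMPDIR when set, keeping /tmp as the fallback
    CEPH_BUILD_VIRTUALENV=${CEPH_BUILD_VIRTUALENV:-${TMPDIR:-/tmp}}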
Signed-off-by: Robin H. Johnson <robin.johnson@dreamhost.com>
If your data pool was pool 0, this was transforming
that to -1 unconditionally, which broke upgrades. We
only want to do that for a fully zeroed ceph_file_layout,
so that it still maps to a file_layout_t. If any fields
are set, though, we trust the fl_pgpool to be a valid
pool.
Signed-off-by: Sage Weil <sage@redhat.com>
If systemd has task accounting enabled, a default of 512 tasks
will be applied to all systemd units.
For ceph, this is way too low even for a modest cluster, so stop
this restriction from being applied and allow administrators to apply
limits using sysctl.
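As a hedged illustration (the unit name and drop-in path are assumptions,
not the exact change), an administrator could lift the limit for an OSD
unit like this:

    # hypothetical drop-in that disables the per-unit task limit;
    # TasksMax=infinity is the standard systemd directive for this
    mkdir -p /etc/systemd/system/ceph-osd@.service.d
    cat > /etc/systemd/system/ceph-osd@.service.d/tasksmax.conf <<'EOF'
    [Service]
    TasksMax=infinity
    EOF
    systemctl daemon-reload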
Signed-off-by: James Page <james.page@ubuntu.com>
You cannot tell from the old bdev vector which device
was which.
- use a fixed id for each type/slot
- order from fast (small) to slow (big)
- normalize the allocation fallback to try any slower device
- clean up the BlueStore instantiation/setup accordingly
Signed-off-by: Sage Weil <sage@redhat.com>
Otherwise, all you see is errors about the probes that failed (e.g., a
failure to decode a non-bluestore superblock as bluestore).
Signed-off-by: Sage Weil <sage@redhat.com>
The first ceph-dencoder call is very unlikely to fail, yet it represents a
bottleneck since the parallel computations only start once this test
has completed.
The idea of this patch is to run the 4 dencoder processes in
parallel immediately and check the resulting error codes. If one fails,
we report the failure.
As a failure is very unlikely, this saves time and makes the code
simpler too.
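A minimal sketch of the pattern, assuming the four invocations are held in
hypothetical variables $run1..$run4:

    # launch all four dencoder runs at once, then fail if any one failed
    $run1 & p1=$!
    $run2 & p2=$!
    $run3 & p3=$!
    $run4 & p4=$!
    for p in $p1 $p2 $p3 $p4; do
        wait $p || exit 1
    done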
Signed-off-by: Erwan Velu <erwan@redhat.com>
As reported by Kefu, "if ++try == 150" doesn't do what we are
expecting: this is C-style code, and Python has no ++ increment operator.
So this patch splits the increment and the test into separate statements.
Signed-off-by: Erwan Velu <erwan@redhat.com>
The current code was waiting a flat 10s expecting the file to be put.
If the file was put in less than 10s, the test kept waiting for
nothing, reducing the execution speed of that test.
This patch simply checks every second, for up to 10 seconds, whether
the file is actually available, and exits prematurely once it is.
This patch saves up to 10 seconds on a local system, surely a little
less on real infrastructure, but it still saves time.
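A minimal sketch of the polling loop (the file path is hypothetical):

    # check every second, for at most 10 seconds, instead of a flat sleep 10
    for i in $(seq 1 10); do
        test -e "$dir/testfile" && break
        sleep 1
    done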
Signed-off-by: Erwan Velu <erwan@redhat.com>
The current code doubles the wait time between two calls, leading to a
possible 511s of waiting time, which sounds a little excessive.
This patch reduces the global wait time to 300s and checks the rados
status more often in order to exit the loop earlier. In a local test, that
saves 6 secs per run.
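A hedged sketch of the fixed-interval polling (the pool/object names and
the 5-second interval are illustrative assumptions):

    # poll regularly, capping the total wait at 300 seconds
    left=300
    while ! rados -p "$poolname" stat "$objname" 2>/dev/null; do
        sleep 5
        left=$((left - 5))
        test "$left" -le 0 && exit 1
    done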
Signed-off-by: Erwan Velu <erwan@redhat.com>
ceph_watch_wait() was sleeping _before_ doing the test that could
stop the loop.
It's better to do the check first, as the loop can then exit immediately
and avoid a useless sleep.
That's a minor optimization, but everything counts when trying to make
things smooth.
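A minimal sketch of the reordered loop (the log path and pattern are
hypothetical):

    # test first; only sleep when we actually have to keep waiting
    while ! grep -q "$regexp" "$watch_log"; do
        sleep 1
    done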
Signed-off-by: Erwan Velu <erwan@redhat.com>
This python script was making excessive sleep calls while running some
ceph commands.
Waiting up to 5 seconds to get the proper health status can be shortened
to avoid the worst case of waiting almost 5 seconds for nothing.
This patch also removes two sleep calls placed after a wait_for_health
call, which is already supposed to provide a clean state. Waiting
respectively 20 & 15 seconds after that call just loses time, which
is precious at make check time.
Signed-off-by: Erwan Velu <erwan@redhat.com>
OSDs take some time to come up, but waiting 10 secs between two loops
seems excessive here. In the worst case, we can end up waiting 10 secs
for nothing when the OSD came up just a few microsecs after the
previous check.
This patch simply reduces the sleep from 10 seconds to 1.
Signed-off-by: Erwan Velu <erwan@redhat.com>
It may sound like nothing, but the current sleep ramp-up is
counterproductive.
The code does: kill <proc>; sleep 0; kill <proc>; sleep 0; kill <proc>;
sleep 1; and then ramps up smoothly to 120 seconds.
But in practice there is almost no chance the process dies that fast,
meaning that by default we end up at the sleep 1.
Moving from sleep 0 to sleep 1 doesn't seem like a big win, but as
kill_daemons() is called very often, we can save a lot of time in
the end.
This patch sleeps first for 1/10th of a second instead of 0, and then
for 1/20th of a second instead of 0.
The sleep call is also moved after the kill call, as there is no need to
wait before executing the command.
This patch drops the running time of a test like osd-scrub-repair.sh
from 7m30 to 7m7.
Saving another ~30 seconds is an interesting win at make check level.
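A hedged sketch of the reordered back-off (the first two delays follow
this change; the tail of the ramp-up and the pid handling are
illustrative):

    for delay in 0.1 0.05 1 1 1 2 3 5 10 20 60 120; do
        kill "$pid" 2>/dev/null || break   # kill fails once the process is gone
        sleep "$delay"
    done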
Signed-off-by: Erwan Velu <erwan@redhat.com>
osd-scrub-repair makes several similar objectstore calls
sequentially while they could easily be parallelized.
Each single objectstore call can spend up to a dozen seconds, so making
the calls parallel saves a lot of time while keeping the code pretty
simple.
This particular patch saves approx. 2 minutes on the current code on a
recent laptop. The global running time of osd-scrub-repair drops from
9m33 to 7m37!
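A hedged sketch of the background-and-wait idea applied per OSD (the data
paths and the --op are illustrative):

    pids=""
    for osd in 0 1 2 3; do
        ceph-objectstore-tool --data-path "$dir/$osd" --op list \
            > "$dir/osd.$osd.objects" &
        pids="$pids $!"
    done
    for pid in $pids; do
        wait "$pid" || exit 1
    done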
Signed-off-by: Erwan Velu <erwan@redhat.com>
wait_for_clean() is a very common call when running make check.
It waits for the cluster to be stable before continuing.
This script was doing the same calls twice and could be optimized by
making the useful calls only once.
The is_clean() function was checking num_pgs & get_num_active_clean(),
while the main loop itself was also calling get_num_active_clean().
This patch inlines is_clean() into the loop to benefit from a
single get_num_active_clean() call. This avoids a useless invocation
of (ceph + xmlstarlet).
This patch also moves all the 'timer reset' conditions into an else
branch, avoiding the spawn of another ceph+xmlstarlet call when we
already know we should reset the timer.
The last modification is to reduce the sleeping time, as the state of the
cluster changes very fast.
This whole patch may not look like a big win, but for a test
like test/osd/osd-scrub-repair.sh, we drop from 9m56 to 9m30 while
reducing the number of system calls.
At the scale of make check, that's a lot of saving.
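A hedged sketch of the inlined loop (helper names follow the commit text;
the timer bookkeeping is simplified):

    while true; do
        active_clean=$(get_num_active_clean)   # single ceph+xmlstarlet spawn
        if test "$active_clean" = "$num_pgs"; then
            break                              # cluster is clean: done
        elif test "$active_clean" != "$previous"; then
            previous=$active_clean             # progress seen: reset the timer
            timer=0
        fi
        sleep 1                                # shorter sleep: state moves fast
    done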
Signed-off-by: Erwan Velu <erwan@redhat.com>
get_num_active_clean() is called very often but spawns one useless process.
The current "grep -v | wc -l" can easily be replaced by "grep -cv", which
does the same while spawning one process fewer.
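The substitution, shown on a hypothetical two-line input:

    printf 'active+clean\ndegraded\n' | grep -v 'active+clean' | wc -l  # two processes
    printf 'active+clean\ndegraded\n' | grep -cv 'active+clean'         # one process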
Signed-off-by: Erwan Velu <erwan@redhat.com>