All the options are uint64_t, but the ObjectCacher was converting them
to int64_t. There's never any reason for these to be negative, so
change the type.
Adjust a few conditionals so that they only convert known-positive
signed values to uint64_t before comparing with the target and max
values. Leave the actual stats accounting as loff_t for now, since
bugs in accounting will have bad effects if negative values wrap
around.
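
The conversion pattern described above can be sketched roughly as
follows. This is a minimal illustration, not the ObjectCacher code:
the helper name exceeds_limit() and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the ObjectCacher): reports whether a signed
// running total exceeds an unsigned configured limit.
bool exceeds_limit(int64_t accounted, uint64_t limit) {
  // A zero or negative total can never exceed the limit. Checking the sign
  // first keeps a negative value (an accounting bug) from wrapping around
  // to a huge unsigned number in the comparison below.
  if (accounted <= 0)
    return false;
  // Known-positive here, so the conversion is lossless.
  return static_cast<uint64_t>(accounted) > limit;
}
```

Keeping the accounting itself signed means a bug shows up as a
negative number rather than as an enormous unsigned value.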
Backport: emperor, dumpling
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Explain why people should be using the "raw" image format for RBD
volumes created for use by QEMU: using any other format adds only
overhead, but no extra value (since RBDs are also CoW and
thin-provisioned), plus the Qcow2 storage driver is not migration safe
when caching is enabled, whereas the RBD driver is.
Also, fix a minor glitch in the example qemu-img commands ("-f rbd"
and "-O rbd" should really be "-f raw" and "-O raw").
Finally, drop the "-f" option altogether on qemu-img commands where it
makes no sense (info and resize).
Signed-off-by: Florian Haas <florian@hastexo.com>
A few places were not checking the return values of commands, since
they could not fail before timeouts were added.
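
A rough sketch of the kind of check being added, assuming the usual
negative-errno return convention. The functions wait_for_reply() and
do_request() are hypothetical stand-ins, not the actual call sites.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for a command that previously always succeeded
// but can now fail once a timeout is configured.
int wait_for_reply(bool timed_out) {
  return timed_out ? -ETIMEDOUT : 0;
}

// Caller that checks and propagates the result instead of ignoring it.
int do_request(bool timed_out) {
  int r = wait_for_reply(timed_out);
  if (r < 0)
    return r;  // e.g. -ETIMEDOUT
  // ... continue using the reply ...
  return 0;
}
```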
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Some functions could not previously return errors, but they had an
int return value, which can now receive ETIMEDOUT.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This makes it possible to test timeouts reliably by delaying certain
messages effectively forever, while still being able to, e.g., connect and
authenticate to the monitors.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This captures almost all operations from librados other than mon_commands().
Get the values for the timeouts from the Objecter constructor, so only
librados uses them.
Add C_Cancel_*_Op, finish_*_op(), and *_op_cancel() for each type of
operation, to mirror those for Op. Create a callback and schedule it
in the existing timer thread if the timeouts are specified.
Fixes: #6507
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Currently CEPH_HAVE_SETPIPE_SZ is not set even if F_SETPIPE_SZ is
available, because the AC_COMPILE_IFELSE test program as written always
fails to compile. F_SETPIPE_SZ is a macro, so use AC_EGREP_CPP which
works on the preprocessor output instead of trying to compile.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
It seems to be reasonably easy to complete a flush before the next client
request is processed. Crazy...
Same with the flush vs write race.
Fixes: #7329
Signed-off-by: Sage Weil <sage@inktank.com>
Back in 27f4d1f6bc we refactored the CRUSH
code to allow adjustment of the retry counts on a per-pool basis. That
commit had an off-by-one bug: the previous "tries" counter was a *retry*
count, not a *try* count, but the new code was passing in 1 meaning
there should be no retries.
Fix the ftotal vs tries comparison to use < instead of <= to fix the
problem. Note that the original code used <= here, which means the
global "choose_total_tries" tunable is actually counting retries.
Compensate for that by adding 1 in crush_do_rule when we pull the tunable
into the local variable.
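
The difference between the two comparisons can be illustrated with a
toy loop. This is only a sketch of the counting, not the CRUSH mapper
code: attempts_with() is a hypothetical function in which every
attempt fails, and only the names ftotal and tries come from the
description above.

```cpp
#include <cassert>

// Counts how many attempts a retry loop makes when every attempt fails.
// `ftotal` counts failures so far; `inclusive` selects <= (old) or < (new).
int attempts_with(unsigned tries, bool inclusive) {
  unsigned ftotal = 0;
  int attempts = 0;
  bool retry;
  do {
    ++attempts;  // make one attempt
    ++ftotal;    // ...which fails in this sketch
    retry = inclusive ? (ftotal <= tries) : (ftotal < tries);
  } while (retry);
  return attempts;
}
```

With <=, a value of 1 yields two attempts (one retry); with <, it
yields exactly one attempt. Passing the old retry count plus one to
the < form gives the same number of attempts as the original <= form.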
This was noticed looking at output from a user-provided osdmap.
Unfortunately the map doesn't illustrate the change in mapping behavior
and I haven't managed to construct one yet that does. Inspection of the
crush debug output now aligns with prior versions, though.
Signed-off-by: Sage Weil <sage@inktank.com>
This fixes a valgrind error from OSD::handle_osd_map where primary is not
initialized and is compared after the call to pg_to_acting_osds().
We are still not distinguishing between "no mapping" and "pool doesn't exist,
no mapping". That is a somewhat larger change, though.
Signed-off-by: Sage Weil <sage@inktank.com>
Initialize the truncated variable, as we sometimes ignore an error
response (e.g., ENOENT), and in such cases we can't expect it to be
set.
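
A minimal sketch of the failure mode, assuming a callee that writes
its output parameter only on success. Both list_entries() and
more_to_list() are hypothetical names, not the actual functions
touched here.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical listing call: writes *truncated only when it succeeds.
int list_entries(bool fail, bool more, bool *truncated) {
  if (fail)
    return -ENOENT;
  *truncated = more;
  return 0;
}

// Caller that tolerates ENOENT, so it may read `truncated` after a failure.
bool more_to_list(bool fail, bool more) {
  bool truncated = false;  // initialized: stays defined if the call fails
  int r = list_entries(fail, more, &truncated);
  if (r < 0 && r != -ENOENT)
    return false;
  return truncated;
}
```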
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 9ecf3467a3)
This command maps all pgs from all pools (or just one pool) to osds,
summarizes the placement, and calculates the actual standard deviation and
the expected value.
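
As a rough illustration of the summary step only, the mean and
(population) standard deviation of per-osd pg counts could be
computed as below. The PlacementStats struct and summarize() function
are hypothetical; the mapping itself and the expected-value
calculation are not shown.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical summary of how many pgs landed on each osd.
struct PlacementStats {
  double mean;
  double stddev;
};

// Computes the mean and population standard deviation of the counts.
PlacementStats summarize(const std::vector<int>& pgs_per_osd) {
  PlacementStats s{0.0, 0.0};
  if (pgs_per_osd.empty())
    return s;
  double sum = 0.0;
  for (int c : pgs_per_osd)
    sum += c;
  s.mean = sum / pgs_per_osd.size();
  double sq = 0.0;
  for (int c : pgs_per_osd) {
    double d = c - s.mean;
    sq += d * d;
  }
  s.stddev = std::sqrt(sq / pgs_per_osd.size());
  return s;
}
```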
Signed-off-by: Sage Weil <sage@inktank.com>