31357 Commits

Author SHA1 Message Date
Kri5
b88b257241 Merge pull request #1220 from dachary/wip-filestore
tests: fix objectstore tests
2014-02-12 12:26:28 +01:00
Loic Dachary
b64f1e39a8 tests: fix objectstore tests
The objectstore test from 1a588f18ba was
missing a few changes.

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-02-12 11:52:37 +01:00
Loic Dachary
a065e2c0d0 Merge pull request #1212 from ywang19/master
correct one command line at building packages section

Reviewed-by: Loic Dachary <loic@dachary.org>
2014-02-12 09:06:31 +01:00
Sage Weil
ed0980c925 Merge pull request #1168 from yuyuyu101/wip-refactor-objectstore-test
Rename test/filestore to test/objectstore

Reviewed-by: Sage Weil <sage@inktank.com>
2014-02-11 21:17:05 -08:00
Sage Weil
7f76e78bf2 Merge pull request #1218 from yuyuyu101/wip-misc-fix
Fix bad deallocator

Reviewed-by: Sage Weil <sage@inktank.com>
2014-02-11 21:12:09 -08:00
Haomai Wang
b5c10bf059 Fix bad deallocator
Memory allocated by malloc() should be deallocated by free(), not 'delete'

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2014-02-12 12:04:30 +08:00
ywang19
a4b3b786ff correct one command line at building packages section
Signed-off-by: Wang, Yaguang <yaguang.wang@intel.com>
2014-02-12 10:38:16 +08:00
Sage Weil
33692a2c02 osdmaptool: fix cli test
Encoding the extra tunable byte threw off the output here.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-11 18:38:16 -08:00
Sage Weil
fed83969b2 tset_bufferlist: fix signed/unsigned comparison
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-11 18:16:58 -08:00
Loic Dachary
8533b6ac2e Merge pull request #1185 from ceph/wip-crush
crush: "vary_r" tunable

Reviewed-by: Loic Dachary <loic@dachary.org>
2014-02-11 23:42:53 +01:00
Sage Weil
d136eb4cbd mon: allow firefly crush tunables to be selected
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-11 11:12:56 -08:00
Sage Weil
e3309bce03 doc/rados/operations/crush: describe new vary_r tunable
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-11 11:12:56 -08:00
Sage Weil
525b2d2663 crush: add firefly tunables baseline test
This is a user's map that gives different results when the vary_r tunable
is adjusted.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-11 11:12:56 -08:00
Sage Weil
37f840b499 crushtool: new cli tests for the vary-r tunable
These illustrate the variation in mapping results as the vary_r tunable
is adjusted.  Note:

1- For the vary_r=0 case, we have several inputs that map to only a single
output:

      rule 3 (delltestrule) num_rep 4 result size == 1:\t27/1024 (esc)
      rule 3 (delltestrule) num_rep 4 result size == 2:\t997/1024 (esc)

This is the behavior we are fixing.  For all of the other values of
vary_r, we get 2 outputs for all inputs.

2- If we use vary_r 1, which is likely the most efficient computation,
we get lots of inputs that change.  By setting larger values of vary_r,
we can trade a bit of extra computation to get a mapping that is more
similar to the legacy behavior. This is useful for legacy clusters:

    $ for f in `seq 1 4` ; do diff -u test-map-vary-r-0.t test-map-vary-r-$f.t | grep -c -- +  ; done
    3030
    1629
    645
    228

The crushmap here comes from a user who was seeing a bad mapping for certain
pgs after some OSDs were reweighted by utilization.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-11 11:11:25 -08:00
Sage Weil
e88f843c99 crush: add infrastructure around SET_CHOOSELEAF_VARY_R rule step/command
This will let you vary the vary_r tunable on a per-rule basis.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-11 08:48:14 -08:00
Sage Weil
f944ccc20a crush: add SET_CHOOSELEAF_VARY_R step
This lets you adjust the vary_r tunable on a per-rule basis.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-11 08:48:14 -08:00
Sage Weil
e20a55d906 crush: add infrastructure around new chooseleaf_vary_r tunable
- encoding
- feature bit
- decompile/compile

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-11 08:48:14 -08:00
Sage Weil
dd200c924e Merge pull request #1207 from dachary/wip-7378
common: admin socket fallback to json-pretty format

Reviewed-by: Sage Weil <sage@inktank.com>
2014-02-11 08:30:31 -08:00
Loic Dachary
c36a6ed4c3 Merge pull request #1198 from dachary/wip-mailmap
mailmap updates

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
2014-02-11 16:14:49 +01:00
Loic Dachary
ac16fd6e1a mailmap: Derek Yarnell is with University of Maryland
Signed-off-by: Loic Dachary <loic@dachary.org>
2014-02-11 10:38:26 +01:00
Loic Dachary
9e43f939aa mailmap: Dmitry Smirnov is with Debian GNU/Linux
Reviewed-by: Dmitry Smirnov <onlyjob@member.fsf.org>
Signed-off-by: Loic Dachary <loic@dachary.org>
2014-02-11 10:38:26 +01:00
Loic Dachary
0869fcb40b mailmap: Eric Mourgaya is with Credit Mutuel Arkea
and name normalization

Reviewed-by: Eric Mourgaya <eric.mourgaya@arkea.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
2014-02-11 10:37:52 +01:00
Loic Dachary
165e76d4d0 common: admin socket fallback to json-pretty format
If the format argument to a command sent to the admin socket is not
among the supported formats ( json, json-pretty, xml, xml-pretty ) the
new_formatter function will return null and the AdminSocketHook::call
function must fall back to a sensible default.

The CephContextHook::call and HelpHook::call failed to do that and a
malformed format argument would cause the mon to crash. A check is added
to each of them so that they fall back to json-pretty if the format is
not recognized.

To further protect AdminSocketHook::call implementations from similar
problems the format argument is checked immediately after accepting the
command in AdminSocket::do_accept and replaced with json-pretty if it is
not known.

A test case is added for both CephContextHook::call and HelpHook::call
to demonstrate the problem exists and is fixed by the patch.

Three other instances of unsafe calls to new_formatter were found and
a fallback to json-pretty was added. All other calls have been audited
and appear to be safe.

http://tracker.ceph.com/issues/7378 fixes #7378

Backport: emperor, dumpling
Signed-off-by: Loic Dachary <loic@dachary.org>
2014-02-11 09:21:35 +01:00
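The fallback described in 165e76d4d0 can be sketched roughly as follows. This is a Python illustration, not the Ceph C++ code: `new_formatter` here is a stand-in that only mirrors the "return null on unknown format" behavior from the message, `call_hook` is a hypothetical name, and the xml formats are left out of the sketch for brevity.

```python
import json

# Stand-in formatter table; only the JSON variants are modeled in this sketch.
_FORMATTERS = {
    "json": lambda obj: json.dumps(obj),
    "json-pretty": lambda obj: json.dumps(obj, indent=4),
}


def new_formatter(fmt):
    """Return a formatter callable, or None when the format is not recognized."""
    return _FORMATTERS.get(fmt)


def call_hook(command, fmt):
    """Hypothetical hook: never use the formatter without checking it first."""
    formatter = new_formatter(fmt)
    if formatter is None:
        # Fall back to a sensible default instead of dereferencing "null".
        formatter = new_formatter("json-pretty")
    return formatter({"command": command})


print(call_hook("help", "bogus"))
```

A malformed format such as "bogus" then produces the same output as an explicit "json-pretty" request rather than a crash.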
John Wilkins
576465faca Merge pull request #1209 from fghaas/master
doc: highlight that "raw" is the only useful RBD format for QEMU
2014-02-10 15:52:31 -08:00
Florian Haas
9292cc215a doc: highlight that "raw" is the only useful RBD format for QEMU
Explain why people should be using the "raw" image format for RBD
volumes created for use by QEMU: using any other format adds only
overhead, but no extra value (since RBDs are also CoW and
thin-provisioned), plus the Qcow2 storage driver is not migration safe
when caching is enabled, whereas the RBD driver is.

Also, fix a minor glitch in the example qemu-img commands ("-f rbd"
and "-O rbd" should really be "-f raw" and "-O raw").

Finally, drop the "-f" option altogether on qemu-img commands where it
makes no sense (info and resize).

Signed-off-by: Florian Haas <florian@hastexo.com>
2014-02-11 00:30:07 +01:00
Josh Durgin
32aa9fdf66 Merge branch wip-librados-timeout
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Loic Dachary <loic@dachary.org>
2014-02-10 14:12:35 -08:00
Josh Durgin
af5d0fcd90 Merge pull request #1205 from ceph/wip-7334
use `partx` for CentOS/RHEL instead of `partprobe`
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-02-10 13:09:40 -08:00
Josh Durgin
78240c266a Merge pull request #1204 from ceph/wip-fsetpipesz-fix
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-02-10 12:59:03 -08:00
Josh Durgin
9e62beb80b qa: add script for testing rados client timeout options
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-02-10 12:53:12 -08:00
Josh Durgin
79c1874346 rados: check return values for commands that can now fail
A few places were not checking the return values of commands, since
they could not fail before timeouts were added.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-02-10 12:53:12 -08:00
Josh Durgin
8e9459e897 librados: check and return on error so timeouts work
Some functions could not previously return errors, but they had an
int return value, which can now receive ETIMEDOUT.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-02-10 12:53:12 -08:00
Josh Durgin
d389e617c1 msg/Pipe: add option to restrict delay injection to specific msg type
This makes it possible to test timeouts reliably by delaying certain
messages effectively forever, but still being able to e.g. connect and
authenticate to the monitors.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-02-10 12:53:12 -08:00
Josh Durgin
671a76d64b MonClient: add a timeout on commands for librados
Just use the conf option directly, since librados is the only caller.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-02-10 12:53:11 -08:00
Josh Durgin
3e1f7bbb42 Objecter: implement mon and osd operation timeouts
This captures almost all operations from librados other than mon_commands().

Get the values for the timeouts from the Objecter constructor, so only
librados uses them.

Add C_Cancel_*_Op, finish_*_op(), and *_op_cancel() for each type of
operation, to mirror those for Op. Create a callback and schedule it
in the existing timer thread if the timeouts are specified.

Fixes: #6507
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-02-10 12:53:11 -08:00
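The pattern in 3e1f7bbb42 (schedule a cancel callback on a timer only when a timeout is specified, and complete the operation with ETIMEDOUT if it fires first) might look like this in outline. The `Op` class and its methods are hypothetical names for illustration; the real change lives in the C++ Objecter and uses its existing timer thread rather than `threading.Timer`.

```python
import errno
import threading


class Op:
    """Toy operation that completes exactly once, by reply or by timeout."""

    def __init__(self, timeout=0.0):
        self.result = None
        self.done = threading.Event()
        self._lock = threading.Lock()
        self._timer = None
        if timeout > 0:
            # Only schedule the cancel callback when a timeout was specified.
            self._timer = threading.Timer(
                timeout, self.cancel, args=(-errno.ETIMEDOUT,)
            )
            self._timer.daemon = True
            self._timer.start()

    def _complete(self, r):
        # The first completion wins; later ones are ignored.
        with self._lock:
            if self.done.is_set():
                return False
            self.result = r
            self.done.set()
            return True

    def finish(self, r=0):
        """Normal completion: stop the pending timer, then record the result."""
        if self._timer is not None:
            self._timer.cancel()
        return self._complete(r)

    def cancel(self, r):
        """Timer callback: complete the op with an error code."""
        return self._complete(r)


op = Op(timeout=0.05)  # no reply ever arrives for this op
op.done.wait(2.0)
print(op.result == -errno.ETIMEDOUT)
```

Callers then have to check the returned value, which is the point of the companion commits 8e9459e897 and 79c1874346.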
Alfredo Deza
9bcc42a3e6 alert the user about error messages from partx
Signed-off-by: Alfredo Deza <alfredo@deza.pe>
2014-02-10 15:09:39 -05:00
Alfredo Deza
42900ff9da use partx for red hat or centos instead of partprobe
Signed-off-by: Alfredo Deza <alfredo@deza.pe>
2014-02-10 15:09:18 -05:00
Sage Weil
2c5783cc6c Merge remote-tracking branch 'gh/next'
2014-02-10 10:19:55 -08:00
Ilya Dryomov
6926272056 common/buffer: fix build breakage for CEPH_HAVE_SETPIPE_SZ
common/buffer.cc fails to build if CEPH_HAVE_SETPIPE_SZ is defined.
Fix it.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-02-10 19:37:30 +02:00
Ilya Dryomov
a5f479c2aa configure: fix F_SETPIPE_SZ detection
Currently CEPH_HAVE_SETPIPE_SZ is not set even if F_SETPIPE_SZ is
available, because AC_COMPILE_IFELSE test program as written always
fails to compile.  F_SETPIPE_SZ is a macro, so use AC_EGREP_CPP which
works on the preprocessor output instead of trying to compile.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-02-10 19:37:30 +02:00
Ilya Dryomov
450163ec40 configure: don't check for arpa/nameser_compat.h twice
Nuke redundant check and move the real one into the common
AC_CHECK_HEADERS stanza.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-02-10 19:37:30 +02:00
Loic Dachary
dbaf71aa26 mailmap: Moritz Möller is with Bigpoint.com
Reviewed-by: Moritz Möller <mm@mxs.de>
Signed-off-by: Loic Dachary <loic@dachary.org>
2014-02-10 14:46:32 +01:00
Sage Weil
39b393d7fa Merge remote-tracking branch 'gh/wip-7329' into next
2014-02-09 10:37:47 -08:00
Sage Weil
575566b168 ceph_test_rados_api_tier: try harder to trigger the flush vs try-flush race
It seems to be reasonably easy to complete a flush before the next client
request is processed.  Crazy...

Same with the flush vs write race.

Fixes: #7329
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-08 20:20:21 -08:00
Loic Dachary
270714d54e Merge pull request #1201 from ceph/wip-7370
crush: fix tries/retries bug that was recently introduced

Reviewed-by: Loic Dachary <loic@dachary.org>
2014-02-09 01:12:46 +01:00
Loic Dachary
7fe10f1271 Merge pull request #1115 from jcsp/tell_cleanup
Remove some almost-duplicate COMMAND definitions

Reviewed-by: Loic Dachary <loic@dachary.org>
2014-02-09 00:58:39 +01:00
Loic Dachary
cf2d71cef5 Merge pull request #1127 from dmsimard/log_links
Doc: Fix 404 broken links to logging and debug configuration

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Loic Dachary <loic@dachary.org>
2014-02-09 00:33:42 +01:00
Sage Weil
a8e6c9fbf8 crush: add chooseleaf_vary_r tunable
The current crush_choose_firstn code will re-use the same 'r' value for
the recursive call.  That means that if we are hitting a collision or
rejection for some reason (say, an OSD that is marked out) and need to
retry, we will keep making the same (bad) choice in that recursive
selection.

Introduce a tunable that fixes that behavior by incorporating the parent
'r' value into the recursive starting point, so that a different path
will be taken in subsequent placement attempts.

Note that this was done from the get-go for the new crush_choose_indep
algorithm.

This was exposed by a user who was seeing PGs stuck in active+remapped
after reweight-by-utilization because the up set mapped to a single OSD.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-08 12:27:30 -08:00
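The behavior described in a8e6c9fbf8 can be illustrated with a toy model. This is not the CRUSH implementation: `toy_hash`, `sub_r`, and `inner_choice` are hypothetical names, the hash is a stand-in, and the right shift used to derive the recursive starting point from the parent 'r' is an assumption of this sketch.

```python
import hashlib


def toy_hash(x: int, bucket: int, r: int) -> int:
    """Deterministic stand-in for a placement hash (not CRUSH's hash)."""
    digest = hashlib.sha256(f"{x}:{bucket}:{r}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


def sub_r(parent_r: int, vary_r: int) -> int:
    """Starting 'r' for the recursive (leaf) selection.

    vary_r == 0 models the legacy behavior: the parent's 'r' is ignored.
    For vary_r >= 1 the parent's 'r' is mixed in; the shift is an assumption
    of this sketch, so larger values change the start less often.
    """
    if vary_r == 0:
        return 0
    return parent_r >> (vary_r - 1)


def inner_choice(x: int, bucket: int, parent_r: int, vary_r: int, n_leaves: int) -> int:
    """Pick a leaf index inside `bucket` for input x."""
    return toy_hash(x, bucket, sub_r(parent_r, vary_r)) % n_leaves


# Five successive parent attempts (parent_r = 0..4) landing in the same bucket.
legacy = [inner_choice(42, 7, r, 0, 8) for r in range(5)]
starts_v1 = [sub_r(r, 1) for r in range(5)]
starts_v2 = [sub_r(r, 2) for r in range(5)]
print(legacy, starts_v1, starts_v2)
```

In the legacy case every retry repeats the same leaf choice, so a rejected leaf stays rejected; with the tunable set, the recursive starting point moves as the parent retries.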
Sage Weil
f17caba8ae crush: allow crush rules to set (re)tries counts to 0
These two fields are misnomers; they are *retry* counts.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-08 12:23:05 -08:00
Sage Weil
795704fd61 crush: fix off-by-one errors in total_tries refactor
Back in 27f4d1f6bc we refactored the CRUSH
code to allow adjustment of the retry counts on a per-pool basis.  That
commit had an off-by-one bug: the previous "tries" counter was a *retry*
count, not a *try* count, but the new code was passing in 1 meaning
there should be no retries.

Fix the ftotal vs tries comparison to use < instead of <= to fix the
problem.  Note that the original code used <= here, which means the
global "choose_total_tries" tunable is actually counting retries.
Compensate for that by adding 1 in crush_do_rule when we pull the tunable
into the local variable.

This was noticed looking at output from a user provided osdmap.
Unfortunately the map doesn't illustrate the change in mapping behavior
and I haven't managed to construct one yet that does.  Inspection of the
crush debug output now aligns with prior versions, though.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-08 12:21:33 -08:00
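The tries versus retries distinction in 795704fd61 comes down to which comparison guards the retry. A minimal sketch, assuming every attempt fails so the loop runs to its limit (`attempts_made` and its parameters are hypothetical names, not the crush_choose_firstn code):

```python
def attempts_made(tries: int, inclusive: bool) -> int:
    """Count attempts when every attempt fails.

    inclusive=True  models `ftotal <= tries` (the value acts as a retry count).
    inclusive=False models `ftotal <  tries` (the value acts as a try count).
    """
    ftotal = 0      # failures so far
    attempts = 0
    while True:
        attempts += 1
        ftotal += 1  # this attempt failed
        retry = ftotal <= tries if inclusive else ftotal < tries
        if not retry:
            return attempts


# Passing 1 with `<=` still allows one retry; with `<` it allows none.
print(attempts_made(1, inclusive=True), attempts_made(1, inclusive=False))

# Adding 1 to a retry-count tunable keeps the legacy number of attempts under `<`.
print(attempts_made(50, inclusive=True) == attempts_made(50 + 1, inclusive=False))
```

This is why the fix changes the comparison and also adds 1 when the global tunable is pulled into the local variable.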
Sage Weil
ed32c4002f crushtool: add cli test for off-by-one tries vs retries bug
See bug #7370.  This passes on dumpling and breaks prior to the #7370 fix.

Backport: emperor, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-08 12:21:26 -08:00