31232 Commits

Author SHA1 Message Date
Sage Weil
d136eb4cbd mon: allow firefly crush tunables to be selected
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-11 11:12:56 -08:00
Sage Weil
e3309bce03 doc/rados/operations/crush: describe new vary_r tunable
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-11 11:12:56 -08:00
Sage Weil
525b2d2663 crush: add firefly tunables baseline test
This is a user's map that gives different results when the vary_r tunable
is adjusted.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-11 11:12:56 -08:00
Sage Weil
37f840b499 crushtool: new cli tests for the vary-r tunable
These illustrate the variation in mapping results as the vary_r tunable
is adjusted.  Note:

1- For the vary_r=0 case, we have several inputs that map to only a single
output:

      rule 3 (delltestrule) num_rep 4 result size == 1:\t27/1024 (esc)
      rule 3 (delltestrule) num_rep 4 result size == 2:\t997/1024 (esc)

This is the behavior we are fixing.  For all of the other values of
vary_r, we get 2 outputs for all inputs.

2- If we use vary_r 1, which is likely the most efficient computation,
we get lots of inputs that change.  By setting larger values of vary_r,
we can trade a bit of extra computation to get a mapping that is more
similar to the legacy behavior. This is useful for legacy clusters:

    $ for f in `seq 1 4` ; do diff -u test-map-vary-r-0.t test-map-vary-r-$f.t | grep -c -- +  ; done
    3030
    1629
    645
    228

The crushmap here comes from a user who was seeing a bad mapping for certain
pgs after some OSDs were reweighted by utilization.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-11 11:11:25 -08:00
Sage Weil
e88f843c99 crush: add infrastructure around SET_CHOOSELEAF_VARY_R rule step/command
This will let you vary the vary_r tunable on a per-rule basis.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-11 08:48:14 -08:00
Sage Weil
f944ccc20a crush: add SET_CHOOSELEAF_VARY_R step
This lets you adjust the vary_r tunable on a per-rule basis.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-11 08:48:14 -08:00
Sage Weil
e20a55d906 crush: add infrastructure around new chooseleaf_vary_r tunable
- encoding
- feature bit
- decompile/compile

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-11 08:48:14 -08:00
Sage Weil
a8e6c9fbf8 crush: add chooseleaf_vary_r tunable
The current crush_choose_firstn code will re-use the same 'r' value for
the recursive call.  That means that if we are hitting a collision or
rejection for some reason (say, an OSD that is marked out) and need to
retry, we will keep making the same (bad) choice in that recursive
selection.

Introduce a tunable that fixes that behavior by incorporating the parent
'r' value into the recursive starting point, so that a different path
will be taken in subsequent placement attempts.

Note that this was done from the get-go for the new crush_choose_indep
algorithm.

This was exposed by a user who was seeing PGs stuck in active+remapped
after reweight-by-utilization because the up set mapped to a single OSD.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-08 12:27:30 -08:00
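
The retry problem described in commit a8e6c9fbf8 can be illustrated with a toy model. This is only a sketch under stated assumptions, not the CRUSH implementation: `toy_hash`, `choose_leaf`, and the OSD names are hypothetical, the hash is a stand-in for CRUSH's own, and the right-shift used for larger `vary_r` values is an assumption about how the tunable damps the change.

```python
def toy_hash(x, r):
    # Stand-in mixing function (hypothetical); real CRUSH uses its own hash.
    return (x * 2654435761 + r * 40503) & 0xFFFFFFFF


def choose_leaf(x, osds, out, vary_r, tries=8):
    """Toy outer retry loop; each attempt makes one pick among `osds`.

    Returns (chosen_osd_or_None, list_of_all_picks).
    """
    picks = []
    for r in range(tries):  # r grows with each outer retry
        # Legacy behavior (vary_r == 0): the inner pick always starts at 0,
        # so every retry repeats the same choice.
        # With the tunable: the parent's r feeds the inner starting point.
        # The shift for vary_r > 1 is an assumption made for this sketch.
        sub_r = (r >> (vary_r - 1)) if vary_r else 0
        leaf = osds[toy_hash(x, sub_r) % len(osds)]
        picks.append(leaf)
        if leaf not in out:
            return leaf, picks
    return None, picks
```

With `vary_r=0`, an input whose first pick is an out OSD repeats that pick on every retry and never maps; with `vary_r=1`, later retries can reach a different OSD.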
Sage Weil
f17caba8ae crush: allow crush rules to set (re)tries counts to 0
These two fields are misnomers; they are *retry* counts.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-08 12:23:05 -08:00
Sage Weil
795704fd61 crush: fix off-by-one errors in total_tries refactor
Back in 27f4d1f6bc we refactored the CRUSH
code to allow adjustment of the retry counts on a per-pool basis.  That
commit had an off-by-one bug: the previous "tries" counter was a *retry*
count, not a *try* count, but the new code was passing in 1 meaning
there should be no retries.

Fix the ftotal vs tries comparison to use < instead of <= to fix the
problem.  Note that the original code used <= here, which means the
global "choose_total_tries" tunable is actually counting retries.
Compensate for that by adding 1 in crush_do_rule when we pull the tunable
into the local variable.

This was noticed looking at output from a user provided osdmap.
Unfortunately the map doesn't illustrate the change in mapping behavior
and I haven't managed to construct one yet that does.  Inspection of the
crush debug output now aligns with prior versions, though.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-08 12:21:33 -08:00
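
The off-by-one described in commit 795704fd61 comes down to whether the counter is compared with `<=` or `<`. The following is a minimal sketch, assuming a placement loop in which every attempt fails; `attempts_made` and its parameters are hypothetical names, not taken from the CRUSH source.

```python
def attempts_made(tries, inclusive):
    """Count how many attempts a always-failing retry loop performs.

    inclusive=True  models the `ftotal <= tries` comparison.
    inclusive=False models the `ftotal < tries` comparison.
    """
    ftotal = 0    # failures so far
    attempts = 0
    while True:
        attempts += 1   # one placement attempt, assumed to fail
        ftotal += 1
        keep_going = (ftotal <= tries) if inclusive else (ftotal < tries)
        if not keep_going:
            return attempts


def tries_from_tunable(choose_total_tries):
    # The tunable counts retries, so add 1 to turn it into a try count.
    return choose_total_tries + 1
```

Under these assumptions, passing 1 with the `<=` comparison yields two attempts rather than one, while the `<` comparison combined with the `+ 1` adjustment keeps the tunable's historical meaning as a retry count.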
Sage Weil
ed32c4002f crushtool: add cli test for off-by-one tries vs retries bug
See bug #7370.  This passes on dumpling and breaks prior to the #7370 fix.

Backport: emperor, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-08 12:21:26 -08:00
tamil
3d656600e9 script to test rgw multi part uploads using s3 interface
Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
(cherry picked from commit 5d59dd9cd6)
2014-02-07 22:27:05 -08:00
tamil
d4e5db58fa Merge branch 'next' of github.com:ceph/ceph into next
2014-02-07 17:10:10 -08:00
tamil
0bac064e90 added script to test rgw user quota
Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
2014-02-07 17:09:30 -08:00
Gregory Farnum
b04ae1341f Merge pull request #1197 from ceph/wip-osdmap-primary
osd/OSDMap: populate *primary when pool dne

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-02-07 10:21:40 -08:00
Sage Weil
3a5fa8765f osd/OSDMap: populate *primary when pool dne
This fixes a valgrind error from OSD::handle_osd_map where primary is not
initialized and is compared after the call to pg_to_acting_osds().

We are still not distinguishing "no mapping" from "pool doesn't exist,
no mapping".  That is a somewhat larger change, though.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-07 09:38:37 -08:00
Yehuda Sadeh
5b7e2b2297 rgw: initialize variable before call
Need to initialize the truncated variable, as we sometimes ignore an error
response (e.g., ENOENT), and in such cases we can't expect it to be
set.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 9ecf3467a3)
2014-02-07 08:43:47 -08:00
Sage Weil
bb6d3f81a7 rest/test.py: use larger max_file_size for mds set test
Current min is 64k.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-07 06:11:57 -08:00
Sage Weil
3d4353108b Merge pull request #1190 from ceph/wip-snaptest-next
qa/workunits/snaps: New allow_new_snaps syntax
2014-02-05 13:04:44 -08:00
John Spray
ce0e3bd188 qa/workunits/snaps: New allow_new_snaps syntax
These were probably just obscuring other failures.

Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: John Spray <john.spray@inktank.com>
2014-02-05 21:00:12 +00:00
Sage Weil
eb18c0a8d3 Merge pull request #1183 from ceph/wip-7336
rgw: fix rgw_read_user_buckets() use of max param

Reviewed-by: Sage Weil <sage@inktank.com>
2014-02-04 21:33:54 -08:00
Yehuda Sadeh
04b1ae466e rgw: fix rgw_read_user_buckets() use of max param
Fixes: #7336

The rgw_read_user_buckets() treated the max param as the max number of
entries to request in a single op, but always fetched the entire list
of buckets. This is wrong, as it should have treated it as the total
number of entries requested. All the callers assume the latter.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2014-02-04 10:37:11 -08:00
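
The two readings of the max param in commit 04b1ae466e can be contrasted with a small paginated-listing sketch. Everything here is hypothetical and for illustration only: `make_store`, `fetch_page`, and both reader functions are invented names, not the rgw API.

```python
def make_store(names):
    """Build a hypothetical paged listing over a fixed set of bucket names."""
    ordered = sorted(names)

    def fetch_page(marker, limit):
        # Return up to `limit` names after `marker`, plus a truncated flag.
        start = 0 if marker is None else ordered.index(marker) + 1
        page = ordered[start:start + limit]
        truncated = start + limit < len(ordered)
        return page, truncated

    return fetch_page


def read_ignoring_total(fetch_page, max_entries):
    """Treats max_entries as a per-op page size and fetches everything."""
    out, marker, truncated = [], None, True
    while truncated:
        page, truncated = fetch_page(marker, max_entries)
        if not page:
            break
        out.extend(page)
        marker = page[-1]
    return out


def read_up_to_total(fetch_page, max_entries, chunk=1000):
    """Treats max_entries as the total number of entries requested."""
    out, marker, truncated = [], None, True
    while truncated and len(out) < max_entries:
        want = min(chunk, max_entries - len(out))
        page, truncated = fetch_page(marker, want)
        if not page:
            break
        out.extend(page)
        marker = page[-1]
    return out, truncated
```

The second reader stops once the requested total is reached and reports whether more entries remain, which is the contract the callers are said to assume.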
Sage Weil
60ca6f699b client: fix warnings
client/Client.cc: In member function 'int Client::_read(Fh*, int64_t, uint64_t, ceph::bufferlist*)':
warning: client/Client.cc:5893:27: comparison between signed and unsigned integer expressions [-Wsign-compare]
client/Client.cc: In member function 'int Client::_write(Fh*, int64_t, uint64_t, const char*)':
warning: client/Client.cc:6235:30: comparison between signed and unsigned integer expressions [-Wsign-compare]

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-03 21:12:41 -08:00
Sage Weil
a23a2c8f01 os/KeyValueStore: fix warning
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-03 17:50:32 -08:00
Sage Weil
8e30db8f2a rest: add a few rest api tests
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-03 17:50:32 -08:00
Sage Weil
eb9ffd5a79 mon: use 'mds set inline_data ...' for enable/disable of inline data
This makes the management interface a bit more consistent.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-03 17:50:32 -08:00
Sage Weil
408b0c8e75 mon: fix 'mds set allow_new_snaps'
We had already added this as a flag (set/unset) when I generalized the
'mds set_max_mds' to be 'ceph mds set <var> <val>'.  Switch the snaps
flag to be a key/value with true/false (similar to the hashpspool
pool flag) since there are fewer users and the var/val approach is more
general.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2014-02-03 17:50:18 -08:00
Ken Dreyer
f9b9f52420 Merge branch 'next'
2014-02-03 22:39:26 +00:00
Ken Dreyer
3b990136bf v0.76
2014-02-03 18:26:25 +00:00
Sage Weil
7ff2b541c2 client: use 64-bit value in sync read eof logic
The file size can jump to a value that is very much larger than our current
position (for example, it could be a disk image file that gets a sparse
write at a large offset).  Use a 64-bit value so that 'some' doesn't
overflow.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: John Spray <john.spray@inktank.com>
2014-02-03 08:54:14 -08:00
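
The overflow that commit 7ff2b541c2 guards against can be shown by emulating a 32-bit signed value. This is a sketch under assumptions: `as_int32` and `remaining_bytes` are hypothetical helpers, and the arithmetic is a guess at the general shape of the problem, not the actual Client.cc logic.

```python
def as_int32(value):
    """Wrap an integer the way a 32-bit signed variable would hold it."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def remaining_bytes(file_size, pos, use_64bit):
    """Bytes between the current position and the end of the file."""
    diff = file_size - pos
    # A 64-bit value holds the difference as-is; a 32-bit one wraps.
    return diff if use_64bit else as_int32(diff)
```

For a sparse file whose size jumps to several GiB, the 32-bit result no longer matches the true distance to end of file, while the 64-bit result does.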
Sage Weil
e4ff4720d5 Merge remote-tracking branch 'gh/next'
Conflicts:
	src/mon/OSDMonitor.cc
	src/osd/OSDMap.h
2014-02-02 09:40:11 -08:00
Sage Weil
29eac1d14e Merge remote-tracking branch 'gh/wip-inline'
Passed fs suite, sage-2014-02-01_22:18:10-fs-wip-inline-testing-basic-plana,
modulo a snap test error in the suite.

Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-02-02 09:16:06 -08:00
Sage Weil
9c3a4d8af6 Merge pull request #1165 from mo22/client_fuse_multithreading
client: ceph-fuse use fuse_session_loop_mt to allow multithreaded operat...

Reviewed-by: Sage Weil <sage@inktank.com>
2014-02-01 21:00:41 -08:00
John Wilkins
b717e11b52 Merge pull request #1174 from alram/master
doc: rgw: el6 documentation fixes
2014-01-31 14:20:37 -08:00
Alexandre Marangone
ee4cfda151 doc: rgw: el6 documentation fixes
- fix a couple of typos in the repo configuration and service restart
- the ceph package must be installed on RPM distros since the init
script relies on ceph-conf
- note on the radosgw service name for RPM distros

Signed-off-by: Alexandre Marangone <alexandre.marangone@inktank.com>
2014-01-31 13:55:55 -08:00
David Zafman
dffe6019c3 Merge pull request #1162 from ceph/wip-5997
Fixes: #5997

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-01-31 12:35:56 -08:00
David Zafman
48fbccece5 osd: Change some be_compare_scrub_objects() args to const
Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-01-31 11:00:22 -08:00
David Zafman
ce1ea619f6 osd: Change be_scan_list() arg to const
Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-01-31 11:00:22 -08:00
David Zafman
e1bfed52f9 common: buffer::ptr::cmp() is a const function
Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-01-31 11:00:22 -08:00
David Zafman
34eb549cd4 osd: Move the rest of scrubbing routines to the backend
Move enum scrub_error_type to osd_types.h
Move PG::_compare_scrub_objects to ReplicatedBackend::be_compare_scrub_objects
Move PG::_select_auth_object to ReplicatedBackend::be_select_auth_object
Move PG::_compare_scrubmaps to ReplicatedBackend::be_compare_scrubmaps

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-01-31 11:00:22 -08:00
David Zafman
f9128e89a3 osd: Move PG::_scan_list() to backend as ReplicatedBackend::be_scan_list()
Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-01-31 11:00:22 -08:00
David Zafman
37447e758e osd: Add scrub_supported() backend interface
Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-01-31 11:00:22 -08:00
Sage Weil
560f5f1f88 OSDMap: fix deepish_copy_from
Start with a shallow copy!

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit d0f13f5414)

Conflicts:

	src/osd/OSDMap.h
2014-01-31 07:57:20 -08:00
Sage Weil
d5080799c8 OSDMonitor: use deepish_copy_from for remove_down_pg_temp
This is a backport of 368852f6c0.

Make a deep copy of the OSDMap to avoid clobbering the in-memory copy with
the call to apply_incremental.

Fixes: #7060
Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-31 07:57:04 -08:00
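
Why commit d5080799c8 needs a deep copy before calling apply_incremental can be sketched with a toy map. The class and function names below (`ToyMap`, `preview_shallow`, `preview_deep`) are hypothetical and only loosely modeled on the description above; the real OSDMap is far richer.

```python
import copy


class ToyMap:
    """Hypothetical stand-in for a map holding mutable pg_temp state."""

    def __init__(self):
        self.epoch = 1
        self.pg_temp = {"1.0": [3, 4]}

    def apply_incremental(self, removed_pgs):
        # Mutates this map in place.
        self.epoch += 1
        for pg in removed_pgs:
            self.pg_temp.pop(pg, None)


def preview_shallow(live, removed_pgs):
    tmp = copy.copy(live)        # tmp.pg_temp is the same dict as live.pg_temp
    tmp.apply_incremental(removed_pgs)
    return tmp


def preview_deep(live, removed_pgs):
    tmp = copy.deepcopy(live)    # tmp owns its own pg_temp
    tmp.apply_incremental(removed_pgs)
    return tmp
```

In this sketch the shallow preview removes the entry from the live map as a side effect, which is the kind of clobbering the commit message describes; the deep preview leaves the live map untouched.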
Sage Weil
61914d82bf OSDMap: deepish_copy_from()
Make a deep(ish) copy of another OSDMap.  Unfortunately we can't make the
compiler-generated copy operator/constructors private until c++11.  :(

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit bd54b9841b)
2014-01-31 07:57:01 -08:00
Sage Weil
802692ed8e os/KeyValueStore: fix warning
./os/KeyValueStore.h: In member function ‘std::string KeyValueStore::strip_object_key(uint64_t)’:
warning: ./os/KeyValueStore.h:173:31: format ‘%ld’ expects argument of type ‘long int’, but argument 4 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]

Signed-off-by: Sage Weil <sage@inktank.com>
2014-01-31 07:19:10 -08:00
Sage Weil
f8316f1a1a Merge branch 'wip-inline' of git://github.com/kylinstorage/ceph
Conflicts:
	src/include/ceph_features.h
2014-01-31 07:00:49 -08:00
Sage Weil
3a53d6deae Merge pull request #1171 from ceph/wip-osdmap-features
mon: encode full osdmap with same feature bits as the incremental

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-01-30 21:01:16 -08:00
Josh Durgin
abcc17bf3f Merge pull request #1169 from dachary/wip-ceph-disk
Reviewed-by: Sage Weil <sage.weil@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-01-30 16:04:09 -08:00
Josh Durgin
3665815738 Merge remote-tracking branch 'origin/next'
Conflicts:
	src/test/ceph-disk.sh
2014-01-30 15:40:09 -08:00