Commit Graph

69492 Commits

Author SHA1 Message Date
Sage Weil
bed114db0a os/bluestore: add bluestore_prefer_wal_size[_hdd,_ssd] options
Add option to prefer a WAL write if the write is below a size threshold,
even if we could avoid it.  This lets you trade some write-amp (by
journaling data to rocksdb) for latency in cases where the WAL device is
much faster than the main device.

This affects:

 - writes to new extent locations below min_alloc_size
 - writes to unallocated space below min_alloc_size
 - "big" writes above min_alloc_size that are below the prefer_wal_size
   threshold.

Note that it's applied to individual blobs, not to the write as a whole,
so if you have a larger write torn into two pieces/blobs that are each
below the threshold, then they will both go through the WAL.

Set different defaults for HDD and SSD, since this makes more sense for HDD
where seeks are expensive.

Add some test cases to exercise the option.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-07 18:10:55 -05:00
Sage Weil
296708091c qa/tasks/ceph_manager: use new luminous set-full-ratio etc
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-07 16:39:09 -05:00
Yehuda Sadeh
e9228f3460 Merge pull request #13410 from yehudasa/wip-tracing-fix
tracing: don't include oid when tracing at dequeue_op()

Reviewed-by: Sage Weil <sage@redhat.com>
2017-03-07 13:31:47 -08:00
Sage Weil
4272214136 Merge pull request #13839 from theanalyst/release/10.2.6/changelog
doc: add changelog for v10.2.6 Jewel release
2017-03-07 15:30:04 -06:00
Abhishek Lekshmanan
32e128c093 doc: add changelog for v10.2.6 Jewel release
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
2017-03-07 21:44:23 +01:00
John Spray
92e7e890c3 Merge pull request #13704 from batrick/mds-counter-unify
mds: remove some redundant object counters

Reviewed-by: Yan, Zheng <zyan@redhat.com>
2017-03-07 19:50:11 +00:00
Sage Weil
c4b73f19a7 osdc/Objecter: resend RWORDERED ops on full
Our condition for respecting the FULL flag is complex, and involves
the WRITE | RWORDERED flags vs the FULL_FORCE | FULL_TRY flags.  Previously,
we could block a read because of RWORDERED but not resend it later.

Fix by capturing the complex condition in a respects_full() bool and using
it both for the blocking-on-send and resending-on-possibly-notfull-later
checks.

Fixes: http://tracker.ceph.com/issues/19133
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-07 13:33:44 -05:00
Sage Weil
a202b68d18 qa/tasks/thrashosds: chance_thrash_cluster_full
Induce a momentarily full cluster.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-07 13:33:44 -05:00
Daniel Gryniewicz
0007adb5b7 Merge pull request #13832 from linuxbox2/wip-rgw-fs_inst
rgw_file: fix fs_inst progression
2017-03-07 12:52:44 -05:00
Yuri Weinstein
05412184b5 Merge pull request #10240 from songbaisen/b2
mon: remove the redundant judgement in paxosservice is_writeable function

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-03-07 08:57:40 -08:00
Matt Benjamin
0e988edfb6 rgw_file: fix fs_inst progression
Reported by Gui Hecheng <guimark@126.com>.  This change is a
variation on a proposed fix by Dan Gryniewicz <dang@redhat.com>
to take root_fh.state.dev as fs_inst for new handles.

Fixes: http://tracker.ceph.com/issues/19214

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
2017-03-07 11:43:39 -05:00
Radoslaw Zarzynski
6440750f53 qa/tasks/rgw.py: start Apache before RadosGW.
At the end of start_rgw() we wait until HTTP connections to RadosGW
can be established. However, if RadosGW uses FastCGI, that condition
can't be met until the HTTP server has been spawned.

Signed-off-by: Radoslaw Zarzynski <rzarzynski@mirantis.com>
2017-03-07 17:31:52 +01:00
Mykola Golub
fe31bca22f librbd: relax "is parent mirrored" check when enabling mirroring for pool
If the parent is in the same pool and has the journaling feature enabled
we can assume the mirroring will eventually be enabled for it.

Fixes: http://tracker.ceph.com/issues/19130
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
2017-03-07 17:16:09 +01:00
Sage Weil
3f5269d8b5 Merge pull request #13323 from yehudasa/wip-18079-2
librados: use cursor for nobjects listing

Reviewed-by: Sage Weil <sage@redhat.com>
2017-03-07 08:41:08 -06:00
John Spray
73100305e5 Merge pull request #13262 from batrick/multimds-thrasher
Add multimds:thrash sub-suite and fix bugs in thrasher for multimds

Reviewed-by: John Spray <john.spray@redhat.com>
2017-03-07 14:29:18 +00:00
John Spray
76589ed9e1 doc: instructions and guidance for multimds
Inspired by http://tracker.ceph.com/issues/19135

Signed-off-by: John Spray <john.spray@redhat.com>
2017-03-07 14:08:22 +00:00
Kefu Chai
cec7e1a0db Merge pull request #13560 from wjwithagen/wip-wjw-ceph-disk-tests
ceph-disk/tests: Certain partition types do not work on FreeBSD

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-03-07 15:37:18 +08:00
Haomai Wang
d124e6f669 Merge pull request #13810 from yuyuyu101/wip-rdma-inflight
msg/async/rdma: destroy QueuePair if needed

Reviewed-by: Adir lev <adirl@mellanox.com>
2017-03-07 15:25:55 +08:00
Kefu Chai
a07452d9d0 Merge pull request #13742 from liupan1111/wip-cleanup-journal
os/filestore: use existing variable for same func.

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-03-07 12:24:50 +08:00
Kefu Chai
17ae338be3 Merge pull request #12177 from kylinstorage/wip-remove-unneeded-loop
os/filestore/FileStore.cc: remove unneeded loop

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-03-07 12:21:32 +08:00
Kefu Chai
413efbc60f Merge pull request #13741 from rzarzynski/wip-bs-fastcrc32-in-rocks
os/bluestore: enable SSE-assisted CRC32 calculations in RocksDB

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-03-07 12:06:22 +08:00
Kefu Chai
bdc22afd9b Merge pull request #13768 from tchaikov/wip-clang-fixes
librados, osd: clang fixes

Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
2017-03-07 12:05:36 +08:00
Kefu Chai
895561376c Merge pull request #13794 from liewegas/wip-clog-newlines
common: remove \n on clog messages

Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
2017-03-07 12:04:52 +08:00
Kefu Chai
d8bea23c5d Merge pull request #13796 from liewegas/wip-debian-base-dbg
debian/control: add ceph-base-dbg

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-03-07 12:04:25 +08:00
Sage Weil
3c80e15c3b qa/suites/upgrade/jewel-x/parallel: upgrade mons before osds
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 22:02:19 -05:00
Sage Weil
1a0ad2b488 qa/suites/upgrade/jewel-x/parallel: expand workload matrix
These should run independently against a racing upgrade.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 22:02:19 -05:00
Haomai Wang
53b4cece12 Merge pull request #13799 from optimistyzy/36_1
bluestore, NVMEDEVICE: Specify the max io completion in conf

Reviewed-by: Haomai Wang <haomai@xsky.com>
Reviewed-by: Pan Liu <liupan1111@gmail.com>
2017-03-07 09:48:28 +08:00
Jason Dillaman
91ae4cd794 Merge pull request #13806 from vshankar/rbd-internal-api-move
test/librbd: move tests using non-public api to internal

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Mykola Golub <mgolub@mirantis.com>
2017-03-06 20:45:48 -05:00
runsisi
709d94ab76 librbd: rados callback cleanup
Signed-off-by: runsisi <runsisi@zte.com.cn>
2017-03-07 08:35:27 +08:00
Sage Weil
699df7d2b5 test/cli/osdmaptool: fix osdmap output
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 17:21:23 -05:00
Sage Weil
d795c3c457 mon/OSDMonitor: generate health warnings for luminous
Note that this tells us how many OSDs are full or nearfull; it
does not include detailed warnings telling you exactly what the
utilization is because we don't have the full osd_stat_t
available.  We leave it to ceph-mgr to generate those health
messages.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 17:21:22 -05:00
Sage Weil
8bab735909 mon/PGMonitor: stop generating health warnings with luminous
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 17:21:22 -05:00
Sage Weil
707e43d5ae mon/OSDMonitor: set cluster flags based on osd flags (luminous)
For luminous, set cluster flags based on osd flags.  Until
require_luminous is set, stick with the old pgmap-based behavior.
Move the new check to encode_pending so that the cluster flag is
set in the same epoch that the osd state(s) change.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 17:21:22 -05:00
Sage Weil
00a8bfa554 osd: request a fullness state change during tick if needed
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 17:21:22 -05:00
Sage Weil
53c1868f36 osd: require fullness state changes (as needed) before boot
This ensures that a down osd that is marked full doesn't go up,
then realize it's not actually full, and then clear its full
flag.  That would result in an unnecessary cluster full blip.
This can easily happen if the full_ratio in the osdmap is
increased while the OSD is down.

Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 17:21:22 -05:00
Sage Weil
9f0bc152ab osd: restructure and simplify internal fullness checks
First, eliminate the useless nearfull failsafe--all it did was
generate a log message, which we can do based on the OSDMap
states.

Add some new helpers.

Unify the cluster nearfull/full vs failsafe states so that
failsafe is a "really" full state that is more severe than
full, so we have NONE, NEARFULL, FULL, FAILSAFE.

Pull the full/nearfull ratios out of the OSDMap (remember that
we require luminous mons, so these will be initialized).

Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 17:21:21 -05:00
Sage Weil
394e45ad81 mon/PGMonitor: disable old 'pg set_[near]full_ratio ...' in luminous
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 16:42:34 -05:00
Sage Weil
03287f7d22 qa/workunits/cephtool/test.sh: change [near]full_ratio tests
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 16:42:34 -05:00
Sage Weil
6422e0a220 mon/OSDMonitor: implement new 'osd set-[near]full-ratio ...' commands
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 16:42:33 -05:00
Sage Weil
0da7561785 mon/OSDMonitor: initialize osdmap ratios from pgmap on upgrade
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 16:42:33 -05:00
Sage Weil
15f89706c5 mon/OSDMonitor: set osdmap ratios on mkfs
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 16:42:33 -05:00
Sage Weil
14b1ab1456 mon/OSDMonitor: handle MOSDFull messages from OSDs
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 16:42:33 -05:00
Josh Durgin
3580d224d7 Merge pull request #13755 from liewegas/wip-19131
osd/osd_internal_types: wake snaptrimmer on put_read lock, too

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-03-06 12:48:25 -08:00
Patrick Donnelly
6260343c1b mds: print rank as int
If the MDS has no rank then its whoami field would be printed as:

    {"cluster_fsid":"4c1bae66-03fb-4b9a-bd88-108636d29758","whoami":18446744073709551615,"id":54239,"want_state":"up:boot","state":"???","mdsmap_epoch":22,"osdmap_epoch":0,"osdmap_epoch_barrier":0}

Fixes: http://tracker.ceph.com/issues/19201

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-03-06 15:42:21 -05:00
Yehuda Sadeh
5bdfc6d3a8 Merge pull request #13411 from yehudasa/wip-vstart-rgw-fix
vstart: don't configure rgw_dns_name

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2017-03-06 11:29:14 -08:00
Sage Weil
b2d354d563 qa/suites/upgrade/jewel-x/stress-split-erasure-code: box thrashosds
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 14:11:53 -05:00
Sage Weil
56f9387736 qa/suites/upgrade/jewel-x/stress-split: finish client.0 upgrade too
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 14:07:53 -05:00
Sage Weil
4e9c362e30 osd: rename failsafe [near]full getters appropriately
...and make most of these methods private to clarify the public
interface

Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 13:59:59 -05:00
Sage Weil
5c6b9d9dcd osd/OSDMap: add [near]full_ratio to OSDMap[::Incremental]
This used to live in PGMap; we're moving it here for luminous
(which makes more sense anyway!).

Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 13:59:59 -05:00
Sage Weil
8a73202d79 osd: add per-osd FULL and NEARFULL state bits
Signed-off-by: Sage Weil <sage@redhat.com>
2017-03-06 13:59:59 -05:00