We would like to eliminate the `safe=false` parameter in opt.set_val. ms_dpdk_gateway_ipv4_addr
is modified at runtime; convert it to SAFE_OPTION to eliminate `safe=false` and
guard thread safety.
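For illustration, the change amounts to declaring the option with the SAFE_OPTION
macro instead of OPTION in src/common/config_opts.h (a sketch; the default value
shown is illustrative):

  -OPTION(ms_dpdk_gateway_ipv4_addr, OPT_STR, "")
  +SAFE_OPTION(ms_dpdk_gateway_ipv4_addr, OPT_STR, "")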
Signed-off-by: liuchang0812 <liuchang0812@gmail.com>
ms_dpdk_host_ipv4_addr is modified in src/test/msgr/test_async_networkstack.cc;
it's better to use SAFE_OPTION.
Signed-off-by: liuchang0812 <liuchang0812@gmail.com>
ms_dpdk_coremask is modified in test/msgr/test_async_networkstack.cc. It's better
to declare it as a SAFE_OPTION.
Signed-off-by: liuchang0812 <liuchang0812@gmail.com>
We need to modify ms_type in unit tests. Declaring ms_type as a SAFE_OPTION
is safer than passing `safe=false` to set_val.
Signed-off-by: liuchang0812 <liuchang0812@gmail.com>
We need to change `rocksdb_db_paths` at runtime. Passing safe=false to set_val is
not good practice; we should use SAFE_OPTION.
Signed-off-by: liuchang0812 <liuchang0812@gmail.com>
Currently dump_historic_ops dumps ops sorted by their initiation time,
which may not have any relation to how long they took, and sorting the output
of that command by op duration is neither fast nor convenient.
New asok command ("dump_historic_ops_by_duration") outputs the same
op list, but ordered by their duration time (longest first).
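For example, the new command can be queried through the admin socket like the
existing one (the daemon id below is illustrative):

  ceph daemon osd.0 dump_historic_ops_by_duration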
Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
When get_partition_dev() fails, it reports the following message:
ceph_disk.main.Error: Error: partition 2 for /dev/sdb does not appear to exist
The code searches for a directory inside /sys/block/get_dev_name(os.path.realpath(dev)).
The issue is that the error message doesn't report that path, even though it might be involved in the failure.
This patch reports where the code was looking when trying to determine whether the partition was available.
Signed-off-by: Erwan Velu <erwan@redhat.com>
The daemons report this in a particular order; match that in the
daemonperf output. This corresponds to the numeric value of the l_*
enum.
Signed-off-by: Sage Weil <sage@redhat.com>
It's possible for the Sequencer to go away while the OpSequencer still has
txcs in flight. We were handling the case where the osr was on the
deferred_queue, but it may be off the deferred_queue while still waiting for the
commit to happen, and we still need to wait for that.
Fix this by introducing a 'zombie' state for the osr, in which we keep the
osr in the osr_set.
Clean up the OpSequencer methods and a few other method names.
Signed-off-by: Sage Weil <sage@redhat.com>
We've been avoiding doing this for a while and it has finally caught up
with us: the SharedBlob may outlive the split due to deferred IO, and
a read on the child collection may load a competing Blob and SharedBlob
and read from the on-disk blocks that haven't been written yet.
Fix by preserving the one-SharedBlob-instance invariant by moving cache
items to the new Collection and cache shard like we should have from the
beginning.
Signed-off-by: Sage Weil <sage@redhat.com>
We can't use a bare Collection since we get/put refs, the last put will
delete it, and the dtor asserts nref == 0 (no faking a ref and deliberately
leaking!).
Signed-off-by: Sage Weil <sage@redhat.com>
Otherwise cache items survive beyond umount into the next mount cycle!
Also, ensure that we flush_cache *before* clearing coll_map, as some cache
items have references back to the Collection.
Signed-off-by: Sage Weil <sage@redhat.com>
These can survive as long as the txc, which can be longer than the
Collection. Make sure we have a valid ref as both finish_write and
~SharedBlob use coll for the SharedBlobSet (and coll->store->cct for
debug).
Signed-off-by: Sage Weil <sage@redhat.com>
We were modifying bufferlists in place, and kludging around it by making
full copies elsewhere. Instead, never modify a buffer.
This fixes issues where the buffer we submit to ObjectStore ends up in
the cache and we modify it in place later, corrupting the implementation's
copy. (This was affecting BlueStore.)
Rearrange the data methods to be next to each other and clean them up a
bit too.
Signed-off-by: Sage Weil <sage@redhat.com>
If we have no non-deferred IO to flush, and we are running bluefs on a
single shared device, then we can rely on the bluefs flush to make our
current batch of deferred IOs stable.
Separate deferred into a "done" and a "stable" list. If we do sync, put
everything from "done" onto "stable". Otherwise, move "done" to "stable"
after we do our kv commit via bluefs.
Signed-off-by: Sage Weil <sage@redhat.com>
When the Sequencer goes away it gets deregistered. If there are still
deferred IOs in flight, we need to wait for those too.
Signed-off-by: Sage Weil <sage@redhat.com>
Allow several deferred writes to accumulate before we submit them. In
general we have no time pressure, and on HDD (and perhaps sometimes SSD)
it is beneficial to accumulate and batch these so that they result in
fewer seeks. On HDD, this is particularly true of seeks away from the
journal. And on sequential workloads this can avoid seeks. It may even
allow the block layer or SSD firmware to merge IOs and perform fewer
writes.
Signed-off-by: Sage Weil <sage@redhat.com>
If a blob is shared, we can't discard deallocated regions: there may
be deferred buffers in flight and we might get a read via the clone.
Signed-off-by: Sage Weil <sage@redhat.com>
In a simple HDD workload with queue depth of 1, we halve our throughput
because the kv thread does a full commit twice per IO: once for the
initial commit, and then again to clean up the deferred write record. The
second wakeup is unnecessary; we can clean it up on the next commit.
We do need to do this wakeup in a few cases, though, when draining the
OpSequencers: (1) on replay during startup, and (2) on shutdown in
_osr_drain_all().
Send everything through _osr_drain_all() for simplicity.
This doubles HDD qd=1 IOPS from ~50 to ~100 on my 7200 rpm test device
(rados bench 30 write -b 4096 -t 1).
Signed-off-by: Sage Weil <sage@redhat.com>
First, eliminate the work queue--it's useless. We are dispatching aio and
should not block. And if a single thread isn't sufficient to do it, it
probably means we should be parallelizing kv_sync_thread too (which is our
only caller that matters).
Repurpose the old osr-list -> txc-list-per-osr queue structure to manage
the queuing. For any given osr, dispatch one batch of aios at a time,
taking care to collapse any overwrites so that the latest write wins.
Signed-off-by: Sage Weil <sage@redhat.com>
Make osr_set use refcounts so that it can tolerate a Sequencer destruction
racing with flush or a Sequencer that outlives the BlueStore instance
itself.
Signed-off-by: Sage Weil <sage@redhat.com>