The daemons report this in a particular order; match that in the
daemonperf output. This corresponds to the numeric value of the l_*
enum.
Signed-off-by: Sage Weil <sage@redhat.com>
It's possible for the Sequencer to go away while the OpSequencer still has
txcs in flight. We were handling the case where the osr was on the
deferred_queue, but it may also be off the deferred_queue while still waiting
for the commit to happen, and we need to wait for that too.
Fix this by introducing a 'zombie' state for the osr, in which we keep the
osr in the osr_set.
Clean up the OpSequencer methods and a few other method names.
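For illustration, a minimal sketch of the zombie idea (names and types are
simplified stand-ins, not the actual BlueStore code):

    #include <cassert>
    #include <memory>
    #include <mutex>
    #include <set>

    struct OpSequencer {
      int txc_in_flight = 0;   // transactions queued but not yet committed
      bool zombie = false;     // external Sequencer is gone; drain, then drop
    };

    struct Store {
      std::mutex lock;
      std::set<std::shared_ptr<OpSequencer>> osr_set;  // all registered osrs

      // called when the external Sequencer handle is destroyed
      void discard(const std::shared_ptr<OpSequencer>& osr) {
        std::lock_guard<std::mutex> l(lock);
        if (osr->txc_in_flight == 0) {
          osr_set.erase(osr);      // nothing pending; drop immediately
        } else {
          osr->zombie = true;      // keep it registered until it drains
        }
      }

      // called when the last in-flight txc on this osr commits
      void maybe_reap(const std::shared_ptr<OpSequencer>& osr) {
        std::lock_guard<std::mutex> l(lock);
        assert(osr->txc_in_flight == 0);
        if (osr->zombie)
          osr_set.erase(osr);
      }
    };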
Signed-off-by: Sage Weil <sage@redhat.com>
We've been avoiding doing this for a while and it has finally caught up
with us: the SharedBlob may outlive the split due to deferred IO, and
a read on the child collection may load a competing Blob and SharedBlob
and read from the on-disk blocks that haven't been written yet.
Fix by preserving the one-SharedBlob-instance invariant by moving cache
items to the new Collection and cache shard like we should have from the
beginning.
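A rough sketch of the invariant being preserved (illustrative types, not the
real split code): cached entries move to the child collection, so a later read
finds the same SharedBlob instance rather than loading a competing copy from
disk.

    #include <cstdint>
    #include <functional>
    #include <map>
    #include <memory>

    struct SharedBlob {
      uint64_t sbid;
      // buffers here may cover deferred writes that haven't hit disk yet
    };

    struct Collection {
      // at most one SharedBlob instance per sbid within a cache shard
      std::map<uint64_t, std::shared_ptr<SharedBlob>> shared_blob_set;
    };

    // On split, *move* (never copy) the cached instances that now belong to
    // the child, so reads via the child see the in-flight buffers too.
    void split_cache(Collection& parent, Collection& child,
                     const std::function<bool(uint64_t)>& belongs_to_child) {
      for (auto it = parent.shared_blob_set.begin();
           it != parent.shared_blob_set.end(); ) {
        if (belongs_to_child(it->first)) {
          child.shared_blob_set.insert(*it);
          it = parent.shared_blob_set.erase(it);
        } else {
          ++it;
        }
      }
    }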
Signed-off-by: Sage Weil <sage@redhat.com>
We can't use a bare Collection: we get/put refs on it, the last put will
delete it, and the dtor asserts nref == 0 (so faking a ref and deliberately
leaking it isn't an option either).
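A minimal sketch of why a bare instance can't work (hypothetical, simplified
refcounting):

    #include <atomic>
    #include <cassert>

    struct Collection {
      std::atomic<int> nref{0};
      void get() { ++nref; }
      void put() { if (--nref == 0) delete this; }  // last put deletes
      ~Collection() { assert(nref == 0); }          // no leaked refs allowed
    };

    int main() {
      // Collection c; c.get(); c.put();  // bare instance: put() would try to
      //                                  // delete a stack object
      auto* c = new Collection;           // so use a real, ref-held instance
      c->get();
      c->put();                           // last put frees it; dtor sees nref == 0
    }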
Signed-off-by: Sage Weil <sage@redhat.com>
Otherwise cache items survive beyond umount into the next mount cycle!
Also, ensure that we flush_cache *before* clearing coll_map, as some cache
items have references back to the Collection.
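The ordering this enforces, in sketch form (illustrative names only):

    #include <map>
    #include <memory>
    #include <string>

    struct Collection { /* cache items hold refs back to this */ };

    struct Store {
      std::map<std::string, std::shared_ptr<Collection>> coll_map;

      void flush_cache() {
        // drop cached onodes/buffers here, releasing their Collection refs
      }

      void umount() {
        flush_cache();     // first: cache items must let go of their colls
        coll_map.clear();  // only then drop the collections themselves
      }
    };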
Signed-off-by: Sage Weil <sage@redhat.com>
These can survive as long as the txc, which can be longer than the
Collection. Make sure we have a valid ref as both finish_write and
~SharedBlob use coll for the SharedBlobSet (and coll->store->cct for
debug).
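Sketch of the lifetime relationship (simplified, hypothetical types): the
SharedBlob keeps a ref on its Collection so finish_write and the dtor can
still reach it safely.

    #include <memory>

    struct Collection {
      // shared blob set, store->cct, etc. live here in the real code
    };
    using CollectionRef = std::shared_ptr<Collection>;

    struct SharedBlob {
      CollectionRef coll;   // strong ref: we may outlive the txc's Collection user
      explicit SharedBlob(CollectionRef c) : coll(std::move(c)) {}
      void finish_write() {
        // safe: coll is guaranteed valid even if the caller dropped its ref
      }
      ~SharedBlob() {
        // ditto: can still deregister from coll's shared blob set
      }
    };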
Signed-off-by: Sage Weil <sage@redhat.com>
We were modifying bufferlists in place, and kludging around it by making
full copies elsewhere. Instead, never modify a buffer.
This fixes issues where the buffer we submit to ObjectStore ends up in
the cache and we later modify it in place, corrupting the implementation's
copy. (This was affecting BlueStore.)
Rearrange the data methods to be next to each other and clean them up a
bit too.
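A tiny sketch of the rule (using std::shared_ptr<const std::string> as a
stand-in for a bufferlist): once a buffer is submitted it may be cached by the
store, so later changes must go into a new buffer.

    #include <memory>
    #include <string>

    using buffer_ref = std::shared_ptr<const std::string>;  // bufferlist stand-in

    struct FakeStore {
      buffer_ref cached;                          // the store may keep a reference
      void queue_transaction(buffer_ref b) { cached = b; }
    };

    int main() {
      FakeStore store;
      buffer_ref b = std::make_shared<const std::string>("abcd");
      store.queue_transaction(b);

      // Wrong: const_cast and mutate *b in place -- the store's cached copy
      // would silently change underneath it.
      // Right: build a fresh buffer for the new contents.
      buffer_ref updated = std::make_shared<const std::string>("abXd");
      store.queue_transaction(updated);
    }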
Signed-off-by: Sage Weil <sage@redhat.com>
If we have no non-deferred IO to flush, and we are running bluefs on a
single shared device, then we can rely on the bluefs flush to make our
current batch of deferred ios stable.
Separate deferred IO into "done" and "stable" lists. If we do sync, move
everything from "done" onto "stable". Otherwise, move "done" onto "stable"
after we do our kv commit via bluefs.
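Roughly, the bookkeeping looks like this (an illustrative sketch, not the
actual member names):

    #include <list>

    struct DeferredTxn { /* aio done; record still needs to become stable */ };

    struct DeferredState {
      std::list<DeferredTxn> done;    // aio finished, not yet known durable
      std::list<DeferredTxn> stable;  // durable; record can be cleaned up

      // Case 1: we issued an explicit flush/sync of the device.
      void on_device_sync() {
        stable.splice(stable.end(), done);
      }

      // Case 2: no non-deferred IO and bluefs shares the device; the kv
      // commit through bluefs flushed the device for us.
      void on_kv_commit_via_bluefs() {
        stable.splice(stable.end(), done);
      }
    };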
Signed-off-by: Sage Weil <sage@redhat.com>
When the Sequencer goes away it gets deregistered. If there are still
deferred IOs in flight, we need to wait for those too.
Signed-off-by: Sage Weil <sage@redhat.com>
Allow several deferred writes to accumulate before we submit them. In
general we have no time pressure, and on HDD (and perhaps sometimes SSD)
it is beneficial to accumulate and batch these so that they result in
fewer seeks. On HDD, this is particularly true of seeks away from the
journal, and on sequential workloads it can avoid seeks entirely. It may even
allow the block layer or SSD firmware to merge IOs and perform fewer
writes.
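For example, a batching scheme of roughly this shape (the threshold here is
illustrative, not an actual default):

    #include <cstdint>
    #include <functional>
    #include <utility>
    #include <vector>

    struct DeferredWrite { uint64_t offset; std::vector<uint8_t> data; };

    struct DeferredBatch {
      std::vector<DeferredWrite> pending;
      size_t max_ops = 16;   // hypothetical knob: batch size before submitting

      void queue(DeferredWrite w,
                 const std::function<void(std::vector<DeferredWrite>&)>& submit) {
        pending.push_back(std::move(w));
        // No time pressure: let writes accumulate so the disk sees one burst
        // (fewer seeks away from the journal, more chance of IO merging).
        if (pending.size() >= max_ops) {
          submit(pending);
          pending.clear();
        }
      }
    };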
Signed-off-by: Sage Weil <sage@redhat.com>
If a blob is shared, we can't discard deallocated regions: there may
be deferred buffers in flight and we might get a read via the clone.
Signed-off-by: Sage Weil <sage@redhat.com>
In a simple HDD workload with queue depth of 1, we halve our throughput
because the kv thread does a full commit twice per IO: once for the
initial commit, and then again to clean up the deferred write record. The
second wakeup is unnecessary; we can clean it up on the next commit.
We do need to do this wakeup in a few cases, though, when draining the
OpSequencers: (1) on replay during startup, and (2) on shutdown in
_osr_drain_all().
Send everything through _osr_drain_all() for simplicity.
This doubles HDD qd=1 IOPS from ~50 to ~100 on my 7200 rpm test device
(rados bench 30 write -b 4096 -t 1).
Signed-off-by: Sage Weil <sage@redhat.com>
First, eliminate the work queue--it's useless. We are dispatching aio and
should not block. And if a single thread isn't sufficient to do it, it
probably means we should be parallelizing kv_sync_thread too (which is our
only caller that matters).
Repurpose the old osr-list -> txc-list-per-osr queue structure to manage
the queuing. For any given osr, dispatch one batch of aios at a time,
taking care to collapse any overwrites so that the latest write wins.
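The overwrite-collapsing part might look roughly like this (a sketch only;
assumes writes are tracked per block offset):

    #include <cstdint>
    #include <map>
    #include <utility>
    #include <vector>

    // One batch of deferred aio for a single osr: keyed by block offset so a
    // later write to the same offset simply replaces the earlier one, and the
    // latest write wins when the batch is finally submitted.
    struct AioBatch {
      std::map<uint64_t, std::vector<uint8_t>> writes;

      void add(uint64_t offset, std::vector<uint8_t> data) {
        writes[offset] = std::move(data);   // collapse: overwrite in place
      }

      // submit() would issue one aio per surviving entry, in offset order
    };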
Signed-off-by: Sage Weil <sage@redhat.com>
Make osr_set hold refcounted pointers so that it can tolerate a Sequencer
destruction racing with flush, or a Sequencer that outlives the BlueStore
instance itself.
Signed-off-by: Sage Weil <sage@redhat.com>
The old implementation is racy and doesn't actually work. Instead, rely
on a list of all OpSequencers and drain them all.
Signed-off-by: Sage Weil <sage@redhat.com>
This ensures that we don't trim an onode from the cache while it has a
txc that is still in flight, which in turn ensures that if we try to read
the object, any writing buffers will still be available.
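In sketch form (hypothetical names), the cache simply refuses to trim an onode
while a txc still references it:

    #include <atomic>
    #include <list>
    #include <memory>

    struct Onode {
      std::atomic<int> flush_txns{0};  // txcs in flight that touched this onode
      // writing buffers for those txcs hang off the onode's cache state
    };

    struct OnodeCache {
      std::list<std::shared_ptr<Onode>> lru;

      void trim(size_t target) {
        auto it = lru.begin();
        while (lru.size() > target && it != lru.end()) {
          if ((*it)->flush_txns > 0) {
            ++it;               // pinned: a txc is still in flight, keep it
          } else {
            it = lru.erase(it); // safe to drop
          }
        }
      }
    };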
Signed-off-by: Sage Weil <sage@redhat.com>
BlueStore collection methods only need preceding transactions to be
applied to the kv db; they do not need to be committed.
Note that this is *only* needed for collection listings; all other read
operations are immediately safe after queue_transactions().
Signed-off-by: Sage Weil <sage@redhat.com>
Currently this is the same as flush, but more precisely it is an internal
method that means all txc's must complete. Update _wal_apply() to use it
instead of flush(), which is part of the public Sequencer interface.
Signed-off-by: Sage Weil <sage@redhat.com>
This reverts 3e40595f3c
The individual throttles have their own set of perfcounters; no need to
duplicate them here.
Signed-off-by: Sage Weil <sage@redhat.com>
The throttle is really about limiting deferred IO; we do not need to
actually remove the deferred record from the kv db before queueing more.
(In fact, the txc that queues more will do the cleanup.)
Signed-off-by: Sage Weil <sage@redhat.com>
We can unblock flush()ing threads as soon as we have applied to the kv db,
while the callbacks must wait until we have committed.
Move methods around a bit to better match the execution order.
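A condensed sketch of the two wakeup points (illustrative, with simplified
state tracking): flush() waiters only need "applied", completion callbacks
need "committed".

    #include <condition_variable>
    #include <cstdint>
    #include <functional>
    #include <mutex>
    #include <vector>

    struct TxcState {
      std::mutex lock;
      std::condition_variable cond;
      uint64_t applied_seq = 0;     // applied to the kv db (readable)
      uint64_t committed_seq = 0;   // durably committed
      std::vector<std::function<void()>> on_commit;  // completion callbacks

      void wait_applied(uint64_t seq) {             // what flush() needs
        std::unique_lock<std::mutex> l(lock);
        cond.wait(l, [&] { return applied_seq >= seq; });
      }

      void mark_applied(uint64_t seq) {
        { std::lock_guard<std::mutex> l(lock); applied_seq = seq; }
        cond.notify_all();                          // unblock flush()ers early
      }

      void mark_committed(uint64_t seq) {
        std::vector<std::function<void()>> cbs;
        { std::lock_guard<std::mutex> l(lock);
          committed_seq = seq;
          cbs.swap(on_commit); }
        for (auto& cb : cbs) cb();                  // callbacks wait for this
      }
    };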
Signed-off-by: Sage Weil <sage@redhat.com>
The remaining flush() users only need to see previous txc's applied
to the kv db (e.g., _omap_clear needs to see the records to delete them).
Signed-off-by: Sage Weil <sage@redhat.com>
# Conflicts:
# src/os/bluestore/BlueStore.h
We do not release extents until after any deferred IO, so this flush() is
unnecessary.
Signed-off-by: Sage Weil <sage@redhat.com>
# Conflicts:
# src/os/bluestore/BlueStore.cc