Add an option to prefer a WAL write when the write is below a size threshold,
even if we could avoid it. This lets you trade some write amplification (by
journaling data to rocksdb) for lower latency in cases where the WAL device is
much faster than the main device.
This affects:
- writes to new extent locations below min_alloc_size
- writes to unallocated space below min_alloc_size
- "big" writes above min_alloc_size that are below the prefer_wal_size
threshold.
Note that the threshold is applied to individual blobs, not the entirety of the
write, so if you have a larger write torn into two pieces/blobs that are each
below the threshold then they will both go through the WAL.
Set different defaults for HDD and SSD, since this makes more sense for HDD
where seeks are expensive.
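For illustration, tuning this might look something like the ceph.conf sketch
below (the option names and values are assumptions based on this description,
not necessarily the exact names in the tree):

  [osd]
  # hypothetical option names: prefer a WAL write for blobs up to 32 KB on HDD
  bluestore_prefer_wal_size_hdd = 32768
  # effectively disabled on SSD, where the latency win is much smaller
  bluestore_prefer_wal_size_ssd = 0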
Add some test cases to exercise the option.
Signed-off-by: Sage Weil <sage@redhat.com>
Our condition for respecting the FULL flag is complex, and involves
the WRITE | RWORDERED flags vs the FULL_FORCE | FULL_TRY flags. Previously,
we could block a read because of RWORDERED but not resend it later.
Fix by capturing the complex condition in a respects_full() bool and using
it both for the blocking-on-send and resending-on-possibly-notfull-later
checks.
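A minimal sketch of the predicate, assuming the flag constants from
include/rados.h (illustrative only; the actual member names in Objecter may
differ):

  // Sketch: an op respects the FULL flag if it writes (or is RW-ordered)
  // and has not opted out via FULL_TRY or FULL_FORCE.
  static bool respects_full(int flags) {
    return (flags & (CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_RWORDERED)) &&
           !(flags & (CEPH_OSD_FLAG_FULL_TRY | CEPH_OSD_FLAG_FULL_FORCE));
  }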
Fixes: http://tracker.ceph.com/issues/19133
Signed-off-by: Sage Weil <sage@redhat.com>
mon: remove the redundant judgement in the PaxosService is_writeable function
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reported by Gui Hecheng <guimark@126.com>. This change is a
variation on the fix proposed by Dan Gryniewicz <dang@redhat.com>,
taking root_fh.state.dev as the fs_inst for new handles.
Fixes: http://tracker.ceph.com/issues/19214
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
At the end of start_rgw() we wait until HTTP connections to RadosGW
can be established. However, if RadosGW uses FastCGI, this condition
cannot be fulfilled before the HTTP server has been spawned.
Signed-off-by: Radoslaw Zarzynski <rzarzynski@mirantis.com>
If the parent is in the same pool and has the journaling feature enabled,
we can assume that mirroring will eventually be enabled for it.
Fixes: http://tracker.ceph.com/issues/19130
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
test/librbd: move tests using non-public api to internal
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Mykola Golub <mgolub@mirantis.com>
Note that this tells us how many OSDs are full or nearfull; it
does not include detailed warnings telling you exactly what the
utilization is because we don't have the full osd_stat_t
available. We leave it to ceph-mgr to generate those health
messages.
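For example, the resulting summary might read something like this
(illustrative output only; the exact wording may differ):

  HEALTH_ERR 1 full osd(s); 2 nearfull osd(s)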
Signed-off-by: Sage Weil <sage@redhat.com>
For luminous, set cluster flags based on osd flags. Until
require_luminous is set, stick with the old pgmap-based behavior.
Move the new check to encode_pending so that the cluster flag is
set in the same epoch that the osd state(s) change.
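A rough sketch of that gating, assuming the flag name suggested by the
require_luminous mention above (the real code may differ):

  // Sketch: only derive cluster-wide full/nearfull flags from per-OSD state
  // once the cluster requires luminous OSDs; otherwise keep the old
  // pgmap-based behavior.
  bool use_osd_based_full_flags(const OSDMap& osdmap) {
    return osdmap.test_flag(CEPH_OSDMAP_REQUIRE_LUMINOUS);
  }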
Signed-off-by: Sage Weil <sage@redhat.com>
This ensures that a down osd that is marked full doesn't come up, then
realize it's not actually full, and then clear its full flag. That
would result in a cluster full blip that isn't
needed. This can easily happen if the full_ratio in the osdmap is
increased while the OSD is down.
Signed-off-by: Sage Weil <sage@redhat.com>
First, eliminate the useless nearfull failsafe--all it did was
generate a log message, which we can do based on the OSDMap
states.
Add some new helpers.
Unify the cluster nearfull/full and failsafe states so that
failsafe is a "really full" state that is more severe than
full, giving us NONE, NEARFULL, FULL, FAILSAFE.
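A minimal sketch of that ordering (illustrative names only; the actual enum
in the code may differ):

  // Sketch: a single ordered scale where "more full" compares greater,
  // so FAILSAFE implies FULL, which in turn implies NEARFULL.
  enum full_state_t { NONE = 0, NEARFULL, FULL, FAILSAFE };

  static bool is_at_least(full_state_t cur, full_state_t level) {
    return cur >= level;
  }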
Pull the full/nearfull ratios out of the OSDMap (remember that
we require luminous mons, so these will be initialized).
Signed-off-by: Sage Weil <sage@redhat.com>
If the MDS has no rank then its whoami field (MDS_RANK_NONE, i.e. -1) would
be printed as:
{"cluster_fsid":"4c1bae66-03fb-4b9a-bd88-108636d29758","whoami":18446744073709551615,"id":54239,"want_state":"up:boot","state":"???","mdsmap_epoch":22,"osdmap_epoch":0,"osdmap_epoch_barrier":0}
Fixes: http://tracker.ceph.com/issues/19201
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>