Check total pg count for the cluster vs osd count and max pgs per osd
before allowing pool creation, pg_num change, or pool size change.
"in" OSDs are the ones we distribute data too, so this should be the right
count to use. (Whether they happen to be up or down at the moment is
incidental.)
If the user really wants to create the pool, they can change the
configurable limit.
Signed-off-by: Sage Weil <sage@redhat.com>
This introduces two config parameters:
mds_cache_memory_limit: Sets the soft maximum of the cache to the given
byte count. (Like mds_cache_size, this doesn't actually limit the maximum
size of the cache. It just dictates the steady-state size.)
mds_cache_reservation: This replaces mds_health_cache_threshold everywhere
except the Beacon heartbeat sent to the mons. The idea here is to specify a
reservation of memory (5% by default) for operations and the MDS tries to
always maintain that reservation. So, the MDS will recall caps from clients
when it begins dipping into its reservation of memory.
mds_cache_size still limits the cache by Inode count but is now by-default 0
(i.e. unlimited). The new preferred way of specifying cache limits is by memory
size. The default is 1GB.
Fixes: http://tracker.ceph.com/issues/20594
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1464976
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Reordered the RC releases sections back to their respective components,
added a ceph-mon section, added links to documentation wherever
possible, and a few forgotten RGW announcements. Also cleared up the
PendingReleaseNotes upto this point
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
Also cleanup PendingReleasenotes to an empty file so that only newer
changes are tracked, adding the relevant section back to
RC1 where relevant. Moving all the RC1 announcements back to RC2, when
we go to 12.2.0 we'll collapse all of these back to the release
announcments
Signed-off-by: Abhishek Lekshmanan <alekshmanan@suse.com>
This has a few problems:
1- It does not do it's analysis over CRUSH rule roots/classes, which
means that an innocent user of classes will see skewed usage (bc hdds are
more full than ssds, say)
2- It does not take degraded clusters into account, which means the warning
will appear when a fresh OSD is added.
See http://tracker.ceph.com/issues/20730
Signed-off-by: Sage Weil <sage@redhat.com>
rgw: use a namespace for rgw reshard pool for upgrades as well
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
This is used to dump extra weirdness to the health detail structured
output, but we are about to remove all of that in luminous.
Signed-off-by: Sage Weil <sage@redhat.com>
It's still sort of awkward to prefix these commands
with "mgr tell" but this makes them at least
somewhat accessible to the average user.
Signed-off-by: John Spray <john.spray@redhat.com>
Make an incompat change here with a release note since
this only affects pool creation, a rare event, and folks
who have customized their configs (also rare).
Keep it simple: a config sets the default rule, or else we pick
the first TYPE_REPLICATED pool in the crush map.
Signed-off-by: Sage Weil <sage@redhat.com>
This is undocumented and untested -- it was something
written before and superceded by the "recover_dentries"
subcommand. While we're at it, also
s/scavenge_dentries/recover_dentries/
internally.
Signed-off-by: John Spray <john.spray@redhat.com>
- rename the option (max -> warn)
- add an err_..._ratio multiplier
- switch to HEALTH_ERR once requests are blocked long enough
- make the error ratio high (default is 32*128s -> about an hour) so that
we don't trigger on a heavily loaded cluster.
Signed-off-by: Sage Weil <sage@redhat.com>
With bluestore, making the smallest write match min_alloc_size avoids
write amplification. With EC pools this is the stripe unit, or
stripe_width / num_data_chunks. Rather than requiring people to divide
by k to get the smallest ec write, allow it to be specified directly
via stripe_unit. Store it in the ec profile so changing a monitor
config option isn't necessary to set it.
This is particularly important for ec overwrites since they allow random i/o
which should match bluestore's checksum granularity (aka min_alloc_size).
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
This had been broken for some time, as since the new
JournalStream stuff, zero padding was no longer a valid
encoding.
Fixes: http://tracker.ceph.com/issues/19691
Signed-off-by: John Spray <john.spray@redhat.com>