Also clean up PendingReleaseNotes to an empty file so that only newer
changes are tracked, adding the relevant sections back to
RC1 where appropriate. All the RC1 announcements are moved back to RC2; when
we go to 12.2.0 we'll collapse all of these back into the release
announcements.
Signed-off-by: Abhishek Lekshmanan <alekshmanan@suse.com>
This has a few problems:
1- It does not do its analysis over CRUSH rule roots/classes, which
means that an innocent user of classes will see skewed usage (because
HDDs are fuller than SSDs, say).
2- It does not take degraded clusters into account, which means the warning
will appear when a fresh OSD is added.
See http://tracker.ceph.com/issues/20730
Signed-off-by: Sage Weil <sage@redhat.com>
rgw: use a namespace for rgw reshard pool for upgrades as well
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
This is used to dump extra weirdness to the health detail structured
output, but we are about to remove all of that in luminous.
Signed-off-by: Sage Weil <sage@redhat.com>
It's still sort of awkward to prefix these commands
with "mgr tell" but this makes them at least
somewhat accessible to the average user.
Signed-off-by: John Spray <john.spray@redhat.com>
Make an incompat change here with a release note since
this only affects pool creation, a rare event, and folks
who have customized their configs (also rare).
Keep it simple: a config option sets the default rule, or else we pick
the first TYPE_REPLICATED rule in the crush map.
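The fallback above can be sketched as a two-step lookup. This is a minimal illustration, not the monitor's code: CrushRule, pick_default_rule, and the string rule-type labels are hypothetical, and "unset" is modeled as None.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical rule-type labels for this sketch only.
TYPE_REPLICATED = "replicated"
TYPE_ERASURE = "erasure"


@dataclass
class CrushRule:
    rule_id: int
    rule_type: str


def pick_default_rule(configured: Optional[int], rules: List[CrushRule]) -> int:
    """Use the configured rule if there is one, else the first replicated rule."""
    if configured is not None:
        return configured
    for rule in rules:
        if rule.rule_type == TYPE_REPLICATED:
            return rule.rule_id
    raise LookupError("crush map has no replicated rule")
```

Scanning in map order makes the choice deterministic for a given crush map, which keeps the "keep it simple" goal.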
Signed-off-by: Sage Weil <sage@redhat.com>
This is undocumented and untested -- it was something
written before and superseded by the "recover_dentries"
subcommand. While we're at it, also
s/scavenge_dentries/recover_dentries/
internally.
Signed-off-by: John Spray <john.spray@redhat.com>
- rename the option (max -> warn)
- add an err_..._ratio multiplier
- switch to HEALTH_ERR once requests are blocked long enough
- make the error ratio high (default is 32*128s -> about an hour) so that
we don't trigger on a heavily loaded cluster.
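A minimal sketch of the warn/err split, assuming a 128 s warn threshold and a 32x error ratio as in the numbers above. The function and parameter names are hypothetical, and whether the boundaries are inclusive is an assumption.

```python
WARN_AGE_SECONDS = 128.0  # assumed warn threshold (seconds)
ERR_AGE_RATIO = 32.0      # assumed err_..._ratio multiplier


def blocked_request_health(oldest_age: float,
                           warn_age: float = WARN_AGE_SECONDS,
                           err_ratio: float = ERR_AGE_RATIO) -> str:
    """Map the age of the oldest blocked request to a health level."""
    err_age = warn_age * err_ratio  # 128 * 32 = 4096 s, roughly 68 minutes
    if oldest_age >= err_age:
        return "HEALTH_ERR"
    if oldest_age >= warn_age:
        return "HEALTH_WARN"
    return "HEALTH_OK"
```

Deriving the error threshold from the warn threshold means tuning one option moves both levels together.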
Signed-off-by: Sage Weil <sage@redhat.com>
With bluestore, making the smallest write match min_alloc_size avoids
write amplification. With EC pools this is the stripe unit, or
stripe_width / num_data_chunks. Rather than requiring people to divide
by k to get the smallest ec write, allow it to be specified directly
via stripe_unit. Store it in the ec profile so changing a monitor
config option isn't necessary to set it.
This is particularly important for ec overwrites since they allow random i/o
which should match bluestore's checksum granularity (aka min_alloc_size).
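The relationship between stripe_unit, stripe_width, and k can be written out directly. This sketch assumes "matching" min_alloc_size means the per-chunk write is a whole multiple of it. The helper names and the example sizes are hypothetical.

```python
def stripe_width(k: int, stripe_unit: int) -> int:
    """Full-stripe size: one stripe_unit per data chunk."""
    return k * stripe_unit


def stripe_unit_from_width(width: int, k: int) -> int:
    """The old way: divide stripe_width by the number of data chunks."""
    if width % k:
        raise ValueError("stripe_width must be divisible by k")
    return width // k


def unit_matches_alloc(stripe_unit: int, min_alloc_size: int) -> bool:
    """True when each data-chunk write covers whole allocation units."""
    return stripe_unit % min_alloc_size == 0
```

With k=4 and a 4096-byte stripe_unit, the stripe is 16384 bytes; specifying stripe_unit directly avoids doing that division by hand.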
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
This had been broken for some time: since the new JournalStream
code was introduced, zero padding was no longer a valid
encoding.
Fixes: http://tracker.ceph.com/issues/19691
Signed-off-by: John Spray <john.spray@redhat.com>
In practice this tends to get bubbled up the stack as an error on
the caller, and they usually do not handle it properly. For example,
with librbd, this turns into EIO and breaks the VM.
Instead, this will manifest as a hung op on the client. That is
also not ideal, but given that the root cause here is generally a
bug, it's not clear what else would be better.
We already log an error in the cluster log, so teuthology runs will
continue to fail.
Signed-off-by: Sage Weil <sage@redhat.com>
Expose public methods that include a new output argument to indicate
whether there are more keys to fetch or not.
Mark the old interfaces deprecated.
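A caller-side sketch of why the extra output argument helps. Without it, a caller has to infer the end of the listing from a short page, which is unreliable if the server caps how many keys it returns. get_keys_page is a hypothetical in-memory stand-in, not the librados call.

```python
from typing import Dict, List, Tuple


def get_keys_page(store: Dict[str, bytes], start_after: str,
                  max_return: int) -> Tuple[List[str], bool]:
    """Hypothetical paged call: up to max_return keys after start_after,
    plus a flag saying whether more keys remain."""
    keys = sorted(k for k in store if k > start_after)
    return keys[:max_return], len(keys) > max_return


def get_all_keys(store: Dict[str, bytes], page_size: int = 2) -> List[str]:
    result: List[str] = []
    start_after = ""
    while True:
        page, more = get_keys_page(store, start_after, page_size)
        result.extend(page)
        if not more or not page:  # stop on the flag, not on page length
            break
        start_after = page[-1]
    return result
```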
Signed-off-by: Sage Weil <sage@redhat.com>
This change prioritizes backfill of PGs that don't
have min_size active copies. Such PGs would cause IO stalls
for clients and would increase throttler usage.
This change also fixes a few subtle out-of-bounds bugs.
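A minimal sketch of the ordering idea: PGs below min_size get a boost so they backfill first. The priority constants and the extra weight for a larger shortfall are hypothetical choices, not the values the OSD uses.

```python
from dataclasses import dataclass
from typing import List

BASE_PRIORITY = 100   # hypothetical base backfill priority
DEGRADED_BOOST = 100  # hypothetical boost for PGs below min_size


@dataclass
class PG:
    pgid: str
    active_copies: int
    min_size: int


def backfill_priority(pg: PG) -> int:
    if pg.active_copies < pg.min_size:
        # Client IO to this PG is stalled; larger shortfalls go first.
        return BASE_PRIORITY + DEGRADED_BOOST + (pg.min_size - pg.active_copies)
    return BASE_PRIORITY


def backfill_order(pgs: List[PG]) -> List[str]:
    # sorted() is stable, so equal-priority PGs keep their input order.
    return [pg.pgid for pg in sorted(pgs, key=backfill_priority, reverse=True)]
```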
Signed-off-by: Bartłomiej Święcki <bartlomiej.swiecki@corp.ovh.com>
Tell users they need to set this to true before Monitors will allow
pools to be removed.
Also update PendingReleaseNotes so that users can find this change
there.
This was changed with commit 5d7f4ea.
Signed-off-by: Wido den Hollander <wido@42on.com>
osd: set server-side limits on omap get operations
Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
If we have an OSD with a weight that's not 1.0 and mark it out,
we should restore the same weight when we mark it back in. We
already do this when an OSD is automatically marked out, just
not when it is explicitly marked out.
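A toy model of the behavior described above: remember the weight when an OSD is marked out and restore it when it is marked back in. OsdWeights is hypothetical and does not reflect how the monitor actually stores the old weight.

```python
from typing import Dict


class OsdWeights:
    """Toy model of OSD reweight values (1.0 = fully in, 0.0 = out)."""

    def __init__(self) -> None:
        self.weight: Dict[int, float] = {}
        self.saved: Dict[int, float] = {}

    def mark_out(self, osd: int) -> None:
        current = self.weight.get(osd, 1.0)
        if current > 0.0:
            # Remember the pre-out weight; skip if already out so 0.0 is never saved.
            self.saved[osd] = current
        self.weight[osd] = 0.0

    def mark_in(self, osd: int) -> None:
        # Restore the remembered weight, falling back to 1.0 if none was saved.
        self.weight[osd] = self.saved.pop(osd, 1.0)
```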
Signed-off-by: Sage Weil <sage@redhat.com>
This assumes that if the mon does not explicitly specify
the kv type that it is leveldb. No prior version of
Ceph has had non-experimental rocksdb, so this is
relatively safe. It's also necessary because the
default is now 'rocksdb' and we shouldn't assume those
old mons are rocksdb.
This will break for users who explicitly specified
rocksdb for the mon despite it being experimental.
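The detection rule can be sketched as a small decision function. It assumes an existing mon may or may not have a recorded backend type, and that new mons take the configured default. The function name and the way the recorded value is passed in are hypothetical.

```python
from typing import Optional

NEW_MON_DEFAULT = "rocksdb"  # the new default for freshly created mons


def mon_kv_type(recorded: Optional[str], is_new_mon: bool) -> str:
    """Pick the kv backend for a mon store."""
    if is_new_mon:
        return NEW_MON_DEFAULT
    if recorded:
        return recorded.strip()
    # Unrecorded old store: assume leveldb. This is the case that breaks
    # for mons explicitly (and experimentally) created with rocksdb.
    return "leveldb"
```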
Signed-off-by: Sage Weil <sage@redhat.com>
Exclusive lock, object map, fast-diff, and deep-flatten have been
enabled by default for all new images.
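As a bitmask, the new defaults combine as follows. This assumes the librbd feature bit values (layering = 1 << 0 through deep-flatten = 1 << 5); verify them against your librbd headers. The dependency check reflects that object-map requires exclusive-lock and fast-diff requires object-map. features_consistent is a hypothetical helper.

```python
RBD_FEATURE_LAYERING = 1 << 0
RBD_FEATURE_STRIPINGV2 = 1 << 1
RBD_FEATURE_EXCLUSIVE_LOCK = 1 << 2
RBD_FEATURE_OBJECT_MAP = 1 << 3
RBD_FEATURE_FAST_DIFF = 1 << 4
RBD_FEATURE_DEEP_FLATTEN = 1 << 5

# The four features this change turns on by default for new images.
NEW_DEFAULTS = (RBD_FEATURE_EXCLUSIVE_LOCK | RBD_FEATURE_OBJECT_MAP
                | RBD_FEATURE_FAST_DIFF | RBD_FEATURE_DEEP_FLATTEN)


def features_consistent(mask: int) -> bool:
    """Reject masks that enable a feature without the one it depends on."""
    if mask & RBD_FEATURE_OBJECT_MAP and not mask & RBD_FEATURE_EXCLUSIVE_LOCK:
        return False
    if mask & RBD_FEATURE_FAST_DIFF and not mask & RBD_FEATURE_OBJECT_MAP:
        return False
    return True
```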
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The rbd cli will warn about the deprecation when attempting to create
image format 1 images. librbd will log an error message when opening
a format 1 RBD image.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The symbols of buffer::list::iterator_impl<> were wrongly exposed
in the previous Infernalis release, and clients linked against
librados are very likely using them, so we need to document this
change.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Allow librados users to opt to receive ENOSPC or EDQUOT when they submit
an operation against a full cluster. This should only be used if the
librados app can handle those errors gracefully (librbd, for example,
cannot).
Also note that this allows savvy librados users to send delete operations;
they will get either a success or EDQUOT, depending on whether the
operation results in a net drop in space utilization.
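A model of the opt-in semantics described above. It assumes that without the opt-in, the op is held until the full condition clears (modeled as WAIT). The message does not say when ENOSPC rather than EDQUOT applies to writes, so the sketch leaves the errno to the caller. full_cluster_outcome is hypothetical.

```python
import errno
from typing import Union

WAIT = "wait"  # assumed default: the op is held until the cluster is no longer full


def full_cluster_outcome(opted_in: bool, net_bytes_delta: int,
                         error_code: int = errno.ENOSPC) -> Union[int, str]:
    """Model the result of an op submitted while the cluster is full."""
    if not opted_in:
        return WAIT
    if net_bytes_delta < 0:
        # e.g. a delete that frees space is allowed to succeed
        return 0
    # Opted-in callers get an error instead of waiting (ENOSPC or EDQUOT).
    return -error_code
```

A librados app that opts in must handle these negative returns itself, which is why librbd cannot use it.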
Signed-off-by: Sage Weil <sage@redhat.com>
'ceph mon_metadata' was added during this same dev cycle, so there is
no need to deprecate it first.
Fixes: #11545
Signed-off-by: Joao Eduardo Luis <joao@suse.de>
Use a clean name for keyvaluestore (no -dev suffix), but mark as
experimental to ensure users know what they are signing up for.
Signed-off-by: Sage Weil <sage@redhat.com>
Recent versions of Python contain a change to thread shutdown that
causes ceph to hang on exit; see http://bugs.python.org/issue21963.
As it turns out, this is relatively easy to avoid by not spawning
threads on exit, as Rados.__del__() will certainly do by calling
shutdown(); I suspect, but haven't proven, that the problem is
that shutdown() tries to start() a threading.Thread() that never
makes it all the way back to signal start().
Also add a PendingReleaseNote and extra doc comments to clarify.
Fixes: #8797
Signed-off-by: Dan Mick <dan.mick@redhat.com>
- Add release note
- New librados interface
- New pg_nls_response_t over the wire protocol
- Ignore internal namespace (.ceph_internal)
- Enhance ObjListCtx to keep independent IoCtxImpl so nspace won't change out from under listing code
- Add ListObject with private implementation ListObjectImpl to return from iterator
- Add EINVAL error for old librados interface when LIBRADOS_ALL_NSPACES set
- Add throw to old librados c++ interface when all_nspaces set
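A sketch of the listing rules above. The new interface returns (namespace, name) pairs and skips the internal namespace. The old interface returns bare names, so an all-namespaces listing would presumably be ambiguous and is rejected. ALL_NSPACES, ListObject, and both functions are hypothetical stand-ins, and EINVAL is modeled as a ValueError.

```python
from dataclasses import dataclass
from typing import Iterable, List

INTERNAL_NSPACE = ".ceph_internal"
ALL_NSPACES = object()  # hypothetical stand-in for LIBRADOS_ALL_NSPACES


@dataclass(frozen=True)
class ListObject:
    nspace: str
    oid: str


def list_objects(objects: Iterable[ListObject], nspace: object) -> List[ListObject]:
    out = []
    for obj in objects:
        if obj.nspace == INTERNAL_NSPACE:
            continue  # never expose the internal namespace
        if nspace is ALL_NSPACES or obj.nspace == nspace:
            out.append(obj)
    return out


def legacy_list_oids(objects: Iterable[ListObject], nspace: object) -> List[str]:
    """Old-style listing: names only, so all-namespace listing is refused."""
    if nspace is ALL_NSPACES:
        raise ValueError("EINVAL: old interface cannot list all namespaces")
    return [o.oid for o in list_objects(objects, nspace)]
```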
Fixes: #9031
Signed-off-by: David Zafman <dzafman@redhat.com>
OSDs will now rely on 'leveldb_*' config options. However, we do keep
leveldb's log enabled for OSDs by passing 'leveldb_log=""' as a default
argument to global_init() in ceph_osd.cc; users are still able
to override this at their own discretion.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
'leveldb_*' options are currently used both by the monitor and the osd.
However, the monitor has quite different requirements from those of the
osds.
We need to specify some default values that override the defaults we
have for 'leveldb_*' options, while still allowing users to override them.
We take this not-exactly-ideal-but-still-good-enough approach of
defining the monitor-specific defaults in the 'default arguments' to
global_init(), thus allowing the user's options to take precedence over
whatever we define.
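The layering above amounts to "later layers win". This is a minimal sketch with three layers: built-in defaults, the daemon's default arguments, and the user's config. effective_config is hypothetical, and the option values are placeholders, not real defaults.

```python
from typing import Dict


def effective_config(builtin: Dict[str, str], default_args: Dict[str, str],
                     user: Dict[str, str]) -> Dict[str, str]:
    """Built-in defaults < daemon default arguments < user configuration."""
    merged = dict(builtin)
    merged.update(default_args)  # e.g. monitor-specific leveldb_* defaults
    merged.update(user)          # anything the user sets still wins
    return merged
```

Usage: a monitor default like `{"leveldb_cache_size": "mon-value"}` replaces the built-in value, but a user setting for the same key replaces both.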
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
From this point onward, users should use leveldb's options and add them
to the appropriate config sections of their configuration file.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
A 'status' or 'health' request will return a HEALTH_WARN whenever the
monitor handling the request has the option set to zero.
Fixes: #7784
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
The FileStore's leveldb currently uses libleveldb's defaults for cache and
write buffer size, which are both 4 MB. Increase the cache size to 128 MB and
the write buffer to 8 MB.
Tested-by: Dmitry Smirnov <onlyjob@member.fsf.org>
Signed-off-by: Sage Weil <sage@inktank.com>
Reading past the end of a pointer returned by string.data() in C++98
is undefined. While we're fixing this, also allow comparison of xattrs
containing null bytes.
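Why embedded null bytes matter for the comparison: a C-string-style compare stops at the first NUL, while a length-aware compare checks every byte. Both functions are hypothetical illustrations, not the Ceph code.

```python
def c_string_equal(a: bytes, b: bytes) -> bool:
    """strcmp-style: only the bytes before the first NUL are compared."""
    return a.split(b"\0", 1)[0] == b.split(b"\0", 1)[0]


def xattr_equal(a: bytes, b: bytes) -> bool:
    """Length-aware: bytes equality checks length and every byte, NULs included."""
    return a == b
```

The values b"ab\0x" and b"ab\0y" compare equal under the first function but not under the second.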
Fixes: #7250
Backport: dumpling
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Just before sending an op, prepare_mutate_op() is called, creating a
new Op. prepare_read_op() already copied over all the out-params
correctly, but for write operations the individual op return value
pointers were not copied, so they would not be filled in. With this
fixed, librados users can get the per-op return codes again.
Partially fixes: #6483
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Require that all OSDs support TMAP2OMAP before starting the MDS. This
avoids doing some work and then crashing with EOPNOTSUPP, and gives us
a more informative message in the logs.
Signed-off-by: Sage Weil <sage@inktank.com>
rbd_list will return -ENOENT when no rbd_directory object
exists. Handle this in the cli tool and interpret it as success with
an empty list.
Add this to the release notes since it changes command line behavior.
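The CLI-side handling can be sketched as a wrapper that turns ENOENT into an empty list and lets any other error through. The rbd_list callable here is a hypothetical stand-in that raises OSError, not the librbd binding.

```python
import errno
from typing import Callable, List


def list_images(rbd_list: Callable[[], List[str]]) -> List[str]:
    try:
        return rbd_list()
    except OSError as e:
        if e.errno == errno.ENOENT:
            return []  # no rbd_directory object yet: an empty pool
        raise
```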
Fixes: #6693
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
--osd-pool-default-crush-replicated-ruleset replaces
--osd-pool-default-crush-rule
If --osd-pool-default-crush-rule is set it takes precedence over
--osd-pool-default-crush-replicated-ruleset and a deprecation warning is
displayed.
The CrushWrapper::get_osd_pool_default_crush_replicated_ruleset helper is
used to implement this behaviour.
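The precedence rule can be sketched as follows. This models "is set" as the key being present, which is an assumption; the real helper may use a sentinel value instead. default_replicated_ruleset is hypothetical and only mirrors the option names above in their underscore config form.

```python
import warnings
from typing import Mapping, Optional

OLD = "osd_pool_default_crush_rule"
NEW = "osd_pool_default_crush_replicated_ruleset"


def default_replicated_ruleset(conf: Mapping[str, int]) -> Optional[int]:
    """The deprecated option wins if set, with a warning; else use the new one."""
    if OLD in conf:
        warnings.warn(f"{OLD} is deprecated; use {NEW}", DeprecationWarning)
        return conf[OLD]
    return conf.get(NEW)
```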
Signed-off-by: Loic Dachary <loic@dachary.org>