v12.2.13 Luminous
=================
This is the 13th bug fix release of the Luminous v12.2.x long term stable
release series. We recommend that all users upgrade to this release.
Notable Changes
---------------
* Ceph now packages python bindings for python3.6 instead of
python3.4, because EPEL7 recently switched from python3.4 to
python3.6 as the native python3. See the `announcement _`
for more details on the background of this change.
* We now have telemetry support via a ceph-mgr module. The telemetry module is
strictly opt-in, and is meant to collect generic cluster
information and push it to a central endpoint. By default, it pushes reports
to a project endpoint at https://telemetry.ceph.com/report, but this is
customizable by setting the 'url' config option with::
ceph telemetry config-set url ''
You must opt in to sharing your information with::
ceph telemetry on
Before opting in, you can view exactly what information will be reported with::
ceph telemetry show
Should you opt in, your information will be licensed under the
Community Data License Agreement - Sharing - Version 1.0, which you can
read at https://cdla.io/sharing-1-0/
The telemetry module reports information about CephFS file systems,
including:
- how many MDS daemons (in total and per file system)
- which features are (or have been) enabled
- how many data pools
- approximate file system age (year + month of creation)
- how much metadata is being cached per file system
As well as:
- whether IPv4 or IPv6 addresses are used for the monitors
- whether RADOS cache tiering is enabled (and which mode)
- whether pools are replicated or erasure coded, and
which erasure code profile plugin and parameters are in use
- how many RGW daemons, zones, and zonegroups are present; which RGW frontends are in use
- aggregate stats about the CRUSH map, like which algorithms are used, how
big buckets are, how many rules are defined, and what tunables are in use
* A health warning is now generated if the average OSD heartbeat ping
time exceeds a configurable threshold for any of the intervals
computed. The OSD computes 1 minute, 5 minute and 15 minute
intervals with average, minimum and maximum values. The new configuration
option ``mon_warn_on_slow_ping_ratio`` specifies a percentage of
``osd_heartbeat_grace`` to determine the threshold. A value of zero
disables the warning. The new configuration option
``mon_warn_on_slow_ping_time``, specified in milliseconds, overrides the
computed value and causes a warning
when OSD heartbeat pings take longer than the specified amount.
The new admin command ``ceph daemon mgr.# dump_osd_network [threshold]``
lists all connections whose average ping time, over any of the 3 intervals,
is longer than the specified threshold or the value determined by the config options.
The new admin command ``ceph daemon osd.# dump_osd_network [threshold]``
does the same, but only includes heartbeats initiated by the specified OSD.
* The configuration value ``osd_calc_pg_upmaps_max_stddev`` used for upmap
balancing has been removed. Instead, use the mgr balancer config
``upmap_max_deviation``, which is now an integer number of PGs of deviation
from the target PGs per OSD. This can be set with a command like
``ceph config set mgr mgr/balancer/upmap_max_deviation 2``. The default
``upmap_max_deviation`` is 1. There are situations where CRUSH rules
would not allow a pool to ever have completely balanced PGs. For example,
this happens if CRUSH requires 1 replica on each of 3 racks, but there are
fewer OSDs in 1 of the racks. In those cases, the configuration value can be increased.
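The rack example above can be made concrete with a little arithmetic. The sketch below is a simplified model, not the balancer's actual algorithm: it assumes every PG places exactly one replica in each rack and that replicas are spread evenly across the OSDs within a rack. The function names and the rack sizes are hypothetical and chosen only for illustration.

```python
def pgs_per_osd(pg_num, osds_per_rack):
    """PG replicas held by each OSD in every rack, assuming one replica
    per rack per PG and an even spread inside each rack."""
    return [pg_num / n for n in osds_per_rack]


def max_deviation(pg_num, osds_per_rack):
    """Largest distance between any OSD's PG count and the cluster-wide
    target (total replicas divided by total OSDs)."""
    total_replicas = pg_num * len(osds_per_rack)
    target = total_replicas / sum(osds_per_rack)
    return max(abs(p - target) for p in pgs_per_osd(pg_num, osds_per_rack))


# Three racks with 4, 4 and 2 OSDs, and a pool with 120 PGs:
# the OSDs in the small rack must each hold 60 replicas, while the
# cluster-wide target is 360 / 10 = 36 replicas per OSD.
uneven = max_deviation(120, [4, 4, 2])

# With equally sized racks, every OSD can sit exactly on the target.
even = max_deviation(120, [4, 4, 4])
```

In the uneven layout no amount of remapping can bring every OSD within 1 PG of the target, which is the kind of situation where raising ``upmap_max_deviation`` is reasonable.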
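The slow ping threshold rule described in the heartbeat item above can be sketched as follows. This is an illustration under stated assumptions, not the monitor's implementation: the function names are hypothetical, the ratio is assumed to be expressed as a fraction (0.05 meaning 5%), ``osd_heartbeat_grace`` is assumed to be in seconds, and the sample values are illustrative.

```python
def slow_ping_threshold_ms(heartbeat_grace_s, slow_ping_ratio, slow_ping_time_ms=0.0):
    """Return the warning threshold in milliseconds, or None if disabled.

    Assumption: a non-zero slow_ping_time_ms overrides the value computed
    from the ratio, and a ratio of zero disables the warning.
    """
    if slow_ping_time_ms > 0:
        return float(slow_ping_time_ms)
    if slow_ping_ratio <= 0:
        return None
    return heartbeat_grace_s * slow_ping_ratio * 1000.0


def slow_intervals(avg_ping_ms, threshold_ms):
    """Names of the intervals whose average ping time exceeds the threshold."""
    if threshold_ms is None:
        return []
    return [name for name, avg in avg_ping_ms.items() if avg > threshold_ms]


# Illustrative values: a 20 second grace period and a 5% ratio.
threshold = slow_ping_threshold_ms(20, 0.05)

# Hypothetical average ping times for the three intervals, in milliseconds.
averages = {"1min": 1200.0, "5min": 800.0, "15min": 400.0}
flagged = slow_intervals(averages, threshold)
```

A warning would be raised here because one interval's average is above the threshold; passing an explicit ``slow_ping_time_ms`` replaces the computed value entirely.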
Changelog
---------
* bluestore: >2GB bluefs writes (`pr#28965 `_, kungf, Kefu Chai, Sage Weil)
* bluestore: Inspect allocations (`pr#29539 `_, Neha Ojha, Adam Kupczyk)
* bluestore: [AFTER: #28644] luminous: os/bluestore: default to bitmap allocator for bluestore/bluefs (`pr#28972 `_, Igor Fedotov)
* bluestore: add bluestore_ignore_data_csum option (`pr#26247 `_, Sage Weil)
* bluestore: apply shared_alloc_size to shared device with log level change (`pr#29910 `_, Vikhyat Umrao, Josh Durgin, Igor Fedotov, Sage Weil)
* bluestore: avoid length overflow in extents returned by Stupid Alloc (`issue#40703 `_, `pr#29025 `_, Igor Fedotov)
* bluestore: call fault_range properly prior to looking for blob to … (`pr#27529 `_, Igor Fedotov)
* bluestore: common/options: Set concurrent bluestore rocksdb compactions to 2 (`pr#30149 `_, Mark Nelson)
* bluestore: dump before "no spanning blob id" abort (`pr#28030 `_, Igor Fedotov)
* bluestore: fix assertion in StupidAllocator::get_fragmentation (`pr#32523 `_, Lei Liu, Igor Fedotov)
* bluestore: fix duplicate allocations in bmap allocator (`issue#40080 `_, `pr#28644 `_, Igor Fedotov)
* bluestore: fix improper setting of STATE_KV_SUBMITTED (`pr#31674 `_, Igor Fedotov)
* bluestore: fix length overflow (`issue#39247 `_, `pr#27365 `_, Jianpeng Ma)
* bluestore: fix out-of-bound access in bmap allocator (`pr#27739 `_, Igor Fedotov)
* bluestore: load OSD all compression settings unconditionally (`issue#40480 `_, `pr#28895 `_, Igor Fedotov)
* bluestore: os/bluestore/BitmapFreelistManager: disable bluestore_debug_freelist (`pr#27459 `_, Sage Weil)
* bluestore: os/bluestore_tool: bluefs-bdev-expand: indicate bypassed for main dev (`pr#27912 `_, Igor Fedotov)
* bluestore: test/store_test: fix/workaround for BlobReuseOnOverwriteUT and garbageCollection (`pr#27056 `_, Igor Fedotov)
* build/ops: admin/build-doc: use python3 (`pr#30665 `_, Kefu Chai, Jason Dillaman)
* build/ops: admin/build-doc: use python3 (follow-on fix) (`pr#30690 `_, Nathan Cutler)
* build/ops: backport miscellaneous install-deps.sh and ceph.spec.in fixes from master (`issue#13997 `_, `issue#37707 `_, `issue#18163 `_, `issue#22998 `_, `pr#30722 `_, Yao Guotao, Tomasz Setkowski, Andrey Parfenov, Alfredo Deza, Kefu Chai, Nathan Cutler, Yunchuan Wen, Zack Cerza, Brad Hubbard, Loic Dachary)
* build/ops: ceph-test RPM not built for SUSE (`pr#29736 `_, Nathan Cutler)
* build/ops: cmake: pass -march to detect compiler support of arm64 crc/crypto (`issue#36080 `_, `issue#17516 `_, `pr#24169 `_, Kefu Chai)
* build/ops: do_cmake.sh: source not found (`issue#40004 `_, `issue#39981 `_, `pr#28216 `_, Nathan Cutler)
* build/ops: install-deps.sh: Remove CR repo (`issue#13997 `_, `pr#30129 `_, Brad Hubbard, Alfredo Deza)
* build/ops: python-cephfs should depend on python-rados (`issue#37612 `_, `issue#24918 `_, `pr#27950 `_, Kefu Chai)
* build/ops: python3-cephfs should provide python36-cephfs (`pr#30981 `_, Kefu Chai)
* build/ops: rpm: Build with lttng on openSUSE (`issue#39332 `_, `pr#27618 `_, Nathan Cutler)
* build/ops: rpm: explicitly declare python-tox build dependency (`pr#31934 `_, Nathan Cutler)
* ceph-volume: assume msgrV1 for all branches containing mimic (`pr#32796 `_, Jan Fajerski)
* ceph-volume: batch functional idempotency test fails since message is now on stderr (`pr#29791 `_, Jan Fajerski)
* ceph-volume: broken assertion errors after pytest changes (`pr#28929 `_, Alfredo Deza)
* ceph-volume: do not fail when trying to remove crypt mapper (`pr#30556 `_, Guillaume Abrioux)
* ceph-volume: does not recognize wal/db partitions created by ceph-disk (`pr#29462 `_, Jan Fajerski)
* ceph-volume: fix stderr failure to decode/encode when redirected (`pr#30299 `_, Alfredo Deza)
* ceph-volume: fix warnings raised by pytest (`pr#30677 `_, Rishabh Dave)
* ceph-volume: lvm list is O(n^2) (`pr#30094 `_, Rishabh Dave)
* ceph-volume: lvm.activate: Return an error if WAL/DB devices absent (`pr#29038 `_, David Casier)
* ceph-volume: lvm.zap fix cleanup for db partitions (`issue#40664 `_, `pr#30302 `_, Dominik Csapak)
* ceph-volume: missing string substitution when reporting mounts (`issue#40978 `_, `pr#29351 `_, Shyukri Shyukriev)
* ceph-volume: pre-install python-apt and its variants before test runs (`pr#30296 `_, Alfredo Deza)
* ceph-volume: prints errors to stdout with --format json (`issue#38548 `_, `pr#29508 `_, Jan Fajerski)
* ceph-volume: prints log messages to stdout (`pr#29603 `_, Jan Fajerski, Kefu Chai, Alfredo Deza)
* ceph-volume: set a lvm_size property on the fakedevice fixture (`pr#30331 `_, Andrew Schoen)
* ceph-volume: simple: when 'type' file is not present activate fails (`pr#29415 `_, Alfredo Deza)
* ceph-volume: tests add a sleep in tox for slow OSDs after booting (`pr#28927 `_, Alfredo Deza)
* ceph-volume: tests set the noninteractive flag for Debian (`pr#29901 `_, Alfredo Deza)
* ceph-volume: update testing playbook 'deploy.yml' (`pr#29075 `_, Andrew Schoen, Guillaume Abrioux)
* ceph-volume: use the Device.rotational property instead of sys_api (`pr#28519 `_, Andrew Schoen)
* ceph-volume: use the OSD identifier when reporting success (`pr#29771 `_, Alfredo Deza)
* ceph-volume: zap always skips block.db, leaves them around (`issue#40664 `_, `pr#30305 `_, Alfredo Deza)
* cephfs: client: _readdir_cache_cb() may use the readdir_cache already clear (`issue#41148 `_, `pr#30934 `_, huanwen ren)
* cephfs: client: ceph.dir.rctime xattr value incorrectly prefixes 09 to the nanoseconds component (`issue#40166 `_, `pr#28502 `_, David Disseldorp)
* cephfs: client: clean up error checking and return of _lookup_parent (`issue#40085 `_, `pr#28437 `_, Jeff Layton)
* cephfs: client: return -EIO when sync file which unsafe reqs have been dropped (`issue#40877 `_, `pr#30242 `_, simon gao)
* cephfs: client: unlink dentry for inode with llref=0 (`issue#40960 `_, `pr#29830 `_, Xiaoxi CHEN)
* cephfs: kclient: nofail option not supported (`pr#28436 `_, Kenneth Waegeman)
* cephfs: mds/server: check directory split after rename (`issue#39198 `_, `issue#38994 `_, `pr#27801 `_, Shen Hang)
* cephfs: mds: add command that config individual client session (`issue#40811 `_, `pr#31573 `_, "Yan, Zheng")
* cephfs: mds: add reference when setting Connection::priv to existing session (`pr#31049 `_, "Yan, Zheng")
* cephfs: mds: avoid trimming too many log segments after mds failover (`issue#40028 `_, `pr#28543 `_, simon gao)
* cephfs: mds: better output of 'ceph health detail' (`issue#39266 `_, `pr#27848 `_, Shen Hang)
* cephfs: mds: check dir fragment to split dir if mkdir makes it oversized (`pr#29829 `_, Erqi Chen)
* cephfs: mds: cleanup truncating inodes when standby replay mds trim log segments (`pr#31286 `_, "Yan, Zheng")
* cephfs: mds: dont print subtrees if they are too big or too many (`pr#27679 `_, Rishabh Dave)
* cephfs: mds: drop reconnect message from non-existent session (`issue#39191 `_, `issue#39026 `_, `pr#27737 `_, Shen Hang)
* cephfs: mds: fix corner case of replaying open sessions (`pr#28536 `_, "Yan, Zheng")
* cephfs: mds: initialize cap_revoke_eviction_timeout with conf (`issue#38844 `_, `issue#39208 `_, `pr#27840 `_, simon gao)
* cephfs: mds: msg weren't destroyed before handle_client_reconnect returned, if the reconnect msg was from non-existent session (`issue#40588 `_, `issue#40807 `_, `pr#29097 `_, Shen Hang)
* cephfs: mds: remove superfluous error in StrayManager::advance_delayed() (`issue#38679 `_, `pr#28432 `_, "Yan, Zheng")
* cephfs: mds: reset heartbeat inside big loop (`pr#28544 `_, "Yan, Zheng")
* cephfs: mds: there is an assertion when calling Beacon::shutdown() (`issue#38822 `_, `pr#28438 `_, huanwen ren)
* cephfs: mount: key parsing fail when doing a remount (`issue#40163 `_, `pr#29226 `_, Luis Henriques)
* cephfs: pybind/ceph_volume_client: remove ceph mds calls in favor of ceph fs calls (`issue#22038 `_, `issue#22524 `_, `pr#28445 `_, Patrick Donnelly, Ramana Raja)
* cephfs: qa/cephfs: relax min_caps_per_client check (`issue#38270 `_, `issue#38686 `_, `pr#27040 `_, "Yan, Zheng")
* cephfs: qa: misc cache drop fixes (`issue#38340 `_, `issue#38445 `_, `pr#27342 `_, Patrick Donnelly)
* common/config: hold lock while accessing mutable container (`pr#30345 `_, Jason Dillaman)
* common: Keyrings created by ceph auth get are not suitable for ceph auth import (`issue#40548 `_, `issue#22227 `_, `pr#28742 `_, Kefu Chai)
* common: common/ceph_context: avoid unnecessary wait during service thread shutdown (`pr#31020 `_, Jason Dillaman)
* common: common/options.cc: Lower the default value of osd_deep_scrub_large_omap_object_key_threshold (`pr#29175 `_, Neha Ojha)
* common: common/util: handle long lines in /proc/cpuinfo (`issue#38296 `_, `pr#32349 `_, Sage Weil)
* common: compressor/zstd: improvements (`pr#28647 `_, Adam C. Emerson, Sage Weil)
* common: data race in OutputDataSocket (`issue#40188 `_, `issue#40266 `_, `pr#29202 `_, Casey Bodley)
* core: ENOENT in collection_move_rename on EC backfill target (`issue#36739 `_, `issue#38880 `_, `pr#28110 `_, Neha Ojha)
* core: Health warnings on long network ping times (`issue#40586 `_, `issue#40640 `_, `pr#30230 `_, xie xingguo, David Zafman)
* core: Revert "crush: remove invalid upmap items" (`pr#32019 `_, David Zafman)
* core: backport recent messenger fixes (`issue#39243 `_, `issue#38242 `_, `issue#39448 `_, `pr#27583 `_, xie xingguo, Jason Dillaman)
* core: ceph tell osd.xx bench help : gives wrong help (`issue#39006 `_, `issue#39373 `_, `pr#28112 `_, Neha Ojha)
* core: ceph-objectstore-tool: rename dump-import to dump-export (`issue#39343 `_, `issue#39284 `_, `pr#27636 `_, David Zafman)
* core: crc cache should be invalidated when posting preallocated rx buffers (`issue#38436 `_, `pr#29248 `_, Ilya Dryomov)
* core: crush/CrushWrapper: ensure crush_choose_arg_map.size == max_buckets (`issue#38664 `_, `issue#38719 `_, `pr#27085 `_, Sage Weil)
* core: crush: remove invalid upmap items (`pr#31234 `_, huangjun)
* core: lazy omap stat collection (`pr#29190 `_, Brad Hubbard)
* core: mds,osd,mon,msg: use intrusive_ptr for holding Connection::priv (`issue#20924 `_, `pr#29859 `_, Shinobu Kinjo, Kefu Chai, Jianpeng Ma, Samuel Just)
* core: mgr/localpool: pg_num is an int arg to 'osd pool create' (`pr#30446 `_, Sage Weil)
* core: mgr/prometheus: assign a value to osd_dev_node when obj_store is not filestore or bluestore (`pr#31587 `_, jiahuizeng)
* core: mon, osd: parallel clean_pg_upmaps (`issue#40229 `_, `issue#40104 `_, `pr#28594 `_, xie xingguo)
* core: mon,osd: limit MOSDMap messages by size as well as map count (`issue#38276 `_, `pr#28640 `_, Sage Weil)
* core: mon/OSDMonitor: trim not-longer-exist failure reporters (`pr#30905 `_, NancySu05)
* core: mon: Error message displayed when mon_osd_max_split_count would be exceeded is not as user-friendly as it could be (`issue#39353 `_, `issue#39563 `_, `pr#27908 `_, Nathan Cutler, Brad Hubbard)
* core: mon: ensure prepare_failure() marks no_reply on op (`pr#30519 `_, Joao Eduardo Luis)
* core: mon: mon/AuthMonitor: don't validate fs caps on authorize (`pr#28666 `_, Joao Eduardo Luis)
* core: msg: output peer address when detecting bad CRCs (`issue#39367 `_, `pr#27858 `_, Greg Farnum)
* core: osd/OSDMap.cc: don't output over/underfull messages to lderr (`pr#31598 `_, Neha Ojha)
* core: osd/OSDMap: Replace get_out_osds with get_out_existing_osds (`issue#39154 `_, `issue#39420 `_, `pr#27728 `_, Brad Hubbard)
* core: osd/OSDMap: do not trust partially simplified pg_upmap_item (`pr#30926 `_, xie xingguo)
* core: osd/PG: Add PG to large omap log message (`pr#30922 `_, Brad Hubbard)
* core: osd/PG: discover missing objects when an OSD peers and PG is degraded (`pr#27751 `_, Jonas Jelten)
* core: osd/PGLog: preserve original_crt to check rollbackability (`issue#38894 `_, `issue#38905 `_, `issue#36739 `_, `issue#39042 `_, `pr#27715 `_, Neha Ojha)
* core: osd/PeeringState: recover_got - add special handler for empty log (`pr#30896 `_, xie xingguo)
* core: osd/PrimaryLogPG: skip obcs that don't exist during backfill scan_range (`pr#31030 `_, Sage Weil)
* core: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get_log().objects.count(soid) && (get_parent()->get_log().get_log().objects.find(soid)->second->op == pg_log_entry_t::LOST_REVERT) && (get_parent()->get_log().get_log().object (`issue#39537 `_, `issue#26958 `_, `pr#28989 `_, xie xingguo)
* core: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(fromshard)) (`pr#31855 `_, Neha Ojha, xie xingguo)
* core: osd/bluestore: Actually wait until completion in write_sync (`pr#29564 `_, Vitaliy Filippov)
* core: osd: Better error message when OSD count is less than osd_pool_default_size (`issue#38617 `_, `issue#38585 `_, `pr#30298 `_, Vikhyat Umrao, Kefu Chai, Sage Weil, zjh)
* core: osd: Diagnostic logging for upmap cleaning (`pr#32666 `_, David Zafman)
* core: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_missing(soid) || (it_objects != pg_log.get_log().objects.end() && it_objects->second->op == pg_log_entry_t::LOST_REVERT)) in PrimaryLogPG::get_object_context() (`issue#39218 `_, `issue#38931 `_, `issue#38784 `_, `pr#27878 `_, xie xingguo)
* core: osd: Fix for compatibility of encode/decode of osd_stat_t (`pr#31277 `_, David Zafman)
* core: osd: Include dups in copy_after() and copy_up_to() (`issue#39304 `_, `pr#28185 `_, David Zafman)
* core: osd: Remove unused osdmap flags full, nearfull from output (`issue#22350 `_, `pr#30902 `_, Gu Zhongyan, David Zafman)
* core: osd: add hdd, ssd and hybrid variants for osd_snap_trim_sleep (`pr#31857 `_, Neha Ojha)
* core: osd: clear PG_STATE_CLEAN when repair object (`pr#30271 `_, Zengran Zhang)
* core: osd: fix out of order caused by letting old msg from down osd be processed (`pr#31293 `_, Mingxin Liu)
* core: osd: merge replica log on primary need according to replica log's crt (`pr#30917 `_, Zengran Zhang)
* core: osd: refuse to start if we're > N+2 from recorded require_osd_release (`issue#38076 `_, `pr#31858 `_, Sage Weil)
* core: osd: report omap/data/metadata usage (`issue#40638 `_, `pr#28851 `_, Sage Weil)
* core: osd: rollforward may need to mark pglog dirty (`issue#40403 `_, `pr#31036 `_, Zengran Zhang)
* core: osd: scrub error on big objects; make bluestore refuse to start on big objects (`pr#30785 `_, Sage Weil, David Zafman)
* core: osd: shutdown recovery_request_timer earlier (`issue#39204 `_, `pr#27810 `_, Zengran Zhang)
* core: pybind: Rados.get_fsid() returning bytes in python3 (`issue#38873 `_, `issue#38381 `_, `pr#27674 `_, Jason Dillaman)
* core: should report EINVAL in ErasureCode::parse() if m<=0 (`issue#38682 `_, `issue#38750 `_, `pr#28111 `_, Sage Weil)
* doc: Minor rados related documentation fixes (`issue#38896 `_, `issue#38902 `_, `pr#27185 `_, David Zafman)
* doc: Missing Documentation for radosgw-admin reshard commands (man pages) (`issue#40092 `_, `issue#21617 `_, `pr#28329 `_, Orit Wasserman)
* doc: Update layout.rst (`pr#26381 `_, ypdai)
* doc: describe metadata_heap cleanup (`issue#18174 `_, `pr#30071 `_, Dan van der Ster)
* doc: doc/rbd: s/guess/xml/ for codeblock lexer (`pr#31091 `_, Kefu Chai)
* doc: doc/rgw: document CreateBucketConfiguration for s3 PUT Bucket api (`issue#39597 `_, `pr#31647 `_, Casey Bodley)
* doc: doc/rgw: document use of 'realm pull' instead of 'period pull' (`issue#39655 `_, `pr#30132 `_, Casey Bodley)
* doc: fixed --read-only argument value in multisite doc (`pr#31655 `_, Chenjiong Deng)
* doc: osd_recovery_priority is not documented (but osd_recovery_op_priority is) (`pr#27471 `_, David Zafman)
* doc: update bluestore cache settings and clarify data fraction (`issue#39522 `_, `pr#31257 `_, Jan Fajerski)
* doc: wrong datatype describing crush_rule (`pr#32267 `_, Kefu Chai)
* doc: wrong value of usage log default in logging section (`issue#37892 `_, `issue#37856 `_, `pr#29015 `_, Abhishek Lekshmanan)
* mgr: Change default upmap_max_deviation to 5 (`pr#32586 `_, David Zafman)
* mgr: Release GIL and Balancer fixes (`pr#31992 `_, Kefu Chai, Noah Watkins, David Zafman)
* mgr: mgr/BaseMgrModule: drop GIL in set_config (`issue#39040 `_, `issue#36766 `_, `pr#27808 `_, John Spray, xie xingguo, Sage Weil)
* mgr: mgr/balancer: blame if upmap won't actually work (`issue#38781 `_, `pr#26498