Commit Graph

128154 Commits

Author SHA1 Message Date
Sebastian Wagner
8de88a1d0a
mgr/cephadm/inventory: remove unused filter_by_label
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
2021-11-29 11:26:37 +01:00
Sebastian Wagner
c3c4beb61c
Merge pull request #44011 from adk3798/repr-device
python-common: add string representation for Device and DeviceSelection classes

Reviewed-by: Michael Fritch <mfritch@suse.com>
2021-11-29 09:50:28 +01:00
Mykola Golub
64054795de
Merge pull request #44114 from orozery/librbd-memory-leaks
librbd: fix various memory leaks

Reviewed-by: Mykola Golub <mgolub@suse.com>
2021-11-29 09:36:08 +02:00
Samuel Just
c4c324c2e0
Merge pull request #43530 from myoungwon/wip-seastore-nvme-device
seastore: add nvme commands to nvme device class

Reviewed-by: Samuel Just <sjust@redhat.com>
2021-11-28 20:21:39 -08:00
Samuel Just
30ad010566
Merge pull request #44068 from rzarzynski/wip-crimson-weakref-in-sharedlru
crimson/common: don't assume pointer-from-SharedLRU can't outlive it.

Reviewed-by: Chunmei Liu <chunmei.liu@intel.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
2021-11-28 17:59:54 -08:00
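The title of #44068 suggests that the fix stops assuming a pointer obtained from `SharedLRU` dies before the cache does. Below is a minimal sketch of the general technique, assuming the handed-out pointer keeps only a weak reference back to the cache. It is not crimson's code: `WeakRefCache`, `insert()` and `size()` are hypothetical. The custom deleter captures a `std::weak_ptr` to the cache's internal state and updates the cache only if that state still exists.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical cache sketch (not crimson's SharedLRU). Handed-out pointers
// hold only a weak reference to the cache state, so they may safely
// outlive the cache itself.
template <typename K, typename V>
class WeakRefCache {
  struct State {
    std::mutex lock;
    std::map<K, std::weak_ptr<V>> weak_refs;
  };
  std::shared_ptr<State> m_state = std::make_shared<State>();

 public:
  std::shared_ptr<V> insert(const K& key, V value) {
    std::weak_ptr<State> weak_state = m_state;
    std::shared_ptr<V> ptr(new V(std::move(value)), [weak_state, key](V* v) {
      // Touch the cache only if it still exists; otherwise just free v.
      if (auto state = weak_state.lock()) {
        std::lock_guard<std::mutex> guard(state->lock);
        auto it = state->weak_refs.find(key);
        // Erase only a stale entry, not a newer value under the same key.
        if (it != state->weak_refs.end() && it->second.expired())
          state->weak_refs.erase(it);
      }
      delete v;
    });
    std::lock_guard<std::mutex> guard(m_state->lock);
    m_state->weak_refs[key] = ptr;
    return ptr;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(m_state->lock);
    return m_state->weak_refs.size();
  }
};
```

Destroying the cache only drops its strong reference to `State`. Any surviving pointer's deleter then finds `weak_state.lock()` empty and frees the value without touching released memory.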
Samuel Just
e6217f189a
Merge pull request #44110 from rzarzynski/wip-crimson-alienstore-syncumountread
crimson/os: fix a shutdown-related race condition in AlienStore.

Reviewed-by: Samuel Just <sjust@redhat.com>
2021-11-28 16:36:47 -08:00
Samuel Just
81d7403a7a
Merge pull request #43481 from myoungwon/wip-dedup-tool-repair
tool: add repair command to ceph-dedup-tool

Reviewed-by: Samuel Just <sjust@redhat.com>
2021-11-28 16:10:46 -08:00
Or Ozeri
5de8791da7 librbd/crypto: remove unused member from ShutDownCryptoRequest
m_crypto is not used - remove it.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2021-11-28 13:06:34 +02:00
Or Ozeri
23831579b4 test/librbd: fix memory leak in TestMockShutDownCryptoRequest
fix memory leak in TestMockShutDownCryptoRequest.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2021-11-28 13:06:34 +02:00
Or Ozeri
23bb3e458c test/librbd: fix memory leak in TestMockCryptoLoadRequest
fix memory leak in TestMockCryptoLoadRequest.CryptoAlreadyLoaded

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2021-11-28 13:06:34 +02:00
Or Ozeri
bb0ccb3cc4 test/librbd: fix memory leak in TestMockCryptoCryptoObjectDispatch
fix memory leak in TestMockCryptoCryptoObjectDispatch.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2021-11-28 13:06:34 +02:00
Or Ozeri
9992bbaa53 librbd/crypto: fix memory leak in openssl/DataCryptor
Re-initializing the same DataCryptor causes a memory leak of the old encryption key.
This commit fixes this issue.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2021-11-28 13:06:34 +02:00
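Here is a minimal sketch of the re-initialization fix described in 9992bbaa53, assuming a cryptor that owns a heap copy of its key. `KeyedCryptor`, `init()` and `wipe_key()` are hypothetical names, not the librbd API. Holding the key in a `std::unique_ptr` already prevents the leak on its own. The explicit zeroing before release is extra key hygiene.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical stand-in for a cryptor that owns a copy of its key.
class KeyedCryptor {
 public:
  // Re-initialization wipes and releases the previous key before taking
  // ownership of the new one, so repeated init() calls do not leak.
  void init(const unsigned char* key, std::size_t len) {
    wipe_key();
    m_key.reset(new unsigned char[len]);
    std::memcpy(m_key.get(), key, len);
    m_key_len = len;
  }

  ~KeyedCryptor() { wipe_key(); }

  std::size_t key_length() const { return m_key_len; }
  const unsigned char* key() const { return m_key.get(); }

 private:
  void wipe_key() {
    if (m_key) {
      // Zero the key material before freeing it.
      volatile unsigned char* p = m_key.get();
      for (std::size_t i = 0; i < m_key_len; ++i) p[i] = 0;
      m_key.reset();
      m_key_len = 0;
    }
  }

  std::unique_ptr<unsigned char[]> m_key;
  std::size_t m_key_len = 0;
};
```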
Or Ozeri
044280dcbe librbd/crypto: fix memory leak in ShutDownCryptoRequest
If crypto object dispatch does not exist, a context pointer is leaked.
This commit fixes this issue.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2021-11-28 13:06:34 +02:00
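The leak fixed in 044280dcbe follows a common callback-ownership pattern. A function that takes ownership of a completion context must complete it, and thereby free it, on every path, including early returns. The sketch below assumes a simplified `Context` whose `complete()` runs `finish()` and then deletes itself, loosely modeled on Ceph's `Context`. The `LambdaContext` defined here and `shut_down_dispatch()` are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Simplified callback: complete() runs finish() and then deletes itself.
struct Context {
  virtual ~Context() = default;
  virtual void finish(int r) = 0;
  void complete(int r) { finish(r); delete this; }
};

struct LambdaContext : Context {
  std::function<void(int)> fn;
  explicit LambdaContext(std::function<void(int)> f) : fn(std::move(f)) {}
  void finish(int r) override { fn(r); }
};

// Hypothetical shutdown step. It owns on_finish, so every path, including
// the early "no dispatch layer to shut down" path, must complete it.
void shut_down_dispatch(bool dispatch_exists, Context* on_finish) {
  if (!dispatch_exists) {
    // Completing (rather than simply returning) frees the context.
    on_finish->complete(0);
    return;
  }
  // A real implementation would shut down asynchronously and complete
  // on_finish from its callback; this sketch completes it immediately.
  on_finish->complete(0);
}
```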
Or Ozeri
0f61c82d2e test/librbd: fix memory leak in TestMockParentCacheObjectDispatch
fix memory leak in TestMockParentCacheObjectDispatch.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2021-11-28 13:06:34 +02:00
Or Ozeri
bcca300d26 test/librbd: fix memory leak in TestMockCryptoLuksFormatRequest
fix memory leak in TestMockCryptoLuksFormatRequest.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2021-11-28 13:06:33 +02:00
Or Ozeri
79501173b7 test/librbd: fix memory leak in TestMockCryptoLuksLoadRequest
fix memory leak in TestMockCryptoLuksLoadRequest.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2021-11-28 13:06:33 +02:00
Or Ozeri
91c3b0314c test/librbd: fix bad TearDown in TestCryptoOpensslDataCryptor
Fix the TearDown function in TestCryptoOpensslDataCryptor
to call the correct parent-class function.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2021-11-28 13:06:33 +02:00
Or Ozeri
09ae3bd03d test/librbd: fix memory leak in TestCryptoOpensslDataCryptor
One of the tests leaks an encryption context.
This commit fixes this issue.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2021-11-28 13:06:33 +02:00
Or Ozeri
3af5bb7c61 librbd/crypto: fix memory leak when DataCryptor fails
If DataCryptor fails, either in init_context or update_context,
the encryption context is not returned, which causes a memory leak.
This commit fixes this issue.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2021-11-28 13:06:33 +02:00
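Here is a sketch of the error-path fix in 3af5bb7c61. It assumes contexts are borrowed from a pool and must be handed back. `ContextPool`, `EncContext`, `init_context()`, `update_context()` and `encrypt()` are hypothetical stand-ins for the librbd OpenSSL code. The point is that every exit after `get_context()` returns the context, whether a step succeeds or fails.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reusable encryption context and the pool that lends it out.
struct EncContext { int state = 0; };

class ContextPool {
 public:
  EncContext* get_context() {
    ++m_outstanding;
    if (m_free.empty()) return new EncContext();
    EncContext* ctx = m_free.back();
    m_free.pop_back();
    return ctx;
  }
  void return_context(EncContext* ctx) {
    --m_outstanding;
    m_free.push_back(ctx);
  }
  int outstanding() const { return m_outstanding; }
  ~ContextPool() { for (EncContext* c : m_free) delete c; }

 private:
  std::vector<EncContext*> m_free;
  int m_outstanding = 0;
};

constexpr int kEINVAL = 22;  // stand-in error code

// Hypothetical steps; a negative return value signals failure.
int init_context(EncContext* ctx, bool fail) { ctx->state = 1; return fail ? -kEINVAL : 0; }
int update_context(EncContext* ctx, bool fail) { ctx->state = 2; return fail ? -kEINVAL : 0; }

int encrypt(ContextPool& pool, bool fail_init, bool fail_update) {
  EncContext* ctx = pool.get_context();
  int r = init_context(ctx, fail_init);
  if (r < 0) {
    pool.return_context(ctx);  // failure path must still return ctx
    return r;
  }
  r = update_context(ctx, fail_update);
  pool.return_context(ctx);    // returned on success and failure alike
  return r;
}
```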
Or Ozeri
78abde0d25 test/librbd: fix memory leak in TestMockCryptoBlockCrypto
fix memory leak in TestMockCryptoBlockCrypto.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
2021-11-28 13:06:33 +02:00
Sage Weil
69b04de293 Merge PR #43997 into master
* refs/pull/43997/head:
	mgr/cephadm: make logging about agent less verbose

Reviewed-by: Adam King <adking@redhat.com>
2021-11-26 15:15:51 -05:00
Sage Weil
cf046f78da Merge PR #44079 into master
* refs/pull/44079/head:
	mgr/cephadm: skip osd_stats check if osd removal queue is empty

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
2021-11-26 15:15:42 -05:00
Sage Weil
45312c8627 Merge PR #44075 into master
* refs/pull/44075/head:
	mgr/cephadm: drop osdspec_affinity tracking

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
2021-11-26 15:15:27 -05:00
Sage Weil
131212254f Merge PR #44073 into master
* refs/pull/44073/head:
	pybind/mgr/mgr_module: cache mgr_ip

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-11-26 15:15:12 -05:00
Sage Weil
22c402b84e Merge PR #43936 into master
* refs/pull/43936/head:
	qa/tasks/cephadm: pull image to all hosts in parallel
	qa/tasks/cephadm: add hosts via mon remote
	qa/tasks/cephadm: use shortname for remote directory
	qa/tasks/cephadm: deploy no more than 5 mons in roleless mode
	qa/tasks/radosbench: default clients to all clients (not client.0)
	qa/tasks/ceph_manager: parallelize flush_pg_stats()
	qa/suites/big: remove thrasher
	qa/suites/big: update for cephadm

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
2021-11-26 10:38:58 -05:00
Sage Weil
63f986641d Merge PR #44080 into master
* refs/pull/44080/head:
	mgr/cephadm: record when finished with scheduled daemon action

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
2021-11-26 10:37:27 -05:00
Sebastian Wagner
f3d3dcee87
Merge pull request #44106 from sebastian-philipp/mgr-tox-37
mgr/tox.ini: Add python 3.7 environment 

Reviewed-by: Adam King <adking@redhat.com>
2021-11-25 17:54:26 +01:00
Sebastian Wagner
d93e8beab3
Merge pull request #43943 from sebastian-philipp/osd-memeory-hyperconverged
doc/cephadm: OSD memory autotuning for hyperconverged

Reviewed-by: Adam King <adking@redhat.com>
2021-11-25 17:27:26 +01:00
Radoslaw Zarzynski
5a7fc07933 crimson/os: fix a shutdown-related race condition in AlienStore.
This is supposed to tackle crashes like the following one:

```
INFO  2021-11-17 16:33:12,048 [shard 0] alienstore - stat
...
DEBUG 2021-11-17 16:33:12,789 [shard 0] ms - [osd.2(hb_front) v2:0.0.0.0:6813/34383 >> osd.0 v2:127.0.0.1:6809/34293@56992] closed!
DEBUG 2021-11-17 16:33:12,791 [shard 0] ms - [osd.2(hb_front) v2:0.0.0.0:6813/34383@53359 >> osd.7 v2:0.0.0.0:6815/34448] closed!
INFO  2021-11-17 16:33:12,795 [shard 0] alienstore - umount
INFO  2021-11-17 16:33:12,804 [shard 0] osd - osd.2: committed_osd_maps(23, 62)
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-8896-gf35358f1/rpm/el8/BUILD/ceph-17.0.0-8896-gf35358f1/src/rocksdb/db/db_impl/db_impl.cc:1615: rocksdb::Status rocksdb::DBImpl::GetImpl(const rocksdb::ReadOptions&, const rocksdb::Slice&, rocksdb::DBImpl::GetImplOptions&): Assertion `get_impl_options.column_family' failed.
Aborting.
Backtrace:
INFO  2021-11-17 16:33:13,542 [shard 0] ms - [osd.2(cluster) v2:172.21.15.17:6804/34383 >> osd.3 v2:172.21.15.17:6806/34387@50001] execute_ready(): fault at READY with nothing to send, going to STANDBY -- std::system_error (error crimson::net:4, read eof)
DEBUG 2021-11-17 16:33:13,542 [shard 0] ms - [osd.2(cluster) v2:172.21.15.17:6804/34383 >> osd.3 v2:172.21.15.17:6806/34387@50001] TRIGGER STANDBY, was READY
 0# gsignal in /lib64/libc.so.6
 1# abort in /lib64/libc.so.6
 2# 0x00007F12FA13FC89 in /lib64/libc.so.6
 3# 0x00007F12FA14DA76 in /lib64/libc.so.6
 4# rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::DBImpl::GetImplOptions&) in ceph-osd
 5# rocksdb::DBImpl::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) in ceph-osd
 6# rocksdb::DBImpl::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*) in ceph-osd
 7# RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned long, ceph::buffer::v15_2_0::list*) in ceph-osd
 8# BlueStore::Collection::get_onode(ghobject_t const&, bool, bool) in ceph-osd
 9# BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int) in ceph-osd
10# 0x00005584E516577F in ceph-osd
11# crimson::os::ThreadPool::loop(std::chrono::duration<long, std::ratio<1l, 1000l> >, unsigned long) in ceph-osd
12# 0x00005584E54E71E9 in ceph-osd
13# 0x00007F12FB861BA3 in /lib64/libstdc++.so.6
14# 0x00007F12FBB3C14A in /lib64/libpthread.so.0
15# clone in /lib64/libc.so.6
Content of /proc/self/maps:
7fff7000-8fff7000 rw-p 00000000 00:00 0
```

The problem happened in RocksDB:

```cpp
Status DBImpl::GetImpl(const ReadOptions& read_options, const Slice& key,
                       GetImplOptions& get_impl_options) {
  assert(get_impl_options.value != nullptr ||
         get_impl_options.merge_operands != nullptr);

  assert(get_impl_options.column_family);
  // ...
```

```cpp
Status DBImpl::Get(const ReadOptions& read_options,
                   ColumnFamilyHandle* column_family, const Slice& key,
                   PinnableSlice* value, std::string* timestamp) {
  GetImplOptions get_impl_options;
  get_impl_options.column_family = column_family;
  get_impl_options.value = value;
  get_impl_options.timestamp = timestamp;
  Status s = GetImpl(read_options, key, get_impl_options);
  return s;
}
```

```cpp
int RocksDBStore::get(
  const string& prefix,
  const char *key,
  size_t keylen,
  bufferlist *out)
{
  ceph_assert(out && (out->length() == 0));
  utime_t start = ceph_clock_now();
  int r = 0;
  rocksdb::PinnableSlice value;
  rocksdb::Status s;
  auto cf = get_cf_handle(prefix, key, keylen);
  if (cf) {
    s = db->Get(rocksdb::ReadOptions(),
                cf,
                rocksdb::Slice(key, keylen),
                &value);
  } else {
    string k;
    combine_strings(prefix, key, keylen, &k);
    s = db->Get(rocksdb::ReadOptions(),
                default_cf,
                rocksdb::Slice(k),
                &value);
  }
  // ...
```

This may be explained by a race condition between `AlienStore::stat()`
and `AlienStore::umount()`. Unmounting a BlueStore ends up in
`RocksDBStore::close()`, which nullifies `default_cf`:

```cpp
void RocksDBStore::close()
{
  // ...
  default_cf = nullptr;
  delete db;
  db = nullptr;
}
```

```
INFO  2021-11-17 16:33:12,048 [shard 0] alienstore - stat
...
INFO  2021-11-17 16:33:12,795 [shard 0] alienstore - umount
INFO  2021-11-17 16:33:12,804 [shard 0] osd - osd.2: committed_osd_maps(23, 62)
```

Although `AlienStore` synchronizes `umount()` and `do_transaction()`
with a `seastar::gate`, it lacks a similar mechanism for read-like operations.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2021-11-25 15:05:34 +00:00
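The missing read-side synchronization can be sketched with a thread-based gate. It is conceptually similar to `seastar::gate`: reads enter the gate, and `umount()` closes it and waits for in-flight reads before freeing the database. `Gate` and `Store` are hypothetical, and this is not AlienStore's implementation. A raw `std::string*` stands in for the RocksDB handle whose column family was nullified in the crash above.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <string>

// Thread-based analogue of a gate: once closed, no new operation may
// enter, and close() blocks until in-flight operations have left.
class Gate {
 public:
  bool try_enter() {
    std::lock_guard<std::mutex> g(m_lock);
    if (m_closed) return false;
    ++m_count;
    return true;
  }
  void leave() {
    std::lock_guard<std::mutex> g(m_lock);
    if (--m_count == 0) m_cv.notify_all();
  }
  void close() {
    std::unique_lock<std::mutex> g(m_lock);
    m_closed = true;
    m_cv.wait(g, [this] { return m_count == 0; });
  }

 private:
  std::mutex m_lock;
  std::condition_variable m_cv;
  int m_count = 0;
  bool m_closed = false;
};

// Hypothetical store: reads hold the gate, so umount() cannot tear down
// the backing "DB" underneath them.
class Store {
 public:
  Store() = default;
  Store(const Store&) = delete;
  Store& operator=(const Store&) = delete;
  ~Store() { delete m_db; }

  std::optional<std::string> stat() {
    if (!m_gate.try_enter()) return std::nullopt;  // store is shutting down
    std::optional<std::string> out;
    if (m_db) out = *m_db;  // the DB is only used while inside the gate
    m_gate.leave();
    return out;
  }

  void umount() {
    m_gate.close();  // wait for in-flight reads first
    delete m_db;     // only then free the DB
    m_db = nullptr;
  }

 private:
  Gate m_gate;
  std::string* m_db = new std::string("db");
};
```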
Sage Weil
9d50154a93 qa/tasks/cephadm: pull image to all hosts in parallel
This doesn't affect bootstrap, but it does mean we avoid any delay
the first time we run cephadm.shell on a non-bootstrap host.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-11-25 07:52:56 -06:00
Sage Weil
3a110f6c00 qa/tasks/cephadm: add hosts via mon remote
If we use a new remote for each shell command, we end up waiting
for the image to pull on every host in sequence.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-11-25 07:52:56 -06:00
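The two commits above replace sequential per-host image pulls with concurrent ones. The qa tasks themselves are Python. The C++ sketch below only illustrates the fan-out/wait pattern, with a hypothetical `pull_image()` standing in for running a remote command on each host.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Hypothetical per-host pull; a real harness would run a remote command.
bool pull_image(const std::string& host, const std::string& image) {
  return !host.empty() && !image.empty();
}

// Start one pull per host concurrently, then wait for all of them,
// instead of pulling on each host in sequence.
std::vector<bool> pull_on_all_hosts(const std::vector<std::string>& hosts,
                                    const std::string& image) {
  std::vector<std::future<bool>> pending;
  pending.reserve(hosts.size());
  for (const auto& h : hosts)
    pending.push_back(std::async(std::launch::async, pull_image, h, image));
  std::vector<bool> results;
  results.reserve(pending.size());
  for (auto& f : pending) results.push_back(f.get());
  return results;
}
```

The total wait is then roughly the slowest single pull rather than the sum of all pulls.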
Sage Weil
0e40064d31 qa/tasks/cephadm: use shortname for remote directory
This aligns with what the ceph and syslog tasks do.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-11-25 07:52:56 -06:00
Sage Weil
689d7ceabd qa/tasks/cephadm: deploy no more than 5 mons in roleless mode
Signed-off-by: Sage Weil <sage@newdream.net>
2021-11-25 07:52:55 -06:00
Sage Weil
e7bf9242c4 qa/tasks/radosbench: default clients to all clients (not client.0)
Signed-off-by: Sage Weil <sage@newdream.net>
2021-11-25 07:52:55 -06:00
Sage Weil
99cdaaba70 qa/tasks/ceph_manager: parallelize flush_pg_stats()
Signed-off-by: Sage Weil <sage@newdream.net>
2021-11-25 07:52:55 -06:00
Sage Weil
9559fea8b2 qa/suites/big: remove thrasher
This doesn't work with roleless mode (yet).

Signed-off-by: Sage Weil <sage@newdream.net>
2021-11-25 07:52:55 -06:00
Sage Weil
0514b0a323 qa/suites/big: update for cephadm
Signed-off-by: Sage Weil <sage@newdream.net>
2021-11-25 07:52:55 -06:00
Sebastian Wagner
a503e7dc21
mgr/cephadm/tests: remove _deploy_cephadm_binary
(not needed)

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
2021-11-25 13:29:01 +01:00
Sebastian Wagner
6f7ea4af3e
mgr/tox.ini: Add python 3.7 environment
Plus fixes.

Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
2021-11-25 13:22:06 +01:00
Neha Ojha
0f7791fa24
Merge pull request #43774 from aclamk/fix-bluefs-truncate
Fix data corruption in bluefs truncate()

Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>
2021-11-24 09:18:09 -08:00
Neha Ojha
892c3de851
Merge pull request #43875 from liewegas/ceph-cli-better-help
ceph: make -h/--help show match when some args are supplied

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-11-24 09:17:11 -08:00
Sage Weil
994832e8e5 pybind/mgr/mgr_module: cache mgr_ip
This does not change for the lifetime of an active mgr module, so there is
no need to keep calling back into the Mgr to re-fetch it.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-11-24 11:05:10 -05:00
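The real `mgr_module` is Python. The C++ sketch below only illustrates the lazy caching idea from 994832e8e5. `ModuleSketch` is a hypothetical class, and its `fetch` callback stands in for the call back into the Mgr. The cache is not synchronized, which is acceptable only under the assumption that a single thread calls `get_mgr_ip()`.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical module wrapper: the lookup runs once, and later calls
// reuse the cached value for the lifetime of the object.
class ModuleSketch {
 public:
  explicit ModuleSketch(std::function<std::string()> fetch)
      : m_fetch(std::move(fetch)) {}

  const std::string& get_mgr_ip() {
    if (!m_mgr_ip) m_mgr_ip = m_fetch();  // only the first call fetches
    return *m_mgr_ip;
  }

 private:
  std::function<std::string()> m_fetch;
  std::optional<std::string> m_mgr_ip;
};
```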
Sebastian Wagner
54080eb506
Merge pull request #44092 from sebastian-philipp/cephadm-docs-deployment-scenarios
doc/cephadm: Cephadm docs deployment scenarios

Reviewed-by: Adam King <adking@redhat.com>
2021-11-24 16:42:40 +01:00
Melissa
e6e0344981
doc/cephadm: deployment scenarios single host and isolated environment
This PR adds a deployment scenarios section to the cephadm docs to document the single-host-defaults flag and to explain how to deploy in an isolated environment.

Signed-off-by: Melissa Li <melissali@redhat.com>
2021-11-24 15:02:37 +01:00
Melissa
a311e837d6
doc/cephadm: isolated environment and other deployment scenarios
This PR adds a section to the cephadm docs that describes how to install cephadm in different deployment scenarios (setting up a cluster on a single host, and deploying in an isolated environment or private network).

Signed-off-by: Melissa Li <melissali@redhat.com>
2021-11-24 15:02:34 +01:00
Ernesto Puerta
9623700e7e
Merge pull request #43905 from rhcs-dashboard/fix-53242-master
mgr/dashboard: dashboard does not show degraded objects if they are less than 0.5% under "Dashboard->Capacity->Objects" block

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
2021-11-24 12:47:11 +01:00
Alfonso Martínez
2c44118921
Merge pull request #44023 from rhcs-dashboard/kcli-expanded-monitoring
mgr/dashboard: cephadm e2e start script: --expanded: deploy monitoring stack

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
2021-11-24 11:30:22 +01:00
Mykola Golub
ac2c5a9dfe
Merge pull request #44064 from MrFreezeex/fix-statusupdater-utest
rbd-mirror: make RemoveImmediateUpdate test synchronous

Reviewed-by: Mykola Golub <mgolub@suse.com>
2021-11-24 11:58:36 +02:00
Alfonso Martínez
622be4580c
Merge pull request #44045 from rhcs-dashboard/upgrade-cypress
mgr/dashboard: upgrade Cypress to the latest stable version

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
2021-11-24 08:34:10 +01:00
Liu-Chunmei
85045a3d26
Merge pull request #44019 from liu-chunmei/crimson-background-recovery
crimson/osd: add delay for background_recovery

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2021-11-23 16:46:16 -08:00