Commit Graph

119016 Commits

Author SHA1 Message Date
Adam Kupczyk
882714e0c9 tools/bluestore: Add command 'show-sharding' to ceph-bluestore-tool
Add command 'show-sharding' to ceph-bluestore-tool.

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
2021-01-19 15:07:16 +01:00
Adam Kupczyk
d93b5406af kv/RocksDBStore: Fix ceph-bluestore-tool reshard command
Fix ceph-bluestore-tool reshard command.
Add conditions that allow resuming a reshard that was previously terminated.

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
2021-01-19 12:32:16 +01:00
Sebastian Wagner
9923d2a119 mgr/orchestrator: ignore Liskov substitution principle violation
Right now, we're violating the Liskov substitution principle: we derive from `Orchestrator`, but the overriding `process` takes a subclass of `Completion`.
See https://mypy.readthedocs.io/en/stable/common_issues.html#incompatible-overrides

The idea is to make Orchestrator a type constructor with `CompletionT` as argument, but this is not supported by mypy: https://github.com/python/typing/issues/548
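
The conflict described above can be reproduced in a few lines. This is a minimal sketch with simplified stand-in classes: `BackendCompletion` and `BackendOrchestrator` are hypothetical names, and the signatures are not the real mgr/orchestrator ones. It assumes the error is silenced with a per-line `# type: ignore[override]`; the commit itself may silence it another way.

```python
from typing import List


class Completion:
    """Simplified stand-in for the orchestrator's completion type."""

    def result(self) -> str:
        return "base"


class BackendCompletion(Completion):
    """Hypothetical backend-specific subclass of Completion."""

    def result(self) -> str:
        return "backend"


class Orchestrator:
    def process(self, completions: List[Completion]) -> List[str]:
        return [c.result() for c in completions]


class BackendOrchestrator(Orchestrator):
    # Narrowing the parameter from Completion to BackendCompletion is an
    # incompatible override for mypy: a caller holding an Orchestrator may
    # legally pass plain Completion objects. The comment silences the report.
    def process(self, completions: List[BackendCompletion]) -> List[str]:  # type: ignore[override]
        return [c.result() for c in completions]
```

At runtime both classes behave normally; only the static check is affected.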

Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
2021-01-19 11:33:14 +01:00
Sebastian Wagner
c95ba878c6 mgr/orchestrator: disallow_untyped_defs = True
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
2021-01-19 11:33:14 +01:00
Venky Shankar
3478b2a062 test: cephfs-mirror teuthology task and test yamls
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2021-01-19 01:08:10 -05:00
Venky Shankar
f81e8f1e88 test: add tests for mirroring module w/ daemon verification
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2021-01-19 01:08:10 -05:00
Venky Shankar
8334bea5a6 test: optionally create a backup filesystem on startup
Also filter out client IDs starting with "mirror" when
cleaning leftover auth-ids since teuthology would be
configured to create client.mirror and client.mirror_remote
clients before executing mirroring tests.
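
The filtering described above could look roughly like the sketch below. The function name `leftover_client_ids` and the `kind.id` entity layout are assumptions made for illustration; this is not the actual teuthology or qa code.

```python
from typing import Iterable, List


def leftover_client_ids(auth_entities: Iterable[str],
                        protected_prefix: str = "mirror") -> List[str]:
    """Return client IDs that are safe to clean up.

    Entities are assumed to look like "client.<id>". IDs starting with
    the protected prefix (client.mirror, client.mirror_remote) are
    skipped so the mirroring tests keep their pre-created clients.
    """
    ids = []
    for entity in auth_entities:
        kind, _, client_id = entity.partition(".")
        if kind != "client":
            continue  # not a client entity, leave it alone
        if client_id.startswith(protected_prefix):
            continue  # pre-created for the mirroring tests
        ids.append(client_id)
    return ids
```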

Signed-off-by: Venky Shankar <vshankar@redhat.com>
2021-01-19 01:08:10 -05:00
Venky Shankar
b7acf7fc77 pybind/mgr/mirroring: interface to mirror CephFS directory snapshots
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2021-01-19 01:06:43 -05:00
Venky Shankar
296e879009 pybind/mgr/mgr_util: make RTimer class reusable
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2021-01-19 01:06:43 -05:00
Brad Hubbard
45f36bbfc6 osd: initialise m_interval_start
Fixes: https://tracker.ceph.com/issues/48918

Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
2021-01-19 15:40:07 +10:00
Kefu Chai
3dd11ec576
Merge pull request #37925 from liu-chunmei/seastore_omap_tree
crimson/seastore: add omap tree implementation

Reviewed-by: Samuel Just <sjust@redhat.com>
2021-01-19 12:23:18 +08:00
Kefu Chai
c037f4cb5d mgr: update mon metadata when monmap is updated
there is a chance that one or more monitors are updated / upgraded in a
single monmap update without first being removed from the cluster state's
metadata. without this change, we do not update the metadata associated
with those monitors, so the mgr modules which consume the metadata are not
updated accordingly and keep reporting stale information.

in this change, we always update the metadata associated with all monitors
included in the latest monmap. multiple "mon metadata" commands are sent
to the monitors to retrieve their updated metadata, instead of sending a
single one, so that we can reuse "MetadataUpdate" to update the metadata
of a given daemon. as the number of monitors in a typical cluster is
relatively small, and the frequency of monmap updates is low, this
overhead should be fine.

unlike other places where we ask mon for metadata in Mgr class, the code
sending the mon command for updated monitor metadata is located outside of
the `cluster_state.with_monmap()` block. the reason is that `with_monmap()`
is guarded by the monc_lock under the hood, while `start_mon_command()`
also needs to acquire the monc_lock, which is not a recursive lock. so we
have to do this outside of the `with_monmap()` block.
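
The locking constraint can be illustrated with a small model. This is a hedged sketch, not the mgr code: `ClusterState` here is a toy class, the lock is a plain `threading.Lock`, and the command dictionary layout is an assumption. A short acquire timeout stands in for the deadlock so the example terminates.

```python
import threading


class ClusterState:
    """Toy model: both methods need the same non-recursive lock."""

    def __init__(self, mon_names):
        self._monc_lock = threading.Lock()  # not re-entrant
        self._mon_names = list(mon_names)
        self.sent = []

    def with_monmap(self, fn):
        # The callback runs while the lock is held.
        with self._monc_lock:
            return fn(self._mon_names)

    def start_mon_command(self, cmd):
        # Also takes the lock. With a real blocking acquire this would
        # hang forever if called from inside with_monmap().
        if not self._monc_lock.acquire(timeout=0.1):
            raise RuntimeError("monc_lock already held: would deadlock")
        try:
            self.sent.append(cmd)
        finally:
            self._monc_lock.release()


def refresh_mon_metadata(state):
    # Step 1: copy the monitor names while holding the lock.
    names = state.with_monmap(lambda mons: list(mons))
    # Step 2: send one command per monitor after the lock is released.
    for name in names:
        state.start_mon_command({"prefix": "mon metadata", "id": name})
```

Collecting the names first and sending the commands afterwards keeps the two lock acquisitions from nesting.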

Fixes: https://tracker.ceph.com/issues/48905
Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-01-19 12:21:38 +08:00
Casey Bodley
9e7a41f483
Merge pull request #38890 from cbodley/wip-qa-rgw-https-clients
qa/rgw: don't add a certificate for nonexistent rgw.client.1

Reviewed-by: Ali Maredia <amaredia@redhat.com>
2021-01-18 15:02:10 -05:00
Yuri Weinstein
0c06c87d8d qa/tests: added pacific, changed octopus number of runs
Signed-off-by: Yuri Weinstein <yweinste@redhat.com>
2021-01-18 10:47:55 -08:00
Jason Dillaman
1c2783a975
Merge pull request #38898 from pkulijiawei/bug-48866
librbd: remove the first if at api::group::list

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2021-01-18 13:01:03 -05:00
Sebastian Wagner
e0a76dd0a6
Merge pull request #38904 from sebastian-philipp/cephadm-sysctl-silent
cephadm: Don't make sysctl spam the log file

Reviewed-by: Michael Fritch <mfritch@suse.com>
2021-01-18 17:43:44 +01:00
Sebastian Wagner
2e225fa83a
Merge pull request #38910 from batrick/pr38568-fix
cephadm: fix rgw osd cap tag

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2021-01-18 17:42:13 +01:00
Sebastian Wagner
7745ccfc10
Merge pull request #38918 from adk3798/ha-rgw-fix
mgr/cephadm: fix ha-rgw removal

Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com>
2021-01-18 17:37:45 +01:00
Sridhar Seshasayee
922d93f449 osd: Remove override for osd_async_recovery_min_cost for mclock profiles
Overriding osd_async_recovery_min_cost as part of enabling a built-in
mclock profile has the undesirable side effect of peers not choosing
the correct async recovery targets if osds are using mixed schedulers
(this could happen during upgrades or if "debug_random" is set for
osd_op_queue config option). Due to the above, osds get into a
"choose_acting" loop during peering.

The solution is to remove the override of osd_async_recovery_min_cost.

Fixes: https://tracker.ceph.com/issues/48906
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-01-18 19:07:11 +05:30
Sebastian Wagner
6f7e01ee89 cephadm: Fix node-exporter deployment.
Fixes: 2ce828d5f3682d3eee61e4a4a07a9eedb6a3d04e
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
2021-01-18 12:33:36 +01:00
Sebastian Wagner
88c6c34e2b qa/cephadm: Add yaml output to smoke test
this will provide a more detailed output, like

```yaml
...snip...
service_type: node-exporter
service_name: node-exporter
placement:
  host_pattern: '*'
status:
  created: '2021-01-18T11:21:56.024810Z'
  last_refresh: '2021-01-18T11:23:24.477672Z'
  running: 0
  size: 1
events:
- "2021-01-18T11:23:09.602644Z service:node-exporter [ERROR] \"Failed while placing\
  \ node-exporter.ubuntuon ubuntu: cephadm exited with an error code: 1, stderr:Deploy\
  \ daemon node-exporter.ubuntu ...\nVerifying port 9100 ...\nTraceback (most recent\
  \ call last):\n  File \"<stdin>\", line 7274, in <module>\n  File \"<stdin>\", line\
  \ 1563, in _default_image\n  File \"<stdin>\", line 3698, in command_deploy\n  File\
  \ \"<stdin>\", line 2338, in deploy_daemon\n  File \"<stdin>\", line 1961, in create_daemon_dirs\n\
  AssertionError\""
...snip...
```

Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
2021-01-18 12:27:14 +01:00
Sebastian Wagner
223f5b4036
Merge pull request #38507 from sebastian-philipp/mypy-mgr_util
pybind/mgr: disallow_untyped_defs=True for mgr_util

Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Reviewed-by: Patrick Seidensal <pseidensal@suse.com>
2021-01-18 12:00:08 +01:00
Soumya Koduri
e54e68ad3c rgw/lc: Fix use-after-free in RGWLC::process
Fixed use-after-free issue with 'rgw::sal::LCSerializer lock'
in RGWLC::process.

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-01-18 14:00:25 +05:30
Neha Ojha
0e7a7be5c0
Merge pull request #38920 from sunnyku/wip-ceph-mclock-op
osd: handle ceph specific config changes for the mclock scheduler

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-01-16 18:24:13 -08:00
Neha Ojha
c823730121
Merge pull request #38675 from sseshasa/wip-dmclock-config-sets
osd: Add mclock config sets that implement built-in and custom profiles

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Sunny Kumar <sunkumar@redhat.com>
2021-01-16 18:23:25 -08:00
Jason Dillaman
e4ba941d24
Merge pull request #38933 from wjwithagen/wjw-fix-QCOW
librbd: remove inclusion of endian.h

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2021-01-16 17:04:46 -05:00
Sridhar Seshasayee
b5b55d4900 osd: Set recovery specific Ceph options for mclock profiles to work.
Set and disable relevant recovery specific Ceph options for mclock
profiles to work as expected. Broadly,
 - Set low value for recovery min cost
 - High values for max active recoveries and max backfills
 - Disable recovery sleep{_hdd, _ssd, _hybrid}

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-01-17 00:39:40 +05:30
Sridhar Seshasayee
96b2066e8a osd: Handle configuration changes to mclock config options
Handle configuration changes to the mclock cost per io, the max
capacity options and the mclock profile. Handle the case where the
profile is changed from a built-in profile to the custom profile.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-01-17 00:39:40 +05:30
Sridhar Seshasayee
5f390d088b osd: Add mclock profile infrastructure and implement mclock profiles
Define config options to specify the cost per io for an osd (hdd & ssd).
 - osd_mclock_cost_per_io_msec
 - osd_mclock_cost_per_io_msec_hdd
 - osd_mclock_cost_per_io_msec_ssd

Define config options to set max osd capacity (hdd & ssd) to be allocated
between clients of dmclock namely,
 - osd_mclock_max_capacity_iops
 - osd_mclock_max_capacity_iops_hdd
 - osd_mclock_max_capacity_iops_ssd

Define config option "osd_mclock_profile" to specify the built-in profile
to enable.

Also, set the number of op shards being used in the osd within the mclock
scheduler as well. This is necessary to calculate the per shard limits
within the mclock scheduler.

With the above information, enable the specified mclock profile by
calling the appropriate method to set the profile specific mclock
parameters and Ceph options.

Prior to enqueuing an op, the scheduler performs a calculation to scale
up or down the cost associated with the OpSchedulerItem. This calculation
is done based on the existing item cost, the max osd capacity provided
and an additional cost factor based on underlying device type (hdd/ssd).

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-01-17 00:39:40 +05:30
Sunny Kumar
e0f083711e osd: handle ceph specific config changes for the mclock scheduler
The following Ceph parameters are set automatically
when the mclock scheduler is enabled with a built-in profile.
In this case the user cannot modify these
Ceph-specific config options at runtime.

  a. osd_async_recovery_min_cost
  b. osd_recovery_max_active{_hdd,_ssd}
  c. osd_max_backfills
  d. osd_recovery_sleep{_hdd,_ssd,_hybrid}

If the custom profile is enabled for the mclock scheduler,
the user can modify these parameters.
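
The rule above can be expressed as a small guard. This is an illustrative sketch only: the function `can_modify`, the literal profile name `"custom"`, and the placeholder `"example_builtin"` are assumptions, and the OSD implements this in C++ rather than as shown here.

```python
# Options named in the commit message, with the {_hdd,_ssd,_hybrid}
# suffixes written out.
LOCKED_OPTIONS = frozenset({
    "osd_async_recovery_min_cost",
    "osd_recovery_max_active_hdd",
    "osd_recovery_max_active_ssd",
    "osd_max_backfills",
    "osd_recovery_sleep_hdd",
    "osd_recovery_sleep_ssd",
    "osd_recovery_sleep_hybrid",
})

CUSTOM_PROFILE = "custom"  # assumed literal name of the custom profile


def can_modify(option: str, active_profile: str) -> bool:
    """Return True if a runtime change to `option` should be honoured."""
    if option not in LOCKED_OPTIONS:
        return True  # unrelated options are never restricted here
    return active_profile == CUSTOM_PROFILE


# "example_builtin" is a placeholder for any built-in profile name.
print(can_modify("osd_max_backfills", "example_builtin"))
print(can_modify("osd_max_backfills", CUSTOM_PROFILE))
```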

Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
2021-01-16 19:03:08 +00:00
Patrick Donnelly
e674f752b5
Merge PR #37297 into master
* refs/pull/37297/head:
	qa: add new client tests
	test: add client tests
	client: wire up alternate_name
	mds: fix alternate_name durability
	mds: add alternate_name feature support for dentries
	mds: add static encode/decode helpers for remote in CDentry
	mds: add "fscrypt" flag support for inode_t
	mds: add new CDentry tags support
	include: cleanup filepath cons
	mds: add comment on linkage durability
	client: constify some RWRef methods
	common: add strescape helper

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2021-01-16 10:40:51 -08:00
Igor Fedotov
ac73df7158 os/bluestore: fix deferred_queue locking
https://github.com/ceph/ceph/pull/30027 introduced a gap in osr
protection (in _deferred_queue()) which could cause an improper deferred_pending value while
processing osr from _deferred_aio_finish().
As a result, either a segmentation fault in _deferred_aio_finish() or a deadlock could occur.

Fixes: https://tracker.ceph.com/issues/48776

Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2021-01-16 19:54:52 +03:00
Willem Jan Withagen
9693080acd librbd: remove inclusion of endian.h
The file is not available on FreeBSD.

Fixes: https://github.com/ceph/ceph/pull/38862
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
2021-01-16 14:54:52 +01:00
Kefu Chai
9280858e9f
Merge pull request #38880 from xxhdx1985126/wip-crimson-bug-fix
crimson/osd: make PGRecovery::start_primary_recovery_ops take objs recovered by UrgentRecovery into account

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-01-16 14:57:47 +08:00
Mykola Golub
86576b0997 osd: fix potential null pointer dereference when sending ping
Fixes: https://tracker.ceph.com/issues/48821
Signed-off-by: Mykola Golub <mgolub@suse.com>
2021-01-16 05:00:09 +00:00
Kefu Chai
26d6648ffb
Merge pull request #38245 from batrick/ceph-status-pg-state-sort
osd: use more efficient CachedStackStringStream

Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-01-16 12:20:15 +08:00
Neha Ojha
3920085dac
Merge pull request #37391 from aclamk/wip-kv-onode-cache
Add ability to control rocksdb block cache

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-01-15 18:36:47 -08:00
Neha Ojha
3ab4807281
Merge pull request #38925 from thomasgoirand/master
common/ipaddr: Allow binding on lo

Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-01-15 17:40:55 -08:00
Patrick Donnelly
a6891a0c8a
qa: add new client tests
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-01-15 17:30:40 -08:00
Patrick Donnelly
f77e6aa5c8
test: add client tests
For alternate_name, so far.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-01-15 17:30:40 -08:00
Patrick Donnelly
e65b8dcad1
client: wire up alternate_name
Here we're exposing a public Client::walk (aka path_walk) so that the
user can inspect dentries (not something normally possible in POSIX).
We're going to skip exposing such an interface in libcephfs since
there's no reason to do that (who would use it?) except for testing.
Instead, a follow-up PR will add Client tests (for the first time, yay!)
that will exercise this code.

Ideally we'd also expose alternate_name via readdir results, but
that is a bit more complicated since dirents do not normally refer to
external memory. So, just rely on Client::walk for testing for now.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-01-15 17:30:40 -08:00
Patrick Donnelly
b91490d353
mds: fix alternate_name durability
This is a collection of fixes to Xiubo's prior work. Namely:

- Add new mds_alternate_name_max option to limit the size of
  alternate_name. Otherwise a Client could trick the MDS into creating
  an alternate_name of any size!

- Clean up how alternate_name is assigned to CDentry. In the general
  case, this should be assigned as part of creating the dentry. We want
  this value to be immutable for the life of the dentry, even in the
  very special case of rename(2) where the destination dentry already
  exists. There we explicitly check (after discussion with Jeff) that the
  target dentry alternate_name already matches what the rename RPC is
  giving.

- The MDS is now properly journaling the alternate_name.

- The MDS rejoin phase is properly transmitting each dentry's
  alternate_name. I've discovered that this MMDSCacheRejoin message
  actually wasn't versioned, which I've raised in a tracker [1]. In the
  meantime, we'll just bump CEPH_MDS_PROTOCOL as usual.

[1] https://tracker.ceph.com/issues/48886
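
The first two fixes amount to two checks, sketched below. All names here (`AlternateNameError`, `validate_new_alternate_name`, `validate_rename_target`) and the example limit are hypothetical; the MDS performs these checks in C++, and the real default of mds_alternate_name_max is not stated in this message.

```python
class AlternateNameError(ValueError):
    """Raised when a request carries an unacceptable alternate_name."""


def validate_new_alternate_name(alternate_name: bytes, max_len: int) -> bytes:
    # Size limit: without it a client could make the MDS store a blob of
    # arbitrary size on the dentry.
    if len(alternate_name) > max_len:
        raise AlternateNameError(
            f"alternate_name is {len(alternate_name)} bytes, limit is {max_len}")
    return alternate_name


def validate_rename_target(existing: bytes, requested: bytes) -> None:
    # Immutability: if the destination dentry already exists, the rename
    # must carry the same alternate_name the dentry already has.
    if existing != requested:
        raise AlternateNameError("rename would change an existing alternate_name")


EXAMPLE_LIMIT = 16  # illustrative value only
validate_new_alternate_name(b"short-name", EXAMPLE_LIMIT)
validate_rename_target(b"same", b"same")
```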

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-01-15 17:30:40 -08:00
Xiubo Li
9e250226a8
mds: add alternate_name feature support for dentries
This adds "alternate_name" support for filenames: the MDS will save
an alternate name for each dentry. This alternate name is not used in
path lookup or for any other usual file system purpose. The name is
simply an added blob of metadata on the dentry that is distributed to
clients so that "long" file names may be supported for clients which
require them. In the case of an fscrypt-enhanced kernel mount driver,
the long name may be the ciphertext (exceeding FILENAME_MAX) of a long
file name.

Because this affects only files with long file names, the use of this
feature should be rare but could be common for some unusual
applications.

The client mount should first check the CEPHFS_FEATURE_ALTERNATE_NAME
feature bit to determine whether the MDS supports this feature.
The alternate_name is transmitted as part of the message payload in
MClientRequest when setting the alternate_name. The LeaseStat structure
in MClientReply contains the alternate_name.

When executing a metadata mutation RPC, the client will set the
alternate_name (if it exists) as part of the operation. The MDS will
pick that up and set it on the new or mutated dentry.
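
The client-side flow described above might be sketched as follows. The bit position assigned to `CEPHFS_FEATURE_ALTERNATE_NAME`, the dictionary request format, and the choice to silently omit the field when the MDS lacks support are all assumptions for illustration; the real client encodes MClientRequest messages in C++.

```python
from typing import Optional

# Hypothetical bit position; the real value is defined by Ceph.
CEPHFS_FEATURE_ALTERNATE_NAME = 1 << 0


def build_request(op: str, path: str, alternate_name: Optional[bytes],
                  mds_features: int) -> dict:
    """Build a toy metadata-mutation request.

    The alternate_name is attached only if one exists and the MDS
    advertises support for it.
    """
    request = {"op": op, "path": path}
    supported = bool(mds_features & CEPHFS_FEATURE_ALTERNATE_NAME)
    if alternate_name is not None and supported:
        request["alternate_name"] = alternate_name
    return request


print(build_request("create", "/dir/file", b"long-name-blob",
                    CEPHFS_FEATURE_ALTERNATE_NAME))
```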

Fixes: https://tracker.ceph.com/issues/47162
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-01-15 17:30:39 -08:00
Xiubo Li
bd0b90ffbf
mds: add static encode/decode helpers for remote in CDentry
This unifies the same work that was duplicated in different places.

Fixes: https://tracker.ceph.com/issues/47162
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2021-01-15 17:30:39 -08:00
Xiubo Li
7fe1c57846
mds: add "fscrypt" flag support for inode_t
If the client has set the "encryption.ctx" attribute, the MDS side
will set the "fscrypt" flag to true; it defaults to false.

Clients can use this flag to tell whether the current
inodes are encrypted or not.

Fixes: https://tracker.ceph.com/issues/47162
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2021-01-15 17:30:39 -08:00
Xiubo Li
39f3440e36
mds: add new CDentry tags support
This adds a new tag 'i' for inode and a new tag 'l' for remote link
to the CDentry. At the same time it adds a proper dentry version,
which will make it easier to add new features/members to the CDentry
in the future.

Fixes: https://tracker.ceph.com/issues/47162
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2021-01-15 17:30:39 -08:00
Patrick Donnelly
c73d012b5a
include: cleanup filepath cons
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-01-15 17:30:39 -08:00
Patrick Donnelly
805069fbe9
mds: add comment on linkage durability
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-01-15 17:30:39 -08:00
Patrick Donnelly
28b1c539c4
client: constify some RWRef methods
And make some Client state checks const.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-01-15 17:30:39 -08:00
Patrick Donnelly
093b0505e6
common: add strescape helper
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-01-15 17:30:38 -08:00