Commit Graph

128030 Commits

Author SHA1 Message Date
Sage Weil
b430fd538f qa/suites/rados/thrash-old-clients: use better-supported cephadm distro/podman
Signed-off-by: Sage Weil <sage@newdream.net>
2021-11-30 10:47:53 -06:00
Neha Ojha
0f9ed11e67
Merge pull request #43999 from kamoltat/wip-autoscale-profile-scale-up-default
pybind/mgr/pg_autoscale: revert to default profile scale-up

Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-11-19 16:55:43 -08:00
Patrick Donnelly
860518bcb6
Merge PR #43974 into master
* refs/pull/43974/head:
	qa: disable metrics on kernel client during upgrade

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2021-11-19 18:43:48 -05:00
Ernesto Puerta
515af762bb
Merge pull request #43987 from rhcs-dashboard/53123-dashboard-nfs-cleanup
mgr/dashboard: NFS non-existent files cleanup

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: ljflores <NOT@FOUND>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
2021-11-19 20:40:41 +01:00
Ernesto Puerta
d67302fcf4
Merge pull request #43983 from rhcs-dashboard/rgw-add-realm-column
mgr/dashboard: rgw daemon list: add realm column

Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: sebastian-philipp <NOT@FOUND>
2021-11-19 20:14:19 +01:00
J. Eric Ivancich
08adeae354
Merge pull request #43824 from cbodley/wip-qa-rgw-upgrade-octopus-multisite-cv
qa/upgrade: rgw multisite upgrade test excludes ceph-volume

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
2021-11-19 14:11:39 -05:00
Deepika Upadhyay
742e6cbd5f
Merge pull request #43764 from gregsfortytwo/wip-rbd-crash-consistency
doc: fix up rbd snapshot docs around crash consistency

Reviewed-by: Mykola Golub <mykola.golub@clyso.com>
Reviewed-by: Sunny Kumar <sunkumar@redhat.com>
2021-11-20 00:29:16 +05:30
Kamoltat
a9f9f7b3fd pybind/mgr/pg_autoscale: revert to default profile scale-up
pg_autoscale module will now start out all the pools
with a scale-up profile by default.

Added tests in workunits/mon/pg_autoscaler.sh
to evaluate if the default pool creation is
a scale-up profile

Updated documentation and release notes to
reflect the change in the default behavior
of the pg_autoscale profile.

Fixes: https://tracker.ceph.com/issues/53309

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2021-11-19 18:55:36 +00:00
Patrick Donnelly
dcda5cb9ce
qa: disable metrics on kernel client during upgrade
v16.2.4 MDS triggers an assert from these messages.

Also: add latest pacific for extra coverage.

Fixes: https://tracker.ceph.com/issues/53293
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2021-11-19 13:32:04 -05:00
Igor Fedotov
32be8d7873
Merge pull request #41557 from ifed01/wip-ifed-better-daemonperf
os/bluestore: improve usability for bluestore/bluefs perf counters

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
Reviewed-by: Laura Flores <lflores@redhat.com>
2021-11-19 18:48:52 +03:00
Venky Shankar
93054a3fa9
Merge pull request #43886 from nmshelke/doc-fix-53054
doc: prerequisites fix for cephFS mount

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2021-11-19 10:10:59 +05:30
Samuel Just
32eb463c94
Merge pull request #44008 from rzarzynski/wip-crimson-leaky-objectcontextregistry
crimson/osd: fix leaks of ObjectContext in the registry.

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Chunmei Liu <chunmei.liu@intel.com>
2021-11-18 17:36:30 -08:00
Neha Ojha
e76a9f4045
Merge pull request #43326 from pdvian/wip-doc-config-correction
doc/dev/config: Replace invalid config debug-pg

Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-11-18 17:03:11 -08:00
Liu-Chunmei
ef9adc6959
Merge pull request #43980 from liu-chunmei/crimson-peerevent-nested
crimson: add delay for peering_event start when nested

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Kefu Chai <tchaikov@gmail.com>
Reviewed-by: Xuehan Xu <xxhdx1985126@gmail.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2021-11-18 15:04:34 -08:00
Radoslaw Zarzynski
9ae3774bac crimson/osd: fix leaks of ObjectContext in the registry.
The patch is supposed to fix the following problems (extra
debugs onboard):

```
INFO  2021-11-16 01:18:38,713 [shard 0] osd - ~OSD: OSD dtor called
INFO  2021-11-16 01:18:38,713 [shard 0] osd - Heartbeat::Peer: osd.6 removed
INFO  2021-11-16 01:18:38,714 [shard 0] osd - Heartbeat::Peer: osd.5 removed
INFO  2021-11-16 01:18:38,714 [shard 0] osd - Heartbeat::Peer: osd.2 removed
INFO  2021-11-16 01:18:38,714 [shard 0] osd - ~ShardServices: ShardServices dtor called
INFO  2021-11-16 01:18:38,714 [shard 0] osd - ~ObjectContextRegistry: ShardServices dtor called; unref_size=3, size=3
INFO  2021-11-16 01:18:38,714 [shard 0] osd - ~ObjectContextRegistry: unreferenced p=0x619000115380
INFO  2021-11-16 01:18:38,714 [shard 0] osd - ~ObjectContextRegistry: unreferenced p=0x619000114980
INFO  2021-11-16 01:18:38,714 [shard 0] osd - ~ObjectContextRegistry: unreferenced p=0x619000112680
INFO  2021-11-16 01:18:38,714 [shard 0] osd - ~ObjectContextRegistry: set p=0x619000114980
INFO  2021-11-16 01:18:38,714 [shard 0] osd - ~ObjectContextRegistry: set p=0x619000115380
INFO  2021-11-16 01:18:38,714 [shard 0] osd - ~ObjectContextRegistry: set p=0x619000112680
INFO  2021-11-16 01:18:38,738 [shard 0] osd - crimson shutdown complete

=================================================================
==33351==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 2808 byte(s) in 3 object(s) allocated from:
    #0 0x7fe10c0327b0 in operator new(unsigned long) (/lib64/libasan.so.5+0xf17b0)
    #1 0x55accbe8ffc4 in ceph::common::intrusive_lru<ceph::common::intrusive_lru_config<hobject_t, crimson::osd::ObjectContext, crimson::osd::obc_to_hoid<crimson::osd::ObjectContext> > >::get_or_create(hobject_t const&) (/usr/bin/ceph-osd+0x3b000fc4)

Objects leaked above:
0x619000112680 (936 bytes)
0x619000114980 (936 bytes)
0x619000115380 (936 bytes)
```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2021-11-18 17:40:52 +00:00
Casey Bodley
e3f0fc7f7f
Merge pull request #43409 from linuxbox2/wip-rgwadmin-logtest
qa/rgw: use local runner with cmdline radosgw_admin.py

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2021-11-18 10:06:56 -05:00
Casey Bodley
483b1b7487
Merge pull request #35100 from soumyakoduri/cloudtiering
rgw/CloudTransition: Transition objects to cloud endpoint

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
2021-11-18 09:27:46 -05:00
Ronen Friedman
df658feae2
Merge pull request #42780 from ronen-fr/wip-ronenf-unique-scrub
osd/scrub: mark PG as being scrubbed, from scrub initiation to Inactive state

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-11-18 16:08:07 +02:00
Sage Weil
1d1cd65cb9 Merge PR #43880 into master
* refs/pull/43880/head:
	mgr/cephadm: turn off asyncssh debug output

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
2021-11-18 08:35:41 -05:00
zdover23
bf1ba269e0
Merge pull request #43093 from zdover23/wip-doc-2021-09-09-rados-bootstrap-options-2
doc/rados: update mon_host & friends options

Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-11-18 22:54:58 +10:00
Sebastian Wagner
fa37ef0f32
Merge pull request #43969 from sebastian-philipp/core
doc/cephadm: core dumps

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>
2021-11-18 10:52:36 +01:00
Soumya Koduri
44317eacf0 rgw/CloudTransition: Replace Coroutines with RGWRestConn APIs
To avoid the overhead of using coroutines during lifecycle transition,
RGWRESTStream* APIs are used to transition objects to remote cloud.

Also handled a few optimizations and cleanups, stated below:
* Store the list of cloud target buckets as part of LCWorker instead
  of making it global. This list is maintained for the duration of
  RGWLC::process(), after which it is discarded.
* Refactor code to remove coroutine based class definitions which are no
  longer needed and use direct function calls instead.
* Check for cloud transitioned objects using tier-type and return error if
  accessed in RGWGetObj, RGWCopyObj and RGWPutObj ops.

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-11-18 12:52:48 +05:30
Soumya Koduri
1e48366c7b rgw/RGWRESTConn: Define a wrapper to send PUT/POST stream request
Similar to "get_resource()", add an API "send_resource()" to send
PUT/POST/DELETE Stream request on RGWRestConn

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-11-18 12:52:48 +05:30
Soumya Koduri
aa362e8899 rgw/CloudTransition: Include aws region name for remote endpoint
With commit#81ad226, aws auth v4 requires region name for remote
endpoint connection. Include the same in the tier parameters.

& misc fixes

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-11-18 12:52:48 +05:30
Soumya Koduri
cf24116690 rgw/CloudTransition: Handle versioned objects
For versioned and locked objects, semantics similar to those of LifecycleExpiration are applied, as stated below -

If the bucket versioning is enabled and the object transitioned to cloud is
 - current version, irrespective of the value of the config option "retain_object", the object is not deleted; instead, a delete marker is created on the source rgw server.
 - noncurrent version, it is deleted or retained based on the config option "retain_object" value.

If the object is locked, and is
 - current version, it is transitioned to cloud, after which it is made noncurrent with a delete marker created.
 - noncurrent version, transition is skipped.
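
The four cases above can be sketched as a small decision function. This is only an illustration: `plan_transition` and the action names are hypothetical, the bucket is assumed to be versioned throughout, and checking the lock before "retain_object" is an assumption rather than something stated in this message.

```python
def plan_transition(is_current, locked, retain_object):
    """Return (do_transition, source_action) for an object in a versioned bucket.

    Hypothetical helper modelling the rules in the commit message; not RGW code.
    """
    if locked:
        # Assumption: the lock rules are evaluated before retain_object.
        if is_current:
            # Transitioned, then made noncurrent via a delete marker.
            return (True, "create_delete_marker")
        # Locked noncurrent version: the transition is skipped.
        return (False, "none")

    if is_current:
        # retain_object is ignored for the current version.
        return (True, "create_delete_marker")

    # Unlocked noncurrent version: retain_object decides.
    return (True, "retain" if retain_object else "delete")
```

For example, `plan_transition(is_current=False, locked=False, retain_object=True)` models a noncurrent version that is transitioned and then kept on the source.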

Also misc rebase fixes and cleanup -

* Rename config option to "retain_head_object"

to reflect its functionality to keep head object post transitioning
to cloud if enabled

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-11-18 12:52:48 +05:30
Soumya Koduri
585684a93f rgw/CloudTransition: Skip transition to cloud if the object is locked
If an object is locked, skip its transition to cloud.

@todo: Do we need special checks for bucket versioning too?
If current, instead of deleting the data, do we need to create
a delete marker? What about the case where retain_object is set to true?

& misc rebase fixes

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-11-18 12:52:48 +05:30
Soumya Koduri
9a2c48a520 rgw/CloudTransition: Change tier-type to cloud-s3
Currently the transition is supported to cloud providers
that are compatible with AWS/S3. Hence change the tier-type to
cloud-s3 to configure the S3 style endpoint details.

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-11-18 12:52:48 +05:30
Soumya Koduri
728f13d8c6 rgw/CloudTransition: handle versioned objects
If the object is versioned, to avoid objects getting overwritten
post transition to cloud, append object versionID to the target
object name

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-11-18 12:52:48 +05:30
Soumya Koduri
076a7e7fee rgw/CloudTransition: Add documentation
Also to avoid object name collisions across various buckets
post cloud transition, add bucket name to the object prefix.

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-11-18 12:52:48 +05:30
Soumya Koduri
a3fdff29ea rgw/CloudTransition: Do not allow data pool for tier type storage classes
Tier type storage classes should not be allowed to have data
pools

& few other fixes/cleanup stated below -

* If the tier_targets are not configured, do not dump them in
the 'zonegroup get' command.

* If not configured, a bucket with the following naming convention
is created by default in the remote cloud endpoint to transition
objects to -
"rgwx-$zonegroup-$storage_class-cloud-bucket"

* Rename config option 'tier_storage_class' to 'target_storage_class'.
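
As a small illustration of the default bucket naming convention mentioned above, the template can be expanded as follows. The helper name `default_target_bucket` and the example values are hypothetical, and the sketch applies no normalisation (such as lowercasing) that the real code may perform.

```python
from string import Template

# Naming convention quoted in the commit message.
DEFAULT_BUCKET_TEMPLATE = Template("rgwx-$zonegroup-$storage_class-cloud-bucket")


def default_target_bucket(zonegroup, storage_class):
    """Expand the default target bucket name (hypothetical helper, not RGW code)."""
    return DEFAULT_BUCKET_TEMPLATE.substitute(
        zonegroup=zonegroup, storage_class=storage_class
    )
```

With the made-up values `zonegroup="default"` and `storage_class="cloudtier"`, this yields `rgwx-default-cloudtier-cloud-bucket`.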

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-11-18 12:52:48 +05:30
Soumya Koduri
b86ba5d655 rgw/CloudTransition: Fail GET on cloud tiered objects
As per https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html
GET operation may fail with “InvalidObjectStateError” error if the
object is in GLACIER or DEEP_ARCHIVE storage class and not restored.
Same can apply for cloud tiered objects. However STAT/HEAD requests
shall return the metadata stored.

& misc fixes

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-11-18 12:52:48 +05:30
Soumya Koduri
557b519881 rgw/CloudTransition: Verify if the object is already tiered
Add class to fetch headers from remote endpoint and verify if the object
is already tiered.

& Few other fixes stated below -

* Erase data in the head of cloud transitioned object
* 'placement rm' command should erase tier_config details
* A new option added in the object manifest to denote if the
  object is tiered in multiparts

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-11-18 12:52:48 +05:30
Soumya Koduri
6333c0e50c rgw/CloudTransition: Store the status of multipart uploads
Store the status of multipart upload parts to verify that the object
hasn't changed during the transition; if it has, abort the upload.

Also avoid re-creating target buckets -

It's not ideal to try creating the target bucket for every object
transition to cloud. To avoid this, bucket creations are cached in
a map, with an expiry period of '2*lc_debug_interval' for each
entry.
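
The caching idea described above could look roughly like the sketch below. The class and method names are hypothetical, the injectable clock exists only to make the example testable, and the entry is recorded optimistically before the caller actually creates the bucket; none of this is taken from the RGW code.

```python
import time


class BucketCreationCache:
    """Remember recently created target buckets (hypothetical sketch)."""

    def __init__(self, lc_debug_interval, clock=time.monotonic):
        # Expiry period of 2 * lc_debug_interval, as in the commit message.
        self._ttl = 2 * lc_debug_interval
        self._clock = clock
        self._expires_at = {}  # bucket name -> expiry timestamp

    def needs_create(self, bucket):
        """Return True if the caller should (re)try creating the bucket."""
        now = self._clock()
        expiry = self._expires_at.get(bucket)
        if expiry is not None and now < expiry:
            # Created recently enough: skip the remote call.
            return False
        # Missing or expired entry: record it and ask the caller to create.
        self._expires_at[bucket] = now + self._ttl
        return True
```

A caller would check `cache.needs_create(name)` before each transition and issue the remote bucket creation only when it returns `True`.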

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-11-18 12:52:47 +05:30
Soumya Koduri
7d7aeb1ee0 rgw/CloudTransition: Delete cloud tiered objects by default
Added a new option "retain_object" in tier_config which determines
whether a cloud tiered object is deleted or if its head object is
retained. By default the value is false, i.e., the objects get
deleted.

XXX: verify that if Object is locked (ATTR_RETENTION), transition is
not processed. Also check if the transition takes place separately for
each version.

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-11-18 12:52:47 +05:30
Soumya Koduri
23b962157a rgw/CloudTransition: Update object metadata and bi post cloud transition
After transitioning the object to cloud, following updates are done
to the existing object.

* In bi entry, change object category to CloudTiered
* Update cloud-tier details (like endpoint, keys etc) in Object Manifest
* Mark the tail objects expired to be deleted by gc

TODO:
* Update all the cloud config details including multiparts
* Check if any other object metadata needs to be changed
* Optimize to avoid using read_op again to read attrs.
* Check for mtime to resolve conflicts when multiple zones try to transition obj

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-11-18 12:52:47 +05:30
Soumya Koduri
c687d01d1b rgw/CloudTransition: Tier objects to remote cloud
If the storage class configured is of cloud, transition
the objects to remote endpoint configured.

If the object size exceeds the multipart size limit (say 5M),
upload the object in multiple parts.

As part of transition, map rgw attributes to http attrs,
including ACLs.

A new attribute (x-amz-meta-source: rgw) is added to denote
that the object is transitioned from RGW source.

Added two new options to tier-config to configure multipart size -
* multipart_sync_threshold - the object size limit above which
the object is transitioned in multiple parts
* multipart_min_part_size - the minimum size of the multipart upload part

The default value for both options is 32M and the minimum value
supported is 5M.
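
To illustrate how the two options interact, the sketch below plans the parts for an object of a given size. `plan_parts` is a hypothetical helper; using `multipart_min_part_size` as the actual part size and clamping both options to the 5M minimum are assumptions, not behaviour confirmed by this message.

```python
MiB = 1024 * 1024
DEFAULT_SIZE = 32 * MiB  # default for both options, per the commit message
MIN_SIZE = 5 * MiB       # minimum supported value, per the commit message


def plan_parts(object_size,
               multipart_sync_threshold=DEFAULT_SIZE,
               multipart_min_part_size=DEFAULT_SIZE):
    """Return a list of (offset, length) pairs for uploading an object.

    Hypothetical helper: the real RGW code may split objects differently.
    """
    # Assumed behaviour: values below the minimum are raised to it.
    threshold = max(multipart_sync_threshold, MIN_SIZE)
    part_size = max(multipart_min_part_size, MIN_SIZE)

    # At or below the threshold, upload in a single request.
    if object_size <= threshold:
        return [(0, object_size)]

    # Otherwise cut fixed-size parts; the last part may be smaller.
    parts = []
    offset = 0
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((offset, length))
        offset += length
    return parts
```

With the defaults, a 70 MiB object would be planned as two 32 MiB parts followed by one 6 MiB part, while a 10 MiB object stays a single upload.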

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-11-18 12:52:47 +05:30
Soumya Koduri
c63d16ff96 rgw/CloudTransition: Add new options to configure tier endpoint
As mentioned in https://docs.google.com/document/d/1IoeITPCF64A5W-UA-9Y3Vp2oSfz3xVQHu31GTu3u3Ug/edit,
the tier storage class will be configured at zonegroup level.

So the existing CLI "radosgw-admin zonegroup placement add  <id> --storage-class <class>" will be
used to add tier storage classes as well but with extra tier-config options mentioned below -

--tier-type : "cloud"
--tier-config : [<key,value>,]

These tier options are already defined for configuring the cloud sync module and are reused here.

TODO:
* Add multipart options (if any , like part size, threshold)
* Document
* Test upgrade/downgrade

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
2021-11-18 12:52:47 +05:30
chunmei-liu
d4ba8ef4a6 crimson: add delay for peering_event start when nested
Delay the second (nested) peerevent::start to let the first finish.
This avoids interruptor nesting, which would cause the local interrupt_cond
to differ from the global interrupt_cond.

Signed-off-by: chunmei-liu <chunmei.liu@intel.com>
2021-11-17 19:28:36 -08:00
Sage Weil
b1fba14ef5 Merge PR #43929 into master
* refs/pull/43929/head:
	qa/suites/orch/cephadm: verify /var/log/ceph/$fsid ownership
	cephadm: only make_log_dir for ceph daemons

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
2021-11-17 17:05:23 -05:00
Sage Weil
91157246d6 Merge PR #43934 into master
* refs/pull/43934/head:
	qa/suites/rados/dashboard: use single-container-host.yaml
	qa/distros: add single-container-host.yaml
	qa/suites: use distros/container-hosts/
	qa/distros/container-hosts: add 8.stream + crun
	qa/distros/container-hosts: add collection of container targets

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
2021-11-17 17:04:57 -05:00
Casey Bodley
06b266b83c qa/upgrade: rgw multisite upgrade test excludes ceph-volume
E: Unable to locate package ceph-volume

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2021-11-17 14:37:49 -05:00
Casey Bodley
8cc3f6056d
Merge pull request #43433 from soumyakoduri/wip-skoduri-dbstore-lc
rgw/dbstore: handle lc related state

Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
2021-11-17 12:39:19 -05:00
Neha Ojha
ff9687ddb5
Merge pull request #43921 from aclamk/wip-aclamk-fix-omap-upgrade-fix
BlueStore: Omap upgrade to per-pg fix fix

Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>
2021-11-17 09:35:19 -08:00
Matt Benjamin
0046803534 qa/rgw: use local runner with cmdline radosgw_admin.py
Restore ability to run radosgw_admin.py unit standalone--improved
to use vstart_runner hooks.

Local rgwadmin(...) wrapper suggested as a cleanup in review by Casey.

Fixes: https://tracker.ceph.com/issues/52837

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
2021-11-17 11:22:32 -05:00
Adam Kupczyk
65a3f374aa os/bluestore: Fix omap upgrade to per-pg scheme
This is a fix for a regression introduced by the fix to omap upgrade: https://github.com/ceph/ceph/pull/43687
The problem was that we always skipped the first omap entry.
This worked fine for objects that have an omap header key.
For objects without a header key, we skipped the first actual omap key.
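
The regression can be pictured with a toy model. The `HEADER` marker, the list-of-keys representation, and both function names are hypothetical and only mirror the description above; they do not reflect BlueStore's on-disk key layout.

```python
HEADER = "-"  # hypothetical marker standing in for the omap header key


def keys_to_convert_buggy(entries):
    """Always drop the first entry, even when it is a real omap key."""
    return list(entries[1:])


def keys_to_convert_fixed(entries):
    """Drop the first entry only when it is the header."""
    if entries and entries[0] == HEADER:
        return list(entries[1:])
    return list(entries)
```

For an object without a header, such as `["a", "b"]`, the buggy variant loses `"a"` while the fixed variant keeps both keys.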

Fixes: https://tracker.ceph.com/issues/53260

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
2021-11-17 16:56:23 +01:00
Adam Kupczyk
0d13d64e05 os/bluestore: Add more legacy -> per PG upgrade tests
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
2021-11-17 16:56:23 +01:00
Sebastian Wagner
223af588a9
doc/cephadm: core dumps
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
2021-11-17 16:46:09 +01:00
Samuel Just
7c3ddfc59f
Merge pull request #43977 from xxhdx1985126/wip-53273
crimson/os/seastore/lba_manager: do full merge if the donor node is *AT* its minimum capacity

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Chunmei Liu <chunmei.liu@intel.com>
2021-11-17 07:34:47 -08:00
Sage Weil
411b2d39c2 qa/suites/rados/dashboard: use single-container-host.yaml
Signed-off-by: Sage Weil <sage@newdream.net>
2021-11-17 09:02:42 -06:00
Sage Weil
362207ba45 qa/distros: add single-container-host.yaml
This is a single, possibly preferred, os + container runtime
combination.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-11-17 09:02:42 -06:00