Commit Graph

129049 Commits

Author SHA1 Message Date
Yingxin Cheng
3165692e43 crimson/os/seastore: classify journal related logs in seastore_types.cc
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2022-01-14 23:06:43 +08:00
Yingxin Cheng
81f6e4e82e crimson/os/seastore: convert ExtentReader to seastore logging
Also set the logger to seastore_journal as the component works at
the journal layer.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2022-01-14 23:06:33 +08:00
Yingxin Cheng
420071be6b crimson/os/seastore/journal: convert to seastore logging
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2022-01-14 22:24:38 +08:00
Ernesto Puerta
4def95b0d4
Merge pull request #44507 from votdev/issue_53813_nfs_page_not_found
mgr/dashboard: NFS pages show 'Page not found'

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
2022-01-14 12:56:55 +01:00
Ernesto Puerta
f7cd4b2873
Merge pull request #43685 from p-se/fix-grafana-graphs-ceph_daemon
mgr/dashboard: fix Grafana OSD/host panels

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: p-se <NOT@FOUND>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
2022-01-14 12:50:13 +01:00
Ernesto Puerta
c208dbeb13
Merge pull request #44573 from rhcs-dashboard/53858-fix-smart-data-single-daemon
mgr/dashboard: fix: get SMART data from single-daemon device

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
2022-01-14 12:48:52 +01:00
Ilya Dryomov
d665f88ad0
Merge pull request #44559 from ideepika/wip-iscsi-53830
test/rbd/iscsi: correct the hostname in gwcli_create.t to match hostname -f

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-14 10:30:27 +01:00
Ilya Dryomov
3c2b05a252
Merge pull request #44571 from idryomov/wip-xfstests-qemu-cert
qa/run_xfstests_qemu.sh: stop reporting success without actually running any tests

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
2022-01-14 10:28:06 +01:00
Venky Shankar
e65d88ca58
Merge pull request #44570 from vshankar/wip-53857
qa: adjust for MDSs to get deployed before verifying their availability

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2022-01-14 08:42:20 +05:30
Samuel Just
2389ebd7d4
Merge pull request #44555 from cyx1231st/wip-fix-seastore-jounral-fast-submit
crimson/os/seastore/journal: fast submit if RecordSubmitter is IDLE and no pending

Reviewed-by: Chunmei Liu <chunmei.liu@intel.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
2022-01-13 17:23:37 -08:00
Adam King
07ff5bcc55
Merge pull request #44583 from mgfritch/fixup-44306-docker-count
cephadm: increase number of docker.io occurrences

Reviewed-by: Adam King <adking@redhat.com>
2022-01-13 19:18:35 -05:00
Michael Fritch
b0b5214b8f
cephadm: increase number of docker.io occurrences
fixup for 0fe2e54db7

Signed-off-by: Michael Fritch <mfritch@suse.com>
2022-01-13 15:52:09 -07:00
Josh Durgin
bd05cede10
Merge pull request #44554 from jdurgin/wip-rbd-qos-docs
doc/rbd: clarify and add more detail to librbd QoS docs

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-13 12:02:03 -08:00
Casey Bodley
f65b59be85
Merge pull request #40802 from galsalomon66/wip-s3select-parquet-object-processing-2
RGW/s3select : parquet implementation:

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
2022-01-13 12:53:33 -05:00
Deepika Upadhyay
8416173a7b test/rbd/iscsi: correct the HOST name provided.
The mismatch between hostname -f and the hostname generated from gwcli_create
gave rise to the error:

The first gateway defined must be the local machine

Fixes: https://tracker.ceph.com/issues/53830

Signed-off-by: Deepika Upadhyay <dupadhya@redhat.com>
2022-01-13 22:49:28 +05:30
Kefu Chai
40717bf5a9
Merge pull request #44577 from clementperon/master
cmake: Fix Finddpdk cmake module

Reviewed-by: Kefu Chai <tchaikov@gmail.com>
2022-01-14 01:15:07 +08:00
Adam King
528f695b2f
Merge pull request #44498 from phlogistonjohn/jjm-root-check-later
cephadm: check if cephadm is root after cli is parsed 

Reviewed-by: Adam King <adking@redhat.com>
2022-01-13 12:10:13 -05:00
Adam King
b064b1fb4f
Merge pull request #44394 from melissa-kun-li/enable-autotune
Enable autotune for osd_memory_target on bootstrap

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
2022-01-13 12:06:46 -05:00
Adam King
f3ca30449d
Merge pull request #44306 from sebastian-philipp/normalize_image_digest-ambiguity
cephadm: deal with ambiguity within normalize_image_digest

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2022-01-13 12:03:50 -05:00
Josh Durgin
e1548ef36a doc/rbd/rbd-config-ref: add more detail on QoS settings
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2022-01-13 11:37:46 -05:00
gal salomon
2c8d1e6e18 handling arm64 (arrow installation)
Signed-off-by: gal salomon <gal.salomon@gmail.com>
2022-01-13 17:47:23 +02:00
Venky Shankar
8a293c21f2
Merge pull request #44427 from lxbsz/client_cleanup
client: remove useless Lx cap check

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2022-01-13 20:34:54 +05:30
Venky Shankar
ca12900ce1
Merge pull request #44451 from lxbsz/wip-53750
mds: directly return just after responding to the link request

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2022-01-13 20:33:58 +05:30
Casey Bodley
bc81cd1226
Merge pull request #44561 from cbodley/wip-51727
qa/rgw: add PG_DEGRADED cluster warnings to log-ignorelist

Reviewed-by: Yuval Lifshitz <ylifshit@redhat.com>
Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
2022-01-13 09:38:49 -05:00
Alfonso Martínez
6cd3729e27 mgr/dashboard: fix: get SMART data from single-daemon device
Return SMART data even when a device is only associated with a single daemon.

Fixes: https://tracker.ceph.com/issues/53858
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
2022-01-13 15:20:48 +01:00
Daniel Gryniewicz
65900a9df6
Merge pull request #44538 from dang/wip-dang-zipper-perf
RGW Zipper - don't load stats for every bucket load

Reviewed-by: Mark Nelson <mnelson@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
2022-01-13 09:09:33 -05:00
Laura Flores
9ebf397972
Merge pull request #44002 from JoshSalomon/wip-primary-balancer
2022-01-13 07:45:54 -06:00
Clément Péron
96a8b4d846 cmake: dpdk: only append common dir if it has been found
Signed-off-by: Clément Péron <peron.clem@gmail.com>
2022-01-13 14:40:17 +01:00
Clément Péron
c37f15f54d cmake: dpdk: use STREQUAL and not EQUAL when comparing strings
Signed-off-by: Clément Péron <peron.clem@gmail.com>
2022-01-13 14:32:34 +01:00
Clément Péron
a24a4a0563 cmake: dpdk: fix typo in HINTS when looking for DPDK
Signed-off-by: Clément Péron <peron.clem@gmail.com>
2022-01-13 14:32:30 +01:00
Venky Shankar
8939d8c14b qa: adjust for MDSs to get deployed before verifying their availability
The check runs while some MDSs have only *just* been deployed by cephadm,
causing jobs to fail with:

     Command failed on smithi016 with status 1: 'sudo /home/ubuntu/cephtest/cephadm \
     --image docker.io/ceph/ceph:v16.2.4 shell -c /etc/ceph/ceph.conf -k \
     /etc/ceph/ceph.client.admin.keyring --fsid 403bfcae-706b-11ec-8c32-001a4aab830c \
     -- bash -c \'ceph --format=json mds versions | jq -e ". | add == 4"\''

Fixes: http://tracker.ceph.com/issues/53857
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2022-01-13 18:25:58 +05:30
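The failure comes from checking the MDS count once, before cephadm has finished deploying the daemons. The jq filter ". | add == 4" sums the per-version daemon counts in the JSON object printed by `ceph mds versions`. A minimal Python sketch of a poll-until-ready check follows; `get_versions` is a hypothetical callable returning that parsed JSON, and the helper names, timeout and interval are illustrative assumptions, not what the qa task actually uses.

```python
import time
from typing import Callable, Dict


def mds_count(versions: Dict[str, int]) -> int:
    # Equivalent of jq's `. | add` on this object: sum the daemon counts
    # across all reported version strings.
    return sum(versions.values())


def wait_for_mds(get_versions: Callable[[], Dict[str, int]],
                 expected: int,
                 timeout: float = 300.0,
                 interval: float = 5.0) -> bool:
    """Poll until `expected` MDS daemons report a version.

    Returns True once the count matches, False if the timeout expires first.
    get_versions is a hypothetical wrapper around
    `ceph --format=json mds versions`.
    """
    deadline = time.monotonic() + timeout
    while True:
        if mds_count(get_versions()) == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Polling with a deadline tolerates daemons that are still starting, while still failing the job if they never all come up.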
Xiubo Li
14f9840dbf mds: directly return just after responding to the link request
Fixes: https://tracker.ceph.com/issues/53750
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2022-01-13 20:55:49 +08:00
Venky Shankar
6028ffbad8
Merge pull request #43286 from lxbsz/improve_setattr
client: buffer the truncate if we have the Fx caps

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2022-01-13 18:23:27 +05:30
Xiubo Li
422ac142de client: remove useless Lx cap check
By the time we get here, new_caps must already include the 'Ls' cap,
so the extra check for 'Lsx' is redundant.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
2022-01-13 20:46:17 +08:00
Venky Shankar
c1c1669527
Merge pull request #44229 from lxbsz/mds-buffix
mds: remove the duplicated or incorrect response

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2022-01-13 18:16:13 +05:30
Venky Shankar
6b59fe1bec
Merge pull request #44397 from lxbsz/wip-53726
mds: dump tree '/' when the path is empty

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2022-01-13 18:15:24 +05:30
Venky Shankar
b52f86c8a5
Merge pull request #44422 from lxbsz/wip-51705
qa: do not use any time related suffix for *_op_timeouts

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2022-01-13 18:14:14 +05:30
Patrick Seidensal
7d7488018e monitoring: Add unit tests for OSD panels in ceph-cluster dashboard
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
2022-01-13 13:27:55 +01:00
Patrick Seidensal
4a6b2c1dfb monitoring: fix display of ceph_osd_in in Grafana panel
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
2022-01-13 13:27:55 +01:00
Patrick Seidensal
18d3a71618 mgr/prometheus: Fix regression with OSD/host details/overview dashboards
Fix issues with PromQL expressions and vector matching with the
`ceph_disk_occupation` metric.

As it turns out, `ceph_disk_occupation` cannot simply be used as
expected, as there seem to be some edge cases for users that have
several OSDs on a single disk.  This leads to issues which cannot be
solved by PromQL alone (many-to-many PromQL errors).  In these rare
cases, the data simply differs from what we expected.

I have not found a pure PromQL solution to this issue. What we basically
need is the following.

1. Match on labels `host` and `instance` to get one or more OSD names
   from a metadata metric (`ceph_disk_occupation`) to let a user know
   about which OSDs belong to which disk.

2. Match on the `ceph_daemon` label of the `ceph_disk_occupation` metric,
   in which case the value of `ceph_daemon` must not refer to more than
   a single OSD. This is the exact opposite of requirement 1.

As both operations are currently performed on a single metric, and there
is no way to satisfy both requirements on a single metric, the intention
of this commit is to extend the metric by providing a similar metric
that satisfies one of the requirements. This enables the queries to
differentiate between a vector matching operation to show a string to
the user (where `ceph_daemon` could possibly be `osd.1` or
`osd.1+osd.2`) and to match a vector by having a single `ceph_daemon` in
the condition for the matching.

Although the `ceph_daemon` label is used on a variety of daemons, only
OSDs seem to be affected by this issue (only if more than one OSD is run
on a single disk).  This means that only the `ceph_disk_occupation`
metadata metric seems to need to be extended and provided as two
metrics.

`ceph_disk_occupation` is supposed to be used for matching the
`ceph_daemon` label value.

    foo * on(ceph_daemon) group_left ceph_disk_occupation

`ceph_disk_occupation_human` is supposed to be used for anything where
the resulting data is displayed to be consumed by humans (graphs, alert
messages, etc).

    foo * on(device,instance)
    group_left(ceph_daemon) ceph_disk_occupation_human

Fixes: https://tracker.ceph.com/issues/52974

Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
2022-01-13 13:27:55 +01:00
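The split between the two metrics can be modelled by deriving both label sets from the same OSD-to-disk mapping. The Python sketch below assumes a hypothetical input list of (instance, device, OSD id) tuples. It is not the mgr/prometheus code; it only shows why `on(ceph_daemon)` stays one-to-one on `ceph_disk_occupation`, while `on(device, instance)` stays one-to-one on `ceph_disk_occupation_human` with OSDs sharing a disk joined as `osd.1+osd.2`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Labels = Dict[str, str]
OsdDisk = Tuple[str, str, int]  # (instance, device, osd_id), hypothetical input


def disk_occupation(osds: List[OsdDisk]) -> List[Labels]:
    # One series per OSD: ceph_daemon always names exactly one OSD,
    # so matching on(ceph_daemon) never becomes many-to-many.
    return [{"ceph_daemon": f"osd.{osd_id}", "device": dev, "instance": inst}
            for inst, dev, osd_id in osds]


def disk_occupation_human(osds: List[OsdDisk]) -> List[Labels]:
    # One series per (instance, device): OSDs sharing a disk are joined
    # with '+', so matching on(device, instance) is one-to-one and the
    # ceph_daemon label is a display string for humans.
    grouped: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for inst, dev, osd_id in osds:
        grouped[(inst, dev)].append(osd_id)
    return [{"ceph_daemon": "+".join(f"osd.{i}" for i in sorted(ids)),
             "device": dev, "instance": inst}
            for (inst, dev), ids in sorted(grouped.items())]
```

With two OSDs on sdb and one on sdc, the first function yields three series and the second yields two, with ceph_daemon values "osd.1+osd.2" and "osd.3".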
Yuval Lifshitz
b709091d81
Merge pull request #43995 from TRYTOBE8TME/wip-rgw-kafka-teuth-cleanup
qa/tasks: Checking for kafka cleanup
2022-01-13 11:57:03 +02:00
Patrick Seidensal
154d3525b1 mgr/prometheus: Refactoring: Introduce type aliases
Fixes: https://tracker.ceph.com/issues/52974

Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
2022-01-13 10:34:12 +01:00
Josh Salomon
86d6d110b4 osd, tools: refactor OSDMap::calc_pg_upmaps (simplify the code)
This is the first commit in a series of commits that aims at adding a primary balancer to Ceph and improving the current upmap balancer functionality. This first commit focuses on simplifying (refactoring) the code of `calc_pg_upmaps` so it is easier to change in the future. This PR keeps the existing functionality as-is and does not change anything but the code structure.

As this work involves a major refactoring of OSDMap::calc_pg_upmaps, the first step is adding an --upmap-seed parameter to osdmaptool so that test results can be compared without the random factor.

Other changes made:
    - Divided sections of `OSDMap::calc_pg_upmaps` into their own separate functions
    - Renamed tmp to tmp_osd_map
    - Changed all the occurrences of 'first' and 'second' in the function to more meaningful names.

Signed-off-by: Josh Salomon <josh.salomon@gmail.com>
2022-01-13 02:25:14 +00:00
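A fixed seed makes a randomized step repeatable, which is what allows two balancer runs to be compared. A minimal Python sketch of the idea follows; `pick_overfull_osds` is a hypothetical stand-in for a randomized choice and is not the OSDMap::calc_pg_upmaps logic, which is C++ and whose use of the seed is not shown here.

```python
import random
from typing import List, Optional


def pick_overfull_osds(candidates: List[int], count: int,
                       seed: Optional[int] = None) -> List[int]:
    # A dedicated Random instance keeps the sequence independent of the
    # global RNG. A fixed seed yields the same picks on every run;
    # seed=None seeds from system randomness, so runs differ.
    rng = random.Random(seed)
    shuffled = list(candidates)
    rng.shuffle(shuffled)
    return shuffled[:count]
```

Calling it twice with the same seed returns identical selections, so any difference in the output of a refactored function can be attributed to the code change rather than to randomness.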
Yuri Weinstein
10be79e6c4
Merge pull request #43299 from markhpc/wip-age-binning-rebase-20210923
common/PriorityCache: Updated Implementation of Cache Age Binning

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
2022-01-12 16:54:23 -08:00
gal salomon
e3254b6306 parquet implementation:
(1) adding arrow/parquet to make (install is missing)
(2) s3select-operation contains 2 flows: CSV and Parquet
(3) in the parquet flow, the s3select processing engine calls (via callbacks) get-size and range-request; the range requests are asynchronous, so the caller waits until it is notified.
(4) flow : execute --> s3select --(arrow layer)--> range-request --> GetObj::execute --> send_response_data --> notify-range-request --> (back-to) --> s3select
(5) in the parquet flow, s3select handles the response (using callbacks) because of the AWS response limitation (16MB)

add unique pointer (rgw_api); verify magic number for parquet objects; s3select module update
fix buffer-over-flow (copy range request)
change the range-request flow: it now uses the callback parameters (ofs & len) instead of the element length
refactoring: separate the CSV flow from the parquet flow, a step before adding a conditional build (depending on arrow package installation)
adding arrow/parquet installation to debian/control
align s3select repo with RGW (missing APIs, such as get_error_description)
undefined reference to arrow symbol
fix comment: using optional_yield by value
fix comments; remove future/promise
s3select: a leak fix
s3select: fixing result production
s3select,s3tests : parquet alignments
typo: git-remote --> git_remote
s3select: remove redundant comma(end of projections); bug fix in parquet flow upon aggregation queries
adding arrow/parquet
editorial. remove blank lines
s3select: merged with master(output serialization,presto alignments)
merging (not rebase) master functionalities into parquet branch

(*) dedicated source files for the s3select operation.
(*) s3select-engine: fix leaks on parquet flows, enabling allocation of csv_object and parquet_object on the stack
(*) the csv_object and parquet object allocated on stack (no heap allocation)

move data-members from heap to stack allocation, refactoring, separate flows for CSV and parquet. s3select: bug fix

conditional build: when the arrow package is installed, the parquet flow becomes visible, enabling processing of parquet objects; if the package is not installed, only CSV is usable

remove redundant try/catch, s3select: fix compile warning

arrow-devel version should be higher than 4.0.0, where arrow::io::AsyncContext became deprecated

missing sudo; wrong url; move the rm -f arrow.list

replace codename with $(lsb_release -sc)

arrow version should be >= 4.0.0; iocontext does not exist in the namespace on lower versions

RGW points to s3select/master

s3select submodule

sudo --> $SUDO

Signed-off-by: gal salomon <gal.salomon@gmail.com>
2022-01-12 23:15:21 +02:00
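Item (3) above describes range requests that complete asynchronously while the caller blocks until it is notified. A minimal Python sketch of that wait-for-notification pattern using threading.Event follows. `RangeReader` is hypothetical and serves an in-memory object; in RGW the notification comes through GetObj::execute and send_response_data, which this sketch does not model. The example data uses the Parquet magic "PAR1", which Parquet files carry at both the start and the end.

```python
import threading
from typing import List


class RangeReader:
    """Serve (ofs, length) range requests asynchronously over an in-memory object."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def get_size(self) -> int:
        return len(self._data)

    def read_range(self, ofs: int, length: int, timeout: float = 5.0) -> bytes:
        done = threading.Event()
        result: List[bytes] = []

        def fetch() -> None:
            # Stand-in for the asynchronous range request: it uses the
            # ofs/len passed in by the caller, then signals completion.
            result.append(self._data[ofs:ofs + length])
            done.set()

        threading.Thread(target=fetch, daemon=True).start()
        # The caller blocks here until the worker notifies it.
        if not done.wait(timeout):
            raise TimeoutError(f"range request {ofs}+{length} timed out")
        return result[0]
```

Checking the magic number then amounts to reading the first and last four bytes via read_range(0, 4) and read_range(get_size() - 4, 4).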
Casey Bodley
95544e802b qa/rgw: add PG_DEGRADED cluster warnings to log-ignorelist
and cover rgw/singleton suite

Fixes: https://tracker.ceph.com/issues/51727

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2022-01-12 15:56:38 -05:00
Ilya Dryomov
651f0fbc08
Merge pull request #43494 from majianpeng/enable-test-librbd-BlockGuard
test/librbd: re-enable BlockGuard test

Reviewed-by: Mykola Golub <mgolub@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-12 21:50:00 +01:00
Samuel Just
39e0e7b8a3
Merge pull request #44478 from cyx1231st/wip-crimson-improve-log-3
crimson/os/seastore/../segment_manager: improve logs and validations

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Xuehan Xu <xuxuehan@360.cn>
Reviewed-by: Chunmei Liu <chunmei.liu@intel.com>
2022-01-12 12:27:14 -08:00
Ilya Dryomov
b47965b577 qa/tasks/qemu: get the new Let's Encrypt root certificate
Fixes: https://tracker.ceph.com/issues/53841
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-12 20:53:45 +01:00
Ilya Dryomov
387be94794 qa/run_xfstests_qemu.sh: harden against wget failures
If wget fails (e.g. due to a certificate issue), it still creates
an empty file.  Then this file is marked executable, ./"${SCRIPT}"
immediately returns 0 and run_xfstests_qemu.sh exits successfully
without running a single xfstest.

This started on Sep 30, 2021 with the expiration of Let's Encrypt
root certificate -- all qemu jobs with "test: qa/run_xfstests_qemu.sh"
just booted the VM for a couple of seconds and reported success.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-12 20:53:45 +01:00
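The failure mode described above works because an empty file is a valid shell script that exits with status 0. A minimal Python sketch of a guard against it follows; `run_downloaded_script` is a hypothetical wrapper, and the actual hardening lives in qa/run_xfstests_qemu.sh itself and is not reproduced here.

```python
import os
import subprocess


def run_downloaded_script(path: str) -> int:
    """Run a downloaded shell script, refusing a missing or empty file.

    A zero-byte script runs as a no-op and exits 0, so without this check
    a failed download would be reported as a successful test run.
    """
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        raise RuntimeError(f"refusing to run {path!r}: download missing or empty")
    return subprocess.run(["sh", path]).returncode
```

Checking the downloaded file, rather than trusting the script's exit status alone, catches the case where wget fails but still leaves an empty file behind.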