Commit Graph

79091 Commits

Author SHA1 Message Date
Chang Liu
6597e048af osd: remove duplicated function ec_pool in pg_pool_t
Signed-off-by: Chang Liu <liuchang0812@gmail.com>
2017-09-30 16:03:25 +08:00
Xie Xingguo
cd6b9830d1 Merge pull request #15199 from xiexingguo/wip-object-logic-size
osd: fine-grained statistics of logical object space usage

Reviewed-by: Sage Weil <sage@redhat.com>
2017-09-30 14:50:32 +08:00
Kefu Chai
259b3c1ead Merge pull request #16884 from liewegas/wip-20919
osd/PrimaryLogPG: send requests to primary on cache miss

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2017-09-30 12:14:14 +08:00
Kefu Chai
3dfe209499 Merge pull request #17955 from asomers/bin_bash2
test: fix bash path in shebangs (part 2)

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-09-30 12:13:35 +08:00
Kefu Chai
709c77b2b5 Merge pull request #17985 from dzafman/wip-21327
ceph-objectstore-tool: "$OBJ get-omaphdr" and "$OBJ list-omap" scan all pgs instead of using specific pg

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-09-30 12:12:25 +08:00
Kefu Chai
d877b0b07d Merge pull request #18005 from jcsp/wip-21577
tools: update monstore tool for fsmap, mgrmap

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-09-30 12:11:21 +08:00
Kefu Chai
583f62bd98 Merge pull request #18015 from tchaikov/wip-kill-warnings
osd,os/bluestore: kill clang analyzer warnings

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Sage Weil <sage@redhat.com>
2017-09-30 12:10:49 +08:00
Kefu Chai
c76742b748 Merge pull request #18018 from tchaikov/wip-ceph-disk-cleanup
ceph-disk: more precise error message when a disk is specified

Reviewed-by: Loic Dachary <ldachary@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2017-09-30 12:10:06 +08:00
Kefu Chai
48582cb00e Merge pull request #18034 from tchaikov/wip-options
common/options: pass by reference and use user-literals for size

Reviewed-by: Adam C. Emerson <aemerson@redhat.com>
2017-09-30 12:07:15 +08:00
Li Wang
73e70a553f client: assert(false)->ceph_abort()
Signed-off-by: Li Wang <laurence.liwang@gmail.com>
2017-09-30 02:30:51 +00:00
ownedu
92c3499f7b msg/async/rdma: fix Tx buffer leakage which can introduce "heartbeat no
reply" due to out of Tx buffers, this can be reproduced by marking some
OSDs down in a big Ceph cluster, say 300+ OSDs.

root cause: when RDMAStack wants to delete faulty connections there are
chances that those QPs still have inflight CQEs, thus inflight Tx
buffers; without waiting for them to complete, Tx buffer pool will run
out of buffers finally.

fix: ideally the best way to fix this bug is to destroy QPs gracefully
such as to_dead(), we now just rely on the number of Tx WQE and CQE to
avoid buffer leakage; RDMAStack polling is always running so we are safe
to simply bypass some QPs that are not in 'complete' state.

Signed-off-by: Yan Lei <yongyou.yl@alibaba-inc.com>
2017-09-30 10:14:39 +08:00
Jos Collin
ded96388da doc: Fix URL in Licensing
Fixed the unnecessary URL format in the text. Modifying the URL formatting to highlight only the file name seems better.

Signed-off-by: Jos Collin <jcollin@redhat.com>
2017-09-30 07:40:01 +05:30
xie xingguo
6a990115c2 osd/PrimaryLogPG: clear pin_stats_invalid bit properly on scrub-repair completion
We have done an audit of stats and the numbers should all be ok by then.
Actually the pin_stats_invalid bit is never set true, so forgetting
to clear pin_stats_invalid here generally does no harm. We could also simply
kill the pin_stats_invalid bit instead, but let's not bother with that
complexity either.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-09-30 09:59:53 +08:00
Marcus Watts
c11485e1b3 radosgw: fix awsv4 header line sort order.
The awsv4 signature calculation includes a list of header lines, which
are supposed to be sorted.  The existing code sorts by header name, but
it appears that in fact it is necessary to sort the whole header *line*,
not just the field name.  Sorting by just the field name usually works,
but not always.  The s3-tests teuthology suite includes
s3tests.functional.test_s3.test_object_header_acl_grants
s3tests.functional.test_s3.test_bucket_header_acl_grants
which include the following header lines,

x-amz-grant-read-acp:id=56789abcdef0123456789abcdef0123456789abcdef0123456789abcdef01234
x-amz-grant-read:id=56789abcdef0123456789abcdef0123456789abcdef0123456789abcdef01234
x-amz-grant-write-acp:id=56789abcdef0123456789abcdef0123456789abcdef0123456789abcdef01234
x-amz-grant-write:id=56789abcdef0123456789abcdef0123456789abcdef0123456789abcdef01234

in this case, note that ':' needs to sort after '-'.

Fixes: http://tracker.ceph.com/issues/21607

Signed-off-by: Marcus Watts <mwatts@redhat.com>
2017-09-29 17:04:08 -04:00
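
The difference between the two sort orders described in c11485e1b3 can be seen in a small sketch. This is not the radosgw code; the helper names `sort_by_name` and `sort_by_line` are hypothetical, and the ids are shortened for readability. It only shows that sorting by field name and sorting by whole line disagree for these headers, because '-' (0x2D) compares lower than ':' (0x3A).

```python
# Header lines modelled on the ones quoted in the commit message.
# The ids are shortened here; that is an illustrative assumption.
lines = [
    "x-amz-grant-read:id=5678",
    "x-amz-grant-read-acp:id=5678",
    "x-amz-grant-write:id=5678",
    "x-amz-grant-write-acp:id=5678",
]


def sort_by_name(header_lines):
    """Sort by the field name only (the text before the first ':')."""
    return sorted(header_lines, key=lambda line: line.split(":", 1)[0])


def sort_by_line(header_lines):
    """Sort by the whole 'name:value' line."""
    return sorted(header_lines)


by_name = sort_by_name(lines)
by_line = sort_by_line(lines)

# By name, "x-amz-grant-read" is a prefix of "x-amz-grant-read-acp",
# so it sorts first. By line, the comparison reaches '-' versus ':'
# and "x-amz-grant-read-acp:..." sorts first instead.
for name_order, line_order in zip(by_name, by_line):
    print(f"{name_order:32} | {line_order}")
```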
Jason Dillaman
ae1530bbfb Merge pull request #17971 from idryomov/wip-krbd-exclude-shared-298
qa/suites/krbd: exclude shared/298

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-09-29 16:07:59 -04:00
Jason Dillaman
5a3baf1bd8 librbd: snapshots should be created/removed against data pool
Fixes: http://tracker.ceph.com/issues/21567
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2017-09-29 15:11:38 -04:00
Radoslaw Zarzynski
90bbcd7cbb os/bluestore: drop support for non-bulky extent release.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2017-09-29 20:30:53 +02:00
Radoslaw Zarzynski
5e1e6f9393 os/bluestore: release txc's extents in bulky manner.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2017-09-29 20:30:53 +02:00
Radoslaw Zarzynski
16906c0190 os/bluestore: BlueFS releases disk extents in bulky manner.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2017-09-29 20:30:53 +02:00
Jason Dillaman
ede691323d librbd: avoid dynamically refreshing non-atomic configuration settings
Fixes: http://tracker.ceph.com/issues/21529
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2017-09-29 12:22:57 -04:00
David Zafman
2f466f8b26 Merge pull request #17920 from dzafman/wip-21382
Erasure code recovery should send additional reads if necessary

Fixes: http://tracker.ceph.com/issues/21382

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-09-29 09:04:43 -07:00
Patrick Donnelly
b37c7f7db7 qa: relax cap expected value check
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2017-09-29 08:48:14 -07:00
Dongsheng Yang
b10d26dfa8 librbd: notify watcher when updating image metadata
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2017-09-29 11:45:55 -04:00
Haomai Wang
fd704ab8b3 Merge pull request #18036 from ownedu/wip-fix-asyncrdma-coredump
msg/async/rdma: fix a potential coredump when handling tx_buffers under heavy RDMA

Reviewed-by: Haomai Wang <haomai@xsky.com>
2017-09-29 10:33:14 -05:00
Adam Wolfe Gordon
57745b9439 doc: Update rbd-mirror docs to reflect data pool selection changes
Signed-off-by: Adam Wolfe Gordon <awg@digitalocean.com>
2017-09-29 15:32:38 +00:00
Adam Wolfe Gordon
2e239c0551 rbd-mirror: Improve data pool selection when creating images
Previously we used the source image's data pool name
unconditionally. There were two problems with that:

1. If a pool with the same name didn't exist locally, creation of the
   local image would fail.
2. If the local pool had a default data pool configured it would be
   ignored.

Change local image creation so it uses the default pool if configured,
and uses the remote pool name only if a pool with that name exists
locally. If neither of those is true, leave the data pool unset.

Signed-off-by: Adam Wolfe Gordon <awg@digitalocean.com>
2017-09-29 15:14:03 +00:00
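
The selection rule described in 2e239c0551 can be sketched as a small function. This is a reading of the commit message only, not the rbd-mirror implementation; the function name `choose_data_pool` and its parameters are hypothetical.

```python
def choose_data_pool(local_default, remote_name, local_pools):
    """Pick the data pool for a locally created mirror image.

    local_default: default data pool configured locally, or None/"" if unset
    remote_name:   data pool name used by the source (remote) image
    local_pools:   names of the pools that exist on the local cluster
    """
    # 1. A locally configured default data pool takes priority.
    if local_default:
        return local_default
    # 2. Otherwise reuse the remote name, but only if such a pool exists locally.
    if remote_name in local_pools:
        return remote_name
    # 3. Neither applies: leave the data pool unset.
    return None


print(choose_data_pool("fast", "ec_data", {"rbd", "ec_data"}))  # default wins
print(choose_data_pool(None, "ec_data", {"rbd", "ec_data"}))    # remote name reused
print(choose_data_pool(None, "ec_data", {"rbd"}))               # left unset
```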
Radoslaw Zarzynski
d11753ef4d os/bluestore: make the BitMapAllocator aware about bulk releases.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2017-09-29 17:09:29 +02:00
Radoslaw Zarzynski
cb0420ea0b os/bluestore: make the StupidAllocator aware about bulk releases.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2017-09-29 17:09:29 +02:00
Radoslaw Zarzynski
0b41d2372e os/bluestore: extend the Allocator interface with bulk releases.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2017-09-29 17:09:29 +02:00
Jos Collin
41c4b3dbbe doc: Fix typo and URL in Submitting patches
Dropped the repeated 'the' in the paragraph and fixed the unnecessary URL format in the text.

Signed-off-by: Jos Collin <jcollin@redhat.com>
2017-09-29 20:20:33 +05:30
xie xingguo
1e4263fa5d osd/PrimaryLogPG: allow trimmed read for OP_CHECKSUM
Normal reads support trimmed read length, and so shall checksums!

This fixes occasional failures of the rados/thrash test scripts, e.g.:
(1) create an object using WriteOp with a randomly generated length
(2) normal writes might be accompanied by a TruncOp with a randomly chosen truncate_size
(3) for ReadOp, pick a random 'length' to read, and checksum the same
    range ([0, 'length']) at the same time.

Since the 'length' for reading is randomly chosen, it might
exceed the current object size, hence causing an EOVERFLOW error.

Related issues:
http://qa-proxy.ceph.com/teuthology/xxg-2017-09-22_01:52:47-rados-wip-object-logic-size-distro-basic-smithi/1657337
http://qa-proxy.ceph.com/teuthology/xxg-2017-09-22_14:14:19-rados-wip-object-logic-size-distro-basic-smithi/1658015

Fix the above problems by keeping pace with normal reads.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-09-29 21:16:19 +08:00
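
The idea of a "trimmed" read length in 1e4263fa5d can be illustrated as follows. This is a simplified sketch, not the PrimaryLogPG logic: the function name `trim_read_length` is hypothetical, and it ignores everything except the requested range and the current object size.

```python
def trim_read_length(offset, length, object_size):
    """Clamp a requested [offset, offset + length) range to the object size.

    Returns the number of bytes that can actually be covered, which may be 0.
    """
    if offset >= object_size:
        return 0
    return min(length, object_size - offset)


# A request that runs past the end of a 4096-byte object is shortened
# instead of being rejected.
print(trim_read_length(0, 8192, 4096))
print(trim_read_length(1024, 1024, 4096))
print(trim_read_length(4096, 16, 4096))
```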
xie xingguo
421aee3aea osd: fine-grained statistics of logical object space usage
To test this change, we create an image of 5GB and do an rbd bench write of 100MB:
./bin/rbd create bar -s 5120 && ./bin/rbd bench --io-type write --io-size 32K --io-total 100M --io-pattern rand  rbd/bar

Below is the test result.

Was:

GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    30911M     27052M        3859M         12.49
POOLS:
    NAME                  ID     USED      %USED     MAX AVAIL     OBJECTS
    rbd                   0      3191M     26.36         8914M        1174
    cephfs_data_a         1          0         0         8914M           0
    cephfs_metadata_a     2       2246         0         8914M          21

Now:

GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    30911M     27050M        3861M         12.49
POOLS:
    NAME                  ID     USED        %USED     MAX AVAIL     OBJECTS
    rbd                   0      101216k      1.10         8913M        1178
    cephfs_data_a         1            0         0         8913M           0
    cephfs_metadata_a     2          892         0         8913M          21

E.g., this change can make "osd pool set-quota max_bytes" work nicely.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-09-29 21:16:19 +08:00
Mykola Golub
7dc2f78215 Merge pull request #17979 from dillaman/wip-21559
rbd-mirror: forced promotion can result in incorrect status

Reviewed-by: Mykola Golub <mgolub@mirantis.com>
2017-09-29 15:20:12 +03:00
Sage Weil
faefd681b5 Merge pull request #18035 from tchaikov/wip-bit-cleanup
script/build-integration-branch: python3 compatible and pep8 clean
2017-09-29 07:03:16 -05:00
Ramana Raja
baf3b88800 ceph_volume_client: fix setting caps for IDs
... that have empty OSD and MDS caps. Don't add a ',' at the
start of OSD and MDS caps.

Fixes: http://tracker.ceph.com/issues/21501
Signed-off-by: Ramana Raja <rraja@redhat.com>
2017-09-29 17:06:05 +05:30
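
The leading ',' problem described in baf3b88800 is the usual pitfall of appending to a possibly empty comma-separated string. A minimal sketch of one way to avoid it is shown below; the function name `join_caps` and the cap strings are illustrative assumptions, not the ceph_volume_client code.

```python
def join_caps(existing, new):
    """Append a new cap to an existing comma-separated cap string.

    Empty parts are skipped, so an empty 'existing' string does not
    produce a leading ','.
    """
    return ",".join(part for part in (existing, new) if part)


# Naive concatenation yields a leading comma when nothing exists yet.
print(repr("" + "," + "allow rw"))
# Skipping empty parts avoids it.
print(repr(join_caps("", "allow rw")))
print(repr(join_caps("allow r", "allow rw")))
```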
Joao Eduardo Luis
fd04ff81eb Merge pull request #17846 from jecluis/wip-21300
mon/MgrMonitor: read cmd descs if empty on update_from_paxos()

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-09-29 11:25:56 +01:00
Li Wang
3892d1beeb ceph_mon: use default port when not specified
If the monitor port is not specified in ceph.conf,
it will use 0. However, monmaptool will use
default port in creating monitor map in such case.
The inconsistent behavior is not graceful, and
every time the monitor loads, it warns 'mon addr
config option does not match monmap file', which
is confusing.

Signed-off-by: Li Wang <laurence.liwang@gmail.com>
2017-09-29 08:03:11 +00:00
David Zafman
390d12f71a osd: For recovery get all possible shards to read on errors
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-09-28 23:31:25 -07:00
David Zafman
1235810c2a osd: Allow recovery to send additional reads
For now it doesn't include non-acting OSDs
Added test for this case

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-09-28 23:31:18 -07:00
David Zafman
f92aa6c824 test: Allow modified options to existing setup functions
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-09-28 23:31:18 -07:00
David Zafman
43e3206de2 test: Use feature to get last array element
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-09-28 23:31:18 -07:00
ownedu
f54a522ce1 Merge branch 'wip-fix-asyncrdma-coredump' of https://github.com/ownedu/ceph into wip-fix-asyncrdma-coredump
2017-09-29 14:00:29 +08:00
ownedu
24280897be msg/async/rdma: fix a potential coredump when handling tx_buffers under
heavy RDMA traffic, there are chances to access a current_chunk which can
be beyond the range of pre-allocated Tx buffer pool thus causes a coredump

Signed-off-by: Yan Lei <yongyou.yl@alibaba-inc.com>
2017-09-29 13:58:37 +08:00
ownedu
59c54caec0 Fix a potential coredump when handling tx_buffers under heavy RDMA
traffic, there are chances to access a current_chunk which can be beyond the
range of pre-allocated Tx buffer pool thus causes a coredump.

Signed-off-by: Yan Lei <yongyou.yl@alibaba-inc.com>
2017-09-29 13:50:23 +08:00
xie xingguo
f90bd4b957 common/interval_set: override subset_of for given range
E.g.:
subset_of([5~10,20~5], 0, 100)  -> [5~10,20~5]
subset_of([5~10,20~5], 5, 25)   -> [5~10,20~5]
subset_of([5~10,20~5], 1, 10)   -> [5~5]
subset_of([5~10,20~5], 8, 24)   -> [8~7, 20~4]

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2017-09-29 12:30:46 +08:00
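
The examples in f90bd4b957 are consistent with clipping each interval to a half-open range [start, end). The sketch below reproduces them under that assumption; it is not the C++ interval_set code. Intervals are modelled as (offset, length) tuples, matching the offset~length notation in the commit message.

```python
def subset_of(intervals, start, end):
    """Return the parts of 'intervals' that fall inside [start, end).

    intervals: list of (offset, length) tuples, i.e. offset~length
    """
    result = []
    for offset, length in intervals:
        lo = max(offset, start)          # clip the left edge
        hi = min(offset + length, end)   # clip the right edge
        if lo < hi:                      # keep only non-empty overlaps
            result.append((lo, hi - lo))
    return result


src = [(5, 10), (20, 5)]  # [5~10,20~5]
for start, end in [(0, 100), (5, 25), (1, 10), (8, 24)]:
    print((start, end), "->", subset_of(src, start, end))
```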
Kefu Chai
4e8e311694 script/build-integration-branch: python3 compatible and pep8 clean
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-09-29 12:05:49 +08:00
Kefu Chai
6788ea0e9a common/options: use user-defined literals for sizes
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-09-29 11:14:25 +08:00
Kefu Chai
d2364bde9b common/options: pass by reference
for better performance. and for better consistency, pass by
std::initializer_list.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-09-29 11:09:35 +08:00
Kefu Chai
61a61f9968 tools/rados: do not assign never read variable
actually "r" is always 0 in that branch. so it's a no-op.

this silences the clang analyzer warning of

	Value stored to 'r' is never read

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-09-29 11:01:53 +08:00
Kefu Chai
bd4db59197 os/bluestore: do not assign never read variable
shrink the lexical scope of "csum_order" and do not set it if it is
never read.

this silences the clang analyzer warning of:

	Value stored to 'csum_order' is never read

Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-09-29 11:01:53 +08:00