Commit Graph

97840 Commits

Author SHA1 Message Date
Sage Weil
aa4adf8cc4 Merge PR #27707 into master
* refs/pull/27707/head:
	common/util: handle long lines in /proc/cpuinfo

Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
2019-04-23 13:05:44 -05:00
Sage Weil
7863d7c6c4 osd: take heartbeat_lock when calling heartbeat()
Fixes: http://tracker.ceph.com/issues/39439
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-23 13:04:13 -05:00
Patrick Donnelly
707ae12a29
Merge PR #27537 into master
* refs/pull/27537/head:
	mds: better output of 'ceph health detail'

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-04-23 10:53:52 -07:00
Patrick Donnelly
97d6e9948e
Merge PR #27511 into master
* refs/pull/27511/head:
	mds: fix SnapRealm::resolve_snapname for long name

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-04-23 10:52:41 -07:00
Patrick Donnelly
f1defa179a
Merge PR #27077 into master
* refs/pull/27077/head:
	test: check listattr for snapshot btime entry
	test: extend LibCephFS.Xattrs test
	client: remove unused vxattr length helpers
	client: fix _listxattr() vxattr buffer length calculation
	test: add libcephfs snap.btime xattr test
	client: add ceph.snap.btime vxattr
	mds: carry snapshot creation time with InodeStat

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2019-04-23 10:50:49 -07:00
Casey Bodley
a487a86342
Merge pull request #27725 from theanalyst/perf-counter-names
rgw: sync counters: drop spaces from counter names

Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
2019-04-23 12:48:55 -04:00
Abhishek Lekshmanan
97fb4eeae0 rgw: sync counters: drop spaces from counter names
Spaces in counter names might break modules like prometheus, and general
JSON processing tools aren't too happy with spaces either.
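
A minimal sketch of why spaces are a problem for Prometheus consumers, assuming the usual Prometheus metric-name pattern. The helper `sanitize_counter_name` and the sample counter name are hypothetical and are not taken from this commit, which changes the counter names at the source:

```python
import re

# Prometheus metric names must match this pattern, so a space is never valid.
PROMETHEUS_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def sanitize_counter_name(name: str) -> str:
    """Replace each run of whitespace with one underscore (hypothetical helper)."""
    return re.sub(r"\s+", "_", name.strip())


raw = "data sync from zone"  # hypothetical counter name containing spaces
clean = sanitize_counter_name(raw)

print(clean)                                  # data_sync_from_zone
print(bool(PROMETHEUS_NAME_RE.match(raw)))    # False
print(bool(PROMETHEUS_NAME_RE.match(clean)))  # True
```

Naming counters without spaces in the first place avoids the need for this kind of cleanup in every consumer.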

Fixes: https://tracker.ceph.com/issues/39434
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
2019-04-23 17:29:44 +02:00
alfonsomthd
843802f43e mgr/prometheus: replace whitespaces in metrics' names
Fixes: https://tracker.ceph.com/issues/39434

Signed-off-by: Alfonso Martínez <almartin@redhat.com>
2019-04-23 16:20:47 +02:00
Jason Dillaman
15c7294312
Merge pull request #27703 from tchaikov/wip-rbd-replay-denc
rbd_replay: call the member decode() explicitly

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-04-23 10:07:26 -04:00
Lenz Grimmer
ce2c91f989
mgr/dashboard: Clean up TableComponent tests and code (#26784)
mgr/dashboard: Clean up TableComponent tests and code

Reviewed-by: Stephan Müller <smueller@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
2019-04-23 15:54:23 +02:00
Alfredo Deza
7ab6a39005 tools: pin the version of breathe that works with Python2
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2019-04-23 09:09:09 -04:00
Kefu Chai
3ca9bba4ef rbd_replay: call the member decode() explicitly
otherwise, the one defined using WRITE_RAW_ENCODER is called instead.
so in this change, rename the member function which happens to have
the same signature as

decode(type &v, ::ceph::bufferlist::const_iterator& p)

where `type` is `__u8`.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-04-23 20:33:13 +08:00
Jason Dillaman
7c5dcf63f3 rbd-mirror: clear out bufferlist prior to listing mirror images
The second call to list mirrored images will fail deep within the
msgr code due to a "bad crc in data" error.

Fixes: http://tracker.ceph.com/issues/39407
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2019-04-23 08:24:52 -04:00
Jason Dillaman
b7c7a4f39e
Merge pull request #27521 from trociny/wip-rbd-remove-clone_v2-parent
librbd: optionally move parent image to trash on remove

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-04-23 08:11:38 -04:00
Jason Dillaman
0834078e77
Merge pull request #27484 from majianpeng/rbd-nbd
rbd-nbd: sscanf return 0 mean not-match

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-04-23 08:11:07 -04:00
Sage Weil
67fadc711a doc/rados/operations/devices: document device failure prediction
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-23 07:10:53 -05:00
Casey Bodley
2542abc0bb
Merge pull request #27697 from cbodley/wip-rgw-bucket-list-unordered
rgw: cls_bucket_list_unordered lists a single shard

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
2019-04-23 08:07:13 -04:00
Sage Weil
b9865ad800 osd: make use of pg history and past_intervals in pg_create2 messages
If we get a mismatched epoch and past_intervals, error out early, or else
we'll end up asserting later in the PastIntervals code.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-23 07:03:47 -05:00
Ashish Singh
aeb1c11334 mgr/dashboard: Replace IP address validation with Python standard library functions
Instead of using self-written methods to validate IPv4 and IPv6 addresses,
use Python's standard library module `ipaddress`.
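
A minimal sketch of this kind of validation with the standard library, assuming a hypothetical helper named `is_valid_ip`; the dashboard's actual function names and call sites are not shown in this log:

```python
import ipaddress
from typing import Optional


def is_valid_ip(value: str, version: Optional[int] = None) -> bool:
    """Return True if `value` parses as an IP address.

    If `version` is 4 or 6, the address must also be of that version.
    """
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        # Raised for anything that is neither a valid IPv4 nor IPv6 address.
        return False
    return version is None or addr.version == version


print(is_valid_ip("192.168.0.1"))     # True
print(is_valid_ip("::1", version=6))  # True
print(is_valid_ip("::1", version=4))  # False
print(is_valid_ip("256.1.1.1"))       # False
```

Delegating to `ipaddress.ip_address` covers edge cases such as out-of-range octets and compressed IPv6 notation that hand-written checks tend to miss.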

Signed-off-by: Ashish Singh <assingh@redhat.com>
2019-04-23 17:20:23 +05:30
Kefu Chai
f15f69d521
Merge pull request #27713 from tchaikov/wip-24842
doc/rbd/rbd-cloudstack: update disk offering URL to new docs

Reviewed-by: Wido den Hollander <wido@42on.com>
2019-04-23 19:14:55 +08:00
xie xingguo
bae2231cc5 qa: add crush-node-flags test
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-23 14:20:17 +08:00
Kefu Chai
779eccd1fc doc/rbd/rbd-cloudstack: update disk offering URL to new docs
point hyperlinks to latest

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-04-23 13:26:47 +08:00
xie xingguo
ee4d718d0f mon/OSDMonitor: remove crush node flags too on "crush rm"
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-23 12:40:50 +08:00
xie xingguo
01916b99c3 mon/OSDMonitor: make per-OSD no{out,down,in,out} flags prior to CRUSH nodes
This way we'll be more compatible with older versions, and can effectively
reduce the map size for large clusters.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-23 12:40:50 +08:00
David Zafman
7e77898001 test: Divergent testing of _merge_object_divergent_entries() cases
Case 1: A more recent update exists
Case 2: The first entry in the divergent sequence is a create
Case 3: NOT TESTED - Object currently missing
Case 4: We can rollback all of the entries
Case 5: We cannot rollback at least 1 of the entries

Support starting OSDs even when "noup" is set (don't wait for up).
Move create_ec_pool() to ceph-helpers.sh

Fixes: https://tracker.ceph.com/issues/39162

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-22 18:50:24 -07:00
xie xingguo
aad5d47be6 osd/PG: fix last_complete re-calculation on splitting
We now enforce a hard limit on pg_logs, which means we might keep trimming
old log entries irrespective of the pg's current missing_set. As a
result, the last_complete pointer can move far ahead of the real
on-disk version (the oldest need of missing_set, for instance) that the
corresponding pg should have on splitting:

```
2019-04-19 06:41:52.559247 7efd4725c700 10 osd.2 271 Splitting pg[5.6( v 270'943 lc 0'0 (238'300,270'943] local-lis/les=250/251 n=943 ec=223/223 lis/c 250/223 les/251/224/0 250/271/229) [5,2] r=1 lpr=271 pi=[223,271)/4 crt=270'943 unknown NOTIFY m=518 mbc={}] into 5.16
2019-04-19 06:41:52.561413 7efd4725c700 10 osd.2 pg_epoch: 271 pg[5.6( v 270'943 lc 238'300 (238'300,270'943] local-lis/les=250/251 n=943 ec=223/223 lis/c 250/223 c/f 251/224/0 250/271/229) [5,2] r=1 lpr=271 pi=[223,271)/4 crt=270'943 lcod 0'0 unknown NOTIFY m=261 mbc={}] release_backoffs [MIN,MAX)
```

For the above example, the parent's last_complete cursor changed from **0'0** to
**238'300** directly, because it tried to catch up with the oldest
log entry, which changed when splitting was done. However, back in v12.2.9 the primary
would still reference a shard's last_complete field when trying to figure out all
possible locations of a currently missing object (see PG::MissingLoc::add_source_info):

```c++
  if (oinfo.last_complete < need) {
    if (omissing.is_missing(soid)) {
      ldout(pg->cct, 10) << "search_for_missing " << soid << " " << need
                         << " also missing on osd." << fromosd << dendl;
      continue;
    }
  }
```

Hence a wrongly calculated last_complete could make the primary mistakenly conclude
that a specific shard might have the authoritative object it is currently
looking for:

```
2019-04-19 06:41:52.904163 7fd4cfb5a700 10 osd.5 pg_epoch: 271 pg[5.6( v 270'943 lc 238'300 (238'300,270'943] local-lis/les=250/251 n=471 ec=223/223 lis/c 250/223 les/c/f 251/224/0 250/271/229) [5,2] r=0 lpr=271 pi=[223,271)/4 crt=270'943 lcod 226'77 mlcod 0'0 peering m=16 mbc={}] proc_replica_log for osd.2: 5.6( v 270'943 lc 238'300 (238'300,270'943] local-lis/les=250/251 n=471 ec=223/223 lis/c 250/223 les/c/f 251/224/0 250/271/229) log((249'563,270'943], crt=270'943) missing(261 may_include_deletes = 1)
2019-04-19 06:41:52.904645 7fd4cfb5a700 20 osd.5 pg_epoch: 271 pg[5.6( v 270'943 lc 238'300 (238'300,270'943] local-lis/les=250/251 n=471 ec=223/223 lis/c 250/223 les/c/f 251/224/0 250/271/229) [5,2] r=0 lpr=271 pi=[223,271)/4 crt=270'943 lcod 226'77 mlcod 0'0 peering m=16 mbc={}]  after missing 5:624c3a7a:::benchmark_data_smithi190_39968_object1382:head need 226'110 have 0'0
2019-04-19 06:41:53.567820 7fd4d035b700 10 osd.5 pg_epoch: 272 pg[5.6( v 270'943 lc 0'0 (238'300,270'943] local-lis/les=271/272 n=471 ec=223/223 lis/c 250/223 les/c/f 251/224/0 250/271/229) [5,2] r=0 lpr=271 pi=[223,271)/4 crt=270'943 lcod 226'77 mlcod 0'0 unknown m=16 u=13 mbc={255={(1+0)=220,(2+0)=28}}] search_for_missing 5:624c3a7a:::benchmark_data_smithi190_39968_object1382:head 226'110 is on osd.2
```

Note that ```5:624c3a7a:::benchmark_data_smithi190_39968_object1382:head 226'110```
was actually missing on both the primary and shard osd.2, whereas the primary insisted that
the object should exist on shard osd.2!

https://github.com/ceph/ceph/pull/26175 posted an indirect fix
for the above problem by ignoring last_complete when checking the missing set,
but it should generally make more sense to fill in the last_complete field correctly
whenever possible.
Hence this additional fix.
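
The effect of the quoted check can be sketched in Python. This is a simplified model under stated assumptions, not Ceph code: `Eversion`, `may_have_object`, and the `trust_last_complete` switch are hypothetical names, and an eversion such as 238'300 is assumed to compare as the pair (epoch, version).

```python
from typing import NamedTuple, Set


class Eversion(NamedTuple):
    """Simplified stand-in for an eversion like 238'300: (epoch, version)."""
    epoch: int
    version: int


def may_have_object(last_complete: Eversion, need: Eversion,
                    missing: Set[str], soid: str,
                    trust_last_complete: bool = True) -> bool:
    """Return True if a shard is treated as a possible source for `soid`.

    Mirrors the quoted check: the shard's missing set is consulted only when
    last_complete < need. With trust_last_complete=False the missing set is
    always consulted, which models the "ignore last_complete" approach.
    """
    if not trust_last_complete or last_complete < need:
        if soid in missing:
            return False
    return True


soid = "object1382"            # shortened, hypothetical object id
need = Eversion(226, 110)      # 226'110 from the log above
shard_missing = {soid}         # the shard does not actually have the object

# Correct last_complete (0'0): the missing set is consulted, shard is skipped.
print(may_have_object(Eversion(0, 0), need, shard_missing, soid))      # False
# Wrongly advanced last_complete (238'300): the missing set is never consulted.
print(may_have_object(Eversion(238, 300), need, shard_missing, soid))  # True
# Ignoring last_complete: the missing set is always consulted.
print(may_have_object(Eversion(238, 300), need, shard_missing, soid,
                      trust_last_complete=False))                      # False
```

The second call reproduces the symptom in the log: the shard is reported as a source even though the object is in its missing set.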

Fixes: http://tracker.ceph.com/issues/26958
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-23 08:57:23 +08:00
Sage Weil
24e7c39d11 Merge PR #27708 into master
* refs/pull/27708/head:
	doc/governance: add cbodey

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2019-04-22 17:07:27 -05:00
Sage Weil
9e5ee15d3e Merge PR #27693 into master
* refs/pull/27693/head:
	mgr/telemetry: default to reports every 24h; lower minimum
	mgr/telemetry: exclude hostname field in crash reports

Reviewed-by: Dan Mick <dmick@redhat.com>
2019-04-22 17:06:42 -05:00
Sage Weil
cf1328d959 doc/governance: add cbodey
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-22 15:37:11 -05:00
Sage Weil
b02e81935c common/util: handle long lines in /proc/cpuinfo
Fixes: http://tracker.ceph.com/issues/38296
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-22 12:51:41 -05:00
Sage Weil
cf9dfa79c1 doc/dev/erasure-coded-pool: update
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-22 12:36:29 -05:00
Jason Dillaman
b184acd218 qa/workunits/rbd: use more recent qemu-iotests that support Bionic
Fixes: http://tracker.ceph.com/issues/24668
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2019-04-22 12:46:47 -04:00
Sage Weil
69c7a4d24e doc/rados/operations/erasure-code*: update default ec profile references
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-22 11:20:55 -05:00
Ali Maredia
e972c80ecc rgw: thread DoutPrefixProvider into fetch_remote_obj
This is for the AtomicObjProcessor declared there

Signed-off-by: Ali Maredia <amaredia@redhat.com>
2019-04-22 11:30:47 -04:00
Ali Maredia
0242d7400e rgw: log refactoring for putobj_processor
Signed-off-by: Ali Maredia <amaredia@redhat.com>
2019-04-22 11:30:47 -04:00
Casey Bodley
d37d0339ff rgw: cls_bucket_list_unordered lists a single shard
CLSRGWIssueBucketList sends the request to every shard, but this loop
is intended to list only the current_shard

Fixes: http://tracker.ceph.com/issues/39393

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2019-04-22 11:20:55 -04:00
Casey Bodley
cd1fc96c5c cls/rgw: expose cls_rgw_bucket_list_op for single shard
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2019-04-22 11:20:49 -04:00
Mykola Golub
34112907a7
Merge pull request #27694 from dillaman/wip-39386
qa/suites/rbd: added writearound cache test permutations

Reviewed-by: Mykola Golub <mgolub@suse.com>
2019-04-22 18:13:02 +03:00
Mykola Golub
5bc54e763d
Merge pull request #27682 from dillaman/wip-39031
librbd: async open/close should free ImageCtx before issuing callback

Reviewed-by: Mykola Golub <mgolub@suse.com>
2019-04-22 18:12:30 +03:00
Sage Weil
6b97d72ed9
Merge pull request #24744 from liewegas/wip-stale-prs
.github/stale.yml: warn at 60, close at 90; adjust message
2019-04-22 09:58:24 -05:00
Sage Weil
712987d533 mgr/telemetry: default to reports every 24h; lower minimum
Allow more frequent telemetry reports.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-22 09:51:11 -05:00
Sage Weil
4510f33656 mgr/telemetry: exclude hostname field in crash reports
On some systems the hostname is a fully-qualified domain name and
(even when not a fqdn) may inadvertently allow the cluster to be
identified.
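
A minimal sketch of dropping an identifying field before a report is sent. The helper `strip_identifying_fields` and the report keys are hypothetical, chosen only for illustration; the telemetry module's real report layout is not shown in this log:

```python
def strip_identifying_fields(report: dict, fields=("hostname",)) -> dict:
    """Return a copy of `report` without the given top-level keys."""
    return {key: value for key, value in report.items() if key not in fields}


# Hypothetical crash report; the keys are illustrative only.
crash = {
    "crash_id": "example-id",
    "hostname": "node1.example.com",
    "backtrace": [],
}

print(strip_identifying_fields(crash))
```

Returning a filtered copy leaves the locally stored crash record untouched, so the hostname stays available for local debugging.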

Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-22 09:51:11 -05:00
Sage Weil
85d9dc6b10 mon/OSDMonitor: track history and past_intervals for creating pgs
PG create messages from mons are the last case where the OSD may have to
scan an unbounded number of old maps in order to construct a valid
pg_history_t and PastIntervals.  Try to avoid making that a difficult
case by maintaining those structures on the monitor.

It is still possible that the mon may send a pg create message to the OSD
and it sits in a message queue for a very long time, but this would be a
very difficult situation to get into, and is no different from inter-OSD
messages that include history and past_intervals.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-22 09:33:05 -05:00
Sage Weil
5071f1db54 osd/osd_types: make PastIntervals pi_compact_rep print participants
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-22 09:32:35 -05:00
Sage Weil
89ef2b0857 osd/osd_types: take bare const OSDMap * to check_new_interval
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-22 09:32:35 -05:00
Sage Weil
e2552110af osd/osd_types: add pg_history_t ctor that takes creation epoch+stamp
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-22 09:32:35 -05:00
Sage Weil
6578fd6248 mon/Monitor: handle v1 call into handle_auth_request
A v1 connection should "succeed" at this point because the authentication
happens via MAuth messages.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-22 07:07:37 -05:00
Sage Weil
d65d0c77d0 msg/Connection: add is_msgr2()
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-22 07:07:36 -05:00
Sage Weil
64eddc4d75 mon/MonClient: tolerate lack of authorizer for some dispatchers
This is the equivalent of b8d1c80370, but
in the new auth framework.  OSD heartbeats prior to nautilus do not
add authorizers to the heartbeat channel.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-22 07:07:36 -05:00
Sage Weil
cac90d0fd9 Merge PR #27605 into master
* refs/pull/27605/head:
	mon/OSDMonitor: osd add-no{up,down,in,out} - remove state checker

Reviewed-by: Sage Weil <sage@redhat.com>
2019-04-22 07:04:11 -05:00