Commit Graph

89015 Commits

Author SHA1 Message Date
Alfredo Deza
e03be24a4f ceph-volume util do not use stdin for luminous
Signed-off-by: Alfredo Deza <adeza@redhat.com>
2018-08-01 07:15:37 -04:00
Nathan Cutler
e0042dd617 build/ops: unify command substitution in install-deps.sh
The $() form is preferable to `` because folks (like me) might be using
` as a keyboard shortcut to GNU Screen, causing havoc to ensue whenever
copy-pasting the ` character.

Signed-off-by: Nathan Cutler <ncutler@suse.com>
2018-08-01 12:36:31 +02:00
Nathan Cutler
f170775770 build/ops: streamline processing of WITH_SEASTAR env var
Quoting relevant portion of "man test":

    STRING equivalent to -n STRING

Signed-off-by: Nathan Cutler <ncutler@suse.com>
2018-08-01 12:36:09 +02:00
Ricardo Dias
77cdb2dccf
Merge pull request #23224 from votdev/rest_client_timeout
mgr/dashboard: Set timeout in RestClient calls

Reviewed-by: Lenz Grimmer <lgrimmer@suse.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
2018-08-01 10:02:09 +01:00
Ricardo Dias
47a50eeba5
Merge pull request #21881 from sebastian-philipp/dashboard-pool-patch
mgr/dashboard: Add Pool update endpoint 

Reviewed-by: Stephan Müller <smueller@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
2018-08-01 10:00:10 +01:00
Volker Theile
969645efee mgr/dashboard: Modal dialogs are still open when UI is redirected to the login screen
Fixes https://tracker.ceph.com/issues/24570

Signed-off-by: Volker Theile <vtheile@suse.com>
2018-08-01 09:43:02 +02:00
Kefu Chai
09121bb95f
Merge pull request #23284 from tchaikov/wip-seastar-config
crimson/common: write configs synchronously on shard.0

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2018-08-01 14:58:43 +08:00
Brad Hubbard
ab91fe1225 doc/releases: Update releases to August '18
Mimic    13.2.1
Luminous 12.2.5, 12.2.6, 12.2.7
Jewel    10.2.11

Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
2018-08-01 14:29:29 +10:00
Enming.Zhang
78254b21a5 rgw-admin: add "--trim-delay-ms" introduction for 'sync error trim'
Signed-off-by: Enming.Zhang <enming.zhang@umcloud.com>
2018-07-31 23:23:47 -04:00
Sage Weil
c909c0aa35 Merge PR #22825 into master
* refs/pull/22825/head:
	common: FreeBSD does not have /etc/os-release

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-07-31 20:16:46 -05:00
Sage Weil
0837c9d816 Merge PR #22998 into master
* refs/pull/22998/head:
	filestore: add pgid in filestore pg dir split log message

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-07-31 20:16:41 -05:00
Sage Weil
569d475da3 Merge PR #23134 into master
* refs/pull/23134/head:
	common: check completion condition before waiting

Reviewed-by: Gregory Farnum <gfarnum@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2018-07-31 20:16:35 -05:00
Sage Weil
97a697e7f7 Merge PR #23223 into master
* refs/pull/23223/head:
	osd/PG: kill dead functions and related options
	iosd/osd_type: kill unused input ec_pool for iterate_mayberw_back_to
	common: kill dead options
	osd/PG: do not initialize up/acting twice
	osd/PG: clear missing_loc properly if last location is gone

Reviewed-by: Sage Weil <sage@redhat.com>
2018-07-31 20:16:30 -05:00
Kefu Chai
1cbd929806
Merge pull request #22990 from tchaikov/wip-cmake-link-static-libstdc++
cmake: fix "WITH_STATIC_LIBSTDCXX"

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2018-08-01 09:06:37 +08:00
Sage Weil
34646c6a65 Merge PR #22692 into master
* refs/pull/22692/head:
	doc/mgr/devicehealth: document devicehealth module
	doc/rados/operations/health-checks: document DEVICE_HEALTH* messages
	mgr/devicehealth: fix style for returns
	mgr/devicehealth: use constants for health warnings
	mgr/devicehealth: deal with as many daemons as we can until limit
	mgr/devicehealth: warn if too many daemons are expected to fail soon
	mgr/devicehealth: set primary-affinity 0 for failing devices
	msg/devicehealth: fix config options
	mgr/devicehealth: only fetch osdmap once from check_health
	mgr/devicehealth: revise health messages
	mgr/devicehealth: add 'device check-health' command and run periodically
	mgr/devicehealth: fix new options
	mgr/devicehealth: add helpers to life_expectancy_response()
	mgr/devicehealth: simplify setting defaults
	common/blkdev remove debug statements

Reviewed-by: John Spray <john.spray@redhat.com>
2018-07-31 17:23:48 -05:00
Patrick Donnelly
a2089173e3
Merge PR #23157 into master
* refs/pull/23157/head:
	Provided API to change umask
2018-07-31 14:50:50 -07:00
Sage Weil
0aba0f4bcd Merge PR #23354 into master
* refs/pull/23354/head:
	src/osd/PG.cc: remove redundant call to trim_log()

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-07-31 16:24:22 -05:00
Sage Weil
f09a87f902 doc/mgr/devicehealth: document devicehealth module
Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-31 14:08:53 -05:00
Sage Weil
7ab8675fdf doc/rados/operations/health-checks: document DEVICE_HEALTH* messages
Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-31 14:08:53 -05:00
Sage Weil
ccdfcc7e72 mgr/devicehealth: fix style for returns
Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-31 14:08:53 -05:00
Sage Weil
1f8662a708 mgr/devicehealth: use constants for health warnings
Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-31 14:08:53 -05:00
Sage Weil
b23295dbb9 mgr/devicehealth: deal with as many daemons as we can until limit
Process as many OSDs as we can until we hit the min_in_ratio.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-31 14:08:53 -05:00
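The behaviour described in the commit above ("process as many OSDs as we can until we hit the min_in_ratio") can be sketched roughly as follows. This is an illustration only, not the devicehealth module's code: the helper name `select_osds_to_mark_out` and its parameters are hypothetical, and the sketch assumes the ratio is the number of "in" OSDs divided by the total number of OSDs.

```python
def select_osds_to_mark_out(failing, num_in, num_total, min_in_ratio):
    """Hypothetical helper: return the leading part of `failing` that can be
    marked out without the in-ratio dropping below `min_in_ratio`."""
    selected = []
    for osd_id in failing:
        # Stop as soon as marking out one more OSD would cross the limit.
        if num_total == 0 or (num_in - 1) / num_total < min_in_ratio:
            break
        selected.append(osd_id)
        num_in -= 1
    return selected


# Assumed example values: 10 OSDs, all "in", three of them expected to fail.
chosen = select_osds_to_mark_out([3, 7, 9], num_in=10, num_total=10,
                                 min_in_ratio=0.75)
print(chosen)
```

With these assumed values only the first two OSDs are selected, since marking out a third would leave 7 of 10 OSDs in, which is below 0.75.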
Sage Weil
4cda89c9e3 mgr/devicehealth: warn if too many daemons are expected to fail soon
Refuse to mark out *all* OSDs.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-31 14:08:53 -05:00
Sage Weil
1c9ce2fc56 mgr/devicehealth: set primary-affinity 0 for failing devices
Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-31 14:08:53 -05:00
Sage Weil
cba41b6f7c msg/devicehealth: fix config options
Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-31 14:08:53 -05:00
Sage Weil
abdee9f679 mgr/devicehealth: only fetch osdmap once from check_health
Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-31 14:08:53 -05:00
Sage Weil
c688c81afd mgr/devicehealth: revise health messages
Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-31 14:08:53 -05:00
Sage Weil
8deec7445f mgr/devicehealth: add 'device check-health' command and run periodically
Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-31 14:08:53 -05:00
Sage Weil
b9d547f012 mgr/devicehealth: fix new options
Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-31 14:08:53 -05:00
Yaarit Hatuka
e1552de24b mgr/devicehealth: add helpers to life_expectancy_response()
- if mark_out_threshold is met we write to log.warn instead of raising a
  health warning.
- check that OSD is 'in' before calling mark_out().
- raise a health warning in case OSD is marked 'out' but still has PGs
  attached to it.
- cast thresholds default values to string.
- add SCSI multipath support to health warning message.
- change health warning message.

Signed-off-by: Yaarit Hatuka <yaarithatuka@gmail.com>
2018-07-31 14:08:53 -05:00
Sage Weil
2b86590a66 mgr/devicehealth: simplify setting defaults
Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-31 14:08:53 -05:00
Yaarit Hatuka
8e542033a1 common/blkdev remove debug statements
Signed-off-by: Yaarit Hatuka <yaarithatuka@gmail.com>
2018-07-31 14:08:53 -05:00
Sage Weil
34698a2c62 Merge PR #23334 into master
* refs/pull/23334/head:
	pybind/rados/rados: do not pass prval from stack

Reviewed-by: John Spray <john.spray@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-07-31 14:08:37 -05:00
David Zafman
9d06ab3da9
Merge pull request #23217 from dzafman/wip-25085
osd: Allow repair of an object with a bad data_digest in object_info on all replicas

Reviewed-by: Sage Weil <sage@redhat.com>
2018-07-31 15:07:22 -04:00
Neha Ojha
283b0bde4a src/osd/PG.cc: remove redundant call to trim_log()
This change is motivated by the failure tracked in
https://tracker.ceph.com/issues/25198. The failure highlights a case where a
call to trim_log() after the PG has recovered races with the previous op
on a replica OSD. Since the previous operation has not completed, the
last_complete value for that OSD is not valid when we try to trim the
log. It is also worth noting that the race is due to MOSDPGTrim going through
the strict queue as a peering message, whereas regular ops go through the
non-strict queue.

During the investigation of this bug, we noticed that, with
https://tracker.ceph.com/issues/23979, we allow pg log trimming to
happen on the primary and replicas whenever we cross the upper bound of
the pg log. This also ensures that pg log trimming happens while processing
any new op.

Therefore, the function trim_log(), which earlier served the purpose of
trimming logs on the primary and replicas just before the PG went into
the Recovered state, is no longer required. It acted as a last line of
defense to trim logs when we did not need them any more. But this call
seems redundant now, because we limit the pg log length at all times.

Signed-off-by: Neha Ojha <nojha@redhat.com>
2018-07-31 11:43:02 -07:00
Tiago Melo
b4fc13d554 mgr/dashboard: Replace "npm install" with "npm ci"
"npm ci" is the recommended command to install dependencies
in a continuous integration system.

It will make sure node_modules is empty and that the versions in
"package-lock.json" match the ones in "package.json".

Signed-off-by: Tiago Melo <tmelo@suse.com>
2018-07-31 18:07:37 +01:00
John Spray
911fe5ce4f mgrc: enable disabling stats via mgr_stats_threshold
Because we had a min_max setting with CRIT the maximum,
it wasn't possible to actually turn off stats entirely.

Fixes: http://tracker.ceph.com/issues/25197
Signed-off-by: John Spray <john.spray@redhat.com>
2018-07-31 17:51:07 +01:00
Tiago Melo
7299ee3555 mgr/dashboard: Add package-lock.json
This will make sure that, at any time, when someone runs 'npm install',
the resulting packages that are installed are always the same.

Signed-off-by: Tiago Melo <tmelo@suse.com>
2018-07-31 17:25:16 +01:00
Sage Weil
8e36f18cde pybind/rados/rados: do not pass prval from stack
The prval is a pointer to an int to write the final completion code of
the rados op.  This can't be on the stack since we immediately leave the
current scope after preparing the op (looong before we do the rados op).

We keep the tuple return value to avoid breaking users of this API
(devicehealth module, gnocchi at a minimum).

Fixes: http://tracker.ceph.com/issues/25175
Signed-off-by: Sage Weil <sage@redhat.com>
2018-07-31 09:41:05 -05:00
Alfredo Deza
96e7576400
Merge pull request #23348 from ceph/wip-rm24957
ceph-volume: adds test for `ceph-volume lvm list /dev/sda`

Reviewed-by: Alfredo Deza <adeza@redhat.com>
2018-07-31 09:56:05 -04:00
Andrew Schoen
ef10886f1e ceph-volume: adds a unit test for lvm list /dev/sda
This test is to prove that the issue from
http://tracker.ceph.com/issues/24957 was fixed
by http://tracker.ceph.com/issues/24784

When running lvm list against a raw device it should handle
gracefully the situation where there are multiple PVs with the
name of the given device.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-07-31 08:50:28 -05:00
Andrew Schoen
37ed1be08b ceph-volume: move pvolumes fixture into conftest.py
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
2018-07-31 08:50:27 -05:00
xie xingguo
d9123158d1 osd/OSD: fix HeartbeatInfo.is_healthy() check
Delay declaring a peer healthy until we have received the first
replies from both front and back connections.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2018-07-31 20:58:53 +08:00
xie xingguo
aba603736c osd/OSD: use first_tx to calculate failed_for
If we never hear any replies from a heartbeat peer, use first_tx
to calculate failed_for, which is more accurate.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2018-07-31 20:58:52 +08:00
xie xingguo
477774ceee osd: refactor heartbeat health check
The original logic reuses the timestamp at which we send pings to
a specific heartbeat peer to update the last_rx_front[back] field
on receiving the corresponding replies. That value is later treated
as the exact time we succeeded in getting the replies, and is used
to calculate the heartbeat latency and determine whether the
relevant peer is dead.

However, this is not accurate enough, as there may be a delay between
receiving a reply and calling heartbeat_check(). We can eliminate
the delay by introducing a map to track the ping history,
each entry of which consists of three elements:

1. "tx_time", used as the map key, indicates the exact timestamp
   at which we send pings.
2. "deadline" indicates the time by which we shall have received all
   replies; otherwise we consider this peer "dead".
3. "unacknowledged" indicates how many replies for the corresponding
   ping are still unacknowledged. The initial value is 2 (as we send
   two pings, from the front and back side, for each peer).

We insert an item into the map every time we send out a ping, and
decrease the "unacknowledged" counter by 1 each time we get a reply to
the tracked ping. If "unacknowledged" drops to 0, we know all the replies
have been successfully collected and we can safely erase the relevant
item from the map, as well as any earlier ones.

By comparing the current timestamp with the oldest deadline, we can now
make a much more accurate decision about whether the corresponding peer is
healthy. And by setting last_rx_* to the timestamp at which we receive
the reply, the lower bound for when we can no longer hear a reply from the
corresponding connection is also much clearer.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2018-07-31 20:58:52 +08:00
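The ping-history map described in the commit above can be sketched as follows. This is a simplified illustration, not the OSD's C++ implementation: the class name `PingHistory`, the `grace` parameter, and the single `last_rx` field (standing in for the separate front and back fields) are assumptions made for the example, and a peer with no outstanding pings is treated as healthy.

```python
from collections import OrderedDict


class PingHistory:
    """Illustrative tracker for one heartbeat peer (all names hypothetical)."""

    def __init__(self, grace):
        self.grace = grace
        # tx_time -> [deadline, unacknowledged]; insertion order is send order.
        self.history = OrderedDict()
        self.last_rx = None

    def send_ping(self, now):
        # Two pings go out per round (front and back), so two replies are due.
        self.history[now] = [now + self.grace, 2]

    def handle_reply(self, tx_time, now):
        # Record when the reply actually arrived, not when the ping was sent.
        self.last_rx = now
        entry = self.history.get(tx_time)
        if entry is None:
            return  # reply to a ping whose entry was already erased
        entry[1] -= 1
        if entry[1] == 0:
            # All replies collected: erase this entry and every earlier one.
            for t in [t for t in self.history if t <= tx_time]:
                del self.history[t]

    def is_healthy(self, now):
        if not self.history:
            return True  # assumption: nothing outstanding counts as healthy
        oldest_deadline = next(iter(self.history.values()))[0]
        return now <= oldest_deadline


peer = PingHistory(grace=20)
peer.send_ping(now=100)
peer.send_ping(now=106)
peer.handle_reply(tx_time=106, now=107)  # first reply to the second ping
peer.handle_reply(tx_time=106, now=108)  # second reply to the second ping
print(len(peer.history), peer.last_rx)
```

Once both replies to the ping sent at 106 arrive, that entry and the older one sent at 100 are erased together, which mirrors the "as well as the earlier sent ones" rule in the commit message.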
Jos Collin
0f3442a35e
doc: fix the broken urls
Fixed the broken urls.

Fixes: http://tracker.ceph.com/issues/25185
Signed-off-by: Jos Collin <jcollin@redhat.com>
2018-07-31 09:06:51 +05:30
Jos Collin
b19e239923
doc: add radosgw reference label
Added radosgw doc reference label

Fixes: http://tracker.ceph.com/issues/25185
Signed-off-by: Jos Collin <jcollin@redhat.com>
2018-07-31 09:06:21 +05:30
Kefu Chai
cec5a23f69
Merge pull request #23336 from noahdesu/vstart-dashboard-no-rbd
vstart: disable dashboard when rbd not built

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-07-31 11:05:02 +08:00
Neha Ojha
1b6dafb351 osd/PGLog.cc: use lgeneric_subdout instead of generic_dout
Signed-off-by: Neha Ojha <nojha@redhat.com>
2018-07-30 16:42:55 -07:00
Patrick Donnelly
33910303cb
ceph_volume_client: use integer division for pg_num
Otherwise a float is sent to the manager, which is not a valid format.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2018-07-30 16:12:48 -07:00
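The distinction this last commit relies on can be shown with a minimal Python 3 sketch. The variable and function names below are hypothetical and are not taken from ceph_volume_client; the only point is that `/` produces a float while `//` keeps an integer result.

```python
# Hypothetical inputs; not taken from ceph_volume_client.
osd_count = 3
pgs_per_osd = 8


def pg_num_true_division(osds, per_osd):
    # In Python 3, "/" always returns a float, even for an exact result.
    return (osds * per_osd) / 2


def pg_num_floor_division(osds, per_osd):
    # "//" returns an int when both operands are ints.
    return (osds * per_osd) // 2


as_float = pg_num_true_division(osd_count, pgs_per_osd)
as_int = pg_num_floor_division(osd_count, pgs_per_osd)
print(type(as_float).__name__, as_float)
print(type(as_int).__name__, as_int)
```

A consumer that expects an integer string such as "12" would instead receive "12.0" from the first form once the value is formatted.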