71097 Commits

Author SHA1 Message Date
John Spray
d98e19fdbd Merge pull request #14589 from jcsp/wip-19640
client: refine fsync/close writeback error handling

Reviewed-by: Jeff Layton <jlayton@redhat.com>
2017-04-18 12:58:37 +01:00
John Spray
a2a100dc13 Merge pull request #14272 from jcsp/wip-vstart-fixup
qa: fix test_standby_for_invalid_fscid with vstart_runner

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2017-04-18 12:50:20 +01:00
John Spray
a765f249f1 Merge pull request #14196 from jcsp/wip-cephfs-relnotes
PendingReleaseNotes: recent cephfs changes
2017-04-18 12:50:04 +01:00
John Spray
612fe8422b Merge pull request #14105 from jcsp/wip-pretty-tell
mds: pretty json from `tell` commands

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2017-04-18 12:49:39 +01:00
John Spray
a762ad4d7f Merge pull request #14104 from jcsp/wip-18509
mds: include advisory `path` field in damage

Reviewed-by: Yan, Zheng <zyan@redhat.com>
2017-04-18 12:48:52 +01:00
John Spray
c4f8f61816 Merge pull request #14164 from jcsp/wip-16842-mitigation
Mitigation for #16842, validate sessions after load

Reviewed-by: Yan, Zheng <zyan@redhat.com>
2017-04-18 12:48:20 +01:00
John Spray
1a69bec52f client: refine fsync/close writeback error handling
Previously, errors stuck indelibly to the inode, which
meant that a close call would see an error even if the
user already dutifully fsync()'d and handled it.

We should emit each error only once per file handle.

Signed-off-by: John Spray <john.spray@redhat.com>
2017-04-18 07:47:10 -04:00
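The "once per file handle" rule can be illustrated with a small sketch. It assumes an error sequence counter on the inode and a per-handle record of the last sequence already reported, similar in spirit to the Linux kernel's errseq_t. The names Inode, FileHandle, error_seq and consume_error are hypothetical, and this is not the Ceph client's actual code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Hypothetical, simplified types; not the Ceph client's Inode or Fh classes.
struct Inode {
  int last_error = 0;      // most recent writeback error (negative errno)
  uint64_t error_seq = 0;  // incremented each time a new error is recorded

  void record_writeback_error(int err) {
    last_error = err;
    ++error_seq;
  }
};

struct FileHandle {
  Inode *in;
  uint64_t seen_seq;  // error_seq value this handle has already reported

  // Assumption: a handle opened after an error does not inherit it.
  explicit FileHandle(Inode *i) : in(i), seen_seq(i->error_seq) {}

  // Reports a pending error once, then 0 until a newer error is recorded.
  int consume_error() {
    if (in->error_seq == seen_seq) return 0;
    seen_seq = in->error_seq;
    return in->last_error;
  }

  int fsync() { /* flushing dirty data would happen here */ return consume_error(); }
  int close() { return consume_error(); }
};
```

Here a handle that already saw the error through fsync() gets 0 from close(), while a second handle on the same inode still reports the error once.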
Willem Jan Withagen
9ff401e65b /workunits/cephtool/test.sh: Be more liberal in testing health-output.
Sometimes I get output like:
   HEALTH_ERR 2 pgs stuck unclean; Full ratio(s) out of order

This goes away over time, so it is a transient issue.

Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
2017-04-18 13:43:54 +02:00
Haomai Wang
5737edfa46 Merge pull request #14585 from optimistyzy/414
bluestore/NVMEDEVICE: update SPDK to version 17.03

Reviewed-by: Haomai Wang <haomai@xsky.com>
2017-04-18 19:23:30 +08:00
Nathan Cutler
b48b6f4ed8 doc: mention --show-mappings in crushtool manpage
Fixes: http://tracker.ceph.com/issues/19649
Signed-off-by: Loic Dachary <ldachary@redhat.com>
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2017-04-18 12:21:39 +02:00
Orit Wasserman
cb94e5ad3f Merge pull request #12535 from ceph/wip-rgw-multisite-teuthology
rgw: multisite enabled over multiple clusters
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
2017-04-18 11:47:48 +03:00
Kefu Chai
b973be63fe Merge pull request #14555 from yaozongyou/fix-readme-notconsistent
README.md: fix build instructions inconsistent.

Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-04-18 15:59:23 +08:00
Loic Dachary
1b02fef697 crush: implement CrushWrapper::dump(choose_args)
Signed-off-by: Loic Dachary <loic@dachary.org>
2017-04-18 09:45:13 +02:00
Loic Dachary
fa52dfaff2 crush: disable modification API when choose_args is not empty
Adding, removing or moving items / buckets via the CrushWrapper API when
choose_args is not empty is unlikely to produce the desired outcome. The
caller should instead add, remove or move items / buckets in a
decompiled crushmap, update the associated choose_arg and upload the new
crushmap.

Signed-off-by: Loic Dachary <loic@dachary.org>
2017-04-18 09:45:07 +02:00
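A minimal sketch of the guard described above, assuming a simplified map type: structural edits are refused while any per-pool choose_args exist. CrushMapSketch, add_item and the -EPERM return value are illustrative assumptions, not the real CrushWrapper interface.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a CRUSH map wrapper; not the real CrushWrapper.
class CrushMapSketch {
 public:
  // pool id -> per-pool override data (contents simplified to a vector here)
  std::map<int64_t, std::vector<int>> choose_args;
  std::vector<int> items;

  bool has_choose_args() const { return !choose_args.empty(); }

  // Structural edits are refused while overrides exist, since the result
  // is unlikely to match what the caller intended.
  int add_item(int id) {
    if (has_choose_args()) return -EPERM;  // error code is an assumption
    items.push_back(id);
    return 0;
  }
};
```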
Loic Dachary
dbe36e08be crush: compile/decompile crush_choose_arg_map
A map of crush_choose_arg_map entries is added to the crushmap text
syntax. Each key is an integer matching a pool number.

Signed-off-by: Loic Dachary <loic@dachary.org>
2017-04-18 09:45:03 +02:00
Kefu Chai
17ca501fe8 debian: package ceph.logroate properly
see also "man dh_installlogrotate"

Fixes: http://tracker.ceph.com/issues/19390
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-04-18 15:41:15 +08:00
Loic Dachary
55fb91d640 crush: add per pool choose_args when calling do_rule
If there is no crush_choose_arg_map for a given pool (the default), a
NULL pointer is given instead and crush_do_rule behavior remains
unchanged.

Signed-off-by: Loic Dachary <loic@dachary.org>
2017-04-18 09:39:43 +02:00
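The per-pool lookup can be sketched as follows, assuming the overrides are kept in a map keyed by pool id, in line with the text syntax where each key is a pool number. A missing entry yields nullptr, which the caller passes through so rule evaluation behaves as before. ChooseArgMapSketch and choose_args_for_pool are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical per-pool override record.
struct ChooseArgMapSketch {
  std::vector<std::vector<uint32_t>> weight_set;  // [position][item] weights
  std::vector<int> ids;                           // optional replacement ids
};

using ChooseArgsByPool = std::map<int64_t, ChooseArgMapSketch>;

// Returns the overrides for `pool`, or nullptr so rule evaluation is unchanged.
const ChooseArgMapSketch *choose_args_for_pool(const ChooseArgsByPool &all, int64_t pool) {
  auto it = all.find(pool);
  return it == all.end() ? nullptr : &it->second;
}
```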
Loic Dachary
19537a450f crush: implement weight and id overrides for straw2
bucket_straw2_choose needs to use weights that may be different from
item_weights, for instance to compensate for an uneven distribution
caused by a low number of values, or to fix the probability bias
introduced by conditional probabilities (see
http://tracker.ceph.com/issues/15653 for more information).

We introduce a weight_set for each straw2 bucket to set the desired
weight for a given item at a given position. The weight of a given item
when picking the first replica (first position) may be different from
its weight when picking the second replica (second position). For instance, the weight
matrix for a given bucket containing items 3, 7 and 13 could be as
follows:

          position 0   position 1

item 3     0x10000      0x100000
item 7     0x40000       0x10000
item 13    0x40000       0x10000

When crush_do_rule picks the first of two replicas (position 0), items 7
and 13 are four times more likely to be chosen by bucket_straw2_choose
than item 3. When choosing the second replica (position 1), item 3 is
sixteen times (0x100000 / 0x10000) more likely to be chosen than items 7 and 13.

By default the weight_set of each bucket exactly matches the content of
item_weights for each position to ensure backward compatibility.

bucket_straw2_choose hashes each item's id to compute its draw. The same ids
are also used to index buckets, so they must be unique. For each bucket, an
array of replacement ids (one per item) can be provided for placement
purposes; these are used in place of the item ids. If no replacement ids are
provided, the legacy behavior is preserved.

Signed-off-by: Loic Dachary <loic@dachary.org>
2017-04-18 09:39:42 +02:00
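To make the weight_set and replacement-id behavior concrete, here is a sketch of a straw2-style choice. Each item's draw is ln(u) / weight, where u comes from hashing the input x, the item id and r; the largest draw wins, which makes an item's selection probability proportional to its weight at the requested position. The splitmix64 mixer and floating-point ln() are stand-ins for CRUSH's own hash and fixed-point ln, and Straw2BucketSketch and straw2_choose_sketch are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Stand-in 64-bit mixer (splitmix64 finalizer); CRUSH itself uses a
// different hash and a fixed-point ln(), which this sketch does not model.
static uint64_t mix64(uint64_t z) {
  z += 0x9e3779b97f4a7c15ULL;
  z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
  z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
  return z ^ (z >> 31);
}

struct Straw2BucketSketch {
  std::vector<int> items;                         // ids returned to the caller
  std::vector<std::vector<uint32_t>> weight_set;  // [position][item index], 16.16 fixed point
  std::vector<int> ids;                           // optional replacement ids used only for hashing
};

// Chooses one item for input x, replica rank r and replica position.
int straw2_choose_sketch(const Straw2BucketSketch &b, uint32_t x, int r, std::size_t position) {
  // Assumption: positions beyond the weight_set reuse its last row.
  const std::vector<uint32_t> &w = b.weight_set[std::min(position, b.weight_set.size() - 1)];
  int best = -1;
  double best_draw = -std::numeric_limits<double>::infinity();
  for (std::size_t i = 0; i < b.items.size(); ++i) {
    if (w[i] == 0) continue;  // zero weight: never chosen
    int hash_id = b.ids.empty() ? b.items[i] : b.ids[i];
    uint64_t h = mix64(mix64(mix64(x) ^ static_cast<uint32_t>(hash_id)) ^ static_cast<uint32_t>(r));
    double u = (static_cast<double>(h >> 11) + 1.0) / 9007199254740992.0;  // u in (0, 1]
    double draw = std::log(u) / w[i];  // exponential race: P(win) is proportional to w[i]
    if (draw > best_draw) {
      best_draw = draw;
      best = b.items[i];
    }
  }
  return best;
}
```

Counting the chosen items for the example matrix over many inputs gives ratios close to the weight ratios in each position's column; supplying ids changes which inputs map where, but the returned value is always one of the real items.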
Loic Dachary
18245ecd78 crush: cleanup test memory leaks
Signed-off-by: Loic Dachary <loic@dachary.org>
2017-04-18 09:39:38 +02:00
Loic Dachary
57d4c8d6fe crush: do not use TREE in tests
It is buggy and unsupported.

Signed-off-by: Loic Dachary <loic@dachary.org>
2017-04-18 09:38:54 +02:00
Jason Dillaman
fcb42c7076 librados: expose new checksum osd operation
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2017-04-17 22:54:27 -04:00
Jason Dillaman
2f4b8c0cf9 osd: new op for retrieving an extent checksum
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2017-04-17 22:54:21 -04:00
Yan, Zheng
48f3c91004 mds: drop partial entry and adjust write_pos when opening PurgeQueue
At the tail of the journal, there can be a partially written entry. Before
appending new entries to the journal, we need to drop any partially written
entry and adjust write_pos. For the mds log, a partially written entry is
detected and dropped when replaying the journal.

For the PurgeQueue journal, we don't replay the whole journal when the MDS
starts, so before appending a new entry to the journal we need to drop
any partially written entry and adjust write_pos ourselves.

The previous patch makes the journal header write_pos align to the boundary
of a fully flushed entry, so we can start looking for a partially written
entry from the journal header write_pos. This should be fast even when the
purge queue is very large.

Fixes: http://tracker.ceph.com/issues/19450
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
2017-04-18 10:20:09 +08:00
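The recovery step can be sketched as a forward scan: starting from a write_pos known to sit on an entry boundary, walk complete entries and stop at the first one whose bytes run past the end of the data; everything after that point is the partial tail to drop. The framing (a 4-byte little-endian length followed by the payload) and the names append_entry and find_valid_write_pos are assumptions for illustration; the Journaler's real on-disk entry format is not modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed framing for this sketch: 4-byte little-endian payload length, then payload.
void append_entry(std::string &log, const std::string &payload) {
  uint32_t len = static_cast<uint32_t>(payload.size());
  for (int i = 0; i < 4; ++i)
    log.push_back(static_cast<char>((len >> (8 * i)) & 0xffu));
  log += payload;
}

// Scans forward from `start`, which must sit on an entry boundary, and returns
// the offset just past the last complete entry. Bytes beyond it are a partial tail.
uint64_t find_valid_write_pos(const std::string &log, uint64_t start) {
  uint64_t pos = start;
  while (pos + 4 <= log.size()) {
    uint32_t len = 0;
    for (int i = 0; i < 4; ++i)
      len |= static_cast<uint32_t>(static_cast<uint8_t>(log[pos + i])) << (8 * i);
    if (pos + 4 + len > log.size()) break;  // payload truncated: partial entry
    pos += 4 + len;
  }
  return pos;
}
```

Starting the scan at the header's write_pos rather than at offset 0 is what keeps this cheap for a large queue; truncating to the returned offset drops the partial tail.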
Yan, Zheng
8ae2962b79 osdc/Journaler: make header write_pos align to boundary of flushed entry
This can speed up the process that detects and drops a partially written
entry at the log tail.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
2017-04-18 10:20:09 +08:00
optimistyzy
bc7c6b3fc0 bluestore/NVMEDEVICE: update SPDK to version 17.03
Make some minor changes:

1. Restrict the total DPDK memory used by an OSD instance, and rename
bluestore_spdk_socket_mem to bluestore_spdk_mem.

2. Use spdk_env_init instead of rte_eal_init. The SPDK library invokes
rte_eal_init itself, which reduces the initialization parameter
conversion and checking needed on our side. Also, SPDK 17.03 invokes
spdk_vtophys_register_dpdk_mem() (an internal function) from
spdk_env_init, and that function must be called.

Signed-off-by: optimistyzy <optimistyzy@gmail.com>
2017-04-18 09:44:47 +08:00
Jason Dillaman
7e3b4ca803 common/Checksummer: allow the initial/seed value to be supplied
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2017-04-17 19:10:47 -04:00
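A supplied seed lets a caller continue a running checksum across buffers, or compute each chunk of an extent independently from the same initial value. The sketch below uses a bitwise CRC-32C with no built-in pre- or post-inversion, so the seed is the raw register value; crc32c_update and chunk_checksums are hypothetical helpers, not the Checksummer API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
// The caller supplies the starting register value as `seed`.
uint32_t crc32c_update(uint32_t seed, const uint8_t *p, std::size_t n) {
  uint32_t crc = seed;
  for (std::size_t i = 0; i < n; ++i) {
    crc ^= p[i];
    for (int k = 0; k < 8; ++k)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return crc;
}

// One checksum per chunk_size bytes, each started from the same seed.
std::vector<uint32_t> chunk_checksums(uint32_t seed, const std::string &data, std::size_t chunk_size) {
  std::vector<uint32_t> out;
  if (chunk_size == 0) return out;
  const auto *bytes = reinterpret_cast<const uint8_t *>(data.data());
  for (std::size_t off = 0; off < data.size(); off += chunk_size) {
    std::size_t len = std::min(chunk_size, data.size() - off);
    out.push_back(crc32c_update(seed, bytes + off, len));
  }
  return out;
}
```

Because the seed is the raw register, feeding the result of one call into the next produces the same value as checksumming the concatenated data in one pass.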
David Zafman
ebab8b1f4f osd: Give requested scrub work a higher priority
Once a requested scrub has started, we now queue its work at a higher
priority than scheduled scrubs.

Fixes: http://tracker.ceph.com/issues/15789

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 14:58:02 -07:00
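The effect of this change can be pictured as a priority queue in which requested scrub work outranks scheduled scrub work. The numeric priorities and the ScrubWork type below are placeholders chosen for illustration, not Ceph's values or types.

```cpp
#include <cassert>
#include <queue>
#include <string>

// Placeholder priorities, not Ceph's configured values.
constexpr unsigned kScheduledScrubPriority = 5;
constexpr unsigned kRequestedScrubPriority = 120;

struct ScrubWork {
  unsigned priority;
  std::string pgid;
  // std::priority_queue is a max-heap, so higher priority is served first.
  bool operator<(const ScrubWork &o) const { return priority < o.priority; }
};

unsigned scrub_priority(bool requested) {
  return requested ? kRequestedScrubPriority : kScheduledScrubPriority;
}
```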
Sage Weil
48074a96ec Merge pull request #14591 from tchaikov/wip-readme-headings
README.md: use github heading syntax to mark the headings

Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
2017-04-17 16:43:43 -05:00
Sage Weil
37e9a874af Merge pull request #13968 from dzafman/wip-15912-followon
osd,mon: misc full fixes and cleanups

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: John Spray <john.spray@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2017-04-17 16:42:13 -05:00
Yuri Weinstein
287f94d0c0 Merge pull request #14440 from liewegas/wip-status-flags
osd/OSDMap: hide require_*_osd and sortbitwise flags

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2017-04-17 13:34:37 -07:00
David Zafman
3becdd3138 test: Test health check output for full ratios
Test out of order ratios summary and details
Test various full osd conditions summary and details

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 13:02:57 -07:00
Matt Benjamin
b01bf489ad Merge pull request #14561 from linuxbox2/wip-rgw-reread-dir
rgw_file: fix readdir after dirent-change
2017-04-17 14:58:35 -04:00
Kefu Chai
472626b4ec README.md: use github heading syntax to mark the headings
Signed-off-by: Kefu Chai <kchai@redhat.com>
2017-04-17 23:49:32 +08:00
Casey Bodley
7e19db2c2f Merge pull request #14466 from fangyuxiangGL/bi
rgw: bucket stats display bucket index type

Reviewed-by: liuchang0812 <liuchang0812@gmail.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
2017-04-17 11:04:02 -04:00
David Zafman
2522307865 mon, osd: Add detailed full information for now in the mon
Show ceph health doc output in the correct order

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 08:02:50 -07:00
David Zafman
e4cf10d3d8 mon: Issue warning or error if a full ratio out of order
The full ratios should be in this order: nearfull, backfillfull, full, failsafe full

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 08:02:50 -07:00
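The ordering check can be sketched as a pairwise comparison of adjacent ratios. Whether equal ratios count as out of order is an assumption here (only a strictly decreasing pair is flagged), and the function name and message text are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns one message per adjacent pair that breaks the expected order
// nearfull <= backfillfull <= full <= failsafe full.
std::vector<std::string> out_of_order_full_ratios(double nearfull, double backfillfull,
                                                  double full, double failsafe_full) {
  struct Named { const char *name; double value; };
  const Named order[] = {{"nearfull", nearfull}, {"backfillfull", backfillfull},
                         {"full", full}, {"failsafe_full", failsafe_full}};
  std::vector<std::string> problems;
  for (int i = 0; i + 1 < 4; ++i) {
    if (order[i].value > order[i + 1].value)
      problems.push_back(std::string(order[i].name) + " > " + order[i + 1].name);
  }
  return problems;
}
```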
David Zafman
c83f11de00 mon: Always fix-up full ratios when specified incorrectly in config
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 08:02:50 -07:00
David Zafman
afd739bed6 mon: Use currently configure full ratio to determine available space
This fixes a bug where available space was not adjusted based on the
currently configured full ratio, but on the initial default value of
mon_osd_full_ratio instead.

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 08:00:24 -07:00
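As a rough sketch of the corrected arithmetic, assume the space still usable before an OSD counts as full is the full ratio times its capacity, minus what is already used. avail_before_full is a hypothetical helper; the point is only that the ratio passed in should be the currently configured one, not the initial mon_osd_full_ratio default.

```cpp
#include <cassert>
#include <cstdint>

// Space (in KB) that can still be written before usage reaches full_ratio.
uint64_t avail_before_full(uint64_t kb, uint64_t kb_used, double full_ratio) {
  double limit = static_cast<double>(kb) * full_ratio;
  if (static_cast<double>(kb_used) >= limit) return 0;
  return static_cast<uint64_t>(limit - static_cast<double>(kb_used));
}
```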
David Zafman
1fafec2175 osd: check_full_status() remove bogus comment and use equivalent computation
We actually compute kb_used as kb - kb_avail, so we don't have the
statfs() system call issue of non-privileged f_bavail vs f_bfree. The
comment assumed that used was really like (blocks - f_bfree). It is not.

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 08:00:24 -07:00
David Zafman
84088568b5 osd: Check whether any OSD is full before starting recovery
Add event RecoveryTooFull to move to NotRecovering state

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 08:00:24 -07:00
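The gating logic can be sketched as: before recovery starts, check every OSD holding a shard of the PG against the set of OSDs flagged full, and if any match, take the RecoveryTooFull path into NotRecovering instead. The enum and helper names are hypothetical simplifications of the PG state machine.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, heavily simplified outcome of the recovery gate.
enum class RecoveryState { Recovering, NotRecovering };

// True if any OSD holding a shard of the PG is flagged full.
bool any_shard_full(const std::vector<int> &shard_osds, const std::set<int> &full_osds) {
  for (int osd : shard_osds)
    if (full_osds.count(osd) != 0) return true;
  return false;
}

// Mirrors the RecoveryTooFull event: a full shard keeps the PG out of recovery.
RecoveryState try_start_recovery(const std::vector<int> &shard_osds, const std::set<int> &full_osds) {
  return any_shard_full(shard_osds, full_osds) ? RecoveryState::NotRecovering
                                               : RecoveryState::Recovering;
}
```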
David Zafman
27e14504f6 osd: Add PG state and flag for too full for recovery
New state machine state NotRecovering
New PG state PG_STATE_RECOVERY_TOOFULL

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 08:00:24 -07:00
David Zafman
c7e8dcad34 osd: Add check_osdmap_full() to check for shard OSD fullness
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 08:00:24 -07:00
David Zafman
94e253ce37 osd: Rename backfill_request_* to recovery_request_*
To be used by both recovery and backfill

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 08:00:24 -07:00
David Zafman
1711ccdec7 osd: Check failsafe full and crash on push/pull
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 08:00:24 -07:00
David Zafman
1e2fde1012 osd: Revamp injectfull op to support all full states
Use check_* for injectable full checks
Use is_* to just test simple cur_state

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 08:00:24 -07:00
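The split between the two helper families can be sketched like this: is_* looks only at cur_state, while check_* first consults an injected state so tests can simulate fullness. The injection counter and all names here are assumptions for illustration.

```cpp
#include <cassert>

// Hypothetical OSD-local fullness states in increasing severity.
enum class FullState { NONE, NEARFULL, BACKFILLFULL, FULL, FAILSAFE };

class FullStatusSketch {
 public:
  FullState cur_state = FullState::NONE;

  // Test hook (assumed shape): make the next `count` check_* calls see `s`.
  void inject(FullState s, int count) {
    injected_ = s;
    inject_count_ = count;
  }

  // is_*: a plain test of cur_state; injection never affects it.
  bool is_full() const { return cur_state >= FullState::FULL; }

  // check_*: honours an injected state while injections remain.
  bool check_full() { return effective_state() >= FullState::FULL; }

 private:
  FullState effective_state() {
    if (inject_count_ > 0) {
      --inject_count_;
      return injected_;
    }
    return cur_state;
  }

  FullState injected_ = FullState::NONE;
  int inject_count_ = 0;
};
```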
David Zafman
a5731076ad osd: Handle backfillfull_ratio just like nearfull and full
Add BACKFILLFULL as a local OSD cur_state
Notify monitor of this new fullness state

Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 08:00:24 -07:00
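Adding BACKFILLFULL places a new threshold between nearfull and full. The sketch below maps usage to the most severe state reached, assuming used space is kb - kb_avail and that reaching a ratio (>=) enters the state; the enum, struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class FullState { NONE, NEARFULL, BACKFILLFULL, FULL, FAILSAFE };

struct FullRatios {
  double nearfull, backfillfull, full, failsafe_full;
};

// Maps usage to the most severe state whose ratio has been reached.
FullState classify_fullness(uint64_t kb, uint64_t kb_avail, const FullRatios &r) {
  if (kb == 0) return FullState::NONE;
  uint64_t kb_used = kb_avail < kb ? kb - kb_avail : 0;
  double ratio = static_cast<double>(kb_used) / static_cast<double>(kb);
  if (ratio >= r.failsafe_full) return FullState::FAILSAFE;
  if (ratio >= r.full) return FullState::FULL;
  if (ratio >= r.backfillfull) return FullState::BACKFILLFULL;
  if (ratio >= r.nearfull) return FullState::NEARFULL;
  return FullState::NONE;
}
```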
David Zafman
0264bbddb7 osd: For testing full disks add injectfull socket command
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 07:58:30 -07:00
David Zafman
9dd6952999 common: Bump ratio for backfillfull from 85% to 90%
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 07:58:30 -07:00
David Zafman
79a4ac41c5 common: Remove unused config option osd_recovery_threads
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 07:58:30 -07:00
David Zafman
79124330c7 osd: too_full_for_backfill() returns ostream for reason
Signed-off-by: David Zafman <dzafman@redhat.com>
2017-04-17 07:58:30 -07:00
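Returning the reason through a stream lets the caller decide whether and where to log it. A sketch of the pattern follows; the ratio parameters and the function name are additions for this self-contained example, not the OSD's actual signature.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Hypothetical signature: returns true when backfill should be refused and
// appends a human-readable reason to `reason`.
bool too_full_for_backfill_sketch(double cur_ratio, double backfillfull_ratio, std::ostream &reason) {
  if (cur_ratio >= backfillfull_ratio) {
    reason << "current usage ratio " << cur_ratio
           << " >= backfillfull ratio " << backfillfull_ratio;
    return true;
  }
  return false;
}
```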