Commit Graph

31953 Commits

Author SHA1 Message Date
Sage Weil
561869d9c1 Merge pull request #1376 from ceph/wip-7608
test: Fix tiering test cases to use --force-nonempty

Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-05 12:35:56 -08:00
David Zafman
e016e83bce test: Fix tiering test cases to use --force-nonempty
Fixes: #7608

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-03-05 12:31:29 -08:00
Sage Weil
8106adee4b Merge pull request #1375 from ceph/wip-pgmap-stat
mon/PGMap: return empty stats if pool is not in sum

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-03-05 11:07:03 -08:00
Sage Weil
f6edceefe2 mon/PGMap: return empty stats if pool is not in sum
Greg was right!

When a pool is created, the PGs are not added to the PGMap until the *next*
proposal.  Weaken the assert here and return empty stats for non-existent
(new) pools so that a pool create + tier add sequence does not crash.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-05 10:44:41 -08:00
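The behavior described in f6edceefe2 can be sketched roughly as follows. This is a minimal Python illustration, not the Ceph C++ code: the `PGMapSketch` and `PoolStats` classes and the `pg_pool_sum` dictionary are assumed names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PoolStats:
    """Hypothetical per-pool counters; all zeros stands for 'empty stats'."""
    num_objects: int = 0
    num_bytes: int = 0


@dataclass
class PGMapSketch:
    # A pool shows up here only after the proposal that follows its creation.
    pg_pool_sum: dict = field(default_factory=dict)

    def get_pool_stats(self, pool_id: int) -> PoolStats:
        # Rather than asserting that the pool is present, hand back empty
        # stats for a pool that is not in the summary yet.
        return self.pg_pool_sum.get(pool_id, PoolStats())
```

With this shape, a lookup for a freshly created pool yields zeroed stats instead of tripping an assertion.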
Sage Weil
4901347e89 Merge pull request #1373 from ceph/wip-crush-json
crush: revise JSON format for 'item' type

Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-05 08:52:45 -08:00
John Spray
1685c6f75c crush: revise JSON format for 'item' type
Commit a7e9a7b648 changed the JSON format of CRUSH rules
such that the 'item' attribute on a step was sometimes
an integer and sometimes a string.

This commit separates the integer and string representations
so that tools which rely on an 'item' consistently being an
integer ID will work.

Signed-off-by: John Spray <john.spray@inktank.com>
2014-03-05 16:28:00 +00:00
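One way to picture the separation described in 1685c6f75c is the sketch below. It is an illustration only: the `dump_step` helper and the `item_name` key are assumptions made for the example, and the commit message itself does not name the key used for the string form.

```python
def dump_step(op, item_id, names):
    """Build a JSON-ready dict for one rule step.

    'item' always carries the integer ID. The string form, when known,
    goes into a separate key (called 'item_name' here as an assumption).
    """
    step = {"op": op, "item": item_id}
    if item_id in names:
        step["item_name"] = names[item_id]
    return step
```

A consumer that only needs the ID can then read `step["item"]` without checking its type.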
Samuel Just
8fdfece9fd ReplicatedPG::fill_in_copy_get: fix early return bug
This is not a leak: we are in an else block where cb must
be NULL.  The fix as introduced did not include braces on
the if causing the method to return unconditionally.

Fixes: #7604
Introduced in: 500206d809
Reviewed-by: David Zafman <david.zafman@inktank.com>
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-04 19:23:00 -08:00
Samuel Just
4bf28df229 Merge remote-tracking branch 'upstream/wip-7447' into firefly
Reviewed-by: Greg Farnum <greg@inktank.com>
2014-03-04 19:22:08 -08:00
Samuel Just
d0b1094ff7 ECBackend,ReplicatedPG: delete temp if we didn't get the transaction
We always send the transaction for operations on temp objects,
but if we didn't get the final transaction on the actual object,
we might end up failing to remove the temp object.  Thus, if
we get a sub op and don't have the transaction, just remove the
named temp objects.

Fixes: #7447
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-04 15:29:20 -08:00
Samuel Just
f2a4eec1d6 PGBackend/ECBackend: handle temp objects correctly
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-04 15:29:20 -08:00
Samuel Just
308ea1bd9e ECMsgTypes: fix constructor temp_added/temp_removed ordering to match users
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-04 15:29:20 -08:00
Samuel Just
3e219961a0 ReplicatedPG::finish_ctx: use correct snapdir prior version in events
Fixes: #7595
Reviewed-by: Greg Farnum <greg@inktank.com>
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-04 15:00:20 -08:00
Loic Dachary
4938212b69 Merge pull request #1360 from enovance/wip-brag
Fixes for ceph-brag

Reviewed-by: Loic Dachary <loic@dachary.org>
2014-03-04 12:41:54 +01:00
Babu Shanmugam
46b9f65506 Merge remote-tracking branch 'brag/master' into firefly
Signed-off-by: Babu Shanmugam <anbu@enovance.com>
2014-03-04 14:16:49 +05:30
Sage Weil
d223d3a7d2 Merge pull request #1352 from dachary/wip-7578
common: -- support for env_to_vec

Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-03 21:43:39 -08:00
Sage Weil
bcea57d61f Merge pull request #1342 from ceph/wip-cache-add
mon: add 'osd tier add-cache ...' command (DNM until after wip-tier-add)

Reviewed-by: Loic Dachary <loic@dachary.org>
2014-03-03 21:37:56 -08:00
Sage Weil
397e844397 Merge pull request #1335 from ceph/wip-tier-add
mon: prevent non-empty pools from being added as tiers

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-03-03 21:36:22 -08:00
Gregory Farnum
48e55d9881 Merge pull request #1358 from ceph/wip-2288
mds: check projected xattr when handling setxattr

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-03-03 21:19:40 -08:00
Sage Weil
49e54aba33 mon/OSDMonitor: fix race in 'osd tier remove ...'
Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-03 21:16:24 -08:00
Sage Weil
241b9e81f1 mon/OSDMonitor: fix some whitespace
Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-03 21:16:24 -08:00
Sage Weil
c029c2fbf1 mon/OSDMonitor: add 'osd tier add-cache <pool> <size>' command
This is a friendlier interface for setting up a cache tier with some
reasonable defaults (defined via config options).  This will simplify
the user experience and documentation.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-03 21:16:24 -08:00
Sage Weil
62e0eb7f2e mon/OSDMonitor: handle 'osd tier add ...' race/corner case
If you have two racing requests to add two different pools as a tier, the
committed checks will pass but their proposals will conflict.  Recheck the
pending pools for the same conditions and wait for a commit if they
occur.

Reported-by: Loic Dachary <loic@dachary.org>
Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-03 21:16:24 -08:00
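The recheck described in 62e0eb7f2e might look roughly like this. It is a simplified Python sketch under stated assumptions: pools are plain dictionaries, the only condition checked is "this pool is already a tier of something" (the real set of conditions is not spelled out in the commit message), and the return values are hypothetical.

```python
def tier_in_use(pools, tier):
    """Assumed condition: the pool is already recorded as a tier."""
    return pools.get(tier, {}).get("tier_of") is not None


def handle_tier_add(committed, pending, base, tier):
    """Return 'error', 'wait', or 'proposed' for one tier-add request."""
    # First pass: check against the committed state.
    if tier_in_use(committed, tier):
        return "error"
    # Second pass: apply the same check to the pending state. A hit here
    # means another request is in flight, so wait for it to commit.
    if tier_in_use(pending, tier):
        return "wait"
    pending.setdefault(tier, {})["tier_of"] = base
    return "proposed"
```

The point of the second pass is that two requests can both pass the committed check while only one of them can be proposed safely.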
Sage Weil
0e5fd0e322 osd: make default bloom hit set fpp configurable
Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-03 21:16:24 -08:00
Sage Weil
eddf7b68ff osd/ReplicatedPG: fix agent division by zero
If the pool is empty we cannot divide by the object count.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-03 21:16:24 -08:00
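The guard described in eddf7b68ff amounts to something like the following sketch. The function name and the choice of a per-object average are assumptions for illustration; the commit message only says that the object count cannot be used as a divisor when the pool is empty.

```python
def average_object_size(num_bytes, num_objects):
    """Return bytes per object, or 0 for an empty pool."""
    if num_objects == 0:
        # Empty pool: there is nothing to average, so skip the division.
        return 0
    return num_bytes // num_objects
```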
Sage Weil
08efb45889 OSDMonitor: do not add non-empty tier pool unless forced
In general, users should not use non-empty pools as new tiers or else
things can behave strangely:

 - if the data sets are unrelated, behavior will be... strange.
 - if the cache pool is not "new" and does not do the OMAP flag, the OSD
   will not know not to flush omap objects to an EC base tier
 - probably other random stuff I'm forgetting

Allow a user to shoot themselves in the foot with --force-nonempty.

Implements: #7457
Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-03 21:11:17 -08:00
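The check introduced by 08efb45889 can be sketched as below. This is an illustrative Python outline, not the monitor code: the function name, the return shape, and the wording of the error message are assumptions; only the `--force-nonempty` flag comes from the commit message.

```python
def can_add_tier(tier_num_objects, force_nonempty=False):
    """Return (allowed, reason) for adding a pool as a tier."""
    if tier_num_objects > 0 and not force_nonempty:
        # Refuse by default; the caller may override explicitly.
        return False, "tier pool is not empty; use --force-nonempty to override"
    return True, ""
```

An empty pool is always accepted, while a pool that already holds objects is accepted only when the override is given.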
Yan, Zheng
12909bb607 mds: check projected xattr when handling setxattr
Fixes: #2288
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-03-04 10:55:23 +08:00
Samuel Just
198b0aa268 Merge pull request #1354 from ceph/wip-7563
Wip 7563

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-03-03 17:05:53 -08:00
Samuel Just
192a27cac6 Merge pull request #1355 from ceph/wip-osd-verbosity
osd: be a bit more verbose on startup

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-03-03 16:38:54 -08:00
Samuel Just
20fe162ece TestPGLog: tests for proc_replica_log/merge_log equivalence
We need the merge_log and proc_replica_log paths to result in the
same missing set.  This patch adds some machinery for specifying
a log merge scenario and comparing both paths to the same correct
result.  This machinery also makes it a bit easier to read and add
new tests.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-03 16:05:17 -08:00
Samuel Just
9a64947ca1 TestPGLog::proc_replica_log: adjust wonky test
This test didn't quite make sense since the divergent entry
cannot be from a newer epoch.  It also didn't quite match the
diagram.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-03 16:05:17 -08:00
Samuel Just
6b6065ab9d TestPGLog::proc_replica_log: adjust to corrected proc_replica_log behavior
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-03 16:05:17 -08:00
Samuel Just
97f35960a0 TestPGLog::proc_replica_log: add prior_version to some entries
Otherwise, the test logs are invalid.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-03 16:05:17 -08:00
Samuel Just
200e2964ea PGLog::proc_replica_log: _merge_divergent_entries based on truncated olog
We can't merge using the primary's log since we haven't decided whether
to send them a complete log yet.  Thus, merge based on the truncated olog
rather than the primary's log.  This is a consequence of the division
between trimming divergent entries in peering/unfound search and sending
a complete log to actual members of the actingbackfill set in activate().
_merge_divergent_entries on the truncated log and add_next_event() on the
newer entries result in the same missing/log regardless of the order in
which they are performed.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-03 16:05:16 -08:00
Samuel Just
b0357abcae PG.h:PGLogEntryHandler: remove silly cant_rollback logic
Also, we now call rollback in a reverse order, so there is no
need to reverse the entries again.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-03 16:05:16 -08:00
Samuel Just
c99b7e1985 PG,PGLog: replace _merge_old_entry with _merge_object_divergent_entries
The _merge_old_entry structure had trouble distinguishing between the
following cases:

missing: foo, 1,1
merge_old_entry modify 1,1 0,0
merge_old_entry modify 1,2 1,1

and
merge_old_entry modify 1,2 1,1

In the first case, we should end up with foo removed from missing
at the end.  In the second, we need foo added to missing at 1,1.
It's far simpler to present all of the divergent entries for a single
object at once.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-03 16:05:12 -08:00
Samuel Just
86b21e0b78 TestPGLog::merge_old_entry: ne.version cannot be oe.version
Otherwise, it would not be divergent!

Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-03 16:05:11 -08:00
Samuel Just
3dc4f10a9a TestPGLog::merge_old_entry: we no longer use merge_old_entry this way
This needs to be replaced with an equivalent test of
_merge_object_divergent_entries.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-03 16:05:11 -08:00
Samuel Just
ff329ac52b TestPGLog:rewind_divergent_log: set prior_version for delete
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-03 16:05:11 -08:00
Samuel Just
9e43dd6ee3 TestPGLog: ignore merge_old_entry return value
No callers use the merge_old_entry return value.  _merge_divergent_entries
won't have one.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-03 16:05:11 -08:00
Samuel Just
3cc9e2262c TestPGLog: not worth maintaining tests of assert behavior
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-03 16:05:11 -08:00
David Zafman
dda72dee70 Merge pull request #1356 from ceph/wip-7458
osd: stray pg ref on shutdown

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-03-03 14:47:38 -08:00
Samuel Just
a234053d42 OSD,config_opts: log osd state changes at level 0 instead
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-03-03 13:53:54 -08:00
Sage Weil
fd9c29b9b0 Merge pull request #1341 from ceph/wip-osd-status
osd: 'status' admin socket command

Reviewed-by: Loic Dachary <loic@dachary.org>
2014-03-03 11:21:11 -08:00
Sage Weil
68890b2009 osd: be a bit more verbose on startup
load_pgs can take a while and it is nice to know what ceph-osd is doing
without cranking up logging.

Did a quick audit of dout(1)'s and made this the default.  This lets us
see basic OSD state changes (load_pgs, boot, active) at the default level.

At this point all osd state changes should be logged at level 1.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-03-03 11:16:31 -08:00
Ilya Dryomov
bd9913ce64 Merge branch 'wip-hint' into firefly
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-03-03 20:37:24 +02:00
Ilya Dryomov
371a80cb0f librbd: prefix rbd writes with CEPH_OSD_OP_SETALLOCHINT osd op
In an effort to reduce fragmentation, prefix every rbd write with
a CEPH_OSD_OP_SETALLOCHINT osd op with an expected_write_size value set
to the object size (1 << order).  Backwards compatibility is taken care
of on the osd side.

"The CEPH_OSD_OP_SETALLOCHINT hint is durable, in that it's enough to
do it once.  The reason every rbd write is prefixed is that rbd doesn't
explicitly create objects and relies on writes creating them
implicitly, so there is no place to stick a single hint op into.  To
get around that we decided to prefix every rbd write with a hint (just
like write and setattr ops, hint op will create an object implicitly if
it doesn't exist)."

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-03-03 20:33:44 +02:00
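The write path described in 371a80cb0f can be pictured with the sketch below. It is a simplified Python model, not librbd: the `build_write_ops` helper, the tuple layout, and the lowercase op names are assumptions for illustration. Only the `expected_write_size` value of `1 << order` is taken from the commit message.

```python
def build_write_ops(order, offset, data):
    """Return the op list for one rbd write: hint first, then the write."""
    object_size = 1 << order  # e.g. order 22 gives 4 MiB objects
    return [
        # The hint goes first so that an implicitly created object
        # already carries it when the write lands.
        ("setallochint", {"expected_write_size": object_size}),
        ("write", {"offset": offset, "data": data}),
    ]
```

Because the hint is repeated on every write, no separate object-creation step is needed to carry it.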
Ilya Dryomov
8e49bc32c8 FileStore: add option to cap alloc hint size
Add a new config option, filestore_max_alloc_hint_size, to cap
SETALLOCHINT hint size.  The unit is a byte, the default value is
1 megabyte.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-03-03 20:33:44 +02:00
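The cap described in 8e49bc32c8 reduces to a clamp, sketched here in Python. The function and constant names are hypothetical; the 1 megabyte default and the byte unit come from the commit message.

```python
# Default for filestore_max_alloc_hint_size, in bytes (1 megabyte).
DEFAULT_MAX_ALLOC_HINT_SIZE = 1 << 20


def capped_hint_size(expected_write_size, max_size=DEFAULT_MAX_ALLOC_HINT_SIZE):
    """Clamp the requested hint size to the configured maximum."""
    return min(expected_write_size, max_size)
```

With the default, a 4 MiB hint from an rbd object would be reduced to 1 MiB, while smaller hints pass through unchanged.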
Ilya Dryomov
1f5b796f58 FileStore: introduce XfsFileStoreBackend class
Introduce XfsFileStoreBackend class, currently the only filestore
backend implementing SETALLOCHINT op.  This commit adds a build-time
dependency on libxfs as xfs-specific ioctl (XFS_IOC_FSSETXATTR /
XFS_XFLAG_EXTSIZE) is used to implement the new set_alloc_hint()
method.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-03-03 20:33:44 +02:00
Ilya Dryomov
391257c00e FileStore: refactor FS detection checks a bit
Refactor FS detection checks in FileStore::_detect_fs() so that they
look the same as the ones in FileStore::mkfs().  This is in preparation
for adding XfsFileStoreBackend class.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-03-03 20:33:44 +02:00
Ilya Dryomov
6456802394 osd: add SETALLOCHINT operation
This is primarily for librbd/krbd's benefit and is supposed to combat
fragmentation:

"... knowing that rbd images have a 4m size, librbd can pass a hint
that will let the osd do the xfs allocation size ioctl on new files so
that they are allocated in 1m or 4m chunks.  We've seen cases where
users with rbd workloads have very high levels of fragmentation in xfs
and this would mitigate that and probably have a pretty nice
performance benefit."

SETALLOCHINT is considered advisory, so our backwards compatibility
mechanism here is to set FAILOK flag for all SETALLOCHINT ops.

xfs is hooked up in the subsequent commits.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-03-03 20:33:44 +02:00
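The backwards compatibility idea in 6456802394 can be illustrated with the sketch below. It is a toy Python model under stated assumptions: ops are `(name, failok)` pairs, `supported` is the set of op names a given OSD understands, and raising `ValueError` stands in for failing the request. None of these names come from the Ceph source.

```python
def execute_ops(ops, supported):
    """Run ops in order and return the names of the ops that ran.

    An op that is not supported fails the whole request unless it was
    sent with the FAILOK flag, in which case it is skipped.
    """
    done = []
    for name, failok in ops:
        if name not in supported:
            if failok:
                continue  # advisory op: ignore the failure and move on
            raise ValueError(f"unsupported op: {name}")
        done.append(name)
    return done
```

Under this model, an OSD that does not understand the hint still performs the write that follows it.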