33126 Commits

Author SHA1 Message Date
Sage Weil
6ff645f592 osd/PG: fix repair_object when missing on primary
If the object is missing on the primary, we need to fully populate the
missing_loc.needs_recovery_map.  This broke with the recent refactoring of
recovery for EC, somewhere around 84e2f39c55.

Fixes: #8008
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-11 15:39:23 -07:00
Sage Weil
19acfebc4d ceph_test_librados_tier: tolerate EAGAIN from pg scrub command
We may get EAGAIN if the osd happens to be down, for example due to
thrashing.  Try a few times and then give up.

Note that in the other place where we try to scrub, we don't even check the
return value, as we are poking every pg in the pool.  And the scrub commands
get lost due to any peering event, etc.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-11 14:48:26 -07:00
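The "try a few times and then give up" behaviour described in 19acfebc4d can be sketched as follows. This is a minimal Python illustration, not the code from ceph_test_librados_tier: the helper name run_with_retries, the attempts parameter, and the convention that the command returns a negative errno value are all assumptions made for the example.

```python
import errno


def run_with_retries(command, attempts=5):
    """Call command() until it stops returning -EAGAIN, at most `attempts` times.

    `command` is a hypothetical zero-argument callable that returns 0 on
    success or a negative errno value on failure.
    """
    result = -errno.EAGAIN
    for _ in range(attempts):
        result = command()
        if result != -errno.EAGAIN:
            # Success or a non-transient error: stop retrying.
            break
    # Either the final result, or -EAGAIN if every attempt was transient.
    return result
```

A real test would likely sleep between attempts so that a thrashed OSD has time to come back up; the sketch leaves that out to stay short.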
Gregory Farnum
b1db075147 Merge pull request #1656 from ceph/wip-osd-boot
mon: fix osd boot check

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-04-11 14:44:01 -07:00
Sage Weil
39b9d9d8c4 mon/OSDMonitor: fix osd epoch in boot check
This was introduced in 4c99e978a7 and was
incorrect; boot_epoch is the previous epoch the osd booted in, not the
latest map epoch that the OSD currently has.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-11 14:32:21 -07:00
Sage Weil
78df66f520 osd/ReplicatedPG: skip missing hit_sets when loading into memory
We weren't handling hit_sets that were missing.

Two changes here:

1- Load the hit_sets oldest to newest.  That means that if we stop partway
   through loading, and then add another to the end of the list, and then
   try again to load some more, we will still catch them all.
2- If the object is missing, stop.  We'll try again the next time
   agent_work() is called.

Fixes: #8077
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-11 13:14:58 -07:00
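The two changes listed in 78df66f520 can be modelled with a short Python sketch. It is only an illustration of the ordering argument, not the ReplicatedPG code: load_hit_sets, archive, store, and loaded are hypothetical names, and plain dicts stand in for the object store and the in-memory hit_set map.

```python
def load_hit_sets(archive, store, loaded):
    """Load archived hit_sets into `loaded`, oldest first.

    archive: list of hit_set names ordered oldest to newest
    store:   dict of hit_set objects that currently exist
    loaded:  dict of hit_sets already in memory (updated in place)
    """
    for name in archive:
        if name in loaded:
            # Already in memory from an earlier pass.
            continue
        if name not in store:
            # Object is missing: stop here and retry on the next call.
            break
        loaded[name] = store[name]
    return loaded
```

Because the walk always restarts from the oldest entry, a later call picks up where the previous one stopped and also reaches any hit_set appended to the end of the list in the meantime.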
Yan, Zheng
7077438be9 mds: finish table servers recovery after creating newfs
Fixes: #8054
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-12 03:23:11 +08:00
Sage Weil
052519ed4a Revert "mds: finish table servers recovery after creating newfs"
This reverts commit f6c20730c1.

This breaks single MDS startup.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-11 10:33:11 -07:00
Loic Dachary
f374591219 Merge pull request #1650 from dachary/wip-erasure-code-doc
erasure-code: document the ruleset-root profile parameter

Reviewed-by: Mark Nelson <mark.nelson@inktank.com>
2014-04-11 19:20:35 +02:00
Yehuda Sadeh
82d8397ad8 rgw: update bucket / object rewrite
Get code up to date for firefly, need to pass bucket owner for quota
related functionality.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2014-04-11 10:08:23 -07:00
Yehuda Sadeh
6f2ee99a22 radosgw-admin: add some conditions for bucket rewrite
--min-rewrite-size, --max-rewrite-size to specify object size
 conditions

 --start-date, --end-date to specify object mtime conditions (format is
 YYYY-MM-DD[ HH:MM:SS])

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit dbb2013eaf1eedf2a5cbcae1d314385ea033dbb8)
2014-04-11 10:08:23 -07:00
Yehuda Sadeh
9130e7d692 radosgw-admin: new 'bucket rewrite' command
Iterates through objects and rewrites them.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 3af09b9f9f6eaf2c80f0f0fd2ea87ac16c7ffcf8)

Conflicts:
	src/rgw/rgw_admin.cc
2014-04-11 10:08:23 -07:00
Yehuda Sadeh
f12bccc98b radosgw-admin: check params for object rewrite
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 28c716bccd955eaf68e9c739139c901ca4f419f7)
2014-04-11 10:08:23 -07:00
Josh Durgin
fcd94d6abd Merge pull request #1630 from ceph/wip-7450
Wip 7450

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-11 10:04:57 -07:00
David Zafman
ec4a6ce0c2 Merge pull request #1635 from ceph/wip-7437
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-11 08:33:45 -07:00
Sage Weil
1ad5bdcd76 Merge pull request #1641 from ceph/wip-multimds
mds: guarantee message ordering when importing non-auth caps

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-11 06:59:11 -07:00
Babu Shanmugam
60cb2aecce Merge branch 'master' of https://github.com/enovance/ceph-brag into firefly
2014-04-11 13:57:33 +00:00
Sage Weil
204b7a469c Merge pull request #1645 from ceph/wip-8054
mds: finish table servers recovery after creating newfs

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-11 06:56:39 -07:00
Babu Shanmugam
ad40356ddb Included the total cluster size in components_count object
Signed-off-by: Babu Shanmugam <anbu@enovance.com>
2014-04-11 13:28:50 +00:00
Babu Shanmugam
78fcb1a0f0 Fetching the date from ceph osd dump as that is more reliable across ceph versions
Signed-off-by: Babu Shanmugam <anbu@enovance.com>
2014-04-11 12:05:02 +00:00
Wido den Hollander
99d74eef82 doc: Add additional information over CloudStack and RBD
2014-04-11 13:59:31 +02:00
Loic Dachary
db3e0b5129 erasure-code: document the ruleset-root profile parameter
If unspecified it is ruleset-root=default and will translate into

   take default

when a ruleset is created for an erasure-code pool.

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-04-11 13:51:46 +02:00
Babu Shanmugam
6d42bd9f91 While generating crush_types, avoiding item parsing, and calculating type count by just iterating through buckets list
Signed-off-by: Babu Shanmugam <anbu@enovance.com>
2014-04-11 11:28:14 +00:00
Babu Shanmugam
1987832569 Bug fix in the way crush_type is extracted from osd crush dump
Signed-off-by: Babu Shanmugam <anbu@enovance.com>
2014-04-11 08:42:25 +00:00
Josh Durgin
e46af060c3 Merge pull request #1647 from ceph/wip-lockdep
a couple of lockdep fixes

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-11 00:11:18 -07:00
Sage Weil
072d3711d6 RWLock: make lockdep id mutable
This allows us to keep the lock/unlock methods const, as per commit
970d53fc0f.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-10 21:36:37 -07:00
Sage Weil
da0d38208b Revert "RWLock: don't assign the lockdep id more than once"
This reverts commit 957ac3cbe3.

It's important to assign these for all operations for cases where
g_lockdep isn't yet true when the constructor runs.  This is true
for the HeartbeatMap rwlock, among other things, as that thread
is created during early startup before lockdep is enabled.  All
of the lockdep hooks assume that they can assign ids on the fly
and not tracking them here breaks things.

Conflicts:

	src/common/RWLock.h
2014-04-10 21:34:51 -07:00
Sage Weil
632098f2a6 common_init: remove dup lockdep message
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-10 21:34:03 -07:00
John Wilkins
8c38ec7a7e Merge pull request #1646 from dmick/wip-erasure-doc
doc: Wordsmith the erasure-code doc a bit.
2014-04-10 20:02:56 -07:00
Dan Mick
3c54a49e39 Wordsmith the erasure-code doc a bit
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2014-04-10 19:55:52 -07:00
Yan, Zheng
f6c20730c1 mds: finish table servers recovery after creating newfs
Fixes: #8054
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-11 09:57:29 +08:00
Sage Weil
756e36260d Merge pull request #1643 from ceph/wip-8062
mon/OSDMonitor: ignore boot message from before last up_from

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-04-10 18:25:23 -07:00
Yan, Zheng
3db7486128 mds: issue new caps before starting log entry
Locker::issue_new_caps() calls Locker::eval(), which may dispatch
other requests.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-11 08:39:55 +08:00
David Zafman
07e8ee208e test: Add EC testing to ceph_test_rados_api_aio
Fixes: #7437

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-10 17:22:29 -07:00
David Zafman
69afc59b3e test: Add multiple write test cases to ceph_test_rados_api_aio
Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-10 17:22:29 -07:00
David Zafman
d99f1d9f68 test, librados: aio read *return_value consistency, fix ceph_test_rados_api_aio
test:
  Add set_completion*PP() functions to cast arg to correct class
  Add return_value checks
  Add some reads with buffers larger than object size
  Check buffer length on reads
librados:
  Make sure *return_value() has bytes read in all cases

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-10 17:22:29 -07:00
David Zafman
3d290c2fa6 test: Add EC unaligned append write test to ceph_test_rados_api_io
Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-10 17:22:29 -07:00
David Zafman
39bf68c3ce pybind, test: Add python binding for append and add to test
Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-10 17:22:29 -07:00
David Zafman
d211381470 pybind: Check that "key" is a string
Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-10 17:22:29 -07:00
David Zafman
98127202c2 librados, test: Have write, append and write_full return 0 on success
Fix consistency of write, append, write_full, all return 0 on success
Include C (rados_*) variants, C++ ctx variants
and aio get_return_value() and rados_aio_get_return_value()

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-10 17:22:29 -07:00
Yehuda Sadeh
6ce7116fc1 civetweb: update subproject
Fixes: #7786

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2014-04-10 15:54:01 -07:00
Yehuda Sadeh
43d837d3ec rgw: radosgw-admin object rewrite
A radosgw-admin command that copies the object into itself while
preserving mtime and attributes so that data can be restriped.
Especially useful when migrating from argonaut (where objects
weren't striped).

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 292c3c16fa87ed9b6d6abb22f45527ea2487c2a6)

Conflicts:
	src/rgw/rgw_admin.cc
2014-04-10 13:40:28 -07:00
Sage Weil
4c99e978a7 mon/OSDMonitor: ignore boot message from before last up_from
It is possible we will have a dup OSDBoot message queued up in the mon
and will process it again after that osd was marked up and then down.  If
that happens, we should ignore this message, not mark the osd back in with
the same address.

Fixes: #8062
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-10 13:34:58 -07:00
Sage Weil
28371a2463 Merge pull request #1624 from ceph/wip-6789
mon: Monitor: suicide on start if mon has been removed from monmap

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-10 11:01:43 -07:00
Sage Weil
a8f0953974 osd/ReplicatedPG: adjust obc + snapset_obc locking strategy
Previously we assumed that if we had snapset_obc set, !exists on the head
and if we got the snapdir lock we were good to take the head lock too.
This is not the case when:

 - delete queued
   - takes wr lock on both head and snapdir
 - delete commits (but not yet applied)
 - stat
   - tries to take wr lock on head
     - blocks, toggles w=1 state on *head only*
 - copy-from
   - tries to take wr lock on snapdir, succeeds
   - tries to take wr lock on head, fails because w=1
     - fails the assert(got)

The problem is that the read and write paths are taking different locks
and we are expecting them to operate in synchrony.

Fix this by using the same ordering for reads as well as writes: if the
snapset_obc is defined, take the read lock on that too, just as we do with
a write.

Fixes: #8046
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-10 10:55:55 -07:00
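The locking rule from a8f0953974 (reads take the same locks, in the same order, as writes) can be illustrated with a toy Python model. Everything here is an assumption made for the sketch: RWState and take_locks are hypothetical names, the locks are non-blocking, and the "w=1" waiting-writer state mentioned in the commit message is not modelled.

```python
class RWState:
    """Toy non-blocking reader/writer lock state."""

    def __init__(self):
        self.readers = 0
        self.writer = False

    def try_read(self):
        if self.writer:
            return False
        self.readers += 1
        return True

    def try_write(self):
        if self.writer or self.readers:
            return False
        self.writer = True
        return True

    def release(self, write):
        if write:
            self.writer = False
        else:
            self.readers -= 1


def take_locks(head, snapdir, write):
    """Take the same kind of lock on snapdir (if present) and then on head.

    Reads and writes follow one ordering, and the operation is all or
    nothing: if any lock cannot be taken, locks already taken are released.
    """
    taken = []
    for lock in (snapdir, head):
        if lock is None:
            continue
        ok = lock.try_write() if write else lock.try_read()
        if not ok:
            for held in taken:
                held.release(write)
            return False
        taken.append(lock)
    return True
```

With both paths going through the same function, a reader can no longer hold a lock on the head alone while a writer holds the snapdir lock, which is the mismatch the commit message walks through.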
Joao Eduardo Luis
86b85947a2 mon: Monitor: suicide on start if mon has been removed from monmap
If the monitor has been marked as having been part of an existing quorum
and is no longer in the monmap, then it is safe to assume the monitor
was removed from the monmap.  In that event, do not allow the monitor
to start, as it will try to find its way into the quorum again (and
someone clearly stated they don't really want them there), unless
'mon force quorum join' is specified.

Fixes: #6789
Backport: dumpling, emperor

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2014-04-10 15:14:19 +01:00
Yan, Zheng
02048dcc30 mds: guarantee message ordering when importing non-auth caps
The current code allows importing non-auth caps while the inode is being
exported.  This can break message ordering, because the corresponding cap
import messages are sent after the flush session messages.  So they can
arrive at clients after the clients have already received cap import
messages from the new auth MDS of the inode.

The quick fix is to ignore MExportCaps when the inode is frozen.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-10 19:46:56 +08:00
Sage Weil
cf69bdbd74 Merge pull request #1639 from ceph/wip-multimds
Wip multimds

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-09 21:19:42 -07:00
Yan, Zheng
ac51fcac6b mds: include truncate_seq/truncate_size in filelock's state
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-10 11:13:30 +08:00
Yan, Zheng
808ba130ef mds: remove wrong assertion for remote frozen authpin
For a cross-authority rename, the MDS first freezes the source inode's
authpin.  This happens while the source dentry isn't locked, so by the time
the inode's authpin becomes frozen, the source dentry may have changed and
be linked to a different inode.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-10 11:13:20 +08:00
Sage Weil
860d72770c osdc/Objecter: move mapping into struct, helper
Move the common bits of Op and LingerOp into op_target_t and separate the
actual mapping calculation into calc_target().  This hugely simplifies
recalc_*op_target() by mostly just shuffling all of the same logic into
that helper.

There is one functional change in this patch: recalc_linger_op() now is
aware of the tiering logic that was previously only handled in
recalc_op_target().

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-09 18:02:27 -07:00