Commit Graph

25459 Commits

Author SHA1 Message Date
Sage Weil
aa14da20ed doc/release-notes: v0.60
Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-01 18:17:27 -07:00
Gary Lowell
6ffadce67e Merge branch 'next' 2013-04-01 17:57:45 -07:00
athanatos
f861d54c17 Merge pull request #181 from ceph/wip_4510
Scrub/repair should correctly handle truncation and EIO

Fixes #4510
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-01 16:32:34 -07:00
Samuel Just
fc13f1111c PG::_scan_list: assert if error is neither -EIO nor -ENOENT
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-01 16:27:34 -07:00
Samuel Just
3fa3b676f9 FileStore: rename debug_delete_obj to debug_obj_on_delete
This should make the method intent less confusing.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-01 16:27:34 -07:00
Samuel Just
40070cef3f PG: _scan_list can now handle EIO on read, stat, get_omap_header
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-01 16:27:34 -07:00
Samuel Just
fcec1a06dd ObjectStore: add allow_eio to read, stat, get_omap_header
This will allow enlightened callers to handle EIO.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-01 16:27:31 -07:00
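
The three EIO-handling commits above (fcec1a06dd, 40070cef3f, fc13f1111c) can be illustrated with a rough Python sketch. This is not the Ceph C++ code: ToyStore, scan_list, and the return conventions are hypothetical names chosen for illustration, and the idea that a read without allow_eio fails hard on EIO is an assumption rather than something the commit messages state.

```python
import errno


class ToyStore:
    """Hypothetical in-memory stand-in for an object store (not Ceph's API)."""

    def __init__(self, objects, corrupt=()):
        self.objects = dict(objects)   # name -> bytes
        self.corrupt = set(corrupt)    # names whose reads fail with EIO

    def read(self, name, allow_eio=False):
        """Return (result, data); result is a byte count or a negative errno."""
        if name not in self.objects:
            return -errno.ENOENT, b""
        if name in self.corrupt:
            # Assumption: callers that did not opt in cannot cope with EIO.
            assert allow_eio, "EIO reached a caller that cannot handle it"
            return -errno.EIO, b""
        data = self.objects[name]
        return len(data), data


def scan_list(store, names):
    """Build a toy scrub map, recording read errors instead of crashing."""
    scrub_map = {}
    for name in names:
        result, data = store.read(name, allow_eio=True)
        if result >= 0:
            scrub_map[name] = {"size": len(data), "read_error": False}
        elif result == -errno.EIO:
            scrub_map[name] = {"size": 0, "read_error": True}
        else:
            # Any error other than EIO or ENOENT is unexpected.
            assert result == -errno.ENOENT, f"unexpected error {result}"
            # A missing object is simply left out of the map.
    return scrub_map
```

In this sketch the scrub scan is the "enlightened caller": it opts in with allow_eio=True and turns EIO into a per-object read_error flag that a later repair step could act on.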
João Eduardo Luís
0e1f504234 Merge pull request #183 from ceph/wip-4313-b
qa: workunits: mon: test 'config-key' store

Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-01 15:57:04 -07:00
Sage Weil
76ad956330 librados: test empty ObjectWriteOperation
Tests that #2673 is fixed.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-01 15:35:17 -07:00
Sage Weil
15bd980246 Merge pull request #182 from ceph/wip-no-cors-without-rgw
Makefile.am: disable building ceph_test_cors when radosgw is not enabled
2013-04-01 14:56:30 -07:00
Josh Durgin
690e4df19a Makefile.am: disable building ceph_test_cors when radosgw is not enabled
This test depends on radosgw. Trying to build it without radosgw will
result in a compile error.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-01 14:05:05 -07:00
Gary Lowell
f26f7a3902 v0.60 2013-04-01 12:22:53 -07:00
Sage Weil
db7a09507e Merge remote-tracking branch 'gh/next' 2013-04-01 11:52:46 -07:00
Sage Weil
557685f391 Merge pull request #169 from ceph/wip-rbd-diff
rbd incremental backup/restore

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-01 11:26:16 -07:00
Josh Durgin
267ce0d90b librados: don't use lockdep for AioCompletionImpl
This is a quick workaround for the next branch. A more complete fix
will be done for the master branch. This does not affect correctness,
just what qa runs with lockdep enabled do.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
2013-04-01 11:17:41 -07:00
Greg Farnum
78acc5c214 test: fix signed/unsigned comparison in test_cors
Signed-off-by: Greg Farnum <greg@inktank.com>
Acked-by: Sage Weil <sage@inktank.com>
2013-04-01 10:04:32 -07:00
Samuel Just
d5b797022a PG: don't compare auth with itself
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-01 09:38:12 -07:00
Samuel Just
39d1a3fbce PG: pass authoritative scrub map to _scrub
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-01 09:38:11 -07:00
Samuel Just
a838965ca3 PG: read_error should trigger a repair in _compare_scrub_objects
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-01 09:38:11 -07:00
Samuel Just
1940cf3e3f FileStore,OSD: add mechanism for injecting EIO, truncating obj
This will be used in testing repair.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-01 09:38:11 -07:00
Samuel Just
83dbfaea7d PG::_select_auth_object: prefer a peer which did not hit a read error
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-01 09:38:08 -07:00
Samuel Just
e61c94e2f6 PG: make _select_auth_object smarter
Previously, we just picked the first one to have the object in
question.  Now, we will attempt to choose one that has as
much of the following as possible:
1) has the object (there must be one)
2) has an object_info attr
3) has a valid object_info attr
4) has an object_info whose size matches the scrubbed size

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-01 09:38:04 -07:00
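
The selection rule described in e61c94e2f6 might be sketched roughly as follows. This is a Python illustration only, not the PG::_select_auth_object implementation: the Replica fields, the additive score, and the tie-breaking are assumptions made here to show one way of "having as much as possible" of criteria 1 to 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    """Hypothetical summary of one peer's scrub-map entry for an object."""
    osd: int
    has_object: bool
    has_info_attr: bool = False
    info_valid: bool = False
    info_size: Optional[int] = None
    scrubbed_size: Optional[int] = None


def score(r):
    """Count how many of criteria 2-4 this replica satisfies."""
    has_attr = r.has_info_attr
    valid = has_attr and r.info_valid
    size_ok = valid and r.info_size is not None and r.info_size == r.scrubbed_size
    return int(has_attr) + int(valid) + int(size_ok)


def select_auth(replicas):
    """Pick the best-scoring replica that has the object (criterion 1)."""
    candidates = [r for r in replicas if r.has_object]
    if not candidates:
        return None
    # max() keeps the first of several equally good candidates, which
    # degrades to the old "first one that has the object" behaviour.
    return max(candidates, key=score)
```

Because each criterion depends on the previous one, the score in this sketch is simply the number of consecutive checks a replica passes.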
Greg Farnum
5d11c20140 Merge branch 'wip-mds' 2013-04-01 09:31:37 -07:00
Greg Farnum
a77eaec852 mds: bump the protocol version.
We've changed quite a lot of the restart behavior, as well as one
of the message encodings. This is cheaper and easier than using feature bits,
and CephFS is still a tech preview or whatever, so let's cover them using this.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:27:27 -07:00
Yan, Zheng
93ab1edd10 mds: don't roll back prepared table updates
When the table server is recovering, it re-sends 'agree' messages for
prepared table updates. The table client may receive an 'agree'
message before it commits the corresponding update. Don't send a
'rollback' message back to the server in this case.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:26:24 -07:00
Yan, Zheng
2b0f03cbf8 mds: clear scatter dirty if replica inode has no auth subtree
This avoids sending superfluous scatterlock state to the recovering MDS.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:26:24 -07:00
Yan, Zheng
3d3d85d845 mds: don't replicate purging dentry
open_remote_ino is racy: someone may delete the inode's
last linkage while the MDS is discovering the inode.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:26:24 -07:00
Yan, Zheng
44db980253 mds: eval inodes with caps imported by cache rejoin message
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:26:24 -07:00
Yan, Zheng
9939ced468 mds: try merging subtree after clear EXPORTBOUND
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:26:24 -07:00
Yan, Zheng
5ceae8caab mds: clear dirty inode rstat if import fails
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:26:24 -07:00
Yan, Zheng
d1602b3b7e mds: don't open dirfrag while subtree is frozen
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:26:24 -07:00
Yan, Zheng
fcf170b81b mds: notify bystanders if export aborts
So bystanders know the subtree is single auth earlier.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:26:24 -07:00
Yan, Zheng
7278f644a9 mds: fix export cancel notification
The comment says that if the importer is dead, bystanders think the
exporter is the only auth, as per mdcache->handle_mds_failure(). But
there is no such code in MDCache::handle_mds_failure().

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:26:24 -07:00
Yan, Zheng
27438db5fa mds: unfreeze subtree if import aborts in PREPPED state
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:26:23 -07:00
Yan, Zheng
4d532cb6d9 mds: check MDS peer's state through mdsmap
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:26:23 -07:00
Yan, Zheng
b4395889d7 mds: avoid double auth pin for file recovery
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:26:23 -07:00
Yan, Zheng
e072d34fb7 mds: add dirty imported dirfrag to LogSegment
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:26:23 -07:00
Yan, Zheng
a4ed7ea8b8 mds: send lock action message when auth MDS is in proper state.
For a rejoining object, don't send the lock ACK message because lock states
are still uncertain. The lock ACK may confuse the object's auth MDS and
trigger an assertion.

If the object's auth MDS is not active, just skip sending NUDGE, REQRDLOCK
and REQSCATTER messages. MDCache::handle_mds_recovery() will take care
of them.

Also defer the caps release message until clientreplay or active.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:26:23 -07:00
Yan, Zheng
7ad7c347d4 mds: issue caps when lock state in replica becomes SYNC
This is needed because a client can request READ caps from a non-auth MDS.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:26:23 -07:00
Yan, Zheng
10b1a5663f mds: share inode max size after MDS recovers
The MDS may crash after journaling the new max size, but before sending
the new max size to the client. Later, when the MDS recovers, the client
re-requests the new max size, but the MDS finds the max size unchanged, so
the client waits for the new max size forever. This issue can be avoided
by checking the client cap's last_sent and sharing the inode max size if it
is zero (a reconnected cap's last_sent is zero).

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:26:23 -07:00
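
The check described in 10b1a5663f can be sketched in a few lines of Python. The Cap class and should_share_max_size function are hypothetical names for illustration; only the last_sent field and the "share if it is zero" rule come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class Cap:
    """Hypothetical client capability record."""
    last_sent: int = 0  # zero models a cap reconnected after MDS recovery


def should_share_max_size(cap, old_max_size, new_max_size):
    """Decide whether the MDS should send the inode max size to the client.

    A changed max size is always shared. An unchanged one is still shared
    when last_sent is zero, because the client may never have received the
    value that was journaled before the crash.
    """
    return new_max_size != old_max_size or cap.last_sent == 0
```

Without the last_sent check, the recovered MDS in this sketch would see an unchanged max size, send nothing, and leave the client waiting.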
Yan, Zheng
b2342a9c31 mds: take object's versionlock when rejoining xlock
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:26:23 -07:00
Yan, Zheng
6862fe7a14 mds: reqid for rejoining authpin/wrlock needs to be a list
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-04-01 09:26:00 -07:00
Yan, Zheng
d1a257498c mds: handle linkage mismatch during cache rejoin
In an MDS cluster, not all file system namespace operations that affect
multiple MDSes use two-phase commit. Some operations use dentry link/unlink
messages to update the replica dentry's linkage after they are committed by
the master MDS. The master MDS may crash after journaling an
operation but before sending the dentry link/unlink messages. Later, when
the MDS recovers and receives cache rejoin messages from the surviving
MDS, it will find a linkage mismatch.

The original cache rejoin code does not properly handle the case where
dentry unlink messages are missing. Unlinked inodes are linked to stray
dentries, so the cache rejoin ack message needs to push replicas of these
stray dentries to the surviving MDS.

This patch also adds code that handles cache expiration in the middle of
cache rejoining.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:25:59 -07:00
Yan, Zheng
ce0b74e55e mds: encode dirfrag base in cache rejoin ack
The cache rejoin ack message already encodes the inode base; make it also
encode the dirfrag base. This allows the message to replicate stray dentries
like the MDentryUnlink message. The function will be used by a later patch.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:24:41 -07:00
Gregory Farnum
4f844050b5 Merge pull request #179 from ceph/wip-client-cond
client: always remove cond from list after waiting

Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:22:45 -07:00
Yan, Zheng
9f66d0454f mds: include replica nonce in MMDSCacheRejoin::inode_strong
So the recovering MDS can properly handle cache expire messages.
Also increase the nonce value when sending the cache rejoin acks.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>

Also update the MMDSCacheRejoin encoding to the new format.
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:22:38 -07:00
Joao Eduardo Luis
cbb38a1cbb mon: OSDMonitor: only output warn/err messages if quotas are set > 0
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-04-01 09:21:00 -07:00
Yan, Zheng
01fd55a64c mds: remove MDCache::rejoin_fetch_dirfrags()
In commit 77946dcdae (mds: fetch missing inodes from disk), I introduced
MDCache::rejoin_fetch_dirfrags(). But it basically duplicates the function
of MDCache::open_undef_dirfrags(), so just remove rejoin_fetch_dirfrags()
and make open_undef_dirfrags() also handle undefined inodes.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:17:19 -07:00
Yan, Zheng
e62e48bb32 mds: fix MDS recovery involving cross authority rename
In an MDS cluster, a rename operation may involve multiple MDSes. The
rename source's auth MDS may crash after some witness MDSes have prepared
the rename but before the rename is committing. Later, when the MDS
recovers, its subtree map and linkages differ from those of the prepared
MDSes. This causes problems for both subtree resolve and cache rejoin.
The solution is that, if the rename source's auth MDS fails, the prepared
witness MDSes query the master MDS whether the operation is committing. If
it is not, they roll back the rename and then send a resolve message to the
recovering MDS.

A similar case is a prepared witness MDS crashing when the
rename source's auth MDS has prepared or is preparing the operation.
When the witness recovers, the master just delays sending the resolve
ack message until it commits the operation.

This patch also updates Server::handle_client_rename(). Make preparing
the rename source's auth MDS be the final step before committing the
rename.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:17:19 -07:00
Yan, Zheng
3ab86637b3 mds: send resolve acks after master updates are safely logged
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:17:19 -07:00