Author SHA1 Message Date
Yan, Zheng
3ab86637b3 mds: send resolve acks after master updates are safely logged
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:17:19 -07:00
Yan, Zheng
75346d8f3d mds: send cache rejoin messages after gathering all resolves
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:17:19 -07:00
Yan, Zheng
97bc0d26e6 mds: don't send MDentry{Link,Unlink} before receiving cache rejoin
The active MDS calls MDCache::rejoin_scour_survivor_replicas() when it
receives the cache rejoin message. That function removes the objects
replicated by MDentry{Link,Unlink} from the replica map.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:17:19 -07:00
Yan, Zheng
e381bb3930 mds: set resolve/rejoin gather MDS set in advance
An active MDS may receive a resolve/rejoin message before it receives
the mdsmap message that claims the MDS cluster is in the resolving/rejoining
state. So instead of setting the gather MDS sets when the mdsmap arrives,
set them in advance when an MDS failure is detected.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:17:19 -07:00
Yan, Zheng
ed85dd61a5 mds: don't send resolve message between active MDS
When the MDS cluster is resolving, the current behavior is to send a subtree
resolve message to every other MDS and wait for resolve messages from all of
them. The problem is that active MDSs can have different subtree maps due to
renames. Besides, gathering active MDSs' resolve messages is also racy. The
only function of these messages is to disambiguate other MDSs' imports, and
we can replace that with an import finish notification.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:17:19 -07:00
Yan, Zheng
30dbb1d4e5 mds: compose and send resolve messages in batch
Resolve messages for all MDSs are the same, so we can compose and
send them in a batch.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:17:19 -07:00
Yan, Zheng
a6d9eb8c58 mds: don't delay processing replica buffer in slave request
Replicated objects need to be added to the cache immediately.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:17:19 -07:00
Yan, Zheng
131271655f mds: unify slave request waiting
When requesting a remote xlock or remote wrlock, the master request is
put into the lock object's REMOTEXLOCK waiting queue. The problem is that
a remote wrlock's target can differ from the lock's auth MDS, so when
the lock's auth MDS recovers, MDCache::handle_mds_recovery() may wake the
wrong request. Instead, unify slave request waiting: dispatch the
master request when the slave request reply is received.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-01 09:17:19 -07:00
Yan, Zheng
ef9a4f6605 mds: defer eval gather locks when removing replica
Locks' states should not change between composing the cache rejoin ack
messages and sending them. If Locker::eval_gather() is called in
MDCache::{inode,dentry}_remove_replica(), it may wake requests and
change the locks' states.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:17:09 -07:00
Yan, Zheng
12e7c3d171 mds: avoid sending duplicated table prepare/commit
This patch makes the table client defer sending table prepare/commit messages
until it receives the table server's 'ready' message. This avoids duplicated
table prepare/commit messages.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:16:59 -07:00
Yan, Zheng
a5dce808b5 mds: make sure table request id unique
When an MDS becomes active, the table server re-sends 'agree' messages
for old prepared requests. If the recovered MDS starts a new table request
at the same time, the new request's ID can happen to be the same as an old
prepared request's ID, because the current table client code assigns request
IDs starting from zero after the MDS restarts.

This patch makes the table server send 'ready' messages when table clients
become active or when it becomes active itself. The 'ready' message updates
the table client's last_reqid to avoid request ID collisions. The message
also takes over the roles of the finish_recovery() and handle_mds_recovery()
callbacks for the table client.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:16:48 -07:00
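The two table-client changes above have two effects. Prepares are held until the server says 'ready'. That 'ready' also moves last_reqid forward, so a restarted client does not reuse old request IDs.

The Python below is a minimal model of that behavior. The class and its method names are hypothetical, not Ceph's MDSTableClient API. It also assumes that 'ready' carries the highest request ID the server has seen from this client.

```python
class TableClient:
    """Hypothetical model of an MDS table client; not Ceph's MDSTableClient."""

    def __init__(self):
        self.last_reqid = 0        # restarts from zero after an MDS restart
        self.server_ready = False  # set once the server's 'ready' arrives
        self.pending = []          # prepares deferred until 'ready'
        self.sent = []             # (reqid, op) pairs actually sent

    def prepare(self, op):
        # Defer sending until the server has said 'ready', so a prepare is
        # not sent once before recovery and then again after it.
        if self.server_ready:
            self._send(op)
        else:
            self.pending.append(op)

    def handle_ready(self, server_last_reqid):
        # Assumption: 'ready' carries the highest reqid the server has seen
        # from this client. Moving last_reqid past it means a new request
        # cannot reuse the ID of an old prepared request.
        self.last_reqid = max(self.last_reqid, server_last_reqid)
        self.server_ready = True
        queued, self.pending = self.pending, []
        for op in queued:
            self._send(op)

    def _send(self, op):
        self.last_reqid += 1
        self.sent.append((self.last_reqid, op))


client = TableClient()            # freshly restarted: last_reqid == 0
client.prepare("alloc-ino")       # deferred; nothing is sent yet
client.handle_ready(server_last_reqid=41)
print(client.sent)                # [(42, 'alloc-ino')]
```

Without the 'ready' step, the first new request would get ID 1 and could collide with an old prepared request that the server is still tracking.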
Yan, Zheng
bb83a5d63c mds: consider MDS as recovered when it reaches clientreplay state.
An MDS in the clientreplay state already serves requests. This change also
makes MDS::handle_mds_recovery() and MDS::recovery_done() match.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-01 09:16:36 -07:00
Sage Weil
782681402f client: always remove cond from list after waiting
The signal method removes conds from the list after it signals.  That's
not okay if the cond triggers for some other reason; an invalid Cond*
will remain on the list and get signaled later.

Make the wait_on_list() helper remove it; use that in several callers;
explicitly do the removal in the remaining callers.

Change signal_cond_list() to not clear the list; rely on the signalees to
do that.  Audit all users and make sure they are either using the
wait_on_list() helper (which removes its Cond) or do the remove explicitly.

Backport some form of this: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-01 09:12:44 -07:00
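The fix above moves cleanup from the signaler to the waiter. Each waiter removes its own Cond whether it was signaled or woke for another reason, so no stale entry is left on the list.

The sketch below models that pattern with Python's threading.Condition. The CondList class is hypothetical; only the names wait_on_list() and signal_cond_list() come from the commit message, and this is not Ceph's Client code.

```python
import threading


class CondList:
    """Illustrative model of the wait_on_list()/signal_cond_list() pattern.
    Names mirror the commit message; this is not Ceph's Client code."""

    def __init__(self):
        self.lock = threading.Lock()
        self.waiters = []  # Condition objects currently waiting

    def wait_on_list(self, timeout=None):
        # Caller must hold self.lock. Register, wait, and always remove our
        # own Condition, whether we were signaled or timed out, so no stale
        # entry is left behind for a later signal.
        cond = threading.Condition(self.lock)
        self.waiters.append(cond)
        try:
            return cond.wait(timeout)  # False if the wait timed out
        finally:
            self.waiters.remove(cond)

    def signal_cond_list(self):
        # Caller must hold self.lock. Wake every waiter but leave the list
        # alone; each waiter removes its own entry.
        for cond in self.waiters:
            cond.notify_all()


# A waiter that times out still cleans up after itself.
cl = CondList()
with cl.lock:
    timed_out = not cl.wait_on_list(timeout=0.01)

# A waiter woken by a signal also removes its own entry.
results = []


def waiter():
    with cl.lock:
        results.append(cl.wait_on_list(timeout=5))


t = threading.Thread(target=waiter)
t.start()
while True:
    with cl.lock:
        if cl.waiters:  # waiter is registered and blocked in wait()
            cl.signal_cond_list()
            break
t.join()
print(timed_out, results, cl.waiters)  # True [True] []
```

The try/finally is the key design choice. Removal happens on every exit path, which is exactly the case the old signaler-side removal missed.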
Sage Weil
8267bf56ed librbd: fix size arg type for diff_iterate
Fixes build on 32-bit archs.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-01 08:58:10 -07:00
Josh Durgin
b2b1034c53 PendingReleaseNotes: note about rbd progress output
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-01 08:56:07 -07:00
Josh Durgin
f1f6407221 test_librbd: add diff_iterate test including discard
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-01 08:56:07 -07:00
Josh Durgin
e88fe3cbbc rbd.py: add some missing functions
discard, flush, and striping info slipped through the cracks before,
but are useful and trivial to add.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-01 08:56:07 -07:00
Josh Durgin
c0e3f642b1 librbd: add C and python bindings for diff_iterate
The python interface is a bit awkward since it maps directly
to the C interface, but it'll work well enough and not use
tons of memory.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-01 08:56:07 -07:00
Josh Durgin
e83fd3b937 librados: don't insert zero length extents in a diff
They're useless and trigger an assert in interval_set::insert.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-01 08:56:07 -07:00
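The guard described above amounts to dropping empty extents before they reach the interval set. Here is a minimal sketch of that guard. IntervalSet is a toy stand-in that only mimics the assert the commit mentions, and add_diff_extent() is a hypothetical helper name.

```python
class IntervalSet:
    """Toy stand-in for an interval set whose insert() asserts on empty
    ranges, as the commit describes for interval_set::insert."""

    def __init__(self):
        self.extents = []  # sorted (offset, length) pairs; merging omitted

    def insert(self, offset, length):
        assert length > 0, "zero-length insert"
        self.extents.append((offset, length))
        self.extents.sort()


def add_diff_extent(diff, offset, length):
    # A zero-length extent says nothing about what changed, so drop it
    # before it reaches insert().
    if length == 0:
        return
    diff.insert(offset, length)


diff = IntervalSet()
for off, length in [(0, 4096), (4096, 0), (8192, 512)]:
    add_diff_extent(diff, off, length)
print(diff.extents)  # [(0, 4096), (8192, 512)]
```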
Josh Durgin
52097d343b rbd: add formatted output to diff command
All the other commands that display information have this.
For consistency, add it to this command too.

Also switch the plain output to use a TextTable for better readability.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-01 08:56:07 -07:00
Josh Durgin
33d1a2fc88 librbd: return -ENOENT from diff_iterate when the snap doesn't exist
This is a bit more helpful than -EINVAL.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-01 08:56:07 -07:00
Josh Durgin
6a04a7fa56 rbd: initialize random number generator for bench-write
Without this, the same seed is used each time, so multiple runs
of bench-write with the same parameters have the same I/O pattern.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-01 08:56:07 -07:00
Josh Durgin
c680531e07 librbd: change diff_iterate interface to be more C-friendly
Use int instead of bool for the callback, and make it represent
whether the data exists, rather than the opposite, since callers
are more likely to test whether it's data than whether it's zeroes.

Return 0 instead of the length read, since an int64_t will wrap around
for large reads, and there's no value in reporting the length
read when it will always be the length requested clipped to the
size of the image.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-01 08:56:07 -07:00
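The interface change above has three parts. The callback gets an int exists flag (1 for data, 0 for zeroes), a callback error aborts the walk, and success returns 0 rather than a byte count.

The Python model below shows that contract. The diff_iterate() function here and its extent list are hypothetical stand-ins, not the librbd signature. The abort-on-negative rule is an assumption of this sketch.

```python
from typing import Callable, List, Tuple

# Hypothetical changed extents between two snapshots: (offset, length, exists).
Extent = Tuple[int, int, int]
Callback = Callable[[int, int, int], int]


def diff_iterate(extents: List[Extent], ofs: int, length: int, cb: Callback) -> int:
    """Sketch of the revised contract: clip each changed extent to
    [ofs, ofs + length), pass exists as an int (1 = data, 0 = zeroes),
    abort on a negative callback result, and return 0 on success rather
    than a byte count."""
    end = ofs + length
    for e_ofs, e_len, exists in extents:
        start, stop = max(e_ofs, ofs), min(e_ofs + e_len, end)
        if stop <= start:
            continue  # extent lies outside the requested range
        r = cb(start, stop - start, exists)
        if r < 0:
            return r
    return 0


totals = {"data": 0, "zero": 0}


def count(offset: int, length: int, exists: int) -> int:
    # Callers typically ask "is this data?", hence a flag meaning "exists".
    totals["data" if exists else "zero"] += length
    return 0


r = diff_iterate([(0, 4096, 1), (8192, 4096, 0), (1 << 20, 512, 1)], 0, 1 << 20, count)
print(r, totals)  # 0 {'data': 4096, 'zero': 4096}
```

The third extent starts exactly at the end of the requested range, so it is clipped away entirely.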
Josh Durgin
8a1cbf3e74 rbd: remove always-true else condition in import-diff
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-01 08:56:06 -07:00
Josh Durgin
d86fb04f48 rbd: make diff banner length depend on the banner
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-01 08:56:06 -07:00
Neil Levine
c499caf508 mkcephfs: warn that mkcephfs is deprecated in favor of ceph-deploy
Signed-off-by: Neil Levine <neil.levine@inktank.com>
2013-04-01 08:54:30 -07:00
Sage Weil
3b5f663f11 Merge pull request #178 from ceph/wip-client
Fix client with cache disabled, and a use-after-free

Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-04-01 08:48:45 -07:00
Joao Eduardo Luis
677867d088 qa: workunits: mon: test 'config-key' store
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-04-01 16:11:01 +01:00
Josh Durgin
c0e5c22dfd rbd: fail import-diff if we reach the end of the stream sooner than expected
safe_read() just protects against EINTR, and may return less data than
requested if it reaches the end of the file. Use safe_read_exact() to
make sure we get the right amount of data.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-03-31 23:32:42 -07:00
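The distinction above is between a read that may come up short at end of file and one that treats a short read as an error. Ceph's safe_read() and safe_read_exact() are C helpers; the functions below are only a hypothetical Python analogue of that split.

Python 3.5+ already retries most reads interrupted by EINTR (PEP 475), so the InterruptedError branch is illustrative. An exception stands in for whatever error code the real helper returns.

```python
import io


def safe_read(f, count):
    """Read up to count bytes, retrying on interruption. Like a plain read
    at end of file, it may return fewer bytes than requested."""
    buf = bytearray()
    while len(buf) < count:
        try:
            chunk = f.read(count - len(buf))
        except InterruptedError:  # EINTR; Python 3.5+ mostly retries for us
            continue
        if not chunk:
            break  # end of stream: hand back the short read
        buf += chunk
    return bytes(buf)


def safe_read_exact(f, count):
    """Like safe_read(), but a short read is an error, which is what an
    importer needs when the stream format says how many bytes follow."""
    data = safe_read(f, count)
    if len(data) != count:
        raise EOFError(f"expected {count} bytes, got {len(data)}")
    return data


stream = io.BytesIO(b"0123456789")
print(safe_read_exact(stream, 8))  # b'01234567'
try:
    safe_read_exact(stream, 8)     # only 2 bytes remain
except EOFError as e:
    print(e)                       # expected 8 bytes, got 2
```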
Josh Durgin
09898ffdd9 rbd: complete progress for import-diff from stdin
The diff format gives us a size, so unlike a normal import, we do update progress.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-03-31 23:32:42 -07:00
Josh Durgin
a0fca0807c rbd: fix else style in import-diff
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-03-31 23:32:42 -07:00
Josh Durgin
2ec87e66a5 rbd: update progress as a diff is exported
This will be jumpy since changed extents probably aren't evenly
distributed, but it's better than nothing.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-03-31 23:32:41 -07:00
Josh Durgin
f0ddf6cc77 rbd: remove unused argument from do_diff()
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-03-31 23:32:41 -07:00
Sage Weil
ef4938594a rbd: fix size change output
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-31 23:32:41 -07:00
Sage Weil
88706ab89c rbd: send progress info to stderr, not stdout
This avoids interfering when export is sent to stdout.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-31 23:32:41 -07:00
Sage Weil
325a3372cb rbd: include 'diff' command in man page
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-31 23:32:41 -07:00
Sage Weil
64a202a7ad rbd: update man page for import-diff and export-diff
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-31 23:32:41 -07:00
Sage Weil
f67f62abab rbd: prevent import-diff if start snapshot is not already present
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-31 23:32:41 -07:00
Sage Weil
9946c69cd1 rbd: fail import-diff if end snap already exists
This will prevent a user from inadvertently applying a diff twice.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-31 23:32:41 -07:00
Sage Weil
5b0c68b928 doc/dev/rbd-diff: specify that metadata records come before data
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-31 23:32:41 -07:00
Sage Weil
3694968a05 librbd: implement image.snap_exists()
This is a much more convenient way to tell if a snapshot already exists.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-31 23:32:41 -07:00
Sage Weil
c5bd978a1d librados: move snap_set_diff to librados/
This is most closely related to the librados list_snaps API; move it there.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-31 23:32:41 -07:00
Sage Weil
6af769a16f librados: cleanly define SNAP_HEAD, SNAP_DIR constants
We were using the internal CEPH_NOSNAP and CEPH_SNAPDIR constants, and
defining a clone_info_t::HEAD (with a different value).  The docs were
referring to the internal constant names.

Instead, define librados constants (C and C++) with the same values as the
internal types.

Note that this changes the clone_info_t::HEAD value from -1 to -2 so that
it now matches the internal type.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-31 23:32:41 -07:00
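The HEAD change above is easiest to see as unsigned 64-bit arithmetic. The sketch assumes the internal values are CEPH_NOSNAP = (uint64_t)-2 and CEPH_SNAPDIR = (uint64_t)-1, matching the commit's note that HEAD moves from -1 to -2. Under that assumption, the old clone_info_t::HEAD value of -1, read as a uint64_t snap ID, is the same number as the snapdir constant. The Python constants only mirror those values; as_u64() is a hypothetical helper.

```python
U64_MASK = (1 << 64) - 1


def as_u64(x: int) -> int:
    """Reinterpret a signed value as uint64_t, like the C cast (uint64_t)(x)."""
    return x & U64_MASK


# Assumed internal values (see lead-in): head is -2, snapdir is -1.
CEPH_NOSNAP = as_u64(-2)
CEPH_SNAPDIR = as_u64(-1)

# Per the commit, the librados constants take the same values.
SNAP_HEAD = CEPH_NOSNAP
SNAP_DIR = CEPH_SNAPDIR

OLD_CLONE_INFO_HEAD = as_u64(-1)  # clone_info_t::HEAD before this change

print(hex(SNAP_HEAD))                   # 0xfffffffffffffffe
print(hex(SNAP_DIR))                    # 0xffffffffffffffff
print(OLD_CLONE_INFO_HEAD == SNAP_DIR)  # True: old HEAD equals the snapdir value
```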
Sage Weil
10dc0ad09f librados: document list_snaps
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-31 23:32:41 -07:00
Sage Weil
be8927f598 librbd: drop unused elapsed calc for diff_iterate
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-31 23:32:41 -07:00
Sage Weil
f0c9a200ec librbd: diff_iterate fromsnapname after the end snap is also invalid
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-31 23:32:41 -07:00
Sage Weil
a69532e864 librbd: document diff_iterate in header
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-31 23:32:41 -07:00
Sage Weil
d0baadb9d3 librbd: uint64_t len for diff_iterate
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-31 23:32:41 -07:00
Sage Weil
7bbaa71a56 doc/dev/rbd-diff: update incremental file format
Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-31 23:32:41 -07:00
Sage Weil
44e295a3cc qa: rbd/diff_continuous.sh: use non-standard striping
Exercise the striping arithmetic by using non-standard striping that
varies between the parent and child.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-03-31 23:32:41 -07:00