15206 Commits

Author SHA1 Message Date
Sage Weil
d72bdab7d7 mds: take a remote_wrlock on srcdir for cross-mds rename
This ensures that we hold a wrlock on the srcdn auth when the slave
makes its changes to the src directory, and prevents us from corrupting
the scatterlock state.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-07-08 09:48:00 -07:00
Sage Weil
025748a695 mds: implement remote_wrlock
For the rename code to behave, we need to hold a wrlock on the slave node
to ensure that any racing gather (mix->lock) is not sent prior to the
_rename_prepare() running; otherwise we violate the locking rules and
corrupt rstats.

Implement a remote_wrlock that will be used by rename.  The wrlock is held
on a remote node instead of the local node, and is set up similarly to
remote_xlocks.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-07-08 09:48:00 -07:00
Sage Weil
4d5b05380e client: clean up debug output
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-07-08 09:32:19 -07:00
Sage Weil
c3a40829d7 mds: add mix->lock(2) state
There is a problem with the wrlocks and cross-mds renames:

 - master (dest auth, srci auth, srcdir replica) takes wrlock on srcdiri
 - something triggers a srcdiri lock, putting inest/ifile lock in mix->lock
   state
 - slave (srcdir auth) sends LOCKACK
 - master sends prepare_rename
 - slave (srcdir auth) does rename prepare, which modifies srcdir

Even though the master holds a wrlock on the srcdiri, the gather starts
immediately and the slave sends the LOCKACK before the master's wrlock is
released.

To fix this, we add a new mix->lock(2) state, and we do not start the
mix->lock gather from replicas until the local gather completes, _after_
the auth's wrlock is released.  This makes the master's wrlock sufficient
to ensure the prepare_rename on the slave is safe.

This also works when the slave is the srci auth, since the gather won't
complete until the master releases its wrlock.  BUT, it does NOT work if a
third MDS is the srcdiri auth, since it can still gather from the slave
prior to the master releasing its wrlock.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-07-08 09:32:19 -07:00
Sage Weil
088013b89f mds: cleanup: use enum for lock states
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-07-06 13:49:33 -07:00
Sage Weil
528b615112 Merge branch 'next' 2011-07-06 13:49:14 -07:00
Yehuda Sadeh
8f9eaf0de3 rgw: when listing objects, set locator key only when needed 2011-07-06 13:34:13 -07:00
Sage Weil
1d7fbed6fd rados: rename load-gen options
No abbreviations, update usage().

Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-06 08:59:31 -07:00
Colin Patrick McCabe
1da8f8177a honor CINIT_FLAG_NO_DEFAULT_CONFIG_FILE
Don't use CEPH_CONF_FILE_DEFAULT when CINIT_FLAG_NO_DEFAULT_CONFIG_FILE
is set.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-07-05 14:56:59 -07:00
Tommi Virtanen
5b2de2b9d6 mkcephfs: Only create OSD journal dir if we have a journal.
Thanks to huang jun <hjwsm1989@gmail.com> for finding the bug.

Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
2011-07-05 14:57:10 -07:00
Sage Weil
2aa146a7a2 mds: always clear_flushed() after finish_flush()
The scatter_writebehind_finish() is always followed up by an eval_gather(),
which does the clear_flushed().  For everyone else (replicas!), we need to
clear it immediately to avoid confusing things later.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-07-05 14:22:24 -07:00
Sage Weil
fb7696f3b3 client: fix num_flushing_caps accounting
This only affects debug output, fwiw.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-07-05 13:58:53 -07:00
Sage Weil
e9e3883d0d client: don't call flush_snaps when nothing to flush
Otherwise we fail an assert.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-07-05 13:43:27 -07:00
Sage Weil
933e34951b mds: kill stray break
This broke with the gatherbuilder addition.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-07-05 13:43:23 -07:00
Sage Weil
7e1f09ff4f context: implement complete()
finish() requires the caller to delete.  complete() does that for you by
calling finish() and then doing delete this.  Unless you overload it and
do something else.  This will allow us to make Contexts that are reusable,
for example, by overloading complete() instead of finish() and managing
the lifecycle in some other way.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-07-05 13:33:40 -07:00
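The finish()/complete() split described in this commit can be illustrated with a minimal sketch. The class below is a hypothetical stand-in, not the actual Ceph Context; C_SetResult, C_Counter, run_one_shot, and run_reusable are names invented for illustration.

```cpp
#include <cassert>

// Hypothetical minimal Context; an illustration, not the Ceph class.
class Context {
public:
  virtual ~Context() {}
  // Default lifecycle: run the callback, then free the object.
  virtual void complete(int r) {
    finish(r);
    delete this;
  }
protected:
  virtual void finish(int r) = 0;
};

// One-shot context: uses the default complete(), so it deletes itself.
class C_SetResult : public Context {
  int *out;
public:
  explicit C_SetResult(int *o) : out(o) {}
protected:
  void finish(int r) override { *out = r; }
};

// Reusable context: overrides complete() so the object is NOT deleted
// and its owner manages the lifetime instead.
class C_Counter : public Context {
public:
  int calls = 0;
  void complete(int r) override { finish(r); }
protected:
  void finish(int) override { ++calls; }
};

int run_one_shot() {
  int result = 0;
  Context *c = new C_SetResult(&result);
  c->complete(42);  // c is deleted inside complete(); do not touch it again
  return result;
}

int run_reusable() {
  C_Counter counter;    // stack object; safe because complete() never deletes it
  counter.complete(0);
  counter.complete(0);  // can fire more than once
  return counter.calls;
}
```

Callers of the one-shot form must not touch the pointer after complete() returns, since the object has already been freed.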
Sage Weil
b11b5826ad Merge branch 'stable' 2011-07-05 10:07:11 -07:00
Tommi Virtanen
531f46c384 logrotate.conf: Mark stat/*.log as "missingok"; it's not always there.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
2011-07-05 10:00:51 -07:00
Sage Weil
529df5dbd8 Merge branch 'stable' 2011-07-05 09:18:27 -07:00
Alexandre Oliva
b670f31dc9 Move stat/*.log to the end of logrotate.conf
Logrotate ignores entries after a rule that doesn't match any files.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-05 09:18:15 -07:00
Sage Weil
6feab3cbf4 mds: fix file_excl assert
If we are in XSYN state and want to move to anything else, we must go via
EXCL, but we may not be loner anymore.  Weaken the file_excl() assert so we
don't crash.

Reported-by: Fyodor Ustinov <ufm@ufm.su>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-05 08:58:26 -07:00
Colin Patrick McCabe
924a3225ac obsync: improve formatting a little bit
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-07-01 15:44:33 -07:00
Colin Patrick McCabe
da917ade4a obsync: add man page, documentation line
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-07-01 15:28:37 -07:00
Colin Patrick McCabe
f5cca2e8ab buffer: remove do_cow, clone_in_place
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-07-01 10:56:06 -07:00
Sage Weil
bd79ae82a1 Merge remote branch 'origin/wip-client' 2011-07-01 08:52:07 -07:00
Colin Patrick McCabe
a6ffcc8dfb librados: close very small race condition
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-07-01 00:23:39 -07:00
Sage Weil
0e6de71580 mon: add 'osd create [id]' command
If the id is specified, mark a nonexistent osd rank as existent.  The id
must fall within the current [0,max) range.  This is the counterpart of
'osd rm <id>'.

If the id is not specified, allocate an unused osd id and set the EXISTS
flag.  Increase max_osd as needed.

Closes: #1244
Signed-off-by: Sage Weil <sage@newdream.net>
2011-06-30 23:17:49 -07:00
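The two allocation paths described in this commit can be sketched as follows. OsdMapSketch and its methods are hypothetical names, the vector stands in for the real OSD map, and the -1 return value for rejected requests (including an id that already exists) is an assumption; the commit message does not say how errors are reported.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the OSD map: exists[i] is true when osd
// rank i carries the EXISTS flag; exists.size() plays the role of max_osd.
struct OsdMapSketch {
  std::vector<bool> exists;

  // 'osd create <id>': the id must fall within [0, max) and must not
  // already exist. Returns the id, or -1 on rejection (assumed convention).
  int create_with_id(int id) {
    if (id < 0 || static_cast<std::size_t>(id) >= exists.size())
      return -1;
    if (exists[id])
      return -1;
    exists[id] = true;
    return id;
  }

  // 'osd create': take the first unused id; if every id in [0, max) is
  // in use, grow max_osd by one and use the new slot.
  int create() {
    for (std::size_t i = 0; i < exists.size(); ++i) {
      if (!exists[i]) {
        exists[i] = true;
        return static_cast<int>(i);
      }
    }
    exists.push_back(true);
    return static_cast<int>(exists.size()) - 1;
  }
};
```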
Sage Weil
1af8998c02 client: clean up cap flush methods
We grew several copies of this code, and it turns out none of them were correct.

- assign flush tid in send_cap() helper
- pin inode on (dirty | flushing), not either/both
- add a proper mark_caps_flushing helper

and a bunch of other stuff.  This brings this bit of code in alignment with
the kernel implementation.

And, flush_caps() on cap import.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-06-30 22:24:03 -07:00
Sage Weil
984e5a0a6a Makefile: libmds.a, not libmds.la
We never link this into a .so, so avoid building it again with -fPIC.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-06-30 22:15:11 -07:00
Sage Weil
038a754fd4 mds: fix off-by-one in cow_inode vs snap flushes
We need to wait for the client to flush snapped caps if the client has
not already flushed for the given snap.  If the client has already flushed
caps through the last snapid for the old inode, we do not need to set up
the snapped inode's locks to wait for that.

This fixes an occasional hang on the snaps/snaptest-multiple-capsnaps.sh
workunit.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-06-30 22:04:42 -07:00
Yehuda Sadeh
1206625b57 rgw: fix users being created suspended 2011-06-30 14:45:32 -07:00
Colin Patrick McCabe
ca6d239083 Fix handling of CEPH_CONF
Formerly, CEPH_CONF was not respected by libraries. But now it is.
It overrides the default when reading the config file.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-06-30 14:43:01 -07:00
Yehuda Sadeh
31d49735f7 rados: fix warning 2011-06-30 14:00:23 -07:00
Sage Weil
6e49415c21 client: only send flushsnap once per mds session
This mirrors a kclient change a while back (e835124).

We only want to send one flushsnap cap message per MDS session:
 - it's a waste to send multiples
 - the mds will only reply to the first one

If the mds restarts we need to resend.

This fixes a hang where we send multiples, the first (and only) reply is
ignored (due to tid mismatch), and we are left with dangling references to
the inode and hang on umount.  (Reliably reproduced by running the full
snaps/ workunit directory.)

Fixes: #1239
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-06-30 13:44:24 -07:00
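One way to picture the "once per MDS session" rule is to remember which session a flushsnap was last sent on. This is a rough sketch under stated assumptions, not the client code: CapSnapSketch, should_send, and the use of a session sequence number that starts at 1 and changes when the MDS restarts are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-snapshot record of the session a flushsnap was sent on.
struct CapSnapSketch {
  std::uint64_t sent_session_seq = 0;  // 0 means "never sent" (assumed)

  // Returns true if a flushsnap should be sent on this session, and
  // records that it was. Repeated calls within the same session return
  // false, so duplicates are suppressed; a new session (for example
  // after an MDS restart) triggers a resend.
  bool should_send(std::uint64_t session_seq) {
    if (sent_session_seq == session_seq)
      return false;
    sent_session_seq = session_seq;
    return true;
  }
};
```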
Yehuda Sadeh
133904d753 Merge branch 'rados-load-gen' 2011-06-30 13:33:28 -07:00
Yehuda Sadeh
860c6657f1 rados tool: load generator 2011-06-30 13:32:59 -07:00
Colin Patrick McCabe
2f5925ea6a Add "How to use C_GatherBuilder" comment
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-06-30 10:24:04 -07:00
Colin Patrick McCabe
f69fcc7076 C_GatherBuilder: add C_GatherBuilder::activate()
Add an activate() function that must be called before we call the
onfinish callback. This is especially important in multi-threaded
contexts, since otherwise if completions come in in the wrong order, we
may delete the C_Gather object right before calling new_sub on it!

Also delete rm_subs because it is redundant with sub_finish.

Finally, num_subs_created, num_subs_remaining are now methods on
C_GatherBuilder rather than C_Gather.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-06-30 10:24:04 -07:00
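The ordering hazard that activate() addresses can be shown with a small single-threaded sketch. Gather and GatherBuilder below are hypothetical simplifications using std::function, not the Ceph C_Gather and C_GatherBuilder classes; in particular, running the callback directly when no sub-contexts exist is an assumption made for this example.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical gather: deletes itself and runs onfinish once it has been
// activated AND every sub-context has finished.
class Gather {
  std::function<void()> onfinish;
  int remaining = 0;
  bool activated = false;

  void maybe_finish() {
    if (activated && remaining == 0) {
      std::function<void()> cb = onfinish;  // copy before freeing ourselves
      delete this;
      cb();
    }
  }
public:
  explicit Gather(std::function<void()> f) : onfinish(std::move(f)) {}
  void add_sub() { ++remaining; }
  void sub_finish() { --remaining; maybe_finish(); }
  void activate() { activated = true; maybe_finish(); }
};

// Hypothetical stack-allocated builder: creates the Gather lazily, so the
// zero-sub-context case needs no special handling by the caller.
class GatherBuilder {
  std::function<void()> onfinish;
  Gather *gather = nullptr;
public:
  explicit GatherBuilder(std::function<void()> f) : onfinish(std::move(f)) {}
  std::function<void()> new_sub() {
    if (!gather) gather = new Gather(onfinish);
    Gather *g = gather;
    g->add_sub();
    return [g] { g->sub_finish(); };
  }
  // Must be called after the last new_sub(); nothing fires before this.
  void activate() {
    if (gather) gather->activate();
    else onfinish();  // no sub-contexts were created (assumed behavior)
  }
};

// The first sub completes before the second is created; without the
// activation gate the gather would already have been deleted.
int demo_early_completion() {
  int fired = 0;
  GatherBuilder b([&fired] { ++fired; });
  auto s1 = b.new_sub();
  s1();
  int after_first = fired;
  auto s2 = b.new_sub();
  b.activate();
  s2();
  return after_first * 10 + fired;
}

int demo_no_subs() {
  int fired = 0;
  GatherBuilder b([&fired] { ++fired; });
  b.activate();
  return fired;
}
```

A multi-threaded version would additionally need a lock around the counter and the activated flag; this sketch leaves that out.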
Colin Patrick McCabe
16b6567839 C_Gather: remove unused "any" option
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-06-30 10:24:04 -07:00
Colin Patrick McCabe
728c132aa3 C_Gather: hide constructor, convert uses
Note: this fixes a small memory leak in MDCache::open_snap_parents.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-06-30 10:24:04 -07:00
Colin Patrick McCabe
9771a8e297 C_GatherBuilder: more uses, add set_finisher
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-06-30 10:24:04 -07:00
Colin Patrick McCabe
562a04df18 Filer.h: use C_GatherBuilder
Filer.h now uses C_GatherBuilder to avoid memory leaks.

Also, C_GatherBuilder's constructor now takes a Context.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-06-30 10:24:03 -07:00
Colin Patrick McCabe
4772bb693d Add C_GatherBuilder
C_Gather objects are deleted by the last sub-context to execute.
If you create a C_Gather object manually, you must worry about the case
where there are no sub-contexts.

C_GatherBuilder is a little object that sits on the stack that allows
you to build C_Gather objects without worrying about this.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-06-30 10:24:03 -07:00
Colin Patrick McCabe
5f53131f0a mds/journal.cc: remove dead code
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-06-30 10:24:03 -07:00
Colin Patrick McCabe
a157bbb8f2 Add compiler_extensions.h for warn_unused_result
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-06-30 10:24:03 -07:00
Wido den Hollander
648e50e616 obsync: Depend on python-lxml on Debian derived platforms
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-06-30 09:58:41 -07:00
Samuel Just
2fbba81f64 osd: don't spew spurious scrub unreserve messages
The past primary was sending out scrub unreserve messages to all the
non-primary OSDs in the acting set on a PG state change. They're
spurious since the other OSDs will cancel the scrubs themselves
on state change, and they weren't right anyway because the loop
was looking at all the non-primary OSDs and sending out a message,
which could have excluded the new primary (if it was a replica before),
included other OSDs new to the PG, and included the current OSD.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2011-06-30 09:21:25 -07:00
Sage Weil
7779ca1512 client: more inode ref counting debugging
blech

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-06-29 20:32:59 -07:00
Sage Weil
9da44e67f4 client: do not leak MetaRequests on get_or_create() failure
Avoid leaking in the error paths.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-06-29 20:17:05 -07:00
Sage Weil
553c8a9fd9 client: do not assume MetaRequest's dentries are linked
The dentries we reference may have been unlinked prior to us sending this
request.  That's fine as long as we don't dereference a null dentry.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-06-29 20:17:05 -07:00
Sage Weil
490f7e95e7 client: pin dentries referenced by MetaRequest
Pin dentries referenced by MetaRequest.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-06-29 20:17:05 -07:00