25522 Commits

Author SHA1 Message Date
John Wilkins
808ad25a28 doc: Removed fragmented logging info. Consolidated into one doc.
Logging was variously described in the ceph configuration document,
a configuration reference, and a section in operations. Since
logging and debugging are generally used with troubleshooting,
I consolidated the docs and placed them in the troubleshooting
section. Also fixed the example and provided additional detail.

fixes: #3804

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-17 18:18:10 -07:00
Wido den Hollander
3c144e9b6c rbd: Only allow shrinking an image when --allow-shrink flag is passed
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-17 15:42:34 -07:00
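The guard described in 3c144e9b6c can be sketched as follows. This is a minimal illustration in Python rather than the rbd tool's actual code; the function name `check_resize` and its parameters are hypothetical.

```python
def check_resize(current_size: int, new_size: int, allow_shrink: bool = False) -> int:
    """Return new_size if the resize is permitted, otherwise raise ValueError.

    Hypothetical helper: allow_shrink stands in for the --allow-shrink flag.
    """
    if new_size < 0:
        raise ValueError("size must be non-negative")
    if new_size < current_size and not allow_shrink:
        # Shrinking discards data, so it must be requested explicitly.
        raise ValueError("shrinking an image is only allowed with --allow-shrink")
    return new_size
```

Growing an image passes through unchanged; only a smaller target size requires the explicit opt-in.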
Greg Farnum
7b408ece37 client: disable invalidate callbacks :(
See #4746; it deadlocks right now.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-04-17 15:41:19 -07:00
Josh Durgin
90a3bb7ae3 Merge pull request #219 from ceph/wip-rbd-progress
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-17 15:37:11 -07:00
Sage Weil
db37bd8e73 rbd: add --no-progress switch
Disable progress output to stderr.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-17 15:31:36 -07:00
Greg Farnum
8f21beb23c leveldbstore: handle old versions of leveldb
The filter_policy (bloom filter) stuff is fairly new in LevelDB's life,
and it turns out that precise's version is too old for it. Add conditional
compilation for those members in order to build and work properly.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-04-17 15:14:35 -07:00
Sage Weil
4bf2448210 Merge remote-tracking branch 'gh/wip-4521-fix' into next
Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-17 15:03:03 -07:00
Yan, Zheng
085b3ec444 mds: change XLOCK/XLOCKDONE's next state to LOCK
For simplelock and filelock, XLOCK/XLOCKDONE's next state is SYNC.
But a filelock in the XLOCK/XLOCKDONE state allows Fb caps, while a filelock
in the SYNC state does not. So a filelock can be stuck in the XLOCK/XLOCKDONE
state forever if there are Fb caps issued.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-17 14:39:22 -07:00
Yan, Zheng
efe7399749 mds: pass proper mask to CInode::get_caps_issued
There is a total of 22 cap bits and file lock uses 8 cap bits.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-17 14:39:22 -07:00
Joao Eduardo Luis
f25f922b9e mon: Monitor: convert osdmap_full as well
Store conversion wasn't converting the osdmap_full/ versions, only the
incrementals under osdmap/ and the latest full version stashed.  This
would lead to some serious problems during OSDMonitor's update_from_paxos
when the latest stashed didn't correspond to the first available
incremental.

Fixes: #4521

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-04-17 22:17:51 +01:00
Joao Eduardo Luis
1260041777 mon: PaxosService: add helper function to check if a given version exists
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-04-17 22:17:47 +01:00
Danny Al-Gaaf
246b8118a8 osd/PG.cc: initialize PG::flushed in constructor
Initialize PG::flushed in constructor with false as
described in doc/dev/osd_internals/pg.rst .

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit fb840c8ff7)
2013-04-17 13:31:38 -07:00
Sage Weil
f8183c91e5 Merge pull request #215 from ceph/wip-leveldb-config
os: bring leveldbstore options up to date

Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-17 09:49:11 -07:00
caleb miles
a993d2565f Fix policy handling for RESTful admin api.
Signed-off-by: caleb miles <caleb.miles@inktank.com>
2013-04-17 11:42:47 -04:00
Sage Weil
544eb9bda2 qa: pull qemu-iotests from ceph.com mirror
Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-16 16:40:14 -07:00
Sage Weil
4865fb73c6 Merge pull request #214 from ceph/wip-objectcacher-handler-ordered
keep write responses to clones in order

Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-16 15:48:15 -07:00
Sage Weil
899456617f librbd: flush on diff_iterate
The diff_iterate() tests fail when caching is enabled because recent writes
aren't visible to listsnaps.  Flush from diff_iterate to ensure that they
are.  Someday, maybe, we might make diff_iterate() inspect the cache
contents to make this more efficient, but for now that is not necessary.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-16 15:46:32 -07:00
John Wilkins
103fb9b0fb Merge branch 'next' of https://github.com/ceph/ceph into next
2013-04-16 13:29:15 -07:00
John Wilkins
efce39e221 doc: Cherry-picked from master to next. Uses ceph-mds package during upgrade.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-16 13:28:18 -07:00
John Wilkins
82aab8dcf3 doc: Cherry-picked from master to next. Rewrite of CloudStack document.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-16 13:26:32 -07:00
John Wilkins
97532875ce doc: Cherry-picked from master to next. Updates config to use virtio.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-16 13:24:47 -07:00
John Wilkins
72b3919c81 doc: Cherry-picked from master to next. Reorders ceph osd create.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-16 13:23:56 -07:00
John Wilkins
3afe84b200 doc: Cherry picked from master to next. Adds comments on naming OSDs.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-16 13:22:13 -07:00
Sage Weil
02d3c114ab os/FileJournal: fix journal completion plug removal
We plug completions when transitioning from a full to non-full journal
to ensure that we do not complete items before we have a stable journal
starting point that is past the committed_thru marker.  However, the order
of the header update and completion queueing means that we never remove
the plug if the journalq is empty--the seq test is always false.  The
result is very slow osd requests that only commit when we do a full sync.

This bug was masked until recently by another issue, fixed in
170d4a3d79.

The simple fix is to reorder the completion queuing before we update the
new header.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-04-16 13:20:58 -07:00
Greg Farnum
d8a354d511 config: provide settings for the LevelDB stores we use
Now that we can set up the LevelDB options internally, provide
config options on the OSD and the Monitor. We leave the OSD values
at the defaults for now as they're performance-sensitive, but we
set new values on the Monitor so that it can scale to large PGMaps.
(Previously there were issues with large PGMaps taking forever to write;
these changes to the use of compression and the default block and
write buffers counteract them.)

Since we pass these variables through, users who are interested in
doing so now can test and tune them more appropriately.

Reported-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-04-16 10:59:21 -07:00
Sam Lang
4a84ddbd30 client: Fix inode remove from snaprealm race
This is a follow-on fix to b5ce4d0.  Always remove the inode from the
snaprealm's list of inodes_with_caps before the snaprealm ref is
decremented (and the snaprealm potentially gets freed).

Fixes #4694.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-16 09:04:42 -07:00
Sage Weil
6133ea5e59 librbd: use initialized data for DiffIterateDiscard test
Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-15 21:50:09 -07:00
Sage Weil
638eb24fe5 librbd: print seed for all DiffIterate tests
This will aid debugging on failures, and give better coverage.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-15 21:32:03 -07:00
Sage Weil
1ddea41fc5 Merge pull request #217 from alram/master
Fix: use absolute path with udev

Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-15 20:32:46 -07:00
Alexandre Marangone
785b25f53d Fix: use absolute path with udev
Avoids the following: udevd[61613]: failed to execute '/lib/udev/bash'
'bash -c 'while [ ! -e /dev/mapper/....

Signed-off-by: Alexandre Marangone <alexandre.marangone@inktank.com>
2013-04-15 15:57:00 -07:00
Josh Durgin
98de67d424 qa: add workunit for running qemu-iotests
This uses the old stand-alone qemu-iotests repo so it works with the
version of qemu in Ubuntu 12.04. The tests depend tightly on qemu
version, so to use later tests we'd need to install corresponding
versions of qemu.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-12 17:59:35 -07:00
Greg Farnum
a0ae2ece49 os: bring leveldbstore options up to date
LevelDB has a lot of options which we don't implement right now. Add
an options struct to the LevelDBStore which users can access as they
wish in order to set values different from the defaults.
This will let us set various size values, as well as turning on
caching or bloom filter read optimizations.

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-04-12 16:29:02 -07:00
Greg Farnum
6b98162f2b mds: output error number when failing to load an MDSTable
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-04-12 13:14:17 -07:00
Gary Lowell
ae71b576a7 init-radosgw.sysv: New radosgw init file for rpm based systems
Added init-radosgw.sysv file for rpm based systems, added it to
the tarball list in the makefile, and updated the specfile to
install it.  Also added a dependency on ceph since it uses
utility routines from that package (on debian systems these are
packaged in ceph-common).  Incorporated review comments from
Alex. (Bug #4571)

Signed-off-by: Gary Lowell  <gary.lowell@inktank.com>
Reviewed-by: Alexandre Marangone  <alexandre.marangone@inktank.com>
2013-04-11 23:02:08 -07:00
Greg Farnum
f875c0c913 mds: only go through the max_size change rigamarole if the client requested it
The previous patch was forcing a new size change even if we were
doing it as part of our regular optimistic settings; we don't much
want to do that. This is a small optimization, but Sage asked for
it and it's very easy.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-04-11 17:42:59 -07:00
Greg Farnum
9c18fd6735 mds: Locker needs to remember requested max_size changes from clients
Previously, if we received an MClientCaps request containing a change
in the inode's max size, and _do_cap_update() was unable to process
the request immediately (due to a locking issue), we would wait-list
the request by adding a call to check_inode_max_size() once the lock
became stable. However, we then tossed out the message without in any
way propagating the new max size which had been requested!

Handle this by extending check_inode_max_size to also accept parameters
for increasing the max size, and by storing all the parameters explicitly
in the C_MDL_CheckMaxSize Context instead of relying on defaults. That
gets us to the point where we *can* notice we need to increase the max. To
actually do so, we now pass calc_new_client_ranges() the requested max
size instead of the actual size if we're doing an update.

Notice that as a side effect of this, all clients get to see the max size
increase instead of just the requester. This should be okay, but it is
chattier than in the optimal case (where we don't get stuck on a lock).

Fixes #3637

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-04-11 17:30:52 -07:00
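The core idea of 9c18fd6735, carrying the requested max size along with the wait-listed retry instead of dropping it, can be sketched as below. This is a toy model in Python under stated assumptions: the `Inode` class, the `lock_stable` flag, and the simplified signature of `check_inode_max_size` are hypothetical and do not reflect the real MDS Locker interface.

```python
class Inode:
    """Hypothetical stand-in for an MDS inode with a single lock."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        self.lock_stable = True
        self.waiters = []  # callbacks to run once the lock is stable again


def check_inode_max_size(inode: Inode, requested_max: int = 0) -> bool:
    """Apply a requested max size now, or wait-list the request with its value."""
    if not inode.lock_stable:
        # Capture requested_max explicitly so the retry does not fall back
        # to a default and silently lose the client's request.
        inode.waiters.append(lambda: check_inode_max_size(inode, requested_max))
        return False
    inode.max_size = max(inode.max_size, requested_max)
    return True


def lock_became_stable(inode: Inode) -> None:
    """Mark the lock stable and replay every wait-listed request."""
    inode.lock_stable = True
    waiters, inode.waiters = inode.waiters, []
    for retry in waiters:
        retry()
```

The closure plays the role that the commit assigns to the C_MDL_CheckMaxSize Context: it stores every parameter the retry needs.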
Sam Lang
d777b8e66b Merge pull request #213 from ceph/wip-sessionmap-4644
mds: fix session_info_t decoding

Reviewed-by:  Sam Lang <sam.lang@inktank.com>
2013-04-11 09:08:04 -07:00
Gregory Farnum
e32849c4ee Merge pull request #212 from ceph/wip-4451
2013-04-11 08:45:06 -07:00
Sam Lang
4977f3eab0 mds: Delay export on missing inodes for reconnect
The reconnect caps sent by the client on reconnect may not have
inodes found in the inode cache until after clientreplay (when
the client creates a new file, for example). Currently, we send an
export for that cap to the client if we don't see an inode in the cache
and path_is_mine() returns false (for example, if the client didn't
send a path because the file was already unlinked).
Instead, we want to delay handling of the reconnect cap until
clientreplay completes.

This patch modifies handle_client_reconnect() so that we don't assume
the cap isn't ours if we don't have an inode for it, but instead delay
recovery for later. An export cap message is only sent if the inode exists
and the cap isn't ours (non-auth) during reconnect. If any remaining
recovered caps exist in the recovered list once the mds goes active, we
send export messages at that point.

Also, after removing the path_is_mine check,
MDCache::parallel_fetch_traverse_dir() needs to skip non-auth dirfrags.

Fixes #4451.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-11 10:25:46 -05:00
Sam Lang
3a1cf53c30 client: Unify session close handling
If mds failure causes client reconnect while the
client is unmounting, the client will send a session
close request to the mds even if there are outstanding
inodes in the cache waiting to receive flush_acks.   This
causes the mds to send back a session close message and
the client closes the connection, so that when the mds tries
to send flush acks back to the client, they get dropped, resulting
in the client hanging on unmount.  The pattern for this bug is:

1. mds restart
2. client sends session open request
3. client unmount sets unmounting flag and waits for flush_acks
4. mds sends session open reply
5. client sends session close request (because it's unmounting)
6. mds sends session close, client closes connection
7. mds tries to send flush_acks, but drops them because the connection
is gone

This patch unifies the session close handling so that the client
only sends a session close in unmount once all flush acks have been
received.  If the mds restarts during session close, the reconnect
logic will kick the session close waiter so that session close requests
are re-sent for session close replies not yet received.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-04-11 10:25:46 -05:00
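The unified handling described in 3a1cf53c30 can be modeled as one decision point that every event funnels through. The sketch below is a toy Python model, not the Ceph client code; the class `UnmountingClient` and all of its method names are hypothetical.

```python
class UnmountingClient:
    """Toy model: defer the session close until every flush ack has arrived."""

    def __init__(self, inodes_awaiting_flush_ack):
        self.pending = set(inodes_awaiting_flush_ack)
        self.unmounting = False
        self.close_acked = False
        self.sent = []  # messages "sent" to the mds, recorded for inspection

    def _maybe_send_close(self):
        # Single place that decides whether a close request may go out.
        if self.unmounting and not self.pending and not self.close_acked:
            self.sent.append("session_close")

    def unmount(self):
        self.unmounting = True
        self._maybe_send_close()

    def handle_flush_ack(self, inode):
        self.pending.discard(inode)
        self._maybe_send_close()

    def handle_reconnect(self):
        # After an mds restart, re-send a close that has not been answered yet.
        self._maybe_send_close()

    def handle_session_close_reply(self):
        self.close_acked = True
```

With this shape, step 5 of the failure pattern cannot happen while flush acks are outstanding, because the open reply no longer triggers a close on its own.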
Josh Durgin
06d05e5ed7 LibrbdWriteback: complete writes strictly in order
RADOS returns writes to the same object in the same order. The
ObjectCacher relies on this assumption to make sure previous writes
are complete and maintain consistency. Reads, however, may be
reordered with respect to each other. When writing to an rbd clone,
reads to the parent must be performed when the object does not exist
in the child yet. These reads may be reordered, resulting in the
original writes being reordered. This breaks the assumptions of the
ObjectCacher, causing an assert to fail.

To fix this, keep a per-object queue of outstanding writes to an
object in the LibrbdWriteback handler, and finish them in the order in
which they were sent.

Fixes: #4531
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-10 16:57:08 -07:00
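The per-object queue described in 06d05e5ed7 can be sketched as follows. This is a minimal Python illustration of the technique, not the LibrbdWriteback implementation; the class name `OrderedWriteCompleter` and its methods are hypothetical.

```python
from collections import defaultdict, deque


class OrderedWriteCompleter:
    """Finish writes to each object strictly in the order they were sent."""

    def __init__(self):
        # object id -> queue of outstanding writes, oldest first
        self._queues = defaultdict(deque)
        self.completed = []  # (object id, write id) in completion order

    def submit(self, oid, write_id):
        """Record a write at the moment it is sent."""
        self._queues[oid].append({"id": write_id, "done": False})

    def on_write_done(self, oid, write_id):
        """Mark a write finished, then complete everything at the queue head."""
        queue = self._queues[oid]
        for entry in queue:
            if entry["id"] == write_id:
                entry["done"] = True
                break
        # A finished write is held back until all earlier writes are finished.
        while queue and queue[0]["done"]:
            self.completed.append((oid, queue.popleft()["id"]))
        if not queue:
            del self._queues[oid]
```

A write that finishes early simply waits in the queue; queues for different objects are independent, so ordering is only enforced per object.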
Samuel Just
a3298713bb OSD: make pg upgrade logging quiet
Fixes: #4701
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-10 14:24:23 -07:00
Samuel Just
ac720a091d Merge branch 'wip_4654' into next
Fixes: #wip_4654
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-10 14:00:13 -07:00
Alex Elder
351d9b270f rbd qa/workunits: add rbd read data test
This adds a new test script for validating that data read from a mapped
rbd image is what it's expected to be.

See the content of the file for a bit more explanation.

Signed-off-by: Alex Elder <elder@inktank.com>
2013-04-10 15:54:13 -05:00
caleb miles
bb8d1c9897 rgw_admin: Create keys for a new user by default.
Create a new key pair for new users or when --gen-access-key is specified.

Signed-off-by: caleb miles <caleb.miles@inktank.com>
2013-04-10 15:49:01 -04:00
Samuel Just
170d4a3d79 FileJournal: start_seq is seq+1 if journalq.empty()
This is also the same as journaled_seq + 1 for writeahead
journaling, but not for parallel journaling.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-10 12:48:02 -07:00
Samuel Just
90c256d757 FileJournal: fix off by one error in committed_thru
journalq.front().first is the sequence number of the entry
at journalq.front().second.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-10 12:48:02 -07:00
Samuel Just
a4fa0a8200 Journal: commits may not include all journaled seqs
At one point, a commit had to drain the FileStore op
queue.  This is no longer the case.  Consequently, the
journal may have to wait more than one commit for the
filestore to create a stable commit point at a particular
sequence.  Handling this requires two changes:

1) We cannot transition to FULL_WAIT until we receive
a commit_start on a seq >= journaled_seq.
2) We cannot remove the journal completion plug until we get
a committed_thru on a seq >= header.start_seq at least as
new as the oldest committed item in the journal.  If on
replay, the journal does not include fs_op_seq, we ignore
it, which is fine since we won't have reported those
entries committed!

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-10 12:48:01 -07:00
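The two conditions listed in a4fa0a8200 reduce to sequence-number comparisons. The Python sketch below states them as predicates; the function names are hypothetical, and the second predicate models only the `seq >= header.start_seq` comparison named in the message.

```python
def can_enter_full_wait(commit_start_seq: int, journaled_seq: int) -> bool:
    """Condition 1: the commit being started must cover every journaled seq."""
    return commit_start_seq >= journaled_seq


def can_remove_completion_plug(committed_thru_seq: int, header_start_seq: int) -> bool:
    """Condition 2: the commit point must have reached the journal's start seq."""
    return committed_thru_seq >= header_start_seq
```

Because a commit no longer drains the FileStore op queue, either predicate may stay false across several commits before it becomes true.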
Samuel Just
13474b089b Journal: pass the sequence number to commit_start
A subsequent patch will need to see the committing seq.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-10 12:47:23 -07:00
Yan, Zheng
a1d9cbe5af mds: fix session_info_t decoding
commit 0bcf2ac081 changes session_info_t's format, but there is
a typo in the code that decodes old format. We also need to
handle struct_v == 1, which had the same encoding but without
the size guards (which is all handled by DECODE_START_LEGACY_COMPAT).

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-04-10 12:46:30 -07:00