Commit Graph

23236 Commits

Author SHA1 Message Date
Yan, Zheng
69f9f024e8 mds: fix error handling in MDCache::handle_discover_reply()
The error handling code in MDCache::handle_discover_reply() has two
main issues. First, it does not wake waiters if dir_auth_hint in the
reply message equals the MDS's own nodeid. This can happen if a
discover races with subtree importing. Second, it checks for the
existence of a cached directory fragment to decide whether to take
waiters from the inode or from the directory fragment. That check is
unreliable because subtree importing can add directory fragments to
the cache.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2012-12-23 20:01:11 -08:00
Yan, Zheng
e6b8f0a659 mds: set want_base_dir to false for MDCache::discover_ino()
When a frozen inode is encountered, MDCache::handle_discover() sends
the reply immediately if the reply message is not empty. When handling
"discover ino" requests, the reply message always contains the base
directory fragment. But the requestor already has the base directory
fragment, so the only effect of the reply is to wake the requestor
and make it send the same "discover ino" request again. The requestor
keeps sending "discover ino" requests but can't make any progress.

The fix is to set want_base_dir to false for MDCache::discover_ino().
After setting want_base_dir to false, we also need to update the code
that handles "discover ino" errors.

This patch also removes unused error handling code for flag_error_dn.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2012-12-23 20:01:11 -08:00
Yan, Zheng
b7e698a52b mds: no bloom filter for replica dir
We should delete a dir fragment's bloom filter after exporting the dir
fragment to another MDS. Otherwise the residual bloom filter may cause
problems if the MDS imports the dir fragment later.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2012-12-23 20:01:11 -08:00
Yan, Zheng
0ab0744e6f mds: properly mark dirfrag dirty
If predirty_journal_parents() does not propagate changes in dir's
fragstat into corresponding inode's dirstat, it should mark the
inode as dirfrag dirty. This happens when we modify dir fragments
that are auth subtree roots.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2012-12-23 20:01:11 -08:00
Yan, Zheng
48d8ae58ef mds: allow handle_client_readdir() fetching freezing dir.
At that point, the request has already auth pinned and locked some objects.
So CDir::fetch() should ignore the can_auth_pin check and continue
to fetch the freezing dir.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2012-12-23 20:01:11 -08:00
Sage Weil
d9673ca324 Merge branch 'wip-create-layout'
Reviewed-by: Greg Farnum <greg@inktank.com>

The functional tests for the create operations should add and specify non-default
pools, but we don't have a set of library methods to do that yet (to interact with
the monitor).
2012-12-23 19:59:04 -08:00
Sage Weil
8efcf54dc1 mds: *_pg_pool -> *_pool
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 19:39:23 -08:00
Sage Weil
d2f5890f84 client, libcephfs: add method to get the pool name for an open file
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 19:39:23 -08:00
Sage Weil
32ab274a4f client: specify data pool on create operations
Fill in the data pool field if specified by the client, or set to -1.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 19:39:22 -08:00
Sage Weil
3f4582176a mds: verify that the pool id is valid on SET[DIR]LAYOUT
Make sure the data pool exists and is part of the MDSMap data pools list.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 19:39:22 -08:00
Sage Weil
99d9e1daa5 mds: allow data pool to be specified on create
Reuse old preferred_pg field.  Only use if the new CREATEPOOLID feature
is present, and the value is >= 0.

Verify that the data pool is allowed, or return EINVAL to the client.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 19:39:22 -08:00
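
The pool-selection rule described in commit 99d9e1daa5 can be sketched roughly as below. This is only an illustration under stated assumptions, not the MDS code: the function name `resolve_create_pool`, its parameters, and the use of a plain `std::set` for the allowed data pools are all hypothetical. It shows the three cases from the message: ignore the field without the CREATEPOOLID feature or when the value is negative, accept an allowed pool, and return EINVAL otherwise.

```cpp
#include <cerrno>
#include <cstdint>
#include <set>

// Hypothetical helper: decide which data pool a newly created file uses.
// Returns 0 and stores the chosen pool in *chosen, or -EINVAL when the
// client asked for a pool that is not in the allowed set.
int resolve_create_pool(int64_t requested_pool,
                        bool has_createpoolid_feature,
                        const std::set<int64_t> &allowed_data_pools,
                        int64_t default_pool,
                        int64_t *chosen) {
  // Without the feature, or with a negative value (e.g. -1), the field
  // carries no pool request, so fall back to the default layout pool.
  if (!has_createpoolid_feature || requested_pool < 0) {
    *chosen = default_pool;
    return 0;
  }
  // The requested pool must be one of the allowed data pools.
  if (allowed_data_pools.count(requested_pool) == 0) {
    return -EINVAL;
  }
  *chosen = requested_pool;
  return 0;
}
```

A caller would pass the value from the reused field together with the set of data pools it considers valid, and forward a negative return value to the client as the error.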
Sage Weil
697ed23cb9 client: remove set_default_*() methods
This is a poor interface.  The hadoop stuff is shifting to specify this
information on file creation instead.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 19:39:22 -08:00
Sage Weil
850d1d544b osd: fix dup failure cancellations
If we had a pending failure report, and send a cancellation, take it
out of our pending list so that we don't keep resending cancellations.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 15:21:18 -08:00
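
The fix in commit 850d1d544b amounts to removing an entry from the pending list at the moment its cancellation is sent. A minimal sketch of that idea follows; the `FailureTracker` type, its members, and the use of integer OSD ids are assumptions made for illustration and do not mirror the OSD's actual data structures.

```cpp
#include <set>
#include <vector>

// Hypothetical tracker for failure reports this daemon has sent.
struct FailureTracker {
  std::set<int> pending;               // peers currently reported as failed
  std::vector<int> cancellations_sent; // stands in for messages to the mon

  // Record that we reported this peer as failed.
  void report_failure(int osd) { pending.insert(osd); }

  // The peer turned out to be alive: cancel the report exactly once.
  void peer_alive(int osd) {
    std::set<int>::iterator it = pending.find(osd);
    if (it == pending.end()) {
      return; // nothing pending, so there is nothing to cancel
    }
    cancellations_sent.push_back(osd);
    // Dropping the entry here is what prevents repeated cancellations.
    pending.erase(it);
  }
};
```

Calling `peer_alive()` twice for the same peer sends one cancellation in this sketch, because the second call finds no pending entry.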
Sage Weil
61d43af747 osd: make MOSDFailure output more sensible
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 15:21:18 -08:00
Sage Weil
9df522e9ec mon: make osd failure report log msgs sensible
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 15:11:39 -08:00
Sage Weil
1290671f15 Merge branch 'wip-scrub' into next
Reviewed-by: Sage Weil <sage@inktank.com>
Conflicts:
	src/osd/PG.cc
2012-12-23 14:42:51 -08:00
Sage Weil
8362e6403e monclient: fix get_monmap_privately retry interval
Use mon_client_hunt_interval (default 3) instead of hardcoding 1 second.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 13:53:21 -08:00
Sage Weil
d843a64a3a Makefile: fix 'base' rule
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 13:53:18 -08:00
Sage Weil
00b89c3f7b Merge branch 'next'
2012-12-23 11:19:39 -08:00
Sage Weil
a09f5b1b46 init-ceph,mkcephfs: default inode64 for mounting xfs
According to hch this is now the default for new kernels.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 11:18:45 -08:00
Sage Weil
5f25f9f8cf init-ceph: default osd_data path
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-22 11:10:03 -08:00
Samuel Just
f6b2ca8b38 OSD: always do a deep scrub when repairing
Otherwise, errors turned up in a deep-scrub will be
swept under the rug without being repaired.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-21 20:37:06 -08:00
Samuel Just
ad9bcc705f PG: don't use a self-transition for WaitRemoteRecoveryReserved
Previously, using the state on active worked, but now we might
go back through WaitRemoteRecoveryReserved without resetting
Active.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-21 20:37:06 -08:00
Samuel Just
2e96bb1817 PG: Handle repair once in scrub_finish
We don't want to change missing sets during a chunky
scrub since it would cause !is_clean() and derail
the rest of the scrub.  Instead, move the missing,
inconsistent, and authoritative sets into scrubber
and add to during scrub_compare_maps().  Then,
handle repairing objects all at once in scrub_finish().

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-21 20:35:19 -08:00
Dan Mick
6325a4800d import_export.sh: sparse import export
Add tests for:
   - sparse import makes expected sparse images
   - sparse export makes expected sparse files
   - sparse import from stdin also creates sparse images
   - import from partially-sparse file leads to partially-sparse image
   - import from stdin with zeros leads to sparse
   - export from zeros-image to file leads to sparse file

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-21 17:03:38 -08:00
Dan Mick
5905d7fae7 rbd: harder-working sparse import from stdin
Try to accumulate image-sized blocks when importing from stdin, even if
each read is shorter than requested; if we get a full block, and it's
all zeroes, we can seek and make a sparse output file.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-21 17:03:38 -08:00
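
The accumulation step described in commit 5905d7fae7 (keep reading until a full block is available, even when individual reads come back short) can be sketched as below. The `ReadFn` callback and `read_full_block` are hypothetical names used only for this illustration; the callback stands in for a `read(2)`-style call on stdin that returns the number of bytes read, 0 at end of input, or a negative value on error.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical read callback: fills up to `want` bytes at `dst`.
using ReadFn = std::function<long(char *dst, std::size_t want)>;

// Keep calling read_some until the block is full or input ends.
// Returns the number of bytes stored in `block`, or a negative error.
long read_full_block(const ReadFn &read_some, char *block,
                     std::size_t block_size) {
  std::size_t filled = 0;
  while (filled < block_size) {
    long r = read_some(block + filled, block_size - filled);
    if (r < 0) {
      return r; // propagate the read error
    }
    if (r == 0) {
      break; // end of input: return the partial block
    }
    filled += static_cast<std::size_t>(r);
  }
  return static_cast<long>(filled);
}
```

Only a block that comes back completely full is a candidate for the all-zero test; a short final block would be written out as ordinary data in this sketch.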
Dan Mick
410903fe7a rbd: check for all-zero buf in export, seek output if so
Use buf_is_zero in common/util.cc

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-21 17:03:38 -08:00
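
The export-side check from commit 410903fe7a can be illustrated with a small sketch. The `buf_is_zero` below is a simple stand-in with an assumed signature, not the helper from common/util.cc, and `extents_to_write` is a hypothetical function that lists which blocks would actually be written; every block it omits corresponds to a seek in the output, which is what leaves a hole in the file.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Stand-in zero check (assumed signature): true if all bytes are zero.
bool buf_is_zero(const char *buf, std::size_t len) {
  for (std::size_t i = 0; i < len; ++i) {
    if (buf[i] != 0) {
      return false;
    }
  }
  return true;
}

struct Extent {
  std::size_t offset;
  std::size_t length;
};

// Walk the data block by block and collect only the non-zero blocks.
// An exporter would write these and seek past everything else.
std::vector<Extent> extents_to_write(const std::vector<char> &data,
                                     std::size_t block_size) {
  std::vector<Extent> out;
  if (block_size == 0) {
    return out; // avoid an endless loop on a zero block size
  }
  for (std::size_t off = 0; off < data.size(); off += block_size) {
    std::size_t len = std::min(block_size, data.size() - off);
    if (!buf_is_zero(data.data() + off, len)) {
      Extent e = {off, len};
      out.push_back(e);
    }
  }
  return out;
}
```

For a 12-byte buffer with a single non-zero byte at index 5 and a block size of 4, only the middle block is reported for writing.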
Dan Mick
4a558048cf librbd: move buf_is_zero() to new common/util.cc and include/util.h
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-21 17:03:38 -08:00
Sage Weil
8f5de15605 osd: fix pg stat msgs vs timeout
We can get a pattern like so:

- new mon session
- after say 120 seconds, we decide to send a stats msg
- outstanding_pg_stats is finally true, we immediately time out (30 second
  grace), and reconnect to a new mon
-> repeat

The problem is that we don't reset the last_sent timestamp when we send.
Or that we do this check after sending instead of before.  Fix both.

This should resolve issue #3661, where osds that don't have pgs
updating are not sending stats messages to the mon to check in, and are
eventually getting marked down as a result.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-12-21 16:47:50 -08:00
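
The two fixes named in commit 8f5de15605 (reset the last-sent timestamp on every send, and run the timeout check before sending rather than after) can be sketched as follows. `PgStatsSender`, its fields, and the `tick()`/`handle_ack()` methods are hypothetical and greatly simplified; only the 30 second grace period is taken from the commit message.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical, simplified model of the stats-reporting state.
struct PgStatsSender {
  std::chrono::seconds grace{30};
  bool outstanding = false;   // a stats message is awaiting an ack
  Clock::time_point last_sent{};
  int reconnects = 0;         // stands in for "reconnect to a new mon"

  void tick(Clock::time_point now, bool want_send) {
    // 1. Check for a timeout BEFORE sending anything new.
    if (outstanding && now - last_sent > grace) {
      ++reconnects;
      outstanding = false;
    }
    // 2. When sending, always refresh the timestamp.
    if (want_send) {
      outstanding = true;
      last_sent = now;
    }
  }

  void handle_ack() { outstanding = false; }
};
```

With this ordering, a first message sent 120 seconds into a session is measured against its own send time rather than a stale timestamp, so it does not time out immediately.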
John Wilkins
2bf4f42b6d doc: Added new journaler page to CephFS section. Needs descriptions.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-12-21 16:14:53 -08:00
John Wilkins
53afac1a21 doc: Added Journaler Configuration to toc tree.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-12-21 16:14:23 -08:00
John Wilkins
757902d639 doc: Added --mkfs options.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-12-21 16:09:09 -08:00
John Wilkins
46d0334456 doc: Added running multiple clusters. Per Tommi.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-12-21 16:08:05 -08:00
John Wilkins
e3d075667b doc: Updated the Configuration File section.
- Replaced ceph.conf with Ceph configuration to clarify
  when running multiple clusters on the same hardware.
- Added a [client] entry so people know it can be set too.
- Updated existing auth example.
- Added an authentication section with a link to the cephx guide.
- Added section for running multiple clusters. Per Tommi.


Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-12-21 16:07:27 -08:00
Samuel Just
00ed6657c9 PG::scrub_compare_maps increment scrubber.fixed for missing repairs
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-21 15:20:22 -08:00
Samuel Just
c9e051746e PG::_compare_scrubmaps: increment scrubber.errors on missing object
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-21 15:16:19 -08:00
Josh Durgin
4a039393a1 release-notes: add more user-visible changes
These are from looking through the shortlog from 0.48.2..next.
The description of the min_size defaults could probably be improved.
I did not look closely at radosgw or cephfs changes.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-21 15:15:46 -08:00
Josh Durgin
b39928dfa1 release-notes: remove bug fix that does not affect argonaut
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-21 15:15:46 -08:00
Josh Durgin
048567e01d release-notes: fix typos
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-21 15:15:46 -08:00
Josh Durgin
3076e45966 release-notes: pgnum is required now
This should have been in the 0.55 release notes.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-21 15:15:46 -08:00
Josh Durgin
b564fdb843 release-notes: remove warning about osd caps
This was only an issue from 0.49-0.52 upgrading to 0.53+

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-21 15:15:46 -08:00
John Wilkins
09d4f0365d doc: Added sudo to the ceph health command for when cephx is on.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-12-21 14:54:18 -08:00
John Wilkins
085992f672 doc: minor fix to syntax.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2012-12-21 14:53:28 -08:00
Sage Weil
206ffcd82e mkcephfs: error out if 'devs' defined but 'osd fs type' not defined
We can infer btrfs if they use btrfs devs, but if they use devs there is
no default fs.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-21 14:23:14 -08:00
Sage Weil
4a40067db6 doc: update ceph.conf examples about btrfs default
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-21 14:04:30 -08:00
Sage Weil
11fb314153 Merge remote-tracking branch 'gh/wip-scrub' into next
2012-12-21 13:56:16 -08:00
Sage Weil
47145d8009 Merge remote-tracking branch 'gh/wip-3643' into next
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-21 13:45:39 -08:00
Sage Weil
999ba1b2e7 monc: only warn about missing keyring if we fail to authenticate
This avoids the situation where a librados or other user with the default
of 'cephx,none' and no keyring is authenticating against a cluster with
required of 'none' and an annoying warning is generated every time.  Now
we only print a helpful message if we actually failed.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-21 13:44:19 -08:00
Sage Weil
5d5a42bc71 osd: clear CLEAN on exit from Clean state
This means we can drop the scrub repair state_clear() call.  We probably
can drop others, but let's leave that for another day.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-21 13:10:32 -08:00
Yehuda Sadeh
b3e62ad692 auth: use none auth if keyring not found
If both cephx and none are accepted auth methods, and
cephx keyring cannot be found then resort to using
none, instead of failing.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-12-21 12:19:41 -08:00
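
The fallback described in commit b3e62ad692 can be sketched as a small selection function. The `AuthMethod` enum and `choose_auth` are hypothetical names for illustration only and are not Ceph's auth API; the sketch assumes the caller already knows which methods are accepted and whether a cephx keyring was found.

```cpp
#include <algorithm>
#include <vector>

enum class AuthMethod { None, Cephx, Unavailable };

// Hypothetical selection rule: prefer cephx when a keyring exists,
// fall back to none when it is accepted, otherwise report failure.
AuthMethod choose_auth(const std::vector<AuthMethod> &accepted,
                       bool keyring_found) {
  bool cephx_ok = std::find(accepted.begin(), accepted.end(),
                            AuthMethod::Cephx) != accepted.end();
  bool none_ok = std::find(accepted.begin(), accepted.end(),
                           AuthMethod::None) != accepted.end();
  if (cephx_ok && keyring_found) {
    return AuthMethod::Cephx;
  }
  if (none_ok) {
    return AuthMethod::None; // resort to none instead of failing
  }
  return AuthMethod::Unavailable;
}
```

In this sketch a missing keyring is only fatal when none is not among the accepted methods.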