Commit Graph

23002 Commits

Author SHA1 Message Date
Josh Durgin
8bbb4a364d doc: fix rbd permissions for unprotect
Unprotect examines all pools, so use blanket x before 0.54. After
that, use class-read restricted by object_prefix to rbd_children.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00
Josh Durgin
d0a14d110d librbd: fix race between unprotect and clone
Clone needs to actually re-read the header to make sure the image is
still protected before returning. Additionally, it needs to consider
the image protected *only* if the protection status is protected -
unprotecting does not count. I thought I'd already fixed this, but
can't find the commit.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00
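The protection check described in d0a14d110d can be modelled roughly as follows. This is a minimal sketch, not librbd code. The `ProtectionStatus` enum, the `Header` class and the `parent_is_protected` helper are hypothetical stand-ins, and the sketch assumes the header exposes a three-valued protection status.

```python
from enum import Enum


class ProtectionStatus(Enum):
    # Hypothetical three-state model of a snapshot's protection status.
    UNPROTECTED = 0
    UNPROTECTING = 1
    PROTECTED = 2


class Header:
    """Hypothetical stand-in for the image header object."""

    def __init__(self, status):
        self.status = status

    def read_status(self):
        # In librbd this would be a fresh read of the on-disk header,
        # not a value cached when the parent was first opened.
        return self.status


def parent_is_protected(header):
    """Re-read the header; only PROTECTED counts, UNPROTECTING does not."""
    return header.read_status() is ProtectionStatus.PROTECTED
```

If the snapshot moves to UNPROTECTING between the first check and the end of the clone, a second call made before returning rejects it.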
Josh Durgin
958addc0c9 rbd: open (source) image as read-only
This allows users without write access to copy, export and list
information about an image.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00
Josh Durgin
47bf519584 librbd: open parent as read-only during clone
We never write to the parent, and don't need to watch it during this process.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00
Josh Durgin
c67c789de6 librbd: add {rbd_}open_read_only()
Since 58890cfad5, regular {rbd_}open()
would fail with -EPERM if the user did not have write access to the
pool, since a watch on the header was requested.

For many uses of read-only access, establishing a watch is not
necessary, since changes to the header do not matter. For example,
getting metadata about an image via 'rbd info' does not care if a new
snapshot is created while it is in progress.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00
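The difference between the two open calls in c67c789de6 can be sketched as a toy model, assuming the only permission-sensitive step is registering the header watch. The `open_image` function and its `can_write` and `read_only` parameters are hypothetical and do not correspond to the librbd signature.

```python
import errno


def open_image(name, can_write, read_only=False):
    """Hypothetical model of {rbd_}open() versus {rbd_}open_read_only().

    A read-write open registers a watch on the header object, which
    needs write access to the pool; a read-only open skips the watch.
    """
    if read_only:
        return {"name": name, "read_only": True, "watching": False}
    if not can_write:
        # Mirrors the -EPERM a user without write access used to hit.
        raise PermissionError(errno.EPERM, "watch on header requires write access")
    return {"name": name, "read_only": False, "watching": True}
```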
Josh Durgin
91e941aef9 OSD: remove RD flag from CALL ops
20496b8d2b forgot to do this. Without
this change, all class methods required regular read permission in
addition to class-read or class-write.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00
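The effect of dropping the RD flag from CALL ops (91e941aef9) can be illustrated with a small capability check. This is a sketch only: representing caps as plain strings, and the `call_op_caps`/`permitted` helpers with their `include_rd_flag` switch, are assumptions made for illustration.

```python
def call_op_caps(method_flags, include_rd_flag=False):
    """Caps a CALL op needs, in a hypothetical model of 91e941aef9.

    method_flags: subset of {"class-read", "class-write"} declared by the
    class method. include_rd_flag=True reproduces the old behaviour where
    the CALL op also carried RD and so demanded plain read ("r").
    """
    needed = set(method_flags)
    if include_rd_flag:
        needed.add("r")
    return needed


def permitted(user_caps, method_flags, include_rd_flag=False):
    """True if the user's caps cover everything the CALL op needs."""
    return call_op_caps(method_flags, include_rd_flag) <= set(user_caps)
```

With only `class-read`, a user can call a read-only class method under the new rule but not under the old one.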
Josh Durgin
85e9d4f000 cls_rbd: get_children does not need write permission
This prevented a read-only user from being able to unprotect a
snapshot without write permission on all pools. This was masked before
by the CLS_METHOD_PUBLIC flag.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-30 00:06:11 -08:00
Sage Weil
942c71454b init-ceph: ok, 8K files
16K might be a bit too many.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-28 17:12:06 -08:00
Sage Weil
0a5d6d8759 msg/Pipe: remove broken cephx signing requirement check
Remove the special-case check, which does not inform the peer what
protocol features are missing.  It also enforces this requirement even
when we negotiate auth none.

Reported as part of bug #3657.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-28 17:10:28 -08:00
Sage Weil
65b787ea2a msg/Pipe: include remote socket addr in debug output
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-28 16:00:47 -08:00
Sage Weil
ca34fc4d3c osd: allow RecoveryDone self-transition in RepNotRecovering
In a mixed cluster where some OSDs support the recovery reservations and
some don't, the replica may be new code in RepNotRecovering and will
complete a backfill.  In that case, we want to just stay in
RepNotRecovering.

It may also be possible to make it infer what the primary is doing even
though it is not sending recovery reservation messages, but this is much
more complicated and doesn't accomplish much.

Fixes: #3689
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-28 15:03:10 -08:00
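The self-transition added in ca34fc4d3c amounts to one extra entry in the replica's transition table. The table-driven sketch below shows only that entry; the dictionary form and the `next_state` helper are hypothetical simplifications of the real state machine.

```python
# Hypothetical transition table; only the entry added by ca34fc4d3c is shown.
TRANSITIONS = {
    # A replica running new code may finish a backfill driven by an old
    # primary that never sent reservation messages; stay in the same state.
    ("RepNotRecovering", "RecoveryDone"): "RepNotRecovering",
}


def next_state(state, event):
    """Look up the next state, failing loudly on an unhandled event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise RuntimeError(f"unhandled event {event} in state {state}") from None
```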
Sage Weil
ea13ecc291 osd: less noise about inefficient tmap updates
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-28 12:34:15 -08:00
Sage Weil
672c56b18d init-ceph: default to 16K max_open_files
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-28 12:11:55 -08:00
Sage Weil
f6ce5dda43 rgw: disable ops and usage logging by default
Most users don't need this, and having it on will just fill their clusters
with objects that will need to be cleaned up later.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2012-12-27 22:02:38 -08:00
Noah Watkins
c0fe381556 java: remove deprecated libcephfs
Removes ceph_set_default_*

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2012-12-27 16:39:55 -08:00
Sage Weil
6c7b667bad init-ceph: fix status version check across machines
The local state isn't propagated into the backtick shell, resulting in
'unknown' for all remote daemons.  Avoid backticks altogether.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-27 16:32:29 -08:00
Sage Weil
635673928a osd: fix recovery assert for pg repair case
In the case of PG repair, this assert is not valid.  Disable it for now.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-27 13:26:09 -08:00
tamil
998f71945d dropping xfs test 186 due to bug #3685
Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
2012-12-27 11:27:31 -08:00
Sage Weil
82c71716f7 osd: drop 'osd recovery max active' back to previous default (5)
Having this too large means that queues get too deep on the OSDs during
backfill and latency is very high.  In my tests, it also meant we generated
a lot of slow recovery messages just from the recovery ops themselves (no
client io).

Keeping this at the old default means we are no worse in this respect than
argonaut, which is a safe position to start from.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-27 11:12:33 -08:00
Sage Weil
6f1f03c7d3 journal: reduce journal max queue size
Keep the journal queue size smaller than the filestore queue size.

Keeping this small also means that we can lower the latency for new
high priority ops that come into the op queue.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-27 11:11:08 -08:00
Sage Weil
850d1d544b osd: fix dup failure cancellations
If we had a pending failure report and we send a cancellation, take it
out of our pending list so that we don't keep resending cancellations.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 15:21:18 -08:00
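The duplicate-cancellation fix in 850d1d544b can be sketched as bookkeeping over a set of pending reports. The `FailureReporter` class and its methods are hypothetical, and messages are recorded in a list instead of being sent to a monitor.

```python
class FailureReporter:
    """Hypothetical model of an OSD's pending failure reports (850d1d544b)."""

    def __init__(self):
        self.pending = set()  # peers we have reported to the monitor as failed
        self.sent = []        # messages "sent" to the monitor, for inspection

    def report_failure(self, osd):
        self.pending.add(osd)
        self.sent.append(("failure", osd))

    def send_cancellations(self, alive):
        """Cancel reports for peers we have since heard from."""
        for osd in sorted(self.pending & set(alive)):
            self.sent.append(("cancel", osd))
            # The fix: forget the report once it is cancelled, so later
            # calls do not resend the same cancellation.
            self.pending.discard(osd)
```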
Sage Weil
61d43af747 osd: make MOSDFailure output more sensible
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 15:21:18 -08:00
Sage Weil
9df522e9ec mon: make osd failure report log msgs sensible
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 15:11:39 -08:00
Sage Weil
1290671f15 Merge branch 'wip-scrub' into next
Reviewed-by: Sage Weil <sage@inktank.com>
Conflicts:
	src/osd/PG.cc
2012-12-23 14:42:51 -08:00
Sage Weil
8362e6403e monclient: fix get_monmap_privately retry interval
Use mon_client_hunt_interval (default 3) instead of hardcoding 1 second.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 13:53:21 -08:00
Sage Weil
d843a64a3a Makefile: fix 'base' rule
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 13:53:18 -08:00
Sage Weil
a09f5b1b46 init-ceph,mkcephfs: default inode64 for mounting xfs
According to hch this is now the default on new kernels.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-23 11:18:45 -08:00
Sage Weil
5f25f9f8cf init-ceph: default osd_data path
Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-22 11:10:03 -08:00
Samuel Just
f6b2ca8b38 OSD: always do a deep scrub when repairing
Otherwise, errors turned up in a deep-scrub will be
swept under the rug without being repaired.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-21 20:37:06 -08:00
Samuel Just
ad9bcc705f PG: don't use a self-transition for WaitRemoteRecoveryReserved
Previously, using the state on active worked, but now we might
go back through WaitRemoteRecoveryReserved without resetting
Active.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-21 20:37:06 -08:00
Samuel Just
2e96bb1817 PG: Handle repair once in scrub_finish
We don't want to change missing sets during a chunky
scrub since it would cause !is_clean() and derail
the rest of the scrub.  Instead, move the missing,
inconsistent, and authoritative sets into scrubber
and add to them during scrub_compare_maps().  Then,
handle repairing objects all at once in scrub_finish().

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-21 20:35:19 -08:00
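The approach in 2e96bb1817 (accumulate problems per chunk, repair once at the end) can be sketched like this. It is a simplified model, not PG.cc. The `Scrubber` dataclass is hypothetical, scrub maps are reduced to `{object: digest}` dictionaries, and the sketch assumes the primary's copy is always authoritative, which the real code does not.

```python
from dataclasses import dataclass, field


@dataclass
class Scrubber:
    """Hypothetical model of the scrubber state in 2e96bb1817.

    Problems found while comparing each chunk's scrub maps are recorded
    here instead of being applied to the PG's missing set immediately,
    which would make the PG !is_clean() in the middle of a chunky scrub.
    """
    missing: dict = field(default_factory=dict)        # object -> shards lacking it
    inconsistent: dict = field(default_factory=dict)   # object -> shards that disagree
    authoritative: dict = field(default_factory=dict)  # object -> shard with the good copy
    fixed: int = 0

    def compare_chunk(self, maps, primary):
        """maps: shard -> {object: digest}; the primary is assumed authoritative."""
        objects = set().union(*(m.keys() for m in maps.values()))
        for obj in objects:
            for shard, m in maps.items():
                if obj not in m:
                    self.missing.setdefault(obj, set()).add(shard)
                elif obj in maps[primary] and m[obj] != maps[primary][obj]:
                    self.inconsistent.setdefault(obj, set()).add(shard)
            if obj in maps[primary]:
                self.authoritative[obj] = primary

    def finish(self, repair):
        """Handle all repairs at once, after the last chunk."""
        if repair:
            for obj in set(self.missing) | set(self.inconsistent):
                if obj in self.authoritative:
                    self.fixed += 1  # stand-in for queueing the actual repair
        return self.fixed
```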
Dan Mick
6325a4800d import_export.sh: sparse import export
Add tests for:
   - sparse import makes expected sparse images
   - sparse export makes expected sparse files
   - sparse import from stdin also creates sparse images
   - import from partially-sparse file leads to partially-sparse image
   - import from stdin with zeros leads to sparse
   - export from zeros-image to file leads to sparse file

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-21 17:03:38 -08:00
Dan Mick
5905d7fae7 rbd: harder-working sparse import from stdin
Try to accumulate image-sized blocks when importing from stdin, even if
each read is shorter than requested; if we get a full block, and it's
all zeroes, we can seek and make a sparse output file.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-21 17:03:38 -08:00
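The "harder-working" read loop in 5905d7fae7 can be sketched as follows: keep reading until a full block is gathered, then skip the block if it is all zeroes. This is a minimal model, not the rbd tool. `TricklePipe` is an illustrative stand-in for stdin, and the returned `{offset: data}` dictionary stands in for a freshly created image, assuming its unwritten extents read back as zeros.

```python
class TricklePipe:
    """Illustrative stand-in for stdin: each read() returns at most `chunk` bytes."""

    def __init__(self, data, chunk):
        self.data, self.pos, self.chunk = data, 0, chunk

    def read(self, n):
        n = min(n, self.chunk)
        out = self.data[self.pos:self.pos + n]
        self.pos += len(out)
        return out


def read_full_block(stream, size):
    """Keep reading until `size` bytes are gathered or EOF is reached."""
    chunks = []
    remaining = size
    while remaining > 0:
        data = stream.read(remaining)
        if not data:
            break
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)


def sparse_import(stream, block_size):
    """Return ({offset: data} for non-zero blocks, total size in bytes)."""
    extents = {}
    offset = 0
    while True:
        block = read_full_block(stream, block_size)
        if not block:
            break
        if any(block):  # any() over bytes is False only if every byte is 0
            extents[offset] = block
        offset += len(block)
    return extents, offset
```

Without the accumulation step, short reads would produce small, misaligned blocks, so a zero region would rarely line up with a whole block.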
Dan Mick
410903fe7a rbd: check for all-zero buf in export, seek output if so
Use buf_is_zero in common/util.cc

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-21 17:03:38 -08:00
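The export side (410903fe7a) can be sketched by seeking over zero blocks instead of writing them. Here `buf_is_zero` is only a Python analogue of the helper in common/util.cc, and `sparse_export` is a hypothetical function that takes a list of blocks instead of reading from an image.

```python
import os


def buf_is_zero(buf):
    """Python analogue of buf_is_zero(): True if every byte is zero."""
    return not any(buf)


def sparse_export(blocks, path):
    """Write `blocks` (bytes objects) to `path`, seeking over all-zero ones.

    On filesystems that support holes, the skipped ranges are not allocated.
    """
    with open(path, "wb") as out:
        for block in blocks:
            if buf_is_zero(block):
                out.seek(len(block), os.SEEK_CUR)  # leave a hole instead of writing zeros
            else:
                out.write(block)
        # Seeking past the end does not extend the file by itself, so a
        # trailing zero block would be lost; set the final size explicitly.
        out.truncate(out.tell())
```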
Dan Mick
4a558048cf librbd: move buf_is_zero() to new common/util.cc and include/util.h
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-21 17:03:38 -08:00
Sage Weil
8f5de15605 osd: fix pg stat msgs vs timeout
We can get a pattern like so:

- new mon session
- after say 120 seconds, we decide to send a stats msg
- outstanding_pg_stats is finally true, we immediately time out (30 second
  grace), and reconnect to a new mon
-> repeat

The problem is that we don't reset the last_sent timestamp when we send.
Or that we do this check after sending instead of before.  Fix both.

This should resolve issue #3661, where OSDs that don't have PGs
updating are not sending stats messages to the mon to check in, and are
eventually getting marked down as a result.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-12-21 16:47:50 -08:00
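The two fixes in 8f5de15605 (check for a timed-out report before sending, and reset the timestamp on every send) can be sketched with an explicit clock. The `StatsSender` class, its parameters and the reconnect counter are hypothetical. The 120 s and 30 s values simply echo the example in the commit message.

```python
class StatsSender:
    """Hypothetical model of the OSD's pg-stats timer after 8f5de15605."""

    def __init__(self, send_interval, ack_timeout):
        self.send_interval = send_interval  # e.g. 120 s between stats reports
        self.ack_timeout = ack_timeout      # e.g. 30 s grace for the mon's ack
        self.last_sent = 0.0
        self.outstanding = False
        self.reconnects = 0

    def tick(self, now):
        # Check for a timed-out report *before* deciding to send a new one.
        if self.outstanding and now - self.last_sent > self.ack_timeout:
            self.reconnects += 1  # would hunt for a new monitor
            self.outstanding = False
        if now - self.last_sent >= self.send_interval:
            self.outstanding = True
            self.last_sent = now  # the fix: reset the timestamp on send

    def ack(self):
        self.outstanding = False
```

With the stale timestamp, the first check after a send would see an elapsed time of about 120 s and time out immediately; here it sees the time since the actual send.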
Samuel Just
00ed6657c9 PG::scrub_compare_maps increment scrubber.fixed for missing repairs
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-21 15:20:22 -08:00
Samuel Just
c9e051746e PG::_compare_scrubmaps: increment scrubber.errors on missing object
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-21 15:16:19 -08:00
Sage Weil
206ffcd82e mkcephfs: error out if 'devs' defined but 'osd fs type' not defined
We can infer btrfs if they use btrfs devs, but if they use devs there is
no default fs.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-21 14:23:14 -08:00
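The decision described in 206ffcd82e can be sketched as a small function over a dictionary of config options. This is a hypothetical model, not mkcephfs: the option names are taken from the commit message, and treating the config as a plain `dict` is an assumption.

```python
def osd_fs_type(conf):
    """Hypothetical model of mkcephfs's filesystem-type decision (206ffcd82e).

    conf is a plain dict of ceph.conf options for one OSD section.
    """
    if conf.get("osd fs type"):
        return conf["osd fs type"]
    if conf.get("btrfs devs"):
        return "btrfs"  # btrfs devs imply btrfs
    if conf.get("devs"):
        raise ValueError("'devs' is defined but 'osd fs type' is not")
    return None  # no devices to format
```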
Sage Weil
11fb314153 Merge remote-tracking branch 'gh/wip-scrub' into next
2012-12-21 13:56:16 -08:00
Sage Weil
47145d8009 Merge remote-tracking branch 'gh/wip-3643' into next
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-12-21 13:45:39 -08:00
Sage Weil
999ba1b2e7 monc: only warn about missing keyring if we fail to authenticate
This avoids the situation where a librados or other user with the default
of 'cephx,none' and no keyring is authenticating against a cluster with
required of 'none' and an annoying warning is generated every time.  Now
we only print a helpful message if we actually failed.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-21 13:44:19 -08:00
Sage Weil
5d5a42bc71 osd: clear CLEAN on exit from Clean state
This means we can drop the scrub repair state_clear() call.  We probably
can drop others, but let's leave that for another day.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-21 13:10:32 -08:00
Yehuda Sadeh
b3e62ad692 auth: use none auth if keyring not found
If both cephx and none are accepted auth methods, and
cephx keyring cannot be found then resort to using
none, instead of failing.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-12-21 12:19:41 -08:00
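The fallback in b3e62ad692 can be sketched as walking the list of acceptable methods in order and skipping cephx when no keyring was found. The `choose_auth` function is hypothetical, and the sketch assumes the methods arrive as an ordered list such as `["cephx", "none"]`.

```python
def choose_auth(supported, keyring_found):
    """Hypothetical model of client auth selection after b3e62ad692.

    supported: ordered list of acceptable methods, e.g. ["cephx", "none"].
    Falls back to "none" when cephx cannot be used for lack of a keyring,
    instead of failing outright.
    """
    for method in supported:
        if method == "cephx" and not keyring_found:
            continue  # cannot use cephx without a key; try the next method
        return method
    raise RuntimeError("no usable auth method (missing keyring?)")
```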
Samuel Just
4d661e0d01 PG::sched_scrub: only set PG_STATE_DEEP_SCRUB once reserved
Otherwise we would have +DEEP before we have +SCRUB.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-21 11:36:54 -08:00
Samuel Just
7c56d8fad0 PG::sched_scrub: return true if scrub newly kicked off
The previous return value wasn't really what OSD::sched_scrub
wanted to know.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-21 11:36:54 -08:00
Sage Weil
ae044e6405 osd: allow transition from Clean -> WaitLocalRecoveryReserved for repair
If we do a scrub repair, we need to go from clean to recovery again to
copy objects around.

This fixes a simple repair of a missing object, either on the primary or
replica.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-21 11:37:48 -08:00
Samuel Just
670afc6c0c PG: in sched_scrub() set PG_STATE_DEEP_SCRUB not scrubber.deep
scrubber.deep gets reset in scrub() to match
state_test(PG_STATE_DEEP_SCRUB).

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-12-21 11:29:47 -08:00
Sage Weil
19e44bff37 osd: clear scrub state if queued scrub doesn't start
We set SCRUBBING when we queue a pg for scrub.  If we dequeue and
call scrub() but abort for some reason (!active, degraded, etc.), clear
that state bit.

Bug is easily reproduced with 'ceph osd scrub N' during cluster startup
when PGs are peering; some PGs can get left in the scrubbing state.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-21 11:29:47 -08:00
Sage Weil
e765dcb4f1 osd: only dec_scrubs_active if we were active
This fixes a bug that puts scrubs_active negative.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-12-20 21:45:09 -08:00
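The scrub bookkeeping fixes in 19e44bff37 and e765dcb4f1 can be sketched together. The SCRUBBING bit is cleared when a queued scrub aborts, and the OSD-wide counter is decremented only by a PG that actually incremented it. The `OSD` and `PGScrubState` classes below are hypothetical simplifications.

```python
class OSD:
    """Hypothetical holder for the OSD-wide scrub counter."""

    def __init__(self):
        self.scrubs_active = 0


class PGScrubState:
    """Hypothetical model of per-PG scrub bookkeeping (19e44bff37, e765dcb4f1)."""

    def __init__(self, osd):
        self.osd = osd
        self.scrubbing = False       # SCRUBBING state bit, set when queued
        self.counted_active = False  # did this PG bump osd.scrubs_active?

    def queue_scrub(self):
        self.scrubbing = True

    def scrub(self, active, degraded):
        if not active or degraded:
            # Abort: clear the bit set at queue time so the PG is not left
            # stuck in the scrubbing state.
            self.scrubbing = False
            return False
        self.osd.scrubs_active += 1
        self.counted_active = True
        return True

    def finish(self):
        # Only decrement if this PG actually incremented, so the counter
        # can never go negative.
        if self.counted_active:
            self.osd.scrubs_active -= 1
            self.counted_active = False
        self.scrubbing = False
```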