26805 Commits

Author SHA1 Message Date
Sage Weil
bcfd2f31a5 udev: drop useless --mount argument to ceph-disk
It doesn't mean anything anymore; drop it.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-14 14:04:54 -07:00
Sage Weil
b139152039 ceph-disk-udev: activate-journal
Trigger 'ceph-disk activate-journal' from the alt udev rules.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-14 14:04:54 -07:00
Sage Weil
e5ffe0d248 ceph-disk: do not use mount --move (or --bind)
The kernel does not let you mount --move when the parent mount is
shared (see, e.g., https://bugzilla.redhat.com/show_bug.cgi?id=917008
for another person who was also confused by this).  We can't use --bind either
since that (on RHEL at least) screws up /etc/mtab so that the final
result looks like

 /var/lib/ceph/tmp/mnt.HNHoXU /var/lib/ceph/osd/ceph-0 none rw,bind 0 0

Instead, mount the original dev in the final location and then umount
from the old location.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-14 14:04:54 -07:00
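
The mount-then-umount sequence described in e5ffe0d248 can be sketched as below. This is only an illustration of the ordering, not the ceph-disk code: the helper name `remount_commands` and the device `/dev/sdb1` are hypothetical, and filesystem type and mount options are omitted.

```python
def remount_commands(dev, tmp_path, final_path):
    """Build the two commands that replace 'mount --move tmp_path final_path'.

    Hypothetical helper: it only builds argv lists and does not run them.
    """
    return [
        # Mount the original device a second time, at the final location.
        ["mount", dev, final_path],
        # Then drop the temporary mount point.
        ["umount", tmp_path],
    ]


if __name__ == "__main__":
    for argv in remount_commands(
        "/dev/sdb1",                     # assumed device name
        "/var/lib/ceph/tmp/mnt.HNHoXU",  # temporary mount from the commit message
        "/var/lib/ceph/osd/ceph-0",      # final location from the commit message
    ):
        print(" ".join(argv))
```

Building the argv lists separately from running them keeps the ordering easy to inspect; a caller could pass each list to `subprocess.check_call`.
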
Sage Weil
f3234c147e ceph.spec: include by-partuuid udev workaround rules
These are needed for old or buggy udev.  Having them for new and unbroken
udev is harmless.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-14 14:04:42 -07:00
Sage Weil
1aa7f59537 ceph.spec: add missing ceph_test_rados_api_cmd to package
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-14 14:04:24 -07:00
Sage Weil
b1293ee834 ceph: flush stderr, stdout for sane output; add prefix
Aie.

e.g., ceph tell mon.* injectargs '--debug-ms 1'

 mon.a: injectargs:debug_ms=1/1
 mon.b: injectargs:debug_ms=1/1
 mon.c: injectargs:debug_ms=1/1

or

 osd.0: debug_ms=1/1
 osd.1: debug_ms=1/1
 osd.2: Problem getting command descriptions from ('osd', '2'), ENXIO
 osd.3: Problem getting command descriptions from ('osd', '3'), ENXIO
 osd.4: Problem getting command descriptions from ('osd', '4'), ENXIO
 osd.5: Problem getting command descriptions from ('osd', '5'), ENXIO

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2013-06-14 12:35:46 -07:00
Sage Weil
82ff72f827 ceph-disk: work around buggy rhel/centos parted
parted on RHEL/Centos prefixes the *machine readable output* with

 1b 5b 3f 31 30 33 34 68

Note that the same thing happens when you 'import readline' in python.

Work around it!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-14 12:10:49 -07:00
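
The eight bytes quoted in 82ff72f827 decode to the terminal escape sequence ESC `[?1034h`. A minimal sketch of one way to work around such a prefix follows; the function name `strip_parted_prefix` and the sample output are assumptions for illustration, not the code from the commit.

```python
# The bytes listed in the commit message: 1b 5b 3f 31 30 33 34 68
PARTED_PREFIX = bytes.fromhex("1b5b3f3130333468")  # ESC [ ? 1 0 3 4 h


def strip_parted_prefix(output: bytes) -> bytes:
    """Drop the stray escape sequence if it leads the output (hypothetical helper)."""
    if output.startswith(PARTED_PREFIX):
        return output[len(PARTED_PREFIX):]
    return output


if __name__ == "__main__":
    sample = PARTED_PREFIX + b"BYT;\n"  # illustrative machine-readable output
    print(strip_parted_prefix(sample))
```

Output that does not start with the prefix is returned unchanged, so the helper is safe to apply on distros where parted behaves.
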
Joao Eduardo Luis
92b8300759 mon: OSDMonitor: don't ignore apply_incremental()'s return on UfP [1]
apply_incremental() may return -EINVAL.  Don't ignore it.

[1] UfP = Update from Paxos

Fixes: #5343

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-14 11:22:27 -07:00
Sage Weil
7e08ed1bf1 upstart: start ceph-all on runlevel [2345]
Starting when only one network interface has started breaks machines with
multiple nics in very problematic ways.

There may be an earlier trigger that we can use for cases where other
services on the local machine depend on ceph, but for now this is better
than the existing behavior.

See #5248

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-14 11:21:25 -07:00
Sage Weil
7503db9a17 ceph: fix mon.*
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-14 11:02:06 -07:00
Sage Weil
a2b2f39bf3 librados: add tests for too-large objects
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-14 10:17:31 -07:00
Sage Weil
4a1eb3c8fa osd: fix types for size checks
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-14 10:14:54 -07:00
Sage Weil
2be3c8dd6f remove RELEASE_CHECKLIST
This ancient document has long since been replaced by
doc/dev/release-process.rst.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-14 09:42:08 -07:00
David Zafman
f1b6bd7988 osd: EINVAL from truncate causes osd to crash
Maximum object size is 100GB, configurable with osd_max_object_size
Return EFBIG on an attempt to WRITE/WRITEFULL/TRUNCATE beyond osd_max_object_size
Return EINVAL if length < 1 for WRITE/WRITEFULL/ZERO
Make ZERO beyond existing size a no-op

Fixes: #5252
Fixes: #5340

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-14 09:40:28 -07:00
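
The rules listed in f1b6bd7988 can be read as a small decision table. The sketch below is one possible reading, not the OSD code: the function `check_op`, the order of the checks, the treatment of a ZERO that starts at or past the current size as "beyond existing size", and the interpretation of 100GB as 100 * 2**30 bytes are all assumptions.

```python
import errno

MAX_OBJECT_SIZE = 100 * 2**30  # assumed meaning of "100GB"


def check_op(op, offset, length, object_size, max_size=MAX_OBJECT_SIZE):
    """Return (code, apply) for a hypothetical op check.

    code is 0 or a negative errno; apply says whether the object is modified.
    For "truncate", offset is the requested new size and length is ignored.
    """
    if op in ("write", "writefull", "zero") and length < 1:
        return (-errno.EINVAL, False)
    if op in ("write", "writefull") and offset + length > max_size:
        return (-errno.EFBIG, False)
    if op == "truncate" and offset > max_size:
        return (-errno.EFBIG, False)
    if op == "zero" and offset >= object_size:
        return (0, False)  # success, but nothing to do
    return (0, True)


if __name__ == "__main__":
    print(check_op("write", 0, 0, 10))
    print(check_op("zero", 20, 5, 10))
```

Returning an error code instead of asserting is the point of the commit subject: a bad argument from a client is rejected rather than crashing the daemon.
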
Sage Weil
bcfbd0a3ff ceph_test_rados: add --pool <name> arg
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 22:08:36 -07:00
Sage Weil
9b66f1aa81 Merge remote-tracking branch 'gh/next' 2013-06-13 21:33:25 -07:00
Dan Mick
c672b777f8 Merge pull request #362 from ceph/wip-4984
ceph-disk: udev/partprobe redo, zap command, activate-journal command
2013-06-13 19:37:37 -07:00
Sage Weil
02599c43b4 ceph-fuse: fix uninitialized variable
There is a delete call in the out_mc_start_failed path.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 18:13:34 -07:00
Sage Weil
a2a78e8d16 ceph-disk: implement 'activate-journal'
Activate an osd via its journal device.  udev populates its symlinks and
triggers events in an order that is not related to whether the device is
an osd data partition or a journal.  That means that triggering
'ceph-disk activate' can happen before the journal (or journal symlink)
is present and then fail.

Similarly, it may be that they are on different disks that are hotplugged
with the journal second.

This can be wired up to the journal partition type to ensure that osds are
started when the journal appears second.

Include the udev rules to trigger this.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 18:01:43 -07:00
Sage Weil
8b3b59e014 ceph-disk: call partprobe outside of the prepare lock; drop udevadm settle
After we change the final partition type, sgdisk may or may not trigger a
udev event, depending on how well udev is behaving (it varies between
distros, it seems).  The old code would often settle and wait for udev to
activate the device, and then partprobe would uselessly fail because it
was already mounted.

Call partprobe only at the very end, after prepare is done.  This ensures
that if partprobe calls udevadm settle (which it sometimes does) we do not
get stuck.

Drop the udevadm settle.  I'm not sure what this accomplishes; take it out,
at least until we determine we need it.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 18:01:43 -07:00
Sage Weil
10ba60cd08 ceph-disk: add 'zap' command
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 18:01:43 -07:00
Sage Weil
71402a5daa Merge pull request #363 from dmick/wip-cli-help
Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-13 17:47:41 -07:00
Dan Mick
06f0b72485 ceph.in: allow args with -h to limit help to cmds that match partially
Enables "ceph -h pg" to see just the pg commands

Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-06-13 17:40:02 -07:00
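
A rough sketch of the partial matching described in 06f0b72485: keep only the commands whose leading words start with the words given after -h. The function `matching_help` and the command list are illustrative assumptions; the actual ceph.in logic may differ.

```python
def matching_help(commands, partial_args):
    """Return the commands whose leading words match partial_args by prefix."""
    matches = []
    for cmd in commands:
        words = cmd.split()
        if len(partial_args) > len(words):
            continue  # more filter words than the command has
        if all(w.startswith(p) for w, p in zip(words, partial_args)):
            matches.append(cmd)
    return matches


if __name__ == "__main__":
    commands = ["pg dump", "pg stat", "osd tree", "osd pool create"]  # sample list
    for line in matching_help(commands, ["pg"]):
        print(line)
```

With an empty filter every command matches, so plain `-h` would still show the full help in this model.
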
Dan Mick
6ebfd3c923 ceph.in: better global description of tool
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-06-13 17:38:50 -07:00
Dan Mick
821b203c4e ceph.in: less verbosity on error
Only show 'did you mean?' when in verbose mode
Only show first ten closest matches on error

Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-06-13 17:38:26 -07:00
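
The behaviour described in 821b203c4e (suggestions only in verbose mode, at most ten of them) can be approximated with Python's standard `difflib`. This is a stand-in technique for illustration; the commit does not say how ceph.in ranks matches, and `suggest` is a hypothetical name.

```python
import difflib


def suggest(word, known_commands, verbose, limit=10):
    """Return up to `limit` close matches, and only when verbose is set."""
    if not verbose:
        return []
    return difflib.get_close_matches(word, known_commands, n=limit)


if __name__ == "__main__":
    known = ["cmd%d" % i for i in range(20)]  # made-up command names
    print(suggest("cmd", known, verbose=True))
    print(suggest("cmd", known, verbose=False))
```
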
Sage Weil
99bd5c8f7b librados: add missing #include
librados/librados.cc: In function 'int rados_mon_command_target(void*, const char*, const char**, size_t, const char*, size_t, char**, size_t*, char**, size_t*)':
error: librados/librados.cc:1877: 'LONG_MAX' was not declared in this scope
error: librados/librados.cc:1877: 'LONG_MIN' was not declared in this scope

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 17:38:02 -07:00
Sage Weil
93505bb3c7 librados: wait for osdmap for commands that need it
In commit 7e1cf87b51 we stopped waiting for
the osdmap on start because the Objecter will normally wait, but for some
commands we assume the osdmap is recent(ish).

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-06-13 16:39:30 -07:00
Gary Lowell
f6a864d079 rules: Don't disable tcmalloc on ARM (and other non-intel)
Fixes #5342

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
2013-06-13 16:38:26 -07:00
Sage Weil
763432a3cc Merge pull request #356 from ceph/wip-leaks
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-06-13 16:21:21 -07:00
Sage Weil
95aa2e8d07 Merge branch 'wip-objecter' into next
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-06-13 16:15:44 -07:00
Sage Weil
2bda9db1c2 osdc/Objecter: dump command ops
Dump command_ops along with everything else.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 16:01:31 -07:00
Sage Weil
6e73d999af osdc/Objecter: ping osds for which we have pending commands
As with ops and linger_ops, this ensures we detect connection resets.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 15:57:57 -07:00
Dan Mick
e4f9dce7a5 ceph.in: refuse 'ceph <type> tell' commands; suggest 'ceph tell <type>'
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-13 15:56:52 -07:00
Dan Mick
a6876ad7d9 ceph.in: argparsing cleanup: suppress --completion, add help
Options -v, --verbose, --concise didn't have helpstrings
Option --completion doesn't quite work yet, and should be hidden anyway

Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-06-13 15:30:38 -07:00
Sage Weil
392e86fbff Merge remote-tracking branch 'gh/next' 2013-06-13 15:17:05 -07:00
Sage Weil
68a91995ba osdc/Objecter: kick command ops on osd con resets
Resend osd/pg commands on the OSDSession, just as we do with other request
types.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 15:16:20 -07:00
Sage Weil
db7d12103a osdc/Objecter: add perfcounters for commands
This matches the other counters we maintain for other kinds of ops.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 15:16:19 -07:00
Sage Weil
9a7ed0b3f8 mon: fix idempotency of 'osd crush add'
If we add an item that already exists in a particular position, we should
update instead of inserting it; the CrushWrapper methods are not
idempotent.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 14:42:05 -07:00
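
The update-instead-of-insert idea in 9a7ed0b3f8 can be illustrated with a toy bucket. This is not CrushWrapper; the list-of-dicts bucket and the name `add_item` are assumptions made to show why a blind insert is not idempotent.

```python
def add_item(bucket, name, weight):
    """Add an item to a bucket, updating it in place if it is already there."""
    for entry in bucket:
        if entry["name"] == name:
            entry["weight"] = weight  # already present: update, do not insert
            return "updated"
    bucket.append({"name": name, "weight": weight})
    return "added"


if __name__ == "__main__":
    bucket = []
    print(add_item(bucket, "osd.0", 1.0))
    print(add_item(bucket, "osd.0", 1.0))  # repeating the command adds nothing
    print(len(bucket))
```
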
Sage Weil
7e1cf87b51 librados: do not wait for osdmap on start
If we abort while waiting, we incorrectly clean up (we switch the state value
incorrectly, and also fail to clean up the initialized objecter).

Instead, skip this wait; it's useless!

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-06-13 14:42:03 -07:00
John Wilkins
51dae8ad7c doc: Updated with glossary terms.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-06-13 14:09:35 -07:00
Sage Weil
35ea1639aa mon/MonmapMonitor: remove unused label
mon/MonmapMonitor.cc: In member function 'bool MonmapMonitor::preprocess_command(MMonCommand*)':
mon/MonmapMonitor.cc:273:2: warning: label 'out' defined but not used [-Wunused-label]

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 11:27:49 -07:00
Sage Weil
987f175fb8 mon/MonCap: bootstrap-* need to subscribe to osdmap, monmap
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 11:27:23 -07:00
Sage Weil
0193f88519 mon/MonClient: mark_down during get_monmap_privately() shutdown
We explicitly mark_down() and clear cur_con when shutting down; do the same
for get_monmap_privately() to ensure that the reset event doesn't make us
do something silly (like, in this case, call _reopen_session() again).

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 10:53:06 -07:00
Sage Weil
962d118743 mon/MonClient: mark_down connection on shutdown
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 10:53:06 -07:00
Sage Weil
597e4398b5 msgr: queue reset when marking down pipes on shutdown
This lets the callbacks clean up ref cycles.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 10:53:06 -07:00
Sage Weil
ea6880f8a2 msg/DispatchQueue: do not discard queued events on stop
When the shutdown/stop flag is set, continue to work through the queue.
Process events, but discard messages.  This avoids the loss of reset events
on shutdown that are necessary to clean up ref cycles.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 10:53:06 -07:00
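
A toy model of the policy in ea6880f8a2: once the stop flag is set, keep draining the queue, deliver events such as resets, and drop ordinary messages. The tuple layout and the function `drain` are assumptions; the real DispatchQueue is C++ and threaded.

```python
from collections import deque


def drain(queue, stopping):
    """Work through the whole queue; when stopping, discard messages only."""
    delivered = []
    while queue:
        kind, payload = queue.popleft()
        if stopping and kind == "message":
            continue  # discard messages, but do not abandon the queue
        delivered.append((kind, payload))
    return delivered


if __name__ == "__main__":
    q = deque([("message", "m1"), ("reset", "con1"), ("message", "m2")])
    print(drain(q, stopping=True))
```

The reset event still reaches its handler, which is what lets the callbacks break reference cycles during shutdown.
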
Sage Weil
de64bc50f2 msgr: queue reset exactly once on any connection
Use the atomic pipe link removal as a signal that we are the one failing
the con and use that to queue the reset event.

This fixes the case where we have an open, the session gets set up via the
handle_accept callback, and then race with another connection and go into
wait + close, or just close.  In that case, fault() needs to queue a reset
event to match the accept.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 10:52:18 -07:00
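
The "exactly once" rule in de64bc50f2 relies on an atomic removal: whoever actually removes the pipe link is the one that queues the reset. A minimal sketch under that reading, with a hypothetical `PipeRegistry` class standing in for the messenger state:

```python
import threading


class PipeRegistry:
    """Toy stand-in: maps a connection to its pipe and records reset events."""

    def __init__(self):
        self._lock = threading.Lock()
        self._links = {}
        self.reset_events = []

    def link(self, con, pipe):
        with self._lock:
            self._links[con] = pipe

    def fail(self, con):
        """Queue a reset only if this call is the one that removed the link."""
        with self._lock:
            pipe = self._links.pop(con, None)  # atomic removal under the lock
            if pipe is None:
                return False  # someone else already failed this connection
            self.reset_events.append(con)
            return True


if __name__ == "__main__":
    reg = PipeRegistry()
    reg.link("con1", "pipeA")
    print(reg.fail("con1"), reg.fail("con1"), reg.reset_events)
```

However many code paths call `fail` for the same connection, only the first one finds the link, so the reset that matches the earlier accept is queued once.
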
Sage Weil
26e16c008d msg/Pipe: include con ref in debug prestring
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 10:52:18 -07:00
Sage Weil
eea73ab88f msg/Pipe: reset replaced pipes
This gives the ms_handle_reset call a chance to clean up (for example, by
breaking a con->priv <-> session reference cycle).

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 10:52:18 -07:00
Sage Weil
e96c0ceec7 msgr: use ConnectionRef throughout
Make RefCountedObject a private parent of Connection so that users are
forced to use ConnectionRef whenever references are taken.

Many methods can still take a raw Connection* when they are using the
caller's reference but not taking their own; this is cheaper than
twiddling the reference count, and the lifetime is still well defined.
Local variables generally use ConnectionRef, though.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-13 10:52:18 -07:00