Commit Graph

26782 Commits

Author SHA1 Message Date
Sage Weil
9ae0ec83da mon/Elector: cancel election timer if we bootstrap
If we short-circuit and bootstrap, cancel our timer.  Otherwise it will
go off some time later when we are in who knows what state.

Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-06-24 18:51:07 -07:00
Sage Weil
03d3be3eaa mon: cancel probe timeout on reset
If we are probing and get (say) an election timeout that calls reset(),
cancel the timer.  Otherwise, we assert later with a splat like

2013-06-24 01:09:33.675882 7fb9627e7700  4 mon.b@0(leader) e1 probe_timeout 0x307a520
2013-06-24 01:09:33.676956 7fb9627e7700 -1 mon/Monitor.cc: In function 'void Monitor::probe_timeout(int)' thread 7fb9627e7700 time 2013-06-24 01:09:43.675904
mon/Monitor.cc: 1888: FAILED assert(is_probing() || is_synchronizing())

 ceph version 0.64-613-g134d08a (134d08a965)
 1: (Monitor::probe_timeout(int)+0x161) [0x56f5c1]
 2: (Context::complete(int)+0xa) [0x574a2a]
 3: (SafeTimer::timer_thread()+0x425) [0x7059a5]
 4: (SafeTimerThread::entry()+0xd) [0x7065dd]
 5: (()+0x7e9a) [0x7fb966f62e9a]
 6: (clone()+0x6d) [0x7fb9652f9ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Fixes: #5438
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-06-24 18:12:11 -07:00
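Both timer fixes above follow one pattern. When the monitor leaves the state that armed a timer (via bootstrap or reset()), it cancels the pending event so the callback cannot fire later in an unrelated state. Below is a minimal Python sketch of that pattern, assuming a hypothetical manually advanced timer. `FakeTimer`, `Monitor`, `start_probe`, and `win_election` are illustrative names, not Ceph's SafeTimer or Monitor API.

```python
class FakeTimer:
    """Hypothetical timer: events fire only when advance() is called."""

    def __init__(self):
        self.now = 0.0
        self.events = {}  # event id -> (deadline, callback)
        self.next_id = 0

    def add_event_after(self, delay, callback):
        self.next_id += 1
        self.events[self.next_id] = (self.now + delay, callback)
        return self.next_id

    def cancel_event(self, event_id):
        # Returns True if the event was still pending.
        return self.events.pop(event_id, None) is not None

    def advance(self, seconds):
        self.now += seconds
        due = [eid for eid, (deadline, _) in self.events.items() if deadline <= self.now]
        for eid in sorted(due):
            _, callback = self.events.pop(eid)
            callback()


class Monitor:
    """Hypothetical monitor with just enough state to show the bug and the fix."""

    def __init__(self, timer):
        self.timer = timer
        self.state = "probing"
        self.probe_timeout_event = None

    def start_probe(self, timeout=10.0):
        self.state = "probing"
        self.probe_timeout_event = self.timer.add_event_after(timeout, self.probe_timeout)

    def probe_timeout(self):
        self.probe_timeout_event = None
        # Mirrors assert(is_probing() || is_synchronizing()) from the splat.
        assert self.state in ("probing", "synchronizing"), "probe_timeout in wrong state"
        self.start_probe()  # keep probing

    def reset(self):
        # The fix: cancel any pending probe timeout before the state changes.
        if self.probe_timeout_event is not None:
            self.timer.cancel_event(self.probe_timeout_event)
            self.probe_timeout_event = None

    def win_election(self):
        self.reset()
        self.state = "leader"
```

If reset() omitted the cancel_event() call, advancing the timer after win_election() would trip the assertion in probe_timeout(). That failure is analogous to the splat above.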
Sage Weil
521fdc2a4e mon/AuthMonitor: ensure initial rotating keys get encoded when create_initial called 2x
The create_initial() method may get called multiple times; make sure it
will unconditionally generate new/initial rotating keys.  Move the block
up so that we can easily assert as much.

Broken by commit cd98eb0c65.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2013-06-24 17:58:48 -07:00
Sage Weil
31d6062076 init-radosgw.sysv: remove -x debug mode
Fixes: #5443
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-24 17:42:11 -07:00
Sage Weil
eb86eebe1b common/pick_addresses: behave even after internal_safe_to_start_threads
ceph-mon recently started using Preforker to work around forking issues.
As a result, internal_safe_to_start_threads got set sooner and calls to
pick_addresses() which try to set string config values now fail because
there are no config observers for them.

Work around this by observing the change while we adjust the value.  We
assume pick_addresses() callers are smart enough to realize that their
result will be reflected by cct->_conf and not magically handled elsewhere.

Fixes: #5195, #5205
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2013-06-24 16:15:13 -07:00
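Conceptually, the workaround registers an observer for the option only while it is being changed. A config that rejects unobserved string changes after internal_safe_to_start_threads is set will then accept the change. Below is a minimal Python sketch. `Config`, `observe_while`, and the rejection behavior are hypothetical simplifications, not Ceph's C++ config class, and `public_addr` serves only as an example key.

```python
from contextlib import contextmanager


class Config:
    """Hypothetical config: once threads may start, only keys with at least
    one registered observer may be changed."""

    def __init__(self):
        self.values = {}
        self.observers = {}  # key -> list of callbacks
        self.safe_to_start_threads = False

    def add_observer(self, key, callback):
        self.observers.setdefault(key, []).append(callback)

    def remove_observer(self, key, callback):
        self.observers[key].remove(callback)
        if not self.observers[key]:
            del self.observers[key]

    def set_val(self, key, value):
        if self.safe_to_start_threads and not self.observers.get(key):
            raise ValueError("no observer for %r; change rejected" % key)
        self.values[key] = value
        for callback in self.observers.get(key, []):
            callback(key, value)


@contextmanager
def observe_while(conf, key):
    """Register a no-op observer for `key` for the duration of the block."""
    def noop(_key, _value):
        pass
    conf.add_observer(key, noop)
    try:
        yield
    finally:
        conf.remove_observer(key, noop)


def pick_addresses(conf, addr):
    # The result lands in conf.values; callers are expected to read it there.
    with observe_while(conf, "public_addr"):
        conf.set_val("public_addr", addr)
```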
Dan Mick
cad8cf5818 Add python-argparse to dependencies (for pre-2.7 systems)
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-06-24 14:50:07 -07:00
Sage Weil
f046dab88f mds: do not assume segment list is non-empty in standby_trim_segments
If we restart standby replay shortly after startup, before we actually have
any segments, we can trigger a segfault here:

 ceph version 0.64-441-gc39b99c (c39b99cdec)
 1: ceph-mds() [0x975caa]
 2: (()+0xfcb0) [0x7fc33b5a5cb0]
 3: (MDLog::standby_trim_segments()+0x192) [0x78a932]
 4: (MDS::C_MDS_StandbyReplayRestartFinish::finish(int)+0x39) [0x595f69]
 5: (Journaler::_finish_reprobe(int, unsigned long, Context*)+0x190) [0x7917b0]
 6: (Filer::_probed(Filer::Probe*, object_t const&, unsigned long, utime_t)+0x558) [0x7c6b38]
 7: (Objecter::C_Stat::finish(int)+0xc0) [0x7c7930]
 8: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe48) [0x7b2c78]
 9: (MDS::handle_core_message(Message*)+0xae8) [0x589858]
 10: (MDS::_dispatch(Message*)+0x2f) [0x589a1f]
 11: (MDS::ms_dispatch(Message*)+0x1d3) [0x58b4a3]
 12: (DispatchQueue::entry()+0x3f1) [0x943861]
 13: (DispatchQueue::DispatchThread::entry()+0xd) [0x86e32d]

Fixes: #5333
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit abd0ff64e1)
2013-06-24 10:58:08 -07:00
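The segfault comes from reading the first element of a segment list that is still empty. Below is a minimal sketch of a trim loop that re-checks for emptiness on every pass. It assumes a hypothetical `OrderedDict` mapping segment start offset to end offset; MDLog's real structures differ.

```python
from collections import OrderedDict


def standby_trim_segments(segments, expire_pos):
    """Trim leading segments that end at or before expire_pos.

    `segments` is a hypothetical OrderedDict of start offset -> end offset,
    oldest first. Returns the list of trimmed start offsets.
    """
    trimmed = []
    # Check emptiness on every iteration instead of assuming a first
    # segment exists; right after startup there may be none at all.
    while segments:
        start, end = next(iter(segments.items()))
        if end > expire_pos:
            break
        segments.popitem(last=False)
        trimmed.append(start)
    return trimmed
```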
Sage Weil
7ef921c8c2 Merge pull request #374 from ceph/wip-5427
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-06-24 10:20:24 -07:00
Sage Weil
cd98eb0c65 mon/AuthMonitor: make initial auth include rotating keys
This closes a very narrow race during mon creation where there are no
service keys.

Fixes: #5427
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-23 09:25:55 -07:00
Sage Weil
9b2dfb7507 mon: do not leak no_reply messages
I think I assumed no_reply() was releasing the references, but it is
not.  That is better, since send_reply() doesn't either.  Fix the leaks
by dropping the message ref explicitly.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-06-23 08:53:12 -07:00
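The leak follows from reference-counting semantics. If neither no_reply() nor send_reply() drops the caller's reference, the handler must drop it explicitly. Below is a Python analogy with a hypothetical `RefCountedMessage`; Ceph's messages are C++ objects with their own get()/put(), and this is not that code.

```python
class RefCountedMessage:
    """Hypothetical ref-counted message; 'freed' when the count reaches zero."""

    def __init__(self):
        self.nref = 1  # the handler owns one reference on arrival
        self.freed = False

    def get(self):
        self.nref += 1
        return self

    def put(self):
        assert self.nref > 0, "put() on an already-freed message"
        self.nref -= 1
        if self.nref == 0:
            self.freed = True


def no_reply(msg):
    # As the commit notes, this does NOT release the caller's reference.
    pass


def handle_message(msg):
    no_reply(msg)
    msg.put()  # the fix: drop our reference explicitly
```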
Sage Weil
ad12b0d61b mon: fix leak of MOSDFailure messages
We need to discard/cancel/free the failure report messages before we
cancel out a report.  Assert in the dtor to ensure we didn't forget.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-06-23 08:53:09 -07:00
Sage Weil
1aca370ed0 debian: ceph-common requires matching version of python-ceph
If they skew, the ceph_argparse.py module may be missing.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-22 10:28:16 -07:00
Dan Mick
94eada4046 Add header comments and Inktank copyrights to ceph.in/ceph_argparse.py
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-06-21 18:39:59 -07:00
Dan Mick
67a3c1e48d ceph.in: rip out reusable code to pybind/ceph_argparse.py
Signed-off-by: Dan Mick <dan.mick@inktank.com>

Conflicts:
	src/ceph.in
2013-06-21 18:39:43 -07:00
Sage Weil
c4272a1758 ceph: even shinier
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2013-06-21 15:52:32 -07:00
Sage Weil
34ef2f2484 ceph: do not busy-loop on ceph -w
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2013-06-21 15:50:59 -07:00
Sage Weil
27912e5858 librados: make cmd test tolerate NXIO for osd commands
The cluster may be thrashing underneath us; tolerate NXIO in case the OSD
is currently down.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-21 14:53:22 -07:00
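Tolerating the error amounts to treating -ENXIO like success when the target OSD may be down. Below is a small sketch, assuming the result follows the librados convention of 0 or a negative errno. `osd_command_ok` is a hypothetical helper, not the test's actual code.

```python
import errno


def osd_command_ok(ret):
    """Return True if an OSD command result is acceptable on a thrashing cluster.

    `ret` is 0 for success or a negative errno. -ENXIO means the target OSD
    is currently down, which is tolerated rather than treated as a failure.
    """
    return ret == 0 or ret == -errno.ENXIO
```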
Dan Mick
31d221c3a4 ceph.in: remove some TAB chars
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-06-20 15:14:36 -07:00
Dan Mick
69e1a9121d ceph.in: fix ^C handling in watch (trap exception in while, too)
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-06-20 15:14:36 -07:00
Sage Weil
29f6f27729 ceph: --version as well as -v
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-20 15:04:51 -07:00
Sage Weil
ebb46c452e qa/workunits/misc/multiple_rsync.sh: wtf
2013-06-15T12:55:29.808 INFO:teuthology.task.workunit.client.0.err:+ rsync -auv --exclude local/ /usr/ usr.1
2013-06-15T12:55:29.808 INFO:teuthology.task.workunit.client.0.err:+ tee a
2013-06-15T12:55:29.820 INFO:teuthology.task.workunit.client.0.out:sending incremental file list
2013-06-15T12:56:46.019 INFO:teuthology.task.workunit.client.0.out:
2013-06-15T12:56:46.020 INFO:teuthology.task.workunit.client.0.out:sent 1452634 bytes  received 7485 bytes  19086.52 bytes/sec
2013-06-15T12:56:46.020 INFO:teuthology.task.workunit.client.0.out:total size is 3205063225  speedup is 2195.07
2013-06-15T12:56:46.020 INFO:teuthology.task.workunit.client.0.err:+ wc -l a
2013-06-15T12:56:46.021 INFO:teuthology.task.workunit.client.0.out:4 a
2013-06-15T12:56:46.022 INFO:teuthology.task.workunit.client.0.err:+ wc -l a
2013-06-15T12:56:46.022 INFO:teuthology.task.workunit.client.0.err:+ grep 4
2013-06-15T12:56:46.023 INFO:teuthology.task.workunit.client.0.out:4 a
2013-06-15T12:56:46.024 INFO:teuthology.task.workunit.client.0.err:+ rsync -auv --exclude local/ /usr/ usr.2
2013-06-15T12:56:46.024 INFO:teuthology.task.workunit.client.0.err:+ tee a
2013-06-15T12:56:46.112 INFO:teuthology.task.workunit.client.0.out:sending incremental file list
2013-06-15T12:57:17.172 INFO:teuthology.task.workunit.client.0.out:
2013-06-15T12:57:17.174 INFO:teuthology.task.workunit.client.0.out:sent 1452634 bytes  received 7485 bytes  46352.98 bytes/sec
2013-06-15T12:57:17.174 INFO:teuthology.task.workunit.client.0.out:total size is 3205063225  speedup is 2195.07
2013-06-15T12:57:17.175 INFO:teuthology.task.workunit.client.0.err:+ wc -l a
2013-06-15T12:57:17.175 INFO:teuthology.task.workunit.client.0.out:3 a

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 21e85f90be)
2013-06-20 12:19:16 -07:00
Sage Weil
fd769c0f21 qa/workunits/cephtool/test.sh: fix and cleanup several tests
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-20 11:28:26 -07:00
Sage Weil
f420e5c614 mon: drop deprecated 'stop_cluster'
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-20 11:23:38 -07:00
Sage Weil
4977b88a7c mds: make 'mds compat rm_*compat' idempotent
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-20 11:23:11 -07:00
Sage Weil
4a038d6df5 mon: make 'log ...' command wait for commit before reply
Previously we would just dump the command argument to our local log client
and reply immediately, which could lose the message if we then restarted.
Instead, commit directly and wait before replying.

Also, log as the actual client, not as the monitor processing the message.

Fixes: #5409
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2013-06-20 11:11:50 -07:00
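The ordering change can be sketched as deferring the reply until the log entry's commit callback runs. A restart before commit then cannot leave the client believing its message was recorded. `LogStore`, `handle_log_command`, and the callback wiring are hypothetical, not the monitor's code.

```python
class LogStore:
    """Hypothetical store whose writes become durable only on commit()."""

    def __init__(self):
        self.pending = []    # (entry, on_commit) pairs not yet durable
        self.committed = []

    def append(self, entry, on_commit):
        self.pending.append((entry, on_commit))

    def commit(self):
        for entry, on_commit in self.pending:
            self.committed.append(entry)
            on_commit()
        self.pending = []


def handle_log_command(store, client_name, text, replies):
    # Log as the actual client, not as the monitor handling the message.
    entry = (client_name, text)
    # Queue the reply; it is sent only once the entry is committed.
    store.append(entry, lambda: replies.append(("ok", client_name)))
```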
Sage Weil
5de54f6a79 qa/workunits/cephtool/test.sh: --no-log-to-stderr when examining stderr
We can get random messages to stderr from socket reconnects and such;
discard those if we are looking at stderr in the test.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-20 11:04:26 -07:00
Sage Weil
d60534b8f5 mon: more fix dout use in sync_requester_abort()
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-20 09:46:42 -07:00
Sage Weil
8a4ed58e39 mon: fix raw use of *_dout in sync_requester_abort()
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-20 08:36:26 -07:00
Samuel Just
c39b99cdec FileStore: handle observers in constructor/destructor
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-19 19:53:55 -07:00
Samuel Just
cf3bc25197 FileStore: apply changes after disabling m_filestore_replica_fadvise
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit ed8b0e65bd)
2013-06-19 19:15:53 -07:00
Alexandre Maragone
8c0daafe00 ceph-disk: make list_partition behave with unusual device names
When device names like sdaa exist, we must not mistakenly conclude that
sdaa is a partition of sda.  Check whether /sys/block/$device/$partition
exists instead.

Fixes: #5211
Backport: cuttlefish
Signed-off-by: Alexandre Maragone <alexandre.maragone@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-19 16:11:45 -07:00
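The check can be sketched as testing for `/sys/block/<device>/<name>` rather than matching name prefixes, since `sdaa` starts with `sda` but is a separate disk. The `sys_block` parameter exists only so the sketch can run against a temporary directory. This is not ceph-disk's actual list_partitions code.

```python
import os


def is_partition_of(device, name, sys_block="/sys/block"):
    """Return True if `name` is a partition of whole-disk `device`.

    Prefix matching is wrong: "sdaa".startswith("sda") is True even though
    sdaa is its own disk. sysfs lists each partition as a subdirectory of
    its parent disk, e.g. /sys/block/sda/sda1.
    """
    return os.path.exists(os.path.join(sys_block, device, name))


def list_partitions(device, candidates, sys_block="/sys/block"):
    return [n for n in candidates if is_partition_of(device, n, sys_block)]
```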
Sage Weil
95bd048062 os/FileStore: disable fadvise on XFS
fadvise(DONTNEED) on XFS can break writeback ordering and zeroing; see

      http://oss.sgi.com/archives/xfs/2013-06/msg00066.html

If we detect XFS, turn this option off.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-06-19 10:57:13 -07:00
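XFS can be detected by comparing the filesystem magic reported by statfs(2) against XFS_SUPER_MAGIC (0x58465342, "XFSB"). Python's os.statvfs does not expose that field, so this sketch takes the magic as an argument. `FileStoreOptions`, `on_mount`, and `maybe_fadvise_dontneed` are hypothetical names, loosely modeled on the m_filestore_replica_fadvise setting mentioned in a later commit in this log.

```python
import os

XFS_SUPER_MAGIC = 0x58465342  # "XFSB", as reported in statfs(2) f_type


class FileStoreOptions:
    """Hypothetical holder for the fadvise setting."""

    def __init__(self, replica_fadvise=True):
        self.replica_fadvise = replica_fadvise

    def on_mount(self, fs_magic):
        # fadvise(DONTNEED) can break writeback ordering on XFS; turn it off.
        if fs_magic == XFS_SUPER_MAGIC:
            self.replica_fadvise = False


def maybe_fadvise_dontneed(opts, fd, offset, length):
    """Drop cached pages for a range only if the option is still enabled.

    Returns True if posix_fadvise was called.
    """
    if opts.replica_fadvise and hasattr(os, "posix_fadvise"):
        os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)
        return True
    return False
```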
Sage Weil
ded0a5f449 Revert "client: fix warning"
This reverts commit 4a3127f48d.

Wrong branch.
2013-06-19 09:58:41 -07:00
Joao Eduardo Luis
5e6dc4ea21 mon: Monitor: make sure we backup a monmap during sync start
First, we must find a monmap to back up: the newest version.

Secondly, we must make sure we back it up before clearing the store.

Finally, we must make sure that we don't remove said backup while
clearing the store; otherwise, we would be out of a backup monmap if the
sync happened to fail (and if the monitor happened to be killed before a
new sync had finished).

This patch makes sure these conditions are met.

Fixes: #5256 (partially)
Backport: cuttlefish

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-18 21:58:36 -07:00
Joao Eduardo Luis
6284fdce79 mon: Monitor: obtain latest monmap on sync store init
Always use the highest version amongst all the typically available
monmaps: whatever we have in memory, whatever we have under the
MonmapMonitor's store, and whatever we have backed up from a previous
sync.  This ensures we always use the newest version we came across.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-18 21:58:19 -07:00
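Picking the newest monmap reduces to taking the highest epoch among whichever copies exist. Below is a sketch, assuming each source is a hypothetical `(epoch, encoded_monmap)` tuple, or None when that source has no map.

```python
def latest_monmap(in_memory, monmap_store, sync_backup):
    """Return the candidate with the highest epoch, or None if all are missing.

    The arguments stand for the in-memory map, the MonmapMonitor's stored
    latest version, and the copy backed up by a previous sync.
    """
    candidates = [m for m in (in_memory, monmap_store, sync_backup) if m is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m[0])
```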
Joao Eduardo Luis
af5a9861d7 mon: Monitor: don't remove 'mon_sync' when clearing the store during abort
Otherwise, we will end up losing the monmap we backed up when we started
the sync, and the monitor may be unable to start if it is killed or
crashes between the sync abort and finishing a new sync.

Fixes: #5256 (partially)
Backport: cuttlefish

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-18 21:56:53 -07:00
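Clearing the store while preserving the sync backup can be pictured as deleting every key whose prefix is not in a keep-set. The flat dict keyed by `(prefix, key)` is a hypothetical simplification of the monitor's key/value store; only the 'mon_sync' prefix name comes from the commit.

```python
def clear_store(store, keep_prefixes=("mon_sync",)):
    """Delete all entries except those under a preserved prefix.

    `store` is a hypothetical dict mapping (prefix, key) -> value. Keeping
    'mon_sync' preserves the monmap backed up when the sync started, so a
    monitor killed between a sync abort and a new sync can still start.
    """
    for prefix, key in list(store):
        if prefix not in keep_prefixes:
            del store[(prefix, key)]
```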
Dan Mick
257490335a AuthMonitor: auth export's status message to ss, not ds
This puts it on stderr, not stdout.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-06-18 15:44:32 -07:00
Sage Weil
64ee0148a5 ceph.spec: create /var/run on package install
The %ghost %dir ... line ensures this gets cleaned up, but does not install
it.

Reported-by: Derek Yarnell <derek@umiacs.umd.edu>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Gary Lowell <gary.lowell@inktank.com>
2013-06-18 14:51:24 -07:00
Dan Mick
bb799e6903 test_rados.py: add some tests for mon_command
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-18 14:23:05 -07:00
Dan Mick
64b4e4a6da rados.py: wrap target in c_char_p()
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-18 14:22:57 -07:00
Dan Mick
54f74325c7 rados.py: return error strings even if ret != 0
Key rados_free() off the returned length, not ret.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-18 14:22:47 -07:00
Dan Mick
81e73c7a63 ceph.in: pass parsed conffile to Rados constructor
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-18 14:22:40 -07:00
Dan Mick
2fc8d86445 ceph.in: global var dontsplit should be capitalized
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-18 14:22:07 -07:00
Sage Weil
4a3127f48d client: fix warning
signed/unsigned comparison

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-18 14:09:18 -07:00
Sage Weil
ce7b5ea7d5 common/Preforker: fix warning
common/Preforker.h: In member function ‘int Preforker::signal_exit(int)’:
warning: common/Preforker.h:82:45: ignoring return value of ‘ssize_t safe_write(int, const void*, size_t)’, declared with attribute warn_unused_result [-Wunused-result]

This is harder than it should be to fix.  :(
  http://stackoverflow.com/questions/3614691/casting-to-void-doesnt-remove-warn-unused-result-error

Whatever, I guess we can do something useful with this return value.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
2013-06-18 10:03:03 -07:00
Sage Weil
8bd936f077 client: fix warning
client/Client.cc: In member function 'virtual void Client::ms_handle_remote_reset(Connection*)':
warning: client/Client.cc:7892:9: enumeration value 'STATE_NEW' not handled in switch [-Wswitch]
warning: client/Client.cc:7892:9: enumeration value 'STATE_OPEN' not handled in switch [-Wswitch]
warning: client/Client.cc:7892:9: enumeration value 'STATE_CLOSED' not handled in switch [-Wswitch]

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
2013-06-18 10:02:45 -07:00
Sage Weil
df8a3e5591 client: handle reset during initial mds session open
If we get a reset during our attempt to open an MDS session, close out the
Connection* and retry to open the session, moving the waiters over.

Fixes: #5379
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-06-17 19:54:51 -07:00
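The retry works in three steps. When a reset arrives while a session is still opening, the client discards that session's connection and starts a fresh open. It then moves the queued waiters to the new session, so they are woken when that session finally opens. `Client`, `MetaSession`, and the state strings are hypothetical simplifications, not Client.cc's code.

```python
class MetaSession:
    """Hypothetical MDS session: a connection plus callbacks waiting for it to open."""

    def __init__(self, mds, conn_id):
        self.mds = mds
        self.conn_id = conn_id
        self.state = "opening"
        self.waiters = []


class Client:
    def __init__(self):
        self.sessions = {}
        self.next_conn_id = 0
        self.closed_conns = []

    def _connect(self):
        self.next_conn_id += 1
        return self.next_conn_id

    def open_session(self, mds):
        session = MetaSession(mds, self._connect())
        self.sessions[mds] = session
        return session

    def handle_remote_reset(self, mds):
        old = self.sessions.get(mds)
        if old is None or old.state != "opening":
            return
        # Close out the stale connection and retry the open.
        self.closed_conns.append(old.conn_id)
        new = self.open_session(mds)
        # Move waiters so nobody is stranded on the dead session.
        new.waiters, old.waiters = old.waiters, []

    def handle_session_open(self, mds):
        session = self.sessions[mds]
        session.state = "open"
        waiters, session.waiters = session.waiters, []
        for callback in waiters:
            callback()
```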
Sage Weil
92997a49bf mon: fix 'osd dump <epoch>'
The optional epoch argument was missing from the command spec.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2013-06-17 16:39:30 -07:00
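In argparse terms, an optional epoch argument corresponds to a positional declared with nargs='?'. This is only an analogy: the monitor's command spec uses its own descriptor format, not argparse, and the parser below is hypothetical.

```python
import argparse


def make_osd_dump_parser():
    # Hypothetical stand-alone parser for "osd dump [<epoch>]".
    parser = argparse.ArgumentParser(prog="osd dump")
    # nargs="?" makes the positional optional; None means "latest epoch".
    parser.add_argument("epoch", nargs="?", type=int, default=None)
    return parser
```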
Sage Weil
910af074fc Merge branch 'wip-5194' into next
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Gary Lowell <gary.lowell@inktank.com>
2013-06-17 15:46:47 -07:00
Sage Weil
8c6b24e903 ceph-disk: add some notes on wth we are up to
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-17 15:43:40 -07:00