17397 Commits

Author SHA1 Message Date
Sage Weil
36978a6329 mon: calculate rank by addr, not name
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 16:04:10 -08:00
Sage Weil
0045c90169 monmap: assign rank by sorting addr, not name
This allows monitors to bootstrap knowing peer addrs but not their names,
as when we specify mon_host.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 16:04:10 -08:00
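The rank assignment described in the two commits above could look roughly like the following Python sketch. It is not the monmap code: the names `addr_key` and `assign_ranks` are hypothetical, and comparing by numeric IPv4 address and then port is an assumption made for illustration.

```python
import ipaddress


def addr_key(addr):
    """Sort key for an 'ip:port' string: numeric IP first, then port (assumed ordering)."""
    host, port = addr.rsplit(":", 1)
    return (int(ipaddress.ip_address(host)), int(port))


def assign_ranks(mon_addrs):
    """Give each monitor a rank equal to its position in sorted address order."""
    ordered = sorted(mon_addrs, key=addr_key)
    return {addr: rank for rank, addr in enumerate(ordered)}
```

Because the rank depends only on addresses, a monitor that knows its peers' addrs (for example from mon_host) but not their names can still compute the same ranks as everyone else.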
Yehuda Sadeh
ebe5fc60d2 obsync: tear out rgw
2011-11-22 15:06:16 -08:00
Sage Weil
3a20b425d6 mon: name self in monmap if --public-addr specified during mkfs
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 14:53:45 -08:00
Yehuda Sadeh
a859763b1c rgw: don't remove tail of lru if that's what we touch
2011-11-22 10:31:25 -08:00
Sage Weil
aeeeade6e0 mon: mark down all connections when rank changes
The election and some other stuff depend on msg->get_source().num() to get
the peer rank, and that is part of the connection state.  If it changes,
we need to close old connections and open new ones so that we aren't
taken for someone else (like mon.-1).

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 10:09:41 -08:00
Sage Weil
bed3c4723c mon: handle rank change in bootstrap
The rank can change either because we probe and get a new monmap, or
because we get one via paxos.  Move the checks to bootstrap() to catch
both cases.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 10:08:48 -08:00
Sage Weil
8b46409312 mon: pick an address when joining an existing cluster
If we are joining an existing cluster, we can pick whatever address we
want (e.g., one specified by public_addr or public_network).

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 09:53:52 -08:00
Sage Weil
5ba356b31a mon: remove unused myaddr
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 09:52:58 -08:00
Sage Weil
0c9724d6fb mon: simplify suicide when removed from map
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 09:52:52 -08:00
Samuel Just
eb8d91feaf PG: it's not necessary to call build_inc_scrub_map in build_scrub_map
Because we have called osr.flush(), it's safe to tag map.valid_through
as last_update.   We will still have to catch up once we have stopped
writes and allowed the filestore to catch up anyway.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-11-21 17:46:21 -08:00
Sage Weil
0f4b59a4f3 Merge remote branch 'gh/subnet'
2011-11-21 16:17:21 -08:00
Sage Weil
fab1e55ee7 Merge remote branch 'gh/wip-mon'
2011-11-21 16:00:34 -08:00
Tommi Virtanen
c066e92638 mds, osd, synclient: Pick cluster_addr/public_addr based on *_network.
Instead of specifying an IP address in ceph.conf like

	[global]
	cluster_addr = 10.1.2.3

you can now avoid the node-specific configuration and just say

	[global]
	cluster_network = 10.1.2.0/24

The *_network variables can also take a whitespace-separated list of
networks, to be checked in that order:

	[global]
	cluster_network = 10.1.2.0/24 192.168.42.192/26
2011-11-21 14:27:45 -08:00
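The *_network lookup described in c066e92638 can be sketched in Python as follows. This is an illustration only, not the common/pickaddr code: `pick_address` is a hypothetical name, and the list of local addresses is passed in rather than read from the host's interfaces.

```python
import ipaddress


def pick_address(networks, local_addrs):
    """Return the first local address that falls inside one of the networks.

    `networks` is a whitespace-separated list of CIDR strings, checked in
    the order given. Returns None if no local address matches.
    """
    for net in networks.split():
        network = ipaddress.ip_network(net)
        for addr in local_addrs:
            if ipaddress.ip_address(addr) in network:
                return addr
    return None
```

With `cluster_network = 10.1.2.0/24 192.168.42.192/26`, a host that only has 192.168.42.200 configured would pick that address, while a host that has both 10.1.2.3 and 192.168.42.200 would pick 10.1.2.3 because the first network is checked first.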
Tommi Virtanen
0477f23879 common/pickaddr: Pick cluster_addr/public_addr based on *_network.
2011-11-21 14:27:45 -08:00
Tommi Virtanen
eec61b4873 common/ipaddr: Add utility function to parse ip/cidr style networks.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
2011-11-21 14:27:45 -08:00
Tommi Virtanen
0f748d4c9e common/ipaddr: Find a configured IP address in given subnet.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
2011-11-21 14:27:44 -08:00
Tommi Virtanen
97464bcabe msg: Move public_addr use outside ->bind()
2011-11-21 13:37:39 -08:00
Tommi Virtanen
0f9a06051c common/str_list: Make unused return value void.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
2011-11-21 13:37:39 -08:00
Tommi Virtanen
2bae3506b6 osd: Remove unused variable.
2011-11-21 13:37:39 -08:00
Sage Weil
3c8fec2d33 osd: fix 'stop' command
Special case.  We can't join the command_tp thread from itself.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-11-21 13:28:36 -08:00
Sage Weil
b47347bd7c osd: protect handle_osd_map requeueing with queue lock
pending_ops was protected by osd_lock, but it tracks something in the
queue, which has its own lock.  Messy.  Also, useless, since
wait_for_no_ops had a single caller in shutdown() that op_wq.drain() can
do for us.

Rip it out, and track queue size under the queue lock.

Fixes: #1727
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-11-21 13:23:59 -08:00
Sage Weil
70dfe8e9a0 osd: lock pg when requeuing requests
The op queue is shut down, so this is mostly safe, unless someone comes
through and does requeue_ops() from a callback or something.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-11-21 11:15:38 -08:00
Sage Weil
811145f758 paxosservice: tolerate _active() call when not active
This can happen when multiple C_Active events are queued, and the first
does a propose_pending() (moving us into updating state).

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-21 10:33:53 -08:00
Sage Weil
88963a181a objecter: simplify map request check
We should request a missing/intervening map if it appears to exist.
Otherwise, skip it.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-21 09:19:59 -08:00
Sage Weil
cd2e523fba objecter: cancel tick event on shutdown
Hopefully this is the root cause for

2011-11-20 23:57:41.555292 7f75dd743780 ceph version 0.38-205-g3b53b72
(commit:3b53b722b34b5284e6b8a5571a08d4b7ec276241), process ceph-fuse, pid
21223
 *  Caught signal (Segmentation fault) *
    in thread 7f75d9c6e700
    ceph version 0.38-205-g3b53b72
    (commit:3b53b722b34b5284e6b8a5571a08d4b7ec276241)
    1: /tmp/cephtest/binary/usr/local/bin/ceph-fuse() [0x6993a4]
    2: (()+0xfb40) [0x7f75dd0eeb40]
    3: (PerfCounters::set(int, unsigned long)+0x2a) [0x511bca]
    4: (Objecter::tick()+0x1f3) [0x653f43]
    5: (Objecter::C_Tick::finish(int)+0x15) [0x66aef5]
    6: (SafeTimer::timer_thread()+0x4b0) [0x5825c0]
    7: (SafeTimerThread::entry()+0x15) [0x586865]
    8: (Thread::_entry_func(void*)+0x12) [0x52a832]
    9: (()+0x7971) [0x7f75dd0e6971]
    10: (clone()+0x6d) [0x7f75db97592d]

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-11-21 09:19:40 -08:00
Sage Weil
f60702822a paxos: fix sharing of learned commits during collect/last
We can learn either an uncommitted or committed value during the
collect/last recovery phase.  For the committed values, we need to remember
each peer's first/last_committed and share only at the end to avoid a
situation like:

 - mon.1 has same last_committed as us
 - mon.2 has newer last_committed, we save it
 - mon.3 has same last_committed as mon.1, we share new value
 - done... but mon.1 never got mon.2's newer commit.

Instead, save the commit sharing until the collect process completes, so
we know that any committed value learned from anyone is shared with
everyone who needs it.

This fixes a crash like

mon/Paxos.cc: In function 'void Paxos::handle_begin(MMonPaxos*)', in thread '7fd91192c700'
mon/Paxos.cc: 400: FAILED assert(begin->last_committed == last_committed)
 ceph version 0.38-208-g9aabd39 (commit:9aabd3982cceb7e8489412b4bfbb4c2387880de2)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x76) [0x72454e]
 2: (Paxos::handle_begin(MMonPaxos*)+0x363) [0x6499ef]
 3: (Paxos::dispatch(PaxosServiceMessage*)+0x2b4) [0x64db2c]
 4: (Monitor::_ms_dispatch(Message*)+0xdc6) [0x6205c2]
 5: (Monitor::ms_dispatch(Message*)+0x3a) [0x62831a]
 6: (Messenger::ms_deliver_dispatch(Message*)+0x63) [0x7d1f31]
 7: (SimpleMessenger::dispatch_entry()+0x7c2) [0x7bb786]
 8: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x6070fa]
 9: (Thread::_entry_func(void*)+0x23) [0x6f3f69]
 10: (()+0x7971) [0x7fd9153a1971]
 11: (clone()+0x6d) [0x7fd913c3092d]

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-20 14:26:09 -08:00
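The deferred sharing described in f60702822a can be illustrated with a small Python sketch. It models only the bookkeeping, not Paxos itself: `collect` is a hypothetical function, and peers are represented as a plain mapping from name to last_committed.

```python
def collect(my_last_committed, peer_last_committed):
    """Gather every peer's last_committed first, then decide who needs commits.

    Returns the newest last_committed seen and the sorted list of peers
    that are behind it and therefore need the newer commits shared.
    """
    newest = my_last_committed
    for last_committed in peer_last_committed.values():
        newest = max(newest, last_committed)

    # Sharing is decided only after all replies are in, so a peer that
    # answered before the newest value was learned is not skipped.
    behind = sorted(
        peer for peer, last_committed in peer_last_committed.items()
        if last_committed < newest
    )
    return newest, behind
```

In the scenario from the commit message, mon.1 and mon.3 both end up in the list of peers to update, whereas sharing on each reply would have missed mon.1.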
Yehuda Sadeh
3b53b722b3 rgw: support alternative date formatting
being used by s3cmd
2011-11-20 13:18:26 -08:00
Sage Weil
9aabd3982c paxosservice: consolidate _active and _commit
Use the same callback for when paxos goes active and for when it commits
something.  The response in both cases is the same.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-19 14:30:31 -08:00
Sage Weil
10fed791e9 paxosservice: remove unused committed() callback
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-19 14:30:31 -08:00
Sage Weil
b521710f10 mon: mdsmon: tick() from on_active() instead of committed()
Same effect, and avoids useless committed().

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-19 14:30:31 -08:00
Sage Weil
becfce356c mon: share random osd map from update_from_paxos, not committed()
This will let us remove committed() entirely.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-19 14:30:31 -08:00
Sage Weil
9920a168c5 config: support --no-<foo> for bool options
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-19 13:56:21 -08:00
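A minimal Python sketch of how a --no-<foo> form for boolean options might be recognized is shown below. It is not the Ceph config parser: `parse_bool_flag` and the option name `foo` are hypothetical, and the sketch assumes the set of boolean option names is known in advance.

```python
def parse_bool_flag(arg, bool_options):
    """Return (name, value) for a boolean flag, or None if arg is not one.

    '--foo' maps to (foo, True) and '--no-foo' maps to (foo, False),
    provided 'foo' is listed in bool_options.
    """
    if not arg.startswith("--"):
        return None
    name = arg[2:]
    if name in bool_options:
        return name, True
    if name.startswith("no-") and name[3:] in bool_options:
        return name[3:], False
    return None
```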
Sage Weil
1a468c7e0b config: whitespace
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-19 13:56:17 -08:00
Greg Farnum
cc5b5e17e6 osdmon: set the maps-to-keep floor to be at least epoch 0
Looks like this conditional was just set backwards by mistake. There
have been a number of issues with OSDMap versions that are probably
related to this...
(Thanks to some smarts in trim_to, we at least did not trim ALL of
our maps. But on every tick prior to epoch 500 [that's the default]
the leader was trimming all old maps off the system.)

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2011-11-18 16:13:29 -08:00
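The intended clamp from cc5b5e17e6 can be sketched in Python as below. The function name `maps_to_keep_floor` and its parameters are hypothetical; the only details taken from the commit message are that the floor must never drop below epoch 0 and that the default number of maps to keep is 500.

```python
def maps_to_keep_floor(last_epoch, keep=500):
    """Oldest epoch that must be retained, clamped so it is at least 0."""
    return max(0, last_epoch - keep)
```

Before epoch 500 the floor stays at 0, so nothing is trimmed; after that it trails the latest epoch by `keep`.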
Samuel Just
45cf89c108 Revert "osd: simplify finalizing scrub on replica"
This reverts commit dd5087fabb.

Calling osr.flush() is not quite enough since the onreadable callbacks
may not have been called (thus, last_update_applied may still lag behind
the tail of the log).

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-11-18 15:57:00 -08:00
Samuel Just
57ad8b2ebf FileStore.cc: onreadable callbacks in OpSequencer order is enough
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-11-18 15:56:59 -08:00
Greg Farnum
dedf2c4a06 osd: error responses should trigger all requested notifications.
There's no good reason I can find to limit error code responses to
the ACK.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2011-11-18 09:49:57 -08:00
Greg Farnum
09c20c5129 objecter: trigger oncommit acks if the request returns an error code.
Many users only set oncommit acks, so if they get an error code
(which comes only as a CEPH_OSD_OP_ACK right now) the request
disappears into the ether.
(And remove stupid debug statements while we're at it.)

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2011-11-18 09:49:57 -08:00
Sage Weil
9800faeb92 paxos: do not create_pending if !active
This avoids a scenario like:

- _active()
  - proposes value
- _commit()
  - creates new pending, even though in updating state

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-18 09:49:03 -08:00
Sage Weil
fa5876870a Revert "mon: don't propose new state from update_from_paxos"
This reverts commit 66c628acc8.
2011-11-18 09:43:09 -08:00
Sage Weil
66c628acc8 mon: don't propose new state from update_from_paxos
Proposing a new state from within update_from_paxos() confuses some callers,
like PaxosService::_active().  Instead, do it in the on_active() callback.
This also lets us collapse the check_osd_map() caller into on_active(),
and makes it happen on leaders and peons alike, which ought to avoid some
of the pg creation lag we see sometimes (presumably when the osds have
sessions with peons instead of the leader).

Fixes: #1708
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-11-17 20:45:54 -08:00
Yehuda Sadeh
6ae0f81e17 rgw: if swift url is not set up, just use whatever client used
2011-11-17 16:55:48 -08:00
Sage Weil
ef5ca293a7 fuse: fix readdir return code
Ignore ENOSPC generated by our own callback, as it is only used to
terminate the loop.

Broken by commit cd90061239.

Fixes: #1728
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-17 15:01:25 -08:00
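The pattern behind ef5ca293a7 can be sketched in Python: a fill callback returns -ENOSPC to stop the loop once the buffer is full, and the caller must not report that value as a failure. All names here (`fill_entries`, `add`, `capacity`) are hypothetical and stand in for the real fuse readdir path.

```python
import errno


def fill_entries(entries, capacity):
    """Copy entries into a bounded buffer; return (return_code, buffer)."""
    buf = []

    def add(entry):
        # The callback signals "buffer full" with -ENOSPC.
        if len(buf) >= capacity:
            return -errno.ENOSPC
        buf.append(entry)
        return 0

    rc = 0
    for entry in entries:
        rc = add(entry)
        if rc < 0:
            break

    # -ENOSPC from our own callback only terminates the loop; it is not an error.
    if rc == -errno.ENOSPC:
        rc = 0
    return rc, buf
```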
Sage Weil
d61ba6441b paxos: fix trimming when we skip over incrementals
Remove open-coded trimming of old states and use our method (that also
removes additional per-state files).  Fixes old stray state files.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-17 14:11:38 -08:00
Sage Weil
367ab142d7 paxos: store stashed state _and_ incrementals
Paxos::share_state() may share a stashed state and incrementals that
follow; we need to store the same.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-17 14:10:34 -08:00
Sage Weil
6bc9a544b6 mon: elector: always start election via monitor
Don't go from active -> electing without passing (monitor) go.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-17 13:53:52 -08:00
Sage Weil
685450b76b common: libraries should not log to stdout/stderr
Certainly not by default.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-17 12:07:34 -08:00
Sage Weil
f1dd56d93d objecter: set skipped_map if we skip a map
This ensures that we resend _all_ requests, since we aren't sure which
may have mapped to a different primary and then back.  This was missed in
the original implementation in 4fe9cca5dd.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-17 11:56:37 -08:00
Sage Weil
5afef0209f objecter: add is_locked() asserts
Sanity check.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-17 11:39:55 -08:00