Commit Graph

17270 Commits

Author SHA1 Message Date
Sage Weil
dc167bac78 filejournal: set last_committed_seq based on fs, not journal
last_committed_seq is the last seq committed to the fs, not the journal.
Set it when we begin replay with the fs provided value, not from the newest
entry in the journal.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-12-05 09:37:10 -08:00
Sage Weil
321ecdaba2 v0.39 2011-12-02 09:01:31 -08:00
Samuel Just
75aff02371 OSDMap: build_simple_from_conf pg_num should not be 0 with one osd
Previously, pg_num would end up set to 0 if osd.0 is the only osd.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-12-02 09:00:27 -08:00
Josh Durgin
363ebb6ccc librbd: report an error if rbd header does not match
This will fail on future incompatible versions of the header format.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-12-01 12:38:57 -08:00
Sage Weil
353ee0004a mds: adjust flock lock state on export
Looks like this was missed when flocklock was added.  Did a quick grep and
it doesn't look like it is missing anywhere else.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-30 09:57:58 -08:00
Samuel Just
1c696b6566 doc: Add peering state diagram
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-11-29 16:35:59 -08:00
Sage Weil
30ede648fe Makefile: ipaddr.h, pick_address.h
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-29 15:36:07 -08:00
Sage Weil
77a62fdce4 Makefile: add missing uuid.h to tarball
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-29 13:31:38 -08:00
Sage Weil
8788a404ae osd: subscribe to next map if flagged FULL
This ensures the osd finds out when we become un-full in a timely manner.

Fixes: #1755
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-11-29 08:28:57 -08:00
Sage Weil
c2889fef42 mds: encode truncate_pending in inode
Otherwise we don't actually journal this value, and we get confused when
we replay a start_truncate and try to restart it.

Fixes: #1756
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-11-28 21:39:21 -08:00
Wido den Hollander
cef16732e7 debian init: Do not stop or start daemons when installing or upgrading
Signed-off-by: Wido den Hollander <wido@widodh.nl>
2011-11-28 09:01:50 -08:00
Sage Weil
ce65722739 mon: search for local ip during mkfs
If an address isn't explicitly specified during mkfs, look for an unnamed
monitor in the (generated) monmap and see if any of those addresses is
configured on the local machine.  If so, assume it's us, and name ourselves
in the seed monmap.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-27 16:11:29 -08:00
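
The mkfs behavior described in ce65722739 can be sketched roughly as follows. This is an illustration in Python, not the monitor's actual code: the dict-based monmap, the `local_addrs` set, and the function name `name_self_in_monmap` are all hypothetical, and the `noname-` prefix for unnamed monitors is borrowed from commit 84b0059755 on the assumption that the two changes work together.

```python
def name_self_in_monmap(monmap, local_addrs, my_name):
    """Rename the first unnamed monitor whose address is local.

    monmap      -- dict mapping monitor name -> address string (hypothetical shape)
    local_addrs -- set of address strings configured on this machine
    my_name     -- the name this monitor should take in the seed monmap

    Returns True if an entry was renamed, False if no unnamed entry matched.
    """
    for name, addr in list(monmap.items()):
        # Only unnamed entries are candidates; named ones belong to someone else.
        if name.startswith("noname-") and addr in local_addrs:
            del monmap[name]
            monmap[my_name] = addr
            return True
    return False
```

With a generated monmap of `{"noname-a": "10.1.2.3", "noname-b": "10.1.2.4"}` and `10.1.2.4` configured locally, the second entry would be renamed and the first left untouched.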
Sage Weil
61b9db3a8e pick_address: implement have_local_addr()
Check for a local ip from within a list of addresses.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-27 16:11:28 -08:00
Sage Weil
84b0059755 monclient: name nameless monitors noname-<foo>
This makes them easy to pick out as unnamed.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-27 16:04:52 -08:00
Sage Weil
7a453402e3 pick_address: whitespace
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-27 14:50:46 -08:00
Mark Kampe
30def38d21 corrected variable (con) to be consistent with prior examples (cluster)
Signed-off-by: Mark Kampe <mark.kampe@dreamhost.com>
2011-11-23 15:56:52 -08:00
Samuel Just
934e1e5251 ReplicatedPG: Also count overlaps for snapsets on snapdirs
Previously, the overlaps for snapdirs would not be included in
cstat causing the computed total to be incorrect.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-11-23 14:07:54 -08:00
Samuel Just
97d82ed950 ReplicatedPG: Account for clone space usage in make_writeable
Previously, we accounted for clone space usage inconsistently in
write_update_size_and_usage etc when walking through the operations.
make_writeable may change the most recent clone overlap, however, so we
can't handle it until then.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-11-23 14:07:08 -08:00
Sage Weil
32a6837839 Merge branch 'wip-mon' 2011-11-23 06:45:26 -08:00
Sage Weil
ad13d0b731 ceph: fix shutdown race
Shut down MonClient before messenger, to avoid race with MonClient::tick()
and MonClient::shutdown().

Fixes a race between these two threads:

#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007f44475e2849 in _L_lock_953 () from /lib/libpthread.so.0
#2  0x00007f44475e266b in __pthread_mutex_lock (mutex=0x14d8dc8) at pthread_mutex_lock.c:61
#3  0x00000000005ae090 in Mutex::Lock (this=0x14d8db8, no_lockdep=false) at ./common/Mutex.h:108
#4  0x000000000068440e in MonClient::shutdown (this=0x14d8c30) at mon/MonClient.cc:386
#5  0x00000000005b2653 in ceph_tool_common_shutdown (ctx=0x14d84c0) at tools/common.cc:661
#6  0x00000000005ada29 in main (argc=7, argv=0x7fff8a2394c8) at tools/ceph.cc:304

vs

#0  0x00007f44475e8a0b in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#1  0x00000000005eff6b in reraise_fatal (signum=11) at global/signal_handler.cc:59
#2  0x00000000005f0165 in handle_fatal_signal (signum=11) at global/signal_handler.cc:106
#3  <signal handler called>
#4  0x0000000000000000 in ?? ()
#5  0x000000000068661a in MonClient::tick (this=0x14d8c30) at mon/MonClient.cc:621
#6  0x0000000000689e3b in MonClient::C_Tick::finish(int) ()
#7  0x000000000061b3c5 in SafeTimer::timer_thread (this=0x14d8df8) at common/Timer.cc:102
#8  0x000000000061c6f0 in SafeTimerThread::entry() ()
#9  0x00000000005f1219 in Thread::_entry_func (arg=0x14e1a00) at common/Thread.cc:41
#10 0x00007f44475e0971 in start_thread (arg=<value optimized out>) at pthread_create.c:304
#11 0x00007f4445ead92d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#12 0x0000000000000000 in ?? ()

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-23 06:44:58 -08:00
Tommi Virtanen
414caa7d15 common/pick_address: Fix IP address stringification.
Different sockaddr_* have the actual address (sin_addr, sin6_addr)
at different offsets, and sockaddr->sa_data just isn't enough.
inet_ntop conspires by taking a void*. I could figure out the right
offset with a switch (found->sa_family), but let's go for the
supposedly write-once-run-with-any-AF solution, getnameinfo.

Which, naturally, takes an extra length argument that is AF-specific,
and not provided anywhere nicely by getifaddrs. Huzzah!

Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
2011-11-22 20:52:16 -08:00
Sage Weil
9870e2f77b mon: pick_addresses before common_init_finish
We can't modify g_conf->public_addr after that.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 16:28:42 -08:00
Sage Weil
036ad4c73c mon: set default port if not specified...
...when looking for self in monmap during mkfs.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 16:22:07 -08:00
Sage Weil
36978a6329 mon: calculate rank by addr, not name
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 16:04:10 -08:00
Sage Weil
0045c90169 monmap: assign rank by sorting addr, not name
This allows monitors to bootstrap knowing peer addrs but not their names,
as when we specify mon_host.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 16:04:10 -08:00
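
A minimal sketch of the idea in 0045c90169, assuming rank is simply a monitor's position in the address-sorted list. The commit does not state the exact comparison used, so the `(ip, port)` ordering here is an assumption, as are the names `addr_key` and `rank_of`. Returning -1 for an address that is not in the map is this sketch's own convention.

```python
import ipaddress


def addr_key(addr):
    """Sort key for an (ip, port) pair: numeric IP first, then port.

    Assumes all addresses are of the same family (all IPv4 or all IPv6).
    """
    ip, port = addr
    return (ipaddress.ip_address(ip), port)


def rank_of(my_addr, mon_addrs):
    """Return the index of my_addr among the address-sorted monitors, or -1."""
    ordered = sorted(mon_addrs, key=addr_key)
    try:
        return ordered.index(my_addr)
    except ValueError:
        return -1
```

Because only addresses are compared, a monitor that knows its peers' addresses but not their names (as with mon_host) can still compute the same ranks as everyone else.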
Yehuda Sadeh
ebe5fc60d2 obsync: tear out rgw 2011-11-22 15:06:16 -08:00
Sage Weil
3a20b425d6 mon: name self in monmap if --public-addr specified during mkfs
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 14:53:45 -08:00
Yehuda Sadeh
a859763b1c rgw: don't remove tail of lru if that's what we touch 2011-11-22 10:31:25 -08:00
Sage Weil
aeeeade6e0 mon: mark down all connections when rank changes
The election and some other stuff depend on msg->get_source().num() to get
the peer rank, and that is part of the connection state.  If it changes,
we need to close old connections and open new ones so that we aren't
taken for someone else (like mon.-1).

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 10:09:41 -08:00
Sage Weil
bed3c4723c mon: handle rank change in bootstrap
The rank can change either because we probe and get a new monmap, or
because we get one via paxos.  Move the checks to bootstrap() to catch
both cases.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 10:08:48 -08:00
Sage Weil
8b46409312 mon: pick an address when joining an existing cluster
If we are joining an existing cluster, we can pick whatever address we
want (e.g., one specified by public_addr or public_network).

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 09:53:52 -08:00
Sage Weil
5ba356b31a mon: remove unused myaddr
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 09:52:58 -08:00
Sage Weil
0c9724d6fb mon: simplify suicide when removed from map
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-22 09:52:52 -08:00
Samuel Just
eb8d91feaf PG: it's not necessary to call build_inc_scrub_map in build_scrub_map
Because we have called osr.flush(), it's safe to tag map.valid_through
as last_update.   We will still have to catch up once we have stopped
writes and allowed the filestore to catch up anyway.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-11-21 17:46:21 -08:00
Sage Weil
0f4b59a4f3 Merge remote branch 'gh/subnet' 2011-11-21 16:17:21 -08:00
Sage Weil
fab1e55ee7 Merge remote branch 'gh/wip-mon' 2011-11-21 16:00:34 -08:00
Tommi Virtanen
c066e92638 mds, osd, synclient: Pick cluster_addr/public_addr based on *_network.
Instead of specifying an IP address in ceph.conf like

	[global]
	cluster_addr = 10.1.2.3

you can now avoid the node-specific configuration and just say

	[global]
	cluster_network = 10.1.2.0/24

The *_network variables can also take a whitespace-separated list of
networks, to be checked in that order:

	[global]
	cluster_network = 10.1.2.0/24 192.168.42.192/26
2011-11-21 14:27:45 -08:00
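
The selection rule described in c066e92638 can be sketched with Python's standard `ipaddress` module. This is not the code from the commit: the function name `pick_address` is hypothetical, and `local_addrs` stands in for whatever addresses the host actually has configured.

```python
import ipaddress


def pick_address(networks, local_addrs):
    """Return the first local address that falls inside one of the networks.

    networks    -- whitespace-separated CIDR list, e.g.
                   "10.1.2.0/24 192.168.42.192/26"; tried in the order given
    local_addrs -- list of address strings configured on this host
                   (hypothetical input for this sketch)

    Returns the matching address string, or None if nothing matches.
    """
    for net_str in networks.split():
        net = ipaddress.ip_network(net_str)
        for addr_str in local_addrs:
            addr = ipaddress.ip_address(addr_str)
            # Skip IPv4/IPv6 mismatches explicitly, then test membership.
            if addr.version == net.version and addr in net:
                return addr_str
    return None
```

Because the outer loop walks the networks, an address in an earlier network wins over one in a later network regardless of the order of the local addresses.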
Tommi Virtanen
0477f23879 common/pickaddr: Pick cluster_addr/public_addr based on *_network. 2011-11-21 14:27:45 -08:00
Tommi Virtanen
eec61b4873 common/ipaddr: Add utility function to parse ip/cidr style networks.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
2011-11-21 14:27:45 -08:00
Tommi Virtanen
0f748d4c9e common/ipaddr: Find a configured IP address in given subnet.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
2011-11-21 14:27:44 -08:00
Tommi Virtanen
97464bcabe msg: Move public_addr use outside ->bind() 2011-11-21 13:37:39 -08:00
Tommi Virtanen
0f9a06051c common/str_list: Make unused return value void.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
2011-11-21 13:37:39 -08:00
Tommi Virtanen
2bae3506b6 osd: Remove unused variable. 2011-11-21 13:37:39 -08:00
Sage Weil
3c8fec2d33 osd: fix 'stop' command
Special case.  We can't join the command_tp thread from itself.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-11-21 13:28:36 -08:00
Sage Weil
b47347bd7c osd: protect handle_osd_map requeueing with queue lock
pending_ops was protected by osd_lock, but it tracks something in the
queue, which has its own lock.  Messy.  Also, useless, since
wait_for_no_ops had a single caller in shutdown() that op_wq.drain() can
do for us.

Rip it out, and track queue size under the queue lock.

Fixes: #1727
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-11-21 13:23:59 -08:00
Sage Weil
70dfe8e9a0 osd: lock pg when requeuing requests
The op queue is shut down, so this is mostly safe, unless someone comes
through and does requeue_ops() from a callback or something.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-11-21 11:15:38 -08:00
Sage Weil
811145f758 paxosservice: tolerate _active() call when not active
This can happen when multiple C_Active events are queued, and the first
does a propose_pending() (moving us into updating state).

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-21 10:33:53 -08:00
Sage Weil
88963a181a objecter: simplify map request check
We should request a missing/intervening map if it appears to exist.
Otherwise, skip it.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-21 09:19:59 -08:00
Sage Weil
cd2e523fba objecter: cancel tick event on shutdown
Hopefully this is the root cause for

2011-11-20 23:57:41.555292 7f75dd743780 ceph version 0.38-205-g3b53b72
(commit:3b53b722b34b5284e6b8a5571a08d4b7ec276241), process ceph-fuse, pid
21223
 *  Caught signal (Segmentation fault) *
    in thread 7f75d9c6e700
    ceph version 0.38-205-g3b53b72
    (commit:3b53b722b34b5284e6b8a5571a08d4b7ec276241)
    1: /tmp/cephtest/binary/usr/local/bin/ceph-fuse() [0x6993a4]
    2: (()+0xfb40) [0x7f75dd0eeb40]
    3: (PerfCounters::set(int, unsigned long)+0x2a) [0x511bca]
    4: (Objecter::tick()+0x1f3) [0x653f43]
    5: (Objecter::C_Tick::finish(int)+0x15) [0x66aef5]
    6: (SafeTimer::timer_thread()+0x4b0) [0x5825c0]
    7: (SafeTimerThread::entry()+0x15) [0x586865]
    8: (Thread::_entry_func(void*)+0x12) [0x52a832]
    9: (()+0x7971) [0x7f75dd0e6971]
    10: (clone()+0x6d) [0x7f75db97592d]

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-11-21 09:19:40 -08:00
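
The general pattern behind cd2e523fba, cancelling a self-rescheduling tick before the owner is torn down, can be sketched as below. This is a Python illustration using `threading.Timer`, not the Objecter or SafeTimer code. The `Ticker` class and its members are hypothetical names.

```python
import threading


class Ticker:
    """Calls on_tick every `interval` seconds until shutdown() is called."""

    def __init__(self, interval, on_tick):
        self._interval = interval
        self._on_tick = on_tick
        self._lock = threading.Lock()
        self._timer = None
        self._stopped = False

    def start(self):
        with self._lock:
            self._schedule()

    def _schedule(self):
        # Caller must hold self._lock.
        self._timer = threading.Timer(self._interval, self._tick)
        self._timer.daemon = True
        self._timer.start()

    def _tick(self):
        with self._lock:
            # A tick that was already firing when shutdown() ran must not
            # touch state that is being torn down, and must not reschedule.
            if self._stopped:
                return
            self._on_tick()
            self._schedule()

    def shutdown(self):
        with self._lock:
            self._stopped = True
            if self._timer is not None:
                self._timer.cancel()  # drop the pending tick event
                self._timer = None
```

The `_stopped` flag covers the window in which a tick has already fired but is still waiting for the lock when `shutdown()` runs. Cancelling the timer alone would not stop that callback.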
Sage Weil
f60702822a paxos: fix sharing of learned commits during collect/last
We can learn either an uncommitted or committed value during the
collect/last recovery phase.  For the committed values, we need to remember
each peer's first/last_committed and share only at the end to avoid a
situation like:

 - mon.1 has same last_committed as us
 - mon.2 has newer last_committed, we save it
 - mon.3 has same last_committed as mon.1, we share new value
 - done... but mon.1 never got mon.2's newer commit.

Instead, save the commit sharing until the collect process completes, so
we know that any committed value learned from anyone is shared with
everyone who needs it.

This fixes a crash like

mon/Paxos.cc: In function 'void Paxos::handle_begin(MMonPaxos*)', in thread '7fd91192c700'
mon/Paxos.cc: 400: FAILED assert(begin->last_committed == last_committed)
 ceph version 0.38-208-g9aabd39 (commit:9aabd3982cceb7e8489412b4bfbb4c2387880de2)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x76) [0x72454e]
 2: (Paxos::handle_begin(MMonPaxos*)+0x363) [0x6499ef]
 3: (Paxos::dispatch(PaxosServiceMessage*)+0x2b4) [0x64db2c]
 4: (Monitor::_ms_dispatch(Message*)+0xdc6) [0x6205c2]
 5: (Monitor::ms_dispatch(Message*)+0x3a) [0x62831a]
 6: (Messenger::ms_deliver_dispatch(Message*)+0x63) [0x7d1f31]
 7: (SimpleMessenger::dispatch_entry()+0x7c2) [0x7bb786]
 8: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x6070fa]
 9: (Thread::_entry_func(void*)+0x23) [0x6f3f69]
 10: (()+0x7971) [0x7fd9153a1971]
 11: (clone()+0x6d) [0x7fd913c3092d]

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-20 14:26:09 -08:00
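
The deferred sharing described in f60702822a can be illustrated with a small Python model. It is a sketch of the bookkeeping only, not the Paxos.cc implementation: versions are plain integers, the function name `collect_then_share` is hypothetical, and each peer is assumed to report a single last_committed value during collect.

```python
def collect_then_share(my_last_committed, peer_last_committed):
    """Decide what to share only after every peer has answered.

    my_last_committed   -- our own last_committed version (int)
    peer_last_committed -- dict mapping peer name -> last_committed it reported

    Returns (newest, to_share), where to_share maps each peer that is behind
    to the inclusive (first, last) range of versions it still needs.
    """
    # Wait for the whole collect round, then find the newest commit anyone has.
    newest = max([my_last_committed, *peer_last_committed.values()])
    to_share = {
        peer: (lc + 1, newest)
        for peer, lc in peer_last_committed.items()
        if lc < newest
    }
    return newest, to_share
```

In the scenario from the commit message (we and mon.1 and mon.3 at the same version, mon.2 one ahead), both mon.1 and mon.3 end up in `to_share`, so mon.1 is no longer skipped just because it replied before mon.2 did.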