Commit Graph

12272 Commits

Author SHA1 Message Date
Sage Weil
2cd2c56dd0 v0.24.3 2011-02-10 09:49:28 -08:00
Colin Patrick McCabe
b60444b5c1 make:add messages/MOSDRepScrub.h to NOINST_HEADERS
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-02-10 09:49:28 -08:00
Sage Weil
ec9d14c1be Merge remote branch 'origin/rep_scrub_wq' into stable 2011-02-08 16:22:01 -08:00
Sage Weil
cc525b3a3e osd: discard scrub reply if pg changed
build_scrub_map will bail out if the pg changed.  Discard the result in
that case since the primary will ignore it anyway.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-02-08 08:41:52 -08:00
Sage Weil
a948aa1180 osd: avoid map_lock for scrub_map reply
Using osd->osdmap->epoch without map_lock is dangerous.  We can avoid it
entirely by replying on the same connection as the request.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-02-08 08:41:14 -08:00
Sage Weil
36097c3ac5 osd: never rewrite log after {advance,activate}_map
pg->dirty_log is never true, so this is dead code.  And nothing in either
of those two methods updates the pg log.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-02-08 08:22:54 -08:00
Sage Weil
3055d09441 osd: always write backlog after creation
dirty_log is never set to true, so we would set the log.backlog flag but
not write it to disk.  If we restarted the OSD, we would think we had the
backlog in the log but in reality we would not.  clean_up_local() could
then erase almost every object in the PG.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-02-08 08:22:47 -08:00
Sage Weil
19afe11cc5 osd: fix no missing inference
Add missing continue in last_update==last_complete (no missing) case.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-02-08 08:22:45 -08:00
Samuel Just
416292027d PG: remove sub_op_scrub
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-02-07 20:56:30 -08:00
Samuel Just
212f977f11 PG: switch _request_scrub_map to send MOSDRepScrub
Also switches sub_op_scrub_reply to sub_op_scrub_map to handle the
OSD_OP_SCRUB_MAP response.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-02-07 20:56:30 -08:00
Samuel Just
03c7b062d1 OSD: Adds handler for MOSDRepScrub
handle_rep_scrub enqueues the message in rep_scrub_wq.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-02-07 20:56:30 -08:00
Samuel Just
aed279e68f PG: added replica_scrub
Adds handler in PG for MOSDRepScrub messages.  replica_scrub will
replace sub_op_scrub.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-02-07 20:56:30 -08:00
Samuel Just
cb4fcfe316 OSD: Add rep_scrub_wq
Previously, replica scrubs would be handled in sub_op_scrub in the op
queue.  Replica scrubs will now be processed by rep_scrub_wq using the
disk tp.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-02-07 20:56:30 -08:00
Samuel Just
4cab2031dc rados: Adds CEPH_OSD_OP_SCRUB_MAP sub op
Previously, maps were requested with a sub_op and sent with a
sub_op_reply.  As maps will now be requested using a different message,
replicas will transmit scrub maps requested via MOSDRepScrub messages by
sending a sub_op of type CEPH_OSD_OP_SCRUB_MAP.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-02-07 20:56:01 -08:00
Samuel Just
7245b6a16e MOSDRepScrub: Adds a message for initiating a replica scrub
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-02-07 20:56:01 -08:00
Sage Weil
d7af21020e mon: ignore mds boot messages with zeroed port
On 0.24.2 I saw a zeroed port in the cmds log and in the mdsmap.  Ignore
anything from a cmds with a zeroed port to prevent the insanity from
spreading.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-02-06 20:49:59 -08:00
Sage Weil
5a50d339ed client: more carefully guard local cache truncate
This fixes an assert when len=0 in file_to_extents when we get some weird
metadata from the MDS.

Fixes: #778
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-02-06 13:56:25 -08:00
Sage Weil
e49dced7d4 signal: fix redefine warnings
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-02-03 11:54:39 -08:00
Samuel Just
400813cc41 ReplicatedPG: snap_trimmer fix leaked lock
Previous patch 7a02070b74 leaks the pg lock.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-02-03 11:58:56 -08:00
Samuel Just
7a02070b74 ReplicatedPG:snap_trimmer should return if !clean or !active or !primary
The PG may become !clean or !active while in the osd snap_trim_wq.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-02-03 10:31:47 -08:00
Samuel Just
4587f1fe85 mount.ceph: option parsing fix
Passing -o secretfile would cause a segfault since searching for = would
result in a null pointer.  New version checks for that case.  Also, *end
cannot be a ','.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-02-02 14:33:54 -08:00
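
The null-pointer case described in 4587f1fe85 can be illustrated with a small sketch. This is not the mount.ceph code; `split_option` is a hypothetical helper, and the only assumption is that an option such as `secretfile` may arrive without an `=`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical helper (not from mount.ceph): split one "name=value" option.
// strchr returns a null pointer when '=' is absent, so that case is checked
// before the pointer is used.
std::pair<std::string, std::string> split_option(const char* opt) {
    const char* eq = std::strchr(opt, '=');
    if (eq == nullptr) {
        // Bare option such as "secretfile": name only, empty value.
        return {std::string(opt), std::string()};
    }
    return {std::string(opt, static_cast<std::size_t>(eq - opt)),
            std::string(eq + 1)};
}
```

With this guard, `split_option("secretfile")` yields an empty value instead of dereferencing a null pointer.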
Samuel Just
ece4f61a8d FileStore: fix double close
curr_fd is already closed if cp == cur_seq.  This second close
occasionally ended up closing another thread's fd.  The next open would
tend to grab that fd in op_fd or current_fd which would then get closed
by the other thread leaving op_fd or current_fd pointing to some random
file (or a closed descriptor).

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-02-01 14:39:27 -08:00
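
The hazard in ece4f61a8d is general: once a descriptor is closed, its number can be reused by another thread, so a second close hits an unrelated file. One common defensive pattern, sketched below with a hypothetical `close_once` helper, is to invalidate the variable after closing. The commit itself may simply have removed the redundant close; this is only an illustration of the idea.

```cpp
#include <cassert>
#include <unistd.h>

// Hypothetical helper, not taken from FileStore: close a descriptor at most
// once. The variable is reset to -1 so a later call becomes a no-op instead
// of closing whatever file has since reused the same number.
bool close_once(int& fd) {
    if (fd < 0) {
        return false;  // already closed (or never opened)
    }
    ::close(fd);
    fd = -1;
    return true;
}
```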
Samuel Just
0f3198e8c6 OSD: update_osd_stat take heartbeat_lock
Previously update_osd_stat had a race with code modifying heartbeat_from
causing the iterator increment to occasionally segfault.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-01-28 18:24:12 -08:00
Greg Farnum
14c669c3f6 Locker: Drop loner correctly!
Our previous check for if we want to drop the loner was incorrect.
Now, it's fixed. Resolves a serious bug with inode write access.

Reported-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2011-01-28 16:48:02 -08:00
Sage Weil
9e4325b298 mds: defer sending resolves until mdsmap.failed.empty()
There is no point sending resolves while there are still failed nodes,
since we can't complete.  We also trigger an assert if we try to send to
a failed node.  Instead just wait until failed.empty() and then start.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-01-28 12:57:32 -08:00
Sage Weil
35442744f4 osd: fix mutual exclusion for _dispatch
We want only one thread dispatching messages (either new or requeued), so
that we can preserve ordering.  Previously we weren't doing so for all
callers of do_waiters (tick() and the first in ms_dispatch()).

This fixes osd_sub_op(_reply) ordering problems that trigger the
now-famous repop queue assert.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-28 01:24:49 -08:00
Sage Weil
fbcf66906e osd: preserve ordering when ops are requeued
Requeue ops under osd_lock to preserve ordering wrt incoming messages.
Also drain the waiter queue when ms_dispatch takes the lock before calling
_dispatch(m).

Fixes: #743
Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-26 10:08:30 -08:00
Sage Weil
7d65f6eabe osd: restart if the osdmap client, heartbeat, OR cluster addrs don't match
If we somehow get ourselves into a situation where the OSDMap addresses do
not match our actual addresses, restart and try again.  This is still
possible if multiple MOSDBoot messages end up in flight in the monitor,
say due to a monitor disconnect/reconnect, and we race with something that
marks us down in the map.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-26 10:08:30 -08:00
Sage Weil
47dc27a694 osd: avoid extraneous send_boot() calls
Only send_boot() on osdmap update if we are restarting.  Otherwise we can
end up with too many MOSDBoot messages in flight and the monitor may
apply an old one instead of a new one.  For example:

- cosd starts
- send_boot with address set A
- get an osdmap update
- send_boot again with address set A
- get an osdmap update.  now we're up.
- get osdmap update, now we're marked down,
- bind to address set B
- send_boot with address set B

and the monitor may apply the second MOSDBoot (with address set A).

This results in an online OSD using a cluster address that differs from
that in the OSDMap.  Which causes problems with peering, among other
things.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-26 10:08:29 -08:00
Samuel Just
ba998f05b7 ReplicatedPG: _rollback_to fix the just cloned condition
_rollback_to in the case that head was just cloned and that clone
includes snapid does not need to do anything.  Previously, snapid would
have to match the snap on the clone, but the condition should be that
snapid is contained within the clone's snaps set.

This bug was introduced in e189222f06

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-01-25 14:36:54 -08:00
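
The corrected condition in ba998f05b7 is a membership test rather than an equality test. A minimal sketch, assuming snap ids are plain integers and the clone's snaps are held in a vector (both are stand-ins, not the ReplicatedPG types):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the check described in the commit message:
// the clone covers snapid if snapid appears anywhere in the clone's snaps
// set, not only if it equals one particular snap on the clone.
bool clone_covers_snap(const std::vector<uint64_t>& clone_snaps,
                       uint64_t snapid) {
    return std::find(clone_snaps.begin(), clone_snaps.end(), snapid) !=
           clone_snaps.end();
}
```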
Sage Weil
f7572de5cb v0.24.2 2011-01-24 12:53:22 -08:00
Sage Weil
4a49a87db7 msgr: make connection pipe reset atomic
Close a small and unlikely race.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-24 10:59:42 -08:00
Sage Weil
3a30eb75c4 msgr: include con in debug output
Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-24 10:59:42 -08:00
Sage Weil
943fd14f79 filestore: don't wait min sync interval on explicit sync()
Also, if we do wait longer, wait on the same cond.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-24 10:59:42 -08:00
Samuel Just
785bf0fcbf ReplicatedPG: fix snap_trimmer log version bug
Previously, ctx->at_version would be the same as ctx->obs->oi.version
leading to the log entry having prior_version == version.
This bug was introduced in d1b85e06fb.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-01-21 14:30:43 -08:00
Greg Farnum
3e4a82e559 FileJournal: don't overflow the journal size.
Previously we were casting it to a uint64_t, but the left shift
occurs before the cast, so we were overflowing in some circumstances.
Split these up to prevent it.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2011-01-21 14:20:16 -08:00
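
The shift-before-cast problem in 3e4a82e559 can be reproduced in isolation. The sketch below is not the FileJournal code: the function names are hypothetical, and the size is assumed to be a 32-bit unsigned count of megabytes so that the wraparound is well defined (with a signed int the overflow would be undefined behavior).

```cpp
#include <cassert>
#include <cstdint>

// Buggy form: the shift is evaluated in 32-bit arithmetic and only the
// already-wrapped result is widened to 64 bits.
uint64_t size_bytes_buggy(uint32_t size_mb) {
    return static_cast<uint64_t>(size_mb << 20);
}

// Fixed form: widen first, then shift, as two separate steps.
uint64_t size_bytes_fixed(uint32_t size_mb) {
    uint64_t bytes = size_mb;
    bytes <<= 20;
    return bytes;
}
```

On a platform where `unsigned int` is 32 bits, a 4096 MB size wraps to 0 in the buggy form, while the fixed form returns 4294967296.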
Colin Patrick McCabe
444e930ab3 mds: respawn must unblock signals before exec
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-01-21 06:53:24 -08:00
Colin Patrick McCabe
59e8e1652a common: move signal blocking into signal.cc
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-01-21 06:53:04 -08:00
Colin Patrick McCabe
ba000d9c27 common: add signal_mask_to_str
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-01-21 06:47:05 -08:00
Sage Weil
aaed6eb3d0 msgr: always start reaper
If we didn't explicitly bind (i.e. are a client), then we don't start
the accepter.  That's fine. But the reaper thread start was also
conditional, when it shouldn't be; otherwise the client can't clean up
old Pipes (and their sockets).

Fixes: #732
Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-21 10:08:26 -08:00
Sage Weil
027335afe3 monclient: fix locking
Hold lock in handle_* methods; assert lock held in all _* methods.

Fixes: #731
Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-21 09:35:31 -08:00
Colin Patrick McCabe
ad8951aeeb signals: signal.cc: trim includes
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-01-20 03:44:57 -08:00
Colin Patrick McCabe
189cf33f50 common: re-install sighandlers after daemon()
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-01-20 03:44:57 -08:00
Colin Patrick McCabe
6041302efe common: move signal handler stuff into signal.cc
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-01-20 03:44:42 -08:00
Samuel Just
48ebab6d1c ReplicatedPG.cc: fix snap_trimmer object context error
Previously, snap_trimmer would get the clone object information from the
object store rather than using find_object_context.  This would cause
the cached version to not be updated with the new version in the case
that the object information got updated.  As a result, the need field of
the missing object could get a stale version inconsistent with the most
recent logged version.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-01-19 17:44:55 -08:00
Samuel Just
d1b85e06fb ReplicatedPG.cc: update coi version and prior_version to match log
Caused error where oi on clone would not get updated version when snaps
was updated.  oi.version would lag behind the missing item's need field
during recovery.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-01-19 17:44:55 -08:00
Samuel Just
e6b9731d00 ReplicatedPG.cc: fix use of potentially invalid pointer
rollback_to may not be initialized if ret != 0.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-01-19 17:44:55 -08:00
Samuel Just
4e3a4e2853 ReplicatedPG,PG,OSD: snap_trimmer should run only when the PG is clean
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-01-19 17:44:34 -08:00
Colin Patrick McCabe
35ef7bc98e signals: handle_fatal_signal: use SA_NODEFER
SA_RESETHAND | SA_NODEFER allows the "re-trigger default signal handler"
trick to work for signals other than SIGSEGV.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2011-01-19 05:14:20 -08:00
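
The flag combination named in 35ef7bc98e can be sketched as follows. This is not Ceph's handler; `fatal_signal_sketch` and `install_fatal_handler_sketch` are hypothetical names. The idea is that `SA_RESETHAND` restores the default disposition when the handler is entered and `SA_NODEFER` leaves the signal unblocked inside the handler, so raising the same signal again from the handler runs the default action right away.

```cpp
#include <cassert>
#include <signal.h>
#include <unistd.h>

// Hypothetical handler: report, then re-raise so the default action
// (for example a core dump) still happens.
extern "C" void fatal_signal_sketch(int signum) {
    static const char msg[] = "*** caught fatal signal ***\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
    // Disposition is already back to SIG_DFL (SA_RESETHAND) and the signal
    // is not blocked (SA_NODEFER), so this is delivered immediately.
    raise(signum);
}

// Hypothetical installer; returns the sigaction() result (0 on success).
int install_fatal_handler_sketch(int signum) {
    struct sigaction sa = {};
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND | SA_NODEFER;
    sa.sa_handler = fatal_signal_sketch;
    return sigaction(signum, &sa, nullptr);
}
```

Without `SA_NODEFER`, the re-raised signal would stay pending until the handler returned, which is why the trick otherwise only works for signals that are regenerated when the faulting instruction reruns, such as SIGSEGV.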
Colin Patrick McCabe
3326b753e5 signals: backtrace some more exotic fatal signals
We're not likely to see these, but if we do, we want it in the logs!

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2011-01-19 05:14:14 -08:00