build_scrub_map will bail out if the pg changed. Discard the result in
that case since the primary will ignore it anyway.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Using osd->osdmap->epoch without map_lock is dangerous. We can avoid it
entirely by replying on the same connection as the request.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
pg->dirty_log is never true, so this is dead code. And nothing in either
of those two methods updates the pg log.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
dirty_log is never set to true, so we would set the log.backlog flag but
not write it to disk. If we restarted the OSD, we would think we had the
backlog in the log but in reality we would not. clean_up_local() could
then erase almost every object in the PG.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Previously, replica scrubs would be handled in sub_op_scrub in the op
queue. Replica scrubs will now be processed by rep_scrub_wq using the
disk tp.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Previously, maps were requested with a sub_op and sent with a
sub_op_reply. As maps will now be requested using a different message,
replicas will transmit scrub maps requested via MOSDRepScrub messages by
sending a sub_op of type CEPH_OSD_OP_SCRUB_MAP.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
On 0.24.2 I saw a zeroed port in the cmds log and in the mdsmap. Ignore
anything from a cmds with a zeroed port to prevent the insanity from
spreading.
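A minimal sketch of this kind of sanity check. It assumes a plain IPv4 sockaddr_in, which is not Ceph's own address type. addr_is_sane and make_addr are hypothetical helpers for illustration only.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: a daemon that advertises port 0 cannot actually be
// reached, so treat its address as insane and ignore what it sends.
bool addr_is_sane(const sockaddr_in& a) {
  return a.sin_family == AF_INET && ntohs(a.sin_port) != 0;
}

// Hypothetical helper that builds an IPv4 address for the example.
sockaddr_in make_addr(const char* ip, uint16_t port) {
  sockaddr_in a;
  std::memset(&a, 0, sizeof(a));
  a.sin_family = AF_INET;
  a.sin_port = htons(port);  // network byte order
  inet_pton(AF_INET, ip, &a.sin_addr);
  return a;
}
```

A message whose sender fails addr_is_sane() would simply be dropped before it can update the mdsmap.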
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
This fixes an assert in file_to_extents that triggers when len=0, which can
happen when we get weird metadata from the MDS.
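One way to avoid such an assert is to treat a zero-length range as mapping to no extents. This sketch assumes a simplified layout with no striping, where each object covers object_size bytes. map_file_range and Extent are hypothetical and do not match the real file_to_extents signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Extent {
  uint64_t objectno;  // which object
  uint64_t offset;    // offset within that object
  uint64_t length;    // bytes in this piece
};

// Simplified mapping of a file byte range onto fixed-size objects.
std::vector<Extent> map_file_range(uint64_t offset, uint64_t len,
                                   uint64_t object_size) {
  std::vector<Extent> out;
  if (len == 0) {
    return out;  // nothing to map: return no extents instead of asserting
  }
  assert(object_size > 0);
  uint64_t pos = offset;
  const uint64_t end = offset + len;
  while (pos < end) {
    uint64_t objectno = pos / object_size;
    uint64_t obj_off = pos % object_size;
    // Stop at the object boundary or at the end of the range.
    uint64_t n = std::min(object_size - obj_off, end - pos);
    out.push_back({objectno, obj_off, n});
    pos += n;
  }
  return out;
}
```

For example, a 200-byte range starting at 4000 with 4096-byte objects splits into 96 bytes of object 0 and 104 bytes of object 1.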
Fixes: #778
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Passing -o secretfile with no =value would cause a segfault, since
searching for '=' would return a null pointer. The new version checks for
that case. Also, *end cannot be a ','.
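A sketch of bounded option parsing, assuming comma-separated name[=value] tokens. parse_opts is hypothetical and is not the actual mount helper. The '=' search is limited to the current token, and a missing '=' produces an empty value instead of a null pointer.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Split "name[=value],name[=value],..." into (name, value) pairs.
std::vector<std::pair<std::string, std::string>> parse_opts(const char* s) {
  std::vector<std::pair<std::string, std::string>> out;
  while (*s) {
    const char* end = std::strchr(s, ',');
    if (!end) end = s + std::strlen(s);  // last token runs to the NUL
    // Look for '=' only inside [s, end); never past this token's comma.
    const char* eq = static_cast<const char*>(std::memchr(s, '=', end - s));
    if (eq) {
      out.emplace_back(std::string(s, eq), std::string(eq + 1, end));
    } else {
      out.emplace_back(std::string(s, end), std::string());  // no value
    }
    s = (*end == ',') ? end + 1 : end;
  }
  return out;
}
```

Note that an unbounded strchr(s, '=') on "secretfile,secret=abc" would find the '=' that belongs to the following option.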
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
curr_fd is already closed if cp == cur_seq. This second close
occasionally ended up closing another thread's fd. The next open would
tend to grab that fd in op_fd or current_fd which would then get closed
by the other thread leaving op_fd or current_fd pointing to some random
file (or a closed descriptor).
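A minimal sketch of the usual guard against a double close, assuming the fd variable has a single owner. close_once is a hypothetical helper and is not FileStore code. Resetting the variable to -1 ensures a later cleanup path cannot close a descriptor number that the kernel has since handed to another thread.

```cpp
#include <cassert>
#include <fcntl.h>   // open(), fcntl() used in the usage check
#include <unistd.h>  // close()

// Close a descriptor we own exactly once, then poison the variable.
void close_once(int& fd) {
  if (fd >= 0) {
    ::close(fd);
    fd = -1;  // a second call is now a harmless no-op
  }
}
```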
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Previously, update_osd_stat raced with code modifying heartbeat_from,
causing the iterator increment to occasionally segfault.
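A sketch of the general fix pattern, assuming one mutex protects the heartbeat peer map and both writers and readers take it. HeartbeatPeers is a hypothetical stand-in for the real OSD members.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <vector>

class HeartbeatPeers {
 public:
  void add(int osd, int epoch) {
    std::lock_guard<std::mutex> l(lock_);
    from_[osd] = epoch;
  }
  void remove(int osd) {
    std::lock_guard<std::mutex> l(lock_);
    from_.erase(osd);  // cannot run while a reader is mid-iteration
  }
  // Walk the map under the same lock (e.g. for a stats update), so the
  // iterator is never advanced past a node that was just erased.
  std::vector<int> peers() const {
    std::lock_guard<std::mutex> l(lock_);
    std::vector<int> out;
    for (auto it = from_.begin(); it != from_.end(); ++it) {
      out.push_back(it->first);
    }
    return out;
  }

 private:
  mutable std::mutex lock_;
  std::map<int, int> from_;  // osd id -> epoch
};
```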
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Our previous check for whether we should drop the loner was incorrect.
Fix it. This resolves a serious bug with inode write access.
Reported-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
There is no point sending resolves while there are still failed nodes,
since we can't complete. We also trigger an assert if we try to send to
a failed node. Instead just wait until failed.empty() and then start.
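A minimal sketch of the gating logic, assuming the failed and recovery rank sets are plain std::set&lt;int&gt;. ResolveGate is hypothetical and is not the MDCache interface.

```cpp
#include <cassert>
#include <set>
#include <vector>

struct ResolveGate {
  std::set<int> failed;        // ranks currently marked failed
  std::set<int> recovery_set;  // ranks we would send resolves to
  bool sent = false;

  // Returns the ranks resolves were sent to, or nothing if we must wait.
  // Called again on each map update until failed becomes empty.
  std::vector<int> maybe_send_resolves() {
    if (sent || !failed.empty()) return {};
    sent = true;
    return std::vector<int>(recovery_set.begin(), recovery_set.end());
  }
};
```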
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
We want only one thread dispatching messages (either new or requeued), so
that we can preserve ordering. Previously we weren't doing so for all
callers of do_waiters (tick() and the first call in ms_dispatch()).
This fixes osd_sub_op(_reply) ordering problems that trigger the
now-famous repop queue assert.
Signed-off-by: Sage Weil <sage@newdream.net>
Requeue ops under osd_lock to preserve ordering wrt incoming messages.
Also drain the waiter queue when ms_dispatch takes the lock before calling
_dispatch(m).
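A minimal sketch of the ordering rule, assuming a single lock that stands in for osd_lock and using strings as ops. Dispatcher and its methods are hypothetical and are not the OSD class. Requeued waiters are drained under the lock before the new message is handled.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

class Dispatcher {
 public:
  // Defer an op; it is queued under the same lock used for dispatch.
  void requeue(const std::string& m) {
    std::lock_guard<std::mutex> l(lock_);
    waiters_.push_back(m);
  }
  // Incoming message: older requeued ops go first, then the new one.
  void ms_dispatch(const std::string& m) {
    std::lock_guard<std::mutex> l(lock_);
    do_waiters_locked();
    handle_locked(m);
  }
  std::vector<std::string> handled() const {
    std::lock_guard<std::mutex> l(lock_);
    return handled_;
  }

 private:
  void do_waiters_locked() {
    while (!waiters_.empty()) {
      handle_locked(waiters_.front());
      waiters_.pop_front();
    }
  }
  void handle_locked(const std::string& m) { handled_.push_back(m); }

  mutable std::mutex lock_;
  std::deque<std::string> waiters_;
  std::vector<std::string> handled_;
};
```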
Fixes: #743
Signed-off-by: Sage Weil <sage@newdream.net>
If we somehow get ourselves into a situation where the OSDMap addresses do
not match our actual addresses, restart and try again. This is still
possible if multiple MOSDBoot messages end up in flight in the monitor,
say due to a monitor disconnect/reconnect, and we race with something that
marks us down in the map.
Signed-off-by: Sage Weil <sage@newdream.net>
Only send_boot() on osdmap update if we are restarting. Otherwise we can
end up with too many MOSDBoot messages in flight and the monitor may
apply an old one instead of a new one. For example:
- cosd starts
- send_boot with address set A
- get an osdmap update
- send_boot again with address set A
- get an osdmap update; now we're up
- get an osdmap update; now we're marked down
- bind to address set B
- send_boot with address set B
and the monitor may apply the second MOSDBoot (with address set A).
This results in an online OSD using a cluster address that differs from
the one in the OSDMap, which causes problems with peering, among other
things.
Signed-off-by: Sage Weil <sage@newdream.net>
_rollback_to does not need to do anything if head was just cloned and
that clone includes snapid. Previously, snapid had to match the snap on
the clone, but the condition should be that snapid is contained within
the clone's snaps set.
This bug was introduced in e189222f06.
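A sketch of the corrected membership test, assuming the clone's snaps are held in a std::vector of snap ids. clone_covers_snap is hypothetical, and snapid_t here is only a local alias.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using snapid_t = uint64_t;  // local stand-in for Ceph's snapid_t

// True if snapid is any of the snaps the clone covers, not just one of them.
bool clone_covers_snap(const std::vector<snapid_t>& clone_snaps,
                       snapid_t snapid) {
  return std::find(clone_snaps.begin(), clone_snaps.end(), snapid) !=
         clone_snaps.end();
}
```

With a clone covering snaps {9, 6, 4}, a rollback to snap 6 is a no-op even though 6 is not the clone's first snap.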
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Previously, ctx->at_version would be the same as ctx->obs->oi.version
leading to the log entry having prior_version == version.
This bug was introduced in d1b85e06fb.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Previously we were casting it to a uint64_t, but the left shift
occurs before the cast, so we were overflowing in some circumstances.
Split these up to prevent it.
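A minimal illustration of the problem and the split-up fix, assuming the shift amount is an int. The commit does not name the variables involved, so object_size_bytes and order are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The buggy form was effectively:
//   uint64_t size = (uint64_t)(1 << order);
// Here 1 << order is evaluated as a 32-bit int before the cast, so the
// result no longer fits once order reaches 31.
// Splitting it up widens the value first, so the shift is done in 64 bits.
uint64_t object_size_bytes(int order) {
  uint64_t size = 1;
  size <<= order;
  return size;
}
```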
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
If we didn't explicitly bind (i.e. are a client), then we don't start
the accepter. That's fine. But the reaper thread start was also
conditional, when it shouldn't be; otherwise the client can't clean up
old Pipes (and their sockets).
Fixes: #732
Signed-off-by: Sage Weil <sage@newdream.net>
Previously, snap_trimmer would get the clone object information from the
object store rather than using find_object_context. This would cause
the cached version to not be updated with the new version in the case
that the object information got updated. As a result, the need field of
the missing object could get a stale version inconsistent with the most
recent logged version.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
This caused an error where the oi on the clone would not get an updated
version when snaps was updated, so oi.version would lag behind the missing
item's need field during recovery.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
SA_RESETHAND | SA_NODEFER allows the "re-trigger default signal handler"
trick to work for signals other than SIGSEGV.
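A sketch of the re-trigger trick on POSIX, using hypothetical names fatal_handler and install_fatal_handler. The handler reports, then re-raises the signal. SA_RESETHAND has already restored SIG_DFL on entry, and SA_NODEFER keeps the signal unblocked inside the handler, so the re-raise is delivered at once and the default action runs. The fork-based demo uses SIGUSR1 because its default action is to terminate the process.

```cpp
#include <cassert>
#include <csignal>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern "C" void fatal_handler(int sig) {
  static const char msg[] = "caught fatal signal, re-raising\n";
  ssize_t r = write(STDERR_FILENO, msg, sizeof(msg) - 1);  // signal-safe
  (void)r;
  raise(sig);  // disposition is SIG_DFL now, so this runs the default action
}

int install_fatal_handler(int sig) {
  struct sigaction sa{};
  sigemptyset(&sa.sa_mask);
  sa.sa_handler = fatal_handler;
  sa.sa_flags = SA_RESETHAND | SA_NODEFER;
  return sigaction(sig, &sa, nullptr);
}

// Demo: a forked child installs the handler and raises sig. The parent
// checks that the child was killed by sig instead of exiting normally.
bool child_killed_by(int sig) {
  pid_t pid = fork();
  if (pid < 0) return false;
  if (pid == 0) {
    install_fatal_handler(sig);
    raise(sig);
    _exit(0);  // only reached if the default action did not run
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return false;
  return WIFSIGNALED(status) && WTERMSIG(status) == sig;
}
```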
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>