Showing the current state and saying it is stuck doesn't tell you how it
is stuck (e.g. stuck unclean, stuck inactive, etc.). Also include the
stuck duration.
Fixes: #2876
Signed-off-by: Sage Weil <sage@inktank.com>
This is a fallback for when a user wishes to delete ALL benchmark files
matching a particular prefix. In the fast case, a metadata file tells us
enough to quickly delete the files in parallel. This is the slow case,
where each file's name must be checked against the prefix.
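A minimal sketch of the slow path described here, assuming a pool can be listed and objects removed by name. The helper names (`list_objects`, `remove_object`) and the object names are hypothetical stand-ins, not the rados API.

```python
# Hypothetical sketch: the callables and object names are illustrative only.
def slow_cleanup(list_objects, remove_object, prefix):
    """Remove every object whose name starts with prefix; return the count."""
    removed = 0
    # Snapshot the listing so that removal does not disturb iteration.
    for name in list(list_objects()):
        if name.startswith(prefix):
            remove_object(name)
            removed += 1
    return removed


# In-memory stand-in for a pool, used only to exercise the sketch.
pool = {"benchmark_data_a_1", "benchmark_data_a_2", "unrelated_object"}
count = slow_cleanup(lambda: pool, pool.discard, "benchmark_data_a")
```

Every name in the pool is examined, which is why this path is slower than one that already knows which objects exist.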
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
This intelligently removes objects from a rados or rest benchmark run by
using parameters from the metadata file.
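A rough sketch of the metadata-driven path, assuming the metadata records a name prefix and an object count. Both field names, the naming scheme, and the `remove_object` callable are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical sketch: metadata fields and the naming scheme are assumed.
def fast_cleanup(metadata, remove_object, workers=4):
    """Rebuild the object names from metadata and remove them in parallel."""
    names = [
        f"{metadata['prefix']}_object{i}"
        for i in range(metadata["object_count"])
    ]
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # Consume the iterator so that any removal error is raised here.
        list(executor.map(remove_object, names))
    return len(names)


# In-memory stand-in for a pool, used only to exercise the sketch.
store = {"bench_object0", "bench_object1", "bench_object2", "other"}
removed = fast_cleanup({"prefix": "bench", "object_count": 3}, store.discard)
```

No listing is needed because the names are derived from the stored parameters.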
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
Store metadata for each benchmark run so that the objects can be
efficiently removed at a later point.
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
Per #2477, objects created during rados or rest write benchmark are
automatically cleaned up after the test. They can optionally be left in
place.
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
Lossless peers (osd<->osd, mds<->mds, mon<->mon) never reset sessions
to each other. In the osd and mds cases, there is no need to check for
session resets. More significantly, these checks can trigger with an
unfortunate sequence of socket failures. In particular,
- A sends connect request to B
- B accepts, increments connect_seq, then has a socket failure
before telling A
- A reconnects, still with connect_seq == 0
- B sees connect_seq == 0 and thinks there was a reset
This warrants a closer look in the fs client <-> mds case, but for now,
in the cluster-internal communications, it is moot, since reset
detection is unnecessary.
In the monitor case: we do need to check for resets because the peers
reuse the same entity_addr_t's (nonce==0), which means that a daemon
restart is effectively a reset. In that case, use a different policy
that continues to check for resets.
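The failure sequence in the list above can be replayed with a small model of the accepting side. This is only a sketch: the `Policy` and `Session` classes and the `check_resets` flag are hypothetical names, not the messenger's actual types.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    check_resets: bool  # hypothetical flag name


@dataclass
class Session:
    connect_seq: int = 0


def handle_connect(existing, incoming_connect_seq, policy):
    """Return 'reset' or 'resume' for a connect request on the accepting side."""
    if (policy.check_resets
            and incoming_connect_seq == 0
            and existing.connect_seq > 0):
        return "reset"
    return "resume"


# Replay the sequence from B's point of view.
b_view_of_a = Session()
b_view_of_a.connect_seq += 1  # B accepts, then the socket fails before A hears
a_connect_seq = 0             # A never saw the accept, so it retries with 0

osd_policy = Policy(check_resets=False)  # osd<->osd, mds<->mds
mon_policy = Policy(check_resets=True)   # mon<->mon, addresses are reused

osd_result = handle_connect(b_view_of_a, a_connect_seq, osd_policy)
mon_result = handle_connect(b_view_of_a, a_connect_seq, mon_policy)
```

With the check disabled, the retry is treated as a continuation of the same session; with it enabled, the same retry looks like a reset.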
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
We currently prefer up osds, and then pull sequentially from peer_info
(strays we know about at the time). This adds an additional preference
for the current acting, which means we can avoid changes to acting when
they are largely useless.
In particular, I observed that we chose [5,3] and later (when recovery
completed) chose [5,1] because we had since heard about an eligible stray
on 1. That switch was basically a waste...
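A simplified sketch of the preference order, assuming strays are otherwise considered in ascending id order. The function, its parameters, and the example up set are made up to reproduce the [5,3] versus [5,1] outcome; the real selection logic has more constraints.

```python
# Hypothetical sketch: not the actual acting-set selection code.
def choose_acting(up, acting, eligible_strays, size, prefer_acting=True):
    """Pick up to `size` OSDs: up first, then current acting, then strays."""
    order = list(up)
    if prefer_acting:
        order += list(acting)
    order += sorted(eligible_strays)
    chosen = []
    for osd in order:
        # Skip duplicates and anything that is neither up nor an eligible stray.
        if osd not in chosen and (osd in up or osd in eligible_strays):
            chosen.append(osd)
        if len(chosen) == size:
            break
    return chosen


# Assumed scenario: osd 5 is up, acting is [5, 3], strays 1 and 3 are eligible.
without_pref = choose_acting([5], [5, 3], {1, 3}, size=2, prefer_acting=False)
with_pref = choose_acting([5], [5, 3], {1, 3}, size=2, prefer_acting=True)
```

Preferring the current acting set keeps osd 3 in place once the stray on osd 1 becomes known.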
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
This properly destroys objects. In the process, remove usage_exit();
also kill error-handling in set_conf_param (never relevant for rbd.cc,
and if you call it with both pointers NULL, well...)
Also switch to EXIT_FAILURE for consistency.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Fixes: #2948
We are using a hash_map<> to map tids to Op*'s. In handle_osd_map(),
we will recalc_op_target() on each Op in a random (hash) order. These
will get put in a temp map<tid,Op*> to ensure they are resent in the
correct order, but their order on the session->ops list will be random.
Then later, if we reset an OSD connection, we will resend everything for
that session in ops order, which is incorrect.
Fix this by explicitly reordering the requests to resend in
kick_requests(), much like we do in handle_osd_map(). This lets us
continue to use a hash_map<>, which is faster for reasonable numbers of
requests. A simpler but slower fix would be to just use map<> instead.
This is one of many bugs contributing to #2947.
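The reordering idea can be sketched as follows. Ops are modelled as plain dicts carrying a `tid`, which is an assumption for illustration; the real code works on Op* pointers.

```python
# Hypothetical sketch: ops are dicts here, not the Objecter's Op structures.
def kick_requests(session_ops):
    """Return one session's ops in ascending tid order, ready to resend."""
    resend = {op["tid"]: op for op in session_ops}
    return [resend[tid] for tid in sorted(resend)]


# The session list may be in arbitrary order after a map change.
session_ops = [{"tid": 7}, {"tid": 3}, {"tid": 5}]
resent = [op["tid"] for op in kick_requests(session_ops)]
```

Sorting only at resend time keeps the primary tid lookup structure unordered and fast.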
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
rgw_gc_max_objs: number of objects to use for gc shards
rgw_gc_obj_min_wait: min time for an object to become visible to gc
rgw_gc_processor_max_time: max time for a single gc processor cycle
rgw_gc_processor_period: period between gc processor starts
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
The librados namespace was not specified, so any source file including
this had to add its own using namespace. This fixes it.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Generate a simple "signed" report of the current cluster status. Include
a simple crc so that the report is vaguely verifiable.
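A minimal sketch of a checksummed report, assuming a JSON body. The report layout is hypothetical, and `zlib.crc32` is used here as a stand-in for whatever crc the implementation actually computes.

```python
import json
import zlib


# Hypothetical sketch: the report layout and crc variant are assumptions.
def make_report(status):
    """Serialize the status deterministically and attach a crc of the body."""
    body = json.dumps(status, sort_keys=True)
    return {"body": body, "crc": zlib.crc32(body.encode("utf-8"))}


def verify_report(report):
    """Recompute the crc over the body and compare it with the stored value."""
    return zlib.crc32(report["body"].encode("utf-8")) == report["crc"]


report = make_report({"health": "HEALTH_OK", "num_osds": 3})
tampered = dict(report, body=report["body"].replace("3", "4"))
```

A crc catches accidental corruption only; it is not a cryptographic signature, which matches the "vaguely verifiable" wording.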
This is part of #2829.
Signed-off-by: Sage Weil <sage@inktank.com>
We are calling requeue_ops() on each individual op, which means we need
to requeue in reverse order (newest first, oldest last).
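Why the order matters can be shown with a small sketch, assuming each requeue call puts its op at the front of the queue. The function names and the deque are illustrative, not the OSD's actual data structures.

```python
from collections import deque


# Hypothetical sketch: assumes a requeue places the op at the queue front.
def requeue_op(queue, op):
    """Put a single op back at the front of the queue."""
    queue.appendleft(op)


def requeue_all(queue, ops_oldest_first):
    """Requeue op by op, newest first, so the oldest ends up at the front."""
    for op in reversed(ops_oldest_first):
        requeue_op(queue, op)


queue = deque(["op4"])
requeue_all(queue, ["op1", "op2", "op3"])
```

Walking the ops oldest first would instead leave them at the front in reversed order.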
Fixes: #2947
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
If we contract to 1 monitor, we win_standalone_election() without bumping
the election epoch. Racing paxos updates can then reach us without being
ignored and trigger an assert:
  mon/Paxos.cc: In function 'void Paxos::handle_accept(MMonPaxos*)' thread 7f85eae05700 time 2012-08-20 16:01:00.843937
  mon/Paxos.cc: 468: FAILED assert(state == STATE_UPDATING)
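The race can be illustrated with a toy model, assuming that messages stamped with an older epoch are dropped. The `Monitor` class and its methods are hypothetical simplifications and do not mirror the real monitor code.

```python
# Hypothetical sketch: a toy model of epoch-based filtering.
class Monitor:
    def __init__(self, epoch):
        self.epoch = epoch
        self.handled = []

    def win_standalone_election(self, bump_epoch):
        """Become the sole monitor, optionally bumping the election epoch."""
        if bump_epoch:
            self.epoch += 1

    def handle_paxos(self, msg_epoch, payload):
        """Drop messages from an older epoch; process everything else."""
        if msg_epoch < self.epoch:
            return False
        self.handled.append(payload)
        return True


buggy = Monitor(epoch=4)
buggy.win_standalone_election(bump_epoch=False)
buggy_accepted = buggy.handle_paxos(4, "accept")  # stale message gets through

fixed = Monitor(epoch=4)
fixed.win_standalone_election(bump_epoch=True)
fixed_accepted = fixed.handle_paxos(4, "accept")  # stale message is ignored
```

Without the bump, a message from before the contraction is indistinguishable from a current one.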
Fixes: #3003
Reported-by: John Wilkins <john.wilkins@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Given a ceph.conf that looks like
  [osd.42]
  host = localhost
mkcephfs used to exit with an obscure error message:
  cat: /tmp/mkcephfs.MCBIHvn4Ru/key.*: No such file or directory
"localhost" was never intended to be a valid hostname to use there.
Warn if we see it, and skip the entry. You should use the proper short
hostname of the box.
As init-ceph and mkcephfs share this library, this change affects the
sysvinit scripts too. The behavior *shouldn't* change there (localhost
entries were ignored earlier, too), but you may see this extra
warning. Which is good.
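The warn-and-skip behavior can be sketched like this. The real change lives in a shell library; this Python version, its function name, and the warning text are illustrative assumptions only.

```python
import sys


# Hypothetical sketch: the function name and warning wording are made up.
def usable_hosts(sections, warn=lambda msg: print(msg, file=sys.stderr)):
    """Return section -> host, warning about and skipping 'localhost' entries."""
    hosts = {}
    for name, host in sections.items():
        if host == "localhost":
            warn(f"warning: {name}: host = localhost is not a usable "
                 "hostname, skipping")
            continue
        hosts[name] = host
    return hosts


warnings = []
result = usable_hosts(
    {"osd.42": "localhost", "osd.43": "node1"},
    warn=warnings.append,
)
```

The entry is skipped rather than treated as fatal, so other sections in the same file are still processed.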
Closes: #3001
Signed-off-by: Tommi Virtanen <tv@inktank.com>