If we get heartbeat messages from old epochs from peers that are not
current, drop them and mark the connection down. Even if they are peers
we _should_ have (because we have not yet received the notify that tells
us about a pg we should have), we have a newer map epoch and will learn
about them shortly, reopening the connection.
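The drop rule described above can be sketched roughly as follows. This is an illustration only, not the actual OSD code: the HeartbeatState and Heartbeat types, the integer peer ids, and the marked_down set are all hypothetical names. It also assumes that a heartbeat from a non-current peer at our own epoch is still accepted, which the text above does not specify.

```cpp
#include <cstdint>
#include <set>

// Hypothetical stand-ins for illustration; not the real Ceph types.
using epoch_t = uint32_t;

struct Heartbeat {
  int from;       // id of the sending peer
  epoch_t epoch;  // map epoch the sender had when it sent the message
};

struct HeartbeatState {
  epoch_t my_epoch = 0;         // newest map epoch we hold
  std::set<int> current_peers;  // peers we currently expect heartbeats from
  std::set<int> marked_down;    // connections we have marked down

  // Returns true if the heartbeat is accepted, false if it is dropped.
  bool handle(const Heartbeat& m) {
    const bool is_current = current_peers.count(m.from) > 0;
    if (!is_current && m.epoch < my_epoch) {
      // Old epoch from a peer we do not consider current: drop the
      // message and mark the connection down. If the peer turns out
      // to be one we should have, a later notify reopens it.
      marked_down.insert(m.from);
      return false;
    }
    return true;
  }
};
```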
Fixes: #1107
Signed-off-by: Sage Weil <sage@newdream.net>
- share the map with the cluster addr
- use the new {note,get}_peer_epoch helpers to do it sanely
- don't share if we're booting; see 818fa33a66
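A rough sketch of how per-peer epoch tracking could drive the decision to share a map. Only the helper names note_peer_epoch and get_peer_epoch come from the list above; their signatures, the PeerEpochs class, the integer peer ids, and should_share_map are assumptions made for illustration.

```cpp
#include <cstdint>
#include <map>

using epoch_t = uint32_t;

// Hypothetical tracker; the real helpers may look quite different.
class PeerEpochs {
  std::map<int, epoch_t> peer_epoch;  // peer id -> newest epoch we know it has

public:
  // Record that a peer has at least epoch e; never moves backwards.
  epoch_t note_peer_epoch(int peer, epoch_t e) {
    epoch_t& cur = peer_epoch[peer];
    if (e > cur)
      cur = e;
    return cur;
  }

  // Returns 0 if we know nothing about the peer.
  epoch_t get_peer_epoch(int peer) const {
    auto it = peer_epoch.find(peer);
    return it == peer_epoch.end() ? 0 : it->second;
  }
};

// Share only when we are not booting and the peer is behind us.
bool should_share_map(const PeerEpochs& p, int peer,
                      epoch_t my_epoch, bool booting) {
  if (booting)
    return false;
  return p.get_peer_epoch(peer) < my_epoch;
}
```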
Signed-off-by: Sage Weil <sage@newdream.net>
This lets OSDMap::create_simple() see g_conf.osd_pool_default_size when
creating the initial data, metadata, and rbd pools.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
* fix content-type handling
* add vvprint and use it in Object::equals
* support RgwStore::prefix
* more tests
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
If a client tries to lock a file, has to wait, and then cancels the attempt,
the client will send an unlock request to unwind its state.
- the unlock now removes the waiting lock attempt from the wait list
- when the lock request retries and finds it is no longer on the wait
list, it fails.
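The two steps above can be sketched like this. LockWaiters and its integer request ids are hypothetical names for illustration; the real lock state is more involved.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical wait list keyed by request id; illustration only.
struct LockWaiters {
  std::vector<int> waiting;  // ids of lock attempts that had to wait

  void add_waiter(int id) { waiting.push_back(id); }

  // Called when an unlock arrives for a cancelled attempt: forget the
  // waiting lock attempt so that a later retry cannot succeed.
  void cancel_waiter(int id) {
    waiting.erase(std::remove(waiting.begin(), waiting.end(), id),
                  waiting.end());
  }

  // Called when a waiting lock request is retried. Returns false
  // (the retry fails) if the attempt is no longer on the wait list.
  bool retry_allowed(int id) const {
    return std::find(waiting.begin(), waiting.end(), id) != waiting.end();
  }
};
```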
Signed-off-by: Sage Weil <sage@newdream.net>
Handle extended attributes that contain NUL bytes correctly, rather
than treating everything as zero-terminated C strings.
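A minimal illustration of the difference, using two hypothetical helper functions that are not taken from the actual change: building a std::string from a bare char pointer stops at the first NUL byte, while passing the length explicitly keeps the whole value.

```cpp
#include <cstddef>
#include <string>

// Truncating version: treats the value as a zero-terminated C string,
// so anything after the first NUL byte is lost.
std::string xattr_as_cstring(const char* buf) {
  return std::string(buf);
}

// Length-aware version: keeps every byte, including embedded NULs.
std::string xattr_with_length(const char* buf, std::size_t len) {
  return std::string(buf, len);
}
```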
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Previously, _activate_committed would access the osdmap epoch racing
with handle_osd_map's osdmap update. This would allow a message to be
sent from a replica to the primary tagged with the same epoch as
last_warm_restart, though the event actually occurred before
last_warm_restart. Thus the primary would fail to ignore the event and
transition to crashed.
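One way to avoid this kind of race is to capture the epoch under a lock when the event is queued and tag the outgoing message with that captured value, rather than reading the live epoch later. The sketch below shows that general technique under stated assumptions; MapHolder, CommitCallback, and primary_should_ignore are hypothetical names, and this is not necessarily how the actual fix works.

```cpp
#include <cstdint>
#include <mutex>

using epoch_t = uint32_t;

// Hypothetical holder for the current map epoch, guarded by a mutex.
class MapHolder {
  mutable std::mutex mtx;
  epoch_t epoch = 0;

public:
  // Stand-in for the map update path.
  void advance(epoch_t e) {
    std::lock_guard<std::mutex> l(mtx);
    if (e > epoch)
      epoch = e;
  }

  epoch_t snapshot() const {
    std::lock_guard<std::mutex> l(mtx);
    return epoch;
  }
};

// The callback remembers the epoch at queue time and tags its message
// with that value, even if the map advances before the callback runs.
struct CommitCallback {
  epoch_t queued_epoch;
  epoch_t tag() const { return queued_epoch; }
};

// Assumed primary-side rule: events from before the restart are ignored.
bool primary_should_ignore(epoch_t msg_epoch, epoch_t last_restart) {
  return msg_epoch < last_restart;
}
```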
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
There was an old change in file_eval() that was allowing us to switch from
SYNC to MIX or EXCL while there were rdlocks, which either caused lots of
lock thrashing or could (I think) hang things up completely. This was
from ea10a672, an ancient fix for something related that appears to have
taken out the rdlocked check by accident.
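The restored check can be pictured roughly as below. This is a simplified sketch, not the real file_eval(): FileLockState, FileLock, and try_switch are hypothetical names, and the real lock has more states and conditions.

```cpp
// Simplified, hypothetical model of the file lock; illustration only.
enum class FileLockState { SYNC, MIX, EXCL };

struct FileLock {
  FileLockState state = FileLockState::SYNC;
  int num_rdlocks = 0;
  bool is_rdlocked() const { return num_rdlocks > 0; }
};

// Returns true if the lock state was changed.
bool try_switch(FileLock& lock, FileLockState wanted) {
  // Do not leave SYNC for MIX or EXCL while read locks are held.
  if (lock.state == FileLockState::SYNC &&
      wanted != FileLockState::SYNC &&
      lock.is_rdlocked())
    return false;
  if (lock.state == wanted)
    return false;
  lock.state = wanted;
  return true;
}
```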
In my tests (one writer, one stat-er), this took things from long stalls
(up to 20 seconds) to very responsive stats. Yay!
Fixes: #791
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
If the user didn't specify any actions, print out a usage message rather
than silently exiting.
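A small sketch of the behavior, assuming a hypothetical run() entry point that receives the parsed actions. The usage text and the nonzero return value are placeholders, not the tool's real output or exit status.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical driver: returns a process exit status.
int run(const std::vector<std::string>& actions, std::ostream& out) {
  if (actions.empty()) {
    // No actions requested: say so instead of exiting silently.
    out << "usage: tool <action> [<action> ...]\n";
    return 1;
  }
  for (const std::string& action : actions) {
    (void)action;  // dispatching each action is omitted in this sketch
  }
  return 0;
}
```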
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>