Normally we take a fresh map reference in PG::lock(). However,
_activate_committed needs to make sure the map hasn't changed significantly
before acting. In the case of #2068, the OSD map has moved forward and
the mapping has changed, but the PG hasn't processed that yet, and thus
mis-tags the MOSDPGInfo message.
Tag the message with the e epoch, and also pass down the primary's address
to send the message to the right location.
Fixes: #2068
Signed-off-by: Sage Weil <sage@newdream.net>
Clean means we have exactly the right number of replicas and recovery is
complete. Degraded means we do not have enough replicas, either because
recovery is in progress, or because acting is too small.
A consequence is that if we have a PG with len(up) == 1 but a pg_temp
mapping so that len(acting) == 2, it will be active and not clean.
Fixes: #2060
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
This still makes sure daemons don't start on boot.
When auto start was disabled, it would also prevent logrotate from doing its job.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Signed-off-by: Sage Weil <sage@newdream.net>
OSDs (src/osd/ClassHandler.cc) specifically look for libcls_*.so in
/usr/$libdir/rados-classes, so libcls_rbd.so and libcls_rgw.so need to
be shipped along with the base package.
Signed-off-by: Holger Macht <hmacht@suse.de>
Signed-off-by: Sage Weil <sage@newdream.net>
We can pause() multiple times, and we need as many unpause()s to actually
resume work.
This resolves problems where we have two actors interested in pausing a
queue, both want to stop work, and they aren't interacting/coordinating.
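
The counting behaviour described above can be sketched as follows. This is a minimal Python illustration, not the actual Ceph work queue code: the class name PausableQueue and the is_paused() helper are hypothetical, and only the pause()/unpause() names come from the text.

```python
import threading


class PausableQueue:
    """Toy work queue whose pause state is a counter, not a boolean."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pause_count = 0  # number of outstanding pause() calls

    def pause(self):
        # Each caller that wants work stopped adds one to the count.
        with self._lock:
            self._pause_count += 1

    def unpause(self):
        # Work resumes only once every pause() has been matched.
        with self._lock:
            if self._pause_count == 0:
                raise RuntimeError("unpause() without matching pause()")
            self._pause_count -= 1

    def is_paused(self):
        with self._lock:
            return self._pause_count > 0
```

With a counter, two independent actors can each call pause() and unpause() without knowing about each other; the queue stays paused until the last of them has called unpause().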
Signed-off-by: Sage Weil <sage@newdream.net>
Make some effort to stop work in progress, remove pid file, and exit with
informative error code.
Note that this is much simpler than the shutdown() exit path; I'm not sure
whether a complete teardown is useful. It's also difficult to maintain
and get right with everything else going on, and it's not clear that it's
worth the effort right now.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Based on http://evbergen.home.xs4all.nl/unix-signals.html.
Instead of his design, though, we write single bytes, and create a pipe per
signal we have handlers registered for.
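
A rough Python sketch of the one-pipe-per-signal idea, for illustration only. The real handler is C++ and has to stay async-signal-safe; Python runs its handlers later in the main thread, so this shows the shape of the design rather than its constraints. The SignalPipes class and its register()/wait() methods are hypothetical names.

```python
import os
import select
import signal


class SignalPipes:
    """One pipe per registered signal; the handler writes a single byte."""

    def __init__(self):
        self._pipes = {}  # signum -> (read_fd, write_fd)

    def register(self, signum):
        read_fd, write_fd = os.pipe()
        os.set_blocking(write_fd, False)  # never block inside the handler
        self._pipes[signum] = (read_fd, write_fd)
        signal.signal(signum, self._on_signal)

    def _on_signal(self, signum, frame):
        try:
            os.write(self._pipes[signum][1], b"\0")
        except BlockingIOError:
            pass  # pipe is full, so a wakeup is already pending

    def wait(self, timeout=None):
        """Return the signals that fired, consuming one byte from each pipe."""
        by_fd = {fds[0]: signum for signum, fds in self._pipes.items()}
        readable, _, _ = select.select(list(by_fd), [], [], timeout)
        fired = []
        for fd in readable:
            os.read(fd, 1)
            fired.append(by_fd[fd])
        return fired
```

Because each signal has its own pipe, the byte's value carries no information; the reader learns which signal fired from which descriptor became readable.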
Signed-off-by: Sage Weil <sage@newdream.net>
We can already create rados cluster handles with an existing CephContext,
but that is only useful if you are building something that has access to
ceph internals; the cct isn't exposed via the API itself.
Expose it, for both the cluster and pool handles. Add a cluster handle accessor
for the C++ API too.
Fixes: #1821
Signed-off-by: Sage Weil <sage@newdream.net>
Filestore now properly fails to clone a non-existent object, which means
we should create one.
Fixes: #2062
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
These are also defined internally in ceph_fs.h, so use a guard. Annoying,
but gives us consistent naming (ceph_*/CEPH_*, not LIBCEPHFS_SETATTR_*).
Signed-off-by: Sage Weil <sage@newdream.net>
For now, until we have a better handle on the ext4 bug, and demonstrate
that it is a clear performance win with the full stack.
Signed-off-by: Sage Weil <sage@newdream.net>
Now, push progress is represented by ObjectRecoveryProgress. In
particular, rather than tracking data_subset_*ing, we track the furthest
offset before which the data will be consistent once cloning is complete.
sub_op_push now separates the pull response implementation from the
replica push implementation.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Require it for osd <-> osd and osd <-> mon communication.
This covers all the new encoding changes, except hobject_t, which is used
between the rados command line tool and the OSD for an object listing
position marker. We can't distinguish between specific types of clients,
though, and we don't want to introduce any incompatibility with other
clients, so we'll just have to make do here. :(
Signed-off-by: Sage Weil <sage@newdream.net>
A write may trigger via make_writeable the creation of a clone which
sorts before the object being written.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
If is_degraded returns true for backfill, the object may not be
in any replica's missing set. Only call start_recovery_op if
we actually started an op. This bug could leave a PG stuck
in backfill.
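
The accounting rule can be sketched like this. It is a simplified Python model, not the OSD code: RecoveryTracker, recover_replicas and the replica_missing mapping are hypothetical stand-ins, and only start_recovery_op is a name taken from the text.

```python
class RecoveryTracker:
    """Tracks recovery ops that are considered in flight."""

    def __init__(self):
        self.in_flight = set()

    def start_recovery_op(self, obj):
        self.in_flight.add(obj)

    def finish_recovery_op(self, obj):
        self.in_flight.discard(obj)


def recover_replicas(tracker, degraded, replica_missing):
    """Start ops for degraded objects; return how many were really started.

    replica_missing maps a replica id to the set of objects it is missing.
    """
    started = 0
    for obj in degraded:
        targets = [r for r, missing in replica_missing.items() if obj in missing]
        if not targets:
            # Degraded only because of backfill: no replica lists the object
            # as missing, so nothing is pushed and nothing would ever finish.
            continue
        tracker.start_recovery_op(obj)  # registered only when an op exists
        started += 1
    return started
```

If start_recovery_op were called unconditionally, an object with no push targets would stay in the in-flight set forever, which is one way the stuck state described above could arise in this model.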
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>