Take the self-aliveness checks out of require_same_or_newer_map() and use
the new function for that and for require_up_osd_peer().
Signed-off-by: Greg Farnum <greg@inktank.com>
This checks both that a Message originates from an OSD, and that the OSD
is up in the given map epoch.
We use it in handle_replica_op so that we don't inadvertently add operations
from down peers, who might or might not know it.
Signed-off-by: Greg Farnum <greg@inktank.com>
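A minimal sketch of the two-part check described above, for illustration only. The types `Message`, `OSDMapSketch` and the function `require_up_osd_peer_sketch` are hypothetical stand-ins, not the real Ceph classes or signatures; the sketch only shows the shape of "sender is an OSD, and that OSD is up in the given epoch".

```cpp
#include <cstdint>
#include <map>
#include <set>

// Hypothetical stand-ins for the real types; only the shape of the check matters.
enum class EntityType { OSD, MON, CLIENT };

struct Message {
    EntityType source_type;  // kind of entity that sent the message
    int source_id;           // the OSD id when source_type == OSD
};

struct OSDMapSketch {
    uint32_t epoch;
    std::set<int> up_osds;   // OSD ids marked up in this epoch
    bool is_up(int osd) const { return up_osds.count(osd) > 0; }
};

// Returns true only if the sender is an OSD and that OSD is up in the map
// for the requested epoch. An epoch we do not hold is treated as a failure
// (an assumption of this sketch).
bool require_up_osd_peer_sketch(const Message& m,
                                const std::map<uint32_t, OSDMapSketch>& maps,
                                uint32_t epoch) {
    if (m.source_type != EntityType::OSD)
        return false;
    std::map<uint32_t, OSDMapSketch>::const_iterator it = maps.find(epoch);
    if (it == maps.end())
        return false;
    return it->second.is_up(m.source_id);
}
```

A caller in the position of handle_replica_op would drop the operation when the check returns false, so that work from a peer that is down in that epoch is never queued.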
This backend can be used to create one global namespace for multiple
RGW regions.
Using a CNAME DNS response the traffic is directed towards the RGW region
without using HTTP redirects.
We were not properly setting up Sessions on the local_connection for
fast_dispatch'ed Messages if the cluster_addr was set explicitly: the OSD
was not in the dispatch list at bind() time (in ceph_osd.cc), and nothing
called it later on. This issue was missed in testing because Inktank only
uses unified NICs.
That led to errors like the following:
When doing an ec-read, I hit a bug that occurred 100% of the time. The messages are:
2014-07-14 10:03:07.318681 7f7654f6e700 -1 osd/OSD.cc: In function
'virtual void OSD::ms_fast_dispatch(Message*)' thread 7f7654f6e700 time
2014-07-14 10:03:07.316782 osd/OSD.cc: 5019: FAILED assert(session)
ceph version 0.82-585-g79f3f67 (79f3f67491)
1: (OSD::ms_fast_dispatch(Message*)+0x286) [0x6544b6]
2: (DispatchQueue::fast_dispatch(Message*)+0x56) [0xb059d6]
3: (DispatchQueue::run_local_delivery()+0x6b) [0xb08e0b]
4: (DispatchQueue::LocalDeliveryThread::entry()+0xd) [0xa4a5fd]
5: (()+0x8182) [0x7f7665670182]
6: (clone()+0x6d) [0x7f7663a1130d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
To resolve this, we have the OSD invoke ms_handle_fast_connect() explicitly
in send_boot(). It's not really an appropriate location, but we're already
doing a bunch of messenger twiddling there, so it's acceptable for now.
Signed-off-by: Ma Jianpeng <jianpeng.ma@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
osd: add config for osd_max_object_name_len = 2048 (was hard-coded at 4096)
Reviewed-by: Haomai Wang <haomaiwang@gmail.com>
and the first patch was
Reviewed-by: Samuel Just <sam.just@inktank.com>
Our max object name is not limited by file name size, but by the length of
the name we can stuff in an xattr. That will vary from file system to
file system, so just make this 4096. In practice, it should be limited
via the global tunable, if it is adjusted at all.
Signed-off-by: Sage Weil <sage@redhat.com>
Make standby-replay MDSes much more careful about journal formats, both when changing them and in being aware of them generally.
Reviewed-by: Greg Farnum <greg@inktank.com>
Set a limit on the length of an attr name. The fs can only take 128
bytes, but we were not imposing any limit.
Add a test.
Reported-by: Haomai Wang <haomaiwang@gmail.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Most importantly, capture that attrs on FileStore can't be more than about
100 chars. The Linux xattrs can only be 128 chars, but we also have some
prefixing we do.
Signed-off-by: Sage Weil <sage@redhat.com>
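The arithmetic behind the attr name limit can be sketched as follows. The 128-byte figure is taken from the text above; the prefix overhead is passed in as a parameter because the exact prefixing is not stated here, and the names `max_attr_name_len` and `check_attr_name` are hypothetical.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>

// Linux xattr name budget quoted in the text above.
const std::size_t kFsXattrNameMax = 128;

// Largest caller-supplied attr name that still fits once a prefix of
// prefix_len bytes has been added. The prefix length is an input because
// this sketch does not assume any particular on-disk prefix.
std::size_t max_attr_name_len(std::size_t prefix_len) {
    return prefix_len >= kFsXattrNameMax ? 0 : kFsXattrNameMax - prefix_len;
}

// Returns 0 if the name fits, -ENAMETOOLONG otherwise (error code is an
// assumption of this sketch).
int check_attr_name(const std::string& name, std::size_t prefix_len) {
    return name.size() > max_attr_name_len(prefix_len) ? -ENAMETOOLONG : 0;
}
```

With an illustrative overhead of 28 bytes the usable length comes out at 100, which is in line with the "about 100 chars" figure; the real overhead may differ.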
Previously we had a hard coded limit of 4096. Objects > 3k crash the OSD
when running on ext4, although they probably work on xfs. But rgw only
generates objects a bit over 1024 bytes (maybe 1200 tops?), so let's set a
more reasonable limit here. 2048 is a nice round number and should be
safe.
Add a test.
Fixes: #8174
Signed-off-by: Sage Weil <sage@redhat.com>
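As a rough sketch of how a configurable limit of this kind behaves: the option name and the default of 2048 come from the text above, while the `OsdConfigSketch` struct, the `check_object_name` helper and the choice of -ENAMETOOLONG are assumptions made for illustration.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>

// Hypothetical config holder; only the option name and default are from the text.
struct OsdConfigSketch {
    std::size_t osd_max_object_name_len;
    OsdConfigSketch() : osd_max_object_name_len(2048) {}
};

// Reject object names longer than the configured limit up front, instead of
// letting an oversized name reach the backing file system.
int check_object_name(const OsdConfigSketch& conf, const std::string& oid) {
    if (oid.size() > conf.osd_max_object_name_len)
        return -ENAMETOOLONG;
    return 0;
}
```

Because the limit is read from the config object, an operator who adjusts the tunable changes the cutoff without a rebuild.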
In the 0.82 release, standbyreplay MDS daemons would try
to reformat the journal if they saw an older version on
disk, where this should have only been done by the active
MDS for the rank. Depending on timing, this could cause
fatal corruption of the journal.
This change handles the following cases:
* only do reformat if not in standbyreplay (else raise EAGAIN
to keep trying until an active MDS reformats it)
* if journal header goes away while in standbyreplay then raise
EAGAIN (handle rewrite happening in background)
* if journal version is greater than the max supported, suicide
Fixes: #8811
Signed-off-by: John Spray <john.spray@redhat.com>
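The three cases listed in the commit above can be sketched as a single decision function. All names here (`JournalAction`, `JournalStateSketch`, `decide_journal_action`) are hypothetical. The behaviour of an active MDS that finds no header is not described in the text, so the sketch reports it as a plain error.

```cpp
#include <cstdint>

enum class JournalAction {
    Proceed,     // format is acceptable, carry on
    Reformat,    // active MDS rewrites the journal in the wanted format
    RetryLater,  // corresponds to raising EAGAIN
    Suicide,     // on-disk format is newer than we understand
    Error        // assumption: active MDS with a missing header
};

struct JournalStateSketch {
    bool header_present;
    uint32_t on_disk_format;
};

JournalAction decide_journal_action(const JournalStateSketch& j,
                                    bool standby_replay,
                                    uint32_t wanted_format,
                                    uint32_t max_supported_format) {
    if (!j.header_present) {
        // In standby-replay a missing header may mean the active MDS is
        // rewriting the journal in the background, so keep trying.
        return standby_replay ? JournalAction::RetryLater : JournalAction::Error;
    }
    if (j.on_disk_format > max_supported_format)
        return JournalAction::Suicide;
    if (j.on_disk_format < wanted_format) {
        // Only the active MDS for the rank may reformat.
        return standby_replay ? JournalAction::RetryLater : JournalAction::Reformat;
    }
    return JournalAction::Proceed;
}
```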
Previously if the journal header contained invalid
write, expire or trimmed offsets, we would end up
hitting a hard-to-understand assertion much later.
Instead, raise the error right away if the fields
are identifiably bad at load time, and assert that
they're valid before persisting them.
Signed-off-by: John Spray <john.spray@redhat.com>
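A sketch of the load-time and pre-persist checks described above. It assumes the ordering trimmed <= expire <= write as the validity rule, which the text does not spell out; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

struct JournalHeaderSketch {
    uint64_t trimmed_pos;
    uint64_t expire_pos;
    uint64_t write_pos;
};

// Assumed invariant: nothing is expired before it is trimmed past, and
// nothing is expired beyond what has been written.
bool header_is_sane(const JournalHeaderSketch& h) {
    return h.trimmed_pos <= h.expire_pos && h.expire_pos <= h.write_pos;
}

// Load path: report the problem immediately instead of tripping an
// unrelated assertion much later.
int load_header(const JournalHeaderSketch& h) {
    return header_is_sane(h) ? 0 : -EINVAL;
}

// Persist path: a bad header here is a programming error, so assert.
void persist_header(const JournalHeaderSketch& h) {
    assert(header_is_sane(h));
    // ... the actual write to storage is omitted in this sketch ...
}
```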
Previously this test assumed no pre-existing
filesystem and no MDS running. Generalize it
to nuke any existing filesystems found before
running, so that you can use it inside a vstart
cluster that had MDS>0.
Signed-off-by: John Spray <john.spray@redhat.com>
So that new MDSs in a new filesystem are guaranteed
to be up to date with anything we blacklisted
from a filesystem coming before.
Signed-off-by: John Spray <john.spray@redhat.com>
Detect leveldb, but do not let autoconf blindly link it with everything on the
planet.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Enable us to obtain the erasure-code-profile for a given erasure-pool.
Signed-off-by: Ma Jianpeng <jianpeng.ma@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
We need to return success if we get a dup command. Simply check whether
the fs is already enabled with the same pools and name.
Fixes: #8857
Signed-off-by: Sage Weil <sage@redhat.com>
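The duplicate-command handling described above can be sketched like this. `FsStateSketch` and `fs_new_sketch` are hypothetical names, and returning -EINVAL for a conflicting request is an assumption; the text only says that an identical request must succeed.

```cpp
#include <cerrno>
#include <string>

struct FsStateSketch {
    bool enabled;
    std::string name;
    int metadata_pool;
    int data_pool;
};

// Idempotent "create filesystem": repeating the same request succeeds,
// while a different request against an existing fs is refused.
int fs_new_sketch(FsStateSketch& fs, const std::string& name,
                  int metadata_pool, int data_pool) {
    if (fs.enabled) {
        if (fs.name == name &&
            fs.metadata_pool == metadata_pool &&
            fs.data_pool == data_pool)
            return 0;        // dup command: report success
        return -EINVAL;      // assumption: conflicting request is an error
    }
    fs.enabled = true;
    fs.name = name;
    fs.metadata_pool = metadata_pool;
    fs.data_pool = data_pool;
    return 0;
}
```

Treating the duplicate as success matters when a client resends a command after losing the first reply: the second attempt should not look like a failure.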