Commit Graph

36056 Commits

Author SHA1 Message Date
John Spray
681a49c412 mon: forbid tier changes when in use by FS
* Removing tiers from a base pool in use by CephFS is forbidden.
* Using CephFS pools as tiers is forbidden.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-09-16 17:16:23 -07:00
John Spray
80441cda8c mon: prevent cache pools being used CephFS
Fixes two things:
 * EC pools are now permissible if they have a cache overlay
 * Pools are not permissible if they are a cache tier.

Fixes: #9435

Signed-off-by: John Spray <john.spray@redhat.com>
2014-09-16 17:16:23 -07:00
Somnath Roy
86a4bed673 FileStore: Race condition during object delete is fixed
There was a race condition (hence OSD crash) between lfn_unlink
and lfn_open. The reason was FDCache lookup was called without
taking index lock from lfn_open. Lookup will increase reference
count and thus Clear will not be able to delete those FDs. FDs
will be leaked. The assert within FDCache clear was hitting
because of this.

Fixes: #9480

Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
2014-09-16 15:36:06 -07:00
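Note on 86a4bed673: the fix described above amounts to taking the index lock before the cache lookup, so that open and unlink serialize on the same lock. The following is a minimal sketch of that idea only. ToyStore, Fd, open, unlink and cached are hypothetical names for illustration and are not the FileStore or FDCache API.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-ins; not the real FileStore/FDCache types.
struct Fd { int fd; };

class ToyStore {
  std::mutex index_lock;                             // plays the role of the index lock
  std::map<std::string, std::shared_ptr<Fd>> cache;  // plays the role of the FD cache
  int next_fd = 3;

public:
  // Take the index lock *before* consulting the cache, so a concurrent
  // unlink cannot race with the reference taken by the lookup.
  std::shared_ptr<Fd> open(const std::string& oid) {
    std::lock_guard<std::mutex> l(index_lock);
    auto it = cache.find(oid);
    if (it != cache.end())
      return it->second;
    auto fd = std::make_shared<Fd>(Fd{next_fd++});
    cache[oid] = fd;
    return fd;
  }

  // Under the same lock, drop the cached entry for the removed object.
  bool unlink(const std::string& oid) {
    std::lock_guard<std::mutex> l(index_lock);
    return cache.erase(oid) > 0;
  }

  std::size_t cached() {
    std::lock_guard<std::mutex> l(index_lock);
    return cache.size();
  }
};
```

Because both paths hold the same lock, a lookup either completes before the unlink starts or sees the cache already cleared.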
Loic Dachary
10b8966c8d crushtool: safeguard for missing --num-rep when --test
http://tracker.ceph.com/issues/9490 Fixes: #9490

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-16 21:18:52 +02:00
Loic Dachary
fdbfece81c Merge pull request #2497 from ceph/wip-xfs-inode64
ceph-disk: mount xfs with inode64 by default

Reviewed-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-09-16 15:12:24 +02:00
Loic Dachary
8b27997be9 mon: pool create must not always create a ruleset
The implicit creation of a ruleset when creating a pool is convenient
when nothing is specified. However, if the caller sets a ruleset name,
it should not implicitly create it but return ENOENT instead. Silently
creating a ruleset when there is a typo in the ruleset name is
confusing.

http://tracker.ceph.com/issues/9304 Fixes: #9304

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-16 11:51:59 +02:00
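Note on 8b27997be9: the behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the monitor code: ToyCrush, ruleset_for_pool and the "default" ruleset name are hypothetical.

```cpp
#include <cerrno>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for the monitor's ruleset lookup.
struct ToyCrush {
  std::map<std::string, int> rulesets;  // ruleset name -> id
  int next_id = 0;

  // Resolve the ruleset for a new pool.
  //  - Empty name: use a default ruleset, creating it implicitly if needed.
  //  - Explicit name: it must already exist, otherwise return -ENOENT.
  int ruleset_for_pool(const std::string& name, int* id) {
    const std::string wanted = name.empty() ? "default" : name;
    auto it = rulesets.find(wanted);
    if (it != rulesets.end()) {
      *id = it->second;
      return 0;
    }
    if (!name.empty())
      return -ENOENT;  // probably a typo: do not create it silently
    *id = next_id++;
    rulesets[wanted] = *id;
    return 0;
  }
};
```

With this shape, a misspelled ruleset name surfaces as an error at pool creation time instead of leaving an unexpected ruleset behind.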
Loic Dachary
d5084f3f86 mon: add the get_crush_ruleset helper
By factoring a code snippet from prepare_pool_crush_ruleset with no
modification.

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-16 11:27:54 +02:00
Loic Dachary
4b8c50fe2f tests: flush logs before grepping them
Otherwise the test races with the daemon writing the logs and can
sometimes fail.

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-16 10:36:39 +02:00
Loic Dachary
45731dbca7 os: FileStore::lfn_unlink always clears FDCache
Otherwise the FDCache will keep a file descriptor to a file that was
removed from the file system. This may create various types of errors
because the OSD checking the FDCache will assume the file that contains
information for an object exists although it does not. For instance in
the following:

      * rados put object file
      * rm file from the primary
      * repair the pg to which the object is mapped

if the FDCache is not cleared, repair will incorrectly pull a copy from
a replica and write it to the now unlinked file. Later on, it will
assume the file exists on the primary and only be partially correct:
the data can still be accessed via the file descriptor but any operation
using the path name will fail.

http://tracker.ceph.com/issues/8914 Fixes: #8914

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-16 10:28:46 +02:00
Loic Dachary
0ffc5ee53c tests: set the failure domain to OSD by default
So that tests do not need to do it to be able to use the default rbd
pool to store objects.

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-16 10:28:44 +02:00
Loic Dachary
191d67cb46 tests: add get_osds() and get_pg() helpers
To get the ordered list of OSD to which an object is mapped and the name
of the corresponding PG.

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-16 10:28:42 +02:00
Sage Weil
782848af59 Merge pull request #2499 from ceph/wip-9219-giant
wip-9219: subscribe to the newest osdmap when reconnecting to a monitor

Reviewed-by: Sage Weil <sage@redhat.com>
2014-09-15 17:40:28 -07:00
Greg Farnum
1b9226c723 osd: subscribe to the newest osdmap when reconnecting to a monitor
This is mostly relevant in testing clusters, but it ensures that an OSD
disconnecting from the monitor at the wrong time will still see any recent
map updates and prevent accidental loss of map injection into the OSD cluster.
Fixes: #9219

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-09-15 17:07:41 -07:00
Sage Weil
56ba341174 osdc/Objecter: fix command op cancellation race
Cancel the command op timeout event before we clear out the op from the
session struct.  This isn't strictly necessary because command_op_cancel
will "gracefully" handle the case where the tid is no longer present, but
this avoids that noise and is cleaner.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-09-15 16:45:19 -07:00
Sage Weil
baf7be9d30 osdc/Objecter: cancel timeout before clearing op->session
The C_CancelOp path assumes op->session != NULL.  Cancel that op before
we clear it.  This fixes a crash like

#0  pthread_rwlock_wrlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:39
#1  0x00007fc82690a4b1 in RWLock::get_write (this=0x18, lockdep=<optimized out>) at ./common/RWLock.h:88
#2  0x00007fc8268f4d79 in Objecter::op_cancel (this=0x1f61830, s=0x0, tid=0, r=-110) at osdc/Objecter.cc:1850
#3  0x00007fc8268ba449 in Context::complete (this=0x1f68c20, r=<optimized out>) at ./include/Context.h:64
#4  0x00007fc8269769aa in RWTimer::timer_thread (this=0x1f61950) at common/Timer.cc:268
#5  0x00007fc82697a85d in RWTimerThread::entry (this=<optimized out>) at common/Timer.cc:200
#6  0x00007fc82651ce9a in start_thread (arg=0x7fc7e3fff700) at pthread_create.c:308

Signed-off-by: Sage Weil <sage@redhat.com>
2014-09-15 16:40:39 -07:00
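Note on baf7be9d30: the ordering issue in the backtrace above (a timer callback running with a null session) can be illustrated with a toy timer. Everything here (ToyTimer, Op, Session, arm_timeout, finish_op) is a hypothetical stand-in and not the Objecter or RWTimer API. The point is only the order of the two steps in finish_op.

```cpp
#include <functional>
#include <map>
#include <utility>

struct Session { int cancelled = 0; };

struct Op {
  Session* session = nullptr;
  int timeout_id = -1;  // -1 means no timeout is scheduled
};

// Toy timer: stores callbacks by id; fire_all() runs whatever is still pending.
struct ToyTimer {
  std::map<int, std::function<void()>> events;
  int next = 0;

  int add(std::function<void()> cb) {
    events[next] = std::move(cb);
    return next++;
  }
  bool cancel(int id) { return events.erase(id) > 0; }
  void fire_all() {
    auto pending = std::move(events);
    events.clear();
    for (auto& e : pending)
      e.second();
  }
};

// The timeout callback dereferences op.session, as the cancel path does.
void arm_timeout(ToyTimer& t, Op& op) {
  op.timeout_id = t.add([&op] { op.session->cancelled++; });
}

// Cancel the timeout first, and only then clear the session pointer.
void finish_op(ToyTimer& t, Op& op) {
  if (op.timeout_id >= 0) {
    t.cancel(op.timeout_id);
    op.timeout_id = -1;
  }
  op.session = nullptr;
}
```

If the two steps in finish_op were swapped and the timer fired in between, the callback would dereference a null session.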
Sage Weil
11496399ef ceph-disk: mount xfs with inode64 by default
We did this forever ago with mkcephfs, but ceph-disk didn't.  Note that for
modern XFS this option is obsolete, but for older kernels it was not the
default.

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
2014-09-15 15:29:08 -07:00
John Spray
8c23ef0949 Merge pull request #2492 from ceph/wip-9284
#9284 - fix client RECALL handling and add health metrics

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-09-15 23:23:46 +01:00
Sage Weil
9d36d87c05 Merge pull request #2476 from ceph/wip-9307
rgw: push hash calculater deeper

Reviewed-by: Sage Weil <sage@redhat.com>
2014-09-15 15:19:07 -07:00
Josh Durgin
853ba2dfb3 Merge pull request #2493 from ceph/wip-rbd-objectcacher-hang
rbd: ObjectCacher reads can hang when reading sparse files

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-09-15 13:25:33 -07:00
Sage Weil
f2039c4e01 Merge pull request #2495 from dachary/wip-erasure-code-preload
erasure-code: preload fails if < 0

Reviewed-by: Sage Weil <sage@redhat.com>
2014-09-15 11:26:51 -07:00
Loic Dachary
ded1b303b5 erasure-code: preload fails if < 0
And not if < -1.

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-15 20:21:14 +02:00
Sage Weil
0eef2d1b6f Merge pull request #2486 from jgalvez/master
init-radosgw.sysv: Support systemd for starting the gateway

Reviewed-by: Sage Weil <sage@redhat.com>
2014-09-15 09:41:45 -07:00
Loic Dachary
1941d7b60f Merge pull request #2472 from dachary/wip-9429-bench
erasure-code: fix erasure_code_benchmark goop (decode)

Reviewed-by: Janne Grunau <j@jannau.net>
2014-09-15 18:23:08 +02:00
John Spray
a140439f85 mds: limit number of caps inspected in caps_tick
This is to avoid hitting an O(caps) loop in the worst
case scenario.  This mechanism is a little crude but
should be superseded at some point by admin socket
functionality to inspect session caps so that we
don't need to spit out this level of detail in logs.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-09-15 15:05:14 +01:00
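Note on a140439f85: a bounded scan of this kind might look like the sketch below. Cap, late_clients, timeout and max_inspect are hypothetical names chosen for illustration. The actual limit and data structures used by the MDS are not stated in the commit message.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record of a cap that is being revoked.
struct Cap {
  int client;
  double age;  // seconds since the revoke was issued
};

// Inspect at most max_inspect entries per tick and return the clients
// whose revokes are older than timeout. Entries beyond the limit are
// simply left for a later tick.
std::vector<int> late_clients(const std::vector<Cap>& revoking,
                              double timeout, std::size_t max_inspect) {
  std::vector<int> late;
  std::size_t inspected = 0;
  for (const Cap& c : revoking) {
    if (inspected++ >= max_inspect)
      break;  // bound the work done in a single tick
    if (c.age >= timeout)
      late.push_back(c.client);
  }
  return late;
}
```

The cost per tick is then O(max_inspect) rather than O(caps), at the price of possibly reporting some late clients one or more ticks later.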
John Spray
bf590f8a5d mds: keep per-client revoking caps list
...to avoid doing an O(caps) scan to find out
which clients are responsible for any late-revoking
caps during health checks.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-09-15 15:05:14 +01:00
John Spray
a6a0fd814b xlist: implement copy constructor
...so that I can have a std::map of them.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-09-15 15:05:14 +01:00
John Spray
fd04d5e662 mds: health metric for late releasing caps
Follow up on Yan Zheng's "mds: warn clients which
aren't revoking cap" to include a health metric
for this condition as well as the clog messages.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-09-15 15:05:14 +01:00
John Spray
05d69580b0 mon: trigger transaction on MDS health changes
I think this was previously only working as a side effect
of other MDS map changes.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-09-15 15:05:14 +01:00
John Spray
e6062b8d33 mds: add a health metric for failure to recall caps
Fixes: #9284

Signed-off-by: John Spray <john.spray@redhat.com>
2014-09-15 15:05:14 +01:00
John Spray
8c0f2555fe mds: add state for tracking RECALL progress
To be used later for generating health metrics
for clients which are failing to promptly service
CEPH_SESSION_RECALL_STATE messages.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-09-15 15:05:14 +01:00
John Spray
8199f80846 xlist: implement const_iterator
Signed-off-by: John Spray <john.spray@redhat.com>
2014-09-15 15:05:14 +01:00
John Spray
00a002143a client: fix trim_caps for inodes in root
Previously client would fail to release caps for files
in the root directory in response to CEPH_SESSION_RECALL_STATE
messages.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-09-15 15:05:14 +01:00
John Spray
2b5bbab55c client: failure injection for cap release
Used for simulating a buggy client that trips
the error detection in #9282 (warn clients
which aren't revoking caps)

Signed-off-by: John Spray <john.spray@redhat.com>
2014-09-15 15:05:13 +01:00
John Spray
21f5e18ee3 client: fix potentially invalid read in trim_caps
trim_dentry can potentially free an inode, so get/put
it around the block where we use the inode's dn_set.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-09-15 15:05:13 +01:00
John Spray
9007217239 client: more precise cap trimming
Two fixes:
 * Client would unlink everything it could, instead of just
   meeting its goal, because caps.size() doesn't change until
   dentries are cleaned up later.  Take account of the trimmed
   count in the while() condition to fix that.
 * Don't count the root ino as trimmed, as although it has no
   dentries (of course), we will never give up the cap.

With this change, the client will now precisely achieve the number
of caps requested in CEPH_SESSION_RECALL_STATE messages.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-09-15 15:05:13 +01:00
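Note on 9007217239: the two fixes above can be sketched in a simplified loop. ToyInode and trim_caps are hypothetical and only model the counting. As in the commit message, the size of the cap list is assumed not to change while the loop runs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an inode holding a cap.
struct ToyInode {
  bool is_root;
  bool trimmable;  // whether its dentries can be unlinked
};

// Returns how many caps were counted as trimmed while aiming for `max`.
std::size_t trim_caps(const std::vector<ToyInode>& caps, std::size_t max) {
  std::size_t trimmed = 0;
  auto it = caps.begin();
  // caps.size() stays constant inside the loop, so subtract what has
  // already been trimmed; otherwise the loop would trim everything.
  while (caps.size() - trimmed > max && it != caps.end()) {
    const ToyInode& in = *it++;
    if (in.is_root)
      continue;  // the root cap is never given up: do not count it
    if (in.trimmable)
      ++trimmed;
  }
  return trimmed;
}
```

For example, with five caps (the root plus four trimmable inodes) and a goal of three, the loop stops after trimming exactly two.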
John Spray
c328486f24 client: fix crash in trim_caps
In a75af4c2, a procedure was added to invalidate root's dentries
if the trimming failed to free enough caps.  This would sometimes
crash because root->dir wasn't necessarily open.

Fix by only doing it if root dir is open, though I suspect this
may not be the end of it...

Signed-off-by: John Spray <john.spray@redhat.com>
2014-09-15 15:05:13 +01:00
Loic Dachary
68001fea75 Merge pull request #2485 from Abioy/master
bugfix: wrong socket address in log msg of Pipe.cc

Reviewed-by: Loic Dachary <loic-201408@dachary.org>
2014-09-15 15:40:44 +02:00
Abioy
83fd1cf84a bugfix: wrong socket address in log msg of Pipe.cc
paddr was not yet set up for the socket address

Signed-off-by: Yongyue Sun abioy.sun@gmail.com
2014-09-15 20:43:58 +08:00
Loic Dachary
92204287dc Merge pull request #2442 from dachary/wip-6754-jerasure-parameters
erasure-code: fix BlaumRoth sanity check on w

Reviewed-by: Andreas Peters <andreas.joachim.peters@cern.ch>
2014-09-15 12:24:19 +02:00
Loic Dachary
8e625a0032 Merge pull request #2488 from cernceph/docfix
doc: osd_backfill_scan_(min|max) are object counts

Reviewed-by: Loic Dachary <loic-201408@dachary.org>
2014-09-15 11:39:46 +02:00
Dan van der Ster
868b6b99fd doc: osd_backfill_scan_(min|max) are object counts
osd_backfill_scan_min and osd_backfill_scan_max set the number of
items grabbed during a single backfill scan, not an interval in
seconds. Correct the doc.

Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
2014-09-15 11:27:24 +02:00
Jason Dillaman
cdb7675a21 rbd: ObjectCacher reads can hang when reading sparse files
The pending read list was not properly flushed when empty objects
were read from a sparse file.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2014-09-15 00:53:50 -04:00
JuanJose 'JJ' Galvez
ddd52e87b2 init-radosgw.sysv: Support systemd for starting the gateway
When using RHEL7 the radosgw daemon needs to start under systemd.

Check for systemd running on PID 1. If it is, then start
the daemon using: systemd-run -r <cmd>. pidof returns null
because it is executed too quickly; after adding one second of
sleep, the script reports startup correctly.

Signed-off-by: JuanJose 'JJ' Galvez <jgalvez@redhat.com>
2014-09-14 20:38:20 -07:00
Loic Dachary
d888753c0e Merge pull request #2484 from sjahl/master
doc: Added bucket management commands to ops/crush-map

Reviewed-by: Loic Dachary <loic-201408@dachary.org>
2014-09-14 17:46:04 +02:00
Stephen Jahl
d32b4286de doc: Added bucket management commands to ops/crush-map
Describes the CLI for adding and removing buckets, in addition to the
'moving' instructions which were already present.

Signed-off-by: Stephen Jahl <stephenjahl@gmail.com>
2014-09-14 10:41:16 -04:00
Sage Weil
3f0ca4668e Merge remote-tracking branch 'gh/giant'
2014-09-13 21:20:33 -07:00
Sage Weil
b285788c56 Merge pull request #2481 from sjahl/master
doc: fixes a formatting error on ops/crush-map
2014-09-13 12:46:24 -07:00
Stephen Jahl
b8a1ec08a1 doc: fixes a formatting error on ops/crush-map
Signed-off-by: Stephen Jahl <stephenjahl@gmail.com>
2014-09-13 15:31:53 -04:00
Loic Dachary
8d066732db Merge pull request #2467 from majianpeng/fix3
buffer: In rebuild_page_aligned for the last ptr is page aligned, no need call rebuild().

Reviewed-by: Loic Dachary <loic-201408@dachary.org>
2014-09-13 17:56:16 +02:00
Loic Dachary
04e40737f1 Merge pull request #2478 from ceph/wip-9445
global: fix hang when segv happens inside logging code

Reviewed-by: Loic Dachary <loic-201408@dachary.org>
2014-09-13 17:32:57 +02:00