Commit Graph

34614 Commits

Author SHA1 Message Date
Sage Weil
96863128e6 atomic: fix read() on i386, clean up types
Among other things, fixes #8969

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-30 14:52:06 -07:00
Sage Weil
6e6fc23c7e v0.83
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJT2AdNAAoJEH6/3V0X7TFt+5EP/iLOUg5o6aqBa/7yUNwtgZEb
 6cm1h8bWJpigP51oHGNyoeS4PnYWQ7DfvwKL/TDP8268g/o/w0DRvSnCZopHFos9
 j6Ci/VE9ag9uQNqW+NOph13k3fjf5KetzM5g/q/Ay4dKVCS2+9uABfosql9RNZa6
 ojhGKf2BtMgswfemq/0XWc49Ptimox5G/ntR+/xYm0s906q5wB1Y9Tvh2PNZo1Y1
 wL2qy9UnmonBLGIu0BIStKnub57VHCYbNqV6fl3W+Oct9f0znYPCqnRVUb2lw3Ie
 4KciilzteQCfurCCI7CQFmNEKCVpPlujiKo/q8CKIDgbwkMcUntCmW9QcmH3BzC5
 czYr695aokE+dt+MICRY+sIREY5achXynb7wnSon9JI8qrCUQ0o4fHQ1AZOio7V6
 +zcCdussqSeEXOoVUlNS5eGrzbY1kqrFcXY18WiCy6nadLapuUQTtJ5QALQyJ5TW
 8TidkkU6h0V4sworwpM6tiDLfq2UQwZ5NuP8MGz9DtOjMDbLHSE6TrRug7Irjj41
 4AKdkSWMhuXljm/rEsOi54ZGRUhq2VZ2xpnUD0WR8r/3lAP1d2UnefFlrSZaCN4z
 bNcrCncK7wre2UUyDQ/qJ+S808XPUPQaohbmb3Eg+Hr0mbkiJXYdNNyrUzL3wnkr
 E3YL+8sapzZKn9zOxNQ3
 =ujRc
 -----END PGP SIGNATURE-----

Merge tag 'v0.83'

v0.83
2014-07-29 16:23:12 -07:00
John Spray
440c820cce Merge pull request #2161 from ceph/wip-jcsp-test
Reviewed-by: Greg Farnum <greg@inktank.com>
2014-07-29 23:55:30 +01:00
John Spray
6bb3aeafcf mds: remove some rogue "using namespace std;"
Signed-off-by: John Spray <john.spray@redhat.com>
2014-07-29 22:32:43 +01:00
John Spray
c283ad4ba5 mds: handle replaying old format journals
To get back to the reformatting procedure that otherwise
occurs during MDLog::open, introduce an MDLog::reopen call
that MDS can use in the standbyreplay->standby transition
for the special case where the journal is old.

Fixes: #8869

Signed-off-by: John Spray <john.spray@redhat.com>
2014-07-29 22:32:43 +01:00
John Spray
07665ec4b3 mds: introduce explicit DaemonState instead of int
Signed-off-by: John Spray <john.spray@redhat.com>
2014-07-29 22:32:43 +01:00
John Spray
6b004f19da mds: refactor MDS boot
* Make boot_start private.
* Define boot stages in enum, replace int with type.
* Merge steps 0 and 1, 0 always fell through to 1.
* starting_done was only ever reached by a fall through
  from the previous step, so call it directly from there.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-07-29 22:32:18 +01:00
John Spray
6832ec041a mds: make MDS::replay_done clearer
... and add some assertions.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-07-29 22:32:05 +01:00
John Spray
e587088918 mds: remove unused purge_prealloc_ino
Signed-off-by: John Spray <john.spray@redhat.com>
2014-07-29 22:32:05 +01:00
John Spray
6be80873c3 mds: separate inode recovery queue from MDCache
Refactor to:
* have somewhere to put some logic for doing
  background recovery in future.
* trim a few lines from the oversized MDCache.cc
  wherever we can.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-07-29 22:32:05 +01:00
Sandon Van Ness
0d70989a89 python-ceph: require libcephfs.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2014-07-29 14:11:03 -07:00
Jenkins
78ff1f0a5d 0.83
2014-07-29 13:42:53 -07:00
Gregory Farnum
aa5f21cea0 Merge pull request #2159 from ceph/wip-undump
tools/cephfs: fuller header in dump/undump

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-07-29 16:40:31 -04:00
Sandon Van Ness
06c473610f Remove reference from mkcephfs.
A bit of collision from spec changes for the rhel7/ceph-common
changes and alfredo's pull request for wip-die-ceph-mkcephfs.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
(cherry picked from commit 1526546ddc)
2014-07-29 16:27:33 -04:00
Gregory Farnum
54330a0a09 Merge pull request #2156 from ceph/wip-upstart-nfile
upstart/ceph-osd.conf: bump nofile limit up by 10x

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-07-29 15:36:19 -04:00
Sage Weil
4045b2e837 doc/release-notes: typo
Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-29 12:33:52 -07:00
Sage Weil
df1bad8f7e doc/release-notes: v0.80.5 release notes
Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-29 12:23:33 -07:00
Sage Weil
9461d8e6ad Merge remote-tracking branch 'gh/next'
2014-07-29 11:16:24 -07:00
Greg Farnum
a949a55b1f Merge branch 'origin/wip-osd-leaks'
Reviewed-by: Greg Farnum <greg@inktank.com>
2014-07-29 10:34:46 -07:00
Gregory Farnum
0bd4c86238 Merge pull request #2139 from ceph/wip-journal-header
os/FileJournal: Update the journal header when closing journal

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-07-29 09:04:12 -04:00
Gregory Farnum
37eba045ec Merge pull request #2146 from ceph/wip-8932
ceph_test_rados_api_tier: do fewer writes in HitSetWrite

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-07-29 09:01:41 -04:00
Gregory Farnum
050ac87530 Merge pull request #2147 from ceph/wip-8931
osd: fix ops blocked by full cache tier dequeue

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-07-29 08:58:30 -04:00
Sage Weil
f36cffc986 unittest_crush_wrapper: fix build
Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-28 17:18:56 -07:00
Dan Mick
7f913dcd52 Merge pull request #2150 from ceph/wip-libs
don't link everything with blkid, udev, and boost_threads
2014-07-28 17:06:41 -07:00
Josh Durgin
79c631668f Merge pull request #2153 from ceph/wip-fsx-overlap
librbd API fix + wip-fsx-overlap

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-07-28 14:30:51 -07:00
Sage Weil
7f12a0f4c6 Merge pull request #2152 from xiaoxichen/fix_ceph_df
PGMonitor: fix bug in calculating pool avail space

Reviewed-by: Sage Weil <sage@redhat.com>
2014-07-28 11:41:09 -07:00
Sandon Van Ness
1526546ddc Remove reference from mkcephfs.
A bit of collision from spec changes for the rhel7/ceph-common
changes and alfredo's pull request for wip-die-ceph-mkcephfs.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
2014-07-28 10:38:41 -07:00
Xiaoxi Chen
9b03752203 Fix some style and checking issue
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2014-07-29 00:42:10 +08:00
Sage Weil
5773a374d0 upstart/ceph-osd.conf: bump nofile limit up by 10x
This should ensure that we don't hit this limit on all but the very biggest
clusters.  We have seen it hit on a ~500 OSD dumpling cluster.

Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-28 09:27:20 -07:00
Sage Weil
cb20b99641 Merge pull request #2154 from simon3z/master
init: add systemd service files

Reviewed-by: Alfredo Deza <alfredo.deza@inktank.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-07-28 09:22:47 -07:00
John Spray
d3e5961d37 tools/cephfs: fuller header in dump/undump
There were two problems here:
 * write_pos was modified through an undump/dump cycle,
   because it was probed during recovery.
 * stream format was being forgotten.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-07-28 15:50:49 +01:00
Ilya Dryomov
e183a4d989 test_librbd_fsx: clone/flatten probabilities
Raise the clone probability to 8% and lower the probability of flatten
to 2%.  This should give us longer parent chains (before this we would
usually have one parent, and even then only for a few ops).

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-07-28 13:53:54 +04:00
Ilya Dryomov
bb095ffdbf test_librbd_fsx: randomize_parent_overlap
Truncate base images after they have been cloned from to cover more
code paths and make sure that clients look at snapshot parent_overlap
(i.e. parent_overlap of the base image at the time the snapshot was
taken) and not that of the base image (i.e. parent_overlap of the base
image as of now).

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-07-28 13:53:54 +04:00
Ilya Dryomov
f6d1a920fd test_librbd_fsx: introduce rbd_image_has_parent()
A helper to check whether the image associated with the ctx has
a parent or not.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-07-28 13:53:54 +04:00
Ilya Dryomov
eb697dd9ee librbd: make rbd_get_parent_info() accept NULL out params
The C++ version of rbd_get_parent_info() allows passing NULL for parent
image name, image name and snapshot name out parameters.  Make C API do
the same both for consistency and to make it easier to check whether
the image at hand has a parent or not.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-07-28 13:53:54 +04:00
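The NULL-tolerant out-parameter pattern described in this commit can be illustrated roughly as follows. This is a minimal sketch and not the librbd implementation: `ImageSketch`, `copy_if_requested`, `get_parent_info_sketch`, `image_has_parent_sketch`, and the choice of `-ENOENT` / `-ERANGE` return codes are all assumptions made for illustration.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for an image that may or may not have a parent.
struct ImageSketch {
  bool has_parent;
  const char *parent_pool;
  const char *parent_name;
  const char *parent_snap;
};

// Copy src into dst only if the caller supplied a buffer.
// Returns 0 on success, -ERANGE if the buffer is too small.
static int copy_if_requested(const char *src, char *dst, std::size_t dst_len) {
  if (dst == nullptr)
    return 0;  // caller is not interested in this field
  std::size_t need = std::strlen(src) + 1;
  if (need > dst_len)
    return -ERANGE;
  std::memcpy(dst, src, need);
  return 0;
}

// Fills whichever out parameters are non-NULL.
// Returns -ENOENT when the image has no parent (an assumption of this sketch).
int get_parent_info_sketch(const ImageSketch &img,
                           char *pool, std::size_t pool_len,
                           char *name, std::size_t name_len,
                           char *snap, std::size_t snap_len) {
  if (!img.has_parent)
    return -ENOENT;
  int r = copy_if_requested(img.parent_pool, pool, pool_len);
  if (r < 0)
    return r;
  r = copy_if_requested(img.parent_name, name, name_len);
  if (r < 0)
    return r;
  return copy_if_requested(img.parent_snap, snap, snap_len);
}

// With NULL accepted everywhere, a "has parent?" check needs no buffers.
bool image_has_parent_sketch(const ImageSketch &img) {
  return get_parent_info_sketch(img, nullptr, 0, nullptr, 0, nullptr, 0) == 0;
}
```

Accepting NULL for every out parameter lets a caller ask only the yes/no question, which is the convenience the commit message points to.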
Xiaoxi Chen
04d0526718 PGMonitor: fix bug in calculating pool avail space
Currently, for pools with different rules, "ceph df" cannot report
the right available space for each of them. For a detailed assessment
of the bug, please refer to bug report #8943.

This patch fixes this bug and makes ceph df work correctly.

Fixes Bug #8943

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
2014-07-28 17:47:51 +08:00
Sage Weil
3695b255ae Merge pull request #2149 from yuyuyu101/wip-flush-set
Fix dup bh_write for TX state bh

Tested-by: Sage Weil <sage@redhat.com>
Reviewed-by: Haomai Wang <haomaiwang@gmail.com>

Original changeset 

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-07-27 19:39:34 -07:00
Sage Weil
b08470f0bf configure.ac: link libboost_thread only with json-spirit
Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-27 16:58:08 -07:00
Sage Weil
9d23cc6aa6 configure: don't link blkid, udev to everything
These are already explicitly called out for libkrbd; don't need them in
LIBS.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-27 11:25:47 -07:00
Haomai Wang
de9cfcaa7d Only write bufferhead when it's dirty
A bh in the TX state should be skipped because it is already in flight; we
only need to write dirty bhs. Both TX and dirty state bhs should be waited on
until flushed.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2014-07-27 13:37:49 +08:00
Josh Durgin
1c26266dbf ObjectCacher: fix bh_{add,remove} dirty_or_tx_bh accounting
tx buffers need to go on the bh_lru_rest as well, and removing erases
(not inserts) them into dirty_or_tx_bh.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-07-27 13:36:28 +08:00
Josh Durgin
727ac1d084 ObjectCacher: fix dirty_or_tx_bh logic in bh_set_state()
The else-if chain here was wrong. Handling dirty or tx buffers and
errors should be in independent conditions.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-07-27 13:36:19 +08:00
Haomai Wang
5283cfee5b Wait tx state buffer in flush_set
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2014-07-27 13:33:51 +08:00
Haomai Wang
d858fdc501 Add rbdcache max dirty object option
Librbd calculates the max dirty object count from rbd_cache_max_size, which
is not suitable for every case. If the user sets image order 24, the
calculated result is too small in practice. That increases the overhead of
the trim call, which runs on each read/write op.

Now we make it an option for tuning; by default this value is calculated.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2014-07-27 13:33:44 +08:00
Haomai Wang
b8a56685fe Reduce ObjectCacher flush overhead
A flush op in ObjectCacher iterates over the whole active object set, and
each dirty object may also own several BufferHeads. If the object set is
large, this consumes too much time.

Use dirty_bh instead to reduce overhead. Now only dirty BufferHead will
be checked.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2014-07-27 13:33:38 +08:00
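The idea in this commit, visiting only dirty buffers during a flush instead of scanning everything, might be sketched as below. This is not ObjectCacher code: `BufferHeadSketch`, `CacheSketch`, and the use of a `std::set` of indices as the dirty index are assumptions made for illustration.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical buffer record; the real BufferHead carries far more state.
struct BufferHeadSketch {
  bool dirty = false;
};

class CacheSketch {
  std::vector<BufferHeadSketch> all_;  // every buffer in the cache
  std::set<std::size_t> dirty_;        // indices of dirty buffers only

public:
  explicit CacheSketch(std::size_t n) : all_(n) {}

  void mark_dirty(std::size_t i) {
    all_[i].dirty = true;
    dirty_.insert(i);
  }

  // Flush by walking the dirty index rather than all_.
  // Returns how many buffers were visited.
  std::size_t flush() {
    std::size_t visited = 0;
    for (std::size_t i : dirty_) {
      all_[i].dirty = false;  // stand-in for issuing the write
      ++visited;
    }
    dirty_.clear();
    return visited;
  }
};
```

The cost of `flush()` then scales with the number of dirty buffers rather than with the size of the whole cache, at the price of keeping the index up to date on every state change.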
Sage Weil
288908b331 Revert "Merge pull request #2129 from ceph/wip-librbd-oc"
This reverts commit 74b386f03e, reversing
changes made to 36265d0db0.

The dirty_or_tx list is used by flush_set, which means we can
resubmit new IOs for writes that are already in progress.  This
has a compounding effect that overwhelms the OSDs with dup IOs
and stalls out the client.

See, for example, the failures in this run:
  /a/sage-2014-07-25_17:14:20-fs-wip-msgr-testing-basic-plana

The fix is probably pretty simple, but reverting for now to make
the tests pass.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-07-26 21:19:34 -07:00
Sage Weil
2088c267d6 Merge remote-tracking branch 'gh/next'
Conflicts:
	src/osdc/Journaler.h
2014-07-25 21:42:35 -07:00
John Spray
d3de69f8a5 mds: fix journal reformat failure in standbyreplay
In the 0.82 release, standbyreplay MDS daemons would try
to reformat the journal if they saw an older version on
disk, where this should have only been done by the active
MDS for the rank.  Depending on timing, this could cause
fatal corruption of the journal.

This change handles the following cases:
* only do reformat if not in standbyreplay (else raise EAGAIN
to keep trying til an active mds reformats it)
* if journal header goes away while in standbyreplay then raise
EAGAIN (handle rewrite happening in background)
* if journal version is greater than the max supported, suicide

Fixes: #8811

Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit 5438500af8)
2014-07-25 15:34:09 -07:00
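The three cases listed in this commit can be read as a small decision function. The sketch below is only an illustration of that reading, not the MDLog code: `JournalProbe`, `JournalAction`, `decide_journal_action`, and `kMaxSupportedVersion` are hypothetical names, and the `Error` result for a missing header on an active MDS is an assumption, since the commit message does not cover that case.

```cpp
// Hypothetical outcome of probing the on-disk journal.
struct JournalProbe {
  bool header_present;
  int on_disk_version;
};

enum class JournalAction {
  Replay,      // version is current, carry on
  Reformat,    // old format, and we are allowed to rewrite it
  RetryLater,  // corresponds to raising EAGAIN in the commit message
  Suicide,     // on-disk version is newer than we understand
  Error        // assumption: case not specified by the commit
};

constexpr int kMaxSupportedVersion = 1;  // hypothetical value

JournalAction decide_journal_action(const JournalProbe &p, bool standby_replay) {
  if (!p.header_present) {
    // The header may vanish while an active MDS rewrites the journal.
    return standby_replay ? JournalAction::RetryLater : JournalAction::Error;
  }
  if (p.on_disk_version > kMaxSupportedVersion)
    return JournalAction::Suicide;
  if (p.on_disk_version < kMaxSupportedVersion) {
    // Only a non-standbyreplay MDS may reformat; a standby keeps retrying.
    return standby_replay ? JournalAction::RetryLater : JournalAction::Reformat;
  }
  return JournalAction::Replay;
}
```

Keeping the decision separate from the I/O makes the rule "a standbyreplay daemon never reformats" easy to check in isolation.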
Sage Weil
96fb418f0e Merge pull request #2112 from ceph/wip-rbd-defaults
respect rbd_default_* parameters in /usr/bin/rbd

Reviewed-by: Sage Weil <sage@redhat.com>
2014-07-25 15:23:25 -07:00
Sage Weil
8fb761b660 osd/ReplicatedPG: requeue cache full waiters if no longer writeback
If the cache is full, we block some requests, and then we change the
cache_mode to something else (say, forward), the full waiters don't get
requeued until the cache becomes un-full.  In the meantime, however, later
requests will get processed and redirected, breaking the op ordering.

Fix this by requeueing any full waiters if we see that the cache_mode is
not writeback.

Fixes: #8931
Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-25 14:50:52 -07:00
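The ordering problem and fix described in this commit could be modeled roughly as follows. This is a toy model and not ReplicatedPG code: `PoolSketch`, `CacheMode`, `submit`, `set_mode`, and the integer op ids are all assumptions made for illustration.

```cpp
#include <deque>

enum class CacheMode { Writeback, Forward };

// Toy model of a cache pool with a queue of blocked ops.
struct PoolSketch {
  CacheMode mode = CacheMode::Writeback;
  bool full = false;
  std::deque<int> full_waiters;  // ops blocked because the cache is full
  std::deque<int> op_queue;      // ops ready to process, front first

  // Ops block only while the cache is full AND in writeback mode.
  void submit(int op) {
    if (mode == CacheMode::Writeback && full)
      full_waiters.push_back(op);
    else
      op_queue.push_back(op);
  }

  // Put blocked ops back at the FRONT of the ready queue, in their
  // original order, so later arrivals cannot overtake them.
  void requeue_full_waiters() {
    op_queue.insert(op_queue.begin(), full_waiters.begin(), full_waiters.end());
    full_waiters.clear();
  }

  // The fix: leaving writeback mode releases the waiters immediately
  // instead of waiting for the cache to become un-full.
  void set_mode(CacheMode m) {
    mode = m;
    if (mode != CacheMode::Writeback)
      requeue_full_waiters();
  }
};
```

Without the requeue in `set_mode`, ops submitted after the mode change would land in `op_queue` while the earlier ones stayed parked in `full_waiters`, which is the reordering the commit message describes.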