Make import work; do I/O in the image's native block size.
Note: creating sparse images is not currently attempted; we could
scan for runs of zeros and write only discontiguous chunks to the image.
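A minimal sketch, in the spirit of that note, of reading the source in
image-native block-size chunks and skipping all-zero chunks to leave
holes in the destination. It is illustrative only (this commit does not
implement the sparse path), and write_to_image() is a hypothetical
stand-in for the actual rbd write call:

    #include <unistd.h>
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    void write_to_image(uint64_t off, const char *buf, size_t len);  // hypothetical

    void import_blocks(int src_fd, uint64_t block_size)
    {
      std::vector<char> buf(block_size);
      uint64_t off = 0;
      ssize_t n;
      // Read the source in chunks matching the image's native block size.
      while ((n = read(src_fd, buf.data(), block_size)) > 0) {
        // An all-zero chunk can simply be skipped, leaving a hole.
        bool all_zero = std::all_of(buf.begin(), buf.begin() + n,
                                    [](char c) { return c == 0; });
        if (!all_zero)
          write_to_image(off, buf.data(), n);
        off += n;
      }
    }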
Fixes: #3503
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Detect a misordered ondisk tmap, but only if we are already decoding
it. We still leave the trailing bits unchecked.
Signed-off-by: Sage Weil <sage@inktank.com>
The MDS may include RM ops in a tmap update for items that were already
removed: after restarting and replaying the journal, it doesn't know
which dentries were previously committed and which were not.
No other (known) users care about the error code.
Signed-off-by: Sage Weil <sage@inktank.com>
The previous tmap implementation required that the update stream be
sorted, or else it would behave erratically (by placing new keys in the
map out of order). This can cause very strange failures: reads may
appear to return the correct result initially, but once intervening
keys are removed they will not... depending on how the read is
implemented on the client side.
Fix this by attempting the optimized update first, but falling back to
a slow implementation if an unsorted update is detected. The fallback
is slow, but such updates are rare.
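A minimal sketch of the two-path scheme (illustrative names only, not
the actual OSD tmap code): a sorted update is applied with a single
streaming merge against the sorted on-disk pairs, while an unsorted
update falls back to rebuilding the map through a std::map, which
restores the ordering:

    #include <algorithm>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    using KV = std::pair<std::string, std::string>;

    static bool update_is_sorted(const std::vector<KV> &upd) {
      return std::is_sorted(upd.begin(), upd.end(),
                            [](const KV &a, const KV &b) { return a.first < b.first; });
    }

    std::vector<KV> apply_update(const std::vector<KV> &ondisk,  // already sorted
                                 const std::vector<KV> &upd) {
      if (update_is_sorted(upd)) {
        // Fast path: one forward merge of two sorted streams.
        std::vector<KV> out;
        size_t i = 0, j = 0;
        while (i < ondisk.size() || j < upd.size()) {
          if (j == upd.size() ||
              (i < ondisk.size() && ondisk[i].first < upd[j].first)) {
            out.push_back(ondisk[i++]);
          } else {
            if (i < ondisk.size() && ondisk[i].first == upd[j].first)
              ++i;  // the update replaces an existing key
            out.push_back(upd[j++]);
          }
        }
        return out;
      }
      // Slow path: rebuild via std::map so the on-disk ordering stays correct.
      std::map<std::string, std::string> m(ondisk.begin(), ondisk.end());
      for (const KV &kv : upd)
        m[kv.first] = kv.second;
      return std::vector<KV>(m.begin(), m.end());
    }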
Signed-off-by: Sage Weil <sage@inktank.com>
Validate change to not assume dest pool == src pool
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 39180430b9)
import allows specifying one image, implicitly or explicitly, as the
"source" image, even though it's really the destination. Fix up
the reassignment of 'source' to 'dest', and check for and complain
about specifying two different pools or images for import.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit c219698149)
Don't default destpool to srcpool; it's surprising, and
not useful/helpful enough to violate the convention that
"default pool is rbd"
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 3b0c360528)
User-space tool that interacts with the monitor, with the objective of
generating a workload that mimics a set of OSDs and clients.
As it stands, the tool will mimic any number of OSDs by keeping
in-memory stubs that act as independent OSDs, generating random
operations that induce map updates; the client stub, on the other hand,
performs no operations besides connecting to the monitor and whatever
happens between the Objecter class and the monitor (mainly keeping up
to date with map updates).
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Before, we only allowed buckets (say, 'root') to be defined *before*
rules.
With this patch, we allow buckets and rules to be defined in any order,
although some care should still be taken when writing the plain-text
crush map: crushtool will error out if a rule uses a bucket that is
only defined later in the file.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
'verbose' was a bool that would be passed as either one or zero to
class CrushCompile. However, most messages would only be output with a
verbosity level > 1.
This patch makes multiple '-v' options increase the verbosity level;
i.e., -v means verbose = 1, -v -v means verbose = 2, and so forth.
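A minimal sketch, assuming a getopt-style option loop, of counting
repeated '-v' flags into a verbosity level instead of a plain boolean:

    #include <unistd.h>
    #include <cstdio>

    int main(int argc, char **argv)
    {
      int verbose = 0;
      int c;
      while ((c = getopt(argc, argv, "v")) != -1) {
        if (c == 'v')
          verbose++;  // -v => 1, -v -v => 2, and so forth
      }
      if (verbose > 1)
        fprintf(stderr, "verbosity %d: extra diagnostics enabled\n", verbose);
      return 0;
    }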
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
If the call to syncfs() fails, don't try to call syncfs again via
syscall(). If HAVE_SYS_SYNCFS is defined, don't fall through to try
syscall() with SYS_syncfs or __NR_syncfs.
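A hedged sketch of that ordering (the function name here is assumed):
prefer the libc syncfs() wrapper when HAVE_SYS_SYNCFS is defined and
return its result directly, use the raw syscall only when the wrapper
is unavailable, and fall back to a full sync() as a last resort:

    #include <unistd.h>
    #ifndef HAVE_SYS_SYNCFS
    # include <sys/syscall.h>
    #endif

    int sync_filesystem(int fd)
    {
    #if defined(HAVE_SYS_SYNCFS)
      // Wrapper available: use it; do not retry via syscall() if it fails.
      return syncfs(fd);
    #elif defined(SYS_syncfs)
      return syscall(SYS_syncfs, fd);
    #elif defined(__NR_syncfs)
      return syscall(__NR_syncfs, fd);
    #else
      sync();  // last resort: flush every filesystem
      return 0;
    #endif
    }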
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
import allows specifying one image, implicitly or explicitly, as the
"source" image, even though it's really the destination. Fix up
the reassignment of 'source' to 'dest', and check for and complain
about specifying two different pools or images for import.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Don't default destpool to srcpool; it's surprising, and
not useful/helpful enough to violate the convention that
"default pool is rbd"
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
There is a message ordering race in the cephfs kernel client. We
compose cap messages while i_ceph_lock is held, but when adding
messages to the output queue, the kernel releases i_ceph_lock and
acquires a mutex, so it is possible that cap messages are sent out of
order. If the kernel client sends a cap update and then a cap release,
but the two messages reach the MDS out of order, the update message
will re-add the released caps. This patch adds code to check whether
the caps were actually issued when confirming cap receipt.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
The reference count of an anchor table entry that corresponds to a
directory is the number of anchored inodes under that directory. But
when updating the anchor trace for a directory inode, the code only
increases/decreases its new/old ancestors' anchor table entries'
reference counts by one.
This patch probably resolves BUG #1850.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Touch the object when we touch one of its bhs, and when we touch it
during readx (possibly because it is negative).
Signed-off-by: Sage Weil <sage@inktank.com>
This hopefully resolves #3431.
We originally did this in 46897fd4ff, and
then reverted it in caed0e917f.
The current conundrum:
- commit_set() will issue a write and queue a waiter on a tid
- discard will discard all BufferHeads and unpin the object
- trim will try to close and fail assert(ob->can_close())
But:
- we can't wake the waiter on discard because we don't know what range(s)
it is waiting for; discard needn't be the whole object.
So: pin the object so it doesn't get trimmed, and unpin when we write.
Adjust can_close() so that it is based on the lru pin status, and assert
that pinned implies the previous conditions are all true.
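A minimal sketch, under assumed names (not the actual ObjectCacher
code), of the pin/unpin protocol: commit_set() pins the object so that
a discard followed by trim cannot close it while a tid waiter is
outstanding, the write completion unpins it, and can_close() is driven
purely by the pin status:

    #include <cassert>

    struct CachedObject {
      int pins = 0;  // lru pin count

      bool can_close() const {
        // Unpinned implies no outstanding writes or waiters.
        return pins == 0;
      }
    };

    void commit_set(CachedObject &ob) {
      // Issue the write and queue a waiter on its tid; pin so trim cannot
      // close the object even if discard drops all of its BufferHeads.
      ob.pins++;
    }

    void on_write_committed(CachedObject &ob) {
      ob.pins--;  // unpin when the write completes
      assert(ob.pins >= 0);
    }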
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
Consider the CRUSH rule
step chooseleaf firstn 0 type <node_type>
This rule means that <n> replicas will be chosen in a manner such that
each chosen leaf's branch will contain a unique instance of <node_type>.
When an object is re-replicated after a leaf failure, if the CRUSH map uses
a chooseleaf rule the remapped replica ends up under the <node_type> bucket
that held the failed leaf. This causes uneven data distribution across the
storage cluster, to the point that when all the leaves but one fail under a
particular <node_type> bucket, that remaining leaf holds all the data from
its failed peers.
This behavior also limits the number of peers that can participate in the
re-replication of the data held by the failed leaf, which increases the
time required to re-replicate after a failure.
For a chooseleaf CRUSH rule, the tree descent has two steps: call them the
inner and outer descents.
If the tree descent down to <node_type> is the outer descent, and the descent
from <node_type> down to a leaf is the inner descent, the issue is that a
down leaf is detected on the inner descent, so only the inner descent is
retried.
In order to disperse re-replicated data as widely as possible across a
storage cluster after a failure, we want to retry the outer descent. So,
fix up crush_choose() to allow the inner descent to return immediately on
choosing a failed leaf. Wire this up as a new CRUSH tunable.
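A toy, self-contained sketch of that retry behaviour (not the real
crush_choose() code; every name here is illustrative, and
'descend_once' stands in for the new tunable): when the tunable is set,
a failed inner descent gives up immediately so the outer descent can
retry under a different <node_type> bucket:

    #include <vector>

    struct Bucket { std::vector<int> leaves; };

    static bool leaf_is_down(int leaf) { return leaf % 7 == 0; }  // toy failure model

    // Inner descent: pick one leaf from a <node_type> bucket.
    static int pick_leaf(const Bucket &b, unsigned attempt) {
      return b.leaves[attempt % b.leaves.size()];
    }

    // Choose a leaf for replica r, retrying across buckets (outer descent).
    static int choose_leaf(const std::vector<Bucket> &buckets, unsigned r,
                           bool descend_once) {
      for (unsigned outer = 0; outer < buckets.size(); ++outer) {
        const Bucket &b = buckets[(r + outer) % buckets.size()];  // outer descent
        unsigned inner_tries = descend_once ? 1 : 3;              // retry policy
        for (unsigned i = 0; i < inner_tries; ++i) {              // inner descent
          int leaf = pick_leaf(b, r + i);
          if (!leaf_is_down(leaf))
            return leaf;
          // With descend_once set, fall back to the outer loop at once so
          // the replacement can land under a different <node_type> bucket.
        }
      }
      return -1;  // no usable leaf found
    }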
Note that after this change, for a chooseleaf rule, if the primary OSD
in a placement group has failed, choosing a replacement may result in
one of the other OSDs in the PG colliding with the new primary. In
that case, that OSD's data for the PG will need to move as well. This
seems unavoidable but should be relatively rare.
Signed-off-by: Jim Schutt <jaschut@sandia.gov>