Commit Graph

19412 Commits

Author SHA1 Message Date
Sage Weil
5922e2c2b8 crush: pass weight vector size to map function
Pass the size of the weight vector into crush_do_rule() to ensure that we
don't access values past the end.  This can happen if the caller misbehaves
and passes a weight vector that is smaller than max_devices.

Currently the monitor tries to prevent that from happening, but this will
gracefully tolerate previous bad osdmaps that got into this state.  It's
also a bit more defensive.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-05-14 11:24:32 -07:00
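The defensive lookup described in this commit can be sketched as below. This is a minimal illustration, not the actual crush_do_rule() code: the helper name device_weight and the choice to treat out-of-range ids as weight 0 are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of CRUSH): look up a device's weight while
// honoring the size of the weight vector supplied by the caller.
inline std::uint32_t device_weight(const std::uint32_t* weight,
                                   std::size_t weight_max, int item) {
  // Ids past the end of the vector are never dereferenced; this sketch
  // assumes that treating them as weight 0 is an acceptable fallback.
  if (item < 0 || static_cast<std::size_t>(item) >= weight_max)
    return 0;
  return weight[item];
}
```

With a two-entry vector, a lookup for id 5 returns 0 instead of reading past the end of the array.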
Sage Weil
376f0d509b crush: adjust max_devices appropriately in insert_item()
If we insert a new item, make sure max_devices is still the max id + 1.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-05-14 11:24:31 -07:00
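A minimal sketch of the invariant this commit maintains, assuming a toy structure in place of the real CRUSH map. The names ToyCrushMap and note_inserted_device are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the CRUSH map's device bookkeeping.
struct ToyCrushMap {
  int max_devices = 0;  // invariant: (highest device id) + 1

  void note_inserted_device(int id) {
    assert(id >= 0);  // this sketch only handles devices (non-negative ids)
    // Grow max_devices if the new id is at or beyond the current bound.
    max_devices = std::max(max_devices, id + 1);
  }
};
```

Inserting device 3 into an empty map gives max_devices == 4; a later insert of device 1 leaves it unchanged.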
Sage Weil
320d1ebf9e mon: fail 'osd crush set ...' if osd doesn't exist
If an osd doesn't exist, don't let users add/update it in the crush map.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-05-14 11:24:28 -07:00
Sage Weil
bb74b8b3aa osdmap: filter out nonexistent osds from map
It is possible that the crush map contains device ids that do not exist as
osds.  Filter them out of the CRUSH result.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-12 14:18:27 -07:00
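The filtering step could look roughly like this. It is a sketch under assumptions: the set of existing OSD ids is modeled as a std::set, and filter_existing is a hypothetical name, not the OSDMap interface.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical sketch: drop ids from a raw CRUSH result that do not
// correspond to an existing OSD, preserving the original order.
inline std::vector<int> filter_existing(const std::vector<int>& raw,
                                        const std::set<int>& existing) {
  std::vector<int> out;
  std::copy_if(raw.begin(), raw.end(), std::back_inserter(out),
               [&existing](int id) { return existing.count(id) > 0; });
  return out;
}
```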
Sage Weil
7ce157d60f utime_t: no double ctor
error: os/FileJournal.h:48:51: call of overloaded ‘utime_t(int)’ is ambiguous

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-10 10:09:30 -07:00
Sage Weil
90fb40303f objectcacher: make *_max_dirty_age tunables; pass to ctor
This replaces the hard-coded 1 second writeback timer.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-08 16:19:51 -07:00
Sage Weil
82a3600378 librbd: set cache defaults to 32/24/16 mb
Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-08 16:08:22 -07:00
Sage Weil
f9a9888015 Merge branch 'wip-rbd-wt'
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-05-08 16:04:12 -07:00
Sage Weil
d96bf6c95a test_filestore_workloadgen: name the Mutex variable
This is for interpreting lockdep reports.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-05-07 21:42:51 -07:00
Sage Weil
5c3e985c23 Merge remote branch 'gh/wip-wrkldgen-throughput'
2012-05-07 21:41:46 -07:00
Joao Eduardo Luis
8bacc51b83 workloadgen: time tracking using ceph's utime_t's instead of timevals.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-05-07 19:54:32 -07:00
Joao Eduardo Luis
772276cb30 workloadgen: forcing the user to specify a data and journal.
These default arguments, although handy when we just want to run the test,
just mess things up when we don't actually need them. If we don't specify
them on the CLI, we'll end up using the default ones, and that is just
annoying.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-05-07 19:54:32 -07:00
Joao Eduardo Luis
f2a2a6eb4b workloadgen: add option to specify the max number of in-flight txs.
Use '--test-max-in-flight VAL' (default: 50) or check '--help' for more.
Also, allow the test to work even if we don't specify a conf file.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2012-05-07 19:54:32 -07:00
Joao Eduardo Luis
662729f693 workloadgen: Allow finer control over what the generator does.
Allow the user to have more control on:
    - the sizes of the data being written by the operations;
    - which operations are suppressed from execution;
    - view the throughput;
    - specify the periodicity of throughput output.

For the CLI options, '--help' should suffice.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-05-07 19:54:32 -07:00
Sage Weil
ac903210d0 Merge branch 'wip-rgw-bench'
Conflicts:
	debian/rules
2012-05-07 15:57:31 -07:00
Sage Weil
6c2c883c17 libs3: trailing / does strange things to EXTRA_DIST
drwxr-xr-x 1031/1031         0 2012-05-07 11:15 ceph-0.46/src/libs3/inc/
drwxr-xr-x 1031/1031         0 2012-05-04 15:28 ceph-0.46/src/libs3/inc/inc/
-rw-r--r-- 1031/1031      2343 2012-05-04 15:28 ceph-0.46/src/libs3/inc/inc/simplexml.h

etc.  Freaking autotools!

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-07 11:16:43 -07:00
Sage Weil
efc0701cf9 Merge remote-tracking branch 'gh/wip-osd-peering'
Reviewed-by: Sam Just <sam.just@inktank.com>
2012-05-07 09:25:12 -07:00
Sage Weil
e20fbac855 Makefile: drop librgw.so unittests
Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-06 14:52:25 -07:00
Sage Weil
99ee622e67 ceph.spec: kill librgw
Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-06 14:50:39 -07:00
Sage Weil
caab859b6d debian: kill librgw.so
Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-06 14:50:30 -07:00
Sage Weil
17114f266a osd: reset last_peering_interval on replica activate
There was a silent bug in the activate 'acks' that go from the replica back
to the primary.  Prior to 86aa07d7a91ac23074e76551c3a6db3a5736cffa, we
were passing same_interval_since to the callback, which meant that
sometimes _activate_committed() would ignore it and we wouldn't update
last_epoch_started.  This was mostly invisible; the next peering event would
just, in some cases, look at more past intervals than it needed to.

In 86aa07d7a91ac23074e76551c3a6db3a5736cffa we fixed this so that the check
is correct.  (We noticed because now we aren't setting the pg CLEAN flag
until after last_epoch_started is updated.)  That, in turn, revealed a
similar bug that we're fixing here: the replica's last_peering_reset could
be lower than the primary's, such that the activate 'ack' info is ignored.

To fix this, simply set last_peering_reset to the current epoch when the
replica activates; this will always be greater than the primary's.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-05-06 14:18:22 -07:00
Sage Weil
f4befb3978 libs3: dist and distdir make targets
Signed-off-by: Sage Weil <sage@inktank.com>
2012-05-06 13:23:14 -07:00
Sage Weil
a46cc71948 Makefile: include libs3/ contents in dist tarball
Signed-off-by: Sage Weil <sage@inktank.com>
2012-05-06 13:22:40 -07:00
Sage Weil
e2ee1973f9 Makefile: osdc/Journaler is only used by the mds
Signed-off-by: Sage Weil <sage@inktank.com>
2012-05-06 12:53:15 -07:00
Sage Weil
2e7251e7fe Makefile: librgw.la -> librgw.a; and use it
The various rgw tools were all recompiling my_libradosgw_src files over
again.  Instead build a single .a (not .la!) and link that in.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-05-06 12:48:30 -07:00
Sage Weil
aa782b4671 Makefile: libos.la -> libos.a
There is a -laio associated with this, so use a var instead of referring to
it by name.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-06 09:32:46 -07:00
Sage Weil
938f4ac4c0 Makefile: libosd.la -> libosd.a
Faster build.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-06 09:23:25 -07:00
Sage Weil
d96e084007 Makefile: libmon.la -> libmon.a
Builds >2x as fast.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-06 09:22:24 -07:00
Sage Weil
7dbcc1c8d8 libs3: added 'make check' target
Signed-off-by: Sage Weil <sage@inktank.com>
2012-05-06 08:34:01 -07:00
Sage Weil
827d222aba debian: build-depend on libxml2-dev
Signed-off-by: Sage Weil <sage@inktank.com>
2012-05-06 08:31:11 -07:00
Sage Weil
385142305a objectcacher: make cache sizes explicit
Make ObjectCacher users specify the cache size for each ObjectCacher
instance.  This avoids the confusing config namespace for the object
cache (client_oc_*), and also will make it possible to eventually have
cache sizes that vary between (say) RBD images.

- drop unused client_oc_max_sync_write
- add rbd_cache_max_size, max_dirty, target_dirty config values (these are
  the defaults for each image)

We probably want to add librbd calls to specify the cache size on a
per-image basis?  Alternatively, we should make it possible to share a
cache pool between multiple images in some explicit way.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-05 16:32:22 -07:00
Sage Weil
b5e9995f59 objectcacher: delete unused onfinish from flush_set
Once upon a time the caller would do this, but none of those have survived,
and this makes more sense.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-05 16:32:22 -07:00
Sage Weil
6f3221a9f5 objectcacher: explicit write-thru mode
If the max_dirty config is 0, switch to write-thru mode, which will
explicitly flush and wait on the range we just dirtied.

Closes: #2335
Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-05 16:32:21 -07:00
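The mode switch described here reduces to a single decision on the dirty limit. The sketch below assumes illustrative names (WriteMode, choose_write_mode) that are not taken from ObjectCacher.

```cpp
#include <cstdint>

// Hypothetical policy helper; the names are illustrative only.
enum class WriteMode { WriteBack, WriteThrough };

inline WriteMode choose_write_mode(std::uint64_t max_dirty) {
  // A dirty limit of 0 leaves no room to buffer writes, so each write has
  // to be flushed and waited on before the call returns.
  return max_dirty == 0 ? WriteMode::WriteThrough : WriteMode::WriteBack;
}
```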
Sage Weil
c19f998a8a common: add C_Cond
Similar to C_SafeCond, but assume finisher already holds the relevant lock.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-05 16:32:21 -07:00
Sage Weil
38edd3bb07 objectcacher: use helper to get starting point in buffer map
A common pattern is to search for the first buffer intersecting or
following an object offset.  Use a helper for that.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-05 16:31:57 -07:00
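One way to express the search pattern is shown below. This is a sketch, not the ObjectCacher code: it assumes the buffer map is keyed by start offset with the length as the value, and first_buffer_at_or_after is a hypothetical name.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical buffer map: start offset -> buffer length.
using BufferMap = std::map<std::uint64_t, std::uint64_t>;

// Return the first buffer that intersects or follows `off`.
inline BufferMap::const_iterator first_buffer_at_or_after(const BufferMap& m,
                                                          std::uint64_t off) {
  auto p = m.lower_bound(off);  // first buffer starting at or after off
  if (p != m.begin() && (p == m.end() || p->first > off)) {
    auto prev = std::prev(p);
    if (prev->first + prev->second > off)
      return prev;  // the preceding buffer still covers off
  }
  return p;
}
```

The extra step back is needed because lower_bound alone would skip a buffer that starts before the offset but extends past it.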
Sage Weil
c8bd471b59 objectcacher: flush range, set
Add ability to flush a range of an object, or a vector of ObjectExtents.  Flush
any buffers that intersect the specified range, or the entire object if len==0.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-05 16:25:26 -07:00
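The selection rule for a range flush can be sketched as follows, assuming a simplified model in which buffers are stored as start offset and length. Buffers and buffers_to_flush are hypothetical names used only for this example.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model: buffer start offset -> buffer length.
using Buffers = std::map<std::uint64_t, std::uint64_t>;

// Return the start offsets of the buffers that a flush of [off, off+len)
// would touch. len == 0 selects every buffer in the object.
inline std::vector<std::uint64_t> buffers_to_flush(const Buffers& m,
                                                   std::uint64_t off,
                                                   std::uint64_t len) {
  std::vector<std::uint64_t> out;
  for (const auto& b : m) {
    const std::uint64_t start = b.first;
    const std::uint64_t end = b.first + b.second;
    const bool intersects = start < off + len && off < end;
    if (len == 0 || intersects)
      out.push_back(start);
  }
  return out;
}
```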
Sage Weil
b50a4c92e6 mon: add safety checks for 'mds rm <gid>' command
- make sure the gid exists
- only remove it if it's inactive (state < 0)

Fixes: #2188
Signed-off-by: Sage Weil <sage@inktank.com>
2012-05-05 13:21:37 -07:00
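The two checks listed above might be combined as in this sketch. The map type, the function name remove_mds, and the specific error codes (ENOENT, EBUSY) are assumptions for illustration; only the "gid exists" and "state < 0" conditions come from the commit message.

```cpp
#include <cerrno>
#include <cstdint>
#include <map>

// Hypothetical model of the MDS map: gid -> state, where a negative state
// means the daemon is inactive.
using MdsStates = std::map<std::uint64_t, int>;

// Returns 0 on success or a negative errno-style code (codes are assumed).
inline int remove_mds(MdsStates& mds, std::uint64_t gid) {
  auto p = mds.find(gid);
  if (p == mds.end())
    return -ENOENT;  // the gid does not exist
  if (p->second >= 0)
    return -EBUSY;   // not inactive: refuse to remove it
  mds.erase(p);
  return 0;
}
```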
Sage Weil
8ec476e526 osd: do not mark pg clean until active is durable
Do not mark a PG CLEAN or set last_epoch_clean until after the PG activate
is stable on all replicas.

This effectively means that last_epoch_clean will never fall in an interval
that follows last_epoch_started's interval.  It *can* be >
last_epoch_started when it falls within the same interval.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-05 13:08:09 -07:00
Sage Weil
86aa07d7a9 osd: check against last_peering_reset in _activate_committed
We are checking against last_peering_reset in _activate_committed(), so we
need to pass in that value to compare against; last_peering_reset may be
greater than same_interval_since, e.g. on a replica that learns about the
PG after the initial creation epoch.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-05 13:08:09 -07:00
Sage Weil
9d7ec04b69 osd: tweak slow request warnings
- always include 'slow request' in the warning string
- only summarize if we warn about anything (they all may have backed off)
- be more concise

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-05 13:08:04 -07:00
Sage Weil
a4b42fc3ce keyring: clean up error output
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-05 10:13:41 -07:00
Sage Weil
ae0ca7be3c keyring: catch key decode errors
Return EINVAL on decoding errors.

Other decode_base64() callers are already guarded.

Fixes: #2124
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-05 10:13:28 -07:00
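Guarding a decode call so that failures surface as EINVAL could look like this. The sketch assumes a decoder that throws on malformed input; decode_key and set_key are hypothetical, and decode_key only validates the base64 alphabet rather than decoding anything.

```cpp
#include <cerrno>
#include <stdexcept>
#include <string>

// Hypothetical decoder standing in for a real base64 routine. It only
// checks the alphabet and throws on the first invalid character.
inline std::string decode_key(const std::string& s) {
  for (char c : s) {
    const bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                    (c >= '0' && c <= '9') || c == '+' || c == '/' ||
                    c == '=';
    if (!ok)
      throw std::invalid_argument("bad base64 character");
  }
  return s;  // actual decoding is omitted in this sketch
}

// Guarded caller: convert a decode exception into an errno-style result.
inline int set_key(const std::string& encoded, std::string* out) {
  try {
    *out = decode_key(encoded);
  } catch (const std::exception&) {
    return -EINVAL;
  }
  return 0;
}
```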
Sage Weil
6812309edf debian: depend on uuid-runtime
We use uuidgen for osd creation.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-05 10:03:56 -07:00
Sage Weil
3509b039a2 safe_io: int -> ssize_t
int is 32 bits on 64-bit archs, but ssize_t is 64 bits.  This fixes overflow
when reading large (>2GB) extents.

Fixes: #2275
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-05-05 10:01:44 -07:00
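The overflow is easy to see with a range check. This sketch assumes a platform where int is 32 bits, as the commit message describes; fits_in_int is a hypothetical helper, not part of safe_io.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical check: can a byte count be returned through an int without
// losing information?
inline bool fits_in_int(std::int64_t nbytes) {
  return nbytes >= std::numeric_limits<int>::min() &&
         nbytes <= std::numeric_limits<int>::max();
}
```

With a 32-bit int, a 1 MB read fits but a 3 GB read does not, which is why the return type needs to be ssize_t.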
Sage Weil
203a7d67aa objectcacher: wait directly from writex()
This gives us access to the original ObjectExtent (useful later), and
simplifies the callers.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-04 20:31:01 -07:00
Sage Weil
991c93ed27 mon: fix call to get_uuid() on nonexistent osd
Didn't catch this with vstart.sh testing.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-05-04 16:02:00 -07:00
Yehuda Sadeh
150adcceb6 debian: add rules for rest-bench
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:55:34 -07:00
Yehuda Sadeh
53f642e29a rest-bench: build conditionally
added configure --with-rest-bench, and configure --with-system-libs3

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:55:25 -07:00
Yehuda Sadeh
f60444fbeb obj_bencher: changed interface
Passing bufferlist and not const bufferlist in aio_write(). We assign
it to another object which is not const, and it doesn't work too
well.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:27 -07:00
Yehuda Sadeh
d54ef1c805 rest-bench: change thread context for libs3 calls
Apparently S3_put_object() and S3_get_object() need to
run on the same thread as S3_runall_request_context() (at least
per context). So we now call them in the workqueue thread.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-05-04 15:53:27 -07:00