Commit Graph

23585 Commits

Author SHA1 Message Date
Sage Weil
c549a0cf6f common/PrioritizedQueue: buckets -> tokens
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-22 14:47:41 -08:00
Sage Weil
128fcfcac7 note puller's max chunk in pull requests
this lets us calculate a cost value
2013-01-22 14:47:40 -08:00
Sage Weil
b685f727d4 osd: add OpRequest flag point when commit is sent
With writeahead journaling in particular, we can get requests that
stay in the queue for a long time even after the commit is sent to the
client while we are waiting for the transaction to apply to the fs.
Instead of showing up as 'waiting for subops', make it clear that the
client has gotten its reply and it is local state that is slow.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-22 14:47:40 -08:00
Sage Weil
a1bf8220e5 osd: set PULL subop cost to size of requested data
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-22 14:47:40 -08:00
Sage Weil
e8e0da1a57 osd: use Message::get_cost() function for queueing
The data payload is a decent proxy for cost in most cases, but not all.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-22 14:47:40 -08:00
Sage Weil
bec96a234c osd: debug msg prio, cost, latency
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-22 14:47:40 -08:00
Sage Weil
40654d6d53 filestore: filestore_queue_max_ops 500 -> 50
Having a deep queue limits the effectiveness of the priority queues
above by adding additional latency.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-22 14:47:39 -08:00
Sage Weil
1233e86170 osd: target transaction size 300 -> 30
Small transactions make pg removal nicer to the op queue.  It also slows
down PG deletion a bit, which may exacerbate the PG resurrection case
until #3884 is addressed.

At least one user reported that this fixed an osd that kept failing due to
an internal heartbeat failure.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-22 14:47:39 -08:00
Sage Weil
44dca5c8c5 filestore: disable extra committing queue allowance
The motivation for the allowance is the case where there is a problem draining the op queue
during a sync.  For XFS and ext4, this isn't generally a problem: you
can continue to make writes while a syncfs(2) is in progress.  There
are currently some possible implementation issues with btrfs, but we
have not demonstrated them recently.

Meanwhile, this can cause queue length spikes that screw up latency.
During a commit, we allow too much into the queue (say, recovery
operations).  After the sync finishes, we have to drain it out before
we can queue new work (say, a higher priority client request).  Having
a deep queue below the point where priorities order work limits the
value of the priority queue.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-22 14:47:39 -08:00
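
The latency argument in 44dca5c8c5 (and in 40654d6d53 above) can be illustrated with a toy model. This is only a sketch in Python, not Ceph code: the function name `ops_served_before_urgent`, the two-level layout (a priority queue feeding a bounded FIFO), and the priority values are all assumptions made for illustration. It counts how many low-priority ops are serviced before a newly arrived high-priority op, which in this model is bounded by the depth of the FIFO below the priority queue.

```python
import heapq
from collections import deque

def ops_served_before_urgent(max_depth, backlog):
    """Count low-priority ops serviced before a late-arriving urgent op.

    Toy model: an upper priority queue feeds a lower FIFO that holds at
    most max_depth ops. Lower priority number means more urgent.
    """
    # Upper queue starts with a backlog of low-priority work (say, recovery).
    pq = [(10, seq) for seq in range(backlog)]
    heapq.heapify(pq)
    fifo = deque()

    # The FIFO fills with low-priority ops before the urgent op exists.
    while pq and len(fifo) < max_depth:
        fifo.append(heapq.heappop(pq))

    # Now the urgent op (say, a client request) arrives at the upper queue.
    heapq.heappush(pq, (0, backlog))

    served = 0
    while True:
        # Refill the FIFO from the priority queue whenever there is room.
        while pq and len(fifo) < max_depth:
            fifo.append(heapq.heappop(pq))
        prio, _ = fifo.popleft()
        if prio == 0:
            return served
        served += 1

if __name__ == "__main__":
    for depth in (500, 50):
        print(depth, ops_served_before_urgent(depth, backlog=1000))
```

In this model the urgent op always jumps ahead of everything still in the priority queue, but it cannot pass work that has already been admitted to the FIFO, so a shallower FIFO means fewer ops ahead of it.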
Sage Weil
cfe4b85193 os/FileStore: allow filestore_queue_max_{ops,bytes} to be adjusted at runtime
The 'committing' ones too.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-22 14:47:39 -08:00
Sage Weil
101955a6b8 osd: make osd_max_backfills dynamically adjustable
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-22 14:47:38 -08:00
Sage Weil
9230c863b3 osd: make OSD a config observer
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-22 14:47:38 -08:00
Sam Lang
6401abf8d0 qa/workunit: Add iozone test script for sync
The iozone-sync.sh script runs iozone testing
various sync flags, O_SYNC, O_DSYNC, O_RSYNC.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-01-22 15:05:55 -06:00
Sam Lang
72147fd3a1 objectcacher: Remove commit_set, use flush_set
commit_set() and flush_set() are identical in functionality,
so use flush_set everywhere and remove commit_set from
the code.

Also fixes a bug in flush_set where the finisher context was
getting freed twice if no objects needed to be flushed.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-01-22 15:05:06 -06:00
Joe Buck
00b1186922 testing: add workunit to run hadoop internal tests.
This workunit runs the internal tests for our local branch of hadoop-common.
Requires ant be installed on the host running the test.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
2013-01-22 12:43:37 -08:00
Sage Weil
4a871b559d Merge branch 'wip-config'
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2013-01-22 10:25:37 -08:00
Sage Weil
359d0e98c1 config: report on log level changes
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-22 10:24:40 -08:00
Sage Weil
c5e095177c config: clean up output
Report a simple list of key='value', without extra verbosity.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-22 10:24:37 -08:00
Sage Weil
d7d8192283 config: don't make noise about 'internal_safe_to_start_threads'
This is set on start, and subsequently gets into the changed set.
Once any other config value is injected, it is the first thing reported
by the logs, but is confusing and useless to the user.  Hide it.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-21 08:45:10 -08:00
Sage Weil
3399860de2 Merge remote-tracking branch 'gh/next'
2013-01-21 08:22:36 -08:00
Greg Farnum
2e39dd5e6f mds: fix default_file_layout constructor
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-21 08:21:20 -08:00
Greg Farnum
e461f0966a mds: fix byte_range_t ctor
I do not think we saw any bugs from this, but anything that involved
capability issues on restart or migrate might have been caused by
this.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-01-21 08:21:13 -08:00
Sage Weil
17160843d0 osd: calculate initial PG mapping from PG's osdmap
The initial values of up/acting need to be based on the PG's osdmap, not
the OSD's latest.  This can cause various confusion in
pg_interval_t::check_new_interval() when calling OSDMap methods due to the
up/acting OSDs not existing yet (for example).

Fixes: #3879
Reported-by: Jens Kristian Søgaard <jens@mermaidconsulting.dk>
Tested-by: Jens Kristian Søgaard <jens@mermaidconsulting.dk>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-01-20 16:11:10 -08:00
Dan Mick
2491f976e4 workunits/cephtool: add tests for ceph osd pool set/get
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-01-18 22:36:31 -08:00
Sage Weil
ea9628fba4 Merge remote-tracking branch 'gh/next'
2013-01-18 20:57:40 -08:00
Travis Rhoden
48308954cb Clarify journal size based on filestore max sync
The docs had the recommended journal size based on the option
"filestore min sync interval" when it should have been
"filestore max sync interval".

While in there, fix a couple of typos -- multiple when it should
be multiply, and a missing word.  Change "Should at least twice"
to "Should be at least twice..."

Signed-off-by: Travis Rhoden <trhoden@gmail.com>
2013-01-18 22:26:07 -05:00
Dan Mick
aea898db2b ceph: reject negative weights at ceph osd <n> reweight
Check the integer (fixed-point) value to avoid any worries
about floating-point rounding.  Add tests for reweight < 0.

Fixes: #3872
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
2013-01-18 18:32:21 -08:00
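
The idea of checking the fixed-point integer rather than the float, as described in aea898db2b, might look roughly like the following. This is a Python sketch for illustration, not the Ceph C++ code: the helper names `to_fixed_weight` and `validate_reweight` are hypothetical, and the scale constant (0x10000 representing a weight of 1.0) is an assumption about the fixed-point representation.

```python
FIXED_ONE = 0x10000  # assumed fixed-point scale: 0x10000 represents weight 1.0

def to_fixed_weight(weight: float) -> int:
    """Convert a float weight to an integer fixed-point value.

    int() truncates toward zero, so the result is a plain integer
    that can be compared without floating-point concerns.
    """
    return int(weight * FIXED_ONE)

def validate_reweight(weight: float) -> int:
    """Return the fixed-point weight, rejecting negative values.

    The sign check is made on the integer, after conversion, so the
    decision is based on the value that would actually be stored.
    """
    fixed = to_fixed_weight(weight)
    if fixed < 0:
        raise ValueError(f"weight must be non-negative, got {weight}")
    return fixed

if __name__ == "__main__":
    print(validate_reweight(0.5))
    try:
        validate_reweight(-0.5)
    except ValueError as err:
        print("rejected:", err)
```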
Dan Mick
7d9d7651be workunit/cephtool: Use '! cmd' when expecting failure
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-01-18 18:32:13 -08:00
Samuel Just
0cb760f31b OSD: do deep_scrub for repair
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
2013-01-18 16:01:52 -08:00
Sage Weil
684a8f8f84 Merge branch 'wip-pg-removal'
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-01-18 15:45:03 -08:00
Sage Weil
5e00af406b osd: set pg removal transactions based on configurable
Use the osd_target_transaction_size knob, and gracefully tolerate bogus
values (e.g., <= 0).

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-18 15:44:41 -08:00
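
One way to tolerate a bogus transaction-size setting, in the spirit of 5e00af406b, is to clamp it before batching. The Python sketch below is illustrative only: the function names are hypothetical, and the choice to fall back to a batch size of 1 for values <= 0 is an assumption, since the commit message does not say what the real code does in that case.

```python
def effective_transaction_size(configured: int, fallback: int = 1) -> int:
    """Return a usable batch size; non-positive settings use the fallback.

    The fallback of 1 is an assumption made for this sketch.
    """
    return configured if configured > 0 else fallback

def batch_removals(objects, configured_size):
    """Yield lists of objects, each sized for one removal transaction."""
    size = effective_transaction_size(configured_size)
    for start in range(0, len(objects), size):
        yield objects[start:start + size]

if __name__ == "__main__":
    objs = [f"obj{i}" for i in range(7)]
    print(list(batch_removals(objs, 3)))
    print(list(batch_removals(objs, 0)))  # bogus value still makes progress
```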
Sage Weil
4712e984d3 osd: make pg removal thread more friendly
For a large PG these are saturating the filestore and journal queues.  Do
them synchronously to make them more friendly.  They don't need to be fast.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-18 15:44:41 -08:00
Sage Weil
bc994045ad os: move apply_transactions() sync wrapper into ObjectStore
This has nothing to do with the backend implementation.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-18 15:44:41 -08:00
Sage Weil
f6c69c3f1a os: add apply_transaction() variant that takes a sequencer
Also, move the convenience wrappers into the interface and funnel through
a single implementation.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-18 15:44:41 -08:00
Sam Lang
4bdcfbffa0 client: Respect O_SYNC, O_DSYNC, and O_RSYNC
If the file is opened with O_SYNC, O_DSYNC, or O_RSYNC, we need to
flush cached data (and metadata for O_SYNC) on a write.
For O_RSYNC, we need to flush dirty data on a read.
This patch adds a file_flush() call to the objectCacher
to allow a specific range to be flushed from the cache, and
in the O_SYNC,O_DSYNC case for write and O_RSYNC case for read,
calls that function waiting for the flush to complete.  The patch
also adds a flags field directly to the file handle struct, and
replaces the append boolean with the use of the flags field directly.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
2013-01-18 15:43:33 -06:00
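
The flag handling described in 4bdcfbffa0 reduces to a few bit tests on the open flags. The Python sketch below is a simplified illustration, not the client code: the flag values are placeholder bits chosen so the three flags are distinct (real values come from the platform's fcntl.h and may overlap), and the function names are hypothetical.

```python
# Placeholder bits for illustration only; real values are platform-defined.
O_SYNC = 0x1
O_DSYNC = 0x2
O_RSYNC = 0x4

def flush_data_after_write(flags: int) -> bool:
    """A write must flush cached data if the file was opened O_SYNC or O_DSYNC."""
    return bool(flags & (O_SYNC | O_DSYNC))

def flush_metadata_after_write(flags: int) -> bool:
    """Only O_SYNC also requires metadata to be flushed on a write."""
    return bool(flags & O_SYNC)

def flush_dirty_before_read(flags: int) -> bool:
    """A read must flush dirty data first if the file was opened O_RSYNC."""
    return bool(flags & O_RSYNC)

if __name__ == "__main__":
    for name, flags in (("O_SYNC", O_SYNC), ("O_DSYNC", O_DSYNC), ("O_RSYNC", O_RSYNC)):
        print(name,
              flush_data_after_write(flags),
              flush_metadata_after_write(flags),
              flush_dirty_before_read(flags))
```

Keeping the flags on the file handle, as the commit describes, is what makes these checks possible at read and write time.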
Sage Weil
b4e0f7ca72 Merge remote-tracking branch 'gh/wip-client-pool-api'
Reviewed-by: Sage Weil <sage@inktank.com>
2013-01-18 13:31:15 -08:00
Josh Durgin
045af95908 qa: remove xfstest 068 from qemu testing
This tests fsfreeze, which sometimes hangs in xfs in linux 3.2

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-01-18 12:21:19 -08:00
Dan Mick
1f911fd061 ceph: allow osd pool get to get everything you can set
osd pool get was missing size, min_size, crash_replay_interval,
and crush_ruleset; they're all easily added.

Fixes: #3869
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-01-18 12:42:16 -08:00
Sage Weil
49726dcf97 os/FileStore: only flush inline if write is sufficiently large
Honor filestore_flush_min in the inline flush case.

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-01-18 12:14:48 -08:00
Sage Weil
8ddb55d34c os/FileStore: fix compile when sync_file_range is missing;
If sync_file_range is not present, we always close inline, and flush
via fdatasync(2).

Fixes compile on ancient platforms like RHEL5.8.

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-01-18 12:14:40 -08:00
Sage Weil
b8d5e28651 doc/rados/operations/crush: need kernel v3.6 for first round of tunables
Reported-by: rl219 in #ceph on irc.oftc.net
Signed-off-by: Sage Weil <sage@inktank.com>
2013-01-18 11:05:03 -08:00
Noah Watkins
736966f38b java: support get pool id/replication interface
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2013-01-18 10:35:44 -08:00
Noah Watkins
40415d1c2f libcephfs: add pool id/size lookup interface
Adds new interfaces ceph_get_pool_id() and ceph_get_pool_replication()
to libcephfs.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
2013-01-18 10:33:50 -08:00
John Wilkins
76e715ba8f doc: Added link to rotation section.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-01-18 00:25:28 -08:00
John Wilkins
e1741ba602 doc: Added hyperlink to log rotation section.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-01-18 00:25:08 -08:00
John Wilkins
612717af9b doc: Added section on log rotation.
fixes: #3776

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-01-18 00:24:22 -08:00
John Wilkins
48f414686e Merge branch 'master' of https://github.com/ceph/ceph
2013-01-17 23:33:06 -08:00
John Wilkins
83326588c7 doc: Modified index to include mon-osd-interaction.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-01-17 23:32:26 -08:00
John Wilkins
d6fc92dfae doc: Added a section describing mon/osd interaction.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-01-17 23:31:47 -08:00
Gary Lowell
bebdc70b42 build: Add perl installation dependency to rpm and debian packages.
There was already a dependency on python in the debian control file;
a similar dependency was added to the rpm spec file.  perl is needed
for the logrotate script, so a dependency on perl was added to
both. Bug 3768.

Signed-off-by: Gary Lowell  <gary.lowell@inktank.com>
2013-01-17 22:43:07 -08:00