19279 Commits

Author SHA1 Message Date
Sage Weil
2673875f0c mon: do not mark booting osds in if NOIN flag is set
If the NOIN osdmap flag is set, do not mark booting osds in.  Normally
we would for a range of reasons (always, new, auto-marked-out), but block
them all.

Do not limit manual 'ceph osd in N' commands.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-24 15:47:40 -07:00
Sage Weil
9ff535ad2c mon: always remove booting osds from down_pending_out
The down_pending_out tracks OSDs that are down that we may want to
auto-mark out.  If an osd boots, it should be removed from this list
because it is no longer down; it doesn't matter whether it is marked in
or not.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-24 15:28:36 -07:00
Sage Weil
addfb2c670 mon: prevent osd mark-down with NODOWN flag
If the NODOWN osdmap flag is set,

 - ignore osd failure reports
 - do not mark osds down due to lack of osd/pg stats

We *do* still allow explicit admin 'ceph osd down N' commands, and a
booting OSD to mark the previous instance of itself down.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-24 14:31:42 -07:00
Sage Weil
f0773863ce osd: do not attempt to boot if NOUP
If NOUP is set, do not send the boot message.

We already send onetime subscriptions to the osdmap, so we will find out
about osdmap flag changes.  If it is cleared later, we'll pass into
start_boot() and _got_boot_version() again and send it then.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-24 14:31:42 -07:00
Sage Weil
d84255ecba mon: prevent osd from booting if NOUP
Do not add an osd attempting to boot to the map if NOUP is set.  Instead,
send it the latest osdmap so it knows that it's not allowed to boot.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-24 14:16:42 -07:00
Sage Weil
d3f55dd917 mon: 'osd {set,unset} {noin,noout,noup,nodown}'
Move the set/unset flag code into a helper, and also use that for the
pause/unpause commands.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-24 14:10:47 -07:00
Sage Weil
6003325d24 osdmap: add NOUP, NODOWN, NOIN, NOOUT flags
These prevent OSDs from being marked up, down, in, or out, respectively.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-24 14:10:46 -07:00
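
The four flags above behave like bits in a flag word that the monitor consults before changing an OSD's state. A minimal sketch of that idea, assuming made-up bit values and hypothetical helper names (set_flag, unset_flag, may_mark_in_on_boot); it is not Ceph's actual code or its real constants:

```python
# Illustrative bit values only; these are NOT Ceph's real constants.
NOUP = 1 << 0    # do not mark OSDs up
NODOWN = 1 << 1  # do not mark OSDs down
NOIN = 1 << 2    # do not mark OSDs in
NOOUT = 1 << 3   # do not mark OSDs out


def set_flag(flags, flag):
    """Return the flag word with `flag` set (hypothetical helper)."""
    return flags | flag


def unset_flag(flags, flag):
    """Return the flag word with `flag` cleared (hypothetical helper)."""
    return flags & ~flag


def may_mark_in_on_boot(flags):
    """A booting OSD is only auto-marked in when NOIN is not set."""
    return not (flags & NOIN)


flags = 0
flags = set_flag(flags, NOIN)
blocked = not may_mark_in_on_boot(flags)   # NOIN set: booting OSD stays out
flags = unset_flag(flags, NOIN)
allowed = may_mark_in_on_boot(flags)       # NOIN cleared: normal behavior
```

Using one shared pair of set/unset helpers mirrors the later commit that moves the flag code into a helper reused by pause/unpause.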
Josh Durgin
34ef3f3765 Merge remote branch 'origin/wip-rbd-snapid' into next
Reviewed-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-24 13:59:09 -07:00
Sage Weil
e51772ca19 librbd: pass errors removing head back to user
In particular, the OSD may return EBUSY if there are still watchers.
Ignore ENOENT, as that may indicate we are cleaning up a previously
aborted removal.

Fixes: #2311
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-24 13:39:38 -07:00
Sage Weil
ccf7d9309e mon: clean up handle_osd_timeouts a bit
Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-24 10:55:22 -07:00
Sage Weil
36ffed45e8 mon: fix pg stats timeout
We clear out the osd entry when an osd goes up or down.  Thus, if we find
it missing from an up osd, we should start the timer.  Otherwise we get
behavior like this

2012-04-24 13:22:47.888291 7fa5bc587700 mon.peon5752@0(leader).osd e21633 OSDMonitor::handle_osd_timeouts: never got MOSDPGStat info from osd 521. Marking down!
2012-04-24 13:22:50.076394 7fa5bcd88700 log [INF] : osd.521 [2607:f298:4:2243::7088]:6806/53217 boot
2012-04-24 13:22:52.903558 7fa5bc587700 mon.peon5752@0(leader).osd e21638 OSDMonitor::handle_osd_timeouts: never got MOSDPGStat info from osd 521. Marking down!
2012-04-24 13:23:15.144532 7fa5bcd88700 log [INF] : osd.521 [2607:f298:4:2243::7088]:6806/53217 boot
2012-04-24 13:23:17.967118 7fa5bc587700 mon.peon5752@0(leader).osd e21663 OSDMonitor::handle_osd_timeouts: never got MOSDPGStat info from osd 521. Marking down!
2012-04-24 13:23:22.173778 7fa5bcd88700 log [INF] : osd.521 [2607:f298:4:2243::7088]:6806/53217 boot
2012-04-24 13:23:22.981556 7fa5bc587700 mon.peon5752@0(leader).osd e21668 OSDMonitor::handle_osd_timeouts: never got MOSDPGStat info from osd 521. Marking down!
2012-04-24 13:23:45.245380 7fa5bcd88700 log [INF] : osd.521 [2607:f298:4:2243::7088]:6806/53217 boot

when the pg stats message doesn't arrive quickly enough.

Fixes: #2341
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-04-24 10:55:18 -07:00
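
The fix described above can be modeled roughly as follows: a missing entry for an up OSD starts the timer instead of counting as a timeout. This is a simplified sketch under assumed names (handle_osd_timeouts here is a stand-in Python function, and last_report and timeout are hypothetical); it is not the monitor's real implementation:

```python
def handle_osd_timeouts(up_osds, last_report, now, timeout):
    """Return the up OSDs whose pg stats are overdue.

    last_report maps osd id -> time of the last stats message (or the time
    the timer was started). Entries are assumed to be cleared elsewhere
    whenever an OSD goes up or down.
    """
    to_mark_down = []
    for osd in sorted(up_osds):
        if osd not in last_report:
            # No entry for an up OSD: it just changed state, so start the
            # timer now rather than treating it as already timed out.
            last_report[osd] = now
        elif now - last_report[osd] > timeout:
            to_mark_down.append(osd)
    return to_mark_down


last_report = {}
first = handle_osd_timeouts({521}, last_report, now=100.0, timeout=30.0)
later = handle_osd_timeouts({521}, last_report, now=131.0, timeout=30.0)
```

In this model the first pass only arms the timer for osd 521; the OSD is reported as overdue only if no stats arrive within the timeout after that.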
Sage Weil
7b832f4266 mon: fix whitespace
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-24 10:49:30 -07:00
Greg Farnum
2b302015e7 mon: fix pgmonitor ratio commands
The indices were set incorrectly when I whipped this up. That's what
you get for not testing nor being careful enough in review.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2012-04-24 10:35:32 -07:00
Josh Durgin
d28f850fe1 test_rbd: add tests for snap_set and more complicated resizing
* snap_set to a deleted (and recreated) snapshot
* resizing down (truncating) and back up
* resizing to non-object-aligned sizes

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2012-04-24 08:57:31 -07:00
Josh Durgin
7add136f90 librbd: reset needs_refresh flag before re-reading header
This way we can't miss an update if we get a notify during ictx_refresh.
Specifically, a race like this:

Thread 1               Thread 2              Process 2

ictx_refresh()
read_header()
                                             snap_create()
                       notify()
                       need_refresh = true
process header...
need_refresh = false

If this happened, we would not re-read the header with the new
snapshot, so the snapshot would not happen at the intended point
in time, but only after we re-read the header again.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2012-04-24 08:57:31 -07:00
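
The ordering fix above can be illustrated with a small deterministic model: the flag is cleared before the header is read, so a notify that arrives mid-refresh leaves it set. The ImageCtx class, its store dict, and the on_header_read hook below are hypothetical stand-ins for illustration, not librbd's API:

```python
class ImageCtx:
    """Toy image context; `store` stands in for the on-disk header."""

    def __init__(self, store):
        self.store = store
        self.needs_refresh = True
        self.snaps = []

    def notify(self):
        # Called when another client changes the header.
        self.needs_refresh = True

    def refresh(self, on_header_read=None):
        # Clear the flag BEFORE reading, so a notify that lands during
        # the refresh is not overwritten afterwards.
        self.needs_refresh = False
        header = list(self.store["snaps"])
        if on_header_read is not None:
            # Hook simulating concurrent activity between the read and
            # the processing of the header.
            on_header_read()
        self.snaps = header


store = {"snaps": []}
ictx = ImageCtx(store)


def racing_snap_create():
    store["snaps"].append("snap1")  # another process creates a snapshot
    ictx.notify()                   # and the notify arrives mid-refresh


ictx.refresh(on_header_read=racing_snap_create)
stale_but_flagged = (ictx.snaps == [] and ictx.needs_refresh)
ictx.refresh()
```

After the first refresh the cached snapshot list is stale, but the flag is still set, so the next refresh picks up the new snapshot.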
Josh Durgin
3ef3ab8a15 librbd: clean up snapshot handling a bit
* snapid should determine whether our mapped snapshot is gone, not snapname
* snap_set(<nonexistent_snap>) shouldn't reset us to CEPH_NOSNAP
* snapname should be set before using it in the perfcounter name
* snapname and image name don't need to be passed as arguments since an
  ImageCtx already contains that info
* ictx_check() doesn't need to check for non-existent snaps - only I/Os care,
  so check in check_io() instead

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2012-04-24 08:57:31 -07:00
Josh Durgin
e17b5a85be librbd: clarify handle_sparse_read condition
The earlier condition is >. != means < at this point, and the nesting
is unnecessary.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2012-04-24 08:57:31 -07:00
Sage Weil
2bdaba4f01 run_seed_to.sh: rework the script, make it more flexible and broaden the tests.
Allow for '-h' and other options, such as disabling the journal sync tests,
specifying that the script runs on a btrfs FS, and enabling exit on error
(default is now 'off'). Also allow certain env variables to specify
additional options to each store.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-23 20:31:02 -07:00
Sage Weil
e65b797164 librbd: rev version for discard addition
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-23 19:43:25 -07:00
Sage Weil
19ba34753e osdmaptool: fix clitests for lack of localized pgs
Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-23 14:48:02 -07:00
Sage Weil
637de4d762 mon: load CompatSet features on startup
Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-23 14:48:02 -07:00
Sage Weil
771fd05b02 mon: set auid for mon-created pools to 0
Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-23 14:48:02 -07:00
Sage Weil
a51434445a mon: ignore/remove localized pgs
This will trigger on the next OSDMap update.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-23 14:48:02 -07:00
Sage Weil
f01b6dd54f test_ioctls: remove preferred osd
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-23 14:48:02 -07:00
Sage Weil
b3cdc21a08 cephfs: remove preferred osd setting
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-23 14:48:02 -07:00
Sage Weil
198544ad85 mds: remove preferred from ceph_file_layout
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-23 14:48:02 -07:00
Sage Weil
21ef979550 client: rip out preferred_pg thing
This wasn't even named properly.  Blech.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-23 14:48:01 -07:00
Sage Weil
6d1344c3f6 libcephfs: disable ceph_set_default_preferred_pg
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-23 14:48:01 -07:00
Sage Weil
f164b87785 osdmap: do not forcefeed preferred osd to crush
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-23 14:48:01 -07:00
Sage Weil
b8f4acfb29 osd: remove preferred from object_locator_t
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-23 14:48:01 -07:00
Sage Weil
0138a76491 osd: ignore localized pgs
- do not load them on startup
- ignore any we hear about over the wire

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-23 14:48:01 -07:00
Sage Weil
94adf5d969 osd: remove localized pgs from pg_pool_t
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-23 14:48:01 -07:00
Sage Weil
0777613654 Merge remote-tracking branch 'gh/wip-discard'
2012-04-23 13:58:34 -07:00
Sage Weil
43d1a9201c run_seed_to.sh: remove stray arg
This crept in in commit d1740bd586.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-23 09:30:50 -07:00
Joao Eduardo Luis
0112e74915 run_seed_to.sh: rework the script, make it more flexible and broaden the tests.
Allow for '-h' and other options, such as disabling the journal sync tests,
specifying that the script runs on a btrfs FS, and enabling exit on error
(default is now 'off'). Also allow certain env variables to specify
additional options to each store.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-23 13:19:39 +01:00
Sage Weil
e9ecd1b384 perfcounters: tolerate multiple loggers with the same name
Make them unique by appending -<ptr>, so that the json we dump will remain
valid.

We may also want to allow people to share counters of the same type.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-22 14:23:52 -07:00
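
The name-uniquifying idea above can be sketched as follows: when a logger name is already registered, a pointer-like suffix is appended so the dumped JSON never has duplicate keys. The PerfCountersCollection class and its add method here are hypothetical, and Python's id() stands in for the pointer value:

```python
import json


class PerfCountersCollection:
    """Toy registry of named loggers (hypothetical, for illustration)."""

    def __init__(self):
        self.loggers = {}

    def add(self, name, logger):
        if name in self.loggers:
            # Name clash: append -<ptr> so the key stays unique.
            name = f"{name}-{id(logger):#x}"
        self.loggers[name] = logger
        return name


coll = PerfCountersCollection()
first_logger, second_logger = object(), object()
name_a = coll.add("librbd", first_logger)
name_b = coll.add("librbd", second_logger)

# Every logger gets its own key, so the dump keeps both entries.
dump = json.dumps({name: {} for name in coll.loggers})
```

The first logger keeps the plain name; only later loggers with the same name get the suffix.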
Sage Weil
be438d6e4e Merge branch 'master' into wip-discard
2012-04-21 21:01:49 -07:00
Sage Weil
97f507ffb7 Makefile: disable format-security warning
The prt() varargs function generates this warning

test/rbd/fsx.c: In function ‘prt’:
warning: test/rbd/fsx.c:203:2: format not a string literal and no format arguments [-Wformat-security]
warning: test/rbd/fsx.c:205:3: format not a string literal and no format arguments [-Wformat-security]

Disable that check for the fsx build only.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-21 20:28:45 -07:00
Sage Weil
c8377e466c filestore: verify that fiemap works
Check for a bug present in older versions of ext4.  If present, disable
FIEMAP.  See #2328.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-21 14:31:42 -07:00
Sage Weil
7471a9b1e2 rados: fix error printout for mapext
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-21 13:32:46 -07:00
Sage Weil
07ddff4271 librbd: instrument with perfcounters
Track IO operations on a per-image basis.

Implements: #1451
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-21 12:50:05 -07:00
Sage Weil
fb9fdf45b0 librbd: fix ictx_check pointer weirdness by using std::string
I was seeing failures of LibRBD.TestIOToSnapshot where we would fail to
refresh after rollback, even though the snap existed.  I assume it is
because the std::string whose c_str() we were pointing to was reallocated.

Use a std::string here instead.

This code is weird.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-20 17:13:08 -07:00
Samuel Just
888a082f23 FileJournal: don't wait flusher until completions are queued
Fixes: #2324
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2012-04-20 17:01:21 -07:00
Sage Weil
d1740bd586 filestore: fix collection_add journal replay problem
In collection_add we have a two-phase guard set on the linked object via
the old name.  During replay, we might see that the dest name is missing
and replay the operation, and in the process overwrite a newer guard with
an older one.

Avoid this by checking the source name too, and skipping the operation
entirely if a new guard exists.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-20 16:56:57 -07:00
Sage Weil
92b299afc5 FileStoreDiff: flip sense of diff*() methods around
true means diff, false means same.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2012-04-20 16:56:06 -07:00
Sage Weil
ca369c98a3 test_idempotent_sequence: Use FileStoreDiff class instead.
Use FileStoreDiff instead of having the diff code embedded in the test,
allowing for more tests and people to use the code in case it comes in
handy.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-20 16:55:55 -07:00
Joao Eduardo Luis
5466ebc776 test_idempotent_sequence: Output missing options on "usage".
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-20 16:51:52 -07:00
Joao Eduardo Luis
474612918e FileStoreDiff: check if two FileStores match.
This code should be in a stand-alone class, instead of being embedded in
a single test, in case someone or something finds it useful somewhere down
the line.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
2012-04-20 16:51:40 -07:00
Sage Weil
4ddbbf5467 librbd: allow image resize to non-block boundaries
The caller is still invalidating the entire cache, so we don't need to
deal with discard at this level.  That might be worth cleaning up
later, though.

Fixes: #2296
Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-20 16:51:01 -07:00
Sage Weil
165038d589 objectcacher: rename truncate_set -> discard_set, and use discard
Do not assume the object extents are at the trailing edge of objects.
Instead, discard arbitrary extents.  Fix callers.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-04-20 16:51:01 -07:00