Commit Graph

20204 Commits

Author SHA1 Message Date
Pascal de Bruijn | Unilogic Networks B.V
96587f39e3 Robustify ceph-rbdnamer and adapt udev rules
Below is a patch which makes the ceph-rbdnamer script more robust and
fixes a problem with the rbd udev rules.

On our setup we encountered a symlink which was linked to the wrong rbd:

  /dev/rbd/mypool/myrbd -> /dev/rbd1

While that link should have gone to /dev/rbd3 (on which a
partition /dev/rbd3p1 was present).

The old udev rule passes %n to the ceph-rbdnamer script. The problem
with %n is that it yields 3 for rbd3 but 1 for rbd3p1, so it can't be
depended upon for rbd naming.

In the patch below the ceph-rbdnamer script is made more robust and it
can now be called in various ways:

  /usr/bin/ceph-rbdnamer /dev/rbd3
  /usr/bin/ceph-rbdnamer /dev/rbd3p1
  /usr/bin/ceph-rbdnamer rbd3
  /usr/bin/ceph-rbdnamer rbd3p1
  /usr/bin/ceph-rbdnamer 3

Even with all these different styles of calling the modified script, it
should now return the same rbdname. This change "has" to be combined
with calling it from udev with %k though.
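
The argument handling described above can be sketched roughly as follows.
This is only an illustration in Python, not the actual ceph-rbdnamer shell
script; the function name rbd_device_id and the regular expression are
assumptions made for the example.

```python
import re

# Hypothetical pattern: optional "/dev/" prefix, optional "rbd" prefix,
# the device number, and an optional "pN" partition suffix.
_RBD_ARG = re.compile(r"(?:/dev/)?(?:rbd)?(\d+)(?:p\d+)?")


def rbd_device_id(arg):
    """Return the rbd device number (as a string) for any accepted spelling."""
    match = _RBD_ARG.fullmatch(arg)
    if match is None:
        raise ValueError("not an rbd device reference: %r" % (arg,))
    return match.group(1)


if __name__ == "__main__":
    for arg in ["/dev/rbd3", "/dev/rbd3p1", "rbd3", "rbd3p1", "3"]:
        print(arg, "->", rbd_device_id(arg))
```

Every spelling listed above maps to the same device number, which is what
lets the script return the same rbdname regardless of how it is called.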

With that fixed, we hit the second problem. We ended up with:

  /dev/rbd/mypool/myrbd -> /dev/rbd3p1

So the rbdname was symlinked to the partition on the rbd instead of the
rbd itself. What probably went wrong is that udev discovered the disk
and ran ceph-rbdnamer, which resolved it to myrbd, so the following
symlink was created:

  /dev/rbd/mypool/myrbd -> /dev/rbd3

However partitions would be discovered next and ceph-rbdnamer would be
run with rbd3p1 (%k) as parameter, resulting in the name myrbd too, with
the previous correct symlink being overwritten with a faulty one:

  /dev/rbd/mypool/myrbd -> /dev/rbd3p1

The solution to the problem is in differentiating between disks and
partitions in udev and handling them slightly differently. So with the
patch below partitions now get their own symlinks in the following style
(which is fairly consistent with other udev rules):

  /dev/rbd/mypool/myrbd-part1 -> /dev/rbd3p1
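
A minimal sketch of the disk/partition distinction, assuming the kernel
name (%k) has the form rbdN or rbdNpM. The helper rbd_symlink is
hypothetical and written in Python for illustration; the real logic lives
in the udev rules.

```python
import re


def rbd_symlink(kernel_name, pool, image):
    """Build the /dev/rbd/<pool>/<image> link path for a disk or a partition.

    Hypothetical helper: disks get the plain image name, partitions get a
    "-partN" suffix so they never overwrite the disk's symlink.
    """
    match = re.fullmatch(r"rbd\d+(?:p(\d+))?", kernel_name)
    if match is None:
        raise ValueError("unexpected kernel name: %r" % (kernel_name,))
    link = "/dev/rbd/%s/%s" % (pool, image)
    partition = match.group(1)
    if partition is not None:
        link += "-part" + partition
    return link


if __name__ == "__main__":
    print(rbd_symlink("rbd3", "mypool", "myrbd"))
    print(rbd_symlink("rbd3p1", "mypool", "myrbd"))
```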

Please let me know any feedback you have on this patch or the approach
used.

Regards,
Pascal de Bruijn
Unilogic B.V.

Signed-off-by: Pascal de Bruijn <pascal@unilogicnetworks.net>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-07-16 17:34:22 -07:00
Sage Weil
52f96b9fd1 log: apply log_level to stderr/syslog logic
In non-crash situations, we want to make sure the message is both below the
syslog/stderr threshold and also below the normal log threshold.  Otherwise
we get anything we gather on those channels, even when the log level is
low.
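
A rough sketch of the condition described above, assuming that lower
numbers mean higher priority (so "below the threshold" is prio <=
threshold) and that in a crash dump only the channel threshold applies.
The function and parameter names are made up for this example and are not
taken from the log code.

```python
def should_emit_to_channel(prio, channel_threshold, log_level, crash=False):
    """Decide whether a message goes to stderr/syslog (hypothetical sketch)."""
    if prio > channel_threshold:
        # Above the stderr/syslog threshold: never emitted on this channel.
        return False
    # Non-crash: the message must also pass the normal log threshold.
    # Crash: assumed to bypass the normal log threshold.
    return crash or prio <= log_level


if __name__ == "__main__":
    # Gathered at prio 5, log level 1: stays off stderr unless crashing.
    print(should_emit_to_channel(5, channel_threshold=10, log_level=1))
    print(should_emit_to_channel(5, channel_threshold=10, log_level=1, crash=True))
```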

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-16 16:02:14 -07:00
Sage Weil
64f745008b log: fix event gather condition
We should gather an event if it is below the log or gather threshold.

Previously we were only gathering if we were going to print it, which makes
the dump no more useful than what was already logged.
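
The difference between the old and new gather conditions can be sketched
like this, assuming lower numbers mean higher priority. The names
should_gather_old and should_gather are illustrative only.

```python
def should_gather_old(prio, log_level, gather_level):
    """Old behavior as described: gather only what would be printed."""
    return prio <= log_level


def should_gather(prio, log_level, gather_level):
    """New behavior: gather if below either the log or the gather threshold."""
    return prio <= log_level or prio <= gather_level


if __name__ == "__main__":
    # A prio-10 event with log level 1 and gather level 20 is now kept
    # for the dump even though it is not printed.
    print(should_gather_old(10, log_level=1, gather_level=20))
    print(should_gather(10, log_level=1, gather_level=20))
```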

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-16 15:36:44 -07:00
Samuel Just
c7fb964c07 PG::RecoveryState::Stray::react(LogEvt&): reset last_pg_scrub
We need to reset the last_pg_scrub data in the osd since we
are replacing the info.

Probably fixes #2453

In cases like 2453, we hit the following backtrace:

     0> 2012-05-19 17:24:09.113684 7fe66be3d700 -1 osd/OSD.h: In function 'void OSD::unreg_last_pg_scrub(pg_t, utime_t)' thread 7fe66be3d700 time 2012-05-19 17:24:09.095719
osd/OSD.h: 840: FAILED assert(last_scrub_pg.count(p))

 ceph version 0.46-313-g4277d4d (commit:4277d4d3378dde4264e2b8d211371569219c6e4b)
 1: (OSD::unreg_last_pg_scrub(pg_t, utime_t)+0x149) [0x641f49]
 2: (PG::proc_primary_info(ObjectStore::Transaction&, pg_info_t const&)+0x5e) [0x63383e]
 3: (PG::RecoveryState::ReplicaActive::react(PG::RecoveryState::MInfoRec const&)+0x4a) [0x633eda]
 4: (boost::statechart::detail::reaction_result boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list3<boost::statechart::custom_reaction<PG::RecoveryState::MQuery>, boost::statechart::custom_reaction<PG::RecoveryState::MInfoRec>, boost::statechart::custom_reaction<PG::RecoveryState::MLogRec> >, boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*)+0x130) [0x6466a0]
 5: (boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x81) [0x646791]
 6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x63dfcb]
 7: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x11) [0x63e0f1]
 8: (PG::RecoveryState::handle_info(int, pg_info_t&, PG::RecoveryCtx*)+0x177) [0x616987]
 9: (OSD::handle_pg_info(std::tr1::shared_ptr<OpRequest>)+0x665) [0x5d3d15]
 10: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x2a0) [0x5d7370]
 11: (OSD::_dispatch(Message*)+0x191) [0x5dd4a1]
 12: (OSD::ms_dispatch(Message*)+0x153) [0x5ddda3]
 13: (SimpleMessenger::dispatch_entry()+0x863) [0x77fbc3]
 14: (SimpleMessenger::DispatchThread::entry()+0xd) [0x746c5d]
 15: (()+0x7efc) [0x7fe679b1fefc]
 16: (clone()+0x6d) [0x7fe67815089d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Because we don't clear the scrub state before resetting info,
the last_scrub_stamp state in the info.history structure
changes without updating the osd state resulting in the
above assert failure.

Backport: stable

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-07-16 14:07:49 -07:00
Sage Weil
b7814dbefb osd: base misdirected op role calc on acting set
We want to look at the acting set here, nothing else.  This was causing us
to erroneously queue ops for later (wasting memory) and to erroneously
print out a 'misdirected op' message in the cluster log (confusion and
incorrect [but ignored] -ENXIO reply).

Fixes: #2022
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-16 10:57:33 -07:00
Sage Weil
14d2efc438 mon/MonitorStore: always O_TRUNC when writing states
It is possible for a .new file to already exist, potentially with a
larger size.  This would happen if:

 - we were proposing a different value
 - we crashed (or were stopped) before it got renamed into place
 - after restarting, a different value was proposed and accepted.

This isn't so unlikely for the log state machine, where we're
aggregating random messages.  O_TRUNC ensures we avoid getting the tail
end of some previous junk.
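
The failure mode can be reproduced with a small sketch. This is not the
MonitorStore code; write_state, read_state, and the file name state.new
are assumptions for the example, which uses Python's os.open to show what
happens to a leftover, larger .new file with and without O_TRUNC.

```python
import os
import tempfile


def write_state(path, data, truncate):
    """Write data at offset 0, optionally truncating any existing file."""
    flags = os.O_WRONLY | os.O_CREAT
    if truncate:
        flags |= os.O_TRUNC
    fd = os.open(path, flags, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def read_state(path):
    with open(path, "rb") as f:
        return f.read()


def demo():
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "state.new")
        # Leftover from a proposal that was never renamed into place.
        write_state(path, b"AAAAAAAAAA", truncate=True)
        # A shorter value written without O_TRUNC keeps the old tail.
        write_state(path, b"BBB", truncate=False)
        stale = read_state(path)
        # With O_TRUNC the file holds only the new value.
        write_state(path, b"BBB", truncate=True)
        clean = read_state(path)
    return stale, clean


if __name__ == "__main__":
    print(demo())
```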

I observed #2593 and found that a logm state value had a larger size on
one mon (after slurping) than the others, pointing to put_bl_sn_map().

While we are at it, O_TRUNC put_int() too; the same type of bug is
possible there, too.

Fixes: #2593
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-16 10:57:08 -07:00
Josh Durgin
5a5597f6c5 qa: download tests from specified branch
These Python tests aren't installed, so they need to be downloaded.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2012-07-13 13:35:07 -07:00
Yehuda Sadeh
f33c0bee28 rgw: don't override subuser perm mask if perm not specified
Bug #2650. We were overriding subuser perm mask whenever subuser
was modified, even if perm mask was not passed.
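
The intended behavior can be sketched as below. The dictionary layout and
the modify_subuser helper are hypothetical and do not reflect the rgw
data structures; the point is only that an unspecified perm mask leaves
the stored one untouched.

```python
def modify_subuser(subuser, name=None, perm_mask=None):
    """Return an updated copy, changing only the fields that were passed."""
    updated = dict(subuser)
    if name is not None:
        updated["name"] = name
    if perm_mask is not None:
        # Only override the mask when the caller actually supplied one.
        updated["perm_mask"] = perm_mask
    return updated


if __name__ == "__main__":
    original = {"name": "swift", "perm_mask": 0x3}
    print(modify_subuser(original, name="swift2"))
```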

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-12 09:41:04 -07:00
James Page
65c43e341b debian: fix ceph-fs-common-dbg depends
Signed-off-by: James Page <james.page@ubuntu.com>
2012-07-12 06:58:56 -07:00
Sage Weil
99a048d882 rados: more usage cleanup
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-11 18:54:30 -07:00
Dan Mick
0081c8e420 rados: usage message
Bad linebreaks, wrapping, stringification, missing doc for bench args

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2012-07-11 18:53:35 -07:00
Yehuda Sadeh
173d592a4e rados tool: remove -t param option for target pool
Bug #2772. This fixes an issue that was introduced when we
added the 'rados cp' command. The -t param was already used
for rados bench. With this change the only way to specify
a target pool is using --target-pool.
Though this problem is post argonaut, the 'rados cp' command
has been backported, so we need this fix there too.
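
The option clash can be illustrated with a small argparse sketch. This is
not the rados tool's argument parser (which is C++); the program name
rados-sketch and the destination names are assumptions. It only shows -t
being reserved for bench while the target pool is reachable solely
through the long option.

```python
import argparse


def build_parser():
    # allow_abbrev=False so that only the full --target-pool spelling works.
    parser = argparse.ArgumentParser(prog="rados-sketch", allow_abbrev=False)
    # -t stays reserved for the value used by rados bench.
    parser.add_argument("-t", dest="bench_t", type=int, default=None)
    # The target pool has no short form, so it cannot collide with -t.
    parser.add_argument("--target-pool", dest="target_pool", default=None)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-t", "16", "--target-pool", "dst"])
    print(args.bench_t, args.target_pool)
```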

Backport: argonaut

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-11 17:11:15 -07:00
Sage Weil
2c001b28fb Makefile: don't install crush headers
This is leftover from when we built a libcrush.so.  We can re-add when we
start doing that again.

Reported-by: Laszlo Boszormenyi <gcs@debian.hu>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-11 09:19:16 -07:00
Sage Weil
fa96e19f4d Merge branch 'stable' into next 2012-07-10 18:21:29 -07:00
Sage Weil
0f917c2f14 osd: guard class call decoding
Backport: argonaut
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 18:21:06 -07:00
Sage Weil
0ff6c97983 test_stress_watch: just one librados instance
This was creating a new cluster connection/session per iteration, and
along with it a few service threads and sockets and so forth.

Unfortunately, librados leaks like a sieve, starting with CephContext
and ceph::crypto::init().  See #845 and #2067.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 18:21:00 -07:00
Samuel Just
ee1c029da4 ReplicatedPG: don't warn if backfill peer stats don't match
pinfo.stats might be wrong if we did log-based recovery on the
backfilled portion in addition to continuing backfill.

bug #2750

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-07-10 18:19:57 -07:00
Sage Weil
d3c97dae78 librados: take lock when signaling notify cond
When we are signaling the cond to indicate that a notify is complete,
take the appropriate lock.  This removes the possibility of a race
that loses our signal.  (That would be very difficult given that there
are network round trips involved, but this makes the lock/cond usage
"correct.")

Signed-off-by: Sage Weil <sage@inktank.com>
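
The locking pattern from the message above, sketched with Python's
threading module rather than the librados Mutex/Cond classes. The
NotifyWaiter class is hypothetical. Setting the flag and signaling while
holding the lock means a waiter cannot check the flag, miss the signal,
and then sleep.

```python
import threading


class NotifyWaiter:
    """Hypothetical stand-in for a notify-completion waiter."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cond = threading.Condition(self._lock)
        self._done = False

    def complete(self):
        # Take the lock before changing state and signaling.
        with self._lock:
            self._done = True
            self._cond.notify_all()

    def wait(self, timeout=None):
        # The predicate is re-checked under the same lock.
        with self._lock:
            return self._cond.wait_for(lambda: self._done, timeout)


if __name__ == "__main__":
    waiter = NotifyWaiter()
    signaler = threading.Thread(target=waiter.complete)
    signaler.start()
    print(waiter.wait(timeout=5))
    signaler.join()
```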
2012-07-10 18:18:28 -07:00
Sage Weil
ec490d878d client: fix locking for SafeCond users
Need to wait on flock, not client_lock.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-10 18:17:43 -07:00
Sage Weil
b387077b1d debian: include librados-config in librados-dev
Reported-by: Laszlo Boszormenyi <gcs@debian.hu>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-08 20:33:12 -07:00
Sage Weil
03c2dc244a lockdep: increase max locks
Hit this limit with the rados api tests.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-06 16:45:29 -07:00
Sage Weil
b554d112c1 config: add unlocked version of get_my_sections; use it internally
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-06 16:45:24 -07:00
Sage Weil
01da287b8f config: fix lock recursion in get_val_from_conf_file()
Introduce a private, already-locked version.
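
A minimal sketch of the "already-locked helper" pattern, using a
non-recursive threading.Lock. The Config class and its method names are
illustrative and are not the md_config_t API.

```python
import threading


class Config:
    """Hypothetical config holder guarded by a non-recursive lock."""

    def __init__(self, values):
        self._lock = threading.Lock()
        self._values = dict(values)

    def get_val(self, key):
        # Public entry point: takes the lock, then delegates.
        with self._lock:
            return self._get_val_locked(key)

    def _get_val_locked(self, key):
        # Private helper: the caller must already hold self._lock.
        return self._values.get(key)

    def get_vals(self, keys):
        # Calling get_val() here would deadlock on the non-recursive lock,
        # so the already-locked helper is used instead.
        with self._lock:
            return [self._get_val_locked(key) for key in keys]


if __name__ == "__main__":
    conf = Config({"a": "1", "b": "2"})
    print(conf.get_val("a"), conf.get_vals(["a", "b", "c"]))
```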

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-06 16:45:08 -07:00
Sage Weil
c73c64a0f7 config: fix recursive lock in parse_config_files()
The _impl() helper is only called from parse_config_files(); don't retake
the lock.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-06 16:45:05 -07:00
Yehuda Sadeh
97c1562dda rgw: handle response-* params
Handle response-* params that set response header field values.
Fixes #2734, #2735.
Backport: argonaut
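
As an illustration only, the mapping from response-* query parameters to
response headers might look like the sketch below. The parameter names
are the ones defined by the S3 GET Object API; whether this commit
handles all six is an assumption, and response_header_overrides is a
hypothetical helper, not rgw code.

```python
# S3 query parameter -> response header it overrides.
RESPONSE_PARAMS = {
    "response-content-type": "Content-Type",
    "response-content-language": "Content-Language",
    "response-expires": "Expires",
    "response-cache-control": "Cache-Control",
    "response-content-disposition": "Content-Disposition",
    "response-content-encoding": "Content-Encoding",
}


def response_header_overrides(query):
    """Pick the header overrides out of a dict of query parameters."""
    return {
        header: query[param]
        for param, header in RESPONSE_PARAMS.items()
        if param in query
    }


if __name__ == "__main__":
    print(response_header_overrides({"response-content-type": "text/plain"}))
```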

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-06 16:43:57 -07:00
Sage Weil
6646e891ff rgw: initialize fields of RGWObjEnt
This fixes various valgrind warnings triggered by the s3test
test_object_create_unreadable.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-06 16:43:07 -07:00
Yehuda Sadeh
b33553aae6 rgw: handle response-* params
Handle response-* params that set response header field values.
Fixes #2734, #2735.
Backport: argonaut

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-06 16:44:58 -07:00
Sage Weil
74f687501a osd: add missing formatter close_section() to scrub status
Also add braces to make the open/close matchups easier to see.  Broken
by f366173927.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-06 15:17:30 -07:00
Mike Ryan
020b299613 pg: report scrub status
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-07-06 13:45:06 -07:00
Mike Ryan
db6d83b3ed pg: track who we are waiting for maps from
Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-07-06 13:45:04 -07:00
Mike Ryan
e1d4855fa1 pg: reduce scrub write lock window
Wait for all replicas to construct the base scrub map before finalizing
the scrub and locking out writes.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
2012-07-06 13:45:02 -07:00
Yehuda Sadeh
3df51040b1 rgw: don't store bucket info indexed by bucket_id
Issue #2701. This info wasn't really used anywhere and we weren't
removing it. It was also sharing the same pool namespace as the
info indexed by bucket name, which is bad.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-06 10:18:09 -07:00
Yehuda Sadeh
27409aa161 rgw: don't store bucket info indexed by bucket_id
Issue #2701. This info wasn't really used anywhere and we weren't
removing it. It was also sharing the same pool namespace as the
info indexed by bucket name, which is bad.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-06 10:17:21 -07:00
Yehuda Sadeh
84ba6bf6e1 Merge branch 'stable' into next 2012-07-06 10:16:07 -07:00
Yehuda Sadeh
9814374a2b test_rados_tool.sh: test copy pool
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-06 10:15:34 -07:00
Yehuda Sadeh
d75100667a rados tool: copy object in chunks
Instead of reading the entire object and then writing it,
we read it in chunks.
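
The idea, sketched with Python file-like objects instead of librados
reads and writes. The helper name copy_in_chunks and the 4 MB default
chunk size are assumptions for the example.

```python
import io


def copy_in_chunks(src, dst, chunk_size=4 * 1024 * 1024):
    """Copy src to dst one chunk at a time; return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break  # end of the source object
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    source = io.BytesIO(b"0123456789")
    target = io.BytesIO()
    print(copy_in_chunks(source, target, chunk_size=4), target.getvalue())
```

Memory use is bounded by the chunk size rather than by the object size.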

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-06 10:15:34 -07:00
Yehuda Sadeh
16ea64fbde rados tool: copy entire pool
A new rados tool command that copies an entire pool
into another existing pool.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-06 10:15:34 -07:00
Yehuda Sadeh
960c212480 rados tool: copy object
New rados command: rados cp <src-obj> [dest-obj]

Requires specifying source pool. Target pool and locator can be specified.
The new command preserves object xattrs and omap data.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-06 10:15:34 -07:00
Yehuda Sadeh
d59b2db4ab Merge remote-tracking branch 'origin/stable' into next 2012-07-06 10:12:23 -07:00
Sage Weil
23d31d3e2a ceph.spec.in: add ceph-disk-{activate,prepare}
Reported-by: Jimmy Tang <jtang@tchpc.tcd.ie>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-06 08:47:44 -07:00
Wido den Hollander
ea11c7f9d8 Allow URL-safe base64 cephx keys to be decoded.
In these cases + and / are replaced by - and _ to prevent problems when using
the base64 strings in URLs.
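
A short sketch of the substitution, using Python's base64 module. The
decode_key helper is hypothetical and is not the cephx decoder; it maps
the URL-safe characters back to the standard alphabet before decoding, so
both spellings are accepted.

```python
import base64


def decode_key(text):
    """Decode a base64 key given in either the standard or URL-safe alphabet."""
    standard = text.replace("-", "+").replace("_", "/")
    return base64.b64decode(standard)


if __name__ == "__main__":
    raw = b"\xfb\xff\xfe"
    url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
    standard = base64.b64encode(raw).decode("ascii")
    print(url_safe, standard, decode_key(url_safe) == decode_key(standard))
```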

Signed-off-by: Wido den Hollander <wido@widodh.nl>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-05 07:32:28 -07:00
Sage Weil
7fa85790fb osd: add missing formatter close_section() to scrub status
Also add braces to make the open/close matchups easier to see.  Broken
by f366173927.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-04 13:59:04 -07:00
Sage Weil
c0b01cda10 Merge branch 'stable'
Conflicts:
	src/test/cli/radosgw-admin/help.t
2012-07-04 09:30:21 -07:00
Wido den Hollander
f67fe4e368 librados: Bump the version to 0.48
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-04 09:21:01 -07:00
Samuel Just
bcfcf8efd5 librados: add assert_version as an operation on an ObjectOperation
Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-07-04 07:32:23 -07:00
Samuel Just
39eaa23076 ReplicatedPG: do not set reply version to last_update
The version should be oi.user_version as set above.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2012-07-04 07:29:12 -07:00
Sage Weil
e6e36c0a72 rgw: initialize fields of RGWObjEnt
This fixes various valgrind warnings triggered by the s3test
test_object_create_unreadable.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-04 07:27:32 -07:00
Sage Weil
f6cdd85223 Merge remote-tracking branch 'gh/wip-crush' 2012-07-03 16:49:29 -07:00
Yehuda Sadeh
35b9ec881a rgw-admin: use correct modifier with strptime
Bug #2658: used %I (12h) instead of %H (24h)
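
The effect of the two modifiers can be seen with Python's strptime, which
uses the same %H and %I directives. The full format string below is an
assumption for the example and is not taken from rgw-admin.

```python
from datetime import datetime


def parse_24h(text):
    # %H accepts hours 00-23.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def parse_12h(text):
    # %I accepts hours 01-12 only, so afternoon times in 24h form fail.
    return datetime.strptime(text, "%Y-%m-%d %I:%M:%S")


if __name__ == "__main__":
    stamp = "2012-07-03 16:24:28"
    print(parse_24h(stamp))
    try:
        parse_12h(stamp)
    except ValueError as err:
        print("rejected with %I:", err)
```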

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-03 16:24:28 -07:00
Yehuda Sadeh
da251fe885 rgw: send both swift x-storage-token and x-auth-token
older clients need x-storage-token, newer x-auth-token
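
Sketched as a plain header dictionary; the helper name
swift_auth_headers is hypothetical and this is not the rgw response code.

```python
def swift_auth_headers(token):
    """Return both token headers so old and new Swift clients find one."""
    return {
        "X-Storage-Token": token,  # expected by older clients
        "X-Auth-Token": token,     # expected by newer clients
    }


if __name__ == "__main__":
    print(swift_auth_headers("example-token"))
```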

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-03 16:24:20 -07:00