Commit Graph

13425 Commits

Author SHA1 Message Date
Samuel Just
241e29bdfd CephxProtocol.cc: invalid authorizer data should not crash the osd
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-03-18 17:41:32 -07:00
Colin Patrick McCabe
54f7d83e41 ceph.spec.in: some CentOS fixes
BuildRequires: cryptopp-devel has been replaced by nss-devel.  Skip
google-perftools-devel because that package is not available for x86-64.
Add python.

Don't install libcls_rbd.so.1.0.0.debug.

Package crbdnamer and librados-config.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-03-18 17:07:36 -07:00
Colin Patrick McCabe
1065bef0eb pybind: convert to new API
Fix the python bindings to use the new librados API.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-03-18 15:10:46 -07:00
Sage Weil
9db1ecf34f backtrace: use the proper version header
Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-18 15:09:26 -07:00
Sage Weil
07ba8ee853 libceph: use the proper version header
Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-18 15:08:43 -07:00
Sage Weil
f772a16384 libceph: pull version from new version define
Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-18 14:49:28 -07:00
Sage Weil
55bb9ef821 configure: no ~
This confuses fedora and isn't really necessary.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-18 14:47:09 -07:00
Sage Weil
e7f3df7233 use 'git describe' version
2011-03-18 14:37:17 -07:00
Colin Patrick McCabe
87e4aa2395 librados: bump minor version number
rados_create_internal -> rados_create_with_config

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-03-18 11:31:27 -07:00
Colin Patrick McCabe
624410fbc1 librados: rados_ioctx_lookup -> rados_pool_lookup
rados_pool_lookup has nothing to do with io contexts!

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-03-18 10:59:52 -07:00
Colin Patrick McCabe
a3475610eb direct_io_test: use mkstemp instead of mkostemps
mkostemps isn't present in older glibc versions, like the ones in CentOS
5.5. We don't really use any of the extra functionality of mkostemps in
this test.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-03-17 18:02:17 -07:00
Colin Patrick McCabe
4db8801bba Makefile: check for new enough version of gtkmm
Versions older than 2.13 don't build, so check for that with automake.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-03-17 17:50:07 -07:00
Sage Weil
6b3baf2e82 msgr: move test binaries to updated msgr bind/start interface
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-03-17 12:06:26 -07:00
Sage Weil
acd7d74453 msgr: fix start() return value
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-03-17 12:05:29 -07:00
Samuel Just
601f59857e PG,OSD: activate pg during replay
Replay PGs already accept and queue transactions.  PGs will now go to
active during replay in order to simplify the state reported to the user
and to allow recovery to begin.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
2011-03-17 12:00:20 -07:00
Tommi Virtanen
68a2f46f23 blobhash: Avoid size_t in templatized hash functions.
On S/390, the earlier rjhash<size_t> failed with
"no match for call to '(rjhash<long unsigned int>) (size_t&)'".
It seems the rjhash<size_t> logic was only enabled
on some architectures, and relied on some pretty deep
internals of the bit layout (__LP64__).

Use an explicitly 32-bit type as early as possible, and
convert back to size_t only when really needed. This
should work, and simplifies the code. In theory, we might
have a narrower output (size_t might be 64-bit, max value
we now output is 32-bit), but this doesn't matter as this
is only ever used for picking a slot in an in-memory hash
table, hash(key) modulo num_of_buckets, there won't be >4G
buckets.

Closes: #837

Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
2011-03-17 11:59:26 -07:00
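
The approach described in the blobhash commit above can be sketched roughly as follows. This is an illustration only, not the code from the commit: `mix32`, `blob_hash32`, and `pick_bucket` are hypothetical names, and the mixer is a generic 32-bit finalizer standing in for the project's rjhash. The point is that all hashing happens in `uint32_t`, and the value is widened to `size_t` only at the last step, when a bucket is chosen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in 32-bit mixer (hypothetical; NOT the rjhash used by the project).
inline uint32_t mix32(uint32_t h) {
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}

// Hash a byte blob using only an explicitly 32-bit accumulator, so the
// result does not depend on the platform's width of size_t.
inline uint32_t blob_hash32(const char* data, std::size_t len) {
  uint32_t acc = 0;
  for (std::size_t i = 0; i < len; ++i) {
    acc = mix32(acc ^ static_cast<unsigned char>(data[i]));
  }
  return acc;
}

// Convert to size_t only when a slot in the in-memory table is needed.
inline std::size_t pick_bucket(const char* data, std::size_t len,
                               std::size_t num_buckets) {
  assert(num_buckets > 0);
  return static_cast<std::size_t>(blob_hash32(data, len)) % num_buckets;
}
```

Because the hash is only used modulo the bucket count, a 32-bit output loses nothing as long as the table has fewer than 2^32 buckets.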
Sage Weil
7670b26400 msgr: let user explicitly set nonce
There will be problems if two messengers use the same entity_addr_t because
they are on the same ip and choose the same nonce (e.g., because they are
in the same process).  Let the caller sort this out in whatever way it
finds most appropriate.

For libceph, librados, and csyn, add N million to the pid.

Fixes: #877
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-03-17 11:33:56 -07:00
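
One way a caller might pick a nonce, reading the nonce commit above as "pid plus N million", where N is assumed here to be a per-process instance counter. The function names and the counter are hypothetical and not taken from the commit; this is a sketch of the idea rather than the messenger's interface.

```cpp
#include <atomic>
#include <cstdint>

// Combine the process id with an instance number so that two messengers
// in the same process (same ip, same pid) still get distinct nonces.
// Assumption: N in the commit message is a per-process instance counter.
inline uint64_t make_nonce(uint64_t pid, uint64_t instance) {
  return pid + instance * 1000000ULL;
}

// Hand out a fresh nonce for each messenger created in this process.
inline uint64_t next_nonce(uint64_t pid) {
  static std::atomic<uint64_t> instances{0};
  return make_nonce(pid, instances.fetch_add(1));
}
```

With a pid of 1234, the first three messengers in a process would get 1234, 1001234, and 2001234.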
Colin Patrick McCabe
2fb9323c22 config: whitespace fix
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-03-17 11:17:12 -07:00
Colin Patrick McCabe
e7e2bb884f config: fix get_val, set_val
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-03-17 11:16:01 -07:00
Josh Durgin
70d92b7fab librados: check whether objecter is initialized before shutting it down
Fixes failing unit test Librados.CreateShutdown

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-03-17 10:28:47 -07:00
Josh Durgin
2dae8ad523 objecter: close all sessions when shutdown
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-03-16 16:44:04 -07:00
Sage Weil
248946940f mds: fix replay of fragment ROLLBACK
In the rollback event the bits are negative.  Replay accordingly.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-03-16 14:52:33 -07:00
Sage Weil
6fa470ba51 common: disable log_per_instance for non-daemons
Turn off the logging and symlink rotation, not just symlink rotation.

This is a somewhat arbitrary distinction (log per instance only for
daemons), but it's only used by vstart and only really useful for
development/debugging, so who cares.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-03-16 14:52:31 -07:00
Sage Weil
cae43fc7d0 Makefile: drop libradosgw_a LDFLAGS
Fixes the warning

src/Makefile.am:299: variable `libradosgw_a_LDFLAGS' is defined but no program or
src/Makefile.am:299: library has `libradosgw_a' as canonic name (possible typo)

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-03-16 14:52:31 -07:00
Sage Weil
1fbd3a706f mds: resync fragmentation during cache rejoin
During rejoin we may find that different MDSs have different fragmentation
for directories.  When that happens we should refragment as needed on the
replicas to match what's on the primary.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-03-16 14:52:31 -07:00
Colin Patrick McCabe
32fce3ca2a rados_create: correctly handle null id
Passing a null id to rados_create means "use the default id."

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-03-16 14:21:25 -07:00
Sage Weil
9e1828af76 objecter: make response_data bufferlist static
Putting it on the heap is unnecessary additional complexity.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-03-16 13:53:39 -07:00
Colin Patrick McCabe
c548976293 rados_create: set id based on parameter
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-03-16 12:25:02 -07:00
Colin Patrick McCabe
b1c3321641 librados: add rados_create_internal
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-03-16 12:06:28 -07:00
Josh Durgin
a70b5a81a5 filestore: return negative error code if open fails
ENOENT was being treated as a read of length 2, causing #890.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-03-16 11:23:43 -07:00
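
The bug class fixed by the filestore commit above is easy to reproduce: if a failed open returns the positive errno, a caller that treats non-negative values as byte counts sees ENOENT (2) as a two-byte read. A minimal POSIX sketch of the corrected convention follows; `read_object` is a hypothetical helper, not the FileStore function.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Returns the number of bytes read (>= 0) or a negative errno value.
// Returning +errno here would make ENOENT look like a read of length 2.
inline long read_object(const char* path, char* buf, std::size_t len) {
  const int fd = ::open(path, O_RDONLY);
  if (fd < 0) {
    return -errno;  // negative, so it can never be mistaken for a length
  }
  const ssize_t got = ::read(fd, buf, len);
  const long result = (got < 0) ? -errno : static_cast<long>(got);
  ::close(fd);
  return result;
}
```

Callers can then test `r < 0` for failure and use `r` directly as a length otherwise.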
Sage Weil
3f442f0686 init-ceph, mkcephfs: fix $name normalization
Strip leading . only, to tolerate osd0 and osd.0.

This also turns osd.....foo -> osd.foo, but that's better than
osd.foo.bar -> osd.foobar.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-03-15 22:23:46 -07:00
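
The normalization rule from the init-ceph commit above, restated as a small C++ sketch (the scripts themselves are shell). It assumes the daemon type is a three-character prefix such as mon, mds, or osd; that assumption and the name `normalize_name` are illustrative and not taken from the commit.

```cpp
#include <cassert>
#include <string>

// Split a daemon name into type and id, strip only the dots that lead
// the id, and rejoin as "type.id". Dots inside the id are preserved.
inline std::string normalize_name(const std::string& name) {
  assert(name.size() > 3);  // assumed: 3-character type plus a non-empty rest
  const std::string type = name.substr(0, 3);
  const std::string::size_type start = name.find_first_not_of('.', 3);
  const std::string id =
      (start == std::string::npos) ? std::string() : name.substr(start);
  return type + "." + id;
}
```

Under these assumptions `osd0` and `osd.0` both become `osd.0`, `osd.....foo` becomes `osd.foo`, and `osd.foo.bar` is left unchanged.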
Sage Weil
d7f6000b57 init-ceph: use consistent $type.$id naming
Use $type.$id, regardless of what the user uses.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-03-15 22:21:06 -07:00
Sage Weil
844093fbf7 osd: only update last_epoch_started after all replicas commit peering results
The PG info.history.last_epoch_started is important because it bounds how
far back in time we think we need to look in order to fully recover the
contents of the PG.  That's because every replica commits the PG peering
result (the info and pg log) when it activates.

In order for this to work properly, we can only advance last_epoch_started
_after_ the peer results are stable on disk on all replicas.  Otherwise a
poorly timed failure (or set of failures) could lose the PG peer results
and we wouldn't go back far enough in time to find them.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-03-15 22:18:45 -07:00
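
The ordering constraint in the last_epoch_started commit above can be pictured with a small tracker: the epoch is recorded as pending when activation begins, and last_epoch_started advances only once every member of the acting set has reported its peering result committed. `PeeringTracker` and its members are hypothetical; the real PG code is considerably more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

struct PeeringTracker {
  uint32_t last_epoch_started = 0;  // only moves once all commits are in
  uint32_t pending_epoch = 0;       // epoch being activated
  std::set<int> waiting_on;         // OSD ids that have not yet committed

  // Begin activation; `members` should include the primary itself.
  void start_activation(uint32_t epoch, const std::set<int>& members) {
    assert(epoch >= last_epoch_started);
    pending_epoch = epoch;
    waiting_on = members;
  }

  // Record that one OSD has the peering result stable on disk.
  // Returns true if this commit allowed last_epoch_started to advance.
  bool replica_committed(int osd) {
    waiting_on.erase(osd);
    if (waiting_on.empty() && pending_epoch > last_epoch_started) {
      last_epoch_started = pending_epoch;
      return true;
    }
    return false;
  }
};
```

If a failure hits before the final commit arrives, last_epoch_started still holds the older value, so recovery looks far enough back in time.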
Sage Weil
87e6d37217 Merge remote branch 'origin/stable'
2011-03-15 20:57:16 -07:00
Colin Patrick McCabe
2319ae13aa logging: don't add --debug
--debug is already taken to change the global debug level.
Just offer -d for now.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-03-15 14:59:08 -07:00
Josh Durgin
9862afa49e testlibrbd, testradospp: read default conf file
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-03-15 15:00:19 -07:00
Colin Patrick McCabe
abc64b01a3 logging: --foreground options reorganization
-f now just means stay in the foreground.
-d now means stay in the foreground and log to foreground.
Both options now disable pid-files.

Update man pages.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-03-15 14:49:10 -07:00
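
The option semantics listed in the --foreground commit above amount to a small truth table. The sketch below encodes it; `RunMode` and `run_mode` are hypothetical names, and "log to foreground" is interpreted here as logging to stderr, which is an assumption.

```cpp
struct RunMode {
  bool daemonize;       // fork into the background
  bool log_to_stderr;   // assumed meaning of "log to foreground"
  bool write_pid_file;  // pid files are disabled by both -f and -d
};

// Map the -f and -d flags onto the resulting run mode.
inline RunMode run_mode(bool flag_f, bool flag_d) {
  RunMode m = {true, false, true};  // assumed defaults for a daemon
  if (flag_f || flag_d) {
    m.daemonize = false;
    m.write_pid_file = false;
  }
  if (flag_d) {
    m.log_to_stderr = true;
  }
  return m;
}
```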
Sage Weil
22241f8dae librbd: int -> ssize_t for aio completion wrappers too
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-03-15 13:18:53 -07:00
Sage Weil
d93c118423 librbd: ssize_t return values for read, write
int is 32 bits on 64-bit archs.  Use ssize_t (long) for return values.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-03-15 13:16:41 -07:00
Sage Weil
24342a7191 filestore: instrument filestore, journal throughput and throttling
Signed-off-by: Sage Weil <sage@newdream.net>

Conflicts:

	src/os/FileJournal.cc
	src/os/FileStore.cc
	src/os/FileStore.h
	src/os/JournalingObjectStore.cc
2011-03-15 12:43:29 -07:00
Sage Weil
3ecfbfbbd3 filestore: adjust op_queue throttle max during fs commit
The underlying FS (btrfs at least) will block writes for a period while it
is doing a commit.  If an OSD workload is write limited, we should raise
the op_queue max (operations that are queued to be applied to disk) during
the commit period.

For example, for a normally journal throughput limited (writeahead mode)
workload:

 - journal queue throttle normally limits things.
 - sync starts
 - journaled items getting moved to op_queue soon fills up op_queue max
 - all writes stop
 - sync completes
 - op_queue drains, new writes come in again
 - journal queue throttle fills up, again starts limiting tput

For an fs throughput limited workload (writeahead):

 - kernel buffer cache hits dirty limit
 - op_queue throttle limits tput
 - sync starts
 - opq stalls, new writes stall on throttler
 - sync completes
 - opq drains (quickly: kernel has no dirty pages)
 - new writes flood in
 - etc.
(Actually this isn't super realistic, because hitting the kernel dirty
limit will do all sorts of other weird things with userland memory
allocations.)

In both cases, the commit phase blocks up the op queue, and raising the
limit temporarily will keep things flowing.  This should be ok because the
disks are still busy during this period; they're just flushing dirty
data and metadata.  Once the sync completes the opq will quickly dump dirty
data into the kernel page cache and "catch up".

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-15 12:43:29 -07:00
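
A toy model of the idea in the op_queue throttle commit above: the queue limit is raised while a filesystem commit is in progress and drops back afterwards. `OpQueueThrottle`, its fields, and the amount of extra headroom are hypothetical; the commit does not say how much the limit is raised.

```cpp
#include <cassert>
#include <cstdint>

class OpQueueThrottle {
 public:
  OpQueueThrottle(uint64_t base_max, uint64_t commit_extra)
      : base_max_(base_max), commit_extra_(commit_extra) {}

  // Effective limit: higher while the underlying fs is committing.
  uint64_t current_max() const {
    return committing_ ? base_max_ + commit_extra_ : base_max_;
  }

  // Try to queue one op; returns false if the writer would have to wait.
  bool try_queue() {
    if (queued_ >= current_max()) return false;
    ++queued_;
    return true;
  }

  // One queued op has been applied to disk.
  void complete_one() {
    assert(queued_ > 0);
    --queued_;
  }

  void commit_start() { committing_ = true; }
  void commit_finish() { committing_ = false; }

 private:
  uint64_t base_max_;
  uint64_t commit_extra_;
  uint64_t queued_ = 0;
  bool committing_ = false;
};
```

With a base limit of 2 and 3 extra slots, a third op is refused in normal operation but accepted between `commit_start()` and `commit_finish()`, which is the "keep things flowing" behavior the message describes.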
Colin Patrick McCabe
51b93726d9 testrados: test default conf file location
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-03-15 11:15:06 -07:00
Colin Patrick McCabe
4c22c159d2 librados: add default to rados_conf_read_file
In rados_conf_read_file, read from the default configuration file
locations if the library user passes NULL as the location of the
configuration file.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-03-15 11:15:06 -07:00
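
The NULL-means-default behavior described for rados_conf_read_file above could look roughly like this. The helper `conf_candidates` is hypothetical, and the list of default locations is an assumption made for illustration; the commit does not enumerate them.

```cpp
#include <string>
#include <vector>

// Return the configuration files to try, in order. A non-null path is
// used as given; a null path falls back to a list of default locations.
inline std::vector<std::string> conf_candidates(const char* path) {
  std::vector<std::string> out;
  if (path != nullptr) {
    out.push_back(path);
    return out;
  }
  // Assumed defaults, for illustration only.
  out.push_back("/etc/ceph/ceph.conf");
  out.push_back("~/.ceph/config");
  out.push_back("ceph.conf");
  return out;
}
```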
Sage Weil
ca613786f8 rbd: int -> int64_t on do_export
Prevent 32-bit overflow.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-03-15 11:07:10 -07:00
Sage Weil
174aa56c4b librbd: use int64_t for read_iterate
The read_iterate can cover > addressable memory on 32-bit archs.

Reported-by: Jeff Wu <cpwu@tnsoft.com.cn>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-03-15 09:47:32 -07:00
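
Why a 64-bit return type matters for the read_iterate commit above: an iteration can cover far more bytes than a 32-bit int can count, even though each chunk is small. The sketch below is not librbd's signature; `read_iterate_sketch` and its callback shape are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Walk [off, off + len) in chunks, invoking cb(offset, length) for each.
// Returns the total bytes covered, or the first negative value from cb.
template <typename Fn>
int64_t read_iterate_sketch(uint64_t off, uint64_t len, uint64_t chunk, Fn cb) {
  assert(chunk > 0);
  int64_t total = 0;
  uint64_t left = len;
  while (left > 0) {
    const uint64_t n = std::min(left, chunk);
    const int r = cb(off, n);
    if (r < 0) return r;  // propagate the callback's error code
    off += n;
    left -= n;
    total += static_cast<int64_t>(n);  // would overflow a 32-bit int past 2 GiB
  }
  return total;
}
```

Iterating over 5 GiB in 1 GiB chunks makes five callbacks and returns a total that does not fit in a 32-bit int.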
Josh Durgin
4ee75a881e objecter: fix leak of bufferlist from MPoolOpReply
bufferlist->claim already clears the source bufferlist,
but setting it to NULL prevented it from being destroyed.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-03-14 17:11:41 -07:00
Sage Weil
723a92657b Merge branch 'stable'
Conflicts:
	configure.ac
	debian/changelog
	src/cfuse.cc
	src/rgw/rgw_rest.cc
2011-03-14 16:44:31 -07:00
Colin Patrick McCabe
df8c00945f cfuse: set proper defaults
Since cfuse usually runs as a nonprivileged user, its defaults must be a
little different from those of the other daemons. Add a flag to
common_init which can be used to set unprivileged daemon defaults.

SimpleMessenger::start() now just takes a boolean telling it whether to
daemonize. It doesn't need to check global variables or other arguments;
it just daemonizes if you tell it to; otherwise not.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
2011-03-14 15:45:05 -07:00
Sage Weil
7f4a161e7f v0.25.1
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-03-14 15:39:48 -07:00
Sage Weil
db25852fd9 cfuse: always daemonize hack
Always daemonize, until the next round of common_init fixes lands.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
2011-03-14 15:39:48 -07:00