RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2024-12-23 20:03:56 +00:00

Author	SHA1	Message	Date
Josh Durgin	15bb9ba9fb	objecter: initialize linger op snapid Since they are write ops now, it must be CEPH_NOSNAP or the OSD returns EINVAL. Signed-off-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-21 23:23:02 -08:00
David Zafman	5648117626	Add test for list_watchers() C++ interface Signed-off-by: David Zafman <david.zafman@inktank.com>	2013-02-21 21:50:02 -08:00
David Zafman	1c3241e3bf	Add listwatchers command to rados Signed-off-by: David Zafman <david.zafman@inktank.com>	2013-02-21 21:50:02 -08:00
David Zafman	af339aee46	Add ObjectReadOperation and IoCtx functions Signed-off-by: David Zafman <david.zafman@inktank.com>	2013-02-21 21:50:02 -08:00
David Zafman	cfe923920c	librados: expose a list of watchers on an object Add new op CEPH_OSD_OP_LIST_WATCHERS Add Objecter handling Signed-off-by: David Zafman <david.zafman@inktank.com>	2013-02-21 21:50:02 -08:00
David Zafman	bf5cf3318d	Add rados_types.h header file Signed-off-by: David Zafman <david.zafman@inktank.com>	2013-02-21 21:50:01 -08:00
Dan Mick	8c05af5dc3	configuration parsing: give better error for missing = A ceph.conf line with "key" and no "= value" currently shows "unexpected character while parsing putative key value, at char N line M". There's no reason it can't be clearer. Fixes: #4229 Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2013-02-21 21:45:27 -08:00
Sage Weil	dc181224ab	osd/PG: fix typo, missing -> omissing From `ce7ffc3440`. Signed-off-by: Sage Weil <sage@inktank.com>	2013-02-21 17:55:21 -08:00
Josh Durgin	94ae725465	test_librbd_fsx: fix image closing Always close the image we opened in check_clone(), and check the return code of the rbd_close() called before cloning. Refs: #3958 Signed-off-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-21 17:39:22 -08:00
Sage Weil	6c08c7c1c6	objecter: separate out linger_read() and linger_mutate() A watch is a mutation, while a notify is a read. The mutations need to pass in a proper snap context to be fully correct. Also, make the WRITE flag implicit so the caller doesn't need to pass it in. Signed-off-by: Sage Weil <sage@inktank.com>	2013-02-21 17:31:41 -08:00
Sage Weil	de4fa95f03	osd: make watch OSDOp print sanely Signed-off-by: Sage Weil <sage@inktank.com>	2013-02-21 17:31:41 -08:00
Sage Weil	60ebf02a28	Merge branch 'next'	2013-02-21 17:30:46 -08:00
Sage Weil	dd007db3ca	ceph_common.sh: fix iteration of items in ceph.conf This broke in `c8f528a407`. Signed-off-by: Sage Weil <sage@inktank.com>	2013-02-21 17:30:06 -08:00
Dan Mick	6cb53740f2	ceph-conf.rst: missing '=' in example network settings Signed-off-by: Dan Mick <dan.mick@inktank.com>	2013-02-21 17:02:17 -08:00
Sage Weil	9af94eea20	Merge remote-tracking branch 'gh/wsp.bobtail.2merge'	2013-02-21 15:45:36 -08:00
Samuel Just	ce7ffc3440	PG::proc_replica_log: adjust oinfo.last_complete based on omissing Otherwise, search_for_missing may neglect to check the missing set for some objects assuming that if the need version is prior to last_complete, the replica must have it. Fixes: #4994 Signed-off-by: Samuel Just <sam.just@inktank.com>	2013-02-21 15:37:14 -08:00
Samuel Just	8086d1d8c0	Merge remote-tracking branch 'upstream/wip_clone_attrs' Reviewed-by: Sage Weil <sage@inktank.com>	2013-02-21 14:42:33 -08:00
Greg Farnum	79f09bf33e	MDS: remove a few other unnecessary is_base() checks We should let users remove xattrs as well as set them. ;) And the check in handle_client_setlayout was totally useless -- perhaps intended for setdirlayout? This is a follow-on to `9f82ae60fa` and should be taken wherever it goes. Signed-off-by: Greg Farnum <greg@inktank.com>	2013-02-21 14:30:42 -08:00
Greg Farnum	9f82ae60fa	mds: allow xattrs on the root inode This was previously disallowed because Once Upon a Time, the root inode wasn't persisted to disk and was an entirely in-memory construct. But it's safe now, and has been for a while. Signed-off-by: Greg Farnum <greg@inktank.com>	2013-02-21 14:21:08 -08:00
Greg Farnum	6bd8781dda	mds: use inode_t::layout for dir layout policy This cherry-pick is going in the reverse direction of normal. That's because this direction makes for the minimal change -- this patchset is required to fix the loss of directory layouts we were previously seeing, but fixing it requires changing the encoding versions. So we wrote it on top of Bobtail and let it update the struct_v's as they existed then. Note that we here change a few encoding versions in ways which are NOT COMPATIBLE with previous development code (but not any releases). In particular, development code introduced and this removes the file_layout_policy_t, and some of the CInode and EMetaBlob encoding struct_v values were used in development code to mean one thing, but mean something different due to the Bobtail patch. Remove the default_file_layout struct, which was just a ceph_file_layout, and store it in the inode_t. Rip out all the annoying code that put this on the heap. To aid in this usage, add a clear_layout() function to inode_t. Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Signed-off-by: Greg Farnum <greg@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> (cherry picked from commit `36ed407e0f`) Conflicts: src/mds/CInode.cc src/mds/CInode.h src/mds/MDCache.cc src/mds/Server.cc src/mds/events/EMetaBlob.h Cherry-pick- Reviewed-by: Sage Weil <sage@inktank.com>	2013-02-21 13:44:01 -08:00
Sage Weil	84ef1649c5	mds: parse ceph.*.layout vxattr key/value content Use qi to parse a strictly formatted set of key/value pairs. Be picky about whitespace. Any subset of recognized keys is allowed. Parse the same set of keys as the ceph.*.layout.* vxattrs. Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit `5551aa5b3b`)	2013-02-21 13:44:01 -08:00
Sage Weil	fea77682a6	osdc/Objecter: unwatch is a mutation, not a read This was causing librados to unblock after the ACK on unwatch, which meant that librbd users raced and tried to delete the image before the unwatch change was committed..and got EBUSY. See #3958. The watch operation has a similar problem. Signed-off-by: Sage Weil <sage@inktank.com>	2013-02-21 13:28:47 -08:00
Samuel Just	81bd996428	FileStore::_clone: use _fsetattrs rather than _setattrs The omap portion of the clone happened above in DBObjectMap::clone. Only the fs stored attrs need to be explicitly copied. Signed-off-by: Samuel Just <sam.just@inktank.com>	2013-02-21 13:28:26 -08:00
Samuel Just	5b48e63c03	FileStore::_setattrs: use _fsetattrs Signed-off-by: Samuel Just <sam.just@inktank.com>	2013-02-21 13:26:56 -08:00
Samuel Just	c33c51f01f	FileStore: add _fsetattrs Signed-off-by: Samuel Just <sam.just@inktank.com>	2013-02-21 13:26:40 -08:00
Samuel Just	2ec04f9633	FileStore::_setattrs: only do omap operations if necessary Signed-off-by: Samuel Just <sam.just@inktank.com>	2013-02-21 13:25:49 -08:00
Samuel Just	83fad1c7f2	FileStore::_setattrs no need to grab an Index lock for the omap operations Signed-off-by: Samuel Just <sam.just@inktank.com>	2013-02-21 13:24:42 -08:00
Yehuda Sadeh	08efb158ae	Merge pull request #67 from jaharkes/content_length Handle empty CONTENT_LENGTH environment variable.	2013-02-21 12:59:06 -08:00
Jan Harkes	ad00fc72e1	Fix failing > 4MB range requests through radosgw S3 API. When a range request is made for more than rgw_get_obj_max_req_size bytes the first returned chunk sets 'ret' to STATUS_PARTIAL_CONTENT and all remaining chunks behave as if there is an error state and only return a minimal header. Fix this by passing STATUS_PARTIAL_CONTENT to set_req_state_err, but leave the 'ret' member variable untouched. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> (cherry picked from commit `c83a01d4e8`)	2013-02-21 12:51:40 -08:00
Yehuda Sadeh	e5a01317db	Merge pull request #66 from jaharkes/range_requests Fix failing > 4MB range requests through radosgw S3 API.	2013-02-21 12:42:06 -08:00
Jan Harkes	96896eb092	Handle empty CONTENT_LENGTH environment variable. nginx seems to be providing a CONTENT_LENGTH environment variable with no data when the request body is empty. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>	2013-02-21 15:36:30 -05:00
Jan Harkes	c83a01d4e8	Fix failing > 4MB range requests through radosgw S3 API. When a range request is made for more than rgw_get_obj_max_req_size bytes the first returned chunk sets 'ret' to STATUS_PARTIAL_CONTENT and all remaining chunks behave as if there is an error state and only return a minimal header. Fix this by passing STATUS_PARTIAL_CONTENT to set_req_state_err, but leave the 'ret' member variable untouched. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>	2013-02-21 15:29:11 -05:00
Sage Weil	4277265d99	osd: an interval can't go readwrite if its acting is empty Let's not forget that min_size can be zero. Fixes: #4159 Signed-off-by: Sage Weil <sage@inktank.com>	2013-02-21 11:32:39 -08:00
Josh Durgin	a1ae856287	librbd: make sure racing flattens don't crash The only way for a parent to disappear is a racing flatten completing, or possibly in the future the image being forcibly removed. In either case, continuing to flatten makes no sense, so stop early. Signed-off-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-21 11:26:49 -08:00
Josh Durgin	995ff0e3ea	librbd: use rwlocks instead of mutexes for several fields Image metadata like snapshots, size, and parent is frequently read, but rarely updated. During flatten, we were depending on the parent lock to prevent the parent ImageCtx from disappearing out from under us while we read from it. The copy-up path also needed the parent lock to be able to read from the parent image, which led to a deadlock. Convert parent_lock, snap_lock, and md_lock to RWLocks, and change their use to read instead of exclusive locks where appropriate. The main place exclusive locks are needed is in ictx_refresh, so this is pretty simple. This fixes the deadlock, since parent_lock is only needed for read access in both flatten and the copy-up operation. cache_lock and refresh_lock are only really used for exclusive access, so leave them as regular mutexes. One downside to this is that there's no way to assert is_locked() for RWLocks, so we'll have to be very careful about changing code in the future. Fixes: #3665 Signed-off-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-21 11:19:40 -08:00
Josh Durgin	e0f8e5a80d	common: add lockers for RWLocks This makes them easier to use, especially instead of existing mutexes. Signed-off-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-21 11:15:43 -08:00
Sage Weil	c8d0889df5	Merge branch 'next' Conflicts: src/osd/ReplicatedPG.cc	2013-02-21 10:44:04 -08:00
Sage Weil	6d8dfb18fe	osd: clear recovery state on pg removal This ensures we release our in-progress recovery counters, which prevents recovery from getting blocked indefinitely when a pool removal races with recovery ops. Fixes: #4217 Backport: bobtail Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>	2013-02-21 10:43:20 -08:00
Josh Durgin	94e5deebc6	test: fix run-rbd-tests pool deletion Use the new safety check Signed-off-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-21 10:38:38 -08:00
Joao Eduardo Luis	6612b0402e	ceph-object-corpus: use temporary 'wsp.master.new' corpus until we get merged into master Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-02-21 18:29:36 +00:00
Joao Eduardo Luis	beafca57fb	Merge branch 'wsp.bobtail.2merge' into wsp.bobtail.master Conflicts: src/.gitignore src/Makefile.am src/include/ceph_features.h src/mon/MDSMonitor.cc src/mon/PGMonitor.cc	2013-02-21 18:04:22 +00:00
Joao Eduardo Luis	04dac7ee7a	vstart.sh: Create mon data directory before --mkfs Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-02-21 18:02:23 +00:00
Joao Eduardo Luis	89f920492d	test: ObjectMap: add a generic leveldb store tool Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-02-21 18:02:23 +00:00
Joao Eduardo Luis	cb85fb7d9a	mon: ceph-mon: convert an old monitor store to the new format With the single-paxos patches we shifted from an approach with multiple paxos instances (one for each paxos service) keeping their own versions to a single paxos instance for all the paxos services, thus ending up with a single global version for paxos. With the release of v0.52, the monitor started tracking these global versions, keeping them for the single purpose of making it possible to convert the store to a single-paxos format. This patch now introduces a mechanism to convert a GV-enabled store to the single-paxos format store when the monitor is upgraded. As we require the global versions to be present, we first check if the store has the GV feature set: if not we will not proceed, but we will start the conversion otherwise. In the end of the conversion, the monitor data directory will have a brand new 'store.db' directory, where the key/value store lies, alongside with the old store. This makes it possible to revert to a previous monitor version if things go sideways, without jeopardizing the data in the store. The conversion is done as during a rolling upgrade, without any intervention by the user. Fire up the new monitor version on an old store, and the monitor itself will convert the store, trim any lingering versions that might not be required, and proceed to start as expected. Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-02-21 18:02:23 +00:00
Joao Eduardo Luis	19e5098afe	mon: Add an offline monitor store converter This tool will convert an old monitor store format (bobtail) to the new key/value store-backed, single-paxos format. Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-02-21 18:02:22 +00:00
Joao Eduardo Luis	091fa826d9	os: LevelDBStore: scrap init() and create open() and create_and_open() The init() function always implicitly created a new store if it was missing. This patch makes init() a private function accepting a bool that is used to specify whether or not we want to create the store if it does not exist, and creates two functions: open() and create_and_open(). open() will fail if the store we are trying to open does not exist; create_and_open() maintains the same behavior as the previous behavior of init() and will create the store if it does not exist before opening it. Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>	2013-02-21 18:02:22 +00:00
Joao Eduardo Luis	cab3411b4a	mon: Monitor: Add monitor store synchronization support Synchronize two monitor stores when one of the monitors has diverged significantly from the remaining monitor cluster. This process roughly consists of the following steps: 0. mon.X tries to join the cluster; 1. mon.X verifies that it has diverged from the remaining cluster; 2. mon.X asks the leader to sync; 3. the leader allows mon.X to sync, pointing out a mon.Y from which mon.X should sync; 4. mon.X asks mon.Y to sync; 5. mon.Y sends its own store in one or more chunks; 6. mon.X acks each received chunk; go to 5; 7. mon.X receives the last chunk from mon.Y; 8. mon.X informs the leader that it has finished synchronizing; 9. the leader acks mon.X's finished sync; 10. mon.X bootstraps and retries joining the cluster (goto 0.) This is the most simple and straightforward process that can be hoped for. However, things may go sideways at any time (monitors failing, for instance), which could potentially lead to a corrupted monitor store. There are however mechanisms at work to avoid such a scenario at any step of the process. Some of these mechanisms include: - aborting the sync if the leader fails or leadership changes; - state barriers on synchronization functions to avoid stray/outdated messages from interfering on the normal monitor behavior or on-going synchronization; - store clean-up before any synchronization process starts; - store clean-up if a sync process fails; - resuming sync from a different monitor mon.Z if mon.Y fails mid-sync; - several timeouts to guarantee that all the involved parties are still alive and participating in the sync effort. - request forwarding when mon.X contacts a monitor outside the quorum that might know who the leader is (or might know someone who does) [4]. Changes: - Adapt the MMonProbe message for the single-paxos approach, dropping the version map and using a lower and upper bound version instead. - Remove old slurp code. - Add 'sync force' command; 'sync_force' through the admin socket. Notes: [1] It's important to keep track of the paxos version at the time at which a store sync starts. Given that after the sync we end up with the same state as the monitor we are synchronizing from, there is a chance that we might end up with an uncommitted paxos version if we are synchronizing with the leader (there's some paxos stashing done prior to commit on the leader). By keeping track at which version the sync started, we can then let the requester know to which version he should cap its paxos store. [2] Furthermore, the enforced paxos cap, described on [1], is even more important if we consider the need to reapply the paxos versions that were received during the sync, to make sure the paxos store is consistent. If we happened to have some yet-uncommitted version in the store, we could end up applying it. [3] What is described in [1] and [2]: Fixes: #4026 Fixes: #4037 Fixes: #4040 [4] Whenever a given monitor mon.X is on the probing phase and notices that there is a mon.Y with a paxos version considerably higher than the one mon.X has, then mon.X will attempt to synchronize from mon.Y. This is the basis for the store sync. However this might hold true, the fact is that there might be a chance that, by the time mon.Y handles the sync request from mon.X, mon.Y might already be attempting a sync himself with some other mon.Z. In this case, the appropriate thing for mon.Y to do is to forward mon.X's request to mon.Z, as mon.Z should be part of the quorum, know who the leader is or be the leader himself -- if not, at least it is guaranteed that mon.Z has a higher version than both mon.X and mon.Y, so it should be okay to sync from him. Fixes: #4162 Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-02-21 18:02:22 +00:00
Joao Eduardo Luis	6db25a3885	message: MMonSync: Monitor Synchronization message The monitor's synchronization process requires a specific message type to carry the required information. Since this process significantly differs from slurping, reusing the MMonProbe message is not an option as it would require major changes and, for all intents and purposes, it would be far outside the scope of the MMonProbe message. Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-02-21 18:02:22 +00:00
Joao Eduardo Luis	d8a5cf6b4f	mon: MonitorDBStore: add store iterators to obtain chunks for sync We created an interface specific to the MonitorDBStore, which can be used to create iterators to obtain chunks for sync. Two different iterators were defined: one that will iterate over the whole store, focusing on the specified set of prefixes; another that will iterate over only one specific prefix. These two different iterators allow us to build the sync process in two distinct phases: 1) obtain all key/value pairs for paxos and all paxos services, bundle them in chunks and send them over the wire; and 2) obtain all the paxos versions, bundle them in chunks and send them over the wire. Also, we are currently considering a chunk to be (at most) 1 MB worth of data, although it can be tuned using 'mon_sync_max_payload_size' option. mon: MonitorDBStore: add crc support when --mon-sync-debug is set Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-02-21 18:02:22 +00:00
Joao Eduardo Luis	b33d4eacaa	mon: Paxos: get rid of slurp-related code Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2013-02-21 18:02:22 +00:00
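
Note on `091fa826d9` (LevelDBStore open() / create_and_open()): the split described in that commit message can be illustrated with a minimal sketch. This is not the Ceph code, which is C++; the class name `SketchStore`, the `_init` helper, and the use of a plain directory as the "store" are all assumptions made for illustration. Only the behavior comes from the message: a private init taking a bool, open() failing on a missing store, and create_and_open() creating it first.

```python
import os
import tempfile


class SketchStore:
    """Hypothetical stand-in for a directory-backed key/value store."""

    def __init__(self, path):
        self.path = path
        self.is_open = False

    def _init(self, create_if_missing):
        # Private helper: the bool decides whether a missing store is created.
        if not os.path.isdir(self.path):
            if not create_if_missing:
                raise FileNotFoundError(f"store does not exist: {self.path}")
            os.makedirs(self.path)
        self.is_open = True

    def open(self):
        # Fails if the store is missing, so a mistyped path is not
        # silently turned into a fresh, empty store.
        self._init(create_if_missing=False)

    def create_and_open(self):
        # Keeps the old implicit-create behavior for callers that want it.
        self._init(create_if_missing=True)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "store.db")
    store = SketchStore(path)
    try:
        store.open()
    except FileNotFoundError as err:
        print("open() refused:", err)
    store.create_and_open()
    print("opened after create:", store.is_open)
```

Making creation an explicit choice of the caller is the design point: code that expects an existing store gets an error instead of an empty one.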


24750 Commits