RepoMirrors/ceph

mirror of https://github.com/ceph/ceph synced 2025-01-19 01:21:49 +00:00

Author	SHA1	Message	Date
Sage Weil	186a595ca0	Merge branch 'next'	2012-07-24 11:49:41 -07:00
Sage Weil	f565ace62a	osd: fix pg log zeroing Zero the right number of bytes. Fixes a bug where we clobber legit log data. Fortunately this is only triggered with osd preserve pg log = false, which was not the default until recently in master. Fixes: #2799 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Mike Ryan <mike.ryan@inktank.com>	2012-07-24 11:02:37 -07:00
Yehuda Sadeh	3e886799d9	Merge branch 'wip-2763'	2012-07-24 10:10:22 -07:00
Pierre Rognant	d67ad0db64	Wireshark dissector updated, work with the current development tree of wireshark. The way I patched it is not really clean, but it can be useful if some people quickly need to inspect ceph network flows.	2012-07-24 10:09:27 -07:00
Yehuda Sadeh	52f51a24e2	wireshark/ceph/packet-ceph.c: fix eol Removing extra char from dos eol format. Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-24 10:09:27 -07:00
Joao Eduardo Luis	a3d57a6e43	os: KeyValueDB: Add virtual raw_key() function to return (prefix,key) pair If we were to use solely the key() function, whenever we had a key with, say, prefix 'Foo' and key 'Bar', the key() function would return something similar to 'Foo<separator>Bar'. Therefore, obtaining the prefix and the key would require one to be aware of the separator used, and, since that is implementation specific, we can't rely on such prior knowledge. This new function must then be implemented by any derivative class of KeyValueDB, and is expected to return a pair (prefix,key) for the current iterator's position -- the key() function should behave as previously, returning only the 'key' component of the pair. Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2012-07-24 02:30:14 +01:00
Joao Eduardo Luis	a16d9c64da	os: KeyValueDB: allow finer-grained control of transaction operations This patch introduces the possibility of using single key/value modification operations into the transaction interface. Until now, any 'set' or 'rmkeys' operations required a map of keys to be provided to the function, which made the task of removing or setting a bunch of keys easier. Doing these same operations for a single key, however, would entail creating a map with a single key. Instead, this patch adds two new virtual abstract functions, to be implemented by derivative classes, which set or remove one single key/value, and we then implement the map-based, existing functions in terms of these new functions. We also update the derivative classes of KeyValueDB in order to reflect these changes (i.e., LevelDBStore and KeyValueDBMemory). Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>	2012-07-24 02:30:14 +01:00
Sage Weil	6c0fa50944	doc: update information about stable vs development releases Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-23 17:39:12 -07:00
Josh Durgin	48bd839b1e	librbd: replace assign_bid with client id and random number The assign_bid method has issues with replay because it is a write that also returns data. This means that the replayed operation would return success, but no data, and cause a create to fail. Instead, let the client set the bid based on its global id and a random number. This only affects the creation of new images, since the bid is put into an opaque string as part of the object prefix. Keep the server side assign_bid around in case there are old clients still using it. Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-07-23 17:16:01 -07:00
Sage Weil	67832c34a2	osd: fix ACK ordering on resent ops The wait_for_ondisk handling fixed COMMIT ordering, but the ACKs need to go back in the same order too. For example: - op A is queued - client disconnects, both ACK and COMMIT replies are lost - client reconnects - op A and B are sent - op A is queued - op B is applied, ACK is sent - op A and B COMMITs are sent -> client's ack callbacks will see B and then A. Fix this by creating a waiting_for_ack queue as well, and sending ACK responses as needed. Also handle the case where the ACK should be sent immediately when the retry event is received. Fixes: #2823 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Mike Ryan <mike.ryan@inktank.com>	2012-07-23 16:51:03 -07:00
Yehuda Sadeh	96dbc412df	rados::cls::lock: move api types into namespace By popular demand, moved public api into namespace. This required some changes to ceph_dencoder to get some template annoyance working. Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-23 16:01:32 -07:00
Sage Weil	d9bfe9547d	Merge tag 'v0.49' v0.49	2012-07-23 12:43:19 -07:00
Sage Weil	ca6265d0f4	v0.49	2012-07-23 11:28:08 -07:00
Sage Weil	c8f1311988	mon: make 'ceph osd rm ...' wipe out all state bits, not just EXISTS This ensures that when a new osd reclaims that id it behaves as if it were really new. Backport: argonaut Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-23 10:47:10 -07:00
Sage Weil	5fcb22f03c	mkcephfs: add sync between btrfs scan and mount This appears to fix problems with mount failing for at least one user. Reported-by: Paul Pettigrew <Paul.Pettigrew@mach.com.au> Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-23 09:21:09 -07:00
Sage Weil	2d7e2cbf26	crush: fix name map encoding We screwed up and encoded using the name 'int' type instead of int32_t. That means people have systems encoding this as both 32 and 64 bit, depending on their architecture. This could be worse: x86_64 still has a 32-bit int (at least in my environment). In any case, mixing both word sizes in their clusters is broken as a result, with the exception of the kernel code, which doesn't decode this part of the map and will tolerate differently-sized servers. Fix this by: * encoding using int32_t now * decoding either 32-bit or 64-bit values, by assuming that the strings will always be non-empty. This appears to be the case. However: * any cluster with 64-bit ints must upgrade all at once, or else the new code will start encoding 32-bit values and the old code will be confused. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>	2012-07-21 09:15:06 -07:00
Sage Weil	b497bdacf5	osd/OpTracker: fix use-after-free And formatting. Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-21 08:50:47 -07:00
Samuel Just	a6735ab009	OpRequest,OSD: track recent slow ops This should be helpful while investigating slow performance. OpRequests now track events with timestamp in addition to dumping them to the log. OpHistory keeps up to a configurable number of the slowest ops over a configurable recent time interval. The admin socket interface for the OSD now has a dump_historic_ops command which dumps the stored slow ops. Reviewed-by: Greg Farnum <greg@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Samuel Just <sam.just@inktank.com>	2012-07-20 17:20:16 -07:00
Samuel Just	d624f3435f	Merge branch 'next'	2012-07-20 14:32:44 -07:00
Samuel Just	9e207aa881	test/store_test.cc: verify collection_list_partial results are sorted Synthetic test now also varies snapshots and uses a small variety of hashes. Signed-off-by: Samuel Just <sam.just@inktank.com>	2012-07-20 13:59:25 -07:00
Yehuda Sadeh	49877cdeda	cls_lock: cls_lock_id_t -> cls_lock_locker_id_t Renamed type to make more sense. Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-20 13:41:51 -07:00
Yehuda Sadeh	315bbea511	cls_lock: document lock properties Added some comments about different lock properties. Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-20 13:28:19 -07:00
Yehuda Sadeh	056d42cf91	cls_log: update a comment Was missing output param description. Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-20 13:16:05 -07:00
Yehuda Sadeh	2c7d782177	rados: lock info keeps expiration, not duration We pass duration in the request, but internally we keep the expiration. Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-20 13:11:54 -07:00
Yehuda Sadeh	d16844c890	rados tool: add advisory lock control commands Can now lock, break lock, list locks and show lock info. Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-20 13:00:43 -07:00
Yehuda Sadeh	2f8de8943e	cls_lock: objclass for advisory locking Providing an objclass to create and manipulate advisory locking. Also providing a client api to control it. A lock may either be exclusively locked or shared among multiple lockers. A locker is identified by the rados client name, and by a cookie-string. A lock may be assigned with a tag that every operation on that lock should use. A lock can be unlocked by the client that locked it, or may be broken by other clients. When a non-zero lock duration is assigned to a lock by a locker, that locker expires after that time duration. A lock may have a description. Locks on a specific object can be listed. Lockers of a specific lock can be enumerated (by get_info). Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-20 12:59:07 -07:00
Yehuda Sadeh	9c5c3edfcc	objclass: add api calls to get/set xattrs added the following functions: cls_cxx_getxattr cls_cxx_getxattrs cls_cxx_setxattr Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-20 12:55:55 -07:00
Samuel Just	adc9b91f37	os/HashIndex: use set<pair<string, hobject_t>> rather than multimap Multimap does not make any guarantees about ordering of different values with the same key. list_by_hash, however, assumes that the iterator order matches hobject_t order. Thus, we use set<pair<string, hobject_t> > to get the proper ordering. Backport: stable Signed-off-by: Samuel Just <sam.just@inktank.com>	2012-07-20 12:29:03 -07:00
Sage Weil	0b84384fd4	mon: shut up about sessionless MPGStats messages If the mon gets a reset on the client connection, it clears the session on the connection. This is perfectly normal to see. Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-19 22:14:11 -07:00
Sage Weil	6580450fbc	osd: clean up boot method names Prefix subsequent steps with _. Better names. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-19 21:27:40 -07:00
Sage Weil	369fbf6110	osd: defer boot if heartbeatmap indicates we are unhealthy If the OSD is bogged down or unresponsive, we should not try to join the cluster. This was observed on congress (slow/clogged op_tp combined with osdmap thrashing). Fixes: #2502 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-19 21:27:37 -07:00
Sage Weil	d76df212c8	Merge branch 'next' Conflicts: src/include/ceph_features.h	2012-07-19 20:22:35 -07:00
Sage Weil	dec936923f	osd/mon: subscribe (onetime) to pg creations on connect Ask the monitor for pending pg creations each time we connect. Normally, this is a freebie check. If there are pending creations, though, it ensures that the OSD finds out about them even if the original lame broadcast didn't reach it. Specifically: - osd is hunting for a monitor, but isn't yet connected - new pgs are created - send_pg_creates() sends out create messages, but osd does not get it - osd finally connects to a mon Fixes: #2151 (tho the bug description is bad) Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>	2012-07-19 17:13:09 -07:00
Sage Weil	7f58b9beee	mon: track pg creations by osd Track the pending pg creations by osd, and use a helper to send out those messages. Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-19 17:13:09 -07:00
Sage Weil	4c6c927b27	Revert "rbd: fix usage for snap commands" This reverts commit `42de6873f9`. Actually, these are fine! Dan made them all kinds of fancy.	2012-07-19 16:45:07 -07:00
Sage Weil	42de6873f9	rbd: fix usage for snap commands Snap commands take '--snap <snapname> <imagename>'. Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-19 16:48:18 -07:00
Mike Ryan	58cd27fd29	doc: add missing dependencies to README Signed-off-by: Mike Ryan <mike.ryan@inktank.com>	2012-07-19 11:29:40 -07:00
Sage Weil	6f381affdc	add CRUSH_TUNABLES feature bit Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-18 19:49:58 -07:00
Samuel Just	e3349a2a3d	OSD::handle_osd_map: don't lock pgs while advancing maps We no longer do anything with the pgs here. PG map advancing is now handled in OSD::advance_pg asynchronously. Signed-off-by: Samuel Just <sam.just@inktank.com>	2012-07-18 15:37:28 -07:00
Sage Weil	c8ee30160d	osd: add osd_debug_drop_pg_create_{probability,duration} options This will let us exercise more of the pg creation code. Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-18 14:26:16 -07:00
Samuel Just	8f5562ffe6	OSD: write_if_dirty during get_or_create_pg after handle_create In the case that the pg is newly created, we will activate during that call, so the info and log will be dirty. Signed-off-by: Samuel Just <sam.just@inktank.com>	2012-07-18 14:26:16 -07:00
Samuel Just	ca9f713004	OSD: actually send queries during handle_pg_create During the osd threading refactor, we lost the do_queries call in favor of dispatch_context. However, this did not include the queries triggered prior to pg instantiation. Instead, use the rctx to send the queries. Part of #2771. Without the queries being sent, can_create_pg will never become true. Signed-off-by: Samuel Just <sam.just@inktank.com>	2012-07-18 14:26:16 -07:00
Josh Durgin	0d0b468914	Merge branch 'next'	2012-07-18 12:58:47 -07:00
Sage Weil	5dd68b95b1	objecter: always resend linger registrations If a linger op (watch) is sent to the OSD and updates the object, and then the client loses the reply, it will resend the request. The OSD will see that it is a dup, however, and not set up the in-memory session state for the watch. This in turn will break the watch (i.e., notifies won't get delivered). Instead, always resend linger registration ops, so that we always have a unique reqid and do the correct session registration for each session. * track the tid of the registration op for each LingerOp * mark registration ops as should_resend=false; cancel as needed * when we send a new registration op, cancel the old one to ensure we ignore the reply. This is needed because we resend linger ops on any pg change, not just a primary change. * drop the first_send arg to send_linger(), as we can now infer that from register_tid == 0. The bug was easily reproduced with ms inject socket failures = 500 and the test_stress_watch utility. Fixes: #2796 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-18 12:55:35 -07:00
Samuel Just	76efd9772c	OSD: publish_map in init to initialize OSDService map Other areas rely on OSDService::get_map() to function, possibly before activate_map is first called. In particular, with handle_osd_ping, not initializing the map member results in: ceph version 0.48argonaut-413-g90ddc5a (commit:90ddc5ae51627e7656459085d7e15105c8b8316d) 1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x71ba9a] 2: (()+0xfcb0) [0x7fcd8243dcb0] 3: (OSD::handle_osd_ping(MOSDPing)+0x74d) [0x5dbdfd] 4: (OSD::heartbeat_dispatch(Message)+0x22b) [0x5dc70b] 5: (SimpleMessenger::DispatchQueue::entry()+0x92b) [0x7b5b3b] 6: (SimpleMessenger::dispatch_entry()+0x24) [0x7b6914] 7: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7762fd] 8: (()+0x7e9a) [0x7fcd82435e9a] 9: (clone()+0x6d) [0x7fcd809ea4bd] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. Signed-off-by: Samuel Just <sam.just@inktank.com>	2012-07-18 10:44:36 -07:00
Sage Weil	7586cde9de	qa/workunits/suites/pjd.sh: bash -x This will let us see what test is failing, exactly, and what its inputs were. Hoping to help find #2187. Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-18 10:52:44 -07:00
Josh Durgin	675d630203	ObjectCacher: fix cache_bytes_hit accounting Misses are not hits! Signed-off-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-18 10:25:13 -07:00
John Wilkins	4e1d973e46	doc: Fixed heading text. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-07-18 07:35:35 -07:00
John Wilkins	ebc577361c	doc: favicon.ico should be new Ceph icon. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-07-18 07:35:00 -07:00
John Wilkins	3a377c44e1	doc: Overhauled Swift API documentation. Signed-off-by: John Wilkins <john.wilkins@inktank.com>	2012-07-17 21:28:59 -07:00


20518 Commits