v0.15
/- clean up msgr protocol checks
/- kclient: checkpatch fixes, cleanups.  allow msg revoke (nice interface cleanup)
/- monclient fixes; ceph detects monitor session drop
/- msgr: protocol check cleanups; ack seq # fix
/- debian: radosgw package, fix header perms
/- kclient: GET_DATALOC ioctl
/- kclient: osdc bug fix
/- kclient: clean up debugfs layout

v0.16
- kclient: fix msgr bug (out_qlen thing)
- kclient cleanup: uninline strings, use pr_fmt, prefix frag_ macros
- kclient: xattr cleanups
- kclient: fix invalidate recursion bug
- libceph: identify self
- hadoop: set primary replica on self
- kclient: akpm review fixups
  - uninline frags
  - uninline string hash
  - document data structures
  - audit all inline in kclient
  - ceph_buffer and vmalloc?
  - ceph_i_test smp_mb instead of spinlock
  - bit ops in messenger
  - name args in ceph_osd_op union
- disk format, wire protocol changes
  - use sockaddr_storage; some ipv6 groundwork

v0.16.1
- mds: put migration vectors in mdsmap
- rgw: fix
- include buffer.c in kernel package, tarball

v0.17
- kclient: fix multiple mds mdsmap decoding
- kclient: fix mon subscription renewal
- crush: fix crush map creation with empty buckets (occurs on larger clusters)
- osdmap: fix encoding bug (crashes kclient); make kclient not crash
- msgr: simplified policy, failure model
- mon: less push, more pull
- mon: request routing
- mon cluster expansion
- osd: fix pg parsing, restarts on larger clusters

v0.18
- osd: basic ENOSPC handling
- big endian fixes (required protocol/disk format change)
- osd: improved object -> pg hash function; selectable
- crush: selectable hash function(s)
- mds restart bug fixes
- kclient: mds reconnect bug fixes
- fixed mds log trimming bug
- fixed mds cap vs snap deadlock
- filestore: faster flushing
- uclient,kclient: snapshot fixes
- mds: fix recursive accounting bug
- uclient: fixes for 32bit clients
- auth: 'none' security framework
- mon: "safely" bail on write errors (e.g. ENOSPC)
- mds: fix replay/reconnect race (caused (fast) client reconnect to fail)
- mds: misc journal replay, session fixes

v0.19
- ms_dispatch fairness
- qa: snap test.  maybe walk through 2.6.* kernel trees?
- osd: rebuild pg log
- osd: handle storage errors
- rebuild mds hierarchy
- kclient: msgs built with a page list
- kclient: retry alloc on ENOMEM when reading from connection?

pending wire, disk format changes
- add v to PGMap, PGMap::Incremental

bugs
- osd pg split breaks if not all osds are up...
- mds memory leak
- mislinked directory?  (cpusr.sh, mv /c/* /c/t, more cpusr, ls /c/t)
- premature filejournal trimming?
- weird osd_lock contention during osd restart?
- kclient: after reconnect, cp: writing `/c/ceph2.2/bin/gs-gpl': Bad file descriptor
  - need to somehow wake up unreconnected caps?  hrm!!
- kclient: socket creation
- snaprealm thing:

ceph3:~# find /c
/c
/c/.ceph
/c/.ceph/mds0
/c/.ceph/mds0/journal
/c/.ceph/mds0/stray
[68663.397407] ceph: ceph_add_cap: couldn't find snap realm 10000491bb5
...
ceph3:/c# [68724.067160] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
[68724.071069] IP: [] __send_cap+0x237/0x585 [ceph]
[68724.078917] PGD f7a12067 PUD f688c067 PMD 0
[68724.082907] Oops: 0000 [#1] PREEMPT SMP
[68724.082907] last sysfs file: /sys/class/net/lo/operstate
[68724.082907] CPU 1
[68724.082907] Modules linked in: ceph fan ac battery psmouse ehci_hcd ohci_hcd ide_pci_generic thermal processor button
[68724.082907] Pid: 10, comm: events/1 Not tainted 2.6.32-rc2 #1 H8SSL
[68724.082907] RIP: 0010:[]  [] __send_cap+0x237/0x585 [ceph]
[68724.114907] RSP: 0018:ffff8800f96e3a50  EFLAGS: 00010202
[68724.114907] RAX: 0000000000000000 RBX: 0000000000000354 RCX: 0000000000000000
[68724.114907] RDX: 0000000000000000 RSI: ffff8800f76e8ba8 RDI: ffff8800f581a508
[68724.114907] RBP: ffff8800f96e3bb0 R08: 0000000000000000 R09: 0000000000000001
[68724.114907] R10: ffff8800cea922b8 R11: ffffffffa0082982 R12: 0000000000000001
[68724.114907] R13: 0000000000000000 R14: ffff8800cea95378 R15: 0000000000000000
[68724.114907] FS:  00007f54be9a06e0(0000) GS:ffff880009200000(0000) knlGS:0000000000000000
[68724.114907] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[68724.114907] CR2: 0000000000000088 CR3: 00000000f7118000 CR4: 00000000000006e0
[68724.178904] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[68724.178904] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[68724.178904] Process events/1 (pid: 10, threadinfo ffff8800f96e2000, task ffff8800f96e02c0)
[68724.178904] Stack:
[68724.178904]  ffff8800f96e0980 ffff8800f96e02c0 ffff8800f96e3a80 ffffffff8106a3b9
[68724.178904] <0> ffff8800f96e3a80 0000000000000003 00006589ac4ca260 0000000000000004
[68724.178904] <0> 0cb13589944c0262 0000000000000000 ffff8800f96e3b30 ffffffff81ca7c80
[68724.178904] Call Trace:
[68724.178904]  [] ? get_lock_stats+0x19/0x4c
[68724.178904]  [] ? mark_held_locks+0x4d/0x6b
[68724.178904]  [] ceph_check_caps+0x740/0xa70 [ceph]
[68724.178904]  [] ? get_lock_stats+0x19/0x4c
[68724.178904]  [] ? put_lock_stats+0xe/0x27
[68724.178904]  [] ceph_check_delayed_caps+0xcb/0x14a [ceph]
[68724.178904]  [] delayed_work+0x3f/0x368 [ceph]
[68724.178904]  [] ? worker_thread+0x229/0x398
[68724.178904]  [] worker_thread+0x283/0x398
[68724.178904]  [] ? worker_thread+0x229/0x398
[68724.178904]  [] ? delayed_work+0x0/0x368 [ceph]
[68724.178904]  [] ? preempt_schedule+0x3e/0x4b
[68724.306901]  [] ? autoremove_

greg
- osd: error handling
- uclient: readdir from cache
- mds: basic auth checks

later
- document on-wire protocol
- authentication
- client reconnect after long eviction; and slow delayed reconnect
- repair
- mds security enforcement
- client, user authentication
- cas
- osd failure declarations
- rename over old files should flush data, or revert back to old contents

rados
- make rest interface superset of s3?
  - create/delete snapshots
  - list, access snapped version
- perl swig wrapper
- 'rados call foo.bar'?
- merge pgs
- destroy pg_pools
- autosize pg_pools?
- security

repair
- namespace reconstruction tool
- repair pg (rebuild log)  (online or offline?  ./cosd --repair_pg 1.ef?)
- repair file ioctl?
- are we concerned about
  - scrubbing
  - reconstruction after loss of subset of cdirs
  - reconstruction after loss of md log
- data object
  - path backpointers?
  - parent dir pointer?
- mds scrubbing

kclient
- ENOMEM
  - message pools
  - sockets?  (this can actually generate a lockdep warning :/)
- use page lists for large messages?  e.g. reconnect
- fs-portable file layout virtual xattr (see Andreas' -fsdevel thread)
- statlite
- audit/combine/rework/whatever invalidate, writeback threads and associated invariants
- add cap to release if we get fouled up in fill_inode et al?
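The "message pools" item under ENOMEM is about keeping a small reserve of preallocated messages so the client can always make forward progress (sending acks, cap releases, etc.) even when a fresh allocation fails. A minimal userspace sketch of the idea, in Python for brevity; the class, method names, and sizes here are all hypothetical illustrations, not Ceph's actual implementation:

```python
class MsgPool:
    """Hypothetical reserve pool: preallocate `size` message buffers up
    front so a later allocation failure cannot stall critical replies."""

    def __init__(self, size, msg_bytes):
        self._free = [bytearray(msg_bytes) for _ in range(size)]

    def get(self, alloc=None):
        # Try a normal allocation first; fall back to the reserve only
        # when the allocator fails (the ENOMEM path).
        if alloc is not None:
            try:
                return alloc()
            except MemoryError:
                pass
        if not self._free:
            return None  # reserve exhausted; caller must back off and retry
        return self._free.pop()

    def put(self, msg):
        # Return a buffer to the reserve once its message has been sent.
        self._free.append(msg)
```

The design point is that the reserve is sized for the worst-case number of in-flight "must not fail" messages, and buffers are recycled with put() rather than freed.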
- make caps reservations per-client
- fix up ESTALE handling
- don't retry on ENOMEM on non-nofail requests in kick_requests
- make cap import/export more efficient?
- flock, fcntl locks
- ACLs
  - init security xattrs
- should we try to ref CAP_PIN on special inodes that are open?
- fix readdir vs fragment race by keeping a separate frag pos, and ignoring dentries below it
- inotify for updates from other clients?

vfs issues
- real_lookup() race:
  1- hash lookup finds no dentry
  2- real_lookup() takes dir i_mutex, but then finds a dentry
  3- drops mutex, then calls d_revalidate.  if that fails, we return ENOENT (instead of looping?)
- vfs_rename_dir()
- a getattr mask would be really nice

filestore
- make min sync interval self-tuning (ala xfs, ext3?)
- get file csum?

btrfs
- clone compressed inline extents
- ioctl to pull out data csum?

osd
- gracefully handle ENOSPC
- gracefully handle EIO?
- client session object
  - track client's osdmap; and only share latest osdmap with them once!
- what to do with lost objects.. continue peering?
- segregate backlog from log ondisk?
- preserve pg logs on disk for longer period
- make scrub interruptible
- optionally separate osd interfaces (ips) for clients and osds (replication, peering, etc.)
- pg repair
- pg split should be a work queue
- optimize remove wrt recovery pushes?

uclient
- fix client_lock vs other mutex with C_SafeCond
- clean up check_caps to more closely mirror kclient logic
- readdir from cache
- fix readdir vs fragment race by keeping a separate frag pos, and ignoring dentries below it
- hadoop: clean up assert usage

mds
- don't sync log on every clientreplay request?
- pass issued, wanted into eval(lock) when eval() already has it?  (and otherwise optimize eval paths..)
- add an up:shadow mode?
  - tail the mds log as it is written
  - periodically check head so that we trim, too
- handle slow client reconnect (i.e. after mds has gone active)
- anchor_destroy needs to xlock linklock.. which means it needs a Mutation wrapper?
  - ... when it gets a caller.. someday..
- add FILE_CAP_EXTEND capability bit
- dir fragment
  - maybe just take dftlock for now, to keep it simple.
- dir merge
- snap
  - hard link backpointers
    - anchor source dir
    - build snaprealm for any hardlinked file
    - include snaps for all (primary+remote) parents
- how do we properly clean up inodes when doing a snap purge?
  - when they are mid-recover?  see 136470cf7ca876febf68a2b0610fa3bb77ad3532
- what if a recovery is queued, or in progress, and the inode is then cowed?  can that happen?
- proper handling of cache expire messages during rejoin phase?
  -> i think cache expires are fine; the rejoin_ack handler just has to behave if rejoining items go missing
- clustered
  - on replay, put dirty scatter replicas on lists so that they get flushed?  or does rejoin handle that?
  - linkage vs cdentry replicas and remote rename....
  - rename: importing inode... also journal imported client map?

mon
- don't allow lpg_num expansion and osd addition at the same time?
- how to shrink cluster?
- how to tell osd to cleanly shut down
- mds injectargs N should take mds# or id.  * should bcast to standby mds's.
- paxos needs to clean up old states.
  - default: simple max of (state count, min age), so that we have at least N hours of history, say?
- osd map: trim only old maps < oldest "in" osd up_from

osdmon
- monitor needs to monitor some osds...

pgmon
/- include osd vector with pg state
- check for orphan pgs
- monitor pg states, notify on out?
- watch osd utilization; adjust overload in cluster map

crush
- allow forcefeed for more complicated rule structures.  (e.g. make force_stack a list< set >)

simplemessenger
- close idle connections?

objectcacher
- read locks?
- maintain more explicit inode grouping instead of wonky hashes

cas
- chunking.  see TTTD in
    ESHGHI, K.  A framework for analyzing and improving content-based chunking algorithms.  Tech. Rep. HPL-2005-30(R.1), Hewlett Packard Laboratories, Palo Alto, 2005.

radosgw
- gracefully handle location-related requests
- logging control (?)
- parse date/time better
- upload using post
- torrent
- gracefully handle PUT/GET requestPayment
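The chunking item in the cas section refers to TTTD (Two Thresholds, Two Divisors) from the Eshghi report: content-defined chunking with a main divisor D and a more frequently matching backup divisor D' < D, plus hard minimum/maximum chunk sizes; when the maximum size is reached without a main-divisor match, the last backup-divisor match is used as the breakpoint. A rough sketch of that control flow, assuming a toy CRC-based rolling window rather than the paper's hash and parameters:

```python
import zlib

def tttd_chunks(data, tmin=64, tmax=256, d=128, d2=64, win=16):
    """Split `data` into content-defined chunks (TTTD-style sketch).

    tmin/tmax: hard lower/upper bounds on chunk size.
    d:  main divisor; breakpoint where hash % d == d - 1.
    d2: backup divisor (d2 < d, so it matches more often); its last
        match is the fallback breakpoint if tmax is hit first.
    """
    chunks = []
    start = 0
    n = len(data)
    while start < n:
        end = min(start + tmax, n)
        backup = -1   # last backup-divisor match seen
        cut = -1
        for i in range(start + tmin, end):
            # Hash of the trailing `win` bytes (stand-in for a true
            # rolling hash, which would update incrementally).
            h = zlib.crc32(data[max(start, i - win):i])
            if h % d2 == d2 - 1:
                backup = i
            if h % d == d - 1:
                cut = i
                break
        if cut == -1:
            # No main-divisor match: use the backup breakpoint if any,
            # else a hard cut at the maximum size.
            cut = backup if backup != -1 else end
        chunks.append(data[start:cut])
        start = cut
    return chunks
```

The two-divisor trick keeps chunk sizes tighter than a single-divisor scheme: hard cuts at tmax destroy resynchronization after an edit, so falling back to a content-derived backup breakpoint first preserves more duplicate detection.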