v0.6
/- fold observer into cmonctl/ceph?
/- osd scrub
/- async metadata

v0.7
/- smart osd sync
/- osd bug fixes
/- fast truncate
/- updated debian package
/- improved start/stop scripts
/- proc -> sysfs cleanup

v0.7.1
/- O_DIRECT
/- dentry lease renewal

v0.7.2
/- make kclient handle osd ack + sync properly
/- make fill_trace handle short traces; return short traces from mds
/- rdcap renewal
/- some mds clustering fixes
/- sysfs -> debugfs
/- queue delayed check upon receipt of unwanted, un-EXPIREABLE caps
/- init/mkfs fixes
/- warn on cmon startup if monmap doesn't match .conf
/- make kclient timeouts tunable

v0.8
/- clean up path_traverse interface, esp the usages in Server.cc
/- piggyback lease/cap release on client_request?
/- store metadata format version on disk...
/- async xattr
/- dcache readdir
/- confutils memory leaks
/- osd re-up attempt when marked down

v0.9
- make mds exert memory pressure on client caps, leases
- optionally separate osd interfaces (ips) for clients and osds (replication, peering, etc.)

later
- client reconnect after long eviction; and slow delayed reconnect
- ENOSPC
  - space reservation in ObjectStore, redeemed by Transactions?
    - reserved as PG goes active; reservation canceled when pg goes inactive
  - something similar during recovery
  - ?
- repair
- mds security enforcement
- client, user authentication
- cas
- osd failure declarations
- rename over old files should flush data, or revert back to old contents

repair
- are we concerned about
  - scrubbing
  - reconstruction after loss of subset of cdirs
  - reconstruction after loss of md log
- data object
  - path backpointers?
  - parent dir pointer?
- cdir objects
  - parent dir pointer
    - update on rename? or on cdir store? on cdir store is sufficient if mdlog survives...
    - or what the hell, full trace?
- mds scrubbing

kernel client
- osd client needs to recalculate layout if osdmap changes (pg_num etc may change)
- fix up mds selection, and ESTALE handling
- make cap import/export efficient
- simplify mds auth tracking?
  - use caps instead?
- unwind writeback start error in addr.c (see fixme)... by redirtying pages?
- inotify for updates from other clients?
- optional or no fill_trace?
- flock, fcntl locks
- async xattrs
- avoid pinning inodes with expireable caps?
- ACLs
- make writepages maybe skip pages with errors?
  - EIO, or ENOSPC?
  - ... writeback vs ENOSPC vs flush vs close()... hrm...
- set mapping bits for ENOSPC, EIO?
- fix readdir vs fragment race by keeping a separate frag pos, and ignoring dentries below it
- reconnect after being disconnected from the mds
- should we try to ref CAP_PIN on special inodes that are open?

vfs issues
- real_lookup() race:
  1- hash lookup finds no dentry
  2- real_lookup() takes dir i_mutex, but then finds a dentry
  3- drops mutex, then calls d_revalidate. if that fails, we return ENOENT (instead of looping?)
- vfs_rename_dir()
- a getattr mask would be really nice

filestore
- make min sync interval self-tuning (à la xfs, ext3?)
- get file csum?

btrfs
- clone compressed inline extents
- ioctl to pull out data csum?

userspace client
- handle session STALE
- time out caps, wake up waiters on renewal
- link caps with mds session
- validate dn leases
- fix lease validation to check session ttl
- clean up ll_ interface, now that we have leases!
- clean up client mds session vs mdsmap behavior?
- stop using mds's inode_t?
- fix readdir vs fragment race by keeping a separate frag pos, and ignoring dentries below it

mds
- fix up *_RDCACHE vs FILE_RDCACHE semantics
  - ability to read attribute value is distinct from being able to hold cached pages?
- file recovery maybe needs to scan entire file range for a truncation event? (and object attr maybe needs file offset, not object offset, or original truncation?)
- on replay, put dirty scatter replicas on lists so that they get flushed? or does rejoin handle that?
- take some care with replayed client requests vs new requests
- linkage vs cdentry replicas and remote rename....
- make recovery work with early replies
- purge each session's unused preallocated inodes
- file size recovery gives (wrong) 4MB-increment results?
- hard link backpointers
  - anchor source dir
  - build snaprealm for any hardlinked file
  - include snaps for all (primary+remote) parents
- how do we properly clean up inodes when doing a snap purge?
  - when they are mid-recover? see 136470cf7ca876febf68a2b0610fa3bb77ad3532
- what if a recovery is queued, or in progress, and the inode is then cowed? can that happen?
- proper handling of cache expire messages during rejoin phase?
  -> i think cache expires are fine; the rejoin_ack handler just has to behave if rejoining items go missing
- add an up:shadow mode?
  - tail the mds log as it is written
  - periodically check head so that we trim, too
- rename: importing inode... also journal imported client map?
- rerun destro trace against latest, with various journal lengths
- cap/lease length heuristics
  - mds lock last_change stamp?
- handle slow client reconnect (i.e. after mds has gone active)
- fix reconnect/rejoin open file weirdness
- anchor_destroy needs to xlock linklock.. which means it needs a Mutation wrapper?
  - ... when it gets a caller.. someday..
- FIXME how to journal/store root and stray inode content?
  - in particular, i care about dirfragtree.. get it on rejoin?
  - and dir sizes, if i add that... also on rejoin?
- add FILE_CAP_EXTEND capability bit
- return extra inode(s) in reply (namely, unlink)?

journaler
- fix up for large events (e.g. imports)
  - should we pad with zeros to avoid splitting individual entries?
    - make it a g_conf flag?
    - have to fix reader to skip over zeros (either <4 bytes for size, or zeroed sizes)
  - need to truncate at detected (valid) write_pos to clear out any other partial trailing writes

mon
- paxos needs to clean up old states.
  - default: simple max of (state count, min age), so that we have at least N hours of history, say?
  - osd map: trim only old maps < oldest "in" osd up_from

osdmon
- monitor needs to monitor some osds...

pgmon
/- include osd vector with pg state
- check for orphan pgs
- monitor pg states, notify on out?
- watch osd utilization; adjust overload in cluster map

crush
- allow forcefeed for more complicated rule structures. (e.g. make force_stack a list< set >)

osd
- pg split should be a work queue
- pg split needs to fix up pg stats. this is tricky with the clone overlap business...
- generalize ack semantics? or just change ack from memory to journal? memory/journal/disk...
- rdlocks
- optimize remove wrt recovery pushes

simplemessenger
- close idle connections?

objectcacher
- read locks?
- maintain more explicit inode grouping instead of wonky hashes

cas
- chunking. see TTTD in:
  ESHGHI, K. A framework for analyzing and improving content-based chunking algorithms. Tech. Rep. HPL-2005-30(R.1), Hewlett Packard Laboratories, Palo Alto, 2005.
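For reference on the cas chunking item, here is a minimal sketch of TTTD (Two Thresholds, Two Divisors) boundary selection. It is an illustration only, not Ceph code. It assumes a simple polynomial rolling hash over a 48-byte window; the hash in the cited report may differ. The names `tttd_chunks`, `WINDOW`, `BASE`, `MOD` and the default thresholds and divisors are hypothetical, chosen for this sketch. The idea is that a chunk never ends before `t_min` bytes. It ends at the first position where the hash hits the main divisor `d`. If it reaches `t_max` bytes first, it falls back to the last position that hit the smaller backup divisor `d_backup`, or, failing that, cuts hard at `t_max`.

```python
from typing import List

# Hypothetical, illustrative parameters (not taken from Ceph or the report).
WINDOW = 48           # rolling-hash window in bytes
BASE = 257            # polynomial base for the rolling hash
MOD = (1 << 61) - 1   # large prime modulus


def tttd_chunks(data: bytes, t_min: int = 460, t_max: int = 2800,
                d: int = 540, d_backup: int = 270) -> List[int]:
    """Return exclusive chunk end offsets for data using TTTD-style cuts."""
    boundaries: List[int] = []
    start = 0        # offset where the current chunk begins
    backup = -1      # latest backup breakpoint in the current chunk, or -1
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, byte in enumerate(data):
        # Roll the hash: remove the byte sliding out, then append the new one.
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        size = i + 1 - start
        if size < t_min:
            continue  # first threshold: no cut before t_min bytes
        if h % d_backup == d_backup - 1:
            backup = i + 1  # remember a fallback cut point
        if h % d == d - 1:
            boundaries.append(i + 1)  # main divisor hit: cut here
            start, backup = i + 1, -1
            continue
        if size < t_max:
            continue  # second threshold not reached yet
        # At t_max: prefer the backup breakpoint, else cut unconditionally.
        cut = backup if backup != -1 else i + 1
        boundaries.append(cut)
        start, backup = cut, -1
    if start < len(data):
        boundaries.append(len(data))  # trailing partial chunk
    return boundaries
```

Because cut points depend only on the bytes inside the hash window, an insertion near the start of an object usually disturbs only nearby boundaries. Later chunks therefore tend to line up again, which is what makes content-based chunking useful for CAS deduplication. The backup divisor makes hard cuts at `t_max` less frequent, and hard cuts are the ones that do not resynchronize.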