v0.7
/- smart osd sync
/- osd bug fixes
/- fast truncate
- start/stop script

v0.8
- fully async file creation
- ENOSPC
- cas?

big items
- finish client failure recovery (reconnect after long eviction; and slow delayed reconnect)
- ENOSPC
  - space reservation in ObjectStore, redeemed by Transactions?
  - reserved as PG goes active; reservation canceled when pg goes inactive
  - something similar during recovery
  - ?
- repair
- enforceable quotas?
- mds security enforcement
- client, user authentication
- cas
- osd failure declarations

repair
- are we concerned about
  - scrubbing
  - reconstruction after loss of subset of cdirs
  - reconstruction after loss of md log
- data object
  - path backpointers?
  - parent dir pointer?
- cdir objects
  - parent dir pointer
  - update on rename? or on cdir store? on cdir store is sufficient if mdlog survives...
  - or what the hell, full trace?
- mds scrubbing

timer
- each SafeTimer should just be its own thread.

kernel client
- flock, fcntl locks
- async xattrs
- avoid pinning inodes with expireable caps?
- avoid flushing tcp socket when sending client_lease release messages (when the request is about to follow)
- make osd retry writes if failure after ack..
- ACLs
- reconnect path should include pathbase, not just a string?
- make writepages maybe skip pages with errors?
  - EIO, or ENOSPC?
  - ... writeback vs ENOSPC vs flush vs close()... hrm...
- set mapping bits for ENOSPC, EIO?
- flush caps on sync, fsync, etc.
  - do we need to block? how do we track that?
- procfs/debugfs
  - adjust granular debug levels too
  - should we be using debugfs?
  - a dir for each client instance (client###)?
  - hooks to get mds, osd, monmap epoch #s
- populate sysfs?
  - things that would be useful to see
    - fsid
    - map versions on client
    - outstanding mds, osd, mon requests?
- fix readdir vs fragment race by keeping a separate frag pos, and ignoring dentries below it
- reconnect after being disconnected from the mds

vfs issues
- real_lookup() race:
  1- hash lookup finds no dentry
  2- real_lookup() takes dir i_mutex, but then finds a dentry
  3- drops mutex, then calls d_revalidate. if that fails, we return ENOENT (instead of looping?)
- vfs_rename_dir()
- a getattr mask would be really nice

filestore
- get file csum?

btrfs
- clone compressed inline extents
- ioctl to pull out data csum?

userspace client
- handle session STALE
- time out caps, wake up waiters on renewal
- link caps with mds session
- validate dn leases
- fix lease validation to check session ttl
- clean up ll_ interface, now that we have leases!
- clean up client mds session vs mdsmap behavior?
- stop using mds's inode_t?
- fix readdir vs fragment race by keeping a separate frag pos, and ignoring dentries below it

mds
- linkage vs cdentry replicas and remote rename....
- move root inode into stray dir
- make recovery work with early replies
- purge each session's unused preallocated inodes
- dftlock is missing from rejoin phase
- file size recovery gives (wrong) 4MB-increment results?
- hard link backpointers
  - anchor source dir
  - build snaprealm for any hardlinked file
  - include snaps for all (primary+remote) parents
- how do we properly clean up inodes when doing a snap purge?
  - when they are mid-recover? see 136470cf7ca876febf68a2b0610fa3bb77ad3532
- what if a recovery is queued, or in progress, and the inode is then cowed? can that happen?
- proper handling of cache expire messages during rejoin phase?
  -> i think cache expires are fine; the rejoin_ack handler just has to behave if rejoining items go missing
- add an up:shadow mode?
  - tail the mds log as it is written
  - periodically check head so that we trim, too
- rename: importing inode... also journal imported client map?
- rerun destro trace against latest, with various journal lengths
- cap/lease length heuristics
  - mds lock last_change stamp?
- handle slow client reconnect (i.e. after mds has gone active)
- fix reconnect/rejoin open file weirdness
- anchor_destroy needs to xlock linklock.. which means it needs a Mutation wrapper?
  - ... when it gets a caller.. someday..
- FIXME how to journal/store root and stray inode content?
  - in particular, i care about dirfragtree.. get it on rejoin?
  - and dir sizes, if i add that... also on rejoin?
- add FILE_CAP_EXTEND capability bit

journaler
- fix up for large events (e.g. imports)
- use set_floor_and_read for safe takeover from possibly-not-quite-dead other guy.
- should we pad with zeros to avoid splitting individual entries?
  - make it a g_conf flag?
  - have to fix reader to skip over zeros (either <4 bytes for size, or zeroed sizes)
- need to truncate at detected (valid) write_pos to clear out any other partial trailing writes

mon
- paxos needs to clean up old states.
  - default: simple max of (state count, min age), so that we have at least N hours of history, say?
  - osd map: trim only old maps < oldest "in" osd up_from

osdmon
- monitor needs to monitor some osds...

pgmon
/- include osd vector with pg state
- check for orphan pgs
- monitor pg states, notify on out?
- watch osd utilization; adjust overload in cluster map

crush
- allow forcefeed for more complicated rule structures. (e.g. make force_stack a list< set >)

osd
- pg split should be a work queue
- pg split needs to fix up pg stats. this is tricky with the clone overlap business...
- generalize ack semantics? or just change ack from memory to journal? memory/journal/disk...
- rdlocks
- optimize remove wrt recovery pushes

simplemessenger
- close idle connections?

objectcacher
- read locks?
- maintain more explicit inode grouping instead of wonky hashes
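
The journaler items (pad with zeros so individual entries are not split, have the reader skip zeros, truncate at the detected valid write_pos) could be prototyped along the lines below. This is a minimal sketch under assumed conventions, not the Journaler's actual on-disk format: the log is a flat byte buffer divided into hypothetical fixed-size segments (kSegment, deliberately tiny here), each entry is a 4-byte length in host byte order followed by the payload, and payloads must be non-empty so that a zero length can only mean padding. append_entry, read_entries, kSegment and kLenBytes are illustrative names, not existing code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical layout, not the real Journaler format: the log is split into
// fixed-size segments, and each entry is a 4-byte length (host byte order)
// followed by that many payload bytes.
constexpr std::size_t kSegment = 16;  // tiny segment size, for illustration
constexpr std::size_t kLenBytes = 4;

// Append one entry. If the size field would straddle a segment boundary, or
// the entry would fit in a fresh segment but not in the current one, first
// zero-pad to the boundary so the entry is not split.
void append_entry(std::vector<char>& log, const std::string& payload) {
  assert(!payload.empty() && "a zero size is reserved for padding");
  const std::size_t need = kLenBytes + payload.size();
  const std::size_t room = kSegment - log.size() % kSegment;
  if (room < kLenBytes || (need > room && need <= kSegment))
    log.resize(log.size() + room, 0);  // pad with zeros to the boundary
  const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
  const char* p = reinterpret_cast<const char*>(&len);
  log.insert(log.end(), p, p + kLenBytes);
  log.insert(log.end(), payload.begin(), payload.end());
}

// Scan the log, skipping padding: a segment tail shorter than 4 bytes, or a
// zeroed size field, means "jump to the next boundary". Stops at the first
// incomplete entry; valid_end is the offset just past the last complete one,
// i.e. the valid write_pos to truncate to before appending again.
std::vector<std::string> read_entries(const std::vector<char>& log,
                                      std::size_t& valid_end) {
  std::vector<std::string> out;
  std::size_t pos = 0;
  valid_end = 0;
  while (pos < log.size()) {
    const std::size_t room = kSegment - pos % kSegment;
    if (room < kLenBytes) { pos += room; continue; }  // no room for a size
    if (pos + kLenBytes > log.size()) break;          // partial size field
    std::uint32_t len = 0;
    std::memcpy(&len, log.data() + pos, kLenBytes);
    if (len == 0) { pos += room; continue; }          // zeroed size: padding
    if (pos + kLenBytes + len > log.size()) break;    // partial trailing entry
    out.emplace_back(log.data() + pos + kLenBytes, len);
    pos += kLenBytes + len;
    valid_end = pos;
  }
  return out;
}
```

After a crash, a recovering writer in this sketch would call read_entries and then truncate with log.resize(valid_end) before appending, which drops any partial trailing write. Entries larger than a segment still straddle a boundary here; the sketch does not try to handle large events.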