v0.5
- debug restart, cosd reformat, etc.
- finish btrfs ioctl interface
- efficient snap recovery
- throttle osd recovery
- forced unmount?
- ENOSPC?

v0.6
- cas?

big items
- finish client failure recovery (reconnect after long eviction; and slow delayed reconnect)
- ENOSPC
  - space reservation in ObjectStore, redeemed by Transactions?
  - reserved as PG goes active; reservation canceled when pg goes inactive
  - something similar during recovery
  - ?
- repair
- enforceable quotas?
- mds security enforcement
- client, user authentication
- cas
- osd failure declarations
- libuuid?

repair
- are we concerned about
  - scrubbing
  - reconstruction after loss of subset of cdirs
  - reconstruction after loss of md log
- data object
  - path backpointers?
  - parent dir pointer?
- cdir objects
  - parent dir pointer
    - update on rename?  or on cdir store?
      on cdir store is sufficient if mdlog survives...
  - or what the hell, full trace?
- mds scrubbing
- rados scrubbing

snaps on osd
- garbage collection
  - don't start collection on replica until clean?
- efficient recovery of clones using the clone diff info

kernel client
- fix osd client timeout
- make osd retry writes if failure after ack..
- clean up cap flush on session close
- ACLs
- reconnect path should include pathbase, not just a string?
- make writepages maybe skip pages with errors?
  - EIO, or ENOSPC?
  - ... writeback vs ENOSPC vs flush vs close()... hrm...
- set mapping bits for ENOSPC, EIO?
- flush caps on sync, fsync, etc.
  - do we need to block?  how do we track that?
- forced unmount?
- procfs/debugfs
  - adjust granular debug levels too
    - should we be using debugfs?
  - a dir for each client instance (client###)?
    - hooks to get mds, osd, monmap epoch #s
- populate sysfs?
- things that would be useful to see
  - fsid
  - map versions on client
  - outstanding mds, osd, mon requests?
- fix readdir vs fragment race by keeping a separate frag pos, and ignoring dentries below it
- reconnect after being disconnected from the mds

kclient items to review
- fill_trace locking
- async trunc
- async writeback
- cache invalidation race, locking problems
  - cap changes are serialized by i_lock, but (thorough) cache invalidation may block..

vfs issues
- real_lookup() race:
  1- hash lookup finds no dentry
  2- real_lookup() takes dir i_mutex, but then finds a dentry
  3- drops mutex, then calls d_revalidate.  if that fails, we return ENOENT (instead of looping?)
- vfs_rename_dir()

userspace client
- handle session STALE
- time out caps, wake up waiters on renewal
- link caps with mds session
- validate dn leases
- fix lease validation to check session ttl
- clean up ll_ interface, now that we have leases!
- clean up client mds session vs mdsmap behavior?
- stop using mds's inode_t?
- fix readdir vs fragment race by keeping a separate frag pos, and ignoring dentries below it

mds
- hard link backpointers
  - anchor source dir
  - build snaprealm for any hardlinked file
  - include snaps for all (primary+remote) parents
- what's with the 'clear if dirtyscattered' bit in decode_import_inode()?
- what if a recovery is queued, or in progress, and the inode is then cowed?  can that happen?
- proper handling of cache expire messages during rejoin phase?
  -> i think cache expires are fine; the rejoin_ack handler just has to behave if rejoining items go missing
- try_remove_unlinked_dn thing
- rename: importing inode... also journal imported client map?
- rerun destro trace against latest, with various journal lengths
- lease length heuristics
  - mds lock last_change stamp?
- handle slow client reconnect (i.e. after mds has gone active)
- fix reconnect/rejoin open file weirdness
- can we get rid of the dirlock remote auth_pin weirdness on subtree roots?
- anchor_destroy needs to xlock linklock.. which means it needs a Mutation wrapper?
  - ... when it gets a caller.. someday..
- make truncate faster with a trunc_seq, attached to objects as attributes?
- osd needs a set_floor_and_read op for safe failover/STOGITH-like semantics.
- could mark dir complete in EMetaBlob by counting how many dentries are dirtied in the current log epoch in CDir...
- FIXME how to journal/store root and stray inode content?
  - in particular, i care about dirfragtree.. get it on rejoin?
  - and dir sizes, if i add that... also on rejoin?
- efficient stat for single writers
- add FILE_CAP_EXTEND capability bit

journaler
- fix up for large events (e.g. imports)
- use set_floor_and_read for safe takeover from possibly-not-quite-dead otherguy.
- should we pad with zeros to avoid splitting individual entries?
  - make it a g_conf flag?
  - have to fix reader to skip over zeros (either <4 bytes for size, or zeroed sizes)
- need to truncate at detected (valid) write_pos to clear out any other partial trailing writes

osdmon
- monitor needs to monitor some osds...

crush
- more efficient failure when all/too many osds are down
- allow forcefeed for more complicated rule structures.  (e.g. make force_stack a list< set >)
- "knob" bucket

pgmon
- include osd vector with pg state
- check for orphan pgs
- monitor pg states, notify on out?
- watch osd utilization; adjust overload in cluster map

mon
- paxos needs to clean up old states.
- some sort of tester for PaxosService...
- osdmon needs to lower-bound old osdmap versions it keeps around?

objecter
- fix failure handler...
- generic mon client?
- maybe_request_map should set a timer event to periodically re-request.
- transaction prepare/commit?
- read+floor_lockout

osd
- snap_trimmers should detect, remove unused snap collections (and update snap_collections set)
- how does an admin intervene when a pg needs a dead osd to repeer?
- a more general fencing mechanism?  per-object granularity isn't usually a good match.
- consider implications of nvram writeahead logs
- flag missing log entries on crash recovery  --> WRNOOP? or WRLOST?
- efficiently replicate clone() objects
- fix heartbeat wrt new replication
- mark residual pgs obsolete  ???
- rdlocks
- optimize remove wrt recovery pushes

simplemessenger
- close idle connections

objectcacher
- merge clean bh's
- ocacher caps transitions vs locks
- test read locks

ebofs
- btrees
  - checksums
  - dups
  - sets
- optionally scrub deallocated extents
- clone()
- map ObjectStore
- verify proper behavior of conflicting/overlapping reads of clones
- combine inodes and/or cnodes into same blocks
- fix bug in node rotation on insert (and reenable)
- fix NEAR_LAST_FWD (?)
- awareness of underlying software/hardware raid in allocator so that we write full stripes _only_.
  - hmm, that's basically just a large block size.
- rewrite the btree code!
  - multithreaded
  - eliminate nodepools
  - allow btree sets
  - allow arbitrary embedded data?
  - allow arbitrary btrees
  - allow root node(s?) to be embedded in onode, or wherever.
  - keys and values can be uniform (fixed-size) or non-uniform.
    - fixed size (if any) is a value in the btree struct.
      - negative indicates bytes of length value?  (1 -> 255 bytes, 2 -> 65535 bytes, etc.?)
    - non-uniform records preceded by length.
  - keys sorted via a comparator defined in btree root.
    - lexicographically, by default.
- goal
  - object btree key->value payload, not just a data blob payload.
  - better threading behavior.
    - with transactional goodness!
- onode
  - object attributes.. as a btree?
  - blob stream
  - map stream.
    - allow blob values.

remaining hard problems
- how to cope with file size changes and read/write sharing