ceph/src/TODO

v0.6
- ENOSPC
- async metadata ops

v0.7
- cas?

big items
- finish client failure recovery (reconnect after long eviction; and slow delayed reconnect)
- ENOSPC
  - space reservation in ObjectStore, redeemed by Transactions?
  - reserved as PG goes active; reservation canceled when pg goes inactive
  - something similar during recovery
  - ?
- repair
- enforceable quotas?
- mds security enforcement
- client, user authentication
- cas
- osd failure declarations
- libuuid?


repair
- are we concerned about
  - scrubbing
  - reconstruction after loss of subset of cdirs
  - reconstruction after loss of md log
- data object 
  - path backpointers?
  - parent dir pointer?
- cdir objects
  - parent dir pointer
    - update on rename?  or on cdir store?
      on cdir store is sufficient if mdlog survives...
  - or what the hell, full trace?
- mds scrubbing
/- rados scrubbing


timer
- each SafeTimer should just be its own thread.


kernel client
- make osd retry writes if failure after ack..
- ACLs
- reconnect path should include pathbase, not just a string?
- make writepages maybe skip pages with errors?
  - EIO, or ENOSPC?
  - ... writeback vs ENOSPC vs flush vs close()... hrm...
- set mapping bits for ENOSPC, EIO?
- flush caps on sync, fsync, etc.
  - do we need to block?  how do we track that?
- procfs/debugfs
  - adjust granular debug levels too
    - should we be using debugfs?
  - a dir for each client instance (client###)?
  - hooks to get mds, osd, monmap epoch #s
- populate sysfs?
  - things that would be useful to see
    - fsid
    - map versions on client
    - outstanding mds, osd, mon requests?
- fix readdir vs fragment race by keeping a separate frag pos, and ignoring dentries below it
- reconnect after being disconnected from the mds

kclient items to review
- fill_trace locking
- async trunc
- async writeback
- cache invalidation race, locking problems
  - cap changes are serialized by i_lock, but (thorough) cache invalidation may block..

vfs issues
- real_lookup() race:
  1- hash lookup finds no dentry
  2- real_lookup() takes dir i_mutex, but then finds a dentry
  3- drops mutex, then calls d_revalidate.  if that fails, we return ENOENT (instead of looping?)
- vfs_rename_dir()

filestore
- sort object lists by ino
- get file csum?

btrfs
- inode_lock vs tree->lock lockdep warning
- clone compressed inline extents
- ioctl to pull out data csum?


userspace client
- handle session STALE
- time out caps, wake up waiters on renewal
  - link caps with mds session
- validate dn leases
- fix lease validation to check session ttl
- clean up ll_ interface, now that we have leases!
- clean up client mds session vs mdsmap behavior?
- stop using mds's inode_t?
- fix readdir vs fragment race by keeping a separate frag pos, and ignoring dentries below it

mds
- file size recovery doesn't update recursive stats?
- file size recovery gives (wrong) 4MB-increment results?
- hard link backpointers
  - anchor source dir
  - build snaprealm for any hardlinked file
  - include snaps for all (primary+remote) parents
- how do we properly clean up inodes when doing a snap purge?
  - when they are mid-recover?  see 136470cf7ca876febf68a2b0610fa3bb77ad3532
- what's with the 'clear if dirtyscattered' bit in decode_import_inode()?
- what if a recovery is queued, or in progress, and the inode is then cowed?  can that happen?  
- proper handling of cache expire messages during rejoin phase?
  -> i think cache expires are fine; the rejoin_ack handler just has to behave if rejoining items go missing
- try_remove_unlinked_dn thing
- rename: importing inode... also journal imported client map?
- rerun destro trace against latest, with various journal lengths
- lease length heuristics
  - mds lock last_change stamp?
- handle slow client reconnect (i.e. after mds has gone active)
- fix reconnect/rejoin open file weirdness
- can we get rid of the dirlock remote auth_pin weirdness on subtree roots?
- anchor_destroy needs to xlock linklock.. which means it needs a Mutation wrapper?
  - ... when it gets a caller.. someday..
- make truncate faster with a trunc_seq, attached to objects as attributes?
- osd needs a set_floor_and_read op for safe failover/STOGITH-like semantics.
- could mark dir complete in EMetaBlob by counting how many dentries are dirtied in the current log epoch in CDir...
- FIXME how to journal/store root and stray inode content? 
  - in particular, i care about dirfragtree.. get it on rejoin?
  - and dir sizes, if i add that... also on rejoin?
- add FILE_CAP_EXTEND capability bit


journaler
- fix up for large events (e.g. imports)
- use set_floor_and_read for safe takeover from possibly-not-quite-dead otherguy.
- should we pad with zeros to avoid splitting individual entries?
  - make it a g_conf flag?
  - have to fix reader to skip over zeros (either <4 bytes for size, or zeroed sizes)
- need to truncate at detected (valid) write_pos to clear out any other partial trailing writes
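
A rough sketch of the padded-journal reader described in the items above, not the actual Journaler code. Everything here is hypothetical: the entry layout (4-byte size in host byte order, then payload), the names append_entry/read_entries, and the "period" parameter standing in for the object/stripe boundary. It assumes entries are never empty, so a zero size always means padding, and it stops at the first incomplete entry, which is where a real reader would truncate write_pos.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical writer: zero-fill the rest of the current period when the
// next entry (size field + payload) would otherwise be split across it.
void append_entry(std::string& log, const std::string& payload, size_t period) {
  size_t need = sizeof(uint32_t) + payload.size();
  size_t left = period - (log.size() % period);
  if (need > left && left != period)
    log.append(left, '\0');                       // padding
  uint32_t len = static_cast<uint32_t>(payload.size());
  log.append(reinterpret_cast<const char*>(&len), sizeof(len));
  log.append(payload);
}

// Hypothetical reader: skips padding, stops at a partial trailing write.
std::vector<std::string> read_entries(const std::string& log, size_t period) {
  std::vector<std::string> out;
  size_t pos = 0;
  while (pos < log.size()) {
    size_t left = period - (pos % period);        // bytes left in this period
    if (left < sizeof(uint32_t)) {                // <4 bytes: cannot hold a size
      pos += left;
      continue;
    }
    if (log.size() - pos < sizeof(uint32_t))
      break;                                      // truncated size field
    uint32_t len;
    std::memcpy(&len, log.data() + pos, sizeof(len));
    if (len == 0) {                               // zeroed size: padding
      pos += left;
      continue;
    }
    if (len > log.size() - pos - sizeof(len))
      break;                                      // truncated payload
    out.emplace_back(log, pos + sizeof(len), len);
    pos += sizeof(len) + len;
  }
  return out;
}
```

Entries larger than one period (e.g. imports) are still written across boundaries in this sketch; handling those is the separate "large events" item.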


mon
- paxos needs to clean up old states.
  - default: simple max of (state count, min age), so that we have at least N hours of history, say?
  - osd map: trim only old maps < oldest "in" osd up_from
- blacklist failed mds's
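
One possible reading of the "simple max of (state count, min age)" default above, as a sketch only. The State struct, the function name, and the parameters are made up for illustration and are not monitor code. A state is kept if it is among the newest min_count states or is younger than min_age, so the retained history is the larger of the two sets. It assumes states are sorted oldest first and that stamps do not decrease with version.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one committed paxos state.
struct State {
  uint64_t version;
  uint64_t stamp;   // commit time, in seconds
};

// Returns how many of the oldest states may be trimmed.
size_t trimmable_states(const std::vector<State>& states,
                        size_t min_count, uint64_t min_age, uint64_t now) {
  // Count rule: everything except the newest min_count states.
  size_t by_count = states.size() > min_count ? states.size() - min_count : 0;

  // Age rule: the leading run of states strictly older than min_age.
  size_t by_age = 0;
  while (by_age < states.size() &&
         now > states[by_age].stamp &&
         now - states[by_age].stamp > min_age)
    ++by_age;

  // Keep the larger history, i.e. trim the smaller prefix.
  return std::min(by_count, by_age);
}
```

For the osd map case, the trim point would additionally be capped by the oldest "in" osd's up_from, which this sketch does not model.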

osdmon
- monitor needs to monitor some osds...

pgmon
- include osd vector with pg state
  - check for orphan pgs
- monitor pg states, notify on out?
- watch osd utilization; adjust overload in cluster map

crush
- allow forcefeed for more complicated rule structures.  (e.g. make force_stack a list< set<int> >)

osd
- pg split should be a work queue
- pg split needs to fix up pg stats.  this is tricky with the clone overlap business...
- generalize ack semantics?  or just change ack from memory to journal?  memory/journal/disk...
- rdlocks
- optimize remove wrt recovery pushes

simplemessenger
- close idle connections?

objectcacher
- read locks?
- maintain more explicit inode grouping instead of wonky hashes