ceph/branches/sage/pgs/doc/journal.txt
sageweil 9213a23f14 eek
git-svn-id: https://ceph.svn.sf.net/svnroot/ceph@1138 29311d96-e01e-0410-9327-a35deaab8ce9
2007-02-28 18:42:55 +00:00


- LogEvent.replay() is idempotent, because during replay we won't know whether the update is old (already applied) or not.
journal is distributed among different nodes. because authority changes over time, it's not immediately clear to a recovering node replaying the journal whether the data is "real" or not (it might be exported later in the journal).
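a minimal sketch of the idempotency point above: the event logs the resulting state rather than a delta, so replaying it twice leaves the cache exactly as replaying it once. the InodeUpdate class, its fields, and the dict-based cache are hypothetical illustrations, not the actual LogEvent code.

```python
from dataclasses import dataclass


@dataclass
class InodeUpdate:
    """Hypothetical journal event: records resulting state, not a delta."""
    ino: int
    mode: int
    mtime: int

    def replay(self, cache: dict) -> None:
        # Overwrite with the logged values; applying this twice is harmless.
        inode = cache.setdefault(self.ino, {})
        inode["mode"] = self.mode
        inode["mtime"] = self.mtime


def replay_journal(events, cache=None):
    """Apply every event in order to a (hypothetical) dict-based cache."""
    cache = {} if cache is None else cache
    for ev in events:
        ev.replay(cache)
    return cache
```

an event that incremented a counter instead of storing the final value would not have this property, since a replayed "old" update would then be applied twice.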
possibilities:
ONE.. bloat the journal!
- journal entry includes full trace of dirty data (dentries, inodes) up until import point
- local renames implicit.. cache is reattached on replay
- exports are a list of exported dirs.. which are then dumped
...
recovery phase 1
- each entry includes full trace (inodes + dentries) up until the import point
- cache during recovery is fragmented/dangling beneath import points
- when export is encountered items are discarded (marked clean)
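the phase 1 steps above can be sketched roughly as follows. this is an illustration under assumptions: the Entry class, its fields, and the nested-dict cache are hypothetical, and "discarded" is modeled as simply dropping the items from the cache.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical journal entry for option ONE."""
    kind: str        # "update" or "export"
    import_ino: int  # import point the trace hangs from
    trace: list = field(default_factory=list)  # [(dentry_name, ino), ...]


def replay_phase1(journal):
    # cache maps import_ino -> {path tuple: ino}. Subtrees dangle beneath
    # their import points; nothing is attached to the hierarchy yet.
    cache = {}
    for e in journal:
        subtree = cache.setdefault(e.import_ino, {})
        names = tuple(name for name, _ in e.trace)
        if e.kind == "export":
            # Authority moved away: discard everything at or below the dir.
            doomed = [p for p in subtree if p[:len(names)] == names]
            for path in doomed:
                del subtree[path]
            continue
        # Record every item on the trace, from the import point downward.
        for depth, (_, ino) in enumerate(e.trace, start=1):
            subtree[names[:depth]] = ino
    return cache
```

what survives phase 1 is only the data this node was still authoritative for at the end of the journal, keyed by import point and waiting for phase 2 to attach it.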
recovery phase 2
- import roots ping store to determine attachment points (if not already known)
- if it was imported during period, attachment point is already known.
- renames affecting imports are logged too
- import roots discovered from other nodes, attached to hierarchy
then
- maybe resume normal operations
- if recovery is a background process on a takeover mds, "export" everything to that node.
-> journal contains lots of clean data.. maybe 5+ times bigger as a result!
possible fixes:
- collect dir traces into journal chunks so they aren't repeated as often
- each chunk summarizes traces in previous chunk
- hopefully next chunk will include many of the same traces
- if not, then the entry will include it
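one way the chunking fix might look, as a sketch only. it assumes each entry is a (dir_ino, trace, payload) tuple and that a chunk's summary is a dir_ino -> trace map copied from the previous chunk; the function name and data shapes are hypothetical.

```python
def write_chunks(entries, chunk_size):
    """Group entries into chunks; omit traces the chunk summary covers."""
    chunks, prev = [], {}
    for start in range(0, len(entries), chunk_size):
        batch = entries[start:start + chunk_size]
        # The summary carries the traces used in the previous chunk.
        summary = dict(prev)
        written = []
        for dir_ino, trace, payload in batch:
            # Inline the trace only if the summary does not already have it.
            inline = None if dir_ino in summary else trace
            written.append((dir_ino, inline, payload))
        chunks.append({"summary": summary, "entries": written})
        prev = {dir_ino: trace for dir_ino, trace, _ in batch}
    return chunks
```

in this simplified version a trace repeated within a single chunk is still written each time; only repetition across consecutive chunks is saved.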
=== log entry types ===
- all inode, dentry, dir items include a dirty flag.
- dirs are implicitly _never_ complete; even if they are, a fetch before commit is necessary to confirm
ImportPath - log change in import path
Import - log import addition (w/ path, dirino)
InoAlloc - allocate ino
InoRelease - release ino
Inode - inode info, along with dentry+inode trace up to import point
Unlink - (null) dentry + trace, + flag (whether inode/dir is destroyed)
Link - (new) dentry + inode + trace
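the entry types above could be modeled along these lines. the type names come from the list; the TraceItem and LogEntry classes, their fields, and the dirty_items helper are hypothetical and only meant to show the per-item dirty flag and the Unlink "destroyed" flag.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntryType(Enum):
    IMPORT_PATH = "ImportPath"
    IMPORT = "Import"
    INO_ALLOC = "InoAlloc"
    INO_RELEASE = "InoRelease"
    INODE = "Inode"
    UNLINK = "Unlink"
    LINK = "Link"


@dataclass
class TraceItem:
    kind: str    # "inode", "dentry", or "dir"
    name: str
    dirty: bool  # every inode, dentry, and dir item carries a dirty flag


@dataclass
class LogEntry:
    type: EntryType
    trace: list = field(default_factory=list)  # items up to the import point
    destroyed: bool = False  # only meaningful for Unlink


def dirty_items(entry):
    """Names of the trace items that actually need to be written back."""
    return [item.name for item in entry.trace if item.dirty]
```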
-----------------------------
TWO..
- directories in store contain path at time of commit (relative to import, and root)
- replay without attaching anything to hierarchy
- after replay, directories pinged in store to attach to hierarchy
-> phase 2 too slow!
-> and nested dirs may reattach... that won't be apparent from journal.
- put just parent dir+dentry in dir store.. even worse on phase 2!
THREE
-
metadata journal/log
event types:
chown, chmod, utime
InodeUpdate
mknod, mkdir, symlink
Mknod .. new inode + link
unlink, rmdir
Unlink
rename
Link + Unlink (foreign)
or Rename (local)
link
Link .. link existing inode
InodeUpdate
DentryLink
DentryUnlink
InodeCreate
InodeDestroy
Mkdir?
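the operation-to-event mapping listed above (chown/chmod/utime through link) can be written down as a small lookup. this is a sketch: the events_for function and its local parameter are hypothetical, and only the first set of event names (InodeUpdate, Mknod, Unlink, Link, Rename) is used.

```python
def events_for(op, local=True):
    """Return the journal event names logged for a client operation."""
    table = {
        "chown": ["InodeUpdate"],
        "chmod": ["InodeUpdate"],
        "utime": ["InodeUpdate"],
        "mknod": ["Mknod"],    # new inode + link
        "mkdir": ["Mknod"],
        "symlink": ["Mknod"],
        "unlink": ["Unlink"],
        "rmdir": ["Unlink"],
        "link": ["Link"],      # link existing inode
    }
    if op == "rename":
        # A local rename is one event; a foreign one is Link + Unlink.
        return ["Rename"] if local else ["Link", "Unlink"]
    return table[op]
```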