mirror of
https://github.com/ceph/ceph
synced 2025-01-21 18:45:23 +00:00
9213a23f14
git-svn-id: https://ceph.svn.sf.net/svnroot/ceph@1138 29311d96-e01e-0410-9327-a35deaab8ce9
125 lines
2.8 KiB
Plaintext
- LogEvent.replay() is idempotent. during replay we won't know whether the update is old (already applied) or not.

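A minimal sketch of what idempotent replay could look like, assuming the event carries absolute values rather than deltas. The `Inode` and `InodeUpdate` classes and their fields are hypothetical and only illustrate the property; they are not the actual LogEvent interface.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    ino: int
    mode: int = 0
    uid: int = 0


@dataclass
class InodeUpdate:
    """Hypothetical log event carrying the full new inode state."""
    ino: int
    mode: int
    uid: int

    def replay(self, cache: dict) -> None:
        # Overwrite with absolute values instead of applying a delta, so
        # replaying an event that was already applied changes nothing.
        inode = cache.setdefault(self.ino, Inode(self.ino))
        inode.mode = self.mode
        inode.uid = self.uid


cache = {}
event = InodeUpdate(ino=100, mode=0o644, uid=1000)
event.replay(cache)
once = Inode(**vars(cache[100]))   # snapshot after the first replay
event.replay(cache)                # replaying again leaves the state unchanged
```

Because the event overwrites state, the replayer does not need to know whether the update is old.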
the journal is distributed among different nodes. because authority changes over time, it's not immediately clear to a recovering node replaying the journal whether the data is "real" or not (it might be exported later in the journal).

possibilities:

ONE.. bloat the journal!

- journal entry includes full trace of dirty data (dentries, inodes) up until import point
- local renames implicit.. cache is reattached on replay
- exports are a list of exported dirs.. which are then dumped
...

recovery phase 1
- each entry includes full trace (inodes + dentries) up until the import point
- cache during recovery is fragmented/dangling beneath import points
- when export is encountered items are discarded (marked clean)

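Phase 1 can be sketched roughly as follows. The tuple layout `(kind, import_root, trace)` and the function name `replay_phase1` are assumptions made for illustration; the notes do not define an entry format.

```python
def replay_phase1(journal):
    """Return {import_root: {path: dirty}} built only from journal traces."""
    cache = {}
    for kind, root, trace in journal:
        subtree = cache.setdefault(root, {})
        if kind == "update":
            # The trace names every dentry from the import point down, so the
            # item hangs beneath its import root without a global path.
            for depth in range(1, len(trace) + 1):
                subtree.setdefault(tuple(trace[:depth]), False)
            subtree[tuple(trace)] = True      # the last item is the dirty one
        elif kind == "export":
            # Authority moved away: mark everything beneath the dir clean.
            prefix = tuple(trace)
            for path in subtree:
                if path[:len(prefix)] == prefix:
                    subtree[path] = False
    return cache


journal = [
    ("update", "import_a", ["usr", "bin", "ls"]),
    ("update", "import_a", ["home", "sage", "notes"]),
    ("export", "import_a", ["usr"]),
]
cache = replay_phase1(journal)
dirty = sorted(p for p, d in cache["import_a"].items() if d)
```

After the export of `usr`, only the item under `home` is still dirty; the import root itself is not yet attached to anything.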
recovery phase 2
- import roots ping store to determine attachment points (if not already known)
- if it was imported during the period covered by the journal, attachment point is already known.
- renames affecting imports are logged too
- import roots discovered from other nodes, attached to hierarchy

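A rough sketch of the attachment step in phase 2, assuming a store lookup keyed by directory ino. `attach_imports`, `store_lookup`, and the `known` map are hypothetical names; the real lookup mechanism is not described in these notes.

```python
def attach_imports(import_roots, known, store_lookup):
    """Map each import root (dir ino) to its attachment path."""
    attached = {}
    for dirino in import_roots:
        if dirino in known:
            # Imported during the journaled period: the journal already told us.
            attached[dirino] = known[dirino]
        else:
            # Otherwise "ping" the store to find the attachment point.
            attached[dirino] = store_lookup(dirino)
    return attached


store = {10: "/usr"}
pings = []


def store_lookup(dirino):
    pings.append(dirino)          # record which roots needed a store ping
    return store[dirino]


known = {20: "/home/sage"}        # learned from an Import entry in the journal
attached = attach_imports([10, 20], known, store_lookup)
```

Only roots whose attachment point is not already known cost a store round trip.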
then
- maybe resume normal operations
- if recovery is a background process on a takeover mds, "export" everything to that node.

-> journal contains lots of clean data.. maybe 5+ times bigger as a result!

possible fixes:
- collect dir traces into journal chunks so they aren't repeated as often
- each chunk summarizes traces in previous chunk
- hopefully next chunk will include many of the same traces
- if not, then the entry will include it

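The chunking idea above might look something like this. It is only a sketch under assumptions: the chunk layout, the `trace_ref` index, and `write_chunks` are invented for illustration, and traces are modeled as tuples of dentry names.

```python
def write_chunks(events, chunk_size):
    """events: list of (trace, payload). Returns a list of chunk dicts."""
    chunks = []
    prev_traces = set()
    for start in range(0, len(events), chunk_size):
        batch = events[start:start + chunk_size]
        # Each chunk opens with a summary of the traces used in the previous one.
        summary = sorted(prev_traces)
        entries = []
        for trace, payload in batch:
            if trace in prev_traces:
                # Trace is in the summary: refer to it instead of repeating it.
                entries.append({"trace_ref": summary.index(trace),
                                "payload": payload})
            else:
                # Not summarized: the entry has to carry the trace itself.
                entries.append({"trace": trace, "payload": payload})
        chunks.append({"summary": summary, "entries": entries})
        prev_traces = {trace for trace, _ in batch}
    return chunks


events = [
    (("usr", "bin"), "a"), (("usr", "bin"), "b"),
    (("usr", "bin"), "c"), (("home",), "d"),
]
chunks = write_chunks(events, chunk_size=2)
inline = sum("trace" in e for c in chunks for e in c["entries"])
```

The saving depends entirely on how much consecutive chunks overlap; with no overlap every entry still carries its own trace.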
=== log entry types ===
- all inode, dentry, dir items include a dirty flag.
- dirs are implicitly _never_ complete; even if they are, a fetch before commit is necessary to confirm

ImportPath - log change in import path
Import - log import addition (w/ path, dirino)

InoAlloc - allocate ino
InoRelease - release ino

Inode - inode info, along with dentry+inode trace up to import point
Unlink - (null) dentry + trace, + flag (whether inode/dir is destroyed)
Link - (new) dentry + inode + trace

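One way to picture these entry types is as plain records. The field names below are assumptions chosen to match the descriptions above (for example `new_path` on `ImportPath`); the notes give only the one-line summaries.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TraceItem:
    """One dentry (+ inode) on the path up to the import point."""
    dentry: str
    ino: Optional[int]        # None models a null dentry
    dirty: bool = False       # every item carries a dirty flag


@dataclass
class ImportPath:
    dirino: int
    new_path: str             # hypothetical field: the changed import path


@dataclass
class Import:
    path: str
    dirino: int


@dataclass
class InoAlloc:
    ino: int


@dataclass
class InoRelease:
    ino: int


@dataclass
class Link:
    trace: List[TraceItem]    # new dentry + inode is the last item


@dataclass
class Unlink:
    trace: List[TraceItem]    # last item is the (null) dentry
    destroyed: bool           # whether the inode/dir is destroyed


unlink = Unlink(
    trace=[TraceItem("home", 2), TraceItem("tmpfile", None, dirty=True)],
    destroyed=True,
)
```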
-----------------------------

TWO..
- directories in store contain path at time of commit (relative to import, and root)
- replay without attaching anything to hierarchy
- after replay, directories pinged in store to attach to hierarchy

-> phase 2 too slow!
-> and nested dirs may reattach... that won't be apparent from journal.
- put just parent dir+dentry in dir store.. even worse on phase 2!

THREE
-

metadata journal/log

event types:

chown, chmod, utime
  InodeUpdate

mknod, mkdir, symlink
  Mknod .. new inode + link

unlink, rmdir
  Unlink

rename
  Link + Unlink (foreign)
  or Rename (local)

link
  Link .. link existing inode

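The operation-to-event mapping listed above can be written down as a small lookup. `events_for` and its `local` parameter are hypothetical; the event names are the ones from the list.

```python
def events_for(op, local=True):
    """Return the journal event names logged for a metadata operation."""
    if op in ("chown", "chmod", "utime"):
        return ["InodeUpdate"]
    if op in ("mknod", "mkdir", "symlink"):
        return ["Mknod"]                  # new inode + link
    if op in ("unlink", "rmdir"):
        return ["Unlink"]
    if op == "rename":
        # A local rename is one event; a foreign one is split in two.
        return ["Rename"] if local else ["Link", "Unlink"]
    if op == "link":
        return ["Link"]                   # link an existing inode
    raise ValueError(f"unknown operation: {op}")


local_rename = events_for("rename")
foreign_rename = events_for("rename", local=False)
```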
InodeUpdate
DentryLink
DentryUnlink
InodeCreate
InodeDestroy
Mkdir?