Goals
- scale: how many clients can a single server serve?
- reasonable definition of consistency (top down)
- use local disk
Protocol
- NFS: clients check with the server periodically to validate their caches
- AFSv1 has the same problem: too many TestAuth messages
AFSv1
- only caches file contents; directories are kept only at the server
open(): fetch the entire file from the server, store it on the local disk
read()/write(): performed on the local cached copy, at memory speeds (if file_size < memory)
close(): if the file was modified, push the whole file back to the server
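A minimal sketch of this whole-file protocol in Python. FakeServer, afs_open, afs_write, and afs_close are hypothetical stand-ins for the real client code and the Fetch/Store protocol messages:

    import os

    CACHE_DIR = "/tmp/afs_cache"  # assumed location of the local-disk cache

    class FakeServer:
        """Stand-in for the AFS server: keeps whole files in a dict."""
        def __init__(self):
            self.files = {"/remote_y/z/a.txt": b"hello"}
        def fetch(self, path):           # Fetch: return the entire file
            return self.files[path]
        def store(self, path, data):     # Store: replace the entire file
            self.files[path] = data

    class CachedFile:
        def __init__(self, remote_path):
            self.remote_path = remote_path
            self.local_path = os.path.join(CACHE_DIR, remote_path.replace("/", "_"))
            self.dirty = False

    def afs_open(server, path):
        """open(): fetch the whole file once, keep it on the local disk."""
        os.makedirs(CACHE_DIR, exist_ok=True)
        f = CachedFile(path)
        if not os.path.exists(f.local_path):        # cache miss
            with open(f.local_path, "wb") as out:
                out.write(server.fetch(path))
        return f

    def afs_write(f, data):
        """write(): runs at local speed; just remember the file is dirty."""
        with open(f.local_path, "wb") as out:
            out.write(data)
        f.dirty = True

    def afs_close(server, f):
        """close(): push the whole file back only if it was modified."""
        if f.dirty:
            with open(f.local_path, "rb") as src:
                server.store(f.remote_path, src.read())

    server = FakeServer()
    f = afs_open(server, "/remote_y/z/a.txt")
    afs_write(f, b"updated")
    afs_close(server, f)   # the server now holds b"updated"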
Problems
- One process per client costs too much
- Path-traversal costs are too high
- the whole path is sent to the server; traversal is done entirely by the server, which costs server CPU
- The client issues too many TestAuth protocol messages
- TestAuth asks the server whether a file has changed
- this accounts for about 60% of all calls to the server
- a polling-based approach (repeated checks); see the sketch after this list
- Imbalanced server loads
- fix: use volumes; move volumes between servers to balance load
- Too many context switches in server
- fix: use threads instead of one process per client
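To make the TestAuth problem concrete, a sketch of the per-open polling (VersionedServer and polling_open are made-up names; the real protocol messages are TestAuth and Fetch):

    class VersionedServer:
        """Stand-in server that tracks a version number per file."""
        def __init__(self):
            self.files = {"/z/a.txt": (1, b"v1 contents")}  # path -> (version, data)
        def test_auth(self, path, cached_version):
            """TestAuth: has the file changed since the client cached it?"""
            return self.files[path][0] != cached_version
        def fetch(self, path):
            return self.files[path]

    cache = {}  # client-side cache: path -> (version, data)

    def polling_open(server, path):
        """AFSv1 behaviour: a TestAuth round trip on *every* open,
        even when the cached copy is still valid."""
        if path in cache:
            version, data = cache[path]
            if not server.test_auth(path, version):  # usually "not changed"...
                return data                          # ...so the round trip was wasted
        cache[path] = server.fetch(path)             # (re-)fetch the whole file
        return cache[path][1]

    server = VersionedServer()
    polling_open(server, "/z/a.txt")  # first open: Fetch
    polling_open(server, "/z/a.txt")  # cached, but still costs a TestAuth message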
AFSv2
- keep the consistency model: whole-file local-disk caching
- install callbacks at the server to reduce TestAuth communications
- interrupt-driven rather than polling: the server promises to tell the client about changes (see the sketch after this list)
- complexity lies in clients that crash or run slowly (does the server wait for them?)
- file identifier (FID): a volume identifier, a file identifier and a “uniquifier”
- pathname lookup integrates with caching: piecewise, client-driven lookup
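A sketch of callback-based invalidation (CallbackServer and Client are hypothetical): the server remembers who caches what, a write breaks the other clients' callbacks, and an open contacts the server only when no callback is held:

    class CallbackServer:
        def __init__(self):
            self.files = {"/z/a.txt": b"v1"}
            self.callbacks = {}        # path -> set of clients to notify

        def fetch(self, path, client):
            self.callbacks.setdefault(path, set()).add(client)  # install callback
            return self.files[path]

        def store(self, path, data, writer):
            self.files[path] = data
            for client in self.callbacks.pop(path, set()):      # break callbacks
                if client is not writer:
                    client.invalidate(path)

    class Client:
        def __init__(self, server):
            self.server, self.cache = server, {}
        def invalidate(self, path):    # callback break, pushed by the server
            self.cache.pop(path, None)
        def open(self, path):
            if path not in self.cache: # while a callback is held: no server traffic
                self.cache[path] = self.server.fetch(path, self)
            return self.cache[path]
        def write_and_close(self, path, data):
            self.cache[path] = data
            self.server.store(path, data, self)

    server = CallbackServer()
    c1, c2 = Client(server), Client(server)
    c1.open("/z/a.txt"); c2.open("/z/a.txt")
    c1.write_and_close("/z/a.txt", b"v2")  # server breaks c2's callback
    assert c2.open("/z/a.txt") == b"v2"    # c2 re-fetches on its next open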
Path lookup
open("/root_local_x/remote_y/z/a.txt", O_RDONLY)
- the client needs every directory along the path (each is fetched and cached)
- x: fetch dir, establish callback
- y: fetch dir, establish callback
- z: fetch dir, establish callback
- a.txt: fetch the file itself, establish callback
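A sketch of this piecewise lookup (LookupServer and the fid_* names are made up): each directory maps a name to an FID, the client resolves each component itself, and every fetched directory or file gets a callback, so a second open of the same path needs no server traffic:

    class LookupServer:
        def __init__(self):
            self.dirs = {
                "fid_root": {"root_local_x": "fid_x"},
                "fid_x":    {"remote_y": "fid_y"},
                "fid_y":    {"z": "fid_z"},
                "fid_z":    {"a.txt": "fid_a"},
            }
            self.files = {"fid_a": b"contents of a.txt"}
            self.callbacks = set()            # FIDs with a callback installed

        def fetch(self, fid):
            self.callbacks.add(fid)           # establish callback
            return self.dirs.get(fid) or self.files[fid]

    cache = {}                                # client cache: fid -> dir or data

    def piecewise_open(server, path, fid="fid_root"):
        for name in path.strip("/").split("/"):
            if fid not in cache:                  # only a miss contacts the server
                cache[fid] = server.fetch(fid)    # fetch dir, establish callback
            fid = cache[fid][name]                # the client resolves the name itself
        if fid not in cache:
            cache[fid] = server.fetch(fid)        # finally fetch the file itself
        return cache[fid]

    server = LookupServer()
    piecewise_open(server, "/root_local_x/remote_y/z/a.txt")  # one fetch per component
    piecewise_open(server, "/root_local_x/remote_y/z/a.txt")  # served entirely from cache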
Consistency
- Weak consistency
- All accesses to synchronization variables are seen by all processes in the same order (sequentially consistent)
- All accesses to other variables may be seen in a different order by different processes
- The set of operations between synchronization operations is the same in each process
- When the file is opened, a client will generally receive the latest consistent copy from the server
- NFS caches blocks rather than whole files, and caches in memory rather than on local disk
- last writer wins: when processes on different machines update the same file, whichever closes last overwrites the file at the server
- NFS instead can end up with a file mixing blocks from different writers; see the sketch below
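A sketch of the difference, with two clients simulated in one process (WholeFileServer is a made-up stand-in):

    class WholeFileServer:
        def __init__(self):
            self.file = b"original"
        def fetch(self):
            return self.file
        def store(self, data):            # AFS: the whole file is replaced at once
            self.file = data

    server = WholeFileServer()

    # Two clients on different machines each fetch and edit a private copy.
    copy1 = server.fetch()
    copy2 = server.fetch()
    copy1 = b"client1 version"
    copy2 = b"client2 version"

    server.store(copy1)                   # client 1 closes first
    server.store(copy2)                   # client 2 closes last: its whole file wins
    assert server.file == b"client2 version"
    # NFS flushes individual blocks instead, so the server could end up
    # holding a mix of client 1's and client 2's blocks.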
Crash Recovery
- after a client or server crash, a client must TestAuth a cached file before using it, re-fetching it if it has changed
- client crash and reboot: callbacks received before the crash cannot be trusted
- the server may have tried to reach the client for a while, then given up on updating it
- the client keeps its cached files but discards its callbacks, revalidating on next use
- server crash
- naive approach: reboot and immediately start serving
- problem: other clients still hold callbacks the server no longer remembers
- fix: the server notifies all clients of the crash so they treat their caches as suspect
- alternatively, clients poll the server periodically (heartbeat) and detect the reboot themselves; see the sketch below
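A client-side recovery sketch under the assumptions above (both classes are hypothetical): keep the cached files, stop trusting callbacks, and revalidate each file with TestAuth before its next use:

    class RecoverableServer:
        """Stand-in server; versions let TestAuth detect stale copies."""
        def __init__(self):
            self.files = {"/z/a.txt": (2, b"newer")}  # path -> (version, data)
        def test_auth(self, path, cached_version):
            return self.files[path][0] != cached_version
        def fetch(self, path):
            return self.files[path]

    class RecoveringClient:
        def __init__(self, server):
            self.server = server
            self.cache = {"/z/a.txt": (1, b"stale")}  # survived on the local disk
            self.suspect = set()          # files whose callback is no longer trusted

        def on_reboot_detected(self):
            # own reboot, or a server reboot detected via notification/heartbeat:
            # keep the cached files but stop trusting every callback
            self.suspect = set(self.cache)

        def open(self, path):
            if path in self.suspect:      # revalidate before the first use
                version, _ = self.cache[path]
                if self.server.test_auth(path, version):
                    self.cache[path] = self.server.fetch(path)
                self.suspect.discard(path)    # callback re-established
            return self.cache[path][1]

    client = RecoveringClient(RecoverableServer())
    client.on_reboot_detected()
    assert client.open("/z/a.txt") == b"newer"    # the stale copy was replaced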
Vital features
- Scalability
- An AFS file is fetched from the server and cached on the workstation's local disk; further references are directed to the local cached copy
- Any modification is promptly sent to the server
- Local disk is used as a persistent cache by AFS
- Security
- A user's password is used to obtain a security token, which can later be used to establish secure network connections
- ACL
- Everything on the wire is encrypted
- Simplified System Administration
- Only servers need backup, workstation disks are merely caches
Reviews
- cluster stores only file paths