AFS

By Zhaoyu Luo

Goals

  • scale: how many clients can a single server serve?
  • a reasonable definition of consistency (designed top-down)
  • use the client's local disk as a cache

Protocol

  • NFS: clients check with the server periodically to validate their caches (see the sketch below)
    • AFSv1 has the same problem: too many TestAuth messages
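
A minimal sketch of this polling pattern is below; test_auth() and fetch_from_server() are hypothetical stand-ins for the real validation RPCs, not actual NFS or AFS calls.

#include <stdbool.h>
#include <stdio.h>

static long cached_version = 3;   /* version of our locally cached copy */
static long server_version = 5;   /* what the server holds (stubbed)    */

/* Stand-in for the validation RPC: is the cached copy still current? */
static bool test_auth(long version) { return version == server_version; }

static void fetch_from_server(void) { cached_version = server_version; }

int main(void) {
    /* Every access (or a periodic timer) repeats this check -- the root
       of the "too many TestAuth messages" problem. */
    if (!test_auth(cached_version)) {
        printf("cache stale, refetching\n");
        fetch_from_server();
    }
    printf("using cached copy, version %ld\n", cached_version);
    return 0;
}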

AFSv1

  • only file contents are cached; directories are kept only at the server
  • open(): fetch the entire file and store it on the local disk
  • read()/write(): performed on the local copy, at memory speeds (if file_size < memory)
  • close(): if the file was modified, push it back to the server (the full flow is sketched below)
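
A minimal sketch of the whole-file flow, assuming a toy setup where the "server" copy is just another local file; afsv1_open and afsv1_close are invented names standing in for the protocol's actual Fetch and Store messages:

#include <stdio.h>
#include <stdlib.h>

/* open(): fetch the ENTIRE file from the server into the local disk cache */
static FILE *afsv1_open(const char *server_path, const char *cache_path) {
    FILE *src = fopen(server_path, "rb"), *dst = fopen(cache_path, "wb");
    if (!src || !dst) { perror("fetch"); exit(1); }
    int c;
    while ((c = fgetc(src)) != EOF) fputc(c, dst);  /* whole-file copy */
    fclose(src);
    fclose(dst);
    return fopen(cache_path, "r+b");  /* reads/writes now hit local disk */
}

/* close(): if the file was modified, push the whole thing back */
static void afsv1_close(FILE *f, const char *cache_path,
                        const char *server_path, int modified) {
    fclose(f);
    if (!modified) return;
    FILE *src = fopen(cache_path, "rb"), *dst = fopen(server_path, "wb");
    if (!src || !dst) { perror("store"); exit(1); }
    int c;
    while ((c = fgetc(src)) != EOF) fputc(c, dst);
    fclose(src);
    fclose(dst);
}

int main(void) {
    FILE *s = fopen("server_copy", "w");    /* fake "server" file */
    fputs("hello from the server\n", s);
    fclose(s);

    FILE *f = afsv1_open("server_copy", "cache_copy");
    fputs("local update\n", f);             /* write() at local-disk speed */
    afsv1_close(f, "cache_copy", "server_copy", /*modified=*/1);
    return 0;
}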

Problems

  • One server process per client costs too much
  • Path-traversal costs are too high
    • the whole path is sent to the server, which performs the entire traversal itself, costing server CPU
  • The client issues too many TestAuth protocol messages
    • TestAuth asks the server whether a file has changed
    • these account for 60% of all calls to the server
    • a polling-based (repeated) approach
  • Imbalanced server loads
    • fix: group files into volumes, and move volumes between servers
  • Too many context switches in the server
    • fix: use threads instead of one process per client

AFSv2

  • keep the consistency model: whole-file local-disk caching
  • install a callback at the server to reduce TestAuth traffic
    • interrupt-driven rather than polling (the server promises to tell the client about changes)
    • complexity lies in clients that crash or run slowly (does the server wait for them?)
  • file identifier (FID): a volume identifier, a file identifier, and a “uniquifier” (see the struct sketch after this list)
  • pathname lookup integrates with caching: piecewise lookup
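
The three FID fields below come straight from the notes; the callback table around them is a hypothetical sketch of the server-side bookkeeping, not the real AFS data structures:

#include <stdint.h>
#include <stdbool.h>

struct fid {
    uint32_t volume_id;   /* which volume the file lives in          */
    uint32_t file_id;     /* which file within that volume           */
    uint32_t uniquifier;  /* distinguishes reuse of the same file_id */
};

/* One callback promise: "the server will tell client_id if this file changes" */
struct callback {
    struct fid fid;
    int        client_id;
    bool       valid;     /* broken (set false) once the file is updated */
};

/* When a client stores a new version, the server breaks every callback on
   that FID so other clients learn their cached copies are stale. */
static void break_callbacks(struct callback *cbs, int n, struct fid f) {
    for (int i = 0; i < n; i++)
        if (cbs[i].fid.volume_id == f.volume_id &&
            cbs[i].fid.file_id == f.file_id &&
            cbs[i].fid.uniquifier == f.uniquifier)
            cbs[i].valid = false;   /* the real server notifies the client */
}

int main(void) {
    struct fid f = {1, 42, 7};
    struct callback cbs[2] = { {f, 1, true}, {f, 2, true} };
    break_callbacks(cbs, 2, f);   /* both clients' promises are now broken */
    return 0;
}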

Path lookup

open("/root_local_x/remote_y/z/a.txt", O_RDONLY)

  1. the client resolves the path one component at a time, caching each piece (see the sketch after this list):
    • x: fetch dir contents, establish callback
    • y: fetch dir contents, establish callback
    • z: fetch dir contents, establish callback
    • a.txt: fetch the file itself, establish callback
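
A hypothetical sketch of this piecewise walk; cached_with_callback() and fetch_and_install_callback() are invented stand-ins for the client's cache check and fetch RPC:

#include <stdio.h>
#include <string.h>

/* Stubs: in a real client these are cache lookups and server RPCs. */
static int  cached_with_callback(const char *name) { (void)name; return 0; }
static void fetch_and_install_callback(const char *name) {
    printf("fetch %s, establish callback\n", name);
}

static void lookup(const char *path) {
    char buf[256];
    strncpy(buf, path, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (char *part = strtok(buf, "/"); part; part = strtok(NULL, "/"))
        if (!cached_with_callback(part))
            fetch_and_install_callback(part);   /* one fetch per component */
}

int main(void) {
    lookup("/root_local_x/remote_y/z/a.txt");
    return 0;
}

The payoff: a later open of, say, /root_local_x/remote_y/z/b.txt reuses the cached directories (their callbacks are still valid) and contacts the server only for b.txt.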

Consistency

  • Weak consistency
    • All accesses to synchronization variables are seen by all processes in the same order (sequentially consistent)
    • All accesses to other variables may be seen in a different order on different processes
    • The set of operations between two synchronization operations is the same in each process, though their order may differ
  • When a file is opened, the client will generally receive the latest consistent copy from the server
    • contrast: NFS caches blocks rather than whole files, and caches in memory rather than on local disk
  • last writer wins: among processes on different machines, whichever closes the file last overwrites the server copy in its entirety (illustrated below)
    • NFS, in contrast, could leave the server with a file made of mixed blocks from different clients
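
A toy illustration of last-writer-wins, using in-memory strings for the server copy and two hypothetical clients A and B:

#include <stdio.h>
#include <string.h>

int main(void) {
    char server[32] = "original";
    char client_a[32], client_b[32];

    strcpy(client_a, server);        /* A: open(), fetch whole file */
    strcpy(client_b, server);        /* B: open(), fetch whole file */

    strcpy(client_a, "version A");   /* A: local writes */
    strcpy(client_b, "version B");   /* B: local writes */

    strcpy(server, client_a);        /* A: close(), push whole file */
    strcpy(server, client_b);        /* B: close() last, so B wins  */

    /* The server holds one client's complete file, never a block mix. */
    printf("server now holds: %s\n", server);   /* -> "version B" */
    return 0;
}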

Crash Recovery

  • after a crash of either the client or the server, clients must send TestAuth before trusting cached files, since callback breaks may have been missed
  • client crash and reboot: do not trust the callbacks (see the recovery sketch after this list)
    • the server may try to reach the client for a while, then give up on updating it
    • the client would rather keep its cached files, but must discard the callbacks
  • server crash
    • reboot and immediately start serving
      • problem: other clients still hold callbacks the rebooted server has forgotten
        • the server may need to notify all clients of the crash
        • alternatively, clients could poll the server periodically to detect a reboot
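
A hypothetical sketch of the client-side recovery rule (keep the cached data, distrust every callback, revalidate with TestAuth on the next access); all names are invented:

#include <stdbool.h>
#include <stdio.h>

struct cache_entry {
    const char *name;
    bool        callback_valid;
};

/* Stub TestAuth RPC: asks the server if the cached copy is current. */
static bool test_auth(const char *name) { (void)name; return true; }

static void recover(struct cache_entry *entries, int n) {
    for (int i = 0; i < n; i++)
        entries[i].callback_valid = false;   /* keep data, drop promises */
}

static void access_file(struct cache_entry *e) {
    if (!e->callback_valid && !test_auth(e->name))
        printf("refetch %s\n", e->name);     /* stale: fetch a new copy */
    else
        printf("use cached %s\n", e->name);  /* still valid */
}

int main(void) {
    struct cache_entry cache[] = { {"a.txt", true}, {"b.txt", true} };
    recover(cache, 2);           /* crash/reboot: distrust all callbacks */
    access_file(&cache[0]);      /* next access falls back to TestAuth  */
    return 0;
}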

Vital features

  • Scalability
    • An AFS file is fetched from the server and cached on the local disk of a workstation; further references are directed to the local cached copy
    • Any modification is promptly sent back to the server (on close)
    • The local disk is used as a persistent cache by AFS
  • Security
    • A user's password obtains a security token, which can later be used to establish secure network connections
    • ACLs (access control lists)
    • Everything on the wire is encrypted
  • Simplified System Administration
    • Only servers need backups; workstation disks are merely caches

Reviews

  • the server cluster is the only place the directory structure (file paths) is stored; clients cache just file contents