Remus and HA NFS

What kind of failures in hardware/software can it deal with?
- Fail-stop failure and only one of primary or backup fails
  - it could handle both of them crash temporarily, since the data will remain crash-consistent
What kind can’t it deal with?
- Can not recover from nonfail-stop conditions
- Both primary and backup fails permanently
- Software bugs, which could propagate application errors to backup
What happens when an unexpected failure arises?
- if primary fails
  1. the backup node will try to wait the new checkpoint transmission until time-out
- the alive node will upgrade itself as primary and resume execution from most recent checkpoint
- if backup fails
  1. the primary will try to wait for a time-out of the backup responding to commit requests
- the primary would have to replicate another backup

load imbalance
fault tolerant?
machine service/upgrade/etc
- but, migration at VMM level is easier
how to migrate?
- CPU: easy, just register
- Memory: large and changes quickly
- Disk: shared storage
- Net: redirect flows (networks are lossy)

Pre-copy: before migration, start copying pages to target
- Problem: redundant copies
Stop-and-copy: stop service, copy, start target
- Problem: slow
Post-copy: lazy fetch, copy-on-write: fetch on demand
- Problem: longer term dependency on Source host, slow

Hypothesis: Have migration, can we buil HA easier?

similarity: running backup is similar to migration (which running at target)
different:
- Frequency: HA > state transfer
- migration has acceptable service degradation
- planned vs unplanned

Good:
- Transparent HA via Primary/Backup
- unmodified systems can use this
Basic: Primary -> Basic
- 同步: slow (CPU, mem speed » net)
- 异步: backup large primary
  - need to prevent observation of this lag
- network indirection: buffer merge primary -> useful until backup has checkpointed state
N-1 mode: N primaries share 1 backup

memory transfer: frequent stop-and-copy
- all irrelevant memory activities are replicated: very low efficiency
net/disk buffering:
1. periodic checkpoint (10~100ms)
  - net/disk buffering
  - copy mem state P -> B
  - page-table based state tracking: only send delta - resume primary execution
disk: write through: backup commits checkpoint from mem to disk before resuming execution

make 2 servers, each be the bakcup of each other
- dual-ported disks that are accessible to two servers
- mirroring files on different disks
one server down may put more works on the remain server
DRBD

希言自然