Distributed Atomic Commit
Why hard?
- Failures occur
- Performance
- communication (latency of updates)
- storage (logging)
- Concurrency
Example: replicated block storage (distributed): mirroring
- client:
data := read(address)
write(address, data)
- one-copy semantics: write data to a logical address space, it is guaranteed serialized into storages
- usage: read one, write all
How to do an update?
- operation: write(addr, data)
- client tries to get everybody agrees on one thing (prepare)
- do necessary pre-work before actual commit - servers reply OK - client tells them to commit update, and servers reply OK (commit)
- client tries to get everybody agrees on one thing (prepare)
Concurrency
- need locking
- concurrent writes to the same address: inconsistent/ diverged replicas
- prevents concurrent access to block
- but have to be careful (there maybe deadlock)
- deadlocks cycle detection: complicated
- fix ordering: server A is able to lock before server B: too slow
- saying no: client may get a no and abort all acquired locks
- servers may not make any progress for a period of time (no one would acquire all locks)
- we could do random backoff carefully
- concurrent writes to the same address: inconsistent/ diverged replicas
Failure
- simple model
- crash + recover (fail-recover)
- general approach:
- try to remember enough information (but usually as little as possible)
- do logging
- server-side logging
- logging before reply OK of prepare to client, then reply OK to client’s prepare (forced)
- logging before reply OK of commit to client, then reply OK to client’s commit (forced)
- if server crash after logging prepare (but before commit from client)
- server wake up
- read log, then consult client/coordinator
- either commit/abort
- client/coordinator side
- maintain in-memory table of on-going updates/transactions info:
- transaction ID
- state: init, aborted/committed, preparing
- per server: server ID, vote? ack?
* 2 more points of logging (client-side)
- after all have voted, record outcome
- after all acks, record END. Then remove book keeping from in-memory table
- if client crash before all votes in: abort (e.g., server time-out -> ask client -> no info -> presume abort)
- maintain in-memory table of on-going updates/transactions info:
Basic protocol
- 2 round trips (communication)
- 4 logging events: 2 at each server, 2 at client
Presume Abort
- if we know votes outcome is abort: record nothing now, let it time-out (because I have other urgent things to do, I could record later)
- reduces client logging and round trips (no ack needed)
- but aborts don’t happen too frequently, so it does not acutually reduce too much
- don’t optimize a case you don’t want to face
- coordinator does not need the last ack (Server -> Client)
- client
- crash before collecting vote outcome: abort
- crash after force the vote: commit: tell asker to do commit
- server
- crash after vote log committed: ask coordinator: what’s up?
Presume Commit
- client
- log at initialization (before prepare vote) (forced)
- log at all vote before commit (forced)
- log at initialization (before prepare vote) (forced)
- but there are ways in the paper to reduce the two force log in client side (professor stops here)
2 Phase Commit
- force: the same term in ARIES means flush to disk, and do this urgency immediately
Normal 2PC
There is no default answer in this protocol, since it forces all the logs, the subordinate and coordinator know what is the status of transaction
It would lost memory about the transaction when the coordinator logs “end”
handling coordinator failure
- case1: before forcing commit/abort -> XACT is aborted
- case2: after forcing commit/abort -> recovery process drives to commit/abort
subordinate fails
- before forcing prepare/abort -> outcome is abort
- after forcing abort(means vote no) -> outcome is abort
- after forcing prepare(means vote yes) -> contact coordinator
- after forcing commit/abort -> commit or abort (send a ACK)
- coordinator may forget about the XACT, so it would ignore
Read-only
- coordinator “prepare”
- read-only subordinate write no log record, vote “read”
- coordinator ignores read-only sites during 2nd phase
Presumed-abort 2PC
default reply is abort when it has no memory about it
abort case
- coordinator “prepare”
- subordinate abort, no force, since not neccessary
- coordinator abort, no force, if it fails, it would just recover with abort
- other subordinate abort, no force, since coordinator default answer is abort
commit case
- coordinator “prepare”
- subordinate prepare, force, since it records the start
- coordinator commit, force
- other subordinate commit, force, since coordinator may finish the transaction and forget about it, then it would say abort
Presumed-commit
default reply is commit when it has no memory about it
abort case
- coordinator “collect”, “prepare”, force
- collect is the begin log log all participants, when coordinator recovers, it would not know the transaction and answer “commit” - subordinate abort, force - coordinator abort, force
- another paper says you could no force: Since you find collecting but nothing further (abort/end log), you abort - subordinate abort, force
- why force and why coordinator would block to receive ack here? See summary below
commit case
- coordinator “collect”, “prepare”, force
- collect is the begin log log all participants, since if there is no such log, when coordinator recovers, it would not know the transaction and answer “commit” - subordinate prepare, force - coordinator commit, force, since it indicates whether this transaction passes this phase
- TODO - subordinate commit, no force, since the default answer is “commit”