Grapevine: An Exercise in Distributed Computing
- Compared with hardware, software failure could be more arduous. Same version software could get crashed by the same bug which results in crash totally
nohup
is a must to alleviate from having all the servers crash in a short period before anyone notices that
Design
- do not interpret the content of the message(MIME), let clients do
- eventual consistency
- server propagates the change to the replicas of the entry in other servers
- other servers would merge change message with their own copy in the future
- two conflicting updates (e.g., add and delete) with different ordering in two different hosts is acceptable
- designer thinks it is not a problem since it does not communicate with outside system
- the clients should cope with transient inconsistencies
- server propagates the change to the replicas of the entry in other servers
- independent failure mode
- separate naming and addressing
- separate service from interface
Failure
- Change message get destroyed
- each registration server would resynchronize the data with others periodically
- use timestamp to conduct merges
- each registration server would resynchronize the data with others periodically
Registration Service
implementing distributed database
- authentication
- lookups/additions/deletions to database
- every entity has name (RName): two types of entity:
- groups: list of RNames
- individual: person(inbox sites), machine(network address)
- every entity has name (RName): two types of entity:
- Grapevine Registry: GV (fully replicated)
- GV is group contains all names of registry servers
- Bootstrap problem: how to find other servers?: manually input message servers
Replication
- replicates registry: all on particular registry server
- Goal: eventually consistent (can observe inconsistencies, but after time T, they will resolve)
- internal representation: (e.g., group)
- group: two lists: active_list, deleted_list (entity can be on one list at a time)
- garbage collection on deleted_list, maybe every 2 weeks
- group: two lists: active_list, deleted_list (entity can be on one list at a time)
Operations
- update: add(), delete()
- remove entity from active/deleted list
- generate timestamp, unique ID
- put entity on active/deleted list
- remove entity from active/deleted list
- merge
- process do local update, then generate change messages for other replicas
- takes two lists, merge into one using timestamps
- change messages may get lost: we may need to sync the servers after a period of time
- process do local update, then generate change messages for other replicas
Delivery Service
- implemented by group of message servers
accept()
: here is some mail to deliverpoll/check()
: do I have mail?retrieve
: get my mail
Message Delivery
- prepare email: body, list of recipients (inbox sites)
- Accept: pick any message server and give it the message
- nothing else is replicated, but only the service replicated, you could pick any server - message server: store locally and say OK
Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency
- Leases give great management flexibility.
- The server could control the term of leases, and it could choose from seeking write approval or wait for lease expire
- A heavily write-shared file might be given a lease term of zero
- A heavily read-shared file could be optimized by using a smaller number of leases, e.g., one lease per directory
- to eliminate the need for clients to request lease extensions
- The client could choose when to request extension, relinquish and approve write
- lease is contract: rights for limited time
- allow to make sensible progress (slow client will not block others too much, we have expiration)
- client wants to access file F
- request lease(time period) from server for F
- if re-access file before lease expires: use it without server interaction
- otherwise, when expires, renew it before re-use
- request lease(time period) from server for F
- different clients want to modify F
- have to contact all lease holders to get permission
- It is OK if one client doesn’t reply
- have to contact all lease holders to get permission
Design
- provide strict consistency in spite of non-Byzantine failures, including partitions
- a key assumption is that clocks are reasonably accurate
- use physical clocks, real clock
Leases
- lease extenstion overhead: extension after lease expires
- on-demand extension rather than periodic extension
- false sharing: leaseholder hold the lease without accessing the file, while others could not write
- Scaling Memcache at Facebook, thundering herds
- without lease, all cache misses would go to database query
- with lease, I would wait for a short time to let other leaseholder to set the data, so that I will not query the DB
Lease Length
- shorter term is good
- false sharing: reduces “false” conflicts
- server availability: server could reboot quickly (after crash)
- longer term is good
- reduces overheads for extensions
Crash Recovery
- simple:
- client: forget all leases upon reboot
- server: wait(until all leases have expired)