RPC

Zhaoyu Luo bio photo By Zhaoyu Luo

Before RPC

  • sockets are the fundamental mechanism with a poor read/write only interface

Typing

  • MPI has explicit type declaration
    • deal recursively with passing the pointer
  • Assuming name equivalence (structural equivalence in compiler)

Centralized Registry

  • Services are registered here, binding
  • Hard coded address is also an alternative

Failover

  • If the server fails and reboot, the client will retransimit the request with same unique identifier and different sequence number
  • The server would ACK every request
    • latency is better than TCP, throughput is worse then TCP
      • very slow in bulk data transferring

Reviews

  • Stubs are customized for every different client and server
  • No time-out criticism
    • If time-out is not supported, people could underestimate the consumption time of RPCs inside one loop. E.g., considering a local procedure call which has time-out of 1s, whose body is just a 1000 times loop of a = a + 1. This would be fine and never time-out. However, if we make the add operation into RPC call, even though it is easy and quick to return, the network latency would easy to accumulate more than 1s, at this time the programmer should be aware that he is calling over the network. By using time-out, he could find out the caller function’s time-out more quickly.
    • The RPC design should not count on a specific language, such as Mesa to handle the time-out bases on its own language feature.
    • RPC could not be treated as local procedure call. Since we don’t the load of a remote server, one quick RPC call could be super slow when large number of clients try to call it or the network itself becomes congested. In local procedure call, the procedure itself could figure how many processes are calling one particular service at the time. So no time-out simplification may cover up the potential congestion, especially when each client is not aware the existence of the others.
    • Time-out is not difficult to implement. In modern language, we only need set up a timer at the start of a function, then check the result when times up. If the return value hasn’t arrived, then we raise time-out exception

Reference

Distributed Systems

  • communication
  • failure
  • time

RPC

Xerox RPC

  • overview
  • stub compiler
  • naming
  • reliable RPC

    metrics low-level messages(TCP/IP) DSM RPC performance/ efficiency Y N Y? ease of use N? Y? Y security Y N Y generality/ flexibility Y Y Y fault-handling Y N Y

Abstractions

  • Low-level message: send/ receive
    • TCP/IP: reliable, congestion control, connection oriented, stream
    • UDP/IP
  • From OS: Virtual Memory
    • Distributed Shared Memory (DSM): use page fault machinery to build illusion
RPC
  • Distributed File System
  • Interface: read, write, open, close, create, delete
  • Stub compiler
    • int close(int fd)
    • int write(int fd, char *buf, int size) pass pointer?
  • client side/ server side is similar:
    1. (marshalling)
      • pack arguments -> message
      • send msg
      • wait for reply
      • unpack results
      • return n
  • naming
    • Grapevine: distributed replicated key-value store
    • map:
      • DFS -> home, bin
      • home -> net address
  • client call
    • failure (server crash, message loss)
    • basics: server exports interface
      • unique ID (detect server crash)
      • table index -> which interface on server we are interested in
    • procedure
      1. table index: which interface
        • unique ID (server creates, the client use it)
        • if server crashes, it would create new ID, prevent old ID - number (there is table in the server to look up): which function to call - arguments
  • Why not “TCP”?
    • heavyweight setup
    • doubling of messages (implicit ack by response)
  • Towards reliability:
    1. client send activity {machine ID, process ID, table index, function index, unique ID}, sequence number
      • server get new-activity: add activity to table record sequence #
      • server reply
      • client send subsequence call: same activity
      • if seqclient > seqtable: process
      • else: ack with old reply (because client may loss package and retry)
  • Goals:
    • “exactly once” semantics (“at most once”)
      • upon successful ack reply function was invoked once
      • failure: at most once
  • Complicated calls:
    • long-running RPCs: [are you alive?]
    • bulk messages: larger (content size) > packet [simple one-at-a-time transfer]