Storm, Heron

By Zhaoyu Luo

Storm

  • Topology: a DAG whose vertices represent computation and whose edges represent the data flow
    • it is the logical plan, from a database system’s perspective
  • more than one worker process on the same machine may execute different parts of the same topology
  • one worker process is one JVM (whereas a Heron Instance is a JVM process that runs only a single spout or bolt task)
    • worker processes serve as containers
    • each worker node runs a Supervisor daemon that communicates with Nimbus
  • one JVM runs multiple threads (executors)
    • executors provide intra-topology parallelism
  • one thread runs multiple tasks
    • tasks provide intra-bolt/intra-spout parallelism (see the sketch after this list)
  • Nimbus is the “JobTracker”: it is the touchpoint between the user and the Storm system
    • Nimbus and Supervisor daemons are fail-fast and stateless, and all their state is kept in Zookeeper or on the local disk
    • Single point of failure in Storm, but not in Heron
  • Supervisor: every 3 seconds, it reads worker heartbeats from local state and classifies those workers as valid, timed out, not started, or disallowed
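
As a rough illustration of how workers, executors, and tasks map onto the Storm API, here is a minimal topology sketch. It is not taken from the paper: TweetSpout and SplitBolt are hypothetical placeholder components and the parallelism numbers are arbitrary.

    // Sketch only: TweetSpout and SplitBolt are hypothetical placeholders.
    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.topology.TopologyBuilder;

    public class ParallelismSketch {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();

            // spout: 2 executors (threads); by default one task per executor
            builder.setSpout("tweets", new TweetSpout(), 2);

            // bolt: 4 executors but 8 tasks, i.e. 2 tasks multiplexed onto each thread
            builder.setBolt("split", new SplitBolt(), 4)
                   .setNumTasks(8)
                   .shuffleGrouping("tweets");

            Config conf = new Config();
            conf.setNumWorkers(3); // 3 worker processes (JVMs) act as containers for the executors

            StormSubmitter.submitTopology("parallelism-sketch", conf, builder.createTopology());
        }
    }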

Approximate operators

  • two ways:
    • sampling
      • sample-and-hold: finds heavy-hitter elements (sketched in code after this list)
      • identifies a subset of the elements, e.g., “give me 5% of tweets”
      • for each incoming element x: does it already have a hash table entry?
        • yes: increment its count
        • no: sample with probability p (how you choose p controls the sample size)
          • sampled: create a hash table entry for x
          • not sampled: skip this occurrence of the element
    • sketching (bloom filter)
      • loglog operator: estimating cardinality of a multi-set
      • bit vector
      • index = compute(hash(x))
      • count-min sketch: estimating frequency counts (also sketched after this list)
  • BlinkDB: runs queries on samples of the data and estimates the error of the approximate answers
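
A minimal, self-contained sketch of the sample-and-hold logic above (my own illustration, not code from any paper); the String element type and the 5% probability are arbitrary choices.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Random;

    /** Sample-and-hold: once an element is sampled, every later occurrence is counted. */
    public class SampleAndHold {
        private final Map<String, Long> counts = new HashMap<>();
        private final double p;                  // sampling probability, e.g. 0.05 for "5% of tweets"
        private final Random random = new Random();

        public SampleAndHold(double p) { this.p = p; }

        public void offer(String x) {
            Long c = counts.get(x);
            if (c != null) {
                counts.put(x, c + 1);            // already held: increment its count
            } else if (random.nextDouble() < p) {
                counts.put(x, 1L);               // sampled: create a hash table entry
            }                                    // not sampled: skip this occurrence
        }

        public Map<String, Long> heavyHitterEstimates() { return counts; }

        public static void main(String[] args) {
            SampleAndHold sh = new SampleAndHold(0.05);
            for (String x : new String[] {"a", "b", "a", "a", "c", "a", "b"}) sh.offer(x);
            System.out.println(sh.heavyHitterEstimates());
        }
    }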
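
A similarly illustrative count-min sketch for frequency counts (the simple seeded hash family is an arbitrary choice): hash collisions can only inflate a counter, so taking the minimum across rows gives an estimate that never undercounts.

    import java.util.Random;

    /** Count-min sketch: approximate frequency counts with bounded over-estimation. */
    public class CountMinSketch {
        private final int depth;      // number of hash functions (rows)
        private final int width;      // counters per row
        private final long[][] table;
        private final int[] seeds;

        public CountMinSketch(int depth, int width) {
            this.depth = depth;
            this.width = width;
            this.table = new long[depth][width];
            this.seeds = new int[depth];
            Random r = new Random(42);
            for (int i = 0; i < depth; i++) seeds[i] = r.nextInt();
        }

        private int index(String x, int row) {
            return Math.floorMod(x.hashCode() ^ seeds[row], width);
        }

        public void add(String x) {
            for (int row = 0; row < depth; row++) table[row][index(x, row)]++;
        }

        /** The estimate is the minimum counter across rows. */
        public long estimate(String x) {
            long min = Long.MAX_VALUE;
            for (int row = 0; row < depth; row++) min = Math.min(min, table[row][index(x, row)]);
            return min;
        }

        public static void main(String[] args) {
            CountMinSketch cms = new CountMinSketch(4, 256);
            for (int i = 0; i < 100; i++) cms.add("a");
            cms.add("b");
            System.out.println("a ~ " + cms.estimate("a") + ", b ~ " + cms.estimate("b"));
        }
    }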

Common

  • Topology: the interconnection between spouts and bolts
    • a spout pulls data from queues such as Kafka; the spout is the producer
    • a bolt is the consumer: it processes tuples and may emit new ones
  • each spout and bolt can map to multiple tasks
  • each stream is an unbounded sequence of tuples (e.g., tweets)
  • provides guarantees:
    • at least once: a naive implementation would track the full lineage of every tuple, but that leads to high memory usage, so Storm uses XOR-based acking instead (see the sketch after this list)
      • one tweet is considered one tuple, generated by a spout
      • there are multiple processing steps after the spout, each generating new tuples with their own IDs
      • an acker bolt is assigned to track the whole tuple tree of this tweet by XOR-ing the tuple IDs
      • if the XOR has not returned to 0 by the timeout (e.g., a tuple or its ack was lost), the tweet is replayed
    • at most once: just disable the ACK mechanism above
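
The XOR bookkeeping can be shown in a few lines (a simplified model of the acking idea, not Storm’s actual acker code): each tuple ID is XOR-ed into one tracked value when the tuple is emitted and again when it is acked, so the value returns to 0 exactly when the whole tuple tree for the tweet has been acked.

    import java.util.Random;

    /** Simplified model of XOR-based acking for one spout tuple (one tweet). */
    public class AckerSketch {
        public static void main(String[] args) {
            Random ids = new Random();
            long ackVal = 0;

            // The spout emits the root tuple; a downstream bolt emits two child tuples.
            long root = ids.nextLong();
            long child1 = ids.nextLong();
            long child2 = ids.nextLong();

            // Each ID is XOR-ed in when its tuple is emitted (anchored) ...
            ackVal ^= root ^ child1 ^ child2;

            // ... and XOR-ed in again when that tuple is acked.
            ackVal ^= root;
            ackVal ^= child1;
            ackVal ^= child2;   // comment this line out to simulate a lost ack

            // 0 means the whole tree was processed; a non-zero value at the
            // timeout means something was lost, so the tweet is replayed.
            System.out.println(ackVal == 0 ? "fully processed" : "replay the tweet");
        }
    }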

Failure

  • on failure, Storm replays tweets one by one from the start of the topology, whereas Spark Streaming can recover from an intermediate RDD
  • on node failure, Spark can recompute lost partitions from RDD lineage

Zookeeper

  • where to store the state of one tweet?
    • paper does not say
  • keeps the runtime state of Nimbus and the Supervisors

How Heron differs

  • Need for debug-ability and a clean mapping from the logical units of computation to each physical process
  • Scheduling uncertainty: JVM scheduler, thread scheduler
  • Storm: there is no resource isolation between different tasks:
    • multiple tasks may run in the same JVM
    • logs from multiple tasks are written into a single file, which complicates debugging
    • one task’s unhandled exception can bring down the whole worker JVM and kill the other tasks running in it
    • Heron: each Heron Instance runs a single task, which keeps the mapping to physical processes clean and makes debugging easier
  • each tuple has to pass through four threads (for queuing) in the Storm worker process, which adds overhead and queue contention
  • backpressure: Heron automatically adjusts the rate of incoming data so the topology keeps making progress
    • Storm just drops tuples under overload
  • if one task fails, the entire tuple tree for the tweet has to be replayed
    • Storm may end up consuming all its resources while making no progress
  • JVM GC pauses can produce latencies larger than 1 minute

Components

  • Topology Master: manages the topology throughout its existence and provides a single point of contact for discovering the status of the topology (analogous to YARN’s Application Master)
  • Stream Manager: manages the routing of tuples between Heron Instances

Reference