SAMC, memory, flash, and network failures

Using knowledge of a system's semantics can help alleviate the state-space explosion in test cases
Future testing work will depend heavily on pattern extraction, since it is still done manually so far. As a consequence, we will not need low-level test engineers as much
- Future test engineers will need to learn how to model code as a specific protocol, then generate tests from these protocol models
  - This kind of test methodology seems more precise than fuzz testing and could give a better understanding of why the code fails
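
The state-space reduction above can be sketched with a toy example. This is only an illustration under a made-up independence rule (deliveries to different nodes commute, so only the per-node delivery order matters); it does not reproduce SAMC's actual reduction policies. The names `all_orderings`, `reduced_orderings`, and the message tuples are hypothetical.

```python
from itertools import permutations

def all_orderings(messages):
    """Naive exploration: every permutation of message deliveries."""
    return list(permutations(messages))

def reduced_orderings(messages, dest_of):
    """Keep one representative ordering per equivalence class.

    Assumed rule (hypothetical): two orderings are equivalent when every
    destination node sees its own messages in the same order.
    """
    dests = sorted({dest_of(m) for m in messages})
    representatives = {}
    for order in permutations(messages):
        # The key records, for each destination, the order of its deliveries.
        key = tuple(
            (d, tuple(m for m in order if dest_of(m) == d)) for d in dests
        )
        representatives.setdefault(key, order)
    return list(representatives.values())

# Four in-flight messages: two for node A, two for node B.
msgs = [("a1", "A"), ("a2", "A"), ("b1", "B"), ("b2", "B")]
naive = all_orderings(msgs)
reduced = reduced_orderings(msgs, dest_of=lambda m: m[1])
print(len(naive), len(reduced))  # 24 orderings vs 4 representatives
```

The sketch still enumerates every permutation before deduplicating, so it only shows how many executions remain worth testing; a real checker would avoid generating the redundant orderings in the first place.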

Nowadays, system designers do not need to worry about read disturbance, since modern SSDs have mitigated the problem of read operations corrupting neighbouring cells
I also think temperature is not a problem, since:
- Technicians in the data center or hardware engineers may come up with a better solution to lower the temperature
- I suspect temperature sensors are not yet widely supported by the OS, so we cannot take them into account

What are they looking at?

host-correctable errors: logical correction at the host can help with errors that the SSD's ECC cannot correct
snapshot of SSD

Flash chips

read/write

Log-structured design
- program (write) at the end of the log
- introduces a new problem: mapping info (logical -> physical translations)
  1. how to persist it, since it lives in memory?
    - a cheap SSD would scan the whole device and rebuild the index
  2. size: a bigger SSD needs more storage for the mapping
  3. how to handle garbage (GC)
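
The log-structured design and its mapping problem can be sketched as a toy model. This is a minimal sketch, not how any real SSD firmware works: the class `ToyLogFTL`, its methods, and the assumption that each flash page stores its logical address next to the data are all made up for illustration.

```python
class ToyLogFTL:
    """Toy log-structured flash translation layer (illustrative only)."""

    def __init__(self, num_pages):
        # Each physical page holds (logical_address, data), or None if unwritten.
        self.flash = [None] * num_pages
        self.head = 0    # next free page: writes always go to the end of the log
        self.l2p = {}    # in-memory logical -> physical mapping

    def write(self, lba, data):
        if self.head >= len(self.flash):
            raise RuntimeError("log is full; a real device would run GC here")
        self.flash[self.head] = (lba, data)
        self.l2p[lba] = self.head   # any older copy of this lba becomes garbage
        self.head += 1

    def read(self, lba):
        return self.flash[self.l2p[lba]][1]

    def rebuild_map(self):
        """Recover the mapping by scanning the whole log (the 'cheap SSD' approach)."""
        self.l2p = {}
        for ppn, entry in enumerate(self.flash):
            if entry is not None:
                self.l2p[entry[0]] = ppn   # later entries override earlier ones

    def garbage_pages(self):
        """Physical pages that hold stale data and could be reclaimed by GC."""
        live = set(self.l2p.values())
        return [p for p, e in enumerate(self.flash) if e is not None and p not in live]


ftl = ToyLogFTL(num_pages=8)
ftl.write(0, "a")
ftl.write(1, "b")
ftl.write(0, "c")      # overwrite: appended at the log end, page 0 is now stale
ftl.l2p = {}           # simulate losing the in-memory map on power loss
ftl.rebuild_map()
print(ftl.read(0), ftl.garbage_pages())
```

The full scan in `rebuild_map` is simple but its cost grows with device size, which is the size concern noted above.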

Hypothesis: SSDs use all blocks at first, then mark the bad ones as they show up, which lowers the failure rate

If the system does not use a stronger ECC, such as chipkill-correct ECC, nothing prevents memory errors from causing system failures. So what a system designer should consider is how to recover on a fault-prone system, i.e., fail as soon as possible.
- So you do not need to worry about how to prevent system crashes caused by memory errors. If there is a memory error, just let it go
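
The fail-fast policy above can be sketched as follows. This is a minimal sketch with hypothetical names (`UncorrectableMemoryError`, `handle_memory_event`, and the event kinds are invented here), assuming the platform reports memory errors to software as either correctable or uncorrectable.

```python
class UncorrectableMemoryError(Exception):
    """Raised to stop immediately instead of running on corrupted memory."""


def handle_memory_event(kind, counters):
    # Correctable errors were already fixed by ECC: count them and keep going.
    if kind == "correctable":
        counters["correctable"] = counters.get("correctable", 0) + 1
        return "continue"
    # Anything else: do not try to mask the error. Fail right away and leave
    # recovery to a restart or failover path.
    raise UncorrectableMemoryError(f"memory event of kind {kind!r}; failing fast")


counters = {}
status = handle_memory_event("correctable", counters)
try:
    handle_memory_event("uncorrectable", counters)
    failed_fast = False
except UncorrectableMemoryError:
    failed_fast = True
```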

The paper does not prove that network redundancy is ineffective, since all the problems come either from misconfiguration of the backup network or from bugs in the fail-over mechanism. This is entirely a software problem
- So I think worrying about the effectiveness of network redundancy is the least important concern. We still need redundant networks
Another thing is that the price or model of the ToR does not affect network performance much

To speak little is natural.