What is Byzantine Fault Detection?

security

#1

Distributed systems are subject to a variety of faults and attacks. A faulty/malicious node may exhibit arbitrary behavior. In particular, a faulty node may corrupt its local state and send arbitrary messages, including specific messages aimed at subverting the system. Many security attacks, such as censorship, freeloading, misrouting, and data corruption, can be modeled as Byzantine faults.

Systems can be protected with Byzantine fault tolerance (BFT) techniques, which mask a bounded number of Byzantine faults, e.g. using state machine replication. BFT is a very powerful technique, but it has its costs. In a practical system that needs to tolerate up to f concurrent Byzantine faults, BFT cannot be implemented with less than 3f+1 replicas. Moreover, BFT scales poorly to large replica groups; as more servers are added, the throughput of the system may actually decrease.

Byzantine Fault Detection is an alternative approach that aims at detecting rather than masking faulty behavior. With this approach, the system does not make any attempt to hide the symptoms of Byzantine faults. Rather, each node is equipped with a detector that monitors other nodes for signs of faulty behavior. If the detector determines that some node has become faulty, it notifies the application software, which can then take appropriate action. For example, nodes can cease to communicate with the faulty node; once all correct nodes have followed suit, the faulty node is isolated and the fault is contained. Lies (or faults) in Radix can be detected using merkle trees of commitments.

Read more about the case for Byzantine Fault Detection in this whitepaper.

Radix uses both BFT to defend against network splits/attacks and BFD to defend against malicious behavior like double spends, transaction suppressing, etc.