19 Nov 2025 - Vincent
This year, the reproducibility crisis was thrown into sharp relief when researchers found that only 50% of 30 highly cited AI studies were reproducible [1]. By contrast, we made sure that all our experiments demonstrating Redbelly's superior fault tolerance were judged reproducible by an independent committee of scientists.
Reproducibility, the ability of independent researchers to run the same methods and obtain the same results, is one of the bedrock principles of scientific progress. Without it, there is no way to understand the factors that led to the observed results. If a result cannot be reproduced, then what produced it?
The race to the most efficient blockchain has led blockchain companies to claim much higher performance than they could achieve in realistic settings [2]. Most results were communicated through online announcements, with few scientific publications, if any, to back them up. Barely any results are communicated when it comes to reliability. And we are not aware of any of these experiments being judged reproducible.
If this trajectory continues, the consequences are easy to imagine.
To examine this problem rigorously, we developed Stabl, a framework designed to evaluate blockchain fault tolerance under realistic and adversarial conditions. I introduced Stabl in an earlier post, and the takeaway is straightforward: Redbelly tolerates attacks, crashes, churn, and network partitions more effectively than the other blockchains evaluated, namely Algorand, Aptos, Avalanche, and Solana. The table below summarizes the outcome of each failure scenario (✓ = tolerated, ❌ = not tolerated); a minimal sketch of such a fault-injection run follows the table.
| Blockchain | Attack | Crash | Churn | Partition |
|---|---|---|---|---|
| Algorand | ✓ | ✓ | ✓ | ❌ |
| Aptos | ❌ | ✓ | ❌ | ❌ |
| Avalanche | ✓ | ❌ | ❌ | ❌ |
| Solana | ✓ | ❌ | ❌ | ❌ |
| Redbelly | ✓ | ✓ | ✓ | ✓ |
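To make this kind of evaluation concrete, here is a minimal sketch, in Go, of what a fault-injection benchmark loop can look like: it drives a fixed transaction rate against a set of simulated validators, injects a failure scenario partway through the run, and counts how many transactions still commit. The types, the quorum rule, and the scenario are illustrative assumptions for this post; they do not reproduce Stabl's actual interfaces or measurements.

```go
// Minimal, illustrative sketch of a fault-tolerance benchmark loop.
// The types and the quorum rule are assumptions for illustration; they
// do not reproduce Stabl's actual interfaces.
package main

import (
	"fmt"
	"time"
)

// Node stands in for a blockchain validator under test.
type Node struct {
	ID       int
	Crashed  bool
	Isolated bool
}

// Scenario describes a failure injected while the workload runs.
type Scenario struct {
	Name    string
	StartAt time.Duration       // simulated time at which the failure is injected
	Inject  func(nodes []*Node) // mutates the nodes to apply the failure
}

// commits reports whether a transaction can commit: here we require more
// than two thirds of the validators to be up and reachable, a typical
// Byzantine fault tolerant quorum.
func commits(nodes []*Node) bool {
	alive := 0
	for _, n := range nodes {
		if !n.Crashed && !n.Isolated {
			alive++
		}
	}
	return 3*alive > 2*len(nodes)
}

// run simulates `duration` of workload at `rate` transactions per second
// and returns how many transactions committed despite the scenario.
func run(nodes []*Node, s Scenario, duration time.Duration, rate int) int {
	committed := 0
	for t := time.Duration(0); t < duration; t += time.Second {
		if t == s.StartAt {
			s.Inject(nodes)
		}
		for i := 0; i < rate; i++ {
			if commits(nodes) {
				committed++
			}
		}
	}
	return committed
}

func main() {
	nodes := make([]*Node, 10)
	for i := range nodes {
		nodes[i] = &Node{ID: i}
	}
	// Crash half of the validators 30 simulated seconds into the run.
	crashHalf := Scenario{
		Name:    "crash",
		StartAt: 30 * time.Second,
		Inject: func(ns []*Node) {
			for _, n := range ns[:len(ns)/2] {
				n.Crashed = true
			}
		},
	}
	committed := run(nodes, crashHalf, 60*time.Second, 100)
	fmt.Printf("scenario=%s committed=%d/%d\n", crashHalf.Name, committed, 60*100)
}
```

A real campaign targets live networks rather than a simulation, but the structure is the same: a sustained workload, an injected failure, and a measured outcome.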
Not only were these results peer-reviewed before being accepted for publication in the proceedings of the 2025 ACM/IFIP Middleware conference [3], but the published experimental results have also just been judged available, functional, and reproducible by an independent artifact evaluation committee of scientists. The article will be presented in the US in December 2025.
Blockchain is transitioning from an experimental technology to a foundational infrastructure. It now underpins assets, markets, and systems that cannot afford ambiguity. If reproducibility becomes optional, or worse, ignored, the industry risks building systemic infrastructure on unverifiable claims. This is a structural danger. Reproducibility must become a baseline expectation for blockchain research and performance reporting. Not a “nice to have”. A requirement.
If this industry wants to scale, it must be scientifically rigorous. And if it wants to support real-world systems, it must publish results that others can re-run, re-test and independently verify.
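What re-runnable results can look like in practice is easy to sketch too. The fragment below, again a hypothetical example rather than the format of our published artifact, stores the exact code revision, the experiment parameters, and a hash of the raw output next to the results, so that anyone repeating the run can compare their output against ours.

```go
// Illustrative sketch of recording an experiment's provenance so that an
// independent party can re-run it and compare outputs. Field names and
// file paths are assumptions, not the format of our published artifact.
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// Manifest captures what is needed to repeat and check a single run.
type Manifest struct {
	Benchmark  string            `json:"benchmark"`
	Revision   string            `json:"revision"`      // exact code version used
	Parameters map[string]string `json:"parameters"`    // node count, scenario, rate...
	StartedAt  time.Time         `json:"started_at"`
	ResultSHA  string            `json:"result_sha256"` // hash of the raw result file
}

func main() {
	// Raw measurements produced by one benchmark run (path is hypothetical).
	raw, err := os.ReadFile("results/crash-run.csv")
	if err != nil {
		panic(err)
	}
	m := Manifest{
		Benchmark:  "fault-tolerance",
		Revision:   "abc1234", // placeholder: pin the exact revision used
		Parameters: map[string]string{"nodes": "10", "scenario": "crash", "rate_tps": "100"},
		StartedAt:  time.Now().UTC(),
		ResultSHA:  fmt.Sprintf("%x", sha256.Sum256(raw)),
	}
	out, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		panic(err)
	}
	if err := os.WriteFile("results/crash-run.manifest.json", out, 0o644); err != nil {
		panic(err)
	}
}
```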
Redbelly has taken that path. And I hope others do as well.
[1] The Unreasonable Effectiveness of Open Science in AI: A Replication Study. O. E. Gundersen, O. Cappelen, M. Mølnå, N. G. Nilsen. 39th AAAI Conference on Artificial Intelligence (AAAI), 2025.
[2] Diablo: A Benchmark Suite for Blockchains. V. Gramoli, R. Guerraoui, A. Lebedev, C. Natoli, G. Voron. 18th ACM European Conference on Computer Systems (EuroSys), 2023.
[3] STABL: The Sensitivity of Blockchains to Failures. V. Gramoli, R. Guerraoui, A. Lebedev, G. Voron. 26th ACM/IFIP International Middleware Conference (Middleware), 2025.