In 2006, while at NetApp, I remember watching the launch of Isilon with horror.
Isilon’s product was everything Clustered ONTAP – aka GX – wanted to be.
One of the most intriguing aspects of the product was its use of Reed-Solomon codes to cut the amount of storage required. The downside, of course, was that rebuild was a bitch. The rebuild was so painful that, although the tech was interesting, our most senior architects were dismissive of its value.
They believed that a clustered storage solution and a clustered file system would deliver superior availability with better cost and faster rebuilds. Or something like that; I must admit that I have forgotten the details of the debates and don’t feel like dredging it all back up.
The market failure of Reed-Solomon codes more or less convinced me that the right answer, for the foreseeable future, was to pay 2x the storage cost.
And then I read this:
http://storagemojo.com/2013/06/21/facebooks-advanced-erasure-codes/
That is a nice summary of this paper: http://anrg.usc.edu/~maheswaran/Xorbas.pdf
This is a huge result. What it suggests is that storage availability is no longer tied to 2x the storage infrastructure, and that you can get there without taking an unacceptable hit on recovery.
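The tradeoff can be sketched with back-of-envelope arithmetic. The numbers below come from the paper's HDFS-RAID Reed-Solomon baseline and its locally repairable code; the `scheme` helper and the block counts for replication are my own illustrative framing, not measurements.

```python
# Back-of-envelope comparison: storage overhead vs. blocks read to
# repair a single lost block, for the schemes the Xorbas paper discusses.

def scheme(name, data_blocks, parity_blocks, blocks_read_per_repair):
    """Return overhead (total/data) and single-block repair cost."""
    overhead = (data_blocks + parity_blocks) / data_blocks
    return {"name": name, "overhead_x": round(overhead, 2),
            "repair_reads": blocks_read_per_repair}

schemes = [
    # 3-way replication: every block stored 3 times; repair copies 1 replica.
    scheme("3x replication", 1, 2, 1),
    # Reed-Solomon (10,4): repairing one block means reading 10 others.
    scheme("RS(10,4)", 10, 4, 10),
    # Xorbas LRC: RS(10,4) plus 2 local parities; a lost data block is
    # rebuilt from its local group of 5 blocks instead of 10.
    scheme("LRC(10,6,5)", 10, 6, 5),
]

for s in schemes:
    print(f'{s["name"]:>16}: {s["overhead_x"]}x storage, '
          f'{s["repair_reads"]} blocks read per repair')
```

The point of the LRC row: for a modest bump in overhead (1.6x vs. 1.4x), repair traffic is cut in half relative to plain Reed-Solomon, which is what makes the rebuild pain tolerable.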
A new file system that embraces this kind of encoding could be a good solution for the large class of applications that don’t need the RTO that 2x the storage provides. Making storage cheaper has always been a winning strategy for growing market share.
A new clustered file system built around this kind of erasure code, or even a variety of erasure codes, could be a significant new addition to the tech ecosystem.
I wonder if something built from the ground up would look very different from an adaptation of an existing system.