Monthly Archives: July 2013

NetApp Did It – follow up to post on Facebook recovery codes

One of my favorite South Park episodes of all time is “Simpsons Already Did It”. In that episode, the creators of South Park enumerate every stupid, asinine plot device the Simpsons ever used.

In the tech industry in the 1990s we had a saying: “IBM did it”. In other words, no matter how great an idea you thought you had, IBM had already done it.

So I was tremendously gratified when, in response to my post on the Xorbas paper, a buddy of mine pointed out that NetApp did it.

Here’s the patent and the key elements of the paper:

“Specifically, a single diagonal parity block is calculated across a series of row parity groups. Upon the failure of any data blocks, each in an independent row parity group, the invention enables recovery of the data blocks using local row parity. Upon the failure of any two blocks within a single parity group, the invention facilitates recovery using a combination of local row parity and global diagonal parity calculations. Advantageously, the present invention allows more efficient (and easier) recovery of single failures in an array configured to enable recovery from the concurrent failure of two storage devices (disks) within a sub-array of the array.”

The good news is that all of the patent claims are written in terms of row and diagonal parity, protecting against exactly two failures.
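The local row parity half of the quoted passage can be sketched in a few lines. This is a minimal illustration, assuming plain XOR parity over equal-sized blocks; the function names and sample data are hypothetical, and the global diagonal parity needed for two failures in the same group is deliberately left out.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def row_parity(data_blocks):
    """Local row parity for one group: the XOR of all its data blocks."""
    return xor_blocks(data_blocks)

def recover_block(surviving_blocks, parity):
    """Rebuild the single missing block of a group from survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity])

# Two independent row parity groups (hypothetical sample data).
groups = [
    [b"disk", b"data", b"here"],
    [b"more", b"rows", b"left"],
]
parities = [row_parity(group) for group in groups]

# One failure in each group: block 1 of group 0 and block 2 of group 1.
# Each loss is repaired using only its own group's blocks and parity.
recovered_a = recover_block([groups[0][0], groups[0][2]], parities[0])
recovered_b = recover_block([groups[1][0], groups[1][1]], parities[1])
```

Each repair reads only the blocks of the affected group, which is why single failures are cheap to recover in this kind of layout.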

Intriguingly, when NetApp was looking at this technology we were focused on how to make disks more reliable. What makes the Xorbas work fascinating is that they focused on how to make bricks (disk+compute) more reliable.

One of the objections to bricks, at the time, was that a brick was a very expensive product given the cost and performance of disk, CPU and memory. That kind of architecture would make sense if the compute was significantly more intensive than just file system operations. Otherwise you were spending a lot of money to add a small number of disks compared to adding a shelf.

And the reality is that Isilon wasn’t quite as big a success as NetApp feared.

As a modest reflection on the past, at the time, I remember folks wondering how we could bring compute closer to the disks. And we had these ideas about VMware and virtualization and missed this whole big data thing entirely.

What we really missed was that the database layer was going to change radically to a scale-out architecture, where a clustered file system that made both disk and reliability more cost-effective would be tremendously valuable.

But we didn’t see it. And I was there, involved in those discussions, and I was as blind to the opportunity as anybody else.

We were too focused on disk drives, not on the changing nature of compute. And missed how our technology could apply to that changing nature of compute.

So like IBM, NetApp did it and someone else will capitalize on it.



Time to invent a file system

In 2006, while at NetApp, I remember with horror the launch of Isilon.

Isilon’s product was everything Clustered ONTAP – aka GX – wanted to be.

One of the most intriguing aspects of the product was the use of Reed-Solomon codes to cut the amount of storage required. The downside, of course, was that rebuild was a bitch. The rebuild was so painful that, although the tech was interesting, our most senior architects were dismissive of the value.

They believed that the clustered storage solution and a clustered file system would deliver superior availability with better cost and faster rebuilds. Or something like that. I must admit that I have forgotten the details of the debates and don’t feel like trying to remember everything.

The market failure of Reed-Solomon codes more or less convinced me that the right answer for the foreseeable future was 2x the storage costs.

And then I read this:

That is a nice summary of this paper:

This is a huge result. What it suggests is that storage availability is no longer tied to 2x the storage infrastructure without taking an unacceptable hit on recovery.

A new file system that embraces this kind of encoding could be a good solution for a large class of applications that don’t need the RTO of 2x the storage. Making storage cheaper has always been a winning strategy for growing market share.
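To put rough numbers on "cheaper than 2x", here is a small sketch of the raw-storage arithmetic. The 10 data + 4 parity layout is an illustrative assumption, not a figure taken from this post, and the function name is hypothetical.

```python
def storage_overhead(data_blocks, parity_blocks):
    """Raw bytes stored per byte of user data for a data+parity stripe."""
    return (data_blocks + parity_blocks) / data_blocks

# Keeping two full copies of everything costs 2x raw storage.
mirrored = 2.0

# Assumed example stripe: 10 data blocks protected by 4 parity blocks.
erasure_coded = storage_overhead(10, 4)

# Fraction of raw capacity saved relative to keeping two full copies.
savings = 1 - erasure_coded / mirrored
```

Under these assumed parameters the erasure-coded layout stores 1.4 bytes per byte of user data instead of 2. The open question the post raises is what that saving costs in rebuild time.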

A new clustered file system built around this kind of erasure code, or even a variety of erasure codes, could be a significant new addition to the tech ecosystem.

I wonder if something built ground up would look very different from adapting an existing system.


Starcraft II – Tactical depth

Been playing a lot of Starcraft II. Long story, but I finally got a rig that can play at 60 FPS (there is something magical at 60 FPS)…

Obviously this game has ridiculous tactical and strategic depth for operational combat: a fixed theater with a fixed set of resources. Given the level of talent and expertise that is devoted to playing this game, could it be anything else?

What has been elusive to me, thus far, is that strategic and tactical depth. As a noob who hasn’t really played an RTS in almost 10 years, it’s interesting to see how the tactical depth reveals itself.

I’d share with folks what I’ve learned, but assume whatever an idiot noob discovers first, and you’ve got what I’ve learned.

Impressive game, Starcraft II. Very impressive game.



Winning the long game, Microsoft, Nokia, and Windows 8.1

The release of Windows 8.0 was a bold statement about the future that I agree with. The future of computing is touch screen devices with optional keyboards, and an operating system that can make both work will win.

At some level, there is a large group of smart folks who disagree with the idea that the circle can be squared. They believe that the future is discrete, distinct devices, with keyboards dying a slow, miserable death.

The challenge is that the majority of work is data entry. A keyboard is used for most data entry. And the most efficient typing device is a mechanical keyboard.

So the keyboard will continue to have a place in the market.

In this future, an operating system that allows both touch and keyboard data entry allows application developers to decrease their R&D. Instead of trying to build two distinct applications, one for touch and one for keyboard, they can think of touch and keyboard as two distinct views into the same underlying application.

And it is that reduction in R&D that will make keyboard + touch screen devices win out. If you make it more efficient to build solutions, then the cheapest solutions to build tend to win out over the long term. And if you are Microsoft, you can burn through cash to win in the long haul (BING!).

And that brings me to Windows 8.0. Windows 8.0 sucked and was awesome at the same time. Windows 8.0 was awesome because it absolutely nailed some of the frustrations around Windows and app discovery, and it definitely got me wishing for a touch screen on my laptop. Windows 8.0 was horrible because there were so many distinct usability flaws. For example, the fact that you had to use both the keyboard and the mouse to find an app, how annoyingly difficult it was to get to the search icon, and I could go on.

Windows 8.1 is an incremental improvement.

And that got me thinking about Windows 1.0. I am certain when Steve Jobs saw Windows 1.0 he thought: nothing to fear here. And I am certain the UNIX guys saw Windows 1.0 and said: Nothing to see here. And then Windows 2.0 shipped, and still nothing changed. And then Windows 3.0 and it almost got usable. And then Windows 3.1 and the world finally tilted in Microsoft’s favor.

I have a strong belief in the value of incremental improvement winning out over magical product discovery. And Microsoft has always nailed incremental product improvement when they are moving in the right general direction.

The improvements in Windows 8.1 are noticeable. Is it a great product? No. But it took Microsoft 7 years to build Windows 3.1 and it took them 15 years to get to Windows XP – and that was the first version of the OS that actually worked.

So what can get in the way?

The real challenge for Microsoft is not that the path they are on is wrong. The real challenge is that from 1985 to 2000 Microsoft was the destination for the best and the brightest in the tech industry. The question is whether they can continue to attract the best and the brightest who can build that transformation…

Not dead yet.

In 2007 I bet that Nokia could figure out this iOS thing. And I was wrong. Nokia spectacularly failed to recognize the disruptive nature of iOS, sat on its lead, and is now trying to tell us that it is not dead yet. I figured that, with all of those resources, a competent CEO, a competent CTO and a strong technical team would seize the moment and realize, like the British did with the Dreadnought, that everything had suddenly changed and their lead had evaporated. Instead, without a shadow of a doubt, their CEO was incompetent, and their technical team, for all of Nokia’s incredible technical talent, was unable to react to the iPhone.

Success or failure is ultimately a function of being able to attract talent, point them in the right direction and course correct over time. For Microsoft, the direction is right and the ability to course correct has been demonstrated. All that remains is whether they can attract and retain the talent to win.