Donald Becker and the Birth of the Beowulf Cluster

April 13, 2005

A profile of Beowulf cluster software creator Donald Becker

This interview originally appeared in Government Computer News

In early 1993, then-NASA employees Donald Becker and Thomas Sterling devised a way to yoke multiple low-cost desktop computers together so they could offer the combined performance of a much higher-cost supercomputer.

Twelve years later, Becker -- who has since left NASA and founded cluster software maker Scyld Software -- can take a degree of pride in the fact that more than 50 percent of the machines on the Top 500 List of supercomputers are clusters of this sort. And many on the list are built on Becker and Sterling's own specific architecture, the Beowulf cluster.

In addition to co-developing Beowulf, Becker has been a major contributor to Linux, supplying more than 60 device drivers to the open-source operating system.

Although Becker still stays abreast of the Beowulf community -- he regularly attends the Baltimore-Washington Beowulf User Group meetings -- he mostly keeps busy as chief scientist for Scyld of Annapolis, Md. Scyld, which Becker formed in 1998, is now a subsidiary of scalable-computer vendor Penguin Computing Inc. of San Francisco.

While you can build your own Beowulf cluster -- visit www.beowulf.org -- Scyld offers software to manage clusters in an enterprise setting.

Becker has a bachelor's degree in electrical engineering and computer science from the Massachusetts Institute of Technology.



GCN: Prior to creating Beowulf, what did you do at NASA?

Becker: I actually moved over from the Institute for Defense Analyses' Supercomputing Research Center, which was essentially doing research for the National Security Agency. When I wasn't able to successfully start a project there, the co-founder of the Beowulf Project, Thomas Sterling, found funding through NASA. So I moved over to NASA's Center of Excellence in Space Data and Information Sciences, which is run by the Universities Space Research Association. So basically I moved from NSA funding to NASA funding, in both cases through a nonprofit institute.

NASA was interested [in the project] primarily for modeling climate data and processing sensory information. The clusters were to supplement supercomputers.

GCN: How did you and Sterling come up with the idea of a cluster?

Becker: I had worked on parallel processing, especially shared-memory parallel processing, since I was at the Massachusetts Institute of Technology. I worked on distributed computing with tightly coupled shared-memory machines. I felt those machines were expensive and tended to lag behind the leading edge. The leading edge was starting to curve towards personal computers. Previously, you would find the best price-performance [ratio] at the very high end, with the supercomputers. But that was becoming increasingly less true. Workstations had the lead for a while, but clearly PCs were starting to offer the best price-performance.

For the very scientific uses, the only thing people cared about was how many computing cycles they could get out of the machine. So then the key element became how to put these machines together to make a more powerful machine.

From a price-performance curve, it was obvious to me, but so many people in the high-performance computing community completely rejected the idea without even considering it. In the government world, there was resistance against even small-scale funding of efforts.

GCN: Why did you choose the name Beowulf (hero of the 11th-century epic poem of the same name)?

Becker: The credit goes to Sterling, who is an Anglophile. Beowulf is the oldest written English. Some translations have the line 'Because my heart is pure, I have the strength of a thousand men.' With the Beowulf project, we were trying to follow the Linux model for development -- not just to build one piece of software for one machine, but build a community effort.

GCN: What is the difference between a Beowulf cluster and other multiprocessor systems, such as failover clusters and symmetric multiprocessing (SMP) machines?

Becker: I think a lot of people don't really have a good definition of what a cluster is. To me, a cluster combines independent machines, machines capable of standalone operation, into a unified system, using a combination of software and networking.

Beowulf clusters are scalable performance machines. Failover clusters offer higher reliability with the unified system. SMP is a machine designed generally within one chassis, which has a number of tightly integrated processors. With a cluster, you have the opportunity to incrementally scale, where an SMP is generally built to a [preconfigured] size.

GCN: Were you surprised by the success of Beowulf?

Becker: I was surprised both by the success of Linux and the success of Beowulf clusters. In both cases, we were trying to influence the world, but not necessarily to directly succeed. If by providing examples, we helped other people to build better computer systems, that would have been enough to call it a success.

Interview by Joab Jackson
