Tuesday, December 6, 2011

Program: University of Delaware's New High Performance Computing Cluster
Speaker: Dr. Daniel J. Grim, Chief Technology Officer,
               Information Technologies, University of Delaware


Background:

The Information Technologies (IT) group at the University of Delaware recently undertook to create a "Community Cluster" modeled on the experience of the Rosen Center for Advanced Computing at Purdue University.  While inviting participation from researchers, IT simultaneously developed a configuration for the cluster.  Although the initial design specified 10-gigabit Ethernet for the cluster interconnect (on the presumption that it would be the less costly technology), it was ultimately determined that QDR InfiniBand could be used at a comparable or lower cost.  The cluster is one of the first to take advantage of AMD's newest processor, code-named Interlagos, which is built on AMD's new "Bulldozer" architecture.  In addition, IT purchased a Lustre file system with approximately 200 terabytes of storage.  The cost of this initial cluster was subsidized by IT, with "investors" paying just $3,000 per node for a system with a total value of $1.2 million.  The final configuration included 200 nodes and over 5,000 processor cores.
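A quick back-of-envelope check makes the subsidy concrete. The per-node price, node count, and total value are as reported above; the roughly 50% subsidy share is an inference from those figures, not a number stated in the talk:

```python
# Figures as reported: $3,000/node from "investors", 200 nodes,
# $1.2 million total system value. The subsidy split is inferred.
cost_per_node = 3_000
nodes = 200
total_value = 1_200_000

investor_total = cost_per_node * nodes        # what investors paid in total
it_subsidy = total_value - investor_total     # remainder covered by IT

print(f"Investors paid ${investor_total:,}; IT subsidized ${it_subsidy:,} "
      f"({it_subsidy / total_value:.0%} of total value)")
# → Investors paid $600,000; IT subsidized $600,000 (50% of total value)
```

In other words, investors covered about half the system's value, with IT absorbing the rest.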


Meeting Notes:

Community Cluster Development at UD

The inspiration for the “Community” cluster idea came from Purdue University’s Rosen Center for Advanced Computing, which started with a single cluster and has grown to about five, with more planned. The concept is to provide a growth path toward a supercomputing facility as resources become available and requirements grow. UD will have its first cluster of this type available for use in January 2012. The current effort is best summarized by a poster, which can be found at:
http://www.udel.edu/it/research/images/cluster_sc11exhibit.pdf

Dr. Grim outlined three areas of effort.
First, funding the computer equipment. Subscribers were sought for a 100-node cluster; the subscription effort was so successful that the final result was a 200-node cluster.

Second, identifying, purchasing, and installing the computer equipment. The plans leave room for several more clusters in the future. The system was bid by Dell, HP, and Penguin, with Penguin winning on price/performance.

Third, the power requirements demanded a substantial upgrade to over 2 megawatts: 480 V at 3,000 amps. About half of this power runs present and future computer systems; the other half cools them.
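The quoted figures can be reconciled with a short calculation. Note that 480 V × 3,000 A is only about 1.4 MW for a single-phase service; the "over 2 megawatts" figure is consistent with a three-phase service, which is an assumption on my part since the notes do not say:

```python
import math

# 480 V at 3,000 A, as quoted in the talk.
volts, amps = 480, 3_000

# Single-phase apparent power: V * I (falls short of 2 MW).
single_phase_kw = volts * amps / 1_000

# Three-phase apparent power: sqrt(3) * V_line * I_line (~2.5 MVA,
# matching "over 2 megawatts" -- assuming a three-phase service).
three_phase_kva = math.sqrt(3) * volts * amps / 1_000

print(f"Single-phase: {single_phase_kw:,.0f} kW")
print(f"Three-phase:  {three_phase_kva:,.0f} kVA (~2.5 MVA)")
```

The roughly 50/50 split between compute and cooling load is a common rule of thumb for data centers of this era, and matches the figure Dr. Grim gave.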