About the Author

Douglas Eadline, PhD, is both a practitioner and a chronicler of the Linux cluster HPC revolution. He has worked with parallel computers since 1988 and is a co-author of the original Beowulf HOWTO document. Prior to starting and editing the popular http://clustermonkey.net web site in 2005, he served as Editor-in-Chief of ClusterWorld Magazine. He is currently Senior HPC Editor for Linux Magazine and a consultant to the HPC industry. Doug holds a PhD in Analytical Chemistry from Lehigh University and has been building, deploying, and using Linux HPC clusters since 1995.

High Performance Computing

A blog about making HPC things (kind of) work
I have been helping a user install some new NVIDIA GPU hardware. The plan is to accelerate a molecular dynamics program (Amber) on this new system. Seems simple enough. I have read about the great success with NVIDIA GPUs and molecular dynamics codes. I have some thoughts on the whole effort, but first, let's take a look at the installation process.

Beyond the hardware, you need various software pieces to get your applications running on GPUs. The first thing you need is the latest NVIDIA CUDA Toolkit. Finding, downloading, and installing this was straightforward. Next, you need a modified version of Amber that can use the NVIDIA GPU(s). Amber is not freely available, but it is delivered in source code. Most universities have a license, so obtaining Amber is not a big issue. You also need the latest version of AmberTools. Once you have these two packages extracted, there is a fair amount of patching to be done before you can configure and compile for the NVIDIA CUDA hardware. (Note: I will be writing this procedure up and posting it on Cluster Monkey in the near future.) After some false starts I got everything working and ran the test programs. Except for a few differences in the third and fourth decimal places, the CUDA version passed all the tests.
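
For readers new to the CUDA side of this, it helps to confirm that the toolkit and driver are healthy before wading into the Amber patches. The short program below is not part of Amber or its test suite; it is just a minimal sanity-check sketch of my own (a hypothetical check_cuda.cu) that verifies nvcc can compile a kernel and that the driver can see at least one GPU.

// check_cuda.cu -- minimal CUDA sanity check (illustrative only, not part
// of Amber). Build with the toolkit compiler:  nvcc -o check_cuda check_cuda.cu
#include <cstdio>
#include <cuda_runtime.h>

__global__ void add_one(float *x, int n)
{
    // Each thread bumps one element of the array.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] += 1.0f;
}

int main(void)
{
    int ndev = 0;
    if (cudaGetDeviceCount(&ndev) != cudaSuccess || ndev == 0) {
        printf("No CUDA devices found -- check the driver and toolkit install\n");
        return 1;
    }
    printf("Found %d CUDA device(s)\n", ndev);

    const int n = 1024;
    float *d_x = NULL;
    cudaMalloc((void **)&d_x, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));

    // Launch enough 256-thread blocks to cover n elements.
    add_one<<<(n + 255) / 256, 256>>>(d_x, n);
    cudaDeviceSynchronize();

    float first = 0.0f;
    cudaMemcpy(&first, d_x, sizeof(float), cudaMemcpyDeviceToHost);
    printf("Kernel ran, first element = %f (expect 1.0)\n", first);

    cudaFree(d_x);
    return 0;
}

If this compiles and prints the expected value, the toolkit installation is in reasonable shape, and any Amber build failures most likely lie elsewhere.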

So far so good. The next step was to test the other GPUs in the machine (there are a total of four). To do this with Amber, you must run it in parallel using MPI. I downloaded and installed an MVAPICH2 RPM, which was recommended in the Amber CUDA documentation. The short story is that the downloaded binary version expected to find an InfiniBand (IB) card in the system and would not run without one. This system is a standalone unit and as such had no IB hardware present. After some further reading, I noticed that the latest version of MVAPICH2 has some optimizations for local and remote data transfers on NVIDIA cards. After downloading the source and configuring it properly, I was able to build a version of MVAPICH2 that does not require IB hardware. The Amber MPI tests all passed, and I finally had a working system.
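
For what it is worth, the multi-GPU arrangement boils down to one MPI rank per GPU. The fragment below is not Amber source, just a rough sketch of that rank-to-device mapping, assuming an MPI stack such as the MVAPICH2 build described above and a hypothetical file name rank_per_gpu.cu. You would compile it with nvcc plus the MPI include and library flags from your MPI installation, then launch it with something like mpirun -np 4 on the four-GPU box.

// rank_per_gpu.cu -- sketch of the one-MPI-rank-per-GPU pattern
// (illustrative only, not Amber code). Compile with nvcc and the MPI
// include/library flags; run with:  mpirun -np 4 ./rank_per_gpu
#include <cstdio>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev == 0) {
        printf("Rank %d: no CUDA devices visible\n", rank);
        MPI_Finalize();
        return 1;
    }

    // Map each MPI rank onto one of the GPUs in the box (four in this case).
    int dev = rank % ndev;
    cudaSetDevice(dev);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);
    printf("Rank %d of %d is using GPU %d (%s)\n", rank, nranks, dev, prop.name);

    MPI_Finalize();
    return 0;
}

The rank-modulo-device-count mapping is the simple way to spread ranks across GPUs on a single node; with four ranks and four GPUs, each rank gets its own card, which is the configuration the Amber MPI tests exercised here.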

My intention in describing this lengthy process was not to provide a recipe for installing Amber on a standalone NVIDIA CUDA box (as mentioned, that will be posted elsewhere), but rather to offer an example of the state of the art in HPC software installation. It is not for the faint of heart. I have considerable experience building and installing HPC software, and I found the process more complicated than it needs to be. My experience with MPICH2 was helpful in configuring MVAPICH2; however, I consider myself a special case. The average user would probably have a very difficult time with the whole process. A systems administrator may have less of a problem getting things installed, but there are many "gotchas" along the way.

Of course, one may suggest that the hardware integrator should do much of this work. Don't count on it. The only other choice is a consultant, which is a valid solution, but it places a large barrier to entry for many people. The very first PCs were similar: you really did need a consultant to get business software working on the new putty-colored box. Contrast that situation with the ability to download and run applications on a smartphone or tablet. We still have a ways to go in the HPC world.

A blog about making HPC things (kind of) work
There was a recent announcement from Tilera about the availability of their manycore Gx chips. They are touting the slogan "Manycore without Boundaries." To Tilera, the term manycore means 16, 36, 64, or 100 cores arranged in a square mesh. This development is interesting, although it is not aimed at the HPC market.

Read more...

A blog about making HPC things (kind of) work
In a previous post, I pointed out cases where a single-socket server is better than a multiple-socket server for HPC applications. This conclusion is based on real testing and some hands-on experience with single-socket systems. Of course, your mileage may vary, but in my tests the most economical and efficient systems seem to be those that use a single socket. There is some cost amortization with multiple-socket nodes in terms of power supplies, cases, fans, and so on, but single-socket processors and motherboards always cost less than their big multi-socket brothers and sisters.

Read more...