As of the summer of 2007, we have three computer clusters (prion, madcow and poss) as well as a variety of workstations available for research within the Bowers' group.
We have an older 32-node cluster named "prion" that consists of 16 dual-Athlon (MP 1900+) nodes plus 16 dual-Athlon (MP 2400+) nodes, all using Tyan S2466N motherboards. Each of the compute nodes is equipped with 512 Mbytes of RAM. They are connected via Fast Ethernet using two 3COM SuperStack3 24-port switches (stacked via a proprietary cable/interface) to a head node, which is also a dual-Athlon (MP 1900+) node but with SCSI hard drives and 1 Gbyte of memory. The head node has a UPS (uninterruptible power supply) that can keep it up for about 20 minutes in the case of a brief power outage. Each of the compute nodes has a single IDE hard drive that contains the operating system as well as local scratch space for I/O-intensive jobs. The compute nodes mount users' home directories (in /home) and various installed software (in /usr/local) over the network using NFS from the head node. All of the compute nodes are 2U rackmount nodes, which means they are 3.5 inches in height (each "U" equals 1.75 inches). They are all mounted in old 19" instrument racks that were scavenged within the Chemistry Department here at UCSB. This cluster runs RedHat 7.3, which is automatically updated using YUM. Nodes can be completely reinstalled from the head node over the network using PXE and RedHat's kickstart. All of the computers in prion were purchased from Atipa in 2001 and 2002. Not surprisingly, some of the older nodes have broken down, and the total cluster size is now down to about 28 nodes. Despite their age, these nodes are still quite useful.
We have a "middle-aged" cluster named "madcow" that consists of 21 dual-Xeon (3.06 GHz) 1U rackmount nodes with Supermicro X5DPA-G motherboards. One serves as the head node, with dual Gigabit Ethernet interfaces, two hard drives (the second being a RAID 1 "mirror" of the first to protect data), and a DVD-ROM drive. The head node has a UPS that can ride out short power outages. The other 20 compute nodes each have a single Gigabit Ethernet port, one hard drive, and no floppy or CD drives, as everything is installed over the network from the head node. All of the nodes in madcow have 1 Gbyte of memory. They are connected via an SMC 8524T 24-port Gigabit Ethernet switch. All of the compute nodes access /home and /usr/local via NFS mounts from the head node (just like the cluster prion). This cluster runs SuSE Pro 9.2 and is updated using SuSE's YaST software. The compute nodes can be quickly reinstalled using the SystemImager software included by the vendor. All of the nodes in madcow were purchased from Western Scientific in 2005.
Poss is our newest cluster, consisting of a head node (dual single-core Opteron 250s, 2.4 GHz) and 13 compute nodes (dual dual-core Opteron 280s, 2.4 GHz), giving 52 processor cores total for the compute nodes. Including the head node and the UPS, this cluster takes up only 15U, or 26.25 inches of vertical rack space. Every node has 2 Gbytes of RAM and two Gigabit Ethernet ports. As with madcow, the head node has a DVD/CD drive for the initial installation, while the compute nodes have no removable drives and thus depend on the network for installation (via PXE). The nodes use Iwill DK8ES motherboards and are connected via a Netgear GS524T switch. This cluster was purchased from Western Scientific in 2006. It came installed with Fedora Core but was completely reinstalled with CentOS 4.3 in house. There were also serious overheating problems at first; these were resolved by adding physical barriers between the cold (front) and hot (rear) ends of the chassis. As it turns out, the attention to detail that a vendor puts into a server may be somewhat correlated with the selling price. After much initial hair pulling, however, this cluster has been running smoothly. These nodes are particularly nice in that they allow running on four processor cores within a single node, which lets us run SMP Gaussian calculations on four cores as opposed to only two on prion or madcow.
We now have a single dedicated backup server for backing up user and system files for all three clusters. It is connected via three separate Gigabit Ethernet controllers to the three clusters' switches. The backup server is equipped with several hard drives, including two 500 Gbyte SATA drives. Backups of all users' files (everything in /home on all three clusters) are done nightly, with daily backups held for one week and weekly backups saved for one month. These backups are needed in case of a hard drive failure in one of the cluster head nodes, but they are especially useful for digging up old data files that were accidentally deleted. The backups are handled using a free software package named "rsnapshot". This is a series of scripts that use rsync to copy only the changes to the various filesystems, rather than copying all users' files over again every night. This saves disk space on the backup node as well as network bandwidth and the time the backups take each night.
There are various workstations available for general use within the group, including five Windows XP computers, one Macintosh, and an XP laptop that is used for presentations in group meetings and at conferences. Several people in the group also have their own dedicated workstations. We have access to several printers, including two laser printers (one color and one black-and-white). Each of the instruments also has its own PC for controlling the experiments and for recording and visualizing data.
HyperChem 7.5 - Useful for building molecules, visualization, and optimization. Our license allows up to five users to run this on PCs at one time. This program is very useful for building structures by hand, but there is no "Undo" button (save your work!) and the interface is a bit awkward. Still, this is often the best option for building difficult or unusual structures, and with some practice and patience, you can usually get what you want. One advantage is that this package can also do ab-initio, molecular mechanics and semi-empirical calculations, so you can do some first-order minimization from within HyperChem itself. We have a set of printed manuals, and additional information is available at http://www.hyper.com/. There is also a tutorial that can be accessed via the Help pulldown menu.
Origin 7.5 - Origin is an analysis program, and can be used for fitting data as well as making publication-quality plots and figures. Only one person can run this at a time because we only have a single seat license, but so far we haven't had any conflicts. More info is available here: http://www.originlab.com/ (including the manual as a download).
Molekel - This is but one of many molecular viewers available. It is an easy-to-use viewer that can be used to view Gaussian and Gamess output files as well as PDB and XYZ format files. It is particularly useful for viewing molecular orbitals from Gaussian log files without the need to separately generate "cube" files. The binary is available free, and information on it can be found at http://www.cscs.ch/molekel/. Unfortunately, development seems to have ceased at this time, though there are indications that it may become active again at some point…
VMD - Another molecular viewer, this one comes from the University of Illinois. The interface can seem confusing with its many sub-windows, but it has been around a long time and can read many formats. It also has a tcl interface (text based) that experienced users find very efficient for some things. It is good for viewing amino acid and nucleotide based structures. The home page is at http://www.ks.uiuc.edu/Research/vmd/.
UCSF Chimera - Another free viewer, this one is relatively recent (at least for our group). New users are advised to go through the tutorial. The payoff for learning how this program works is the possibility to make stunning color/3D pictures suitable for publications and presentations. It is able to read Amber trajectory files. See http://www.cgl.ucsf.edu/chimera/ for more info.
Chem3D/ChemDraw - Although we have rather out-of-date versions of these, they are still extremely useful. Chem3D is a very intuitive 3D viewer that can also be used to modify and build structures. ChemDraw is also very easy to use and is useful for making figures that include molecules (from Chem3D) as well as 2D chemical diagrams including "flying wedge" bonds and simple orbitals. Program info can be found at http://www.cambridgesoft.com/.
Molden - This is an older and relatively crude molecular viewer that can read Gaussian and Gamess files as well as PDB files. It has a nice facility for viewing the results of frequency calculations. See http://www.cmbi.ru.nl/molden/molden.html.
MCS-plus/DataView - These are the applications for working with data files recorded using the EG&G Ortec multichannel scaler (MCS) plugin card. The MCS-plus software is the same as that used to control the MCS cards when taking data on an instrument. On a workstation without a plugin card, it can be used to load data files and manipulate them just as on the instrument computers. The program DataView is a Windows program written in Germany and brought to us by Patrick Weis. It can read in MCS files and then be used to crop data and save it in columnar form suitable for Origin or other data analysis programs (including user-written programs).
Remote X applications - All of the above-mentioned software is available natively in Windows; however, there are also a variety of programs that can be run over the network on one of the Linux clusters and remotely displayed on a Windows desktop. For this to work, the Windows computer must be running an X server, and the X display must be connected to the remote Linux machine. The connection is easily accomplished using the program SSH Secure Shell, for which UCSB has a site license. SSH tunneling must be turned on in SSH Secure Shell; this setting can be found in the pulldown menu Edit > Settings under "Profile Settings" > "Tunneling". Contact someone in the group for information on possible X servers available for Windows.
SSH Secure Shell - This is a very nice program for running a remote "shell" (command-line session) on one of the Linux clusters. In addition to providing relatively secure remote logins (everything going over the network, including passwords, is encrypted), it can automatically set up an X "tunnel" and also contains a very convenient Windows-like file transfer utility, which can be opened via the Window > New File Transfer pulldown menu. This is the most direct and most common way of interacting with the Linux clusters.
Others - There are many other molecular viewers and chemistry related programs available for Windows, many of them free. GOpenMol, Swiss PDB Viewer, PyMOL and ViewMol are just a few examples.
Amber - Amber is currently the most frequently used program in the Bowers' group. The main program in the Amber suite is "sander", which performs molecular dynamics and energy minimization. Other important Amber programs are xleap/tleap and antechamber. Xleap provides a graphical interface for building molecules in Amber, and tleap is the corresponding text version that is amenable to scripting (it is suggested to use xleap whenever possible in order to visually verify one's work). Antechamber is a relatively recent addition to Amber that can be used to convert files for use with Amber as well as to automate the calculation of charges needed for custom residues/molecules (using the Amber program RESP). All users should be familiar with the Amber web page, http://amber.scripps.edu, as well as the manual, which is available as a nicely indexed pdf document. The current version of Amber is version 8. There are also several locally developed GUIs for running Amber calculations, named xmin, xdyn and xanneal (collectively called xamber). These prepare scripts and input files for running minimizations, dynamics and simulated annealing, respectively, submit the calculations to the queuing system, and, in the case of xanneal, can even run jobs in parallel.
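As an illustration of tleap's scriptability, a minimal build script might look like the following. The file and unit names are placeholders, and leaprc.ff99 is just one of the standard force-field setup files shipped with Amber 8; consult the Amber manual for the force field appropriate to your system:

```
source leaprc.ff99                  # load a standard force field
mol = loadpdb mymol.pdb             # read a structure from a PDB file
saveamberparm mol prm.top mol.crd   # write topology + coordinates for sander
quit
```

A script like this would be run with something like 'tleap -f build.in', producing the prm.top and mol.crd files that sander (and the crd2pdb tools described below) expect.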
Gaussian03 - Gaussian is the most commonly used program for performing ab-initio calculations on the planet. Input files are easy to prepare, and a wide variety of calculations can be performed, such as HF (Hartree-Fock), DFT (various density functionals, often B3LYP), MP2, MP4, CI (configuration interaction) and others. Gaussian can be run on both processors of a dual-cpu node. While there is some useful information at http://www.gaussian.com, users are advised to look at the Gaussian Users' Guide. After preparing a Gaussian input file named basename.com, a queue submission script can be created using either 'g03script' (for single-cpu jobs) or 'g2' (for 2-cpu jobs). The procedure is identical for either one: g2 basename jobname, which produces a queue submission script named 'jobname.q' that is then submitted using qsub jobname.q. For dual-cpu jobs, remember to put the line "%nproc=2" in the input file or the second processor will not be used. These job scripts are really very simple; the actual command launched is just 'time g03 basename.com basename.log'.
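For reference, a minimal two-processor input file (basename.com) has the following shape. The route line, memory request and water geometry here are only illustrative; pick the method and basis set appropriate to your problem:

```
%nproc=2
%mem=512MB
#P B3LYP/6-31G(d) Opt

water geometry optimization

0 1
O   0.000000   0.000000   0.119262
H   0.000000   0.763239  -0.477047
H   0.000000  -0.763239  -0.477047

```

Note that Gaussian requires the blank lines (after the route section, after the title, and after the geometry), and that without %nproc=2 the second processor sits idle.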
Jaguar - Although not used as often as Gaussian (possibly because people are so used to Gaussian), Jaguar can also do ab-initio calculations. Jaguar jobs are normally prepared by running the GUI called maestro. Both Jaguar and maestro come from Schrodinger, Inc. Maestro can read in many different file formats, and the GUI often makes setting up an input file by hand unnecessary, since jobs can be submitted to the queue (on prion only) directly from the interface. The main advantage of Jaguar is that some calculations show a huge speedup compared to Gaussian, especially when utilizing the built-in basis sets that have been tuned for speed. It is possible to convert converged SCF's to a format that Gaussian can read, making it possible to move work done in Jaguar over to Gaussian, either for comparison with other work done in Gaussian or for doing things that Jaguar doesn't provide. To launch a Jaguar job by hand, the command to use is: jaguar run -PROCS 1 -HOST prionpbs -t jobname (the input file jobname.in must be in the current directory). Documentation can be found in /usr/local/doc on prion (files beginning with "j50" are Jaguar docs).
Maestro - The Schrodinger interface for use with Jaguar (only on prion). To launch the maestro interface, just type 'maestro' at the command line (an X server must be running and connected to the remote Linux machine). Then choose the pulldown option Applications > Jaguar to open the Jaguar window. Click the "Read" button to read in a structure, noting that there is a substantial list of possible input formats. Next choose a method, basis set, etc., and then either launch the job directly from the interface by clicking the Jobs: Run button or save the input file by clicking the Save button. If you save the input file, make any changes that need to be made, and then use the 'jaguar run' command given in the description of Jaguar, above. Documentation can be found in /usr/local/doc on prion (files beginning with mae51 are the Maestro docs).
Sigma - This is a Fortran program written in the Bowers' group (originally by Gert von Helden) that calculates the projection cross section of a molecule. The molecule of interest is rotated to a random orientation, and the 2D area of its projection is then calculated using Monte Carlo integration. This process is repeated, and the running average cross section is updated until it converges within a given tolerance. This is a very straightforward way to estimate the collision cross section of a molecule; however, values for the collision radius with the buffer gas (almost always helium) must be given for all elements present in the input file. More information can be found on the Theoretical Collision Cross Sections page in the Theory/Analysis section.
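The heart of this procedure is small enough to sketch in Python. This is only an illustration of the rotate-project-integrate idea, not the actual sigma code (which is Fortran, tests for convergence rather than using fixed counts, and reads element-specific collision radii from the input file):

```python
import math
import random

def random_rotation():
    """Uniform random 3D rotation matrix built from a random unit quaternion."""
    u1, u2, u3 = random.random(), random.random(), random.random()
    x = math.sqrt(1 - u1) * math.sin(2 * math.pi * u2)
    y = math.sqrt(1 - u1) * math.cos(2 * math.pi * u2)
    z = math.sqrt(u1) * math.sin(2 * math.pi * u3)
    w = math.sqrt(u1) * math.cos(2 * math.pi * u3)
    return [[1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
            [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
            [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)]]

def projection_cross_section(atoms, n_orient=50, n_points=5000):
    """atoms: list of (x, y, z, collision_radius) tuples.  Returns the
    orientation-averaged projected area; the fixed sample counts here
    stand in for sigma's convergence test."""
    total = 0.0
    for _ in range(n_orient):
        R = random_rotation()
        # Rotate, then project onto the xy plane by dropping z.
        disks = [(R[0][0]*x + R[0][1]*y + R[0][2]*z,
                  R[1][0]*x + R[1][1]*y + R[1][2]*z, r)
                 for (x, y, z, r) in atoms]
        xmin = min(px - r for px, py, r in disks)
        xmax = max(px + r for px, py, r in disks)
        ymin = min(py - r for px, py, r in disks)
        ymax = max(py + r for px, py, r in disks)
        hits = 0
        for _ in range(n_points):   # Monte Carlo integration of the 2D area
            sx, sy = random.uniform(xmin, xmax), random.uniform(ymin, ymax)
            if any((sx - px)**2 + (sy - py)**2 <= r*r for px, py, r in disks):
                hits += 1
        total += (xmax - xmin) * (ymax - ymin) * hits / n_points
    return total / n_orient
```

For a single atom the projection is a disk, so the result should converge on pi times the radius squared; for a real molecule each element needs its collision radius with helium, as noted above.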
Mobcal - This is another Fortran program for calculating cross sections, this one from Martin Jarrold's group. In addition to the projection cross section (like sigma), mobcal can calculate cross sections using trajectories, with either a hard-sphere or Lennard-Jones potential, the latter being a very slow calculation. We have found that, for systems larger than about 200 atoms, sigma gives cross sections that are too small (compared with experiment) and trajectory calculations become necessary. More information can be found on the Theoretical Collision Cross Sections page in the Theory/Analysis section.
Xamber - This is a suite of three GUIs developed by John Bushnell for running Amber calculations. It is written in Python and uses the GTK graphics library via the Python interface PyGTK. The GUI xmin is for doing simple minimizations, xdyn is for doing dynamics (only), and xanneal is for doing simulated annealing. These user interfaces create the necessary input files, job scripts and queue submission scripts. In the past, these jobs were all set up using c-shell scripts and input files that tended to vary from user to user and drift over time, which created errors and general confusion. Using these interfaces, settings can be changed for a particular run without creating multiple versions of scripts and input files. In cases where the user needs to change settings that are not presented in the interface, all of the input files can be generated via the File pulldown menu and then edited by hand before submission to the queue. Thus the interfaces present a unified resource for running the vast majority of jobs needed in the Bowers' group, while still allowing customization when necessary. It is also no longer necessary to remember the exact procedure for setting up and running these types of jobs: all one usually needs to remember is to type 'xanneal' at the command line and modify any defaults within the interface. See the Xanneal Users' Guide.
Small programs and scripts
av/avin - These are tiny compiled Fortran programs that average a set of numbers. Typing 'av' prompts you for the name of a file containing the data to average and then prints the average. The program 'avin' does the same thing except that it reads data from standard input, making it easier to use in scripts. Note that these read only the first number on each line, so if there is more than one number per line, everything after the first is ignored.
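In Python, the logic of these two programs amounts to the following sketch (the originals are compiled Fortran; this just shows the first-number-per-line behavior):

```python
def average_first_column(lines):
    """Average the first number on each line; anything after it on the
    line is ignored, just as av/avin do. Blank lines are skipped."""
    values = [float(line.split()[0]) for line in lines if line.strip()]
    return sum(values) / len(values)
```

'av' would apply this to the lines of the file you name at the prompt, while 'avin' applies it to whatever arrives on standard input, which is what makes it handy in pipelines.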
babel - A conversion program for converting between different molecular file formats. Just type 'babel' to see the syntax and supported file formats.
crd2pdb - This converts Amber 'crd' (coordinate) files to pdb files. It assumes that the topology file is in the same directory and is named 'prm.top'. Usage: crd2pdb xxx.crd which will result in a file named xxx.pdb.
crd2pdb_multi - Like crd2pdb but will convert a set of files. Usage: crd2pdb_multi xxx 1 100 which will convert files xxx.1.crd, xxx.2.crd, …, xxx.100.crd to xxx.1.pdb, xxx.2.pdb, …, xxx.100.pdb
E - This pulls out the final energy for a group of Amber minimization outputs. Usage: E min 1 300 which will result in a file named min.energies with a list of final energies from files min.1.out, min.2.out, …, min.300.out.
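The pattern E follows (loop over the numbered outputs, keep the last energy found in each) can be sketched in Python. The 'ENERGY = <number>' pattern below is a hypothetical stand-in, not the exact sander output layout, and read_lines is injected so the sketch stays self-contained:

```python
import re

def collect_final_energies(read_lines, basename, first, last):
    """Collect one final energy per numbered output file, like the E script.
    read_lines(filename) must return that file's lines as a list of strings."""
    results = []
    for n in range(first, last + 1):
        energy = None
        for line in read_lines("%s.%d.out" % (basename, n)):
            m = re.search(r"ENERGY\s*=\s*(-?\d+\.\d+)", line)
            if m:
                energy = float(m.group(1))  # later matches overwrite earlier
        results.append((n, energy))         # so the final energy is kept
    return results
```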
g03script - This script prepares a Gaussian03 job file that can be subsequently submitted to the queue system. Usage: g03script basename jobname which runs Gaussian03 taking basename.com as the input file, and names the job script jobname.q. The job can then be run with the command qsub jobname.q.
g2 - This script is identical to the g03script described above, except that it requests two processors (on the same node) in order to run a two processor Gaussian job. Note that the basename.com input file must contain the line %Nproc=2 or else the second cpu will not be used (and will sit idle and wasted).
oi - This stands for "output info" and just prints a summary of energy and force/displacement cutoffs for a Gaussian optimization. Usage: oi basename.log.
joi - Just like 'oi' above, but works on Jaguar output rather than Gaussian.
stt - This calls a couple of awk scripts plus a compiled filter to strip out coordinates from Gaussian log files. Usage: stt xxxxx.log [-1] [-n], where the optional -1 (minus one) flag prints only the last set of coordinates found, and the optional -n flag suppresses the extra line between the number of atoms and the list of atom coordinates. Note that sigma and Chem3D do not want the extra line, while xyz input into Maestro/Jaguar does want the extra line. Also note that the optional flags come AFTER the filename, unlike standard command syntax. Yes, I'm too lazy to fix this (yet). Also note: the output is written to a file named "xyz.out" regardless of the name of the log file!
stt_in - This is identical to stt except that it strips out the "input orientation" from Gaussian log files as opposed to the "standard orientation". This may be needed for runs done with symmetry turned off. Like stt, the output is written to a file named "xyz.out".
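The extraction these two scripts perform can be sketched in Python. The column layout and the number of header lines assumed here match Gaussian 03 logs as we have seen them but may differ in other Gaussian versions:

```python
def last_orientation(lines, marker="Standard orientation:"):
    """Return the last coordinate block following `marker` in a Gaussian log
    as (atomic_number, x, y, z) tuples.  Use marker="Input orientation:"
    for the stt_in behavior."""
    lines = list(lines)
    blocks = []
    for i, line in enumerate(lines):
        if marker in line:
            atoms, j = [], i + 5   # skip two dashed rules and two header lines
            while j < len(lines) and not lines[j].lstrip().startswith("---"):
                f = lines[j].split()
                # columns: center no., atomic number, atomic type, x, y, z
                atoms.append((int(f[1]), float(f[3]), float(f[4]), float(f[5])))
                j += 1
            blocks.append(atoms)
    return blocks[-1] if blocks else []
```

The real scripts additionally map atomic numbers to element symbols and handle the extra-line option; this sketch only shows where the coordinates come from.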
shownode - This is a simple awk script that gives the same information as doing 'pbsnodes -a' except that the user specifies which node to report on. For example: shownode 32.
jag2xyz - This is similar to stt but for use with Jaguar output files (with the .out suffix). It calls an awk script and a compiled filter program and sends the output to a file with the same basename but with .out replaced by .xyz. The default behavior is to suppress the extra line between the number of atoms and the atom coordinates; the extra line can be added with the -x optional flag. Note that this is the opposite of the default behavior for stt and stt_in. Usage: jag2xyz xxxxx.out [-1] [-x] (the output is named xxxxx.xyz).