Markov Cluster
Access
Access to the Markov cluster is obtained by SSH to markov.cmms.pitt.edu. Jobs are submitted using Torque PBS.
Submission scripts can be generated using:
/usr/prog/bin/submit2pbs
This generates a batch file, which must then be edited before it is submitted with qsub.
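For orientation, a hand-written Torque batch script often looks something like the sketch below. It is not the output of submit2pbs, whose generated file may differ. The job name, queue choice, walltime, and program name are placeholders chosen for illustration.

```shell
#!/bin/sh
# Minimal Torque/PBS batch script sketch. The job name, queue, walltime,
# and program name are illustrative placeholders, not site defaults.
#PBS -N example_job
#PBS -q small_io
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -j oe

# Torque sets PBS_O_WORKDIR to the directory qsub was run from;
# fall back to the current directory when run outside the queue.
WORKDIR="${PBS_O_WORKDIR:-$PWD}"
cd "$WORKDIR" || echo "could not change to $WORKDIR" >&2

# PBS_JOBID is also set by Torque; "interactive" is a stand-in for test runs.
JOBID="${PBS_JOBID:-interactive}"
echo "Job $JOBID running in $WORKDIR on $(uname -n)"

# Replace this comment with the real program, e.g.: ./my_program > output.log
```

Saved as, say, example.pbs (a hypothetical file name), such a script would be submitted with `qsub example.pbs`.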
Installed Software
O/S, Compilers, Tools
- SuSE 10
- Portland Group Compilers (6.0-5)
- Intel Compilers (9 & 10)
- Torque/MAUI (2.1.3/3.2.6p19)
- LAM/MPI (7.0 & 7.1)
- MPICH-1 (1.2.7)
- MPICH-2 (1.0.7)
- openMPI (1.2.8)
Applications
- Gaussian 03
- DFTB (26.11.1998)
- Orient (4.5.04)
- ABINIT (5.6.4)
- Quantum Espresso (4.0.4)
- Molpro (2008.1)
- VASP (4.6.34)
- TURBOMOLE (5.10)
- QChem (3.1.0.2)
- NAMD (2.6b2)
- AMBER (10)
- Molden (4.3)
- CPMD (3.11.1)
- CP2K/Quickstep (1.16)
Queue Specifications
The following queues are available for use. Please note that if you use only a portion of the CPUs available on a node, you should use only a directly proportional amount of that node's disk and memory. Please contact the administrator if you have jobs with special requirements.
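The proportional-share rule can be worked out with simple integer arithmetic. The sketch below uses a hypothetical helper, per_cpu_share, which is not a tool installed on the cluster; the node figures are taken from the queue descriptions on this page.

```shell
#!/bin/sh
# per_cpu_share TOTAL CPUS_PER_NODE CPUS_USED
# Prints the share of a per-node resource (integer arithmetic, rounded down)
# that corresponds to using CPUS_USED of the node's CPUS_PER_NODE processors.
# This helper is hypothetical and only illustrates the arithmetic.
per_cpu_share() {
    total=$1
    cpus_per_node=$2
    cpus_used=$3
    echo $(( total * cpus_used / cpus_per_node ))
}

# big_io: 120 GByte scratch per node, 2 CPUs per node, job uses 1 CPU
per_cpu_share 120 2 1    # prints 60

# small_io: 58 GByte scratch per node, 2 CPUs per node, job uses 1 CPU
per_cpu_share 58 2 1     # prints 29
```

The figures in the Job Limits section are in some cases slightly more conservative than this raw division (for example, 28 GByte of disk for a serial small_io job), so treat the computed value as an upper bound.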
small_io
Location: Eberly Hall; node033-node064
Configuration: 32 dual-processor 2.4 GHz Opteron nodes
Total of 1 GByte RAM, 58 GByte scratch disk per node
Location: Benedum Hall; node001-node032
Configuration: 32 dual-processor nodes (2 cores)
Total of 1 GByte RAM, 58 GByte scratch disk per node
Connected by Gigabit Ethernet
big_io
Location: Eberly Hall; node065-node080
Configuration: 16 dual-processor 2.4 GHz Opteron nodes (2 cores)
Total of 5 GBytes RAM, 120 GByte scratch disks per node
Connected by Gigabit Ethernet
cmms_dc
Location: Eberly Hall; node084-node088
Configuration: 5 two dual-core 2.2 GHz Opteron processor nodes (4 cores)
Total of 4 GBytes RAM, 58 GByte scratch disks per node
eight_way
Location: Eberly Hall; node100 and node101
Configuration: 2 four dual-core 2.4 GHz Opteron processor nodes (8 cores)
Total of 8 GBytes RAM, 230 GByte scratch disk per node
verlet
Location: Eberly Hall; node102-node125
Configuration: 12 double dual-core 2.6 GHz Opteron nodes (4 cores)
Total of 5 GBytes RAM, 51 GByte scratch disk per node
Connected by InfiniBand
ib_big_mem
Location: Eberly Hall; node126-node137
Configuration: 12 two dual-core 2.6 GHz Opteron nodes (4 cores)
Total of 9 GBytes RAM, 51 GByte scratch disk per node
Connected by InfiniBand
verlet_jordan
Location: Eberly Hall; node138-node149
Configuration: 12 double dual-core 2.6 GHz Opteron nodes (4 cores)
Total of 9 GBytes RAM, 51 GByte scratch disk per node
Connected by InfiniBand
huge_io
Location: Eberly Hall; node150-node151
Configuration: 2 dual-core 2.8 GHz Opteron nodes (4 cores)
Total of 32 GBytes RAM, 804 GByte scratch disk per node
kohn_cmms
Location: Eberly Hall; node152-node162, node187-node199
Configuration: 24 double quad-core 2.66 GHz Xeon nodes (8 cores)
Total of 8 GBytes RAM, 123 GByte scratch disk per node
Connected by DDR InfiniBand
kohn
Location: Eberly Hall; node163-node186
Configuration: 24 double quad-core 2.66 GHz Xeon nodes (8 cores)
Total of 8 GBytes RAM, 123 GByte scratch disk per node
Connected by DDR InfiniBand
kohn_big_mem
Location: Eberly Hall; node200-node217
Configuration: 18 double quad-core 2.66 GHz Xeon nodes (8 cores)
Total of 16 GBytes RAM, 123 GByte scratch disk per node
Connected by DDR InfiniBand
MultiCore Opteron nodes
The cmms_dc, eight_way, and verlet queues use multi-core AMD Opteron nodes, which allow a single job to use up to 4 processors (cmms_dc, verlet) or 8 processors (eight_way) via shared memory. These nodes are intended for highly parallel applications. Following the rules in the job submission section, submit jobs to these machines with the following directives.
For cmms_dc or verlet (choose one queue name):
#PBS -l nodes=1:ppn=4
#PBS -q [cmms_dc/verlet]
For eight_way:
#PBS -l nodes=1:ppn=8
#PBS -q eight_way
You can also run MPI applications using the above PBS directives and all processors will be utilized.
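As a rough sketch of how an MPI job might pick up all the processors granted by those directives, the script below counts the entries in the Torque node file and passes that number to mpirun. The program name ./my_mpi_program is a placeholder, and the exact mpirun options depend on which of the installed MPI implementations the program was built with, so treat the launch line as an assumption to adapt.

```shell
#!/bin/sh
#PBS -l nodes=1:ppn=4
#PBS -q verlet

# count_procs FILE: number of processor slots listed in a Torque node file.
# Torque writes one line per allocated processor; grep -c . counts the
# non-empty lines.
count_procs() {
    grep -c . "$1"
}

# Outside the queue PBS_NODEFILE is unset, so build a stand-in file that
# lists one node four times (node102 is used purely as an example name).
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    printf 'node102\nnode102\nnode102\nnode102\n' > "$PBS_NODEFILE"
fi

NPROCS=$(count_procs "$PBS_NODEFILE")
echo "Launching on $NPROCS processors"

# ./my_mpi_program is a placeholder; launch only if mpirun and the
# program are actually present.
if command -v mpirun >/dev/null 2>&1 && [ -x ./my_mpi_program ]; then
    mpirun -np "$NPROCS" ./my_mpi_program
fi
```

Deriving the process count from the node file, rather than hard-coding it, keeps the launch line consistent if the ppn value in the directives is later changed to 8 for eight_way.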
InfiniBand Interconnect
The compute nodes in the verlet queue are connected by low-latency InfiniBand interconnects.
To use the InfiniBand network, applications have to be compiled with the proper libraries. Several programs have already been compiled to use the InfiniBand network. Each is listed here with its associated script:
- VASP: vasppbs_ib.sh
- Amber: amberpbs_ib.sh
- CPMD: cpmd_ib.sh
- DL_POLY 2.16: dlpoly_ib.sh
Please contact the administrator if you have a software package that you would like recompiled for use on the InfiniBand network.
Users who run their own code in parallel on Markov can also recompile it for InfiniBand. The InfiniBand-enabled compilers are at:
/opt/ofed/mpi/intel/mvapich-0.9.7-mlx2.2.0/bin/mpif90
/opt/ofed/mpi/intel/mvapich-0.9.7-mlx2.2.0/bin/mpicc
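A recompile with these wrappers might look like the following sketch. The source file names, output names, and the -O2 optimization flag are assumptions for illustration; only the wrapper paths come from this page. The commands are echoed first and run only if the wrappers and source files are actually present.

```shell
#!/bin/sh
# Directory holding the InfiniBand-enabled MVAPICH compiler wrappers
# (path as given on this page).
MVAPICH_BIN=/opt/ofed/mpi/intel/mvapich-0.9.7-mlx2.2.0/bin

# Hypothetical source file names; replace with your own.
C_SRC=mycode.c
F90_SRC=mycode.f90

# Build the command lines first so they can be inspected before running.
# The -O2 flag and output names are illustrative choices.
CC_CMD="$MVAPICH_BIN/mpicc -O2 -o mycode_c $C_SRC"
F90_CMD="$MVAPICH_BIN/mpif90 -O2 -o mycode_f90 $F90_SRC"

echo "$CC_CMD"
echo "$F90_CMD"

# Compile only when the wrapper exists (i.e. on Markov) and the source
# file is present in the current directory.
if [ -x "$MVAPICH_BIN/mpicc" ] && [ -f "$C_SRC" ]; then
    $CC_CMD
fi
if [ -x "$MVAPICH_BIN/mpif90" ] && [ -f "$F90_SRC" ]; then
    $F90_CMD
fi
```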
A script called mkib2pbs.sh prepares jobs to run over InfiniBand through the queue system. It sets up your jobs to use the correct parallel environment.
Please contact the administrator if you have any questions about using the InfiniBand network.
Job Limits
- Serial Jobs (single CPU) running on the small_io queue should be limited to 450 MB RAM and 28 GByte disk.
- Serial Jobs (single CPU) running on the big_io queue should be limited to 2.4 GBytes RAM and 60 GB of scratch.
- Parallel jobs run on the small_io queue should be limited to 16 CPUs. These can be spread over 16 nodes, or the user can specify that two CPUs are used per node, in which case a maximum of 8 nodes can be used. In any case, the user should be careful to keep memory usage under 450 MByte per CPU and disk usage under 58 GByte per CPU.
- Parallel jobs run on the big_io queue should be limited to 8 CPUs. These can be spread over 8 nodes, or the user can specify that two CPUs are used per node. The user should keep memory usage to under 2.4 GBytes per CPU and disk usage to under 60 GBytes per CPU.
- Parallel jobs run on the cmms_dc and verlet queues can use up to 20 CPUs across 5 nodes, with 4 GBytes of memory on each node.
Special Considerations
Several applications, including Gaussian 03, Molpro 2002.6, VASP, Jaguar, and Turbomole, have been installed on the cluster. Before running any of these in parallel, the user should be familiar with how the code is parallelized and which of its features are parallelized. For example, we run the shared-memory version of Gaussian 03, which can use only the CPUs on a single node. To run Gaussian 03 properly on these machines, include one of the following lines in the Link 0 Commands section of your input:
For 4 cores: %Nproc=4
For 8 cores: %Nproc=8
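To show where the Link 0 line sits in a complete input file, the sketch below writes a small Gaussian 03 input from the shell. The helper write_g03_input, the file names, the HF/STO-3G method, and the water geometry are all illustrative assumptions; only the %Nproc line reflects the guidance on this page.

```shell
#!/bin/sh
# write_g03_input FILE NPROC
# Writes a small Gaussian 03 input whose Link 0 section requests NPROC
# shared-memory processors. The checkpoint name, method, basis set, and
# geometry are placeholders for illustration.
write_g03_input() {
    cat > "$1" <<EOF
%Nproc=$2
%Chk=water.chk
# HF/STO-3G

Water single point (example)

0 1
O   0.000000   0.000000   0.117300
H   0.000000   0.757200  -0.469200
H   0.000000  -0.757200  -0.469200

EOF
}

# Match NPROC to the ppn value requested from PBS
# (4 on cmms_dc or verlet, 8 on eight_way).
write_g03_input water.com 4
head -n 1 water.com
```

Keeping %Nproc equal to the ppn value in the PBS directives avoids requesting more threads than the job has been allocated on the node.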