Markov Cluster


Access

Access to the Markov cluster is obtained by SSH to markov.cmms.pitt.edu.  Jobs are submitted using Torque PBS.

Submission scripts can be generated using

 
/usr/prog/bin/submit2pbs

This will generate a batch file which will then need to be modified before submitting with qsub.
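
As a rough illustration of what an edited batch file might look like, the sketch below writes a minimal Torque script and shows the qsub call. It is not the output of submit2pbs: the job name, walltime, and program name (my_program) are placeholders chosen for this example, and only the queue name is taken from this page.

```shell
#!/bin/sh
# Write a minimal Torque batch file (illustrative; not the output of submit2pbs).
cat > example_job.pbs <<'EOF'
#!/bin/sh
#PBS -N example_job
#PBS -q small_io
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -j oe

# Torque starts jobs in the home directory; move to the submission directory.
cd "$PBS_O_WORKDIR"
./my_program > my_program.out
EOF

# Submit from the login node (commented out so the sketch runs anywhere):
# qsub example_job.pbs
```

The heredoc delimiter is quoted so that $PBS_O_WORKDIR is written literally and expanded only when the job runs.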


Installed Software

O/S, Compilers, Tools

  • SuSE 10
  • Portland Group Compilers (6.0-5)
  • Intel Compilers (9 & 10)
  • Torque/MAUI (2.1.3/3.2.6p19)
  • LAM/MPI (7.0 & 7.1)
  • MPICH-1 (1.2.7)
  • MPICH-2 (1.0.7)
  • Open MPI (1.2.8)


Applications

  • Gaussian 03
  • DFTB (26.11.1998)
  • Orient (4.5.04)
  • ABINIT (5.6.4)
  • Quantum Espresso (4.0.4)
  • Molpro (2008.1)
  • VASP (4.6.34)
  • TURBOMOLE (5.10)
  • QChem (3.1.0.2)
  • NAMD (2.6b2)
  • AMBER (10)
  • Molden (4.3)
  • CPMD (3.11.1)
  • CP2K/Quickstep (1.16)

Queue Specifications

The following queues are available for use. Please note that if you use only a portion of the CPUs available on a node, you should use a directly proportional share of that node's disk and memory. Please contact the administrator if you have jobs with special requirements.
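
The proportional rule can be worked out with simple integer arithmetic. The sketch below is only an illustration: the helper name share is made up for this example, results are rounded down, and 8 GBytes is assumed to mean 8192 MBytes.

```shell
#!/bin/sh
# share TOTAL CORES_USED CORES_PER_NODE -> proportional share, rounded down
share() {
    echo $(( $1 * $2 / $3 ))
}

# Example: 2 of the 8 cores on a kohn node (8 GBytes RAM, 123 GByte scratch).
ram_mb=$(share 8192 2 8)      # RAM share in MBytes
scratch_gb=$(share 123 2 8)   # scratch share in GBytes
echo "RAM: ${ram_mb} MB, scratch: ${scratch_gb} GB"
```

For this example the job should stay within roughly 2048 MBytes of RAM and 30 GBytes of scratch.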

small_io

      Location: Eberly Hall; node033-node064
      Configuration: 32 dual-processor 2.4 GHz Opteron nodes
      Total of 1 GByte RAM, 58 GByte scratch disk per node

      Location: Benedum Hall; node001-node032
      Configuration: 32 dual-processor nodes (2 cores)
      Total of 1 GByte RAM, 58 GByte scratch disk per node
      Connected by Gigabit Ethernet

big_io

     Location: Eberly Hall; node065-node080
     Configuration: 16 dual-processor 2.4 GHz Opteron nodes (2 cores)
     Total of 5 GBytes RAM, 120 GByte scratch disks per node
     Connected by Gigabit Ethernet

cmms_dc

     Location: Eberly Hall; node084-node088
     Configuration: 5 nodes, each with two dual-core 2.2 GHz Opteron processors (4 cores)
     Total of 4 GBytes RAM, 58 GByte scratch disks per node

eight_way

     Location: Eberly Hall; node100 and node101
     Configuration: 2 nodes, each with four dual-core 2.4 GHz Opteron processors (8 cores)
     Total of 8 GBytes RAM, 230 GByte scratch disk per node

verlet

     Location: Eberly Hall; node102-node125
     Configuration: 12 double dual-core 2.6 GHz Opteron nodes (4 cores)
     Total of 5 GBytes RAM, 51 GByte scratch disk per node
     Connected by InfiniBand

ib_big_mem

     Location: Eberly Hall; node126-node137
     Configuration: 12 two dual-core 2.6 GHz Opteron nodes (4 cores)
     Total of 9 GBytes RAM, 51 GByte scratch disk per node
     Connected by InfiniBand

verlet_jordan

     Location: Eberly Hall; node138-node149
     Configuration: 12 double dual-core 2.6 GHz Opteron nodes (4 cores)
     Total of 9 GBytes RAM, 51 GByte scratch disk per node
     Connected by InfiniBand

huge_io

     Location: Eberly Hall; node150-151
     Configuration: 2 dual-core 2.8 GHz Opteron nodes (4 cores)
     Total of 32 GBytes RAM, 804 GByte scratch disk per node

kohn_cmms

     Location: Eberly Hall; node152-node162, node187-node199
     Configuration: 24 double quad-core 2.66 GHz Xeon nodes (8 cores)
     Total of 8 GBytes RAM, 123 GByte scratch disk per node
     Connected by DDR InfiniBand

kohn

     Location: Eberly Hall; node163-node186
     Configuration: 24 double quad-core 2.66 GHz Xeon nodes (8 cores)
     Total of 8 GBytes RAM, 123 GByte scratch disk per node
     Connected by DDR InfiniBand

kohn_big_mem

     Location: Eberly Hall; node200-node217
     Configuration: 18 double quad-core 2.66 GHz Xeon nodes (8 cores)
     Total of 16 GBytes RAM, 123 GByte scratch disk per node
     Connected by DDR InfiniBand

Multi-Core Opteron nodes

The cmms_dc, eight_way, and verlet queues feature AMD's Multi-Core technology, which allows a job to use up to 8 processors via shared memory (4 per node on cmms_dc and verlet, 8 per node on eight_way). The Multi-Core nodes are intended for highly parallel applications. Following the rules in the job submission section, submit jobs to these machines using the following directives:

   #PBS -l nodes=1:ppn=4
   #PBS -q [cmms_dc/verlet]

   #PBS -l nodes=1:ppn=8
   #PBS -q eight_way

You can also run MPI applications using the above PBS directives and all processors will be utilized.
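
As a sketch of how an MPI job might pick up the processors granted by these directives, the script below counts the entries in $PBS_NODEFILE, where Torque lists one line per allocated processor. The helper name count_procs and the program name my_mpi_program are hypothetical, and the mpirun flags shown in the comment vary between MPI libraries, so treat the launch line as an assumption to check against the MPI build you use.

```shell
#!/bin/sh
#PBS -l nodes=1:ppn=4
#PBS -q verlet

# count_procs FILE -> number of processor slots listed in a node file
count_procs() {
    wc -l < "$1" | tr -d ' '
}

# Inside a job, Torque sets PBS_NODEFILE; fall back to /dev/null elsewhere
# so the sketch can also be run outside the queue system.
NPROCS=$(count_procs "${PBS_NODEFILE:-/dev/null}")
echo "Allocated processors: $NPROCS"

# Launch line (hypothetical program name; flags differ between MPI libraries):
# mpirun -np "$NPROCS" -machinefile "$PBS_NODEFILE" ./my_mpi_program
```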


InfiniBand Interconnect

The compute nodes in the verlet queue are connected by low-latency InfiniBand interconnects.

To use the InfiniBand network, applications have to be compiled with the proper libraries. Several programs have already been compiled to use the InfiniBand network. The programs and their submission scripts are:

  • VASP: vasppbs_ib.sh
  • Amber: amberpbs_ib.sh
  • CPMD: cpmd_ib.sh
  • DL_POLY 2.16: dlpoly_ib.sh

Please contact the administrator if you have a software package that you would like recompiled for use on the InfiniBand network.

Users who run their own code in parallel on Markov can also recompile it for InfiniBand. The InfiniBand compilers are at:

/opt/ofed/mpi/intel/mvapich-0.9.7-mlx2.2.0/bin/mpif90
/opt/ofed/mpi/intel/mvapich-0.9.7-mlx2.2.0/bin/mpicc
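
A compile step with these wrappers might look like the sketch below. The source and output names (my_prog.c, my_prog) and the -O2 flag are placeholders chosen for this example; the script only attempts the build if the wrapper is present, so it can be run harmlessly on other machines.

```shell
#!/bin/sh
# MVAPICH compiler wrappers for InfiniBand, at the path given above.
IB_MPI_BIN=/opt/ofed/mpi/intel/mvapich-0.9.7-mlx2.2.0/bin

# Source and output names are hypothetical placeholders.
if [ -x "$IB_MPI_BIN/mpicc" ]; then
    "$IB_MPI_BIN/mpicc" -O2 -o my_prog my_prog.c
else
    echo "mpicc not found in $IB_MPI_BIN (not on Markov?)"
fi
```

For Fortran code, the same pattern applies with mpif90 in place of mpicc.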

The script mkib2pbs.sh prepares jobs to run over InfiniBand through the queue system. It sets up your jobs to use the correct parallel environment.

Please contact the administrator if you have any questions about using the InfiniBand network.


Job Limits

  • Serial jobs (single CPU) running on the small_io queue should be limited to 450 MBytes RAM and 28 GBytes of scratch disk.
  • Serial jobs (single CPU) running on the big_io queue should be limited to 2.4 GBytes RAM and 60 GBytes of scratch disk.
  • Parallel jobs run on the small_io queue should be limited to 16 CPUs.  These can be spread over 16 nodes, or the user can specify that two CPUs are used per node, in which case a maximum of 8 nodes can be used.  In any case, the user should be careful to keep memory usage to under 450 MBytes per CPU and disk usage to under 58 GBytes per CPU.
  • Parallel jobs run on the big_io queue should be limited to 8 CPUs.  These can be spread over 8 nodes, or the user can specify that two CPUs are used per node.  The user should keep memory usage to under 2.4 GBytes per CPU and disk usage to under 60 GBytes per CPU.
  • Parallel jobs run on the cmms_dc and verlet queues can use up to 20 CPUs across 5 nodes, with 4 GBytes of memory on each node.
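
The CPU limits in the list above can be checked before submitting. The sketch below is an illustration only: the function names max_cpus and check_request are made up for this example, and queues with no limit listed on this page are treated as unknown.

```shell
#!/bin/sh
# max_cpus QUEUE -> parallel CPU limit stated in the list above
max_cpus() {
    case "$1" in
        small_io)        echo 16 ;;
        big_io)          echo 8 ;;
        cmms_dc|verlet)  echo 20 ;;
        *)               echo 0 ;;   # no limit listed on this page
    esac
}

# check_request QUEUE NODES PPN -> succeeds if nodes*ppn is within the limit
check_request() {
    limit=$(max_cpus "$1")
    [ "$limit" -gt 0 ] && [ $(( $2 * $3 )) -le "$limit" ]
}

check_request small_io 8 2 && echo "small_io 8x2: ok"
check_request big_io 8 2 || echo "big_io 8x2: over the 8-CPU limit"
```

A request of nodes=8:ppn=2 is 16 CPUs, which fits small_io but exceeds the big_io limit of 8.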


Special Considerations

Several applications, including Gaussian 03, Molpro 2002.6, VASP, Jaguar, and Turbomole, have been installed on the cluster.  Before running any of these in parallel, it is important that the user be familiar with how the code is parallelized and which features of the code are parallelized. For example, we are running the shared-memory version of Gaussian 03, which can use only the CPUs on a single node. To run Gaussian 03 properly on these machines, include the following line in the Link 0 Commands section of your input:

   %Nproc=4 for 4 cores
   
   %Nproc=8 for 8 cores
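
One way to keep %Nproc in step with the ppn value requested from PBS is to generate the input from a single variable, as in the sketch below. The file name example.com, the route line, the title, and the water geometry are placeholder content chosen for this example, not a recommended calculation.

```shell
#!/bin/sh
# Keep %Nproc in step with the ppn value requested from PBS.
NCORES=4   # matches "#PBS -l nodes=1:ppn=4"

# The route line, title, and water geometry are placeholder content.
cat > example.com <<EOF
%Nproc=${NCORES}
# HF/STO-3G

water single point (placeholder title)

0 1
O   0.000000   0.000000   0.117790
H   0.000000   0.755453  -0.471161
H   0.000000  -0.755453  -0.471161

EOF
```

For an eight_way node, set NCORES=8 and request ppn=8 in the batch file.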