This page describes the cluster and gives a set of good practices for using it.
lphelc1a and lphelc1b
The lphelc1a and lphelc1b machines are our two interactive nodes.
They are intended for development and testing.
Please run long or CPU-intensive jobs via the batch system.
lphelcsrv2 is our head node.
It hosts the RAID6 home disk and manages the batch system.
Users rarely need to log into this machine.
The cluster has 20 identical worker nodes, which accept jobs through our batch system, plus two additional nodes for testing. Each worker node has:
- 16 x Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
- Scientific Linux CERN SLC release 6.5 (Carbon), 64 bit
- 32 GB RAM
- 600 GB scratch disk
- /home and /panfs directories mounted automatically
Cluster Usage: Good Practices
- Avoid filling up the batch queue with many long jobs.
- Submit smaller jobs, if possible, to make use of the many cores.
- Run test jobs on the interactive nodes and test machines beforehand to time job execution.
- Submit jobs with adequate and accurate requests for resources.
- Run everything local to a node.
- Be aware that requesting large amounts of walltime can prevent your jobs from running in some situations.
- Store large datasets on the shared disks.
Avoid Filling the Batch Queue with Long Jobs
The batch fairshare is configured so that the available resources are allocated equally among users. This does not mean that each of N users can only ever use 1/N of the resources: if only a few people are running jobs, you can use a larger share, and one person may even end up using all the resources at once.
Now suppose that, seconds before you submit your jobs, someone saturates the queue with theirs so that no resources are left. Your jobs cannot start until theirs have finished. So if you plan to submit a lot of jobs, please exclude a couple of nodes from your submission so that others can still use them. See also the next point.
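A minimal sketch of such a submission in bash, assuming one job per input file and a job script that takes the input as its first argument. It uses sbatch's --exclude option. The node names node19 and node20 are hypothetical (sinfo -N lists the real ones), and setting SBATCH=echo prints the sbatch commands instead of submitting them, which is a cheap way to check them first.

```bash
#!/bin/bash
# Hypothetical node names: replace with real ones (see `sinfo -N`).
EXCLUDE_NODES=${EXCLUDE_NODES:-node19,node20}
# Set SBATCH=echo to print the commands instead of submitting them.
SBATCH=${SBATCH:-sbatch}

# submit_all JOBSCRIPT INPUT...
# Submit one job per INPUT, passing it as the job script's first argument,
# while keeping the excluded nodes free for other users.
submit_all() {
    local script=$1
    shift
    local input
    for input in "$@"; do
        "$SBATCH" --exclude="$EXCLUDE_NODES" "$script" "$input"
    done
}
```

For example, submit_all job.sh /panfs/$USER/data/*.root would submit one job per file while leaving node19 and node20 untouched. If your jobs are tasks of a single job array, sbatch's --array option also accepts a throttle such as --array=1-500%50, which limits how many tasks of the array run at the same time; this is another way to leave room for others.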
Submit Smaller Jobs if Possible
We generally run our jobs like this:
- Test the job on an interactive node with a small sample.
- Once the job works, it is ready to run on the batch system.
- Submit many jobs at once, enough to generate your MC sample or run over your complete dataset.
- Retrieve the output and analyse the results.
We would rather not limit the number of jobs people can submit at once, but problems occur when people occupy all the resources for a long time.
If you have a very large number of jobs or a large dataset to analyse, it is a good idea to make each job run for a few hours at most. This makes better use of the resources and lets priorities be shared more fairly. It may mean submitting more jobs, each running over fewer files or generating fewer events, but it means that you will not block other people’s jobs for too long!
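One way to split a large dataset into short jobs is a Slurm job array in which each task processes its own slice of a file list. The sketch below assumes a plain-text list with one input file per line; files_for_task and n_tasks are hypothetical helper names, not part of Slurm.

```bash
#!/bin/bash
# files_for_task LISTFILE FILES_PER_JOB TASK_ID
# Print the input files that array task TASK_ID (counting from 1) should
# process: FILES_PER_JOB consecutive lines of LISTFILE.
files_for_task() {
    local list=$1 per_job=$2 task=$3
    local first=$(( (task - 1) * per_job + 1 ))
    local last=$(( task * per_job ))
    sed -n "${first},${last}p" "$list"
}

# n_tasks LISTFILE FILES_PER_JOB
# Number of jobs needed to cover the whole list (rounded up).
n_tasks() {
    local n
    n=$(wc -l < "$1")
    echo $(( (n + $2 - 1) / $2 ))
}
```

Inside the job script, a task could then pick up its files with files_for_task filelist.txt 10 "$SLURM_ARRAY_TASK_ID", and the whole array could be submitted with sbatch --array=1-$(n_tasks filelist.txt 10) job.sh. Lowering the number of files per job shortens each job without changing the total amount of work.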
Run Test Jobs Beforehand
The batch system only knows what resources your jobs need if you tell it.
You do this either with sbatch command-line flags or with #SBATCH directives in the job script itself.
Examples of how to set your required resources can be found here.
Try to be as accurate as possible, since accurate requests allow the throughput of the system to be optimised.
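For example, the resources could be requested at the top of a job script as in the sketch below. The job name and the values (one CPU, 2 GB of memory, two hours of walltime) are placeholders to replace with numbers from your own test runs. The same options can also be given on the sbatch command line, where they take precedence over the directives in the script.

```bash
#!/bin/bash
# Slurm reads the #SBATCH lines below at submission time; bash treats them
# as ordinary comments. They must come before the first command.
#SBATCH --job-name=mc_test
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G
#SBATCH --time=02:00:00
#SBATCH --output=%x-%j.out

# Slurm exports what was granted; the defaults let the script also run
# interactively for a quick test.
NCPUS=${SLURM_CPUS_PER_TASK:-1}
echo "Job ${SLURM_JOB_ID:-interactive} on ${HOSTNAME} using ${NCPUS} CPU(s)"
```

In the --output pattern, %x expands to the job name and %j to the job ID, so each job writes its own log file.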
One of the main resources you will need to estimate is walltime, i.e. the time a job takes to execute in the real world (wall-clock time).
You can use the “time” command to do this by prefixing the command that runs your test job with it.
The output should look something like this:
real 0m3.235s
user 0m0.101s
sys 0m1.078s
Record the “real” value, then run the test job again with more events or more files.
After a few test jobs you should have an idea of how the execution time of your job scales with the number of events or files.
Use this information to estimate how long a typical job will take, then add 10% to be safe.
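The extrapolation and the 10% margin can be scripted. The sketch below assumes that run time grows linearly with the number of events (or files), fits that line through two test runs, and prints the result in the HH:MM:SS form accepted by sbatch's --time option. The function name estimate_walltime is hypothetical.

```bash
#!/bin/bash
# estimate_walltime TARGET N1 T1 N2 T2
#   TARGET : number of events (or files) the real job will process
#   N1, T1 : size of the first test job and its "real" time in seconds
#   N2, T2 : size of the second test job and its "real" time in seconds
# Assumes run time = fixed startup cost + constant cost per event.
estimate_walltime() {
    awk -v target="$1" -v n1="$2" -v t1="$3" -v n2="$4" -v t2="$5" 'BEGIN {
        rate = (t2 - t1) / (n2 - n1)     # seconds per event
        t = t1 + rate * (target - n1)    # linear extrapolation to TARGET
        t = t * 1.10                     # add 10% just to be safe
        s = int(t + 0.5)                 # round to the nearest second
        printf "%02d:%02d:%02d\n", int(s / 3600), int((s % 3600) / 60), s % 60
    }'
}
```

For instance, if test jobs with 1000 and 2000 events took 40 s and 76 s of “real” time, estimate_walltime 100000 1000 40 2000 76 prints 01:06:04 for a 100000-event job. Using two test runs rather than one means a fixed startup cost (loading libraries, opening files) is not wrongly scaled up with the number of events.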
Run Everything Local to a Node
By “local” we mean that you should minimise network traffic between machines. The /home disk lives on lphelcsrv2 and is mounted over the network on every node, so constantly reading and writing files in your home area from a job is discouraged. The preferred method is to use the /scratch directory of the node your job is running on.
Note that Slurm expects a shared filesystem, so I/O-intensive jobs should write their data to the /panfs filesystem.
To use /scratch effectively, you need to create a unique directory for each job you submit. This is easy to do from within the job script itself with a few lines of bash. Slurm sets the SLURM_JOB_ID environment variable to the numeric ID of the running job, which makes a convenient directory name:
#!/bin/bash
MYID=$SLURM_JOB_ID
WORKDIR=/scratch/$USER/$MYID
mkdir -p "$WORKDIR"
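A fuller job script that works in node-local scratch and copies its results back once at the end might look like the sketch below. RESULTS_DIR is a hypothetical setting: point it at your area on /panfs (here it defaults to the submission directory given by SLURM_SUBMIT_DIR). The fallbacks for running outside Slurm, and to a temporary directory when /scratch is not writable, are assumptions added so that the script can be tried interactively.

```bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
set -euo pipefail

# Unique ID: the Slurm job ID, or the shell's PID when run outside Slurm.
MYID=${SLURM_JOB_ID:-local$$}
USER=${USER:-$(id -un)}

# Node-local scratch; fall back to $TMPDIR (or /tmp) if /scratch is not writable.
SCRATCH_BASE=/scratch
if [ ! -w "$SCRATCH_BASE" ]; then
    SCRATCH_BASE=${TMPDIR:-/tmp}
fi
WORKDIR=$SCRATCH_BASE/$USER/$MYID

# Hypothetical results area: set RESULTS_DIR to a directory on /panfs.
RESULTS_DIR=${RESULTS_DIR:-${SLURM_SUBMIT_DIR:-$PWD}/results}

mkdir -p "$WORKDIR" "$RESULTS_DIR"
# Delete the scratch directory however the script exits.
trap 'rm -rf "$WORKDIR"' EXIT
cd "$WORKDIR"

# ... run the real job here, reading and writing files in $WORKDIR ...
echo "example output" > output.txt

# Copy the results to the shared filesystem in a single step.
cp output.txt "$RESULTS_DIR/output_$MYID.txt"
```

The trap makes sure the scratch space is freed even if the job fails part-way, so later jobs on the same node are not left short of disk.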