This is a short tutorial on how to use the secluster-02. The cluster consists of
The cluster works with queues (just like the ones we like to simulate so much :)). The jobs in the queue are distributed over the nodes as evenly as possible, as far as the parameters of the jobs allow it. This example uses 1 CPU core per job, which implies that 80 jobs can run simultaneously.
Each simulation has to be submitted to the queue separately. The easiest way to do this is to let a Python script submit the jobs. Detailed documentation about the commands you should use (
qdel, etc.) can be found in the Rocks 4.3 user guide, section 2.2.
Warning: Each computation node has 2 dual-core CPUs, so four jobs can run on one machine simultaneously. The stochastic functions in chi are seeded from the moment the simulation is started, so two or more jobs that are started at the same instant get the same seed and can produce exactly the same result! To solve this problem a short delay has to be built in.
An alternative solution would be to give each simulation a unique start seed using the (I think)
I don't know how to use this option; ping localhost worked fine too [EvdRijt].
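The effect described in the warning, and the unique-seed alternative, can be illustrated with Python's own `random` module. This is only a sketch of the idea: it says nothing about how chi seeds its generators or which option chi offers for this, and the helper `job_seed` with its formula is made up for this example.

```python
import random

def draw(seed, n=3):
    """Return n pseudo-random numbers from a generator started with seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

def job_seed(x, y, simnr):
    """Hypothetical helper: build a distinct integer seed from the job's
    parameters and simulation number (assumes y and simnr stay below 1000)."""
    return (x * 1000 + y) * 1000 + simnr

# Two jobs started at the same instant share a clock-based seed ...
same_instant = 1160560000          # assumed clock value in whole seconds
run_a = draw(same_instant)
run_b = draw(same_instant)
print(run_a == run_b)              # the two runs are identical

# ... while per-job seeds give each job its own sequence.
run_c = draw(job_seed(1, 10, 1))
run_d = draw(job_seed(1, 10, 2))
print(run_c == run_d)
```

If the simulation accepts a seed as a parameter, a value like `job_seed(x, y, simnr)` could be written into the job script instead of relying on a delay.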
…. (more to come soon) …
In this example a simulation called chisim is started with parameters x and y. Here x runs from 1 to 10 in steps of 1 and y from 10 to 20 in steps of 2. Since only a Linux script text file can be used to submit a simulation to the queue, a job script file must be generated for each simulation in the directory called
jobs. Once the job scripts and the simulation output directories have been made, the jobs are submitted to the queue.
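To get a feeling for how many job scripts this sweep produces, the parameter combinations can be enumerated first. The sketch below only lists names and touches neither the file system nor the queue. The pattern `jobs/job.X-Y.N` follows the script on this page; the value of `n_sims` is an assumption chosen for the example.

```python
from itertools import product

# Parameter ranges from the example: x = 1..10 step 1, y = 10..20 step 2
xs = range(1, 10 + 1, 1)
ys = range(10, 20 + 1, 2)
n_sims = 3   # assumed number of identical simulations per combination

def job_script_names(xs, ys, n_sims):
    """List the job script paths, one per simulation, as jobs/job.X-Y.N."""
    names = []
    for x, y in product(xs, ys):
        for simnr in range(1, n_sims + 1):
            names.append("jobs/job.%d-%d.%d" % (x, y, simnr))
    return names

names = job_script_names(xs, ys, n_sims)
print(len(names))     # 10 x values * 6 y values * 3 simulations = 180
print(names[0])       # jobs/job.1-10.1
print(names[-1])      # jobs/job.10-20.3
```

With 180 jobs and room for 80 at a time, most of them wait in the queue until a core becomes free.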
# usage: clus.py [# sims]
#
# Starts simulating [# sims] identical jobs
# on cluster with parameters X and Y
#
# Warning:
# Use with care!
#
# Created: Oct 11, 2006 V1 Emiel van de Rijt
# Modified:
# Oct 12, 2006 V2 Comments added

import time, os, sys

# The process which submits the job to the queue
def smJob(intI, sX, sY):
    # The name of the job
    myJobName = "job." + sX + "-" + sY
    # the command to submit the job to the queue
    #   -N: name of the job
    #   -e: write error to file
    #   -o: write screen output to file
    #   -q: queue name to submit to
    #   -l: see manual orca
    #   last argument: the actual job script file to execute
    myCmd = "qsub -N " + myJobName + "." + str(intI) + \
            " -e jobs/" + myJobName + ".err." + str(intI) + \
            " -o jobs/" + myJobName + ".out." + str(intI) + \
            " -q workq" \
            " -l nodes=1:ppn=1,walltime=24:00:00" \
            " jobs/" + myJobName + "." + str(intI)
    # send the command
    os.system(myCmd)

# ====================================================================================================

# Process to make the job scripts
def mkJobs(intStart, intStop, sX, sY, outputdir):
    # read the path from the global variable
    global myPath
    # filename for the script file in the jobs folder
    myFOut = "job"
    # construct a name for the job
    myJobName = sX + "-" + sY
    for simnr in range(intStart, intStop):
        # create/open file for writing
        fOut = open(myPath + "jobs/" + myFOut + '.' + myJobName + '.' + str(simnr), 'w')
        # the simulation command:
        #   ./chisim -e 1e3 : simulation to run with end time
        #   sX              : param 1 for the chi file
        #   sY              : param 2 for the chi file
        #   out...txt       : param 3 for the chi file
        myCmd = "./chisim -e 1e3 " + \
                sX + " " + \
                sY + " " + \
                outputdir + "/out." + sX + "-" + sY + "." + str(simnr) + ".txt"
        # write the job script:
        #   cd   : go to the work directory
        #   ping : short delay so jobs do not start at the same instant
        #   echo : print on screen the cmd to run
        #   last line: run the command
        myTxt = "cd " + myPath + "\n" + \
                "ping -c 2 127.0.0.1\n" + \
                "echo ran: " + myCmd + "\n" + \
                myCmd + "\n"
        fOut.write(myTxt)
        fOut.close()

# ====================================================================================================

# get current path
myPath = os.getcwd() + "/"
#print(myPath)

# run x from 1 to 10 with steps of 1
xb = 1   # begin
xe = 10  # end
xs = 1   # step size

# run y from 10 to 20 with steps of 2
yb = 10  # begin
ye = 20  # end
ys = 2   # step size

# Read number of identical simulations
# to do from the prompt
intStart = 1
# take start + the first argument of the prompt
intStop = intStart + int(sys.argv[1])

# measure process time and wall time
# set start time
t0_process_time = time.process_time()
t0_wall_time = time.time()

# make directory for job script files
# in the current directory
os.system("mkdir jobs")

# do for all x
for x in range(xb, xe+1, xs):
    # do for all y
    for y in range(yb, ye+1, ys):
        # convert to string
        sX = str(x)
        sY = str(y)
        # make output dir string
        outputdir = "sim-" + sX + "-" + sY
        # make the output directory
        # for a simulation output
        os.system("mkdir " + outputdir)
        # make the linux script files
        mkJobs(intStart, intStop, sX, sY, outputdir)
        # start submitting the jobs to the queue
        for simnr in range(intStart, intStop):
            smJob(simnr, sX, sY)

# measure process time and wall time
# calculate time
print(time.process_time() - t0_process_time, "seconds process time")
print(time.time() - t0_wall_time, "seconds wall time")
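Because the script submits everything in one go, it can help to look at a single submit command before running it for real. The following is a minimal dry-run sketch, not part of the original script: the function name `build_qsub_command` is hypothetical, and the queue name and resource string are simply the values used above.

```python
def build_qsub_command(sim_nr, s_x, s_y, queue="workq",
                       resources="nodes=1:ppn=1,walltime=24:00:00"):
    """Return the qsub argument list for one job without running it."""
    job = "job.%s-%s" % (s_x, s_y)
    return [
        "qsub",
        "-N", "%s.%d" % (job, sim_nr),            # job name
        "-e", "jobs/%s.err.%d" % (job, sim_nr),   # error output file
        "-o", "jobs/%s.out.%d" % (job, sim_nr),   # screen output file
        "-q", queue,                              # queue to submit to
        "-l", resources,                          # requested resources
        "jobs/%s.%d" % (job, sim_nr),             # job script to execute
    ]

# Show the command for x=1, y=10, simulation 1 instead of submitting it
cmd = build_qsub_command(1, "1", "10")
print(" ".join(cmd))
```

Once the printed command looks right, the same list could be handed to `subprocess.call(cmd)` on the cluster to submit the job.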